Re: [PATCH v2 00/12] KVM: Add idempotent controls for migrating system counter state

2021-07-16 Thread Oliver Upton
On Fri, Jul 16, 2021 at 2:26 PM Oliver Upton  wrote:
>
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. At least on ARM64 and x86, we migrate the guest's
> system counter by-value through the respective guest system register
> values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> brittle as the state is not idempotent: the host system counter is still
> oscillating between the attempted save and restore. Furthermore, VMMs
> may wish to transparently live migrate guest VMs, meaning that they
> include the elapsed time due to live migration blackout in the guest
> system counter view. The VMM thread could be preempted for any number of
> reasons (scheduler, L0 hypervisor under nested) between the time that
> it calculates the desired guest counter value and when KVM actually sets
> this counter state.
>
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of system counter offsets.
> We can avoid all of the issues associated with a value-based interface
> by abstracting these offset controls in new ioctls. This series
> introduces new vCPU device attributes to provide userspace access to the
> vCPU's system counter offset.
>
> Patch 1 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK
> ioctls to provide userspace with a (host_tsc, realtime) instant. This is
> essential for a VMM to perform precise migration of the guest's system
> counters.
>
> Patches 2-3 add support for x86 by shoehorning the new controls into the
> pre-existing synchronization heuristics.
>
> Patches 4-5 implement a test for the new additions to
> KVM_{GET,SET}_CLOCK.
>
> Patches 6-7 implement a test for the tsc offset attribute introduced in
> patch 3.
>
> Patch 8 adds a device attribute for the arm64 virtual counter-timer
> offset.
>
> Patch 9 extends the test from patch 7 to cover the arm64 virtual
> counter-timer offset.
>
> Patch 10 adds a device attribute for the arm64 physical counter-timer
> offset. Currently, this is implemented as a synthetic register, forcing
> the guest to trap to the host and emulating the offset in the fast exit
> path. Later down the line we will have hardware with FEAT_ECV, which
> allows the hypervisor to perform physical counter-timer offsetting in
> hardware (CNTPOFF_EL2).
>
> Patch 11 extends the test from patch 7 to cover the arm64 physical
> counter-timer offset.
>
> Patch 12 introduces a benchmark to measure the overhead of emulation in
> patch 10.
>
> Physical counter benchmark
> --------------------------
>
> The following data was collected by running 1,000 iterations of the
> benchmark test from Patch 12 on an Ampere Mt. Jade reference server, a 2S
> machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> parameter.
>
> nVHE
> ----
>
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 54ns   | 148ns   |
> | Standard Deviation | 124ns  | 122ns   |
> | 95th Percentile    | 258ns  | 348ns   |
> +--------------------+--------+---------+
>
> VHE
> ---
>
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 53ns   | 152ns   |
> | Standard Deviation | 92ns   | 94ns    |
> | 95th Percentile    | 204ns  | 307ns   |
> +--------------------+--------+---------+
>
> This series applies cleanly to the following commit:
>
> 1889228d80fe ("KVM: selftests: smm_test: Test SMM enter from L2")

v1: https://lore.kernel.org/kvm/20210608214742.1897483-1-oup...@google.com/

> v1 -> v2:
>   - Reimplemented as vCPU device attributes instead of a distinct ioctl.
>   - Added the (realtime, host_tsc) instant support to
> KVM_{GET,SET}_CLOCK
>   - Changed the arm64 implementation to broadcast counter offset values
> to all vCPUs in a guest. This upholds the architectural expectations
> of a consistent counter-timer across CPUs.
>   - Fixed a bug with traps in VHE mode. We now configure traps on every
> transition into a guest to handle differing VMs (trapped, emulated).
>
> Oliver Upton (12):
>   KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
>   KVM: x86: Refactor tsc synchronization code
>   KVM: x86: Expose TSC offset controls to userspace
>   tools: arch: x86: pull in pvclock headers
>   selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
>   selftests: KVM: Add helpers for vCPU device attributes
>   selftests: KVM: Introduce system counter offset test
>   KVM: arm64: Allow userspace to configure a vCPU's virtual offset
>   selftests: KVM: Add support for aarch64 to system_counter_offset_test
>   KVM: arm64: Provide userspace access to the physical counter offset
>   selftests: KVM: Test physical counter offsetting
>   selftests: KVM: Add counter emulation benchmark

[PATCH v2 11/12] selftests: KVM: Test physical counter offsetting

2021-07-16 Thread Oliver Upton
Test that userspace adjustment of the guest physical counter-timer
results in the correct view of the counter within the guest.

Signed-off-by: Oliver Upton 
---
 .../selftests/kvm/include/aarch64/processor.h | 12 
 .../kvm/system_counter_offset_test.c  | 29 ---
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index 3168cdbae6ee..7f53d90e9512 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -141,4 +141,16 @@ static inline uint64_t read_cntvct_ordered(void)
return r;
 }
 
+static inline uint64_t read_cntpct_ordered(void)
+{
+   uint64_t r;
+
+   __asm__ __volatile__("isb\n\t"
+"mrs %0, cntpct_el0\n\t"
+"isb\n\t"
+: "=r"(r));
+
+   return r;
+}
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
index 88ad997f5b69..3eed9dcb7693 100644
--- a/tools/testing/selftests/kvm/system_counter_offset_test.c
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -57,6 +57,7 @@ static uint64_t host_read_guest_system_counter(struct test_case *test)
 
 enum arch_counter {
VIRTUAL,
+   PHYSICAL,
 };
 
 struct test_case {
@@ -68,23 +69,41 @@ static struct test_case test_cases[] = {
{ .counter = VIRTUAL, .offset = 0 },
{ .counter = VIRTUAL, .offset = 180 * NSEC_PER_SEC },
{ .counter = VIRTUAL, .offset = -180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL, .offset = 0 },
+   { .counter = PHYSICAL, .offset = 180 * NSEC_PER_SEC },
+   { .counter = PHYSICAL, .offset = -180 * NSEC_PER_SEC },
 };
 
 static void check_preconditions(struct kvm_vm *vm)
 {
if (!_vcpu_has_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
-  KVM_ARM_VCPU_TIMER_OFFSET_VTIMER))
+  KVM_ARM_VCPU_TIMER_OFFSET_VTIMER) &&
+   !_vcpu_has_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
+  KVM_ARM_VCPU_TIMER_OFFSET_PTIMER))
return;
 
-   print_skip("KVM_ARM_VCPU_TIMER_OFFSET_VTIMER not supported; skipping test");
+   print_skip("KVM_ARM_VCPU_TIMER_OFFSET_{VTIMER,PTIMER} not supported; skipping test");
exit(KSFT_SKIP);
 }
 
 static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
 {
+   u64 attr = 0;
+
+   switch (test->counter) {
+   case VIRTUAL:
+   attr = KVM_ARM_VCPU_TIMER_OFFSET_VTIMER;
+   break;
+   case PHYSICAL:
+   attr = KVM_ARM_VCPU_TIMER_OFFSET_PTIMER;
+   break;
+   default:
+   TEST_ASSERT(false, "unrecognized counter index %u",
+   test->counter);
+   }
+
vcpu_access_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
-   KVM_ARM_VCPU_TIMER_OFFSET_VTIMER, &test->offset,
-   true);
+   attr, &test->offset, true);
 }
 
 static uint64_t guest_read_system_counter(struct test_case *test)
@@ -92,6 +111,8 @@ static uint64_t guest_read_system_counter(struct test_case *test)
switch (test->counter) {
case VIRTUAL:
return read_cntvct_ordered();
+   case PHYSICAL:
+   return read_cntpct_ordered();
default:
GUEST_ASSERT(0);
}
-- 
2.32.0.402.g57bb445576-goog



[PATCH v2 12/12] selftests: KVM: Add counter emulation benchmark

2021-07-16 Thread Oliver Upton
Add a test case for counter emulation on arm64. A side effect of how KVM
handles physical counter offsetting on non-ECV systems is that reads of
the virtual counter always hit hardware, while reads of the physical
counter may be emulated. Force emulation by writing a nonzero offset to
the physical counter and compare the elapsed cycles to a direct read of
the hardware register.
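
Converting the measured counter deltas to wall-clock time only requires
the generic counter frequency. A minimal sketch of the post-processing,
assuming a hypothetical read_cntfrq() helper that returns CNTFRQ_EL0 in
Hz (NSEC_PER_SEC comes from kvm_util.h):

    uint64_t ticks = counter_values.cntvct_end - counter_values.cntvct_start;
    /* fine for small deltas; a general version would guard against overflow */
    uint64_t ns = ticks * NSEC_PER_SEC / read_cntfrq();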

Reviewed-by: Ricardo Koller 
Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 215 ++
 3 files changed, 217 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 2752813d5090..1d811c6a769b 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /aarch64/debug-exceptions
+/aarch64/counter_emulation_benchmark
 /aarch64/get-reg-list
 /aarch64/vgic_init
 /s390x/memop
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index d89908108c97..e560a3e74bc2 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -86,6 +86,7 @@ TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
 TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
+TEST_GEN_PROGS_aarch64 += aarch64/counter_emulation_benchmark
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
 TEST_GEN_PROGS_aarch64 += demand_paging_test
diff --git a/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
new file mode 100644
index ..73aeb6cdebfe
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * counter_emulation_benchmark.c -- test to measure the effects of counter
+ * emulation on guest reads of the physical counter.
+ *
+ * Copyright (c) 2021, Google LLC.
+ */
+
+#define _GNU_SOURCE
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static struct counter_values {
+   uint64_t cntvct_start;
+   uint64_t cntpct;
+   uint64_t cntvct_end;
+} counter_values;
+
+static uint64_t nr_iterations = 1000;
+
+static void do_test(void)
+{
+   /*
+* Open-coded approach instead of using helper methods to keep a tight
+* interval around the physical counter read.
+*/
+   asm volatile("isb\n\t"
+"mrs %[cntvct_start], cntvct_el0\n\t"
+"isb\n\t"
+"mrs %[cntpct], cntpct_el0\n\t"
+"isb\n\t"
+"mrs %[cntvct_end], cntvct_el0\n\t"
+"isb\n\t"
+: [cntvct_start] "=r"(counter_values.cntvct_start),
+[cntpct] "=r"(counter_values.cntpct),
+[cntvct_end] "=r"(counter_values.cntvct_end));
+}
+
+static void guest_main(void)
+{
+   int i;
+
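+   /* two identical passes: the host times one natively and one trapped */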
+   for (i = 0; i < nr_iterations; i++) {
+   do_test();
+   GUEST_SYNC(i);
+   }
+
+   for (i = 0; i < nr_iterations; i++) {
+   do_test();
+   GUEST_SYNC(i);
+   }
+
+   GUEST_DONE();
+}
+
+static bool enter_guest(struct kvm_vm *vm)
+{
+   struct ucall uc;
+
+   vcpu_ioctl(vm, VCPU_ID, KVM_RUN, NULL);
+
+   switch (get_ucall(vm, VCPU_ID, &uc)) {
+   case UCALL_DONE:
+   return true;
+   case UCALL_SYNC:
+   break;
+   case UCALL_ABORT:
+   TEST_ASSERT(false, "%s at %s:%ld", (const char *)uc.args[0],
+   __FILE__, uc.args[1]);
+   break;
+   default:
+   TEST_ASSERT(false, "unexpected exit: %s",
+   exit_reason_str(vcpu_state(vm, VCPU_ID)->exit_reason));
+   break;
+   }
+
+   /* more work to do in the guest */
+   return false;
+}
+
+static double counter_frequency(void)
+{
+   uint32_t freq;
+
+   asm volatile("mrs %0, cntfrq_el0"
+: "=r" (freq));
+
+   return freq / 1000000.0;
+}
+
+static void log_csv(FILE *csv, bool trapped)
+{
+   double freq = counter_frequency();
+
+   fprintf(csv, "%s,%.02f,%lu,%lu,%lu\n",
+   trapped ? "true" : "false", freq,
+   counter_values.cntvct_start,
+   counter_values.cntpct,
+   counter_values.cntvct_end);
+}
+
+static double run_loop(struct kvm_vm *vm, FILE *csv, bool trapped)
+{
+   double avg = 0;
+   int i;
+
+   for (i = 0; i < nr_iterations; i++) {
+   

[PATCH v2 07/12] selftests: KVM: Introduce system counter offset test

2021-07-16 Thread Oliver Upton
Introduce a KVM selftest to verify that userspace manipulation of the
TSC (via the new vCPU attribute) results in the correct behavior within
the guest.

Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../kvm/system_counter_offset_test.c  | 133 ++
 3 files changed, 135 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/system_counter_offset_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index d0877d01e771..2752813d5090 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -50,3 +50,4 @@
 /set_memory_region_test
 /steal_time
 /kvm_binary_stats_test
+/system_counter_offset_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index f7e24f334c6e..7bf2e5fb1d5a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -83,6 +83,7 @@ TEST_GEN_PROGS_x86_64 += memslot_perf_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
+TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
new file mode 100644
index ..7e9015770759
--- /dev/null
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Tests for adjusting the system counter from userspace
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+#ifdef __x86_64__
+
+struct test_case {
+   uint64_t tsc_offset;
+};
+
+static struct test_case test_cases[] = {
+   { 0 },
+   { 180 * NSEC_PER_SEC },
+   { -180 * NSEC_PER_SEC },
+};
+
+static void check_preconditions(struct kvm_vm *vm)
+{
+   if (!_vcpu_has_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL, KVM_VCPU_TSC_OFFSET))
+   return;
+
+   print_skip("KVM_VCPU_TSC_OFFSET not supported; skipping test");
+   exit(KSFT_SKIP);
+}
+
+static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
+{
+   vcpu_access_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL,
+   KVM_VCPU_TSC_OFFSET, &test->tsc_offset, true);
+}
+
+static uint64_t guest_read_system_counter(struct test_case *test)
+{
+   return rdtsc();
+}
+
+static uint64_t host_read_guest_system_counter(struct test_case *test)
+{
+   return rdtsc() + test->tsc_offset;
+}
+
+#else /* __x86_64__ */
+
+#error test not implemented for this architecture!
+
+#endif
+
+#define GUEST_SYNC_CLOCK(__stage, __val)   \
+   GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)
+
+static void guest_main(void)
+{
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+   struct test_case *test = &test_cases[i];
+
+   GUEST_SYNC_CLOCK(i, guest_read_system_counter(test));
+   }
+
+   GUEST_DONE();
+}
+
+static void handle_sync(struct ucall *uc, uint64_t start, uint64_t end)
+{
+   uint64_t obs = uc->args[2];
+
+   TEST_ASSERT(start <= obs && obs <= end,
+   "unexpected system counter value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
+   obs, start, end);
+
+   pr_info("system counter value: %"PRIu64" expected range [%"PRIu64", %"PRIu64"]\n",
+   obs, start, end);
+}
+
+static void handle_abort(struct ucall *uc)
+{
+   TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
+ __FILE__, uc->args[1]);
+}
+
+static void enter_guest(struct kvm_vm *vm)
+{
+   uint64_t start, end;
+   struct ucall uc;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+   struct test_case *test = &test_cases[i];
+
+   setup_system_counter(vm, test);
+   start = host_read_guest_system_counter(test);
+   vcpu_run(vm, VCPU_ID);
+   end = host_read_guest_system_counter(test);
+
+   switch (get_ucall(vm, VCPU_ID, &uc)) {
+   case UCALL_SYNC:
+   handle_sync(&uc, start, end);
+   break;
+   case UCALL_ABORT:
+   handle_abort(&uc);
+   return;
+   case UCALL_DONE:
+   return;
+   }
+   }
+}
+
+int main(void)
+{
+   struct kvm_vm *vm;
+
+   vm = vm_create_default(VCPU_ID, 0, guest_main);
+   check_preconditions(vm);
+   ucall_init(vm, NULL);
+
+   enter_guest(vm);
+   kvm_vm_free(vm);
+
+   return 0;
+}

[PATCH v2 03/12] KVM: x86: Expose TSC offset controls to userspace

2021-07-16 Thread Oliver Upton
To date, VMM-directed TSC synchronization and migration has been a bit
messy. KVM has some baked-in heuristics around TSC writes to infer if
the VMM is attempting to synchronize. This is problematic, as it depends
on host userspace writing to the guest's TSC within 1 second of the last
write.

A much cleaner approach to configuring the guest's views of the TSC is to
simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus not subject to change depending on when the VMM actually
reads/writes values from/to KVM. The VMM can then read the TSC once with
KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
the guest is paused.
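
As a rough sketch (not part of this patch), a VMM can drive the new
attribute through the usual vCPU device attribute ioctls; vcpu_fd is
assumed to come from KVM_CREATE_VCPU:

    u64 offset;
    struct kvm_device_attr attr = {
            .group = KVM_VCPU_TSC_CTRL,
            .attr  = KVM_VCPU_TSC_OFFSET,
            .addr  = (u64)&offset,
    };

    /* save: the offset is stable, so *when* this runs does not matter */
    ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);

    /* ... migrate 'offset', adjusting for elapsed time if desired ... */

    /* restore */
    ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);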

Cc: David Matlack 
Signed-off-by: Oliver Upton 
---
 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/uapi/asm/kvm.h |   4 +
 arch/x86/kvm/x86.c  | 166 
 3 files changed, 171 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e527d7259415..45134b7b14d6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1070,6 +1070,7 @@ struct kvm_arch {
u64 last_tsc_nsec;
u64 last_tsc_write;
u32 last_tsc_khz;
+   u64 last_tsc_offset;
u64 cur_tsc_nsec;
u64 cur_tsc_write;
u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index a6c327f8ad9e..0b22e1e84e78 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -503,4 +503,8 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL 0 /* control group for the timestamp counter (TSC) */
+#define   KVM_VCPU_TSC_OFFSET 0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e1b7c8b67428..d22de0a1988a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2411,6 +2411,11 @@ static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
static_call(kvm_x86_write_tsc_offset)(vcpu, vcpu->arch.tsc_offset);
 }
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+   return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 l1_multiplier)
 {
vcpu->arch.l1_tsc_scaling_ratio = l1_multiplier;
@@ -2467,6 +2472,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
kvm->arch.last_tsc_nsec = ns;
kvm->arch.last_tsc_write = tsc;
kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+   kvm->arch.last_tsc_offset = offset;
 
vcpu->arch.last_guest_tsc = tsc;
 
@@ -4914,6 +4920,136 @@ static int kvm_set_guest_paused(struct kvm_vcpu *vcpu)
return 0;
 }
 
+static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu,
+struct kvm_device_attr *attr)
+{
+   int r;
+
+   switch (attr->attr) {
+   case KVM_VCPU_TSC_OFFSET:
+   r = 0;
+   break;
+   default:
+   r = -ENXIO;
+   }
+
+   return r;
+}
+
+static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu,
+struct kvm_device_attr *attr)
+{
+   void __user *uaddr = (void __user *)attr->addr;
+   int r;
+
+   switch (attr->attr) {
+   case KVM_VCPU_TSC_OFFSET: {
+   u64 offset;
+
+   offset = kvm_vcpu_read_tsc_offset(vcpu);
+   r = -EFAULT;
+   if (copy_to_user(uaddr, &offset, sizeof(offset)))
+   break;
+
+   r = 0;
+   break;
+   }
+   default:
+   r = -ENXIO;
+   }
+
+   return r;
+}
+
+static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu,
+struct kvm_device_attr *attr)
+{
+   void __user *uaddr = (void __user *)attr->addr;
+   struct kvm *kvm = vcpu->kvm;
+   int r;
+
+   switch (attr->attr) {
+   case KVM_VCPU_TSC_OFFSET: {
+   u64 offset, tsc, ns;
+   unsigned long flags;
+   bool matched;
+
+   r = -EFAULT;
+   if (copy_from_user(&offset, uaddr, sizeof(offset)))
+   break;
+
+   raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+   matched = (vcpu->arch.virtual_tsc_khz &&
+  kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz &&
+  kvm->arch.last_tsc_offset == offset);
+
+   tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset;
+   ns = get_kvmclock_base_ns();
+
+   __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
+   raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+
+   r = 0;
+   break;
+   }
+   default:
+ 

[PATCH v2 02/12] KVM: x86: Refactor tsc synchronization code

2021-07-16 Thread Oliver Upton
Refactor kvm_synchronize_tsc, extracting a new function that allows
callers to specify the TSC parameters (offset, value, nanoseconds, etc.)
explicitly in order to participate in TSC synchronization.

This changes the locking semantics around TSC writes: writes to the TSC
will now take the pvclock gtod lock while holding the tsc write lock,
whereas before these locks were disjoint.
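
Concretely, the new nesting looks like this (a sketch with the generation
bookkeeping elided; flags1/flags2 are separate irqsave cookies):

    raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags1);
    /* record last_tsc_{nsec,write,khz} and write the vCPU's offset */
    spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags2);
    /* update cur_tsc_* generation state and nr_vcpus_matched_tsc */
    spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags2);
    raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags1);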

Reviewed-by: David Matlack 
Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/locking.rst |  11 +++
 arch/x86/kvm/x86.c | 106 +
 2 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 35eca377543d..ac62e1c76694 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -30,6 +30,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -216,6 +219,14 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:  'raw' because hardware enabling/disabling must be atomic /wrt
migration.
 
+:Name: kvm_arch::pvclock_gtod_sync_lock
+:Type: raw_spinlock_t
+:Arch: x86
+:Protects: kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+   cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:  'raw' because updating the kvm master clock must not be
+   preempted.
+
 :Name: kvm_arch::tsc_write_lock
 :Type: raw_spinlock
 :Arch: x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4e803632cdca..e1b7c8b67428 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2441,13 +2441,73 @@ static inline bool kvm_check_tsc_unstable(void)
return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's tsc from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ *
+ * Must hold kvm->arch.tsc_write_lock to call this function.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+ u64 ns, bool matched)
+{
+   struct kvm *kvm = vcpu->kvm;
+   bool already_matched;
+   unsigned long flags;
+
+   lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+   already_matched =
+  (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+   /*
+* We track the most recent recorded KHZ, write and time to
+* allow the matching interval to be extended at each write.
+*/
+   kvm->arch.last_tsc_nsec = ns;
+   kvm->arch.last_tsc_write = tsc;
+   kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+   vcpu->arch.last_guest_tsc = tsc;
+
+   /* Keep track of which generation this VCPU has synchronized to */
+   vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+   vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+   vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+   kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+   spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
+   if (!matched) {
+   /*
+* We split periods of matched TSC writes into generations.
+* For each generation, we track the original measured
+* nanosecond time, offset, and write, so if TSCs are in
+* sync, we can match exact offset, and if not, we can match
+* exact software computation in compute_guest_tsc()
+*
+* These values are tracked in kvm->arch.cur_xxx variables.
+*/
+   kvm->arch.nr_vcpus_matched_tsc = 0;
+   kvm->arch.cur_tsc_generation++;
+   kvm->arch.cur_tsc_nsec = ns;
+   kvm->arch.cur_tsc_write = tsc;
+   kvm->arch.cur_tsc_offset = offset;
+   matched = false;
+   } else if (!already_matched) {
+   kvm->arch.nr_vcpus_matched_tsc++;
+   }
+
+   kvm_track_tsc_matching(vcpu);
+   spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
struct kvm *kvm = vcpu->kvm;
u64 offset, ns, elapsed;
unsigned long flags;
-   bool matched;
-   bool already_matched;
+   bool matched = false;
bool synchronizing = false;
 
raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2493,51 +2553,11 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
offset = kvm_compute_l1_tsc_offset(vcpu, data);
}
matched

[PATCH v2 08/12] KVM: arm64: Allow userspace to configure a vCPU's virtual offset

2021-07-16 Thread Oliver Upton
Add a new vCPU attribute that allows userspace to directly manipulate
the virtual counter-timer offset. Exposing such an interface allows for
the precise migration of guest virtual counter-timers, as it is an
idempotent interface.

Uphold the existing behavior of writes to CNTVOFF_EL2 for this new
interface, wherein a write to a single vCPU is broadcast to all vCPUs
within a VM.
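
A rough userspace sketch (not part of this patch); vcpu_fd is assumed to
come from KVM_CREATE_VCPU, and a write on any one vCPU updates
CNTVOFF_EL2 for every vCPU in the VM:

    /* ticks to subtract from the guest's view of the virtual counter */
    u64 offset = 0;
    struct kvm_device_attr attr = {
            .group = KVM_ARM_VCPU_TIMER_CTRL,
            .attr  = KVM_ARM_VCPU_TIMER_OFFSET_VTIMER,
            .addr  = (u64)&offset,
    };

    ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);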

Signed-off-by: Oliver Upton 
---
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 arch/arm64/kvm/arch_timer.c   | 68 ++-
 2 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..008d0518d2b1 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -365,6 +365,7 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_ARM_VCPU_TIMER_CTRL1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
+#define   KVM_ARM_VCPU_TIMER_OFFSET_VTIMER 2
 #define KVM_ARM_VCPU_PVTIME_CTRL   2
 #define   KVM_ARM_VCPU_PVTIME_IPA  0
 
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 3df67c127489..d2b1b13af658 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -1305,7 +1305,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
}
 }
 
-int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+int kvm_arm_timer_set_attr_irq(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
int __user *uaddr = (int __user *)(long)attr->addr;
struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
@@ -1338,7 +1338,39 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
return 0;
 }
 
-int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+int kvm_arm_timer_set_attr_offset(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+   u64 __user *uaddr = (u64 __user *)(long)attr->addr;
+   u64 offset;
+
+   if (get_user(offset, uaddr))
+   return -EFAULT;
+
+   switch (attr->attr) {
+   case KVM_ARM_VCPU_TIMER_OFFSET_VTIMER:
+   update_vtimer_cntvoff(vcpu, offset);
+   break;
+   default:
+   return -ENXIO;
+   }
+
+   return 0;
+}
+
+int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+   switch (attr->attr) {
+   case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
+   case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
+   return kvm_arm_timer_set_attr_irq(vcpu, attr);
+   case KVM_ARM_VCPU_TIMER_OFFSET_VTIMER:
+   return kvm_arm_timer_set_attr_offset(vcpu, attr);
+   }
+
+   return -ENXIO;
+}
+
+int kvm_arm_timer_get_attr_irq(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
int __user *uaddr = (int __user *)(long)attr->addr;
struct arch_timer_context *timer;
@@ -1359,11 +1391,43 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
return put_user(irq, uaddr);
 }
 
+int kvm_arm_timer_get_attr_offset(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+   u64 __user *uaddr = (u64 __user *)(long)attr->addr;
+   struct arch_timer_context *timer;
+   u64 offset;
+
+   switch (attr->attr) {
+   case KVM_ARM_VCPU_TIMER_OFFSET_VTIMER:
+   timer = vcpu_vtimer(vcpu);
+   break;
+   default:
+   return -ENXIO;
+   }
+
+   offset = timer_get_offset(timer);
+   return put_user(offset, uaddr);
+}
+
+int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+   switch (attr->attr) {
+   case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
+   case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
+   return kvm_arm_timer_get_attr_irq(vcpu, attr);
+   case KVM_ARM_VCPU_TIMER_OFFSET_VTIMER:
+   return kvm_arm_timer_get_attr_offset(vcpu, attr);
+   }
+
+   return -ENXIO;
+}
+
 int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 {
switch (attr->attr) {
case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
case KVM_ARM_VCPU_TIMER_IRQ_PTIMER:
+   case KVM_ARM_VCPU_TIMER_OFFSET_VTIMER:
return 0;
}
 
-- 
2.32.0.402.g57bb445576-goog



[PATCH v2 04/12] tools: arch: x86: pull in pvclock headers

2021-07-16 Thread Oliver Upton
Copy over approximately clean versions of the pvclock headers into
tools. Drop the headers/symbols that are missing in tools/ and unneeded
for the tests.

Signed-off-by: Oliver Upton 
---
 tools/arch/x86/include/asm/pvclock-abi.h |  48 +++
 tools/arch/x86/include/asm/pvclock.h | 103 +++
 2 files changed, 151 insertions(+)
 create mode 100644 tools/arch/x86/include/asm/pvclock-abi.h
 create mode 100644 tools/arch/x86/include/asm/pvclock.h

diff --git a/tools/arch/x86/include/asm/pvclock-abi.h b/tools/arch/x86/include/asm/pvclock-abi.h
new file mode 100644
index ..1436226efe3e
--- /dev/null
+++ b/tools/arch/x86/include/asm/pvclock-abi.h
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PVCLOCK_ABI_H
+#define _ASM_X86_PVCLOCK_ABI_H
+#ifndef __ASSEMBLY__
+
+/*
+ * These structs MUST NOT be changed.
+ * They are the ABI between hypervisor and guest OS.
+ * Both Xen and KVM are using this.
+ *
+ * pvclock_vcpu_time_info holds the system time and the tsc timestamp
+ * of the last update. So the guest can use the tsc delta to get a
+ * more precise system time.  There is one per virtual cpu.
+ *
+ * pvclock_wall_clock references the point in time when the system
+ * time was zero (usually boot time), thus the guest calculates the
+ * current wall clock by adding the system time.
+ *
+ * Protocol for the "version" fields is: hypervisor raises it (making
+ * it uneven) before it starts updating the fields and raises it again
+ * (making it even) when it is done.  Thus the guest can make sure the
+ * time values it got are consistent by checking the version before
+ * and after reading them.
+ */
+
+struct pvclock_vcpu_time_info {
+   u32   version;
+   u32   pad0;
+   u64   tsc_timestamp;
+   u64   system_time;
+   u32   tsc_to_system_mul;
+   s8tsc_shift;
+   u8flags;
+   u8pad[2];
+} __attribute__((__packed__)); /* 32 bytes */
+
+struct pvclock_wall_clock {
+   u32   version;
+   u32   sec;
+   u32   nsec;
+} __attribute__((__packed__));
+
+#define PVCLOCK_TSC_STABLE_BIT (1 << 0)
+#define PVCLOCK_GUEST_STOPPED  (1 << 1)
+/* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */
+#define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
+#endif /* __ASSEMBLY__ */
+#endif /* _ASM_X86_PVCLOCK_ABI_H */
diff --git a/tools/arch/x86/include/asm/pvclock.h b/tools/arch/x86/include/asm/pvclock.h
new file mode 100644
index ..2628f9a6330b
--- /dev/null
+++ b/tools/arch/x86/include/asm/pvclock.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_PVCLOCK_H
+#define _ASM_X86_PVCLOCK_H
+
+#include 
+#include 
+
+/* some helper functions for xen and kvm pv clock sources */
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
+void pvclock_set_flags(u8 flags);
+unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
+void pvclock_resume(void);
+
+void pvclock_touch_watchdogs(void);
+
+static __always_inline
+unsigned pvclock_read_begin(const struct pvclock_vcpu_time_info *src)
+{
+   unsigned version = src->version & ~1;
+   /* Make sure that the version is read before the data. */
+   rmb();
+   return version;
+}
+
+static __always_inline
+bool pvclock_read_retry(const struct pvclock_vcpu_time_info *src,
+   unsigned version)
+{
+   /* Make sure that the version is re-read after the data. */
+   rmb();
+   return version != src->version;
+}
+
+/*
+ * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction,
+ * yielding a 64-bit result.
+ */
+static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift)
+{
+   u64 product;
+#ifdef __i386__
+   u32 tmp1, tmp2;
+#else
+   unsigned long tmp;
+#endif
+
+   if (shift < 0)
+   delta >>= -shift;
+   else
+   delta <<= shift;
+
+#ifdef __i386__
+   __asm__ (
+   "mul  %5   ; "
+   "mov  %4,%%eax ; "
+   "mov  %%edx,%4 ; "
+   "mul  %5   ; "
+   "xor  %5,%5; "
+   "add  %4,%%eax ; "
+   "adc  %5,%%edx ; "
+   : "=A" (product), "=r" (tmp1), "=r" (tmp2)
+   : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) );
+#elif defined(__x86_64__)
+   __asm__ (
+   "mulq %[mul_frac] ; shrd $32, %[hi], %[lo]"
+   : [lo]"=a"(product),
+ [hi]"=d"(tmp)
+   : "0"(delta),
+ [mul_frac]"rm"((u64)mul_frac));
+#else
+#error implement me!
+#endif
+
+   return product;
+}
+
+static __always_inline
+u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc)
+{
+   u64 delta = tsc - src->tsc_timestamp;
+   u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul,
+src->tsc_shift);
+   return src->system_time + offset;
+}
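
A typical consumer of the version protocol described in pvclock-abi.h
wraps the read in a retry loop; a sketch, assuming an rdtsc() helper:

    unsigned version;
    u64 now;

    do {
            version = pvclock_read_begin(src);
            now = __pvclock_read_cycles(src, rdtsc());
    } while (pvclock_read_retry(src, version));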

[PATCH v2 09/12] selftests: KVM: Add support for aarch64 to system_counter_offset_test

2021-07-16 Thread Oliver Upton
KVM/arm64 now allows userspace to adjust the guest virtual counter-timer
via a vCPU device attribute. Test that changes to the virtual
counter-timer offset result in the correct view being presented to the
guest.

Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/Makefile  |  1 +
 .../selftests/kvm/include/aarch64/processor.h | 12 +
 .../kvm/system_counter_offset_test.c  | 54 ++-
 3 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 7bf2e5fb1d5a..d89908108c97 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -96,6 +96,7 @@ TEST_GEN_PROGS_aarch64 += kvm_page_table_test
 TEST_GEN_PROGS_aarch64 += set_memory_region_test
 TEST_GEN_PROGS_aarch64 += steal_time
 TEST_GEN_PROGS_aarch64 += kvm_binary_stats_test
+TEST_GEN_PROGS_aarch64 += system_counter_offset_test
 
 TEST_GEN_PROGS_s390x = s390x/memop
 TEST_GEN_PROGS_s390x += s390x/resets
diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index 27dc5c2e56b9..3168cdbae6ee 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -129,4 +129,16 @@ void vm_install_sync_handler(struct kvm_vm *vm,
 
 #define isb()  asm volatile("isb" : : : "memory")
 
+static inline uint64_t read_cntvct_ordered(void)
+{
+   uint64_t r;
+
+   __asm__ __volatile__("isb\n\t"
+"mrs %0, cntvct_el0\n\t"
+"isb\n\t"
+: "=r"(r));
+
+   return r;
+}
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/system_counter_offset_test.c b/tools/testing/selftests/kvm/system_counter_offset_test.c
index 7e9015770759..88ad997f5b69 100644
--- a/tools/testing/selftests/kvm/system_counter_offset_test.c
+++ b/tools/testing/selftests/kvm/system_counter_offset_test.c
@@ -53,7 +53,59 @@ static uint64_t host_read_guest_system_counter(struct test_case *test)
return rdtsc() + test->tsc_offset;
 }
 
-#else /* __x86_64__ */
+#elif __aarch64__ /* __x86_64__ */
+
+enum arch_counter {
+   VIRTUAL,
+};
+
+struct test_case {
+   enum arch_counter counter;
+   uint64_t offset;
+};
+
+static struct test_case test_cases[] = {
+   { .counter = VIRTUAL, .offset = 0 },
+   { .counter = VIRTUAL, .offset = 180 * NSEC_PER_SEC },
+   { .counter = VIRTUAL, .offset = -180 * NSEC_PER_SEC },
+};
+
+static void check_preconditions(struct kvm_vm *vm)
+{
+   if (!_vcpu_has_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
+  KVM_ARM_VCPU_TIMER_OFFSET_VTIMER))
+   return;
+
+   print_skip("KVM_ARM_VCPU_TIMER_OFFSET_VTIMER not supported; skipping test");
+   exit(KSFT_SKIP);
+}
+
+static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
+{
+   vcpu_access_device_attr(vm, VCPU_ID, KVM_ARM_VCPU_TIMER_CTRL,
+   KVM_ARM_VCPU_TIMER_OFFSET_VTIMER, &test->offset,
+   true);
+}
+
+static uint64_t guest_read_system_counter(struct test_case *test)
+{
+   switch (test->counter) {
+   case VIRTUAL:
+   return read_cntvct_ordered();
+   default:
+   GUEST_ASSERT(0);
+   }
+
+   /* unreachable */
+   return 0;
+}
+
+static uint64_t host_read_guest_system_counter(struct test_case *test)
+{
+   return read_cntvct_ordered() - test->offset;
+}
+
+#else /* __aarch64__ */
 
 #error test not implemented for this architecture!
 
-- 
2.32.0.402.g57bb445576-goog



[PATCH v2 01/12] KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK

2021-07-16 Thread Oliver Upton
Handling the migration of TSCs correctly is difficult, in part because
Linux does not provide userspace with the ability to retrieve a (TSC,
realtime) clock pair for a single instant in time. In lieu of a more
convenient facility, KVM can report similar information in the kvm_clock
structure.

Provide userspace with a host TSC & realtime pair iff the realtime clock
is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
realtime value, advance the KVM clock by the amount of elapsed time. Do
not step the KVM clock backwards, though, as it is a monotonic
oscillator.
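
A sketch of the intended flow from a VMM (vm_fd from KVM_CREATE_VM is
assumed; error handling elided):

    struct kvm_clock_data data = { 0 };

    /* source: one ioctl captures (kvmclock, realtime, host_tsc) */
    ioctl(vm_fd, KVM_GET_CLOCK, &data);

    /*
     * destination: restore the captured value; with the flag set, KVM
     * advances the clock by the real time elapsed since the capture.
     */
    data.flags = KVM_CLOCK_REAL_TIME;
    ioctl(vm_fd, KVM_SET_CLOCK, &data);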

Suggested-by: Paolo Bonzini 
Signed-off-by: Oliver Upton 
---
 Documentation/virt/kvm/api.rst  |  42 +++--
 arch/x86/include/asm/kvm_host.h |   3 +
 arch/x86/kvm/x86.c  | 149 
 include/uapi/linux/kvm.h|   7 +-
 4 files changed, 137 insertions(+), 64 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b9ddce5638f5..26bb01a6e82e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -993,20 +993,34 @@ such as migration.
 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the
 set of bits that KVM can return in struct kvm_clock_data's flag member.
 
-The only flag defined now is KVM_CLOCK_TSC_STABLE.  If set, the returned
-value is the exact kvmclock value seen by all VCPUs at the instant
-when KVM_GET_CLOCK was called.  If clear, the returned value is simply
-CLOCK_MONOTONIC plus a constant offset; the offset can be modified
-with KVM_SET_CLOCK.  KVM will try to make all VCPUs follow this clock,
-but the exact value read by each VCPU could differ, because the host
-TSC is not stable.
+FLAGS:
+
+KVM_CLOCK_TSC_STABLE.  If set, the returned value is the exact kvmclock
+value seen by all VCPUs at the instant when KVM_GET_CLOCK was called.
+If clear, the returned value is simply CLOCK_MONOTONIC plus a constant
+offset; the offset can be modified with KVM_SET_CLOCK.  KVM will try
+to make all VCPUs follow this clock, but the exact value read by each
+VCPU could differ, because the host TSC is not stable.
+
+KVM_CLOCK_REAL_TIME.  If set, the `realtime` field in the kvm_clock_data
+structure is populated with the value of the host's real time
+clocksource at the instant when KVM_GET_CLOCK was called. If clear,
+the `realtime` field does not contain a value.
+
+KVM_CLOCK_HOST_TSC.  If set, the `host_tsc` field in the kvm_clock_data
+structure is populated with the value of the host's timestamp counter (TSC)
+at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field
+does not contain a value.
 
 ::
 
   struct kvm_clock_data {
__u64 clock;  /* kvmclock current value */
__u32 flags;
-   __u32 pad[9];
+   __u32 pad0;
+   __u64 realtime;
+   __u64 host_tsc;
+   __u32 pad[4];
   };
 
 
@@ -1023,12 +1037,22 @@ Sets the current timestamp of kvmclock to the value specified in its parameter.
 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
 such as migration.
 
+FLAGS:
+
+KVM_CLOCK_REAL_TIME.  If set, KVM will compare the value of the `realtime` field
+with the value of the host's real time clocksource at the instant when
+KVM_SET_CLOCK was called. The difference in elapsed time is added to the final
+kvmclock value that will be provided to guests.
+
 ::
 
   struct kvm_clock_data {
__u64 clock;  /* kvmclock current value */
__u32 flags;
-   __u32 pad[9];
+   __u32 pad0;
+   __u64 realtime;
+   __u64 host_tsc;
+   __u32 pad[4];
   };
 
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 974cbfb1eefe..e527d7259415 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1936,4 +1936,7 @@ int kvm_cpu_dirty_log_size(void);
 
 int alloc_all_memslots_rmaps(struct kvm *kvm);
 
+#define KVM_CLOCK_VALID_FLAGS  \
+   (KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REAL_TIME | KVM_CLOCK_HOST_TSC)
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d715ae9f9108..4e803632cdca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2780,17 +2780,24 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 #endif
 }
 
-u64 get_kvmclock_ns(struct kvm *kvm)
+/**
+ * Returns true if realtime and TSC values were written back to the caller.
+ * Returns false if a clock triplet cannot be obtained, such as if the host's
+ * realtime clock is not based on the TSC.
+ */
+static bool get_kvmclock_and_realtime(struct kvm *kvm, u64 *kvmclock_ns,
+ u64 *realtime_ns, u64 *tsc)
 {
struct kvm_arch *ka = &kvm->arch;
struct pvclock_vcpu_time_info hv_clock;
unsigned long flags;
-   u64 ret;
+   bool ret = false;
 
spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
if (!ka->use_maste

[PATCH v2 06/12] selftests: KVM: Add helpers for vCPU device attributes

2021-07-16 Thread Oliver Upton
vCPU file descriptors are abstracted away from test code in KVM
selftests, meaning that tests cannot directly access a vCPU's device
attributes. Add helpers that tests can use to get at vCPU device
attributes.
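
For example, a test can now read or write a vCPU attribute in a single
call; this mirrors the usage in the system counter test later in the
series:

    uint64_t offset = 180 * NSEC_PER_SEC;

    vcpu_access_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL,
                            KVM_VCPU_TSC_OFFSET, &offset, true);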

Signed-off-by: Oliver Upton 
---
 .../testing/selftests/kvm/include/kvm_util.h  |  9 +
 tools/testing/selftests/kvm/lib/kvm_util.c| 38 +++
 2 files changed, 47 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index a8ac5d52e17b..1b3ef5757819 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -240,6 +240,15 @@ int _kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
 int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
  void *val, bool write);
 
+int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr);
+int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr);
+int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr, void *val, bool write);
+int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr, void *val, bool write);
+
 const char *exit_reason_str(unsigned int exit_reason);
 
 void virt_pgd_alloc(struct kvm_vm *vm);
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 10a8ed691c66..b595e7dc3fc5 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2040,6 +2040,44 @@ int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr,
return ret;
 }
 
+int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+ uint64_t attr)
+{
+   struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+   TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+   return _kvm_device_check_attr(vcpu->fd, group, attr);
+}
+
+int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr)
+{
+   int ret = _vcpu_has_device_attr(vm, vcpuid, group, attr);
+
+   TEST_ASSERT(!ret, "KVM_HAS_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+   return ret;
+}
+
+int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+uint64_t attr, void *val, bool write)
+{
+   struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+   TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+   return _kvm_device_access(vcpu->fd, group, attr, val, write);
+}
+
+int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+   uint64_t attr, void *val, bool write)
+{
+   int ret = _vcpu_access_device_attr(vm, vcpuid, group, attr, val, write);
+
+   TEST_ASSERT(!ret, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+   return ret;
+}
+
 /*
  * VM Dump
  *
-- 
2.32.0.402.g57bb445576-goog



[PATCH v2 05/12] selftests: KVM: Add test for KVM_{GET,SET}_CLOCK

2021-07-16 Thread Oliver Upton
Add a selftest for the newly introduced KVM clock UAPI. Ensure
that the KVM clock is consistent between userspace and the guest, and
that the difference in realtime will only ever cause the KVM clock to
advance forward.

Signed-off-by: Oliver Upton 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../testing/selftests/kvm/include/kvm_util.h  |   2 +
 .../selftests/kvm/x86_64/kvm_clock_test.c | 210 ++
 4 files changed, 214 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_clock_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 06a351b4f93b..d0877d01e771 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -11,6 +11,7 @@
 /x86_64/emulator_error_test
 /x86_64/get_cpuid_test
 /x86_64/get_msr_index_features
+/x86_64/kvm_clock_test
 /x86_64/kvm_pv_test
 /x86_64/hyperv_clock
 /x86_64/hyperv_cpuid
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index b853be2ae3c6..f7e24f334c6e 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -46,6 +46,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/get_cpuid_test
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid
 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features
+TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test
 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test
 TEST_GEN_PROGS_x86_64 += x86_64/mmu_role_test
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 010b59b13917..a8ac5d52e17b 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -19,6 +19,8 @@
 #define KVM_DEV_PATH "/dev/kvm"
 #define KVM_MAX_VCPUS 512
 
+#define NSEC_PER_SEC 1000000000L
+
 /*
  * Callers of kvm_util only have an incomplete/opaque description of the
  * structure kvm_util is using to maintain the state of a VM.
diff --git a/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
new file mode 100644
index ..34c48f2dde54
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
@@ -0,0 +1,210 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021, Google LLC.
+ *
+ * Tests for adjusting the KVM clock from userspace
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+#define VCPU_ID 0
+
+struct test_case {
+   uint64_t kvmclock_base;
+   int64_t realtime_offset;
+};
+
+static struct test_case test_cases[] = {
+   { .kvmclock_base = 0 },
+   { .kvmclock_base = 180 * NSEC_PER_SEC },
+   { .kvmclock_base = 0, .realtime_offset = -180 * NSEC_PER_SEC },
+   { .kvmclock_base = 0, .realtime_offset = 180 * NSEC_PER_SEC },
+};
+
+#define GUEST_SYNC_CLOCK(__stage, __val)   \
+   GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)
+
+static void guest_main(vm_paddr_t pvti_pa, struct pvclock_vcpu_time_info *pvti)
+{
+   int i;
+
+   wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
+   for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+   GUEST_SYNC_CLOCK(i, __pvclock_read_cycles(pvti, rdtsc()));
+   }
+
+   GUEST_DONE();
+}
+
+#define EXPECTED_FLAGS (KVM_CLOCK_REAL_TIME | KVM_CLOCK_HOST_TSC)
+
+static inline void assert_flags(struct kvm_clock_data *data)
+{
+   TEST_ASSERT((data->flags & EXPECTED_FLAGS) == EXPECTED_FLAGS,
+   "unexpected clock data flags: %x (want set: %x)",
+   data->flags, EXPECTED_FLAGS);
+}
+
+static void handle_sync(struct ucall *uc, struct kvm_clock_data *start,
+   struct kvm_clock_data *end)
+{
+   uint64_t obs, exp_lo, exp_hi;
+
+   obs = uc->args[2];
+   exp_lo = start->clock;
+   exp_hi = end->clock;
+
+   assert_flags(start);
+   assert_flags(end);
+
+   TEST_ASSERT(exp_lo <= obs && obs <= exp_hi,
+   "unexpected kvm-clock value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
+   obs, exp_lo, exp_hi);
+
+   pr_info("kvm-clock value: %"PRIu64" expected range [%"PRIu64", %"PRIu64"]\n",
+   obs, exp_lo, exp_hi);
+}
+
+static void handle_abort(struct ucall *uc)
+{
+   TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
+ __FILE__, uc->args[1]);
+}
+
+static void setup_clock(struct kvm_vm *vm, struct test_case *test_case)
+{
+   struct kvm_clock_data data;
+
+   memset(&data, 0, sizeof(data));
+
+   data.clock = test_case->kvmclock_base;
+   if (test_case->realtime_offset) {
+   struct timespec ts;
+   int r;
+
+   

[PATCH v2 10/12] KVM: arm64: Provide userspace access to the physical counter offset

2021-07-16 Thread Oliver Upton
Presently, KVM provides no facilities for correctly migrating a guest
that depends on the physical counter-timer. While most guests (barring
NV, of course) should not depend on the physical counter-timer, an
operator may still wish to provide a consistent view of the physical
counter-timer across migrations.

Provide userspace with a new vCPU attribute to modify the guest physical
counter-timer offset. Since the base architecture doesn't provide a
physical counter-timer offset register, emulate the correct behavior by
trapping accesses to the physical counter-timer whenever the offset
value is non-zero.

Uphold the same behavior as CNTVOFF_EL2 and broadcast the physical
offset to all vCPUs whenever written. This guarantees that the
counter-timer we provide the guest remains architectural, wherein all
views of the counter-timer are consistent across vCPUs. Reconfigure
timer traps for VHE on every guest entry, as different VMs will now have
different traps enabled. Enable physical counter traps for nVHE whenever
the offset is nonzero (we already trap physical timer registers in
nVHE).

FEAT_ECV provides a guest physical counter-timer offset register
(CNTPOFF_EL2), but ECV-enabled hardware is nonexistent at the time of
writing so support for it was elided for the sake of the author :)
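
In other words, the emulated fast path computes the same view that
CNTPOFF_EL2 will provide on ECV hardware; a sketch of the relation:

    /* trapped CNTPCT_EL0 read; same offset convention as CNTVOFF_EL2 */
    guest_cntpct = read_sysreg(cntpct_el0) - __vcpu_sys_reg(vcpu, CNTPOFF_EL2);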

Signed-off-by: Oliver Upton 
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/include/asm/kvm_hyp.h  |  2 -
 arch/arm64/include/asm/sysreg.h   |  1 +
 arch/arm64/include/uapi/asm/kvm.h |  1 +
 arch/arm64/kvm/arch_timer.c   | 50 ---
 arch/arm64/kvm/arm.c  |  4 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h   | 23 +++
 arch/arm64/kvm/hyp/include/hyp/timer-sr.h | 26 
 arch/arm64/kvm/hyp/nvhe/switch.c  |  2 -
 arch/arm64/kvm/hyp/nvhe/timer-sr.c| 21 +-
 arch/arm64/kvm/hyp/vhe/timer-sr.c | 27 
 include/kvm/arm_arch_timer.h  |  2 -
 12 files changed, 136 insertions(+), 24 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/hyp/timer-sr.h

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..de92fa678924 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -204,6 +204,7 @@ enum vcpu_sysreg {
SP_EL1,
SPSR_EL1,
 
+   CNTPOFF_EL2,
CNTVOFF_EL2,
CNTV_CVAL_EL0,
CNTV_CTL_EL0,
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 9d60b3006efc..01eb3864e50f 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -65,10 +65,8 @@ void __vgic_v3_save_aprs(struct vgic_v3_cpu_if *cpu_if);
 void __vgic_v3_restore_aprs(struct vgic_v3_cpu_if *cpu_if);
 int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
 
-#ifdef __KVM_NVHE_HYPERVISOR__
 void __timer_enable_traps(struct kvm_vcpu *vcpu);
 void __timer_disable_traps(struct kvm_vcpu *vcpu);
-#endif
 
 #ifdef __KVM_NVHE_HYPERVISOR__
 void __sysreg_save_state_nvhe(struct kvm_cpu_context *ctxt);
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 347ccac2341e..243e36c088e7 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -505,6 +505,7 @@
 #define SYS_AMEVCNTR0_MEM_STALLSYS_AMEVCNTR0_EL0(3)
 
 #define SYS_CNTFRQ_EL0 sys_reg(3, 3, 14, 0, 0)
+#define SYS_CNTPCT_EL0 sys_reg(3, 3, 14, 0, 1)
 
 #define SYS_CNTP_TVAL_EL0  sys_reg(3, 3, 14, 2, 0)
 #define SYS_CNTP_CTL_EL0   sys_reg(3, 3, 14, 2, 1)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 008d0518d2b1..3e42c72d4c68 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -366,6 +366,7 @@ struct kvm_arm_copy_mte_tags {
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER1
 #define   KVM_ARM_VCPU_TIMER_OFFSET_VTIMER 2
+#define   KVM_ARM_VCPU_TIMER_OFFSET_PTIMER 3
 #define KVM_ARM_VCPU_PVTIME_CTRL   2
 #define   KVM_ARM_VCPU_PVTIME_IPA  0
 
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index d2b1b13af658..05ec385e26b5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -89,7 +89,10 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
switch(arch_timer_ctx_index(ctxt)) {
case TIMER_VTIMER:
return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+   case TIMER_PTIMER:
+   return __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
default:
+   WARN_ONCE(1, "unrecognized timer index %ld", arch_timer_ctx_index(ctxt));
return 0;
}
 }
@@ -134,6 +137,9 @@ static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
case TIMER_VTIMER:
__vcpu_sys_reg(vcp

[PATCH v2 00/12] KVM: Add idempotent controls for migrating system counter state

2021-07-16 Thread Oliver Upton

v1 -> v2:
  - Reimplemented as vCPU device attributes instead of a distinct ioctl
    (a userspace usage sketch follows this list).
  - Added the (realtime, host_tsc) instant support to
KVM_{GET,SET}_CLOCK
  - Changed the arm64 implementation to broadcast counter offset values
to all vCPUs in a guest. This upholds the architectural expectations
of a consistent counter-timer across CPUs.
  - Fixed a bug with traps in VHE mode. We now configure traps on every
transition into a guest to handle differing VMs (trapped, emulated).
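
As a rough illustration of the device-attribute flow, a hedged
userspace sketch (not code from this series; it assumes a kernel
carrying the arm64 attribute patches and an open vCPU file
descriptor):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hedged sketch: write one vCPU's virtual counter-timer offset. */
static int set_vtimer_offset(int vcpu_fd, uint64_t offset)
{
	struct kvm_device_attr attr = {
		.group = KVM_ARM_VCPU_TIMER_CTRL,
		.attr  = KVM_ARM_VCPU_TIMER_OFFSET_VTIMER,
		.addr  = (uint64_t)(uintptr_t)&offset,
	};

	/* With the broadcast behavior above, this applies to all vCPUs. */
	return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr);
}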

Oliver Upton (12):
  KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Expose TSC offset controls to userspace
  tools: arch: x86: pull in pvclock headers
  selftests: KVM: Add test for KVM_{GET,SET}_CLOCK
  selftests: KVM: Add helpers for vCPU device attributes
  selftests: KVM: Introduce system counter offset test
  KVM: arm64: Allow userspace to configure a vCPU's virtual offset
  selftests: KVM: Add support for aarch64 to system_counter_offset_test
  KVM: arm64: Provide userspace access to the physical counter offset
  selftests: KVM: Test physical counter offsetting
  selftests: KVM: Add counter emulation benchmark

 Documentation/virt/kvm/api.rst     |  42 +-
 Documentation/virt/kvm/locking.rst |  11 +
 arch/arm64/include/asm/kvm_host.h  |   1 +
 arch/arm64/include/asm/kvm_hyp.h   |   2 -
 arch/arm64/include/asm/sysreg.h    |   1 +
 arch/arm64/include/uapi/asm/kvm.h  |   2 +

[PATCH V8 01/18] perf/core: Use static_call to optimize perf_guest_info_callbacks

2021-07-16 Thread Zhu Lingshan
From: Like Xu 

For "struct perf_guest_info_callbacks", the two fields "is_in_guest"
and "is_user_mode" are replaced with a new multiplexed member named
"state", and the "get_guest_ip" field will be renamed to "get_ip".

For arm64, xen and kvm/x86, the application of DEFINE_STATIC_CALL_RET0
could make all that perf_guest_cbs stuff suck less. For arm, csky, nds32,
and riscv, just applied some renamed refactoring.
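
The "state" member is a small bitmask; a hedged sketch of the decode
(PERF_GUEST_USER appears in the diff below; the other name and both
bit values are assumptions here):

#define PERF_GUEST_ACTIVE	0x01	/* currently in guest context */
#define PERF_GUEST_USER		0x02	/* guest is in user mode */

/* Mirrors the perf_misc_flags() change in the arm diff below. */
static unsigned int guest_misc_flags(unsigned int state)
{
	if (!(state & PERF_GUEST_ACTIVE))
		return 0;

	return (state & PERF_GUEST_USER) ? PERF_RECORD_MISC_GUEST_USER :
					   PERF_RECORD_MISC_GUEST_KERNEL;
}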

Cc: Will Deacon 
Cc: Marc Zyngier 
Cc: Guo Ren 
Cc: Nick Hu 
Cc: Paul Walmsley 
Cc: Boris Ostrovsky 
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-c...@vger.kernel.org
Cc: linux-ri...@lists.infradead.org
Cc: xen-de...@lists.xenproject.org
Suggested-by: Peter Zijlstra (Intel) 
Original-by: Peter Zijlstra (Intel) 
Signed-off-by: Like Xu 
Signed-off-by: Zhu Lingshan 
Reviewed-by: Boris Ostrovsky 
---
 arch/arm/kernel/perf_callchain.c   | 16 +++-
 arch/arm64/kernel/perf_callchain.c | 29 +-
 arch/arm64/kvm/perf.c  | 22 -
 arch/csky/kernel/perf_callchain.c  |  4 +--
 arch/nds32/kernel/perf_event_cpu.c | 16 +++-
 arch/riscv/kernel/perf_callchain.c |  4 +--
 arch/x86/events/core.c | 39 --
 arch/x86/events/intel/core.c   |  7 +++---
 arch/x86/include/asm/kvm_host.h    |  2 +-
 arch/x86/kvm/pmu.c |  2 +-
 arch/x86/kvm/x86.c | 37 +++-
 arch/x86/xen/pmu.c | 32 ++--
 include/linux/perf_event.h | 12 ++---
 kernel/events/core.c   |  9 +++
 14 files changed, 144 insertions(+), 87 deletions(-)

diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
index 3b69a76d341e..1ce30f86d6c7 100644
--- a/arch/arm/kernel/perf_callchain.c
+++ b/arch/arm/kernel/perf_callchain.c
@@ -64,7 +64,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 {
struct frame_tail __user *tail;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (perf_guest_cbs && perf_guest_cbs->state()) {
/* We don't support guest os callchain now */
return;
}
@@ -100,7 +100,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 {
struct stackframe fr;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (perf_guest_cbs && perf_guest_cbs->state()) {
/* We don't support guest os callchain now */
return;
}
@@ -111,8 +111,8 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-   return perf_guest_cbs->get_guest_ip();
+   if (perf_guest_cbs && perf_guest_cbs->state())
+   return perf_guest_cbs->get_ip();
 
return instruction_pointer(regs);
 }
@@ -120,9 +120,13 @@ unsigned long perf_instruction_pointer(struct pt_regs *regs)
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
int misc = 0;
+   unsigned int state = 0;
 
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-   if (perf_guest_cbs->is_user_mode())
+   if (perf_guest_cbs)
+   state = perf_guest_cbs->state();
+
+   if (perf_guest_cbs && state) {
+   if (state & PERF_GUEST_USER)
misc |= PERF_RECORD_MISC_GUEST_USER;
else
misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..1b344e23fd2f 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2015 ARM Limited
  */
 #include 
+#include 
 #include 
 
 #include 
@@ -99,10 +100,25 @@ compat_user_backtrace(struct compat_frame_tail __user *tail,
 }
 #endif /* CONFIG_COMPAT */
 
+DEFINE_STATIC_CALL_RET0(arm64_guest_state, *(perf_guest_cbs->state));
+DEFINE_STATIC_CALL_RET0(arm64_guest_get_ip, *(perf_guest_cbs->get_ip));
+
+void arch_perf_update_guest_cbs(void)
+{
+   static_call_update(arm64_guest_state, (void *)&__static_call_return0);
+   static_call_update(arm64_guest_get_ip, (void *)&__static_call_return0);
+
+   if (perf_guest_cbs && perf_guest_cbs->state)
+   static_call_update(arm64_guest_state, perf_guest_cbs->state);
+
+   if (perf_guest_cbs && perf_guest_cbs->get_ip)
+   static_call_update(arm64_guest_get_ip, perf_guest_cbs->get_ip);
+}
+
 void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
 struct pt_regs *regs)
 {
-   if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+   if (static_call(arm64_guest_state)()) {
/* We don't support guest os callchain now */
retu
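
For readers unfamiliar with the pattern above, a hedged toy example of
the static_call API (names are invented; this is not code from the
patch):

#include <linux/static_call.h>

static int real_cb(void)
{
	return 42;
}

/* Starts life as a "return 0" trampoline, like the guest callbacks. */
DEFINE_STATIC_CALL_RET0(my_cb, real_cb);

static void enable_cb(void)
{
	/* Patch the call site to target real_cb from now on. */
	static_call_update(my_cb, real_cb);
}

static int query(void)
{
	return static_call(my_cb)();	/* 0 before update, 42 after */
}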

Re: [kvm-unit-tests] its-migration segmentation fault

2021-07-16 Thread Po-Hsu Lin
On Tue, Jun 15, 2021 at 3:11 PM Po-Hsu Lin  wrote:
>
> On Tue, Jun 15, 2021 at 2:37 PM Andrew Jones  wrote:
> >
> > On Tue, Jun 15, 2021 at 11:21:05AM +0800, Po-Hsu Lin wrote:
> > > On Fri, Nov 20, 2020 at 8:35 PM Andrew Jones  wrote:
> > > >
> > > > On Fri, Nov 20, 2020 at 12:02:10PM +, Alexandru Elisei wrote:
> > > > > When running all the tests with taskset -c 0-3 ./run_tests.sh on a
> > > > > rockpro64 (on the Cortex-a53 cores) the its-migration test hangs. In
> > > > > the log file I see:
> > > > >
> > > > > run_migration timeout -k 1s --foreground 90s /usr/bin/qemu-system-aarch64
> > > > > -nodefaults -machine virt,gic-version=host,accel=kvm -cpu host -device
> > > > > virtio-serial-device -device virtconsole,chardev=ctd -chardev testdev,id=ctd
> > > > > -device pci-testdev -display none -serial stdio -kernel arm/gic.flat -smp 6
> > > > > -machine gic-version=3 -append its-migration # -initrd /tmp/tmp.OrlQiorBpY
> > > > > ITS: MAPD devid=2 size = 0x8 itt=0x4042 valid=1
> > > > > ITS: MAPD devid=7 size = 0x8 itt=0x4043 valid=1
> > > > > MAPC col_id=3 target_addr = 0x3 valid=1
> > > > > MAPC col_id=2 target_addr = 0x2 valid=1
> > > > > INVALL col_id=2
> > > > > INVALL col_id=3
> > > > > MAPTI dev_id=2 event_id=20 -> phys_id=8195, col_id=3
> > > > > MAPTI dev_id=7 event_id=255 -> phys_id=8196, col_id=2
> > > > > Now migrate the VM, then press a key to continue...
> > > > > scripts/arch-run.bash: line 103: 48549 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > >  48550 Segmentation fault  (core dumped) | ncat -U $1
> > > > > scripts/arch-run.bash: line 103: 48568 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > >  48569 Segmentation fault  (core dumped) | ncat -U $1
> > > > > scripts/arch-run.bash: line 103: 48583 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > >  48584 Segmentation fault  (core dumped) | ncat -U $1
> > > > > [..]
> > > > > scripts/arch-run.bash: line 103: 49414 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > >  49415 Segmentation fault  (core dumped) | ncat -U $1
> > > > > qemu-system-aarch64: terminating on signal 15 from pid 48496 (timeout)
> > > > > qemu-system-aarch64: terminating on signal 15 from pid 48504 (timeout)
> > > > > scripts/arch-run.bash: line 103: 49430 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > >  49431 Segmentation fault  (core dumped) | ncat -U $1
> > > > > scripts/arch-run.bash: line 103: 49445 Done    echo '{ "execute": "qmp_capabilities" }{ "execute":' "$2" '}'
> > > > > [..]
> > > >
> > > > Is your ncat segfaulting? It looks like it from this output. Have you
> > > > tried running your ncat with a UNIX socket independently of this test?
> > > >
> > > > Is this the first time you've tried this test in this environment, or
> > > > is this a regression for you?
> > > >
> > > > >
> > > > > If I run the test manually:
> > > > >
> > > > > $ taskset -c 0-3 ./arm-run arm/gic.flat -smp 4 -machine gic-version=3
> > > > > -append 'its-migration'
> > > >
> > > > This won't work because we need run_tests.sh to set up the
> > > > run_migration() call. The only ways to run migration tests
> > > > separately are
> > > >
> > > >  $ ./run_tests.sh its-migration
> > > >
> > > > and
> > > >
> > > >  $ tests/its-migration
> > > >
> > > > For the second one you need to do 'make standalone' first.
> > > >
> > > >
> > > > >
> > > > > /usr/bin/qemu-system-aarch64 -nodefaults -machine virt,gic-version=host,accel=kvm
> > > > > -cpu host -device virtio-serial-device -device virtconsole,chardev=ctd
> > > > > -chardev testdev,id=ctd -device pci-testdev -display none -serial stdio
> > > > > -kernel arm/gic.flat -smp 4 -machine gic-version=3 -append its-migration
> > > > > # -initrd /tmp/tmp.OtsTj3QD4J
> > > > > ITS: MAPD devid=2 size = 0x8 itt=0x403a valid=1
> > > > > ITS: MAPD devid=7 size = 0x8 itt=0x403b valid=1
> > > > > MAPC col_id=3 target_addr = 0x3 valid=1
> > > > > MAPC col_id=2 target_addr = 0x2 valid=1
> > > > > INVALL col_id=2
> > > > > INVALL col_id=3
> > > > > MAPTI dev_id=2 event_id=20 -> phys_id=8195, col_id=3
> > > > > MAPTI dev_id=7 event_id=255 -> phys_id=8196, col_id=2
> > > > > Now migrate the VM, then press a key to continue...
> > > > >
> > > > > And the test hangs here after I press a key.
> > > >
> > > > The test doesn't get your input because of the '</dev/null' in
> > > > run_qemu(), which ./arm-run calls. So it's not hanging, it's just
> > > > waiting forever on the key press.
> > > Hello Andrew,
> > > We have found this waiting for k