Re: kvm-unit-tests: psci_cpu_on_test FAILed
On 02/08/2019 11:56, Zenghui Yu wrote:
> Hi folks,
>
> Running kvm-unit-tests with Linux 5.3.0-rc2 on Kunpeng 920, we will get
> the following fail info:
>
>     [...]
>     FAIL psci (4 tests, 1 unexpected failures)
>     [...]
> and
>     [...]
>     INFO: unexpected cpu_on return value: caller=CPU9, ret=-2
>     FAIL: cpu-on
>     SUMMARY: 4 tests, 1 unexpected failures
>
>
> I think this is an issue had been fixed once by commit 6c7a5dce22b3
> ("KVM: arm/arm64: fix races in kvm_psci_vcpu_on"), which makes use of
> kvm->lock mutex to fix the race between two PSCI_CPU_ON calls - one
> does reset on the MPIDR register whilst another reads it.
>
> But commit 358b28f09f0 ("arm/arm64: KVM: Allow a VCPU to fully reset
> itself") later moves the reset work into check_vcpu_requests(), by
> making a KVM_REQ_VCPU_RESET request in PSCI code. Thus the reset work
> has not been protected by kvm->lock mutex anymore, and the race shows up
> again...
>
> Do we need a fix for this issue? At least achieve a mutex execution
> between the reset of MPIDR and kvm_mpidr_to_vcpu()?

The thing is that the way we reset registers is marginally insane.
Yes, it catches most reset bugs. It also introduces many more in
the rest of the paths.

The fun part is that there is hardly a need for resetting MPIDR.
It has already been set when we've created the vcpu. It is the
poisoning of the sysreg array that creates a situation where
the MPIDR is temporarily invalid.

So instead of poisoning the array, how about we just keep track
of the registers for which we've called a reset function?
It should be enough to track the most obvious bugs...

I've cobbled the following patch together, which seems to fix the
issue on my TX2 with 64 vcpus.

Thoughts?

	M.

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index f26e181d881c..17f46ee7dc83 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2254,13 +2254,17 @@ static int emulate_sys_reg(struct kvm_vcpu *vcpu,
 }
 
 static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
-			      const struct sys_reg_desc *table, size_t num)
+				const struct sys_reg_desc *table, size_t num,
+				unsigned long *bmap)
 {
 	unsigned long i;
 
 	for (i = 0; i < num; i++)
-		if (table[i].reset)
+		if (table[i].reset) {
 			table[i].reset(vcpu, &table[i]);
+			if (bmap)
+				set_bit(i, bmap);
+		}
 }
 
 /**
@@ -2772,21 +2776,23 @@ void kvm_sys_reg_table_init(void)
  */
 void kvm_reset_sys_regs(struct kvm_vcpu *vcpu)
 {
+	unsigned long *bmap;
 	size_t num;
 	const struct sys_reg_desc *table;
 
-	/* Catch someone adding a register without putting in reset entry. */
-	memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
+	bmap = bitmap_alloc(NR_SYS_REGS, GFP_KERNEL);
 
 	/* Generic chip reset first (so target could override). */
-	reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs));
+	reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs), bmap);
 
 	table = get_target_table(vcpu->arch.target, true, &num);
-	reset_sys_reg_descs(vcpu, table, num);
+	reset_sys_reg_descs(vcpu, table, num, bmap);
 
 	for (num = 1; num < NR_SYS_REGS; num++) {
-		if (WARN(__vcpu_sys_reg(vcpu, num) == 0x4242424242424242,
+		if (WARN(bmap && !test_bit(num, bmap),
 			 "Didn't reset __vcpu_sys_reg(%zi)\n", num))
 			break;
 	}
+
+	kfree(bmap);
 }
-- 
Jazz is not dead, it just smells funny...
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

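The idea in the reply above (record which registers had a reset function run, rather than poisoning the array and looking for the poison afterwards) can be sketched in user-space C. This is only an illustration under stated assumptions: `NR_REGS`, `struct reg_desc`, `reset_regs()` and `first_unreset()` are hypothetical names, not kernel APIs, and the sketch indexes the bitmap by register slot, whereas the patch passes the table index to set_bit().

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REGS 8 /* hypothetical register file size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS(n) (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical descriptor: which slot to reset and to what value. */
struct reg_desc {
	size_t index;
	uint64_t reset_val;
	bool has_reset;
};

static void mark_bit(unsigned long *bmap, size_t bit)
{
	bmap[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

static bool bit_is_set(const unsigned long *bmap, size_t bit)
{
	return (bmap[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}

/* Run every reset entry and record the slot it wrote; other slots are left alone. */
static void reset_regs(uint64_t *regs, const struct reg_desc *table,
		       size_t num, unsigned long *bmap)
{
	for (size_t i = 0; i < num; i++) {
		if (!table[i].has_reset)
			continue;
		regs[table[i].index] = table[i].reset_val;
		mark_bit(bmap, table[i].index);
	}
}

/* Return the first slot in [1, NR_REGS) that was never reset, or 0 if none. */
static size_t first_unreset(const unsigned long *bmap)
{
	for (size_t r = 1; r < NR_REGS; r++)
		if (!bit_is_set(bmap, r))
			return r;
	return 0;
}
```

Because nothing is overwritten with a poison value, a slot without a reset entry keeps whatever it held before, which is the property that matters for MPIDR in the race described above.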
[PATCH 6/9] KVM: arm64: Provide a PV_TIME device to user space
Allow user space to inform the KVM host where in the physical memory
map the paravirtualized time structures should be located.

A device is created which provides the base address of an array of
Stolen Time (ST) structures, one for each VCPU. There must be (64 *
total number of VCPUs) bytes of memory available at this location.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The memory should be marked as
reserved to the guest to stop it allocating it for other purposes.

Signed-off-by: Steven Price
---
 arch/arm64/include/asm/kvm_mmu.h  |   2 +
 arch/arm64/include/uapi/asm/kvm.h |   6 +
 arch/arm64/kvm/Makefile           |   1 +
 include/uapi/linux/kvm.h          |   2 +
 virt/kvm/arm/mmu.c                |  44 +++
 virt/kvm/arm/pvtime.c             | 190 ++
 6 files changed, 245 insertions(+)
 create mode 100644 virt/kvm/arm/pvtime.c

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index befe37d4bc0e..88c8a4b2836f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -157,6 +157,8 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm);
 void kvm_free_stage2_pgd(struct kvm *kvm);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
+int kvm_phys_addr_memremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size, bool writable);
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 9a507716ae2f..95516a4198ea 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -367,6 +367,12 @@ struct kvm_vcpu_events {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* Device Control API: PV_TIME */
+#define KVM_DEV_ARM_PV_TIME_PADDR	0
+#define  KVM_DEV_ARM_PV_TIME_ST		0
+#define KVM_DEV_ARM_PV_TIME_STATE_SIZE	1
+#define KVM_DEV_ARM_PV_TIME_STATE	2
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 73dce4d47d47..5ffbdc39e780 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,6 +14,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/e
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index a7c19540ce21..04bffafa0708 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1222,6 +1222,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
 	KVM_DEV_TYPE_XIVE,
 #define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
+	KVM_DEV_TYPE_ARM_PV_TIME,
+#define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
 	KVM_DEV_TYPE_MAX,
 };
 
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 38b4c910b6c3..be28a4aee451 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1368,6 +1368,50 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
+/**
+ * kvm_phys_addr_memremap - map a memory range to guest IPA
+ *
+ * @kvm:	The KVM pointer
+ * @guest_ipa:	The IPA at which to insert the mapping
+ * @pa:		The physical address of the memory
+ * @size:	The size of the mapping
+ */
+int kvm_phys_addr_memremap(struct kvm *kvm, phys_addr_t guest_ipa,
+			  phys_addr_t pa, unsigned long size, bool writable)
+{
+	phys_addr_t addr, end;
+	int ret = 0;
+	unsigned long pfn;
+	struct kvm_mmu_memory_cache cache = { 0, };
+
+	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
+	pfn = __phys_to_pfn(pa);
+
+	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
+		pte_t pte = pfn_pte(pfn, PAGE_S2);
+
+		if (writable)
+			pte = kvm_s2pte_mkwrite(pte);
+
+		ret = mmu_topup_memory_cache(&cache,
+					     kvm_mmu_cache_min_pages(kvm),
+					     KVM_NR_MEM_OBJS);
+		if (ret)
+			goto out;
+		spin_lock(&kvm->mmu_lock);
+		ret = stage2_set_pte(kvm, &cache, addr, &pte, 0);
+		spin_unlock(&kvm->mmu_lock);
+		if (ret)
+			goto out;
+
+		pfn++;
+	}
+
+out:
+	mmu_free_memory_cache(&cache);
+
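The sizing rule in the commit message of this patch (64 bytes per VCPU, base address 64 byte aligned) can be written out as a few lines of arithmetic. This is a rough sketch, not code from the series: the helper names are hypothetical, and indexing the array by VCPU index is an assumption made here for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define ST_STRUCT_SIZE 64u /* bytes per Stolen Time structure, per the commit message */

/* The base address must be 64 byte aligned. */
static bool st_base_is_valid(uint64_t base)
{
	return (base % ST_STRUCT_SIZE) == 0;
}

/* Memory that must be available at the base: 64 * total number of VCPUs. */
static uint64_t st_region_size(unsigned int nr_vcpus)
{
	return (uint64_t)ST_STRUCT_SIZE * nr_vcpus;
}

/* Assumed layout: structure N of the array belongs to VCPU index N. */
static uint64_t st_vcpu_addr(uint64_t base, unsigned int vcpu_idx)
{
	return base + (uint64_t)ST_STRUCT_SIZE * vcpu_idx;
}
```

For example, four VCPUs need 256 bytes, and with a base of 0x80000000 the structure for index 3 would sit at 0x800000C0 under the assumed layout.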
[PATCH 9/9] arm64: Retrieve stolen time as paravirtualized guest
Enable paravirtualization features when running under a hypervisor
supporting the PV_TIME_ST hypercall.

For each (v)CPU, we ask the hypervisor for the location of a shared
page which the hypervisor will use to report stolen time to us. We set
pv_time_ops to the stolen time function which simply reads the stolen
value from the shared page for a VCPU. We guarantee single-copy
atomicity using READ_ONCE which means we can also read the stolen
time for another VCPU than the currently running one while it is
potentially being updated by the hypervisor.

Signed-off-by: Steven Price
---
 arch/arm64/kernel/Makefile |   1 +
 arch/arm64/kernel/kvm.c    | 155 +
 include/linux/cpuhotplug.h |   1 +
 3 files changed, 157 insertions(+)
 create mode 100644 arch/arm64/kernel/kvm.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 478491f07b4f..eb36edf9b930 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_CRASH_CORE)		+= crash_core.o
 obj-$(CONFIG_ARM_SDE_INTERFACE)		+= sdei.o
 obj-$(CONFIG_ARM64_SSBD)		+= ssbd.o
 obj-$(CONFIG_ARM64_PTR_AUTH)		+= pointer_auth.o
+obj-$(CONFIG_PARAVIRT)			+= kvm.o
 
 obj-y					+= vdso/ probes/
 obj-$(CONFIG_COMPAT_VDSO)		+= vdso32/
diff --git a/arch/arm64/kernel/kvm.c b/arch/arm64/kernel/kvm.c
new file mode 100644
index 000000000000..245398c79dae
--- /dev/null
+++ b/arch/arm64/kernel/kvm.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Arm Ltd.
+
+#define pr_fmt(fmt) "kvmarm-pv: " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+struct kvmarm_stolen_time_region {
+	struct pvclock_vcpu_stolen_time_info *kaddr;
+};
+
+static DEFINE_PER_CPU(struct kvmarm_stolen_time_region, stolen_time_region);
+
+static bool steal_acc = true;
+static int __init parse_no_stealacc(char *arg)
+{
+	steal_acc = false;
+	return 0;
+}
+early_param("no-steal-acc", parse_no_stealacc);
+
+/* return stolen time in ns by asking the hypervisor */
+static u64 kvm_steal_clock(int cpu)
+{
+	struct kvmarm_stolen_time_region *reg;
+
+	reg = per_cpu_ptr(&stolen_time_region, cpu);
+	if (!reg->kaddr) {
+		pr_warn_once("stolen time enabled but not configured for cpu %d\n",
+			     cpu);
+		return 0;
+	}
+
+	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
+}
+
+static int disable_stolen_time_current_cpu(void)
+{
+	struct kvmarm_stolen_time_region *reg;
+
+	reg = this_cpu_ptr(&stolen_time_region);
+	if (!reg->kaddr)
+		return 0;
+
+	memunmap(reg->kaddr);
+	memset(reg, 0, sizeof(*reg));
+
+	return 0;
+}
+
+static int stolen_time_dying_cpu(unsigned int cpu)
+{
+	return disable_stolen_time_current_cpu();
+}
+
+static int init_stolen_time_cpu(unsigned int cpu)
+{
+	struct kvmarm_stolen_time_region *reg;
+	struct arm_smccc_res res;
+
+	reg = this_cpu_ptr(&stolen_time_region);
+
+	if (reg->kaddr)
+		return 0;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_ST, &res);
+
+	if ((long)res.a0 < 0)
+		return -EINVAL;
+
+	reg->kaddr = memremap(res.a0,
+			      sizeof(struct pvclock_vcpu_stolen_time_info),
+			      MEMREMAP_WB);
+
+	if (reg->kaddr == NULL) {
+		pr_warn("Failed to map stolen time data structure\n");
+		return -EINVAL;
+	}
+
+	if (le32_to_cpu(reg->kaddr->revision) != 0 ||
+	    le32_to_cpu(reg->kaddr->attributes) != 0) {
+		pr_warn("Unexpected revision or attributes in stolen time data\n");
+		return -ENXIO;
+	}
+
+	return 0;
+}
+
+static int kvm_arm_init_stolen_time(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state(CPUHP_AP_ARM_KVMPV_STARTING,
+				"hypervisor/kvmarm/pv:starting",
+				init_stolen_time_cpu, stolen_time_dying_cpu);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
+
+static bool has_kvm_steal_clock(void)
+{
+	struct arm_smccc_res res;
+
+	/* To detect the presence of PV time support we require SMCCC 1.1+ */
+	if (psci_ops.smccc_version < SMCCC_VERSION_1_1)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_HV_PV_FEATURES, &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_FEATURES,
+			     ARM_SMCCC_HV_PV_TIME_ST, &res);
+
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+
+	return true;
+}
+
+static int __init kvm_guest_init(void)
+{
+	int ret = 0;
+
+	if
[PATCH 7/9] arm/arm64: Provide a wrapper for SMCCC 1.1 calls
SMCCC 1.1 calls may use either HVC or SMC depending on the PSCI
conduit. Rather than coding this in every call site, provide a macro
which uses the correct instruction. The macro also handles the case
where no PSCI conduit is configured, returning a not supported error
in res, along with returning the conduit used for the call.

This allows us to remove some duplicated code and will be useful later
when adding paravirtualized time hypervisor calls.

Signed-off-by: Steven Price
---
 include/linux/arm-smccc.h | 44 +++
 1 file changed, 44 insertions(+)

diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index e7f129f26ebd..eee1e832221d 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -303,6 +303,50 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
 #define SMCCC_RET_NOT_SUPPORTED			-1
 #define SMCCC_RET_NOT_REQUIRED			-2
 
+/* Like arm_smccc_1_1* but always returns SMCCC_RET_NOT_SUPPORTED.
+ * Used when the PSCI conduit is not defined. The empty asm statement
+ * avoids compiler warnings about unused variables.
+ */
+#define __fail_smccc_1_1(...)						\
+	do {								\
+		__declare_args(__count_args(__VA_ARGS__), __VA_ARGS__);	\
+		asm ("" __constraints(__count_args(__VA_ARGS__)));	\
+		if (___res)						\
+			___res->a0 = SMCCC_RET_NOT_SUPPORTED;		\
+	} while (0)
+
+/*
+ * arm_smccc_1_1_invoke() - make an SMCCC v1.1 compliant call
+ *
+ * This is a variadic macro taking one to eight source arguments, and
+ * an optional return structure.
+ *
+ * @a0-a7: arguments passed in registers 0 to 7
+ * @res: result values from registers 0 to 3
+ *
+ * This macro will make either an HVC call or an SMC call depending on the
+ * current PSCI conduit. If no valid conduit is available then -1
+ * (SMCCC_RET_NOT_SUPPORTED) is returned in @res.a0 (if supplied).
+ *
+ * The return value also provides the conduit that was used.
+ */
+#define arm_smccc_1_1_invoke(...) ({					\
+		int method = psci_ops.conduit;				\
+		switch (method) {					\
+		case PSCI_CONDUIT_HVC:					\
+			arm_smccc_1_1_hvc(__VA_ARGS__);			\
+			break;						\
+		case PSCI_CONDUIT_SMC:					\
+			arm_smccc_1_1_smc(__VA_ARGS__);			\
+			break;						\
+		default:						\
+			__fail_smccc_1_1(__VA_ARGS__);			\
+			method = PSCI_CONDUIT_NONE;			\
+			break;						\
+		}							\
+		method;							\
+	})
+
 /* Paravirtualised time calls (defined by ARM DEN0057A) */
 #define ARM_SMCCC_HV_PV_FEATURES				\
 	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
-- 
2.20.1

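The control flow of the wrapper (dispatch on the configured conduit, fall back to "not supported" when there is none, and report which conduit was used) can be shown without the register-passing machinery. The sketch below is a simplified stand-in rather than the kernel macro: `enum conduit`, `struct call_res`, `fake_hvc()`, `fake_smc()` and `invoke()` are hypothetical, and the two backends only tag the result so the dispatch can be observed.

```c
#include <stddef.h>

enum conduit { CONDUIT_NONE, CONDUIT_SMC, CONDUIT_HVC };

#define RET_NOT_SUPPORTED (-1L)

struct call_res {
	unsigned long a0, a1, a2, a3;
};

/* Stand-ins for the real HVC and SMC instructions (hypothetical behaviour). */
static void fake_hvc(unsigned long fn, struct call_res *res)
{
	if (res)
		res->a0 = fn + 1;
}

static void fake_smc(unsigned long fn, struct call_res *res)
{
	if (res)
		res->a0 = fn + 2;
}

/* Issue the call through the configured conduit and return the one used. */
static enum conduit invoke(enum conduit configured, unsigned long fn,
			   struct call_res *res)
{
	switch (configured) {
	case CONDUIT_HVC:
		fake_hvc(fn, res);
		return CONDUIT_HVC;
	case CONDUIT_SMC:
		fake_smc(fn, res);
		return CONDUIT_SMC;
	default:
		/* No usable conduit: report "not supported" if a result was requested. */
		if (res)
			res->a0 = (unsigned long)RET_NOT_SUPPORTED;
		return CONDUIT_NONE;
	}
}
```

A caller that only cares about the result can ignore the return value; a caller that must pick a conduit-specific callback afterwards can switch on it instead of re-reading the global state.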
[PATCH 8/9] arm/arm64: Make use of the SMCCC 1.1 wrapper
Rather than directly choosing which function to use based on
psci_ops.conduit, use the new arm_smccc_1_1 wrapper instead.

In some cases we still need to do some operations based on the
conduit, but the code duplication is removed.

No functional change.

Signed-off-by: Steven Price
---
 arch/arm/mm/proc-v7-bugs.c     | 13 +++---
 arch/arm64/kernel/cpu_errata.c | 80 --
 2 files changed, 33 insertions(+), 60 deletions(-)

diff --git a/arch/arm/mm/proc-v7-bugs.c b/arch/arm/mm/proc-v7-bugs.c
index 9a07916af8dd..8eb52f3385e7 100644
--- a/arch/arm/mm/proc-v7-bugs.c
+++ b/arch/arm/mm/proc-v7-bugs.c
@@ -78,12 +78,13 @@ static void cpu_v7_spectre_init(void)
 		if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
 			break;
 
+		arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+				     ARM_SMCCC_ARCH_WORKAROUND_1, &res);
+		if ((int)res.a0 != 0)
+			return;
+
 		switch (psci_ops.conduit) {
 		case PSCI_CONDUIT_HVC:
-			arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-					  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-			if ((int)res.a0 != 0)
-				break;
 			per_cpu(harden_branch_predictor_fn, cpu) =
 				call_hvc_arch_workaround_1;
 			cpu_do_switch_mm = cpu_v7_hvc_switch_mm;
@@ -91,10 +92,6 @@ static void cpu_v7_spectre_init(void)
 			break;
 
 		case PSCI_CONDUIT_SMC:
-			arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-					  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-			if ((int)res.a0 != 0)
-				break;
 			per_cpu(harden_branch_predictor_fn, cpu) =
 				call_smc_arch_workaround_1;
 			cpu_do_switch_mm = cpu_v7_smc_switch_mm;
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 1e43ba5c79b7..400a49aaae85 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -215,40 +215,31 @@ static int detect_harden_bp_fw(void)
 	if (psci_ops.smccc_version == SMCCC_VERSION_1_0)
 		return -1;
 
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			     ARM_SMCCC_ARCH_WORKAROUND_1, &res);
+
+	switch ((int)res.a0) {
+	case 1:
+		/* Firmware says we're just fine */
+		return 0;
+	case 0:
+		break;
+	default:
+		return -1;
+	}
+
 	switch (psci_ops.conduit) {
 	case PSCI_CONDUIT_HVC:
-		arm_smccc_1_1_hvc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		switch ((int)res.a0) {
-		case 1:
-			/* Firmware says we're just fine */
-			return 0;
-		case 0:
-			cb = call_hvc_arch_workaround_1;
-			/* This is a guest, no need to patch KVM vectors */
-			smccc_start = NULL;
-			smccc_end = NULL;
-			break;
-		default:
-			return -1;
-		}
+		cb = call_hvc_arch_workaround_1;
+		/* This is a guest, no need to patch KVM vectors */
+		smccc_start = NULL;
+		smccc_end = NULL;
 		break;
 
 	case PSCI_CONDUIT_SMC:
-		arm_smccc_1_1_smc(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
-				  ARM_SMCCC_ARCH_WORKAROUND_1, &res);
-		switch ((int)res.a0) {
-		case 1:
-			/* Firmware says we're just fine */
-			return 0;
-		case 0:
-			cb = call_smc_arch_workaround_1;
-			smccc_start = __smccc_workaround_1_smc_start;
-			smccc_end = __smccc_workaround_1_smc_end;
-			break;
-		default:
-			return -1;
-		}
+		cb = call_smc_arch_workaround_1;
+		smccc_start = __smccc_workaround_1_smc_start;
+		smccc_end = __smccc_workaround_1_smc_end;
 		break;
 
 	default:
@@ -338,6 +329,7 @@ void __init arm64_enable_wa2_handling(struct alt_instr *alt,
 
 void arm64_set_ssbd_mitigation(bool state)
 {
+	int conduit;
 	if (!IS_ENABLED(CONFIG_ARM64_SSBD)) {
 		pr_info_once("SSBD disabled by kernel configuration\n");
 		return;
@@ -351,19 +343,10 @@ void arm64_set_ssbd_mitigation(bool state)
 		return;
 	}
 
-	switch (psci_ops.conduit) {
-	case PSCI_CONDUIT_HVC:
-
[PATCH 2/9] KVM: arm/arm64: Factor out hypercall handling from PSCI code
From: Christoffer Dall

We currently intertwine the KVM PSCI implementation with the general
dispatch of hypercall handling, which makes perfect sense because PSCI
is the only category of hypercalls we support.

However, as we are about to support additional hypercalls, factor out
this functionality into a separate hypercall handler file.

Signed-off-by: Christoffer Dall
[steven.pr...@arm.com: rebased]
Signed-off-by: Steven Price
---
 arch/arm/kvm/Makefile        |  2 +-
 arch/arm/kvm/handle_exit.c   |  2 +-
 arch/arm64/kvm/Makefile      |  1 +
 arch/arm64/kvm/handle_exit.c |  4 +-
 include/kvm/arm_hypercalls.h | 43 ++
 include/kvm/arm_psci.h       |  2 +-
 virt/kvm/arm/hypercalls.c    | 59 +
 virt/kvm/arm/psci.c          | 84 +---
 8 files changed, 110 insertions(+), 87 deletions(-)
 create mode 100644 include/kvm/arm_hypercalls.h
 create mode 100644 virt/kvm/arm/hypercalls.c

diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 531e59f5be9c..ef4d01088efc 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -23,7 +23,7 @@ obj-y += kvm-arm.o init.o interrupts.o
 obj-y += handle_exit.o guest.o emulate.o reset.o
 obj-y += coproc.o coproc_a15.o coproc_a7.o vgic-v3-coproc.o
 obj-y += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
-obj-y += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
+obj-y += $(KVM)/arm/psci.o $(KVM)/arm/perf.o $(KVM)/arm/hypercalls.o
 obj-y += $(KVM)/arm/aarch32.o
 
 obj-y += $(KVM)/arm/vgic/vgic.o
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index 2a6a1394d26e..e58a89d2f13f 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -9,7 +9,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 
 #include "trace.h"
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 3ac1a64d2fb9..73dce4d47d47 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp/
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 706cca23f0d2..aacfc55de44c 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -11,8 +11,6 @@
 #include
 #include
 
-#include
-
 #include
 #include
 #include
@@ -22,6 +20,8 @@
 #include
 #include
 
+#include
+
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
new file mode 100644
index 000000000000..35a5abcc4ca3
--- /dev/null
+++ b/include/kvm/arm_hypercalls.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2019 Arm Ltd. */
+
+#ifndef __KVM_ARM_HYPERCALLS_H
+#define __KVM_ARM_HYPERCALLS_H
+
+#include
+
+int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+
+static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 0);
+}
+
+static inline unsigned long smccc_get_arg1(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 1);
+}
+
+static inline unsigned long smccc_get_arg2(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 2);
+}
+
+static inline unsigned long smccc_get_arg3(struct kvm_vcpu *vcpu)
+{
+	return vcpu_get_reg(vcpu, 3);
+}
+
+static inline void smccc_set_retval(struct kvm_vcpu *vcpu,
+				    unsigned long a0,
+				    unsigned long a1,
+				    unsigned long a2,
+				    unsigned long a3)
+{
+	vcpu_set_reg(vcpu, 0, a0);
+	vcpu_set_reg(vcpu, 1, a1);
+	vcpu_set_reg(vcpu, 2, a2);
+	vcpu_set_reg(vcpu, 3, a3);
+}
+
+#endif
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 632e78bdef4d..5b58bd2fe088 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -40,7 +40,7 @@ static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
 }
 
 
-int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+int kvm_psci_call(struct kvm_vcpu *vcpu);
 
 struct kvm_one_reg;
 
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
new file mode 100644
index 000000000000..f875241bd030
--- /dev/null
+++ b/virt/kvm/arm/hypercalls.c
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2019 Arm Ltd.
+
+#include
+#include
+
+#include
+
+#include
+#include
+
+int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
+{
+	u32 func_id = smccc_get_function(vcpu);
+	u32 val = SMCCC_RET_NOT_SUPPORTED;
+	u32 feature;
+
+	switch (func_id) {
+
[PATCH 5/9] KVM: Allow kvm_device_ops to be const
Currently a kvm_device_ops structure cannot be const without triggering
compiler warnings. However the structure doesn't need to be written to
and, by marking it const, it can be read-only in memory. Add some more
const keywords to allow this.

Signed-off-by: Steven Price
---
 include/linux/kvm_host.h | 4 ++--
 virt/kvm/kvm_main.c      | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5c5b5867024c..be31a6f8351a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1236,7 +1236,7 @@ extern unsigned int halt_poll_ns_grow_start;
 extern unsigned int halt_poll_ns_shrink;
 
 struct kvm_device {
-	struct kvm_device_ops *ops;
+	const struct kvm_device_ops *ops;
 	struct kvm *kvm;
 	void *private;
 	struct list_head vm_node;
@@ -1289,7 +1289,7 @@ struct kvm_device_ops {
 void kvm_device_get(struct kvm_device *dev);
 void kvm_device_put(struct kvm_device *dev);
 struct kvm_device *kvm_device_from_filp(struct file *filp);
-int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type);
+int kvm_register_device_ops(const struct kvm_device_ops *ops, u32 type);
 void kvm_unregister_device_ops(u32 type);
 
 extern struct kvm_device_ops kvm_mpic_ops;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0c2b60..8c12110ec87a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3035,14 +3035,14 @@ struct kvm_device *kvm_device_from_filp(struct file *filp)
 	return filp->private_data;
 }
 
-static struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
+static const struct kvm_device_ops *kvm_device_ops_table[KVM_DEV_TYPE_MAX] = {
 #ifdef CONFIG_KVM_MPIC
 	[KVM_DEV_TYPE_FSL_MPIC_20]	= &kvm_mpic_ops,
 	[KVM_DEV_TYPE_FSL_MPIC_42]	= &kvm_mpic_ops,
 #endif
 };
 
-int kvm_register_device_ops(struct kvm_device_ops *ops, u32 type)
+int kvm_register_device_ops(const struct kvm_device_ops *ops, u32 type)
 {
 	if (type >= ARRAY_SIZE(kvm_device_ops_table))
 		return -ENOSPC;
@@ -3063,7 +3063,7 @@ void kvm_unregister_device_ops(u32 type)
 static int kvm_ioctl_create_device(struct kvm *kvm,
 				   struct kvm_create_device *cd)
 {
-	struct kvm_device_ops *ops = NULL;
+	const struct kvm_device_ops *ops = NULL;
 	struct kvm_device *dev;
 	bool test = cd->flags & KVM_CREATE_DEVICE_TEST;
 	int type;
-- 
2.20.1

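Why the extra const keywords are needed can be seen in a small stand-alone sketch: a table of pointers to const accepts the address of a const ops structure without a cast, whereas a table of plain pointers would not. The names below (`struct device_ops`, `ops_table`, `register_device_ops()`, `demo_ops`) are hypothetical stand-ins for the KVM ones, and the -EEXIST check for an occupied slot is an assumption, since only the -ENOSPC check appears in the hunk above.

```c
#include <errno.h>
#include <stddef.h>

#define DEV_TYPE_MAX 4 /* hypothetical number of device types */

struct device_ops {
	const char *name;
	int (*create)(int type);
};

/* Pointers to const: the registered structures may live in read-only memory. */
static const struct device_ops *ops_table[DEV_TYPE_MAX];

static int register_device_ops(const struct device_ops *ops, unsigned int type)
{
	if (type >= DEV_TYPE_MAX)
		return -ENOSPC;
	if (ops_table[type] != NULL)
		return -EEXIST; /* assumed behaviour for an occupied slot */
	ops_table[type] = ops;
	return 0;
}

static int demo_create(int type)
{
	return type;
}

/* A const ops structure; taking its address yields a pointer to const. */
static const struct device_ops demo_ops = {
	.name = "demo",
	.create = demo_create,
};
```

Calling through the table (`ops_table[1]->create(3)`) still works as before, since const only forbids writing to the structure, not invoking the function pointers it holds.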
[PATCH 4/9] KVM: arm64: Support stolen time reporting via shared structure
Implement the service call for configuring a shared structure between a
VCPU and the hypervisor in which the hypervisor can write the time
stolen from the VCPU's execution time by other tasks on the host.

The hypervisor allocates memory which is placed at an IPA chosen by user
space. The hypervisor then uses WRITE_ONCE() to update the shared
structure ensuring single copy atomicity of the 64-bit unsigned value
that reports stolen time in nanoseconds.

Whenever stolen time is enabled by the guest, the stolen time counter is
reset.

The stolen time itself is retrieved from the sched_info structure
maintained by the Linux scheduler code. We enable SCHEDSTATS when
selecting KVM Kconfig to ensure this value is meaningful.

Signed-off-by: Steven Price
---
 arch/arm64/include/asm/kvm_host.h | 13 +-
 arch/arm64/kvm/Kconfig            |  1 +
 include/kvm/arm_hypercalls.h      |  1 +
 include/linux/kvm_types.h         |  2 +
 virt/kvm/arm/arm.c                | 18
 virt/kvm/arm/hypercalls.c         | 70 +++
 6 files changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f656169db8c3..78f270190d43 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 #define KVM_REQ_VCPU_RESET	KVM_ARCH_REQ(2)
+#define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
 
 DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
 
@@ -83,6 +84,11 @@ struct kvm_arch {
 
 	/* Mandated version of PSCI */
 	u32 psci_version;
+
+	struct kvm_arch_pvtime {
+		void *st;
+		gpa_t st_base;
+	} pvtime;
 };
 
 #define KVM_NR_MEM_OBJS     40
@@ -338,8 +344,13 @@ struct kvm_vcpu_arch {
 	/* True when deferrable sysregs are loaded on the physical CPU,
 	 * see kvm_vcpu_load_sysregs and kvm_vcpu_put_sysregs. */
 	bool sysregs_loaded_on_cpu;
-};
 
+	/* Guest PV state */
+	struct {
+		u64 steal;
+		u64 last_steal;
+	} steal;
+};
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
 				      sve_ffr_offset((vcpu)->arch.sve_max_vl)))
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index a67121d419a2..d8b88e40d223 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ config KVM
 	select IRQ_BYPASS_MANAGER
 	select HAVE_KVM_IRQ_BYPASS
 	select HAVE_KVM_VCPU_RUN_PID_CHANGE
+	select SCHEDSTATS
 	---help---
 	  Support hosting virtualized guest machines.
 	  We don't support KVM with 16K page tables yet, due to the multiple
diff --git a/include/kvm/arm_hypercalls.h b/include/kvm/arm_hypercalls.h
index 35a5abcc4ca3..9f0710ab4292 100644
--- a/include/kvm/arm_hypercalls.h
+++ b/include/kvm/arm_hypercalls.h
@@ -7,6 +7,7 @@
 #include
 
 int kvm_hvc_call_handler(struct kvm_vcpu *vcpu);
+int kvm_update_stolen_time(struct kvm_vcpu *vcpu);
 
 static inline u32 smccc_get_function(struct kvm_vcpu *vcpu)
 {
diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
index bde5374ae021..1c88e69db3d9 100644
--- a/include/linux/kvm_types.h
+++ b/include/linux/kvm_types.h
@@ -35,6 +35,8 @@ typedef unsigned long  gva_t;
 typedef u64            gpa_t;
 typedef u64            gfn_t;
 
+#define GPA_INVALID	(~(gpa_t)0)
+
 typedef unsigned long  hva_t;
 typedef u64            hpa_t;
 typedef u64            hfn_t;
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index f645c0fbf7ec..ebd963d2580b 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -40,6 +40,10 @@
 #include
 #include
 
+#include
+#include
+#include
+
 #ifdef REQUIRES_VIRT
 __asm__(".arch_extension	virt");
 #endif
@@ -135,6 +139,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = vgic_present ?
 				kvm_vgic_get_max_vcpus() : KVM_MAX_VCPUS;
 
+	kvm->arch.pvtime.st_base = GPA_INVALID;
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(kvm);
@@ -371,6 +376,7 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	kvm_vcpu_load_sysregs(vcpu);
 	kvm_arch_vcpu_load_fp(vcpu);
 	kvm_vcpu_pmu_restore_guest(vcpu);
+	kvm_make_request(KVM_REQ_RECORD_STEAL, vcpu);
 
 	if (single_task_running())
 		vcpu_clear_wfe_traps(vcpu);
@@ -617,6 +623,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 	smp_rmb();
 }
 
+static void vcpu_req_record_steal(struct kvm_vcpu *vcpu)
+{
+	int idx;
+
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+	kvm_update_stolen_time(vcpu);
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+}
+
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
 {
[PATCH 3/9] KVM: arm64: Implement PV_FEATURES call
This provides a mechanism for querying which paravirtualized features
are available in this hypervisor.

Also add the header file which defines the ABI for the paravirtualized
clock features we're about to add.

Signed-off-by: Steven Price
---
 arch/arm64/include/asm/pvclock-abi.h | 20
 include/linux/arm-smccc.h            | 14 ++
 virt/kvm/arm/hypercalls.c            |  9 +
 3 files changed, 43 insertions(+)
 create mode 100644 arch/arm64/include/asm/pvclock-abi.h

diff --git a/arch/arm64/include/asm/pvclock-abi.h b/arch/arm64/include/asm/pvclock-abi.h
new file mode 100644
index 000000000000..1f7cdc102691
--- /dev/null
+++ b/arch/arm64/include/asm/pvclock-abi.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2019 Arm Ltd. */
+
+#ifndef __ASM_PVCLOCK_ABI_H
+#define __ASM_PVCLOCK_ABI_H
+
+/* The below structures and constants are defined in ARM DEN0057A */
+
+struct pvclock_vcpu_stolen_time_info {
+	__le32 revision;
+	__le32 attributes;
+	__le64 stolen_time;
+	/* Structure must be 64 byte aligned, pad to that size */
+	u8 padding[48];
+} __packed;
+
+#define PV_VM_TIME_NOT_SUPPORTED	-1
+#define PV_VM_TIME_INVALID_PARAMETERS	-2
+
+#endif
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index 080012a6f025..e7f129f26ebd 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -45,6 +45,7 @@
 #define ARM_SMCCC_OWNER_SIP		2
 #define ARM_SMCCC_OWNER_OEM		3
 #define ARM_SMCCC_OWNER_STANDARD	4
+#define ARM_SMCCC_OWNER_STANDARD_HYP	5
 #define ARM_SMCCC_OWNER_TRUSTED_APP	48
 #define ARM_SMCCC_OWNER_TRUSTED_APP_END	49
 #define ARM_SMCCC_OWNER_TRUSTED_OS	50
@@ -302,5 +303,18 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1,
 #define SMCCC_RET_NOT_SUPPORTED			-1
 #define SMCCC_RET_NOT_REQUIRED			-2
 
+/* Paravirtualised time calls (defined by ARM DEN0057A) */
+#define ARM_SMCCC_HV_PV_FEATURES				\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x20)
+
+#define ARM_SMCCC_HV_PV_TIME_ST					\
+	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,			\
+			   ARM_SMCCC_SMC_64,			\
+			   ARM_SMCCC_OWNER_STANDARD_HYP,	\
+			   0x22)
+
 #endif /*__ASSEMBLY__*/
 #endif /*__LINUX_ARM_SMCCC_H*/
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index f875241bd030..2906b2df99df 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -5,6 +5,7 @@
 #include
 
 #include
+#include
 
 #include
 #include
@@ -48,6 +49,14 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 			break;
 		}
 		break;
+		case ARM_SMCCC_HV_PV_FEATURES:
+			val = SMCCC_RET_SUCCESS;
+			break;
+		}
+		break;
+	case ARM_SMCCC_HV_PV_FEATURES:
+		feature = smccc_get_arg1(vcpu);
+		switch (feature) {
 		}
 		break;
 	default:
-- 
2.20.1

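The two function IDs added to arm-smccc.h in this patch are built by packing four fields into a 32-bit value. The sketch below works that arithmetic out, assuming the usual SMCCC layout (call type in bit 31, calling convention in bit 30, owner in bits 29:24, function number in bits 15:0); `smccc_call_val()` and the constants are illustrative names, not the kernel macros.

```c
#include <stdint.h>

#define FAST_CALL		1u /* call type: fast call */
#define CONV_64			1u /* calling convention: 64-bit */
#define OWNER_STANDARD_HYP	5u /* owner value added by this patch */

/* Pack the four fields of an SMCCC function ID (assumed bit layout). */
static uint32_t smccc_call_val(uint32_t type, uint32_t conv,
			       uint32_t owner, uint32_t func)
{
	return (type << 31) | (conv << 30) |
	       ((owner & 0x3Fu) << 24) | (func & 0xFFFFu);
}
```

With those fields, function number 0x20 (PV_FEATURES) comes out as 0xC5000020 and 0x22 (PV_TIME_ST) as 0xC5000022.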
[PATCH 0/9] arm64: Stolen time support
This series add support for paravirtualized time for arm64 guests and KVM hosts following the specification in Arm's document DEN 0057A: https://developer.arm.com/docs/den0057/a It implements support for stolen time, allowing the guest to identify time when it is forcibly not executing. It doesn't implement support for Live Physical Time (LPT) as there are some concerns about the overheads and approach in the above specification, and I expect an updated version of the specification to be released soon with just the stolen time parts. I previously posted a series including LPT (as well as stolen time): https://lore.kernel.org/kvmarm/20181212150226.38051-1-steven.pr...@arm.com/ Patches 2, 5, 7 and 8 are cleanup patches and could be taken separately. Christoffer Dall (1): KVM: arm/arm64: Factor out hypercall handling from PSCI code Steven Price (8): KVM: arm64: Document PV-time interface KVM: arm64: Implement PV_FEATURES call KVM: arm64: Support stolen time reporting via shared structure KVM: Allow kvm_device_ops to be const KVM: arm64: Provide a PV_TIME device to user space arm/arm64: Provide a wrapper for SMCCC 1.1 calls arm/arm64: Make use of the SMCCC 1.1 wrapper arm64: Retrieve stolen time as paravirtualized guest Documentation/virtual/kvm/arm/pvtime.txt | 107 + arch/arm/kvm/Makefile| 2 +- arch/arm/kvm/handle_exit.c | 2 +- arch/arm/mm/proc-v7-bugs.c | 13 +- arch/arm64/include/asm/kvm_host.h| 13 +- arch/arm64/include/asm/kvm_mmu.h | 2 + arch/arm64/include/asm/pvclock-abi.h | 20 +++ arch/arm64/include/uapi/asm/kvm.h| 6 + arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/cpu_errata.c | 80 -- arch/arm64/kernel/kvm.c | 155 ++ arch/arm64/kvm/Kconfig | 1 + arch/arm64/kvm/Makefile | 2 + arch/arm64/kvm/handle_exit.c | 4 +- include/kvm/arm_hypercalls.h | 44 ++ include/kvm/arm_psci.h | 2 +- include/linux/arm-smccc.h| 58 +++ include/linux/cpuhotplug.h | 1 + include/linux/kvm_host.h | 4 +- include/linux/kvm_types.h| 2 + include/uapi/linux/kvm.h | 2 + virt/kvm/arm/arm.c | 18 
+++ virt/kvm/arm/hypercalls.c| 138 virt/kvm/arm/mmu.c | 44 ++ virt/kvm/arm/psci.c | 84 +- virt/kvm/arm/pvtime.c| 190 +++ virt/kvm/kvm_main.c | 6 +- 27 files changed, 848 insertions(+), 153 deletions(-) create mode 100644 Documentation/virtual/kvm/arm/pvtime.txt create mode 100644 arch/arm64/include/asm/pvclock-abi.h create mode 100644 arch/arm64/kernel/kvm.c create mode 100644 include/kvm/arm_hypercalls.h create mode 100644 virt/kvm/arm/hypercalls.c create mode 100644 virt/kvm/arm/pvtime.c -- 2.20.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
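As a rough illustration of what the guest side has to do with the shared structure, the sketch below mirrors the layout from pvclock-abi.h in plain C and decodes the stolen time field from a raw 64-byte record. The type and helper names are hypothetical; only the field order, sizes and little-endian encoding are taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the shared record (hypothetical name). */
struct toy_stolen_time_info {
	uint32_t revision;     /* little-endian on the wire */
	uint32_t attributes;   /* little-endian on the wire */
	uint64_t stolen_time;  /* little-endian, nanoseconds */
	uint8_t padding[48];   /* pad the record to 64 bytes */
} __attribute__((packed));

/* The record must be exactly 64 bytes, with stolen_time at offset 8. */
_Static_assert(sizeof(struct toy_stolen_time_info) == 64, "record size");
_Static_assert(offsetof(struct toy_stolen_time_info, stolen_time) == 8,
	       "stolen_time offset");

/* Decode a little-endian 64-bit value, independent of host endianness. */
static inline uint64_t toy_read_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Return the stolen time in nanoseconds from a raw 64-byte record. */
static inline uint64_t toy_stolen_ns(const uint8_t *record)
{
	return toy_read_le64(record +
			     offsetof(struct toy_stolen_time_info, stolen_time));
}
```

A real guest would read the field with a single 64-bit load from the mapped page rather than byte by byte; the byte-wise decode is only there to keep the sketch portable.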
[PATCH 1/9] KVM: arm64: Document PV-time interface
Introduce a paravirtualization interface for KVM/arm64 based on the "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A. This only adds the details about "Stolen Time" as the details of "Live Physical Time" have not been fully agreed. User space can specify a reserved area of memory for the guest and inform KVM to populate the memory with information on time that the host kernel has stolen from the guest. A hypercall interface is provided for the guest to interrogate the hypervisor's support for this interface and the location of the shared memory structures. Signed-off-by: Steven Price --- Documentation/virtual/kvm/arm/pvtime.txt | 107 +++ 1 file changed, 107 insertions(+) create mode 100644 Documentation/virtual/kvm/arm/pvtime.txt diff --git a/Documentation/virtual/kvm/arm/pvtime.txt b/Documentation/virtual/kvm/arm/pvtime.txt new file mode 100644 index ..e6ae9799e1d5 --- /dev/null +++ b/Documentation/virtual/kvm/arm/pvtime.txt @@ -0,0 +1,107 @@ +Paravirtualized time support for arm64 +== + +Arm specification DEN0057/A defined a standard for paravirtualised time +support for Aarch64 guests: + +https://developer.arm.com/docs/den0057/a + +KVM/Arm64 implements the stolen time part of this specification by providing +some hypervisor service calls to support a paravirtualized guest obtaining a +view of the amount of time stolen from its execution. + +Two new SMCCC compatible hypercalls are defined: + +PV_FEATURES 0xC520 +PV_TIME_ST 0xC522 + +These are only available in the SMC64/HVC64 calling convention as +paravirtualized time is not available to 32 bit Arm guests. + +PV_FEATURES +Function ID: (uint32) : 0xC520 +PV_func_id: (uint32) : Either PV_TIME_LPT or PV_TIME_ST +Return value: (int32) : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant + PV-time feature is supported by the hypervisor. + +PV_TIME_ST +Function ID: (uint32) : 0xC522 +Return value: (int64) : IPA of the stolen time data structure for this + (V)CPU. 
On failure: + NOT_SUPPORTED (-1) + +Stolen Time +--- + +The structure pointed to by the PV_TIME_ST hypercall is as follows: + + Field | Byte Length | Byte Offset | Description + --- | --- | --- | -- + Revision| 4 | 0 | Must be 0 for version 0.1 + Attributes | 4 | 4 | Must be 0 + Stolen time | 8 | 8 | Stolen time in unsigned + | | | nanoseconds indicating how + | | | much time this VCPU thread + | | | was involuntarily not + | | | running on a physical CPU. + +The structure will be updated by the hypervisor periodically as time is stolen +from the VCPU. It will be present within a reserved region of the normal +memory given to the guest. The guest should not attempt to write into this +memory. There is a structure by VCPU of the guest. + +User space interface + + +User space can request that KVM provide the paravirtualized time interface to +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example: + +struct kvm_create_device pvtime_device = { +.type = KVM_DEV_TYPE_ARM_PV_TIME, +.attr = 0, +.flags = 0, +}; + +pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, _device); + +The guest IPA of the structures must be given to KVM. This is the base address +of an array of stolen time structures (one for each VCPU). For example: + +struct kvm_device_attr st_base = { +.group = KVM_DEV_ARM_PV_TIME_PADDR, +.attr = KVM_DEV_ARM_PV_TIME_ST, +.addr = (u64)(unsigned long)_paddr +}; + +ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, _base); + +For migration (or save/restore) of a guest it is necessary to save the contents +of the shared page(s) and later restore them. KVM_DEV_ARM_PV_TIME_STATE_SIZE +provides the size of this data and KVM_DEV_ARM_PV_TIME_STATE allows the state +to be read/written. + +It is also necessary for the physical address to be set identically when +restoring. 
+
+    void *save_state(int fd, u64 attr, u32 *size) {
+        struct kvm_device_attr get_size = {
+                .group = KVM_DEV_ARM_PV_TIME_STATE_SIZE,
+                .attr = attr,
+                .addr = (u64)(unsigned long)size
+        };
+
+        /* Ask KVM how many bytes of state there are. */
+        ioctl(fd, KVM_GET_DEVICE_ATTR, &get_size);
+
+        void *buffer = malloc(*size);
+
+        struct kvm_device_attr get_state = {
+                .group = KVM_DEV_ARM_PV_TIME_STATE,
+                .attr = attr,
+                .addr = (u64)(unsigned long)buffer
+        };
+
+        /* Read the state itself into the freshly allocated buffer. */
+        ioctl(fd, KVM_GET_DEVICE_ATTR, &get_state);
+
+        return buffer;
+    }
+
+    void *st_state =
Re: [PATCH] arm64/kvm: fix -Wimplicit-fallthrough warnings
On 02/08/2019 15:23, Qian Cai wrote:
> The commit a892819560c4 ("KVM: arm64: Prepare to handle deferred
> save/restore of 32-bit registers") introduced vcpu_write_spsr32() but
> seems forgot to add "break" between the switch statements and generates
> compilation warnings below. Also, adding a default statement as in
> vcpu_read_spsr32().

See
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git/commit/?id=3d584a3c85d6fe2cf878f220d4ad7145e7f89218

The default statement is pretty pointless by construction.

Thanks,

	M.
-- 
Jazz is not dead, it just smells funny...
[PATCH] arm64/kvm: fix -Wimplicit-fallthrough warnings
The commit a892819560c4 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers") introduced vcpu_write_spsr32() but seems forgot to add "break" between the switch statements and generates compilation warnings below. Also, adding a default statement as in vcpu_read_spsr32(). In file included from ./arch/arm64/include/asm/kvm_emulate.h:19, from arch/arm64/kvm/regmap.c:13: arch/arm64/kvm/regmap.c: In function 'vcpu_write_spsr32': ./arch/arm64/include/asm/kvm_hyp.h:31:3: warning: this statement may fall through [-Wimplicit-fallthrough=] asm volatile(ALTERNATIVE(__msr_s(r##nvh, "%x0"), \ ^~~ ./arch/arm64/include/asm/kvm_hyp.h:46:31: note: in expansion of macro 'write_sysreg_elx' #define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12) ^~~~ arch/arm64/kvm/regmap.c:180:3: note: in expansion of macro 'write_sysreg_el1' write_sysreg_el1(v, SYS_SPSR); ^~~~ arch/arm64/kvm/regmap.c:181:2: note: here case KVM_SPSR_ABT: ^~~~ In file included from ./arch/arm64/include/asm/cputype.h:132, from ./arch/arm64/include/asm/cache.h:8, from ./include/linux/cache.h:6, from ./include/linux/printk.h:9, from ./include/linux/kernel.h:15, from ./include/asm-generic/bug.h:18, from ./arch/arm64/include/asm/bug.h:26, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/mm.h:9, from arch/arm64/kvm/regmap.c:11: ./arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall through [-Wimplicit-fallthrough=] asm volatile("msr " __stringify(r) ", %x0" \ ^~~ arch/arm64/kvm/regmap.c:182:3: note: in expansion of macro 'write_sysreg' write_sysreg(v, spsr_abt); ^~~~ arch/arm64/kvm/regmap.c:183:2: note: here case KVM_SPSR_UND: ^~~~ In file included from ./arch/arm64/include/asm/cputype.h:132, from ./arch/arm64/include/asm/cache.h:8, from ./include/linux/cache.h:6, from ./include/linux/printk.h:9, from ./include/linux/kernel.h:15, from ./include/asm-generic/bug.h:18, from ./arch/arm64/include/asm/bug.h:26, from ./include/linux/bug.h:5, 
from ./include/linux/mmdebug.h:5, from ./include/linux/mm.h:9, from arch/arm64/kvm/regmap.c:11: ./arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall through [-Wimplicit-fallthrough=] asm volatile("msr " __stringify(r) ", %x0" \ ^~~ arch/arm64/kvm/regmap.c:184:3: note: in expansion of macro 'write_sysreg' write_sysreg(v, spsr_und); ^~~~ arch/arm64/kvm/regmap.c:185:2: note: here case KVM_SPSR_IRQ: ^~~~ In file included from ./arch/arm64/include/asm/cputype.h:132, from ./arch/arm64/include/asm/cache.h:8, from ./include/linux/cache.h:6, from ./include/linux/printk.h:9, from ./include/linux/kernel.h:15, from ./include/asm-generic/bug.h:18, from ./arch/arm64/include/asm/bug.h:26, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/mm.h:9, from arch/arm64/kvm/regmap.c:11: ./arch/arm64/include/asm/sysreg.h:837:2: warning: this statement may fall through [-Wimplicit-fallthrough=] asm volatile("msr " __stringify(r) ", %x0" \ ^~~ arch/arm64/kvm/regmap.c:186:3: note: in expansion of macro 'write_sysreg' write_sysreg(v, spsr_irq); ^~~~ arch/arm64/kvm/regmap.c:187:2: note: here case KVM_SPSR_FIQ: ^~~~ Fixes: a892819560c4 ("KVM: arm64: Prepare to handle deferred save/restore of 32-bit registers") Signed-off-by: Qian Cai --- arch/arm64/kvm/regmap.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/arm64/kvm/regmap.c b/arch/arm64/kvm/regmap.c index 0d60e4f0af66..c94e9bc3e8eb 100644 --- a/arch/arm64/kvm/regmap.c +++ b/arch/arm64/kvm/regmap.c @@ -178,13 +178,20 @@ void vcpu_write_spsr32(struct kvm_vcpu *vcpu, unsigned long v) switch (spsr_idx) { case KVM_SPSR_SVC: write_sysreg_el1(v, SYS_SPSR); + break; case KVM_SPSR_ABT: write_sysreg(v, spsr_abt); + break; case KVM_SPSR_UND: write_sysreg(v, spsr_und); + break; case KVM_SPSR_IRQ: write_sysreg(v, spsr_irq); + break; case KVM_SPSR_FIQ: write_sysreg(v, spsr_fiq); + break; + default: + BUG(); } } -- 1.8.3.1
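To make the effect of the missing "break" statements concrete, here is a small standalone sketch. It is not the kernel code: the register bank is a plain array and all names are invented for the example. It only shows what a fall-through switch does to the banks that come after the selected one.

```c
#include <assert.h>

enum toy_spsr_idx {
	TOY_SPSR_SVC, TOY_SPSR_ABT, TOY_SPSR_UND, TOY_SPSR_IRQ, TOY_SPSR_FIQ,
	TOY_SPSR_COUNT
};

/* Stand-in for the banked SPSRs (hypothetical, not the kernel's storage). */
struct toy_spsr_bank {
	unsigned long reg[TOY_SPSR_COUNT];
};

/* Without breaks: writing one bank also clobbers every later bank. */
static inline void toy_write_spsr_fallthrough(struct toy_spsr_bank *b,
					      enum toy_spsr_idx idx,
					      unsigned long v)
{
	switch (idx) {
	case TOY_SPSR_SVC:
		b->reg[TOY_SPSR_SVC] = v;
		/* fall through */
	case TOY_SPSR_ABT:
		b->reg[TOY_SPSR_ABT] = v;
		/* fall through */
	case TOY_SPSR_UND:
		b->reg[TOY_SPSR_UND] = v;
		/* fall through */
	case TOY_SPSR_IRQ:
		b->reg[TOY_SPSR_IRQ] = v;
		/* fall through */
	case TOY_SPSR_FIQ:
		b->reg[TOY_SPSR_FIQ] = v;
		break;
	default:
		break;
	}
}

/* With breaks: only the selected bank is written. */
static inline void toy_write_spsr_fixed(struct toy_spsr_bank *b,
					enum toy_spsr_idx idx,
					unsigned long v)
{
	switch (idx) {
	case TOY_SPSR_SVC:
		b->reg[TOY_SPSR_SVC] = v;
		break;
	case TOY_SPSR_ABT:
		b->reg[TOY_SPSR_ABT] = v;
		break;
	case TOY_SPSR_UND:
		b->reg[TOY_SPSR_UND] = v;
		break;
	case TOY_SPSR_IRQ:
		b->reg[TOY_SPSR_IRQ] = v;
		break;
	case TOY_SPSR_FIQ:
		b->reg[TOY_SPSR_FIQ] = v;
		break;
	default:
		break;
	}
}
```

Writing the UND bank through the first helper also overwrites IRQ and FIQ, while the second leaves them alone. That is why the warning points at a real functional bug here and not just a style issue.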
Re: kvm-unit-tests: psci_cpu_on_test FAILed
On Fri, Aug 02, 2019 at 06:56:51PM +0800, Zenghui Yu wrote:
> Hi folks,
>
> Running kvm-unit-tests with Linux 5.3.0-rc2 on Kunpeng 920, we will get
> the following fail info:
>
> 	[...]
> 	FAIL psci (4 tests, 1 unexpected failures)
> 	[...]
> and
> 	[...]
> 	INFO: unexpected cpu_on return value: caller=CPU9, ret=-2
> 	FAIL: cpu-on
> 	SUMMARY: 4 tests, 1 unexpected failures
>
>
> I think this is an issue had been fixed once by commit 6c7a5dce22b3
> ("KVM: arm/arm64: fix races in kvm_psci_vcpu_on"), which makes use of
> kvm->lock mutex to fix the race between two PSCI_CPU_ON calls - one
> does reset on the MPIDR register whilst another reads it.
>
> But commit 358b28f09f0 ("arm/arm64: KVM: Allow a VCPU to fully reset
> itself") later moves the reset work into check_vcpu_requests(), by
> making a KVM_REQ_VCPU_RESET request in PSCI code. Thus the reset work
> has not been protected by kvm->lock mutex anymore, and the race shows up
> again...
>
> Do we need a fix for this issue? At least achieve a mutex execution
> between the reset of MPIDR and kvm_mpidr_to_vcpu()?
>
>

I noticed this too, but I put it pretty low on my TODO because it's a
safe failure (no host crash, just an unexpected PSCI_RET_INVALID_PARAMS
gets returned because the valid MPIDR doesn't look valid for a moment.)
Also, the test is quite pathological, especially when the host has many
CPUs, so I wouldn't expect this to show up on a sane guest. I agree it
would be nice to get it fixed eventually though.

Thanks,
drew
kvm-unit-tests: psci_cpu_on_test FAILed
Hi folks,

Running kvm-unit-tests with Linux 5.3.0-rc2 on Kunpeng 920, we get the
following failure output:

	[...]
	FAIL psci (4 tests, 1 unexpected failures)
	[...]
and
	[...]
	INFO: unexpected cpu_on return value: caller=CPU9, ret=-2
	FAIL: cpu-on
	SUMMARY: 4 tests, 1 unexpected failures

I think this is an issue that was already fixed once by commit
6c7a5dce22b3 ("KVM: arm/arm64: fix races in kvm_psci_vcpu_on"), which
uses the kvm->lock mutex to fix the race between two PSCI_CPU_ON calls:
one resets the MPIDR register whilst another reads it.

But commit 358b28f09f0 ("arm/arm64: KVM: Allow a VCPU to fully reset
itself") later moved the reset work into check_vcpu_requests(), by
making a KVM_REQ_VCPU_RESET request in PSCI code. The reset work is
therefore no longer protected by the kvm->lock mutex, and the race
shows up again...

Do we need a fix for this issue? At least some mutual exclusion
between the reset of MPIDR and kvm_mpidr_to_vcpu()?

Thanks,
zenghui
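The window described above can be modelled in a few lines without any threads. The sketch below is a toy, not KVM code: the vcpu structure and helper names are made up, and the reset is split into a "poison" step and a "restore" step so that a lookup can be placed between them by hand. The poison pattern is the 0x42 fill used by kvm_reset_sys_regs() at the time of this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POISON			0x4242424242424242ULL
#define TOY_PSCI_RET_SUCCESS		0
#define TOY_PSCI_RET_INVALID_PARAMS	(-2)

/* Hypothetical stand-in for a vcpu: only the MPIDR matters here. */
struct toy_vcpu {
	uint64_t mpidr;
};

/* Linear search by MPIDR, in the spirit of kvm_mpidr_to_vcpu(). */
static inline struct toy_vcpu *toy_mpidr_to_vcpu(struct toy_vcpu *v, size_t n,
						 uint64_t mpidr)
{
	for (size_t i = 0; i < n; i++)
		if (v[i].mpidr == mpidr)
			return &v[i];
	return NULL;
}

/* CPU_ON fails with INVALID_PARAMS when the target MPIDR is not found. */
static inline int toy_cpu_on(struct toy_vcpu *v, size_t n, uint64_t target)
{
	return toy_mpidr_to_vcpu(v, n, target) ? TOY_PSCI_RET_SUCCESS
					       : TOY_PSCI_RET_INVALID_PARAMS;
}

/* First half of a reset: the register array is filled with poison. */
static inline void toy_reset_poison(struct toy_vcpu *v)
{
	v->mpidr = TOY_POISON;
}

/* Second half of a reset: the reset handler writes the real MPIDR back. */
static inline void toy_reset_restore(struct toy_vcpu *v, uint64_t mpidr)
{
	v->mpidr = mpidr;
}
```

A CPU_ON that lands between toy_reset_poison() and toy_reset_restore() of its target gets -2, which matches the "ret=-2" seen in the test log. Once the restore has run, the same call succeeds.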
Re: [PATCH 2/2] KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence
On 02/08/19 12:37, Marc Zyngier wrote: > When a vpcu is about to block by calling kvm_vcpu_block, we call > back into the arch code to allow any form of synchronization that > may be required at this point (SVN stops the AVIC, ARM synchronises > the VMCR and enables GICv4 doorbells). But this synchronization > comes in quite late, as we've potentially waited for halt_poll_ns > to expire. > > Instead, let's move kvm_arch_vcpu_blocking() to the beginning of > kvm_vcpu_block(), which on ARM has several benefits: > > - VMCR gets synchronised early, meaning that any interrupt delivered > during the polling window will be evaluated with the correct guest > PMR > - GICv4 doorbells are enabled, which means that any guest interrupt > directly injected during that window will be immediately recognised > > Tang Nianyao ran some tests on a GICv4 machine to evaluate such > change, and reported up to a 10% improvement for netperf: > > > netperf result: > D06 as server, intel 8180 server as client > with change: > package 512 bytes - 5500 Mbits/s > package 64 bytes - 760 Mbits/s > without change: > package 512 bytes - 5000 Mbits/s > package 64 bytes - 710 Mbits/s > > > Signed-off-by: Marc Zyngier > --- > virt/kvm/kvm_main.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 887f3b0c2b60..90d429c703cb 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -2322,6 +2322,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > bool waited = false; > u64 block_ns; > > + kvm_arch_vcpu_blocking(vcpu); > + > start = cur = ktime_get(); > if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) { > ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns); > @@ -2342,8 +2344,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > } while (single_task_running() && ktime_before(cur, stop)); > } > > - kvm_arch_vcpu_blocking(vcpu); > - > for (;;) { > prepare_to_swait_exclusive(>wq, , > TASK_INTERRUPTIBLE); > > @@ -2356,9 
+2356,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) > > finish_swait(>wq, ); > cur = ktime_get(); > - > - kvm_arch_vcpu_unblocking(vcpu); > out: > + kvm_arch_vcpu_unblocking(vcpu); > block_ns = ktime_to_ns(cur) - ktime_to_ns(start); > > if (!vcpu_valid_wakeup(vcpu)) > Acked-by: Paolo Bonzini ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 1/2] KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
Since commit 328e56647944 ("KVM: arm/arm64: vgic: Defer touching
GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2
equivalent) loaded as long as we can, only syncing it back when
we're scheduled out.

There is a small snag with that though: kvm_vgic_vcpu_pending_irq(),
which is indirectly called from kvm_vcpu_check_block(), needs to
evaluate the guest's view of ICC_PMR_EL1. At the point where we
call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever
changes were made to PMR are not visible in memory until we do a
vcpu_put().

Things go really south if the guest does the following:

	mov x0, #0	// or any small value masking interrupts
	msr ICC_PMR_EL1, x0

	[vcpu preempted, then rescheduled, VMCR sampled]

	mov x0, #0xff	// allow all interrupts
	msr ICC_PMR_EL1, x0
	wfi		// traps to EL2, but VMCR is not resampled

	[interrupt arrives just after WFI]

Here, the hypervisor's view of PMR is zero, while the guest has enabled
its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no
interrupts are pending (despite an interrupt being received) and we'll
block for no reason. If the guest doesn't have a periodic interrupt
firing once it has blocked, it will stay there forever.

To avoid this unfortunate situation, let's resync VMCR from
kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block()
will observe the latest value of PMR.

This has been found by booting an arm64 Linux guest with the pseudo NMI
feature, and thus using interrupt priorities to mask interrupts instead
of the usual PSTATE masking.

Cc: sta...@vger.kernel.org # 4.12
Fixes: 328e56647944 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put")
Signed-off-by: Marc Zyngier
---
 include/kvm/arm_vgic.h      |  1 +
 virt/kvm/arm/arm.c          | 11 +++
 virt/kvm/arm/vgic/vgic-v2.c |  9 -
 virt/kvm/arm/vgic/vgic-v3.c |  7 ++-
 virt/kvm/arm/vgic/vgic.c    | 11 +++
 virt/kvm/arm/vgic/vgic.h    |  2 ++
 6 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 46bbc949c20a..7a30524a80ee 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -350,6 +350,7 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 
 void kvm_vgic_load(struct kvm_vcpu *vcpu);
 void kvm_vgic_put(struct kvm_vcpu *vcpu);
+void kvm_vgic_vmcr_sync(struct kvm_vcpu *vcpu);
 
 #define irqchip_in_kernel(k)	(!!((k)->arch.vgic.in_kernel))
 #define vgic_initialized(k)	((k)->arch.vgic.initialized)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index acc43242a310..d9a650bfaf22 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -323,6 +323,17 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * If we're about to block (most likely because we've just hit a
+	 * WFI), we need to sync back the state of the GIC CPU interface
+	 * so that we have the latest PMR and group enables. This ensures
+	 * that kvm_arch_vcpu_runnable has up-to-date data to decide
+	 * whether we have pending interrupts.
+	 */
+	preempt_disable();
+	kvm_vgic_vmcr_sync(vcpu);
+	preempt_enable();
+
 	kvm_vgic_v4_enable_doorbell(vcpu);
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v2.c b/virt/kvm/arm/vgic/vgic-v2.c
index 6dd5ad706c92..96aab77d0471 100644
--- a/virt/kvm/arm/vgic/vgic-v2.c
+++ b/virt/kvm/arm/vgic/vgic-v2.c
@@ -484,10 +484,17 @@ void vgic_v2_load(struct kvm_vcpu *vcpu)
 		       kvm_vgic_global_state.vctrl_base + GICH_APR);
 }
 
-void vgic_v2_put(struct kvm_vcpu *vcpu)
+void vgic_v2_vmcr_sync(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
 
 	cpu_if->vgic_vmcr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VMCR);
+}
+
+void vgic_v2_put(struct kvm_vcpu *vcpu)
+{
+	struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2;
+
+	vgic_v2_vmcr_sync(vcpu);
 	cpu_if->vgic_apr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_APR);
 }
 
diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c
index c2c9ce009f63..0c653a1e5215 100644
--- a/virt/kvm/arm/vgic/vgic-v3.c
+++ b/virt/kvm/arm/vgic/vgic-v3.c
@@ -662,12 +662,17 @@ void vgic_v3_load(struct kvm_vcpu *vcpu)
 		__vgic_v3_activate_traps(vcpu);
 }
 
-void vgic_v3_put(struct kvm_vcpu *vcpu)
+void vgic_v3_vmcr_sync(struct kvm_vcpu *vcpu)
 {
 	struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
 
 	if (likely(cpu_if->vgic_sre))
 		cpu_if->vgic_vmcr = kvm_call_hyp_ret(__vgic_v3_read_vmcr);
+}
+
+void vgic_v3_put(struct kvm_vcpu *vcpu)
+{
+	vgic_v3_vmcr_sync(vcpu);
 
 	kvm_call_hyp(__vgic_v3_save_aprs, vcpu);
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 04786c8ec77e..13d4b38a94ec 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++
[PATCH 0/2] KVM: arm/arm64: Fix guest's PMR synchronization when blocking on WFI
It recently came to light that if we run a guest that actively uses
interrupt priorities to block interrupts, vcpus can end up being
blocked when they shouldn't be, leading to an unresponsive guest (a
slightly less than desirable outcome).

Patch #1 fixes the issue (which has been with us since 4.12). I plan
to take it in for 5.3 with immediate backport to stable.

Patch #2 is more of an RFC, as it also impacts the SVM AVIC support.
It moves the kvm_arch_vcpu_blocking callback to happen earlier, which
gives much better performance on ARM and applies the above fix at the
best possible spot. I'd welcome any comment/testing on this,
especially on non-ARM systems.

Marc Zyngier (2):
  KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
  KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence

 include/kvm/arm_vgic.h      |  1 +
 virt/kvm/arm/arm.c          | 11 +++
 virt/kvm/arm/vgic/vgic-v2.c |  9 -
 virt/kvm/arm/vgic/vgic-v3.c |  7 ++-
 virt/kvm/arm/vgic/vgic.c    | 11 +++
 virt/kvm/arm/vgic/vgic.h    |  2 ++
 virt/kvm/kvm_main.c         |  7 +++
 7 files changed, 42 insertions(+), 6 deletions(-)

-- 
2.20.1
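For anyone who wants to see the failure mode in isolation, here is a toy model of the stale priority mask. It is a sketch under simplifying assumptions, not the vgic code: the CPU interface is reduced to a live PMR and a cached copy, the names are invented, and the only GIC rule modelled is that an interrupt is deliverable when its priority value is lower than the PMR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced view of a vcpu's GIC CPU interface. */
struct toy_cpu_if {
	uint8_t hw_pmr;     /* live value while the vcpu is loaded */
	uint8_t cached_pmr; /* hypervisor's in-memory copy */
};

/* A guest write to the PMR only reaches the live register. */
static inline void toy_guest_write_pmr(struct toy_cpu_if *c, uint8_t pmr)
{
	c->hw_pmr = pmr;
}

/* What vcpu_put(), or the new sync in the blocking path, does. */
static inline void toy_vmcr_sync(struct toy_cpu_if *c)
{
	c->cached_pmr = c->hw_pmr;
}

/* The pending check works on the cached copy, as in the blocking path. */
static inline bool toy_pending_irq(const struct toy_cpu_if *c, uint8_t prio)
{
	return prio < c->cached_pmr;
}
```

Replaying the sequence from patch #1: the guest writes PMR = 0, the vcpu is preempted (sync), the guest then writes PMR = 0xff and executes WFI. Without another sync, toy_pending_irq() still sees a mask of 0 and reports nothing pending. Calling toy_vmcr_sync() first, as the fix does from kvm_arch_vcpu_blocking(), makes the same interrupt visible.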
[PATCH 2/2] KVM: Call kvm_arch_vcpu_blocking early into the blocking sequence
When a vcpu is about to block by calling kvm_vcpu_block, we call
back into the arch code to allow any form of synchronization that
may be required at this point (SVM stops the AVIC, ARM synchronises
the VMCR and enables GICv4 doorbells). But this synchronization
comes in quite late, as we've potentially waited for halt_poll_ns
to expire.

Instead, let's move kvm_arch_vcpu_blocking() to the beginning of
kvm_vcpu_block(), which on ARM has several benefits:

- VMCR gets synchronised early, meaning that any interrupt delivered
  during the polling window will be evaluated with the correct guest
  PMR
- GICv4 doorbells are enabled, which means that any guest interrupt
  directly injected during that window will be immediately recognised

Tang Nianyao ran some tests on a GICv4 machine to evaluate such
change, and reported up to a 10% improvement for netperf:

	netperf result:
	D06 as server, intel 8180 server as client
	with change:
	package 512 bytes - 5500 Mbits/s
	package 64 bytes - 760 Mbits/s
	without change:
	package 512 bytes - 5000 Mbits/s
	package 64 bytes - 710 Mbits/s

Signed-off-by: Marc Zyngier
---
 virt/kvm/kvm_main.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 887f3b0c2b60..90d429c703cb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2322,6 +2322,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 	bool waited = false;
 	u64 block_ns;
 
+	kvm_arch_vcpu_blocking(vcpu);
+
 	start = cur = ktime_get();
 	if (vcpu->halt_poll_ns && !kvm_arch_no_poll(vcpu)) {
 		ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
@@ -2342,8 +2344,6 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		} while (single_task_running() && ktime_before(cur, stop));
 	}
 
-	kvm_arch_vcpu_blocking(vcpu);
-
 	for (;;) {
 		prepare_to_swait_exclusive(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
 
@@ -2356,9 +2356,8 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
 
 	finish_swait(&vcpu->wq, &wait);
 	cur = ktime_get();
-
-	kvm_arch_vcpu_unblocking(vcpu);
 out:
+	kvm_arch_vcpu_unblocking(vcpu);
 	block_ns = ktime_to_ns(cur) - ktime_to_ns(start);
 
 	if (!vcpu_valid_wakeup(vcpu))
-- 
2.20.1
Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
On Fri, Aug 02, 2019 at 10:11:38AM +, Alexandru Elisei wrote: > These are the changes that I made to kvm-unit-tests (the diff can be applied > on > top of upstream master, 2130fd4154ad ("tscdeadline_latency: Check condition > first before loop")): It's great to hear that you're doing this. You may find these bit-rotted commits useful too https://github.com/rhdrjones/kvm-unit-tests/commits/arm64/hyp-mode Thanks, drew > > diff --git a/arm/cstart64.S b/arm/cstart64.S > index b0e8baa1a23a..a7631b5a1801 100644 > --- a/arm/cstart64.S > +++ b/arm/cstart64.S > @@ -51,6 +51,17 @@ start: > b 1b > > 1: > + mrs x4, CurrentEL > + cmp x4, CurrentEL_EL2 > + b.ne1f > + mrs x4, mpidr_el1 > + msr vmpidr_el2, x4 > + mrs x4, midr_el1 > + msr vpidr_el2, x4 > + ldr x4, =(HCR_EL2_TGE | HCR_EL2_E2H) > + msr hcr_el2, x4 > + isb > +1: > /* set up stack */ > mov x4, #1 > msr spsel, x4 > @@ -101,6 +112,17 @@ get_mmu_off: > > .globl secondary_entry > secondary_entry: > + mrs x0, CurrentEL > + cmp x0, CurrentEL_EL2 > + b.ne1f > + mrs x0, mpidr_el1 > + msr vmpidr_el2, x0 > + mrs x0, midr_el1 > + msr vpidr_el2, x0 > + ldr x0, =(HCR_EL2_TGE | HCR_EL2_E2H) > + msr hcr_el2, x0 > + isb > +1: > /* Enable FP/ASIMD */ > mov x0, #(3 << 20) > msr cpacr_el1, x0 > diff --git a/lib/arm/asm/psci.h b/lib/arm/asm/psci.h > index 7b956bf5987d..07297a27e0ce 100644 > --- a/lib/arm/asm/psci.h > +++ b/lib/arm/asm/psci.h > @@ -3,6 +3,15 @@ > #include > #include > > +enum psci_conduit { > + PSCI_CONDUIT_HVC, > + PSCI_CONDUIT_SMC, > +}; > + > +extern void psci_init(void); > +extern void psci_set_conduit(enum psci_conduit conduit); > +extern enum psci_conduit psci_get_conduit(void); > + > extern int psci_invoke(unsigned long function_id, unsigned long arg0, >unsigned long arg1, unsigned long arg2); > extern int psci_cpu_on(unsigned long cpuid, unsigned long entry_point); > diff --git a/lib/arm/psci.c b/lib/arm/psci.c > index c3d399064ae3..20ad4b944738 100644 > --- a/lib/arm/psci.c > +++ b/lib/arm/psci.c > @@ -6,13 +6,14 
@@ > * > * This work is licensed under the terms of the GNU LGPL, version 2. > */ > +#include > +#include > #include > #include > #include > #include > > -__attribute__((noinline)) > -int psci_invoke(unsigned long function_id, unsigned long arg0, > +static int psci_invoke_hvc(unsigned long function_id, unsigned long arg0, > unsigned long arg1, unsigned long arg2) > { > asm volatile( > @@ -22,6 +23,63 @@ int psci_invoke(unsigned long function_id, unsigned long > arg0, > return function_id; > } > > +static int psci_invoke_smc(unsigned long function_id, unsigned long arg0, > + unsigned long arg1, unsigned long arg2) > +{ > + asm volatile( > + "smc #0" > + : "+r" (function_id) > + : "r" (arg0), "r" (arg1), "r" (arg2)); > + return function_id; > +} > + > +/* > + * Initialize to something sensible, so the exit fallback psci_system_off > still > + * works before calling psci_init when booted at EL1. > + */ > +static enum psci_conduit psci_conduit = PSCI_CONDUIT_HVC; > +static int (*psci_fn)(unsigned long, unsigned long, unsigned long, > + unsigned long) = _invoke_hvc; > + > +void psci_set_conduit(enum psci_conduit conduit) > +{ > + psci_conduit = conduit; > + if (conduit == PSCI_CONDUIT_HVC) > + psci_fn = _invoke_hvc; > + else > + psci_fn = _invoke_smc; > +} > + > +enum psci_conduit psci_get_conduit(void) > +{ > + return psci_conduit; > +} > + > +int psci_invoke(unsigned long function_id, unsigned long arg0, > + unsigned long arg1, unsigned long arg2) > +{ > + return psci_fn(function_id, arg0, arg1, arg2); > +} > + > +void psci_init(void) > +{ > + const char *conduit; > + int ret; > + > + ret = dt_get_psci_conduit(); > + assert(ret == 0 || ret == -FDT_ERR_NOTFOUND); > + > + if (ret == -FDT_ERR_NOTFOUND) > + conduit = "hvc"; > + > + assert(strcmp(conduit, "hvc") == 0 || strcmp(conduit, "smc") == 0); > + > + if (strcmp(conduit, "hvc") == 0) > + psci_set_conduit(PSCI_CONDUIT_HVC); > + else > + psci_set_conduit(PSCI_CONDUIT_SMC); > +} > + > int psci_cpu_on(unsigned long 
cpuid, unsigned long entry_point) > { > #ifdef __arm__ > diff --git a/lib/arm/setup.c b/lib/arm/setup.c > index 4f02fca85607..e0dc9e4801b0 100644 > --- a/lib/arm/setup.c > +++ b/lib/arm/setup.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "io.h" > > @@ -164,7 +165,11 @@ void setup(const void *fdt) > freemem += initrd_size; >
Re: [PATCH 00/59] KVM: arm64: ARMv8.3 Nested Virtualization support
Hi, On 6/21/19 10:37 AM, Marc Zyngier wrote: > I've taken over the maintenance of this series originally written by > Jintack and Christoffer. Since then, the series has been substantially > reworked, new features (and most probably bugs) have been added, and > the whole thing rebased multiple times. If anything breaks, please > blame me, and nobody else. > > As you can tell, this is quite big. It is also remarkably incomplete > (we're missing many critical bits for fully emulate EL2), but the idea > is to start merging things early in order to reduce the maintenance > headache. What we want to achieve is that with NV disabled, there is > no performance overhead and no regression. The only thing I intend to > merge ASAP is the first patch in the series, because it should have > zero effect and is a reasonable cleanup. > > The series is roughly divided in 4 parts: exception handling, memory > virtualization, interrupts and timers. There are of course some > dependencies, but you'll hopefully get the gist of it. > > For the most courageous of you, I've put out a branch[1] containing this > and a bit more. Of course, you'll need some userspace. Andre maintains > a hacked version of kvmtool[1] that takes a --nested option, allowing > the guest to be started at EL2. You can run the whole stack in the > Foundation model. Don't be in a hurry ;-). 
> > [1] git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git > kvm-arm64/nv-wip-5.2-rc5 > [2] git://linux-arm.org/kvmtool.git nv/nv-wip-5.2-rc5 > > Andre Przywara (4): > KVM: arm64: nv: Handle virtual EL2 registers in > vcpu_read/write_sys_reg() > KVM: arm64: nv: Save/Restore vEL2 sysregs > KVM: arm64: nv: Handle traps for timer _EL02 and _EL2 sysregs > accessors > KVM: arm64: nv: vgic: Allow userland to set VGIC maintenance IRQ > > Christoffer Dall (16): > KVM: arm64: nv: Introduce nested virtualization VCPU feature > KVM: arm64: nv: Reset VCPU to EL2 registers if VCPU nested virt is set > KVM: arm64: nv: Allow userspace to set PSR_MODE_EL2x > KVM: arm64: nv: Add nested virt VCPU primitives for vEL2 VCPU state > KVM: arm64: nv: Handle trapped ERET from virtual EL2 > KVM: arm64: nv: Emulate PSTATE.M for a guest hypervisor > KVM: arm64: nv: Trap EL1 VM register accesses in virtual EL2 > KVM: arm64: nv: Only toggle cache for virtual EL2 when SCTLR_EL2 > changes > KVM: arm/arm64: nv: Support multiple nested stage 2 mmu structures > KVM: arm64: nv: Implement nested Stage-2 page table walk logic > KVM: arm64: nv: Handle shadow stage 2 page faults > KVM: arm64: nv: Unmap/flush shadow stage 2 page tables > KVM: arm64: nv: arch_timer: Support hyp timer emulation > KVM: arm64: nv: vgic-v3: Take cpu_if pointer directly instead of vcpu > KVM: arm64: nv: vgic: Emulate the HW bit in software > KVM: arm64: nv: Add nested GICv3 tracepoints > > Dave Martin (1): > KVM: arm64: Migrate _elx sysreg accessors to msr_s/mrs_s > > Jintack Lim (21): > arm64: Add ARM64_HAS_NESTED_VIRT cpufeature > KVM: arm64: nv: Add EL2 system registers to vcpu context > KVM: arm64: nv: Support virtual EL2 exceptions > KVM: arm64: nv: Inject HVC exceptions to the virtual EL2 > KVM: arm64: nv: Trap SPSR_EL1, ELR_EL1 and VBAR_EL1 from virtual EL2 > KVM: arm64: nv: Trap CPACR_EL1 access in virtual EL2 > KVM: arm64: nv: Set a handler for the system instruction traps > KVM: arm64: nv: Handle PSCI 
call via smc from the guest > KVM: arm64: nv: Respect virtual HCR_EL2.TWX setting > KVM: arm64: nv: Respect virtual CPTR_EL2.TFP setting > KVM: arm64: nv: Respect the virtual HCR_EL2.NV bit setting > KVM: arm64: nv: Respect virtual HCR_EL2.TVM and TRVM settings > KVM: arm64: nv: Respect the virtual HCR_EL2.NV1 bit setting > KVM: arm64: nv: Emulate EL12 register accesses from the virtual EL2 > KVM: arm64: nv: Configure HCR_EL2 for nested virtualization > KVM: arm64: nv: Pretend we only support larger-than-host page sizes > KVM: arm64: nv: Introduce sys_reg_desc.forward_trap > KVM: arm64: nv: Rework the system instruction emulation framework > KVM: arm64: nv: Trap and emulate AT instructions from virtual EL2 > KVM: arm64: nv: Trap and emulate TLBI instructions from virtual EL2 > KVM: arm64: nv: Nested GICv3 Support > > Marc Zyngier (17): > KVM: arm64: Move __load_guest_stage2 to kvm_mmu.h > KVM: arm64: nv: Reset VMPIDR_EL2 and VPIDR_EL2 to sane values > KVM: arm64: nv: Handle SPSR_EL2 specially > KVM: arm64: nv: Refactor vcpu_{read,write}_sys_reg > KVM: arm64: nv: Don't expose SVE to nested guests > KVM: arm64: nv: Hide RAS from nested guests > KVM: arm/arm64: nv: Factor out stage 2 page table data from struct kvm > KVM: arm64: nv: Move last_vcpu_ran to be per s2 mmu > KVM: arm64: nv: Don't always start an S2 MMU search from the beginning > KVM: arm64: nv: Propagate CNTVOFF_EL2 to the virtual EL1 timer > KVM: arm64: nv: Load timer before the GIC > KVM: arm64: nv: Implement maintenance