Re: [PATCH kernel v4] KVM: PPC: Make KVM_CAP_IRQFD_RESAMPLE support platform dependent
Paolo, ping? :)

On 27/10/2022 18:38, Alexey Kardashevskiy wrote:
Paolo, ping?

On 04/10/2022 10:57, Alexey Kardashevskiy wrote:

When introduced, IRQFD resampling worked on POWER8 with XICS. However
KVM on POWER9 has never implemented it - the compatibility mode code
("XICS-on-XIVE") misses the kvm_notify_acked_irq() call and the native
XIVE mode does not handle INTx in KVM at all.

This moves the capability support advertising to platforms and stops
advertising it on XIVE, i.e. POWER9 and later.

This should cause no behavioural change for other architectures.

Signed-off-by: Alexey Kardashevskiy
Acked-by: Nicholas Piggin
Acked-by: Marc Zyngier
---
Changes:
v4:
* removed incorrect clause about changing behaviour on MIPS and RISCV

v3:
* removed all ifdeferry
* removed the capability for MIPS and RISCV
* adjusted the commit log about MIPS and RISCV

v2:
* removed ifdef for ARM64.
---
 arch/arm64/kvm/arm.c       | 1 +
 arch/powerpc/kvm/powerpc.c | 6 ++
 arch/s390/kvm/kvm-s390.c   | 1 +
 arch/x86/kvm/x86.c         | 1 +
 virt/kvm/kvm_main.c        | 1 -
 5 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 2ff0ef62abad..d2daa4d375b5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -218,6 +218,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
         case KVM_CAP_VCPU_ATTRIBUTES:
         case KVM_CAP_PTP_KVM:
         case KVM_CAP_ARM_SYSTEM_SUSPEND:
+        case KVM_CAP_IRQFD_RESAMPLE:
                 r = 1;
                 break;
         case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fb1490761c87..908ce8bd91c9 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -593,6 +593,12 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
                 break;
 #endif
 
+#ifdef CONFIG_HAVE_KVM_IRQFD
+        case KVM_CAP_IRQFD_RESAMPLE:
+                r = !xive_enabled();
+                break;
+#endif
+
         case KVM_CAP_PPC_ALLOC_HTAB:
                 r = hv_enabled;
                 break;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index edfd4bbd0cba..7521adadb81b 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -577,6 +577,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
         case KVM_CAP_SET_GUEST_DEBUG:
         case KVM_CAP_S390_DIAG318:
         case KVM_CAP_S390_MEM_OP_EXTENSION:
+        case KVM_CAP_IRQFD_RESAMPLE:
                 r = 1;
                 break;
         case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 43a6a7efc6ec..2d6c5a8fdf14 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4395,6 +4395,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
         case KVM_CAP_VAPIC:
         case KVM_CAP_ENABLE_CAP:
         case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
+        case KVM_CAP_IRQFD_RESAMPLE:
                 r = 1;
                 break;
         case KVM_CAP_EXIT_HYPERCALL:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 584a5bab3af3..05cf94013f02 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4447,7 +4447,6 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_HAVE_KVM_IRQFD
         case KVM_CAP_IRQFD:
-        case KVM_CAP_IRQFD_RESAMPLE:
 #endif
         case KVM_CAP_IOEVENTFD_ANY_LENGTH:
         case KVM_CAP_CHECK_EXTENSION_VM:
-- 
Alexey

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 31/50] KVM: x86: Do CPU compatibility checks in x86 code
On Mon, Dec 05, 2022, Isaku Yamahata wrote:
> On Wed, Nov 30, 2022 at 11:09:15PM +,
> > index 66f16458aa97..3571bc968cf8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9277,10 +9277,36 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
> >          kvm_pmu_ops_update(ops->pmu_ops);
> >  }
> > 
> > +struct kvm_cpu_compat_check {
> > +        struct kvm_x86_init_ops *ops;
> > +        int *ret;
> 
> minor nitpick: just int ret. I don't see the necessity of the pointer.
> Anyway overall it looks good to me.

...

> > @@ -9360,6 +9386,14 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
> >          if (r != 0)
> >                  goto out_mmu_exit;
> > 
> > +        c.ret = &r;
> > +        c.ops = ops;
> > +        for_each_online_cpu(cpu) {
> > +                smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &c, 1);
> > +                if (r < 0)
> 
> Here it can be "c.ret < 0".

No, because the below goto leads to "return r", i.e. "c.ret" needs to be
propagated to "r".  That's why the code does the admittedly funky "int *ret"
thing.

FWIW, this gets cleaned up in the end.  "struct kvm_cpu_compat_check" goes
away and "&r" is passed directly to kvm_x86_check_cpu_compat.

> > +                        goto out_hardware_unsetup;
Re: [PATCH v2 39/50] KVM: x86: Move CPU compat checks hook to kvm_x86_ops (from kvm_x86_init_ops)
On Wed, Nov 30, 2022 at 11:09:23PM +, Sean Christopherson wrote:
> Move the .check_processor_compatibility() callback from kvm_x86_init_ops
> to kvm_x86_ops to allow a future patch to do compatibility checks during
> CPU hotplug.
> 
> Do kvm_ops_update() before compat checks so that static_call() can be
> used during compat checks.
> 
> Signed-off-by: Sean Christopherson
> ---
>  arch/x86/include/asm/kvm-x86-ops.h |  1 +
>  arch/x86/include/asm/kvm_host.h    |  3 ++-
>  arch/x86/kvm/svm/svm.c             |  5 +++--
>  arch/x86/kvm/vmx/vmx.c             | 16 +++
>  arch/x86/kvm/x86.c                 | 31 +++---
>  5 files changed, 25 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index abccd51dcfca..dba2909e5ae2 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -14,6 +14,7 @@ BUILD_BUG_ON(1)
>   * to make a definition optional, but in this case the default will
>   * be __static_call_return0.
>   */
> +KVM_X86_OP(check_processor_compatibility)
>  KVM_X86_OP(hardware_enable)
>  KVM_X86_OP(hardware_disable)
>  KVM_X86_OP(hardware_unsetup)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index d79aedf70908..ba74fea6850b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1518,6 +1518,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
>  struct kvm_x86_ops {
>          const char *name;
> 
> +        int (*check_processor_compatibility)(void);
> +
>          int (*hardware_enable)(void);
>          void (*hardware_disable)(void);
>          void (*hardware_unsetup)(void);
> @@ -1729,7 +1731,6 @@ struct kvm_x86_nested_ops {
>  };
> 
>  struct kvm_x86_init_ops {
> -        int (*check_processor_compatibility)(void);
>          int (*hardware_setup)(void);
>          unsigned int (*handle_intel_pt_intr)(void);
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 9f94efcb9aa6..c2e95c0d9fd8 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -543,7 +543,7 @@ static bool kvm_is_svm_supported(void)
>          return true;
>  }
> 
> -static int __init svm_check_processor_compat(void)
> +static int svm_check_processor_compat(void)
>  {
>          if (!kvm_is_svm_supported())
>                  return -EIO;
> @@ -4695,6 +4695,8 @@ static int svm_vm_init(struct kvm *kvm)
>  static struct kvm_x86_ops svm_x86_ops __initdata = {
>          .name = KBUILD_MODNAME,
> 
> +        .check_processor_compatibility = svm_check_processor_compat,
> +
>          .hardware_unsetup = svm_hardware_unsetup,
>          .hardware_enable = svm_hardware_enable,
>          .hardware_disable = svm_hardware_disable,
> @@ -5079,7 +5081,6 @@ static __init int svm_hardware_setup(void)
> 
>  static struct kvm_x86_init_ops svm_init_ops __initdata = {
>          .hardware_setup = svm_hardware_setup,
> -        .check_processor_compatibility = svm_check_processor_compat,
> 
>          .runtime_ops = &svm_x86_ops,
>          .pmu_ops = &svm_pmu_ops,
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2a8a6e481c76..6416ed5b7f89 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2520,8 +2520,7 @@ static bool cpu_has_perf_global_ctrl_bug(void)
>          return false;
>  }
> 
> -static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
> -                                      u32 msr, u32 *result)
> +static int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt, u32 msr, u32 *result)
>  {
>          u32 vmx_msr_low, vmx_msr_high;
>          u32 ctl = ctl_min | ctl_opt;
> @@ -2539,7 +2538,7 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
>          return 0;
>  }
> 
> -static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
> +static u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
>  {
>          u64 allowed;
> 
> @@ -2548,8 +2547,8 @@ static __init u64 adjust_vmx_controls64(u64 ctl_opt, u32 msr)
>          return ctl_opt & allowed;
>  }
> 
> -static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> -                                    struct vmx_capability *vmx_cap)
> +static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> +                             struct vmx_capability *vmx_cap)
>  {
>          u32 vmx_msr_low, vmx_msr_high;
>          u32 _pin_based_exec_control = 0;
> @@ -2710,7 +2709,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>          return 0;
>  }
> 
> -static bool __init kvm_is_vmx_supported(void)
> +static bool kvm_is_vmx_supported(void)
>  {
>          if (!cpu_has_vmx()) {
>                  pr_err("CPU doesn't support VMX\n");
> @@ -2726,7 +2725,7 @@ static bool __init kvm_is_vmx_supported(void)
>          return true;
>  }
> 
> -static int __init vmx_check_processor_compat(void)
> +static int vmx_check_processor_compat(void)
>  {
>          struct vmcs_config vmcs_conf;
>          struct vmx_capability vmx_cap;
Re: [PATCH v2 31/50] KVM: x86: Do CPU compatibility checks in x86 code
On Wed, Nov 30, 2022 at 11:09:15PM +, Sean Christopherson wrote:
> Move the CPU compatibility checks to pure x86 code, i.e. drop x86's use
> of the common kvm_x86_check_cpu_compat() arch hook.  x86 is the only
> architecture that "needs" to do per-CPU compatibility checks, moving
> the logic to x86 will allow dropping the common code, and will also
> give x86 more control over when/how the compatibility checks are
> performed, e.g. TDX will need to enable hardware (do VMXON) in order to
> perform compatibility checks.
> 
> Signed-off-by: Sean Christopherson
> ---
>  arch/x86/kvm/svm/svm.c |  2 +-
>  arch/x86/kvm/vmx/vmx.c |  2 +-
>  arch/x86/kvm/x86.c     | 49 --
>  3 files changed, 40 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 19e81a99c58f..d7ea1c1175c2 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -5103,7 +5103,7 @@ static int __init svm_init(void)
>           * Common KVM initialization _must_ come last, after this, /dev/kvm is
>           * exposed to userspace!
>           */
> -        r = kvm_init(&svm_init_ops, sizeof(struct vcpu_svm),
> +        r = kvm_init(NULL, sizeof(struct vcpu_svm),
>                       __alignof__(struct vcpu_svm), THIS_MODULE);
>          if (r)
>                  goto err_kvm_init;
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 654d81f781da..8deb1bd60c10 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -8592,7 +8592,7 @@ static int __init vmx_init(void)
>           * Common KVM initialization _must_ come last, after this, /dev/kvm is
>           * exposed to userspace!
>           */
> -        r = kvm_init(&vmx_init_ops, sizeof(struct vcpu_vmx),
> +        r = kvm_init(NULL, sizeof(struct vcpu_vmx),
>                       __alignof__(struct vcpu_vmx), THIS_MODULE);
>          if (r)
>                  goto err_kvm_init;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 66f16458aa97..3571bc968cf8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9277,10 +9277,36 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
>          kvm_pmu_ops_update(ops->pmu_ops);
>  }
> 
> +struct kvm_cpu_compat_check {
> +        struct kvm_x86_init_ops *ops;
> +        int *ret;

minor nitpick: just int ret. I don't see the necessity of the pointer.
Anyway overall it looks good to me.

Reviewed-by: Isaku Yamahata

> +};
> +
> +static int kvm_x86_check_processor_compatibility(struct kvm_x86_init_ops *ops)
> +{
> +        struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> +
> +        WARN_ON(!irqs_disabled());
> +
> +        if (__cr4_reserved_bits(cpu_has, c) !=
> +            __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> +                return -EIO;
> +
> +        return ops->check_processor_compatibility();
> +}
> +
> +static void kvm_x86_check_cpu_compat(void *data)
> +{
> +        struct kvm_cpu_compat_check *c = data;
> +
> +        *c->ret = kvm_x86_check_processor_compatibility(c->ops);
> +}
> +
>  static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>  {
> +        struct kvm_cpu_compat_check c;
>          u64 host_pat;
> -        int r;
> +        int r, cpu;
> 
>          if (kvm_x86_ops.hardware_enable) {
>                  pr_err("kvm: already loaded vendor module '%s'\n", kvm_x86_ops.name);
> @@ -9360,6 +9386,14 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>          if (r != 0)
>                  goto out_mmu_exit;
> 
> +        c.ret = &r;
> +        c.ops = ops;
> +        for_each_online_cpu(cpu) {
> +                smp_call_function_single(cpu, kvm_x86_check_cpu_compat, &c, 1);
> +                if (r < 0)

Here it can be "c.ret < 0".

> +                        goto out_hardware_unsetup;
> +        }
> +
>          /*
>           * Point of no return!  DO NOT add error paths below this point unless
>           * absolutely necessary, as most operations from this point forward
> @@ -9402,6 +9436,8 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
>          kvm_init_msr_list();
>          return 0;
> 
> +out_hardware_unsetup:
> +        ops->runtime_ops->hardware_unsetup();
>  out_mmu_exit:
>          kvm_mmu_vendor_module_exit();
>  out_free_percpu:
> @@ -12037,16 +12073,7 @@ void kvm_arch_hardware_disable(void)
> 
>  int kvm_arch_check_processor_compat(void *opaque)
>  {
> -        struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
> -        struct kvm_x86_init_ops *ops = opaque;
> -
> -        WARN_ON(!irqs_disabled());
> -
> -        if (__cr4_reserved_bits(cpu_has, c) !=
> -            __cr4_reserved_bits(cpu_has, &boot_cpu_data))
> -                return -EIO;
> -
> -        return ops->check_processor_compatibility();
> +        return 0;
>  }
> 
>  bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
> -- 
> 2.38.1.584.g0f3c55d4c2-goog
> 

-- 
Isaku Yamahata
Re: [PATCH v4 04/16] KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
On Mon, Dec 05, 2022 at 12:05:32PM +, Marc Zyngier wrote:
> On Thu, 01 Dec 2022 16:51:46 +,
> Ricardo Koller wrote:
> > 
> > On Thu, Dec 01, 2022 at 08:47:47AM -0800, Ricardo Koller wrote:
> > > On Sun, Nov 13, 2022 at 04:38:20PM +, Marc Zyngier wrote:
> > > > The PMU architecture makes a subtle difference between a 64bit
> > > > counter and a counter that has a 64bit overflow. This is for example
> > > > the case of the cycle counter, which can generate an overflow on
> > > > a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being
> > > > done on 64 bits.
> > > > 
> > > > Use this distinction in the few cases where it matters in the code,
> > > > as we will reuse this with PMUv3p5 long counters.
> > > > 
> > > > Signed-off-by: Marc Zyngier
> > > > ---
> > > >  arch/arm64/kvm/pmu-emul.c | 43 ---
> > > >  1 file changed, 31 insertions(+), 12 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > > > index 69b67ab3c4bf..d050143326b5 100644
> > > > --- a/arch/arm64/kvm/pmu-emul.c
> > > > +++ b/arch/arm64/kvm/pmu-emul.c
> > > > @@ -50,6 +50,11 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
> > > >   * @select_idx: The counter index
> > > >   */
> > > >  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > > +{
> > > > +        return (select_idx == ARMV8_PMU_CYCLE_IDX);
> > > > +}
> > > > +
> > > > +static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 select_idx)
> > > >  {
> > > >          return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > > >                  __vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > > > @@ -57,7 +62,8 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > > 
> > > >  static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
> > > >  {
> > > > -        return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX);
> > > > +        return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX &&
> > > > +                !kvm_pmu_idx_has_64bit_overflow(vcpu, idx));
> > > >  }
> > > > 
> > > >  static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
> > > > @@ -97,7 +103,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx)
> > > >          counter += perf_event_read_value(pmc->perf_event, &enabled,
> > > >                                           &running);
> > > > 
> > > > -        if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > > > +        if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
> > > >                  counter = lower_32_bits(counter);
> > > > 
> > > >          return counter;
> > > > @@ -423,6 +429,23 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu *vcpu,
> > > >          }
> > > >  }
> > > > 
> > > > +/* Compute the sample period for a given counter value */
> > > > +static u64 compute_period(struct kvm_vcpu *vcpu, u64 select_idx, u64 counter)
> > > > +{
> > > > +        u64 val;
> > > > +
> > > > +        if (kvm_pmu_idx_is_64bit(vcpu, select_idx)) {
> > > > +                if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
> > > > +                        val = -(counter & GENMASK(31, 0));
> > > 
> > > If I understand things correctly, this might be missing another mask:
> > > 
> > > +                if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx)) {
> > > +                        val = -(counter & GENMASK(31, 0));
> > > +                        val &= GENMASK(31, 0);
> > > +                } else {
> > > 
> > > For example, if the counter is 64-bits wide, it overflows at 32-bits,
> > > and it is _one_ sample away from overflowing at 32-bits:
> > > 
> > >         0x01010101_FFFFFFFF
> > > 
> > > Then "val = (-counter) & GENMASK(63, 0)" would return 0xFFFFFFFF_00000001.
> > 
> > Sorry, this should be:
> > 
> > Then "val = -(counter & GENMASK(31, 0))" would return
> > 0xFFFFFFFF_00000001.
> > 
> > > But the right period is 0x00000001 (it's one sample away from
> > > overflowing).
> 
> Yup, this is a bit bogus. But this can be simplified by falling back
> to the normal 32bit handling (on top of the pmu-unchained branch):
> 
> diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> index d8ea39943086..24908400e190 100644
> --- a/arch/arm64/kvm/pmu-emul.c
> +++ b/arch/arm64/kvm/pmu-emul.c
> @@ -461,14 +461,10 @@ static u64 compute_period(struct kvm_pmc *pmc, u64 counter)
>  {
>          u64 val;
> 
> -        if (kvm_pmc_is_64bit(pmc)) {
> -                if (!kvm_pmc_has_64bit_overflow(pmc))
> -                        val = -(counter & GENMASK(31, 0));
> -                else
> -                        val = (-counter) & GENMASK(63, 0);
> -        } else {
> +        if (kvm_pmc_is_64bit(pmc) && kvm_pmc_has_64bit_overflow(pmc))

Great, thanks! Yes, that definitely makes things simpler ^.

> +                val = (-counter) & GENMASK(63, 0);
> +        else
>                  val = (-counter) & GENMASK(31, 0);
> -        }
> 
>          return val;
>  }
[GIT PULL] KVM/arm64 updates for 6.2
Paolo,

Here's the meat of the KVM/arm64 updates for 6.2, and it is quite
packed with stuff (see the tag for the laundry list). Things to notice:

- We have the usual shared branch with arm64, due to the never-ending
  sysreg repainting. I'm sure it will stop one day...

- There are a lot of selftest conflicts with your own branch, see:

  https://lore.kernel.org/r/20221201112432.4cb9a...@canb.auug.org.au
  https://lore.kernel.org/r/20221201113626.438f1...@canb.auug.org.au
  https://lore.kernel.org/r/20221201115741.7de32...@canb.auug.org.au
  https://lore.kernel.org/r/20221201120939.3c19f...@canb.auug.org.au
  https://lore.kernel.org/r/20221201131623.18ebc...@canb.auug.org.au

  for a rather exhaustive collection.

- For the 6.3 cycle, we are going to experiment with Oliver taking care
  of most of the patch herding. I'm sure he'll do a great job, but if
  there is the odd mistake, please cut him some slack and blame me
  instead.

Please pull,

        M.

The following changes since commit f0c4d9fc9cc9462659728d168387191387e903cc:

  Linux 6.1-rc4 (2022-11-06 15:07:11 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git tags/kvmarm-6.2

for you to fetch changes up to 753d734f3f347e7fc49b819472bbf61dcfc1a16f:

  Merge remote-tracking branch 'arm64/for-next/sysregs' into kvmarm-master/next (2022-12-05 14:39:53 +)

KVM/arm64 updates for 6.2

- Enable the per-vcpu dirty-ring tracking mechanism, together with an
  option to keep the good old dirty log around for pages that are
  dirtied by something other than a vcpu.

- Switch to the relaxed parallel fault handling, using RCU to delay
  page table reclaim and giving better performance under load.

- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
  option, which multi-process VMMs such as crosvm rely on.

- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
  to have its own view of a vcpu, keeping that state private.

- Add support for the PMUv3p5 architecture revision, bringing support
  for 64bit counters on systems that support it, and fix the
  not-quite-compliant CHAIN-ed counter support for the machines that
  actually exist out there.

- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
  only) as a prefix of the oncoming support for 4kB and 16kB pages.

- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
  stage-2 faults and access tracking. You name it, we got it, we
  probably broke it.

- Pick a small set of documentation and spelling fixes, because no
  good merge window would be complete without those.

As a side effect, this tag also drags:

- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
  series

- A shared branch with the arm64 tree that repaints all the system
  registers to match the ARM ARM's naming, resulting in interesting
  conflicts

Anshuman Khandual (1):
      KVM: arm64: PMU: Replace version number '0' with ID_AA64DFR0_EL1_PMUVer_NI

Catalin Marinas (4):
      mm: Do not enable PG_arch_2 for all 64-bit architectures
      arm64: mte: Fix/clarify the PG_mte_tagged semantics
      KVM: arm64: Simplify the sanitise_mte_tags() logic
      arm64: mte: Lock a page for MTE tag initialisation

Fuad Tabba (3):
      KVM: arm64: Add hyp_spinlock_t static initializer
      KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
      KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1

Gavin Shan (14):
      KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
      KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
      KVM: Support dirty ring in conjunction with bitmap
      KVM: arm64: Enable ring-based dirty memory tracking
      KVM: selftests: Use host page size to map ring buffer in dirty_log_test
      KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
      KVM: selftests: Automate choosing dirty ring size in dirty_log_test
      KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()
      KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()
      KVM: selftests: memslot_perf_test: Probe memory slots for once
      KVM: selftests: memslot_perf_test: Support variable guest page size
      KVM: selftests: memslot_perf_test: Consolidate memory
      KVM: selftests: memslot_perf_test: Report optimal memory slots
      KVM: Push dirty information unconditionally to backup bitmap

James Morse (38):
      arm64/sysreg: Standardise naming for ID_MMFR0_EL1
      arm64/sysreg: Standardise naming for ID_MMFR4_EL1
      arm64/sysreg: Standardise naming for ID_MMFR5_EL1
      arm64/sysreg: Standardise naming for ID_ISAR0_EL1
      arm64/sysreg: Standardise naming for ID_ISAR4_EL1
      arm64/sysreg: Standardise naming for ID_ISAR5_EL1
Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
On Mon, 5 Dec 2022 11:40:31 +, Ryan Roberts wrote:
> get_user_mapping_size() uses kvm's pgtable library to walk a user space
> page table created by the kernel, and in doing so, fakes up the metadata
> that the library needs, including ia_bits, which defines the size of the
> input address.
> 
> For the case where the kernel is compiled for 52 VA bits but runs on HW
> that does not support LVA, it will fall back to 48 VA bits at runtime.
> Therefore we must use vabits_actual rather than VA_BITS to get the true
> address size.
> 
> [...]

Applied to next, thanks!

[1/1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
      commit: 219072c09abde0f1d0a6ce091be375e8eb7d08f0

Cheers,

        M.
-- 
Without deviation from the norm, progress is not possible.
Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
On 05/12/2022 13:49, Marc Zyngier wrote:
> Hi Ryan,
> 
> Thanks for that.
> 
> On Mon, 05 Dec 2022 11:40:31 +,
> Ryan Roberts wrote:
>>
>> get_user_mapping_size() uses kvm's pgtable library to walk a user space
>> page table created by the kernel, and in doing so, fakes up the metadata
>> that the library needs, including ia_bits, which defines the size of the
>> input address.
> 
> It isn't supposed to "fake" anything. It simply provides the
> information that the walker needs to correctly parse the page tables.

Apologies - poor choice of words.

>> For the case where the kernel is compiled for 52 VA bits but runs on HW
>> that does not support LVA, it will fall back to 48 VA bits at runtime.
>> Therefore we must use vabits_actual rather than VA_BITS to get the true
>> address size.
>>
>> This is benign in the current code base because the pgtable library only
>> uses it for error checking.
>>
>> Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute
>> the THP mapping size")
> 
> nit: this should appear on a single line, without a line-break in the
> middle [1]...
> 
> ... without a blank line between Fixes: and the rest of the tags.

Ahh, thanks for the pointer. I'll admit that checkpatch did raise this,
but I assumed it was a false positive, because I assumed the 75 chars
per line rule would override.

> And while I'm on the "trivial remarks" train, drop the full stop at
> the end of the subject line.

Yep, will do.

>> Signed-off-by: Ryan Roberts
>> ---
>>  arch/arm64/kvm/mmu.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 4efb983cff43..1ef0704420d9 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>>  {
>>          struct kvm_pgtable pgt = {
>>                  .pgd            = (kvm_pte_t *)kvm->mm->pgd,
>> -                .ia_bits        = VA_BITS,
>> +                .ia_bits        = vabits_actual,
>>                  .start_level    = (KVM_PGTABLE_MAX_LEVELS -
>>                                     CONFIG_PGTABLE_LEVELS),
>>                  .mm_ops         = &kvm_user_mm_ops,
>> -- 
>> 2.25.1
>>
> 
> Other than the above nits, this is well spotted. I need to regenerate
> the kvmarm/next branch after the sysreg attack from James, so I'll try
> and fold that in.

Sounds like you are happy to tend to the nits and don't need me to repost?

> Thanks,
> 
> M.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n139
Re: [PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
Hi Ryan,

Thanks for that.

On Mon, 05 Dec 2022 11:40:31 +,
Ryan Roberts wrote:
> 
> get_user_mapping_size() uses kvm's pgtable library to walk a user space
> page table created by the kernel, and in doing so, fakes up the metadata
> that the library needs, including ia_bits, which defines the size of the
> input address.

It isn't supposed to "fake" anything. It simply provides the
information that the walker needs to correctly parse the page tables.

> For the case where the kernel is compiled for 52 VA bits but runs on HW
> that does not support LVA, it will fall back to 48 VA bits at runtime.
> Therefore we must use vabits_actual rather than VA_BITS to get the true
> address size.
> 
> This is benign in the current code base because the pgtable library only
> uses it for error checking.
> 
> Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute
> the THP mapping size")

nit: this should appear on a single line, without a line-break in the
middle [1]...

> 

... without a blank line between Fixes: and the rest of the tags.

And while I'm on the "trivial remarks" train, drop the full stop at
the end of the subject line.

> Signed-off-by: Ryan Roberts
> ---
>  arch/arm64/kvm/mmu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4efb983cff43..1ef0704420d9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
>  {
>          struct kvm_pgtable pgt = {
>                  .pgd            = (kvm_pte_t *)kvm->mm->pgd,
> -                .ia_bits        = VA_BITS,
> +                .ia_bits        = vabits_actual,
>                  .start_level    = (KVM_PGTABLE_MAX_LEVELS -
>                                     CONFIG_PGTABLE_LEVELS),
>                  .mm_ops         = &kvm_user_mm_ops,
> -- 
> 2.25.1
> 

Other than the above nits, this is well spotted. I need to regenerate
the kvmarm/next branch after the sysreg attack from James, so I'll try
and fold that in.

Thanks,

        M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n139

-- 
Without deviation from the norm, progress is not possible.
Re: [External] Re: [v2 0/6] KVM: arm64: implement vcpu_is_preempted check
On 24/11/2022 13:55, Usama Arif wrote:

On 18/11/2022 00:20, Marc Zyngier wrote:

On Mon, 07 Nov 2022 12:00:44 +, Usama Arif wrote:

On 06/11/2022 16:35, Marc Zyngier wrote:

On Fri, 04 Nov 2022 06:20:59 +, Usama Arif wrote:

This patchset adds support for vcpu_is_preempted in arm64, which allows
the guest to check if a vcpu was scheduled out, which is useful to know
in case it was holding a lock. vcpu_is_preempted can be used to improve
performance in locking (see owner_on_cpu usage in mutex_spin_on_owner,
mutex_can_spin_on_owner, rtmutex_spin_on_owner and osq_lock) and
scheduling (see available_idle_cpu, which is used in several places in
kernel/sched/fair.c, e.g. in wake_affine to determine which CPU can run
soonest):

[...]

pvcy shows a smaller overall improvement (50%) compared to
vcpu_is_preempted (277%). Host side flamegraph analysis shows that ~60%
of the host time when using pvcy is spent in kvm_handle_wfx, compared
with ~1.5% when using vcpu_is_preempted, hence vcpu_is_preempted shows
a larger improvement.

And have you worked out *why* we spend so much time handling WFE?

        M.

It is from the following change in the pvcy patchset:

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index e778eefcf214..915644816a85 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -118,7 +118,12 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu)
         }
 
         if (esr & ESR_ELx_WFx_ISS_WFE) {
-                kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
+                int state;
+                while ((state = kvm_pvcy_check_state(vcpu)) == 0)
+                        schedule();
+
+                if (state == -1)
+                        kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
         } else {
                 if (esr & ESR_ELx_WFx_ISS_WFxT)
                         vcpu_set_flag(vcpu, IN_WFIT);

If my understanding of the pvcy changes is correct, whenever pvcy
returns an unchanged vcpu state, we would schedule to another vcpu, and
it is the constant scheduling where the time is spent. I guess the
effects are much higher when the lock contention is very high. This can
be seen from the pvcy host side flamegraph as well, with ~67% of the
time spent in the schedule() call in kvm_handle_wfx. For reference, I
have put the graph at:
https://uarif1.github.io/pvlock/perf_host_pvcy_nmi.svg

The real issue here is that we don't try to pick the right vcpu to run,
and strictly rely on schedule() to eventually pick something that can
run.

An interesting thing to do would be to try and fit the directed yield
mechanism there. It would be a lot more interesting than the one-off
vcpu_is_preempted hack, as it gives us a low-level primitive on which
to construct things (pvcy is effectively a mwait-like primitive).

We could use kvm_vcpu_yield_to to yield to a specific vcpu, but how
would we determine which vcpu to yield to?

IMO vcpu_is_preempted is very well integrated in a lot of core kernel
code, i.e. mutex, rtmutex, rwsem and osq_lock. It is also used in the
scheduler to determine better which vCPU we can run on soonest, select
an idle core, etc. I am not sure if all of these cases will be
optimized by pvcy? Also, with vcpu_is_preempted, some of the lock-heavy
benchmarks come down from spending around 50% of the time in lock to
less than 1% (so not sure how much more room there is for improvement).

We could also use vcpu_is_preempted to optimize IPI performance (along
with directed yield to the target IPI vCPU) similar to how it is done
in x86
(https://lore.kernel.org/all/1560255830-8656-2-git-send-email-wanpen...@tencent.com/).
This case definitely won't be covered by pvcy.

Considering all the above, i.e. the core kernel integration already
present and possible future usecases of vcpu_is_preempted, maybe it is
worth making vcpu_is_preempted work on arm independently of pvcy?

Hi,

Just wanted to check if there are any comments on the above? I can send
a v3 with the doc and code fixes suggested in the earlier reviews if it
makes sense?

Thanks,
Usama

Thanks,
Usama

        M.
Re: [PATCH v4 04/16] KVM: arm64: PMU: Distinguish between 64bit counter and 64bit overflow
On Thu, 01 Dec 2022 16:51:46 +, Ricardo Koller wrote:
>
> On Thu, Dec 01, 2022 at 08:47:47AM -0800, Ricardo Koller wrote:
> > On Sun, Nov 13, 2022 at 04:38:20PM +, Marc Zyngier wrote:
> > > The PMU architecture makes a subtle difference between a 64bit
> > > counter and a counter that has a 64bit overflow. This is for example
> > > the case of the cycle counter, which can generate an overflow on
> > > a 32bit boundary if PMCR_EL0.LC==0 despite the accumulation being
> > > done on 64 bits.
> > >
> > > Use this distinction in the few cases where it matters in the code,
> > > as we will reuse this with PMUv3p5 long counters.
> > >
> > > Signed-off-by: Marc Zyngier
> > > ---
> > >  arch/arm64/kvm/pmu-emul.c | 43 ---
> > >  1 file changed, 31 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
> > > index 69b67ab3c4bf..d050143326b5 100644
> > > --- a/arch/arm64/kvm/pmu-emul.c
> > > +++ b/arch/arm64/kvm/pmu-emul.c
> > > @@ -50,6 +50,11 @@ static u32 kvm_pmu_event_mask(struct kvm *kvm)
> > >   * @select_idx: The counter index
> > >   */
> > >  static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > > +{
> > > +	return (select_idx == ARMV8_PMU_CYCLE_IDX);
> > > +}
> > > +
> > > +static bool kvm_pmu_idx_has_64bit_overflow(struct kvm_vcpu *vcpu, u64 select_idx)
> > >  {
> > >  	return (select_idx == ARMV8_PMU_CYCLE_IDX &&
> > >  		__vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_LC);
> > > @@ -57,7 +62,8 @@ static bool kvm_pmu_idx_is_64bit(struct kvm_vcpu *vcpu, u64 select_idx)
> > >
> > >  static bool kvm_pmu_counter_can_chain(struct kvm_vcpu *vcpu, u64 idx)
> > >  {
> > > -	return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX);
> > > +	return (!(idx & 1) && (idx + 1) < ARMV8_PMU_CYCLE_IDX &&
> > > +		!kvm_pmu_idx_has_64bit_overflow(vcpu, idx));
> > >  }
> > >
> > >  static struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
> > > @@ -97,7 +103,7 @@ u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu,
> > > u64 select_idx)
> > >  	counter += perf_event_read_value(pmc->perf_event, &enabled,
> > >  					 &running);
> > >
> > > -	if (select_idx != ARMV8_PMU_CYCLE_IDX)
> > > +	if (!kvm_pmu_idx_is_64bit(vcpu, select_idx))
> > >  		counter = lower_32_bits(counter);
> > >
> > >  	return counter;
> > > @@ -423,6 +429,23 @@ static void kvm_pmu_counter_increment(struct kvm_vcpu *vcpu,
> > >  	}
> > >  }
> > >
> > > +/* Compute the sample period for a given counter value */
> > > +static u64 compute_period(struct kvm_vcpu *vcpu, u64 select_idx, u64 counter)
> > > +{
> > > +	u64 val;
> > > +
> > > +	if (kvm_pmu_idx_is_64bit(vcpu, select_idx)) {
> > > +		if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx))
> > > +			val = -(counter & GENMASK(31, 0));
> >
> > If I understand things correctly, this might be missing another mask:
> >
> > +	if (!kvm_pmu_idx_has_64bit_overflow(vcpu, select_idx)) {
> > +		val = -(counter & GENMASK(31, 0));
> > +		val &= GENMASK(31, 0);
> > +	} else {
> >
> > For example, if the counter is 64-bits wide, it overflows at 32-bits,
> > and it is _one_ sample away from overflowing at 32-bits:
> >
> > 	0x01010101_
> >
> > Then "val = (-counter) & GENMASK(63, 0)" would return 0x_0001.
>
> Sorry, this should be:
>
> Then "val = -(counter & GENMASK(31, 0))" would return
> 0x_0001.
>
> > But the right period is 0x_0001 (it's one sample away from
> > overflowing).

Yup, this is a bit bogus.
But this can be simplified by falling back to the normal 32bit handling (on top of the pmu-unchained branch):

diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c
index d8ea39943086..24908400e190 100644
--- a/arch/arm64/kvm/pmu-emul.c
+++ b/arch/arm64/kvm/pmu-emul.c
@@ -461,14 +461,10 @@ static u64 compute_period(struct kvm_pmc *pmc, u64 counter)
 {
 	u64 val;

-	if (kvm_pmc_is_64bit(pmc)) {
-		if (!kvm_pmc_has_64bit_overflow(pmc))
-			val = -(counter & GENMASK(31, 0));
-		else
-			val = (-counter) & GENMASK(63, 0);
-	} else {
+	if (kvm_pmc_is_64bit(pmc) && kvm_pmc_has_64bit_overflow(pmc))
+		val = (-counter) & GENMASK(63, 0);
+	else
 		val = (-counter) & GENMASK(31, 0);
-	}

 	return val;
 }

which satisfies the requirement without any extra masking, and makes it plain that only a 64bit counter with 64bit overflow gets its period computed on the full 64 bits, and that anyone else gets the 32bit truncation.

I'll stash yet another patch on top and push it onto -next. Thanks!

M.

--
Without deviation from the norm, progress is not possible.
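To make the masking discussion concrete, here is a user-space transcription of the simplified compute_period() from the diff above. The pmc state is flattened into two booleans purely for testing, and GENMASK is redefined in standard C since the kernel macro is not available outside the tree; otherwise the logic is the same as the final kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's GENMASK_ULL(h, l). */
#define GENMASK64(h, l)	(((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

/*
 * Transcription of the simplified compute_period(): only a 64bit
 * counter with 64bit overflow gets its period computed on the full
 * 64 bits; everything else (32bit counters, and 64bit counters that
 * overflow at 32 bits) gets the 32bit truncation.  Unary minus on a
 * uint64_t is well-defined modulo 2^64, matching the kernel's u64.
 */
static uint64_t compute_period(bool is_64bit, bool has_64bit_overflow,
			       uint64_t counter)
{
	uint64_t val;

	if (is_64bit && has_64bit_overflow)
		val = (-counter) & GENMASK64(63, 0);
	else
		val = (-counter) & GENMASK64(31, 0);

	return val;
}
```

The case Ricardo raised is the first one below: a 64bit-wide cycle counter with LC==0 (32bit overflow) whose low half is 0xffffffff, i.e. one sample from overflowing. The buggy `-(counter & GENMASK(31, 0))` would leave high bits set in the 64bit result, while the fixed 32bit truncation yields the correct period of 1.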
[PATCH v1] KVM: arm64: Fix benign bug with incorrect use of VA_BITS.
get_user_mapping_size() uses kvm's pgtable library to walk a user space page table created by the kernel, and in doing so, fakes up the metadata that the library needs, including ia_bits, which defines the size of the input address.

For the case where the kernel is compiled for 52 VA bits but runs on HW that does not support LVA, it will fall back to 48 VA bits at runtime. Therefore we must use vabits_actual rather than VA_BITS to get the true address size.

This is benign in the current code base because the pgtable library only uses it for error checking.

Fixes: 6011cf68c885 ("KVM: arm64: Walk userspace page tables to compute the THP mapping size")
Signed-off-by: Ryan Roberts
---
 arch/arm64/kvm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4efb983cff43..1ef0704420d9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -641,7 +641,7 @@ static int get_user_mapping_size(struct kvm *kvm, u64 addr)
 {
 	struct kvm_pgtable pgt = {
 		.pgd		= (kvm_pte_t *)kvm->mm->pgd,
-		.ia_bits	= VA_BITS,
+		.ia_bits	= vabits_actual,
 		.start_level	= (KVM_PGTABLE_MAX_LEVELS -
 				   CONFIG_PGTABLE_LEVELS),
 		.mm_ops		= &kvm_user_mm_ops,
--
2.25.1
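Since the commit message notes that ia_bits is only used for error checking, a small sketch shows why the VA_BITS/vabits_actual distinction matters there. The function name below is hypothetical, not the pgtable library's actual API; it only illustrates the kind of input-address bound check that ia_bits drives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of an ia_bits-based sanity check: a walk is
 * rejected if the input address has any bits set at or above the
 * configured input-address size.  With the kernel compiled for
 * VA_BITS == 52 but running on non-LVA hardware where
 * vabits_actual == 48, an address using bits 48..51 would wrongly
 * pass a check seeded from VA_BITS, but is correctly rejected when
 * the check is seeded from vabits_actual.
 */
static bool addr_in_range(uint64_t addr, uint32_t ia_bits)
{
	/* ia_bits is assumed to be in 1..63, as with 48 or 52 VA bits. */
	return (addr >> ia_bits) == 0;
}
```

The "benign" part of the commit message corresponds to the fact that only this error-checking path consumes ia_bits today; a future consumer computing table geometry from it would turn the mismatch into a real bug.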