Re: [PATCH] arm64: Introduce ISAR6 CPU ID register
On 12/12/2019 10:01 PM, Mark Rutland wrote: > On Thu, Dec 12, 2019 at 03:22:13PM +, Suzuki Kuruppassery Poulose wrote: >> On 12/12/2019 14:46, Mark Rutland wrote: >>> On Thu, Dec 12, 2019 at 03:44:23PM +0530, Anshuman Khandual wrote: +#define ID_ISAR6_JSCVT_SHIFT 0 +#define ID_ISAR6_DP_SHIFT 4 +#define ID_ISAR6_FHM_SHIFT8 +#define ID_ISAR6_SB_SHIFT 12 +#define ID_ISAR6_SPECRES_SHIFT16 +#define ID_ISAR6_BF16_SHIFT 20 +#define ID_ISAR6_I8MM_SHIFT 24 >>> @@ -399,6 +399,7 @@ static const struct __ftr_reg_entry { ARM64_FTR_REG(SYS_ID_ISAR4_EL1, ftr_generic_32bits), ARM64_FTR_REG(SYS_ID_ISAR5_EL1, ftr_id_isar5), ARM64_FTR_REG(SYS_ID_MMFR4_EL1, ftr_id_mmfr4), >>> + ARM64_FTR_REG(SYS_ID_ISAR6_EL1, ftr_generic_32bits), >>> >>> Using ftr_generic_32bits exposes the lowest-common-denominator for all >>> 4-bit fields in the register, and I don't think that's the right thing >>> to do here, because: >>> >>> * We have no idea what ID_ISAR6 bits [31:28] may mean in future. >>> >>> * AFAICT, the instructions described by ID_ISAR6.SPECRES (from the >>>ARMv8.0-PredInv extension) operate on the local PE and are not >>>broadcast. To make those work as a guest expects, the host will need >>>to do additional things (e.g. to preserve that illusion when a vCPU is >>>migrated from one pCPU to another and back). >>> >>> Given that, think we should add an explicit ftr_id_isar6 which only >>> exposes the fields that we're certain are safe to expose to a guest >>> (i.e. without SPECRES). >> >> Agree. Thanks for pointing this out. I recommended the usage of >> generic_32bits table without actually looking at the feature >> definitions. > > No worries; this is /really/ easy to miss! > > Looking again, comparing to ARM DDI 0487E.a, there are a few other > things we should probably sort out: > > * ID_DFR0 fields need more thought; we should limit what we expose here. 
> I don't think it's valid for us to expose TraceFilt, and I suspect we

Sure, will go ahead and drop TraceFilt [28..31] from ID_DFR0 register.

> need to add capping for debug features we currently emulate.

Could you please elaborate ?

>
> * ID_ISAR0[31:28] are RES0 in ARMv8, Reserved/UNK in ARMv7.
> We should probably add ftr_id_isar0 so we can hide those bits.

Sure, will do.

>
> * ID_ISAR5[23:10] are RES0
> We handle this already! :)

I may be missing something here but some of these fields are already there.

#define ID_ISAR5_RDM_SHIFT		24
#define ID_ISAR5_CRC32_SHIFT		16
#define ID_ISAR5_SHA2_SHIFT		12
#define ID_ISAR5_SHA1_SHIFT		8
#define ID_ISAR5_AES_SHIFT		4
#define ID_ISAR5_SEVL_SHIFT		0

static const struct arm64_ftr_bits ftr_id_isar5[] = {
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_RDM_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_CRC32_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_SHA2_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_SHA1_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_AES_SHIFT, 4, 0),
	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR5_SEVL_SHIFT, 4, 0),
	ARM64_FTR_END,
};

>
> * ID_MMFR4.SpecSEI should be treated as higher safe.
> We should update ftr_id_mmfr4 to handle this and other fields.

Sure but should we also export other fields as higher safe in there ?

>
> * ID_PFR0 is missing DIT and CSV2
> We should probably add these (but neither RAS nor AMU).

Sure, will do.

>
> * ID_PFR2 is missing
> We should probably add this for SSBS and CSV3.

Sure but should we add corresponding ID_AA64PFR2_EL1 register as well ?

>
> Thanks,
> Mark.
>

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [kvm-unit-tests PATCH 05/16] arm/arm64: ITS: Introspection tests
Hi Eric, I have to admit that this is the first time I've looked into the kvm-unit-tests code, so only some minor comments inline :) On 2019/12/16 22:02, Eric Auger wrote: Detect the presence of an ITS as part of the GICv3 init routine, initialize its base address and read few registers the IIDR, the TYPER to store its dimensioning parameters. This is our first ITS test, belonging to a new "its" group. Signed-off-by: Eric Auger [...] diff --git a/lib/arm/asm/gic-v3-its.h b/lib/arm/asm/gic-v3-its.h new file mode 100644 index 000..2ce483e --- /dev/null +++ b/lib/arm/asm/gic-v3-its.h @@ -0,0 +1,116 @@ +/* + * All ITS* defines are lifted from include/linux/irqchip/arm-gic-v3.h + * + * Copyright (C) 2016, Red Hat Inc, Andrew Jones + * + * This work is licensed under the terms of the GNU LGPL, version 2. + */ +#ifndef _ASMARM_GIC_V3_ITS_H_ +#define _ASMARM_GIC_V3_ITS_H_ + +#ifndef __ASSEMBLY__ + +#define GITS_CTLR 0x +#define GITS_IIDR 0x0004 +#define GITS_TYPER 0x0008 +#define GITS_CBASER0x0080 +#define GITS_CWRITER 0x0088 +#define GITS_CREADR0x0090 +#define GITS_BASER 0x0100 + +#define GITS_TYPER_PLPIS(1UL << 0) +#define GITS_TYPER_IDBITS_SHIFT 8 +#define GITS_TYPER_DEVBITS_SHIFT13 +#define GITS_TYPER_DEVBITS(r) r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1) +#define GITS_TYPER_PTA (1UL << 19) +#define GITS_TYPER_HWCOLLCNT_SHIFT 24 + +#define GITS_CTLR_ENABLE(1U << 0) + +#define GITS_CBASER_VALID (1UL << 63) +#define GITS_CBASER_SHAREABILITY_SHIFT (10) +#define GITS_CBASER_INNER_CACHEABILITY_SHIFT(59) +#define GITS_CBASER_OUTER_CACHEABILITY_SHIFT(53) +#define GITS_CBASER_SHAREABILITY_MASK \ + GIC_BASER_SHAREABILITY(GITS_CBASER, SHAREABILITY_MASK) +#define GITS_CBASER_INNER_CACHEABILITY_MASK \ + GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, MASK) +#define GITS_CBASER_OUTER_CACHEABILITY_MASK \ + GIC_BASER_CACHEABILITY(GITS_CBASER, OUTER, MASK) +#define GITS_CBASER_CACHEABILITY_MASK GITS_CBASER_INNER_CACHEABILITY_MASK + +#define GITS_CBASER_InnerShareable \ + 
GIC_BASER_SHAREABILITY(GITS_CBASER, InnerShareable)
+
+#define GITS_CBASER_nCnB	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nCnB)
+#define GITS_CBASER_nC		GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, nC)
+#define GITS_CBASER_RaWt	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)
+#define GITS_CBASER_RaWb	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWt)

s/RaWt/RaWb/

+#define GITS_CBASER_WaWt	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWt)
+#define GITS_CBASER_WaWb	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, WaWb)
+#define GITS_CBASER_RaWaWt	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWt)
+#define GITS_CBASER_RaWaWb	GIC_BASER_CACHEABILITY(GITS_CBASER, INNER, RaWaWb)
+
+#define GITS_BASER_NR_REGS		8
+
+#define GITS_BASER_VALID		(1UL << 63)
+#define GITS_BASER_INDIRECT		(1ULL << 62)
+
+#define GITS_BASER_INNER_CACHEABILITY_SHIFT	(59)
+#define GITS_BASER_OUTER_CACHEABILITY_SHIFT	(53)
+#define GITS_BASER_CACHEABILITY_MASK		0x7
+
+#define GITS_BASER_nCnB	GIC_BASER_CACHEABILITY(GITS_BASER, INNER, nCnB)
+
+#define GITS_BASER_TYPE_SHIFT		(56)
+#define GITS_BASER_TYPE(r)		(((r) >> GITS_BASER_TYPE_SHIFT) & 7)
+#define GITS_BASER_ENTRY_SIZE_SHIFT	(48)
+#define GITS_BASER_ENTRY_SIZE(r)	((((r) >> GITS_BASER_ENTRY_SIZE_SHIFT) & 0x1f) + 1)
+#define GITS_BASER_SHAREABILITY_SHIFT	(10)
+#define GITS_BASER_InnerShareable \
+	GIC_BASER_SHAREABILITY(GITS_BASER, InnerShareable)
+#define GITS_BASER_PAGE_SIZE_SHIFT	(8)
+#define GITS_BASER_PAGE_SIZE_4K	(0UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_16K	(1UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_64K	(2UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGE_SIZE_MASK	(3UL << GITS_BASER_PAGE_SIZE_SHIFT)
+#define GITS_BASER_PAGES_MAX		256
+#define GITS_BASER_PAGES_SHIFT		(0)
+#define GITS_BASER_NR_PAGES(r)		(((r) & 0xff) + 1)
+#define GITS_BASER_PHYS_ADDR_MASK	0xF000
+
+#define GITS_BASER_TYPE_NONE		0
+#define GITS_BASER_TYPE_DEVICE		1
+#define GITS_BASER_TYPE_VCPU		2
+#define GITS_BASER_TYPE_CPU		3

'3' is one of the reserved
values of the GITS_BASER.Type field, and what do we expect with a "GITS_BASER_TYPE_CPU" table type? ;-) I think we can copy (and might update in the future) all these macros against the latest Linux kernel.
Re: [PATCH v4 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap
On Tue, Dec 17, 2019 at 05:24:46PM -0500, Peter Xu wrote: > On Tue, Dec 17, 2019 at 12:40:29PM -0800, Sean Christopherson wrote: > > Explicitly free an allocated-but-unused dirty bitmap instead of relying > > on kvm_free_memslot() if an error occurs in __kvm_set_memory_region(). > > There is no longer a need to abuse kvm_free_memslot() to free arch > > specific resources as arch specific code is now called only after the > > common flow is guaranteed to succeed. Arch code can still fail, but > > it's responsible for its own cleanup in that case. > > > > Eliminating the error path's abuse of kvm_free_memslot() paves the way > > for simplifying kvm_free_memslot(), i.e. dropping its @dont param. > > > > Signed-off-by: Sean Christopherson > > --- > > virt/kvm/kvm_main.c | 7 --- > > 1 file changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index d403e93e3028..6b2261a9e139 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -1096,7 +1096,7 @@ int __kvm_set_memory_region(struct kvm *kvm, > > > > slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); > > if (!slots) > > - goto out_free; > > + goto out_bitmap; > > memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); > > > > if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) { > > @@ -1144,8 +1144,9 @@ int __kvm_set_memory_region(struct kvm *kvm, > > if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > > slots = install_new_memslots(kvm, as_id, slots); > > kvfree(slots); > > -out_free: > > - kvm_free_memslot(kvm, &new, &old); > > +out_bitmap: > > + if (new.dirty_bitmap && !old.dirty_bitmap) > > + kvm_destroy_dirty_bitmap(&new); > > What if both the old and new have KVM_MEM_LOG_DIRTY_PAGES set? > kvm_free_memslot() did cover that but I see that you explicitly > dropped it. Could I ask why? Thanks, In that case, old.dirty_bitmap == new.dirty_bitmap, i.e. 
shouldn't be freed by this error path since doing so would result in a
use-after-free via the old memslot.

The kvm_free_memslot() logic is the same, albeit in a very twisted way.

In __kvm_set_memory_region(), @old and @new start with the same dirty_bitmap.

	new = old = *slot;

And @new is modified based on KVM_MEM_LOG_DIRTY_PAGES.  If LOG_DIRTY_PAGES
is set in both @new and @old, then both the "if" and "else if" evaluate
false, i.e. new.dirty_bitmap == old.dirty_bitmap.

	/* Allocate/free page dirty bitmap as needed */
	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
		new.dirty_bitmap = NULL;
	else if (!new.dirty_bitmap) {
		r = kvm_create_dirty_bitmap(&new);
		if (r)
			return r;
	}

Subbing "@free <= @new" and "@dont <= @old" in kvm_free_memslot()

	static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
				     struct kvm_memory_slot *dont)
	{
		if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
			kvm_destroy_dirty_bitmap(free);

yields this, since @old is obviously non-NULL

	if (new.dirty_bitmap != old.dirty_bitmap)
		kvm_destroy_dirty_bitmap(&new);

The dirty_bitmap allocation logic guarantees that new.dirty_bitmap is

  a) NULL (the "if" case)
  b) != old.dirty_bitmap iff old.dirty_bitmap == NULL (the "else if" case)
  c) == old.dirty_bitmap (the implicit "else" case).

kvm_free_memslot() frees @new.dirty_bitmap iff it's != @old.dirty_bitmap,
thus the explicit destroy only needs to check for (b).
Re: [PATCH v4 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn
On Tue, Dec 17, 2019 at 12:40:33PM -0800, Sean Christopherson wrote:
> Explicitly free the metadata arrays (stored in slot->arch) in the old
> memslot structure when moving the memslot's base gfn is committed.  This
> eliminates x86's dependency on kvm_free_memslot() being called when a
> memslot move is committed, and paves the way for removing the funky code
> in kvm_free_memslot() that conditionally frees structures based on its
> @dont param.
>
> Signed-off-by: Sean Christopherson

Reviewed-by: Peter Xu

--
Peter Xu
Re: [PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
On Tue, Dec 17, 2019 at 02:20:59PM -0800, Sean Christopherson wrote:
> > For example, I see PPC has this:
> >
> > struct kvm_arch_memory_slot {
> > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> > 	unsigned long *rmap;
> > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> > };
> >
> > I started to look into HV code of it a bit, then I see...
> >
> > - kvm_arch_create_memslot(kvmppc_core_create_memslot_hv) init
> >   slot->arch.rmap,
> > - kvm_arch_flush_shadow_memslot(kvmppc_core_flush_memslot_hv) didn't free
> >   it,
> > - kvm_arch_prepare_memory_region(kvmppc_core_prepare_memory_region_hv) is
> >   nop.
> >
> > So Does it have similar issue?
>
> No, KVM doesn't allow a memslot's size to be changed, and PPC's rmap
> allocation is directly tied to the size of the memslot.  The x86 bug exists
> because the size of its metadata arrays varies based on the alignment of
> the base gfn.

Yes, I was actually thinking those rmap would be invalid rather than
the size after the move.  But I think kvm_arch_flush_shadow_memslot()
will flush all of them anyways...  So yes it seems fine.

Thanks,

--
Peter Xu
Re: [PATCH v4 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap
On Tue, Dec 17, 2019 at 12:40:29PM -0800, Sean Christopherson wrote: > Explicitly free an allocated-but-unused dirty bitmap instead of relying > on kvm_free_memslot() if an error occurs in __kvm_set_memory_region(). > There is no longer a need to abuse kvm_free_memslot() to free arch > specific resources as arch specific code is now called only after the > common flow is guaranteed to succeed. Arch code can still fail, but > it's responsible for its own cleanup in that case. > > Eliminating the error path's abuse of kvm_free_memslot() paves the way > for simplifying kvm_free_memslot(), i.e. dropping its @dont param. > > Signed-off-by: Sean Christopherson > --- > virt/kvm/kvm_main.c | 7 --- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index d403e93e3028..6b2261a9e139 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1096,7 +1096,7 @@ int __kvm_set_memory_region(struct kvm *kvm, > > slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); > if (!slots) > - goto out_free; > + goto out_bitmap; > memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); > > if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) { > @@ -1144,8 +1144,9 @@ int __kvm_set_memory_region(struct kvm *kvm, > if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) > slots = install_new_memslots(kvm, as_id, slots); > kvfree(slots); > -out_free: > - kvm_free_memslot(kvm, &new, &old); > +out_bitmap: > + if (new.dirty_bitmap && !old.dirty_bitmap) > + kvm_destroy_dirty_bitmap(&new); What if both the old and new have KVM_MEM_LOG_DIRTY_PAGES set? kvm_free_memslot() did cover that but I see that you explicitly dropped it. Could I ask why? Thanks, -- Peter Xu ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
On Tue, Dec 17, 2019 at 04:56:40PM -0500, Peter Xu wrote: > On Tue, Dec 17, 2019 at 12:40:23PM -0800, Sean Christopherson wrote: > > Reallocate a rmap array and recalcuate large page compatibility when > > moving an existing memslot to correctly handle the alignment properties > > of the new memslot. The number of rmap entries required at each level > > is dependent on the alignment of the memslot's base gfn with respect to > > that level, e.g. moving a large-page aligned memslot so that it becomes > > unaligned will increase the number of rmap entries needed at the now > > unaligned level. ... > I think the error-prone part is: > > new = old = *slot; Lol, IMO the error-prone part is the entire memslot mess :-) > Where IMHO it would be better if we only copy pointers explicitly when > under control, rather than blindly copying all the pointers in the > structure which even contains sub-structures. Long term, yes, that would be ideal. For the immediate bug fix, reworking common KVM and other arch code would be unnecessarily dangerous and would make it more difficult to backport the fix to stable branches. I actually briefly considered moving the slot->arch handling into arch code as part of the bug fix, but the memslot code has many subtle dependencies, e.g. PPC and x86 rely on common KVM code to copy slot->arch when flags are being changed. I'll happily clean up the slot->arch code once this series is merged. There is refactoring in this series that will make it a lot easier to do additional clean up. > For example, I see PPC has this: > > struct kvm_arch_memory_slot { > #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE > unsigned long *rmap; > #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ > }; > > I started to look into HV code of it a bit, then I see... 
>
> - kvm_arch_create_memslot(kvmppc_core_create_memslot_hv) init
>   slot->arch.rmap,
> - kvm_arch_flush_shadow_memslot(kvmppc_core_flush_memslot_hv) didn't free it,
> - kvm_arch_prepare_memory_region(kvmppc_core_prepare_memory_region_hv) is
>   nop.
>
> So Does it have similar issue?

No, KVM doesn't allow a memslot's size to be changed, and PPC's rmap
allocation is directly tied to the size of the memslot.  The x86 bug exists
because the size of its metadata arrays varies based on the alignment of
the base gfn.
Re: [PATCH v4 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region()
On Tue, Dec 17, 2019 at 12:40:27PM -0800, Sean Christopherson wrote:
> Allocate the various metadata structures associated with a new memslot
> during kvm_arch_prepare_memory_region(), which paves the way for
> removing kvm_arch_create_memslot() altogether.  Moving x86's memory
> allocation only changes the order of kernel memory allocations between
> x86 and common KVM code.
>
> No functional change intended.

(I still think it's a functional change, though...)

>
> Signed-off-by: Sean Christopherson

Reviewed-by: Peter Xu

--
Peter Xu
Re: [PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
On Tue, Dec 17, 2019 at 12:40:23PM -0800, Sean Christopherson wrote: > Reallocate a rmap array and recalcuate large page compatibility when > moving an existing memslot to correctly handle the alignment properties > of the new memslot. The number of rmap entries required at each level > is dependent on the alignment of the memslot's base gfn with respect to > that level, e.g. moving a large-page aligned memslot so that it becomes > unaligned will increase the number of rmap entries needed at the now > unaligned level. > > Not updating the rmap array is the most obvious bug, as KVM accesses > garbage data beyond the end of the rmap. KVM interprets the bad data as > pointers, leading to non-canonical #GPs, unexpected #PFs, etc... > > general protection fault: [#1] SMP > CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:rmap_get_first+0x37/0x50 [kvm] > Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 > RSP: 0018:c921bbc8 EFLAGS: 00010246 > RAX: 00617461642e RBX: 00617461642e RCX: 0012 > RDX: 88827400f568 RSI: c921bbe0 RDI: 88827400f570 > RBP: 0010 R08: c921bd00 R09: c921bda8 > R10: c921bc48 R11: R12: 0030 > R13: R14: 88827427d700 R15: c921bce8 > FS: 7f7eda014700() GS:888277a0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f7ed9216ff8 CR3: 000274391003 CR4: 00162eb0 > Call Trace: >kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] >__kvm_set_memory_region.part.64+0x559/0x960 [kvm] >kvm_set_memory_region+0x45/0x60 [kvm] >kvm_vm_ioctl+0x30f/0x920 [kvm] >do_vfs_ioctl+0xa1/0x620 >ksys_ioctl+0x66/0x70 >__x64_sys_ioctl+0x16/0x20 >do_syscall_64+0x4c/0x170 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7f7ed9911f47 > Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 > RSP: 002b:7ffc00937498 EFLAGS: 0246 ORIG_RAX: 0010 > RAX: ffda RBX: 01ab0010 RCX: 7f7ed9911f47 > RDX: 01ab1350 RSI: 4020ae46 RDI: 0004 > RBP: 000a R08: R09: 
7f7ed9214700 > R10: 7f7ed92149d0 R11: 0246 R12: b000 > R13: 0003 R14: 7f7ed9215000 R15: > Modules linked in: kvm_intel kvm irqbypass > ---[ end trace 0c5f570b3358ca89 ]--- > > The disallow_lpage tracking is more subtle. Failure to update results > in KVM creating large pages when it shouldn't, either due to stale data > or again due to indexing beyond the end of the metadata arrays, which > can lead to memory corruption and/or leaking data to guest/userspace. > > Note, the arrays for the old memslot are freed by the unconditional call > to kvm_free_memslot() in __kvm_set_memory_region(). > > Fixes: 05da45583de9b ("KVM: MMU: large page support") > Cc: sta...@vger.kernel.org > Signed-off-by: Sean Christopherson Reviewed-by: Peter Xu I think the error-prone part is: new = old = *slot; Where IMHO it would be better if we only copy pointers explicitly when under control, rather than blindly copying all the pointers in the structure which even contains sub-structures. For example, I see PPC has this: struct kvm_arch_memory_slot { #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE unsigned long *rmap; #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ }; I started to look into HV code of it a bit, then I see... - kvm_arch_create_memslot(kvmppc_core_create_memslot_hv) init slot->arch.rmap, - kvm_arch_flush_shadow_memslot(kvmppc_core_flush_memslot_hv) didn't free it, - kvm_arch_prepare_memory_region(kvmppc_core_prepare_memory_region_hv) is nop. So Does it have similar issue? -- Peter Xu ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
Dropping non-x86 folks... This should be included in 5.5 if possible even though the bug has existed for over a decade. It's trivially easy for a malicious userspace to crash KVM and hang the host. Depending how userspace VMM behavior, it may even be possible to trigger from a guest. On Tue, Dec 17, 2019 at 12:40:23PM -0800, Sean Christopherson wrote: > Reallocate a rmap array and recalcuate large page compatibility when > moving an existing memslot to correctly handle the alignment properties > of the new memslot. The number of rmap entries required at each level > is dependent on the alignment of the memslot's base gfn with respect to > that level, e.g. moving a large-page aligned memslot so that it becomes > unaligned will increase the number of rmap entries needed at the now > unaligned level. > > Not updating the rmap array is the most obvious bug, as KVM accesses > garbage data beyond the end of the rmap. KVM interprets the bad data as > pointers, leading to non-canonical #GPs, unexpected #PFs, etc... 
> > general protection fault: [#1] SMP > CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:rmap_get_first+0x37/0x50 [kvm] > Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 > RSP: 0018:c921bbc8 EFLAGS: 00010246 > RAX: 00617461642e RBX: 00617461642e RCX: 0012 > RDX: 88827400f568 RSI: c921bbe0 RDI: 88827400f570 > RBP: 0010 R08: c921bd00 R09: c921bda8 > R10: c921bc48 R11: R12: 0030 > R13: R14: 88827427d700 R15: c921bce8 > FS: 7f7eda014700() GS:888277a0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 7f7ed9216ff8 CR3: 000274391003 CR4: 00162eb0 > Call Trace: >kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] >__kvm_set_memory_region.part.64+0x559/0x960 [kvm] >kvm_set_memory_region+0x45/0x60 [kvm] >kvm_vm_ioctl+0x30f/0x920 [kvm] >do_vfs_ioctl+0xa1/0x620 >ksys_ioctl+0x66/0x70 >__x64_sys_ioctl+0x16/0x20 >do_syscall_64+0x4c/0x170 >entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x7f7ed9911f47 > Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 > RSP: 002b:7ffc00937498 EFLAGS: 0246 ORIG_RAX: 0010 > RAX: ffda RBX: 01ab0010 RCX: 7f7ed9911f47 > RDX: 01ab1350 RSI: 4020ae46 RDI: 0004 > RBP: 000a R08: R09: 7f7ed9214700 > R10: 7f7ed92149d0 R11: 0246 R12: b000 > R13: 0003 R14: 7f7ed9215000 R15: > Modules linked in: kvm_intel kvm irqbypass > ---[ end trace 0c5f570b3358ca89 ]--- > > The disallow_lpage tracking is more subtle. Failure to update results > in KVM creating large pages when it shouldn't, either due to stale data > or again due to indexing beyond the end of the metadata arrays, which > can lead to memory corruption and/or leaking data to guest/userspace. > > Note, the arrays for the old memslot are freed by the unconditional call > to kvm_free_memslot() in __kvm_set_memory_region(). 
> > Fixes: 05da45583de9b ("KVM: MMU: large page support") > Cc: sta...@vger.kernel.org > Signed-off-by: Sean Christopherson > --- > arch/x86/kvm/x86.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 8bb2fb1705ff..04d1bf89da0e 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -9703,6 +9703,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct > kvm_memory_slot *slot, > { > int i; > > + /* > + * Clear out the previous array pointers for the KVM_MR_MOVE case. The > + * old arrays will be freed by __kvm_set_memory_region() if installing > + * the new memslot is successful. > + */ > + memset(&slot->arch, 0, sizeof(slot->arch)); > + > for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) { > struct kvm_lpage_info *linfo; > unsigned long ugfn; > @@ -9777,6 +9784,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, > const struct kvm_userspace_memory_region *mem, > enum kvm_mr_change change) > { > + if (change == KVM_MR_MOVE) > + return kvm_arch_create_memslot(kvm, memslot, > +mem->memory_size >> PAGE_SHIFT); > + > return 0; > } > > -- > 2.24.1 > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v4 15/19] KVM: Provide common implementation for generic dirty log functions
Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code. The arch specific implemenations are extremely similar, differing only in whether the dirty log needs to be sync'd from hardware (x86) and how the TLBs are flushed. Add new arch hooks to handle sync and TLB flush; the sync will also be used for non-generic dirty log support in a future patch (s390). The ulterior motive for providing a common implementation is to eliminate the dependency between arch and common code with respect to the memslot referenced by the dirty log, i.e. to make it obvious in the code that the validity of the memslot is guaranteed, as a future patch will rework memslot handling such that id_to_memslot() can return NULL. Acked-by: Christoffer Dall Tested-by: Christoffer Dall Signed-off-by: Sean Christopherson --- arch/mips/kvm/mips.c | 63 +++-- arch/powerpc/kvm/book3s.c | 5 +++ arch/powerpc/kvm/booke.c | 5 +++ arch/s390/kvm/kvm-s390.c | 5 +-- arch/x86/kvm/x86.c| 61 ++-- include/linux/kvm_host.h | 21 +- virt/kvm/arm/arm.c| 48 ++ virt/kvm/kvm_main.c | 84 --- 8 files changed, 103 insertions(+), 189 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 108ed14cbcac..879b1e29f106 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -965,69 +965,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, return r; } -/** - * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot - * @kvm: kvm instance - * @log: slot id and address to which we copy the log - * - * Steps 1-4 below provide general overview of dirty page logging. See - * kvm_get_dirty_log_protect() function description for additional details. - * - * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we - * always flush the TLB (step 4) even if previous step failed and the dirty - * bitmap may be corrupt. 
Regardless of previous outcome the KVM logging API - * does not preclude user space subsequent dirty log read. Flushing TLB ensures - * writes will be marked dirty for next log read. - * - * 1. Take a snapshot of the bit and clear it if needed. - * 2. Write protect the corresponding page. - * 3. Copy the snapshot to the userspace. - * 4. Flush TLB's if needed. - */ -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) +void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) { - struct kvm_memslots *slots; - struct kvm_memory_slot *memslot; - bool flush = false; - int r; - mutex_lock(&kvm->slots_lock); - - r = kvm_get_dirty_log_protect(kvm, log, &flush); - - if (flush) { - slots = kvm_memslots(kvm); - memslot = id_to_memslot(slots, log->slot); - - /* Let implementation handle TLB/GVA invalidation */ - kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot); - } - - mutex_unlock(&kvm->slots_lock); - return r; } -int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log) +void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, + struct kvm_memory_slot *memslot) { - struct kvm_memslots *slots; - struct kvm_memory_slot *memslot; - bool flush = false; - int r; - - mutex_lock(&kvm->slots_lock); - - r = kvm_clear_dirty_log_protect(kvm, log, &flush); - - if (flush) { - slots = kvm_memslots(kvm); - memslot = id_to_memslot(slots, log->slot); - - /* Let implementation handle TLB/GVA invalidation */ - kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot); - } - - mutex_unlock(&kvm->slots_lock); - return r; + /* Let implementation handle TLB/GVA invalidation */ + kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot); } long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index b1c9b4d11b2a..b117ca317c0d 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -804,6 +804,11 @@ int 
kvmppc_core_check_requests(struct kvm_vcpu *vcpu) return vcpu->kvm->arch.kvm_ops->check_requests(vcpu); } +void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot) +{ + +} + int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) { return kvm->arch.kvm_ops->get_dirty_log(kvm, log); diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index a22ff567724a..35a4ef89a1db 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1796,6 +1796,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu, return r; } +void k
[PATCH v4 19/19] KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION
Add a KVM selftest to test moving the base gfn of a userspace memory region. The test is primarily targeted at x86 to verify its memslot metadata is correctly updated, but also provides basic functionality coverage on other architectures. Signed-off-by: Sean Christopherson --- tools/testing/selftests/kvm/.gitignore| 1 + tools/testing/selftests/kvm/Makefile | 3 + .../testing/selftests/kvm/include/kvm_util.h | 1 + tools/testing/selftests/kvm/lib/kvm_util.c| 30 .../selftests/kvm/set_memory_region_test.c| 142 ++ 5 files changed, 177 insertions(+) create mode 100644 tools/testing/selftests/kvm/set_memory_region_test.c diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 30072c3f52fb..6f60ceb81440 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -17,3 +17,4 @@ /clear_dirty_log_test /dirty_log_test /kvm_create_max_vcpus +/set_memory_region_test diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 3138a916574a..01c79e02c5b7 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -29,15 +29,18 @@ TEST_GEN_PROGS_x86_64 += x86_64/xss_msr_test TEST_GEN_PROGS_x86_64 += clear_dirty_log_test TEST_GEN_PROGS_x86_64 += dirty_log_test TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus +TEST_GEN_PROGS_x86_64 += set_memory_region_test TEST_GEN_PROGS_aarch64 += clear_dirty_log_test TEST_GEN_PROGS_aarch64 += dirty_log_test TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus +TEST_GEN_PROGS_aarch64 += set_memory_region_test TEST_GEN_PROGS_s390x = s390x/memop TEST_GEN_PROGS_s390x += s390x/sync_regs_test TEST_GEN_PROGS_s390x += dirty_log_test TEST_GEN_PROGS_s390x += kvm_create_max_vcpus +TEST_GEN_PROGS_s390x += set_memory_region_test TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M)) LIBKVM += $(LIBKVM_$(UNAME_M)) diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h index 
29cccaf96baf..15d3b8690ffb 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -100,6 +100,7 @@ int _vcpu_ioctl(struct kvm_vm *vm, uint32_t vcpuid, unsigned long ioctl, void *arg); void vm_ioctl(struct kvm_vm *vm, unsigned long ioctl, void *arg); void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags); +void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa); void vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpuid); vm_vaddr_t vm_vaddr_alloc(struct kvm_vm *vm, size_t sz, vm_vaddr_t vaddr_min, uint32_t data_memslot, uint32_t pgd_memslot); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 41cf45416060..464a75ce9843 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -756,6 +756,36 @@ void vm_mem_region_set_flags(struct kvm_vm *vm, uint32_t slot, uint32_t flags) ret, errno, slot, flags); } +/* + * VM Memory Region Move + * + * Input Args: + * vm - Virtual Machine + * slot - Slot of the memory region to move + * flags - Starting guest physical address + * + * Output Args: None + * + * Return: None + * + * Change the gpa of a memory region. 
+ */ +void vm_mem_region_move(struct kvm_vm *vm, uint32_t slot, uint64_t new_gpa) +{ + struct userspace_mem_region *region; + int ret; + + region = memslot2region(vm, slot); + + region->region.guest_phys_addr = new_gpa; + + ret = ioctl(vm->fd, KVM_SET_USER_MEMORY_REGION, &region->region); + + TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION failed\n" + "ret: %i errno: %i slot: %u flags: 0x%x", + ret, errno, slot, new_gpa); +} + /* * VCPU mmap Size * diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c new file mode 100644 index ..c9603b95ccf7 --- /dev/null +++ b/tools/testing/selftests/kvm/set_memory_region_test.c @@ -0,0 +1,142 @@ +// SPDX-License-Identifier: GPL-2.0 +#define _GNU_SOURCE /* for program_invocation_short_name */ +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include +#include +#include + +#define VCPU_ID 0 + +/* + * Somewhat arbitrary location and slot, intended to not overlap anything. The + * location and size are specifically 2mb sized/aligned so that the initial + * region corresponds to exactly one large page (on x86 and arm64). + */ +#define MEM_REGION_GPA 0xc000 +#define MEM_REGION_SIZE0x20 +#define MEM_REGION_SLOT10 + +static void guest_code(void) +{ + uint64_t val; + + do { + val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA)); + } whil
[PATCH v4 09/19] KVM: Move setting of memslot into helper routine
Split out the core functionality of setting a memslot into a separate helper in preparation for moving memslot deletion into its own routine. Tested-by: Christoffer Dall Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 106 ++-- 1 file changed, 63 insertions(+), 43 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9c488c653257..3663ac229c4d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -986,6 +986,66 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, return old_memslots; } +static int kvm_set_memslot(struct kvm *kvm, + const struct kvm_userspace_memory_region *mem, + const struct kvm_memory_slot *old, + struct kvm_memory_slot *new, int as_id, + enum kvm_mr_change change) +{ + struct kvm_memory_slot *slot; + struct kvm_memslots *slots; + int r; + + slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); + if (!slots) + return -ENOMEM; + memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); + + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) { + /* +* Note, the INVALID flag needs to be in the appropriate entry +* in the freshly allocated memslots, not in @old or @new. +*/ + slot = id_to_memslot(slots, old->id); + slot->flags |= KVM_MEMSLOT_INVALID; + + /* +* We can re-use the old memslots, the only difference from the +* newly installed memslots is the invalid flag, which will get +* dropped by update_memslots anyway. We'll also revert to the +* old memslots if preparing the new memory region fails. +*/ + slots = install_new_memslots(kvm, as_id, slots); + + /* From this point no new shadow pages pointing to a deleted, +* or moved, memslot will be created. 
+* +* validation of sp->gfn happens in: +* - gfn_to_hva (kvm_read_guest, gfn_to_pfn) +* - kvm_is_visible_gfn (mmu_check_roots) +*/ + kvm_arch_flush_shadow_memslot(kvm, slot); + } + + r = kvm_arch_prepare_memory_region(kvm, new, mem, change); + if (r) + goto out_slots; + + update_memslots(slots, new, change); + slots = install_new_memslots(kvm, as_id, slots); + + kvm_arch_commit_memory_region(kvm, mem, old, new, change); + + kvfree(slots); + return 0; + +out_slots: + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) + slots = install_new_memslots(kvm, as_id, slots); + kvfree(slots); + return r; +} + /* * Allocate some memory and give it an address in the guest physical address * space. @@ -1002,7 +1062,6 @@ int __kvm_set_memory_region(struct kvm *kvm, unsigned long npages; struct kvm_memory_slot *slot; struct kvm_memory_slot old, new; - struct kvm_memslots *slots; int as_id, id; enum kvm_mr_change change; @@ -1089,58 +1148,19 @@ int __kvm_set_memory_region(struct kvm *kvm, return r; } - slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); - if (!slots) { - r = -ENOMEM; - goto out_bitmap; - } - memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); - - if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) { - slot = id_to_memslot(slots, id); - slot->flags |= KVM_MEMSLOT_INVALID; - - /* -* We can re-use the old memslots, the only difference from the -* newly installed memslots is the invalid flag, which will get -* dropped by update_memslots anyway. We'll also revert to the -* old memslots if preparing the new memory region fails. -*/ - slots = install_new_memslots(kvm, as_id, slots); - - /* From this point no new shadow pages pointing to a deleted, -* or moved, memslot will be created. 
-* -* validation of sp->gfn happens in: -* - gfn_to_hva (kvm_read_guest, gfn_to_pfn) -* - kvm_is_visible_gfn (mmu_check_roots) -*/ - kvm_arch_flush_shadow_memslot(kvm, slot); - } - - r = kvm_arch_prepare_memory_region(kvm, &new, mem, change); - if (r) - goto out_slots; - /* actual memory is freed via old in kvm_free_memslot below */ if (change == KVM_MR_DELETE) { new.dirty_bitmap = NULL;
[PATCH v4 17/19] KVM: Terminate memslot walks via used_slots
Refactor memslot handling to treat the number of used slots as the de
facto size of the memslot array, e.g. return NULL from id_to_memslot()
when an invalid index is provided instead of relying on npages==0 to
detect an invalid memslot.  Rework the sorting and walking of memslots
in advance of dynamically sizing memslots to aid bisection and debug,
e.g. with luck, a bug in the refactoring will bisect here and/or hit a
WARN instead of randomly corrupting memory.

Alternatively, a global null/invalid memslot could be returned, i.e. so
callers of id_to_memslot() don't have to explicitly check for a NULL
memslot, but that approach runs the risk of introducing
difficult-to-debug issues, e.g. if the global null slot is modified.
Constifying the return from id_to_memslot() to combat such issues is
possible, but would require a massive refactoring of arch specific code
and would still be susceptible to casting shenanigans.

Add function comments to update_memslots() and search_memslots() to
explicitly (and loudly) state how memslots are sorted.

No functional change intended.

Tested-by: Christoffer Dall Tested-by: Marc Zyngier Signed-off-by: Sean Christopherson --- arch/powerpc/kvm/book3s_hv.c | 2 +- arch/x86/kvm/x86.c | 14 +-- include/linux/kvm_host.h | 18 ++- virt/kvm/arm/mmu.c | 9 +- virt/kvm/kvm_main.c | 220 ++- 5 files changed, 189 insertions(+), 74 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index 04d5b7cf874f..c15cabb58e09 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4410,7 +4410,7 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm, slots = kvm_memslots(kvm); memslot = id_to_memslot(slots, log->slot); r = -ENOENT; - if (!memslot->dirty_bitmap) + if (!memslot || !memslot->dirty_bitmap) goto out; /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 87fca25d5217..39ba1de85575 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9532,9 +9532,9 @@ void kvm_arch_sync_events(struct kvm *kvm) int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) { int i, r; - unsigned long hva; + unsigned long hva, uninitialized_var(old_npages); struct kvm_memslots *slots = kvm_memslots(kvm); - struct kvm_memory_slot *slot, old; + struct kvm_memory_slot *slot; /* Called with kvm->slots_lock held. 
*/ if (WARN_ON(id >= KVM_MEM_SLOTS_NUM)) @@ -9542,7 +9542,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) slot = id_to_memslot(slots, id); if (size) { - if (slot->npages) + if (slot && slot->npages) return -EEXIST; /* @@ -9554,13 +9554,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) if (IS_ERR((void *)hva)) return PTR_ERR((void *)hva); } else { - if (!slot->npages) + if (!slot || !slot->npages) return 0; - hva = 0; + hva = slot->userspace_addr; + old_npages = slot->npages; } - old = *slot; for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { struct kvm_userspace_memory_region m; @@ -9575,7 +9575,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size) } if (!size) - vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE); + vm_munmap(hva, old_npages * PAGE_SIZE); return 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 7d666eedd203..49b6b457a157 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -574,10 +574,11 @@ static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu) return vcpu->vcpu_idx; } -#define kvm_for_each_memslot(memslot, slots) \ - for (memslot = &slots->memslots[0]; \ - memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\ - memslot++) +#define kvm_for_each_memslot(memslot, slots) \ + for (memslot = &slots->memslots[0]; \ +memslot < slots->memslots + slots->used_slots; memslot++) \ + if (WARN_ON_ONCE(!memslot->npages)) { \ + } else int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id); void kvm_vcpu_uninit(struct kvm_vcpu *vcpu); @@ -638,12 +639,15 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu) return __kvm_memslots(vcpu->kvm, as_id); } -static inline struct kvm_memory_slot * -id_to_memslot(struct kvm_memslots *slots, int id) +static inline +struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int i
[PATCH v4 10/19] KVM: Drop "const" attribute from old memslot in commit_memory_region()
Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param. Signed-off-by: Sean Christopherson --- arch/mips/kvm/mips.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- arch/s390/kvm/kvm-s390.c | 2 +- arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 2 +- virt/kvm/arm/mmu.c | 2 +- virt/kvm/kvm_main.c| 2 +- 7 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 713e5465edb0..108ed14cbcac 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -224,7 +224,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct kvm_memory_slot *new, enum kvm_mr_change change) { diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index c922711a6dd8..6fd61b9dd783 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -701,7 +701,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct kvm_memory_slot *new, enum kvm_mr_change change) { diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 1be45bad7849..a5b917b72ca0 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4516,7 +4516,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct 
kvm_memory_slot *new, enum kvm_mr_change change) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4892ded361b3..0911b2f634c5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9842,7 +9842,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm, void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct kvm_memory_slot *new, enum kvm_mr_change change) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 46dd713da634..7d86dbb467f7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -681,7 +681,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm, enum kvm_mr_change change); void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct kvm_memory_slot *new, enum kvm_mr_change change); bool kvm_largepages_enabled(void); diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index f264de85f648..4941746929ab 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -2246,7 +2246,7 @@ int kvm_mmu_init(void) void kvm_arch_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old, const struct kvm_memory_slot *new, enum kvm_mr_change change) { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3663ac229c4d..acf52fa16500 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -988,7 +988,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, static int kvm_set_memslot(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, - const struct kvm_memory_slot *old, + struct kvm_memory_slot *old,
[PATCH v4 13/19] KVM: Simplify kvm_free_memslot() and all its descendents
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall Signed-off-by: Sean Christopherson --- arch/mips/include/asm/kvm_host.h | 2 +- arch/powerpc/include/asm/kvm_ppc.h| 6 ++ arch/powerpc/kvm/book3s.c | 5 ++--- arch/powerpc/kvm/book3s_hv.c | 9 +++-- arch/powerpc/kvm/book3s_pr.c | 3 +-- arch/powerpc/kvm/booke.c | 3 +-- arch/powerpc/kvm/powerpc.c| 5 ++--- arch/s390/include/asm/kvm_host.h | 2 +- arch/x86/include/asm/kvm_page_track.h | 3 +-- arch/x86/kvm/mmu/page_track.c | 15 ++- arch/x86/kvm/x86.c| 21 - include/linux/kvm_host.h | 3 +-- virt/kvm/arm/mmu.c| 3 +-- virt/kvm/kvm_main.c | 18 +++--- 14 files changed, 37 insertions(+), 61 deletions(-) diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h index 41204a49cf95..2c343c346b79 100644 --- a/arch/mips/include/asm/kvm_host.h +++ b/arch/mips/include/asm/kvm_host.h @@ -1133,7 +1133,7 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm); static inline void kvm_arch_hardware_unsetup(void) {} static inline void kvm_arch_sync_events(struct kvm *kvm) {} static inline void kvm_arch_free_memslot(struct kvm *kvm, - struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {} +struct kvm_memory_slot *slot) {} static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {} static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {} diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 4df042355356..033501d65340 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -201,8 +201,7 @@ extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages); extern int kvmppc_core_init_vm(struct kvm *kvm); extern void kvmppc_core_destroy_vm(struct kvm *kvm); extern void 
kvmppc_core_free_memslot(struct kvm *kvm, -struct kvm_memory_slot *free, -struct kvm_memory_slot *dont); +struct kvm_memory_slot *slot); extern int kvmppc_core_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, const struct kvm_userspace_memory_region *mem, @@ -292,8 +291,7 @@ struct kvmppc_ops { int (*test_age_hva)(struct kvm *kvm, unsigned long hva); void (*set_spte_hva)(struct kvm *kvm, unsigned long hva, pte_t pte); void (*mmu_destroy)(struct kvm_vcpu *vcpu); - void (*free_memslot)(struct kvm_memory_slot *free, -struct kvm_memory_slot *dont); + void (*free_memslot)(struct kvm_memory_slot *slot); int (*init_vm)(struct kvm *kvm); void (*destroy_vm)(struct kvm *kvm); int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 2480150646ae..b1c9b4d11b2a 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -809,10 +809,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log) return kvm->arch.kvm_ops->get_dirty_log(kvm, log); } -void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, - struct kvm_memory_slot *dont) +void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot) { - kvm->arch.kvm_ops->free_memslot(free, dont); + kvm->arch.kvm_ops->free_memslot(slot); } void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index b265d7511f3f..04d5b7cf874f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4457,13 +4457,10 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm, return r; } -static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free, - struct kvm_memory_slot *dont) +static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *slot) { - if (!dont || free->arch.rmap != dont->arch.rmap) { - vfree(free->arch.rmap); - free->arch.rmap 
= NULL; - } + vfree(slot->arch.rmap); + slot->arch.rmap = NULL; } static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm, diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index fc0d9432013f..d2ee00bc7077 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/b
[PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot
Reallocate a rmap array and recalculate large page compatibility when
moving an existing memslot to correctly handle the alignment properties
of the new memslot.  The number of rmap entries required at each level
is dependent on the alignment of the memslot's base gfn with respect to
that level, e.g. moving a large-page aligned memslot so that it becomes
unaligned will increase the number of rmap entries needed at the now
unaligned level.

Not updating the rmap array is the most obvious bug, as KVM accesses
garbage data beyond the end of the rmap.  KVM interprets the bad data as
pointers, leading to non-canonical #GPs, unexpected #PFs, etc...

  general protection fault: [#1] SMP
  CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:rmap_get_first+0x37/0x50 [kvm]
  Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3
  RSP: 0018:c921bbc8 EFLAGS: 00010246
  RAX: 00617461642e RBX: 00617461642e RCX: 0012
  RDX: 88827400f568 RSI: c921bbe0 RDI: 88827400f570
  RBP: 0010 R08: c921bd00 R09: c921bda8
  R10: c921bc48 R11: R12: 0030
  R13: R14: 88827427d700 R15: c921bce8
  FS: 7f7eda014700() GS:888277a0() knlGS:
  CS: 0010 DS: ES: CR0: 80050033
  CR2: 7f7ed9216ff8 CR3: 000274391003 CR4: 00162eb0
  Call Trace:
   kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm]
   __kvm_set_memory_region.part.64+0x559/0x960 [kvm]
   kvm_set_memory_region+0x45/0x60 [kvm]
   kvm_vm_ioctl+0x30f/0x920 [kvm]
   do_vfs_ioctl+0xa1/0x620
   ksys_ioctl+0x66/0x70
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x4c/0x170
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f7ed9911f47
  Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48
  RSP: 002b:7ffc00937498 EFLAGS: 0246 ORIG_RAX: 0010
  RAX: ffda RBX: 01ab0010 RCX: 7f7ed9911f47
  RDX: 01ab1350 RSI: 4020ae46 RDI: 0004
  RBP: 000a R08: R09: 7f7ed9214700
  R10: 7f7ed92149d0 R11: 0246 R12: b000
  R13: 0003 R14: 7f7ed9215000 R15:
  Modules linked in: kvm_intel kvm irqbypass
  ---[ end trace 0c5f570b3358ca89 ]---

The disallow_lpage tracking is more subtle.  Failure to update results
in KVM creating large pages when it shouldn't, either due to stale data
or again due to indexing beyond the end of the metadata arrays, which
can lead to memory corruption and/or leaking data to guest/userspace.

Note, the arrays for the old memslot are freed by the unconditional call
to kvm_free_memslot() in __kvm_set_memory_region().

Fixes: 05da45583de9b ("KVM: MMU: large page support")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8bb2fb1705ff..04d1bf89da0e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9703,6 +9703,13 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 {
 	int i;
 
+	/*
+	 * Clear out the previous array pointers for the KVM_MR_MOVE case.  The
+	 * old arrays will be freed by __kvm_set_memory_region() if installing
+	 * the new memslot is successful.
+	 */
+	memset(&slot->arch, 0, sizeof(slot->arch));
+
 	for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
 		unsigned long ugfn;
@@ -9777,6 +9784,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
+	if (change == KVM_MR_MOVE)
+		return kvm_arch_create_memslot(kvm, memslot,
+					       mem->memory_size >> PAGE_SHIFT);
+
 	return 0;
 }
 
-- 
2.24.1
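
The alignment argument in the changelog above can be made concrete with a small userspace sketch. It assumes the x86 layout of 9 gfn bits per paging level (4KB, 2MB, 1GB), and the helper names hpage_gfn_shift() and entries_at_level() are made up for illustration; only the index arithmetic is meant to mirror what KVM's gfn_to_index() computes. It is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption for this sketch: level 1 is a 4KB page, level 2 is 2MB,
 * level 3 is 1GB, i.e. each level covers 9 more gfn bits (x86 layout).
 */
static unsigned int hpage_gfn_shift(int level)
{
	return (unsigned int)(level - 1) * 9;
}

/* Index of @gfn within a slot starting at @base_gfn, at @level. */
static uint64_t gfn_to_index(uint64_t gfn, uint64_t base_gfn, int level)
{
	return (gfn >> hpage_gfn_shift(level)) -
	       (base_gfn >> hpage_gfn_shift(level));
}

/*
 * Hypothetical helper: number of per-level tracking entries needed for
 * a slot of @npages pages that starts at @base_gfn.
 */
static uint64_t entries_at_level(uint64_t base_gfn, uint64_t npages, int level)
{
	return gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
}
```

With a 2MB-aligned base (gfn 0x200), a 512-page slot needs a single entry at the 2MB level. Moved by one page to gfn 0x201, the same slot straddles two 2MB frames and needs two entries, so an array sized for the old location is one entry too short.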
[PATCH v4 16/19] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
Rework kvm_get_dirty_log() so that it "returns" the associated memslot on success. A future patch will rework memslot handling such that id_to_memslot() can return NULL, returning the memslot makes it more obvious that the validity of the memslot has been verified, i.e. precludes the need to add validity checks in the arch code that are technically unnecessary. Signed-off-by: Sean Christopherson --- arch/powerpc/kvm/book3s_pr.c | 6 +- arch/s390/kvm/kvm-s390.c | 12 ++-- include/linux/kvm_host.h | 2 +- virt/kvm/kvm_main.c | 27 +++ 4 files changed, 23 insertions(+), 24 deletions(-) diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c index d2ee00bc7077..485ca134a949 100644 --- a/arch/powerpc/kvm/book3s_pr.c +++ b/arch/powerpc/kvm/book3s_pr.c @@ -1897,7 +1897,6 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm, struct kvm_dirty_log *log) { - struct kvm_memslots *slots; struct kvm_memory_slot *memslot; struct kvm_vcpu *vcpu; ulong ga, ga_end; @@ -1907,15 +1906,12 @@ static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm, mutex_lock(&kvm->slots_lock); - r = kvm_get_dirty_log(kvm, log, &is_dirty); + r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot); if (r) goto out; /* If nothing is dirty, don't bother messing with page tables. 
*/ if (is_dirty) { - slots = kvm_memslots(kvm); - memslot = id_to_memslot(slots, log->slot); - ga = memslot->base_gfn << PAGE_SHIFT; ga_end = ga + (memslot->npages << PAGE_SHIFT); diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 9e38973fd2cc..b0f5a3b7cb01 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -610,9 +610,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, { int r; unsigned long n; - struct kvm_memslots *slots; struct kvm_memory_slot *memslot; - int is_dirty = 0; + int is_dirty; if (kvm_is_ucontrol(kvm)) return -EINVAL; @@ -623,14 +622,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, if (log->slot >= KVM_USER_MEM_SLOTS) goto out; - slots = kvm_memslots(kvm); - memslot = id_to_memslot(slots, log->slot); - r = -ENOENT; - if (!memslot->dirty_bitmap) - goto out; - - kvm_arch_sync_dirty_log(kvm, memslot); - r = kvm_get_dirty_log(kvm, log, &is_dirty); + r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot); if (r) goto out; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index eb54d196c0cb..7d666eedd203 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -828,7 +828,7 @@ void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, #else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log); int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log, - int *is_dirty); + int *is_dirty, struct kvm_memory_slot **memslot); #endif int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4a4b5339f229..999a2a0c83f5 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1202,31 +1202,42 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm, } #ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT -int kvm_get_dirty_log(struct kvm *kvm, - struct kvm_dirty_log *log, int *is_dirty) +/** + * kvm_get_dirty_log - get a snapshot of dirty 
pages + * @kvm: pointer to kvm instance + * @log: slot id and address to which we copy the log + * @is_dirty: set to '1' if any dirty pages were found + * @memslot: set to the associated memslot, always valid on success + */ +int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log, + int *is_dirty, struct kvm_memory_slot **memslot) { struct kvm_memslots *slots; - struct kvm_memory_slot *memslot; int i, as_id, id; unsigned long n; unsigned long any = 0; + *memslot = NULL; + *is_dirty = 0; + as_id = log->slot >> 16; id = (u16)log->slot; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) return -EINVAL; slots = __kvm_memslots(kvm, as_id); - memslot = id_to_memslot(slots, id); - if (!memslot->dirty_bitmap) + *memslot = id_to_memslot(slots, id); + if (!(*memslot)->dirty_bitmap) return -ENOENT; - n = kvm_dirty_bitmap_bytes(memslot); + kvm_arch_sync_dirty_log(kvm, *memslot); + + n = kvm_dirty_bitmap_bytes
[PATCH v4 03/19] KVM: Don't free new memslot if allocation of said memslot fails
The two implementations of kvm_arch_create_memslot() in x86 and PPC are
both good citizens and free up all local resources if creation fails.
Return immediately (via a superfluous goto) instead of calling
kvm_free_memslot().

Note, the call to kvm_free_memslot() is effectively an expensive nop in
this case as there are no resources to be freed.

No functional change intended.

Acked-by: Christoffer Dall
Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f14bde936c09..7239e3b9dda0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1092,7 +1092,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		new.userspace_addr = mem->userspace_addr;
 
 		if (kvm_arch_create_memslot(kvm, &new, npages))
-			goto out_free;
+			goto out;
 	}
 
 	/* Allocate page dirty bitmap if needed */
-- 
2.24.1
[PATCH v4 18/19] KVM: Dynamically size memslot array based on number of used slots
Now that the memslot logic doesn't assume memslots are always non-NULL,
dynamically size the array of memslots instead of unconditionally
allocating memory for the maximum number of memslots.

Note, because a to-be-deleted memslot must first be invalidated, the
array size cannot be immediately reduced when deleting a memslot.
However, consecutive deletions will realize the memory savings, i.e.
a second deletion will trim the entry.

Tested-by: Christoffer Dall
Tested-by: Marc Zyngier
Signed-off-by: Sean Christopherson
---
 include/linux/kvm_host.h | 2 +-
 virt/kvm/kvm_main.c | 31 ---
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 49b6b457a157..eecfa1fe0fc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -433,11 +433,11 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
  */
 struct kvm_memslots {
 	u64 generation;
-	struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
 	/* The mapping table from slot id to the index in memslots[]. */
 	short id_to_index[KVM_MEM_SLOTS_NUM];
 	atomic_t lru_slot;
 	int used_slots;
+	struct kvm_memory_slot memslots[];
 };
 
 struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a1566c5cee26..57926579551b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -569,7 +569,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
 		return NULL;
 
 	for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
-		slots->id_to_index[i] = slots->memslots[i].id = -1;
+		slots->id_to_index[i] = -1;
 
 	return slots;
 }
@@ -1081,6 +1081,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	return old_memslots;
 }
 
+/*
+ * Note, at a minimum, the current number of used slots must be allocated, even
+ * when deleting a memslot, as we need a complete duplicate of the memslots for
+ * use when invalidating a memslot prior to deleting/moving the memslot.
+ */
+static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
+					     enum kvm_mr_change change)
+{
+	struct kvm_memslots *slots;
+	size_t old_size, new_size;
+
+	old_size = sizeof(struct kvm_memslots) +
+		   (sizeof(struct kvm_memory_slot) * old->used_slots);
+
+	if (change == KVM_MR_CREATE)
+		new_size = old_size + sizeof(struct kvm_memory_slot);
+	else
+		new_size = old_size;
+
+	slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
+	if (likely(slots))
+		memcpy(slots, old, old_size);
+
+	return slots;
+}
+
 static int kvm_set_memslot(struct kvm *kvm,
 			   const struct kvm_userspace_memory_region *mem,
 			   struct kvm_memory_slot *old,
@@ -1091,10 +1117,9 @@ static int kvm_set_memslot(struct kvm *kvm,
 	struct kvm_memslots *slots;
 	int r;
 
-	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
+	slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
 	if (!slots)
 		return -ENOMEM;
-	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
 		/*
-- 
2.24.1
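
For readers less familiar with the flexible-array pattern this patch relies on, here is a userspace sketch of the same sizing logic. Everything in it is a stand-in: struct slot, struct slot_table, enum change, table_size() and dup_table() are hypothetical names, and calloc() takes the place of kvzalloc(). It only illustrates "copy the used entries, leave room for one more on create", not the kernel implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for kvm_memory_slot / kvm_memslots (hypothetical names). */
struct slot {
	int id;
	unsigned long npages;
};

struct slot_table {
	int used_slots;
	struct slot slots[];	/* flexible array member, must be last */
};

enum change { CHANGE_CREATE, CHANGE_DELETE, CHANGE_MOVE, CHANGE_FLAGS_ONLY };

/* Bytes needed for a table header plus @nslots entries. */
static size_t table_size(int nslots)
{
	return sizeof(struct slot_table) + sizeof(struct slot) * (size_t)nslots;
}

/*
 * Duplicate @old.  Only a create needs an extra entry; delete and move
 * still need a full copy because the slot is invalidated in the
 * duplicate before it is removed or relocated.
 */
static struct slot_table *dup_table(const struct slot_table *old,
				    enum change change)
{
	size_t old_size = table_size(old->used_slots);
	size_t new_size = old_size;
	struct slot_table *copy;

	if (change == CHANGE_CREATE)
		new_size += sizeof(struct slot);

	copy = calloc(1, new_size);	/* zeroed, like kvzalloc() */
	if (copy)
		memcpy(copy, old, old_size);

	return copy;
}
```

Note that used_slots is copied unchanged; in this sketch, as in the patch, bumping the count is left to whoever actually inserts the new entry.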
[PATCH v4 02/19] KVM: Reinstall old memslots if arch preparation fails
Reinstall the old memslots if preparing the new memory region fails after invalidating a to-be-{re}moved memslot. Remove the superfluous 'old_memslots' variable so that it's somewhat clear that the error handling path needs to free the unused memslots, not simply the 'old' memslots. Fixes: bc6678a33d9b9 ("KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU update") Reviewed-by: Christoffer Dall Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3aa21bec028d..f14bde936c09 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1002,7 +1002,7 @@ int __kvm_set_memory_region(struct kvm *kvm, unsigned long npages; struct kvm_memory_slot *slot; struct kvm_memory_slot old, new; - struct kvm_memslots *slots = NULL, *old_memslots; + struct kvm_memslots *slots; int as_id, id; enum kvm_mr_change change; @@ -1110,7 +1110,13 @@ int __kvm_set_memory_region(struct kvm *kvm, slot = id_to_memslot(slots, id); slot->flags |= KVM_MEMSLOT_INVALID; - old_memslots = install_new_memslots(kvm, as_id, slots); + /* +* We can re-use the old memslots, the only difference from the +* newly installed memslots is the invalid flag, which will get +* dropped by update_memslots anyway. We'll also revert to the +* old memslots if preparing the new memory region fails. +*/ + slots = install_new_memslots(kvm, as_id, slots); /* From this point no new shadow pages pointing to a deleted, * or moved, memslot will be created. @@ -1120,13 +1126,6 @@ int __kvm_set_memory_region(struct kvm *kvm, * - kvm_is_visible_gfn (mmu_check_roots) */ kvm_arch_flush_shadow_memslot(kvm, slot); - - /* -* We can re-use the old_memslots from above, the only difference -* from the currently installed memslots is the invalid flag. This -* will get overwritten by update_memslots anyway. 
-*/ - slots = old_memslots; } r = kvm_arch_prepare_memory_region(kvm, &new, mem, change); @@ -1140,15 +1139,17 @@ int __kvm_set_memory_region(struct kvm *kvm, } update_memslots(slots, &new, change); - old_memslots = install_new_memslots(kvm, as_id, slots); + slots = install_new_memslots(kvm, as_id, slots); kvm_arch_commit_memory_region(kvm, mem, &old, &new, change); kvm_free_memslot(kvm, &old, &new); - kvfree(old_memslots); + kvfree(slots); return 0; out_slots: + if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) + slots = install_new_memslots(kvm, as_id, slots); kvfree(slots); out_free: kvm_free_memslot(kvm, &new, &old); -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
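The swap-and-rollback flow in this patch may be easier to see in a stripped-down form. The sketch below is a single-threaded userspace illustration, not the KVM code: struct table, active, install(), update() and the two prepare callbacks are all hypothetical names, and the SRCU synchronization that install_new_memslots() relies on is deliberately left out. The point it tries to show is that install() always hands back the table that is no longer installed, so both the success path and the error path end with "install, then free whatever came back".

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kvm_memslots. */
struct table {
	int invalid;
	int value;
};

/* Hypothetical: the currently installed table. */
struct table *active;

/* Install 'next' and hand back the table that was installed before. */
struct table *install(struct table *next)
{
	struct table *prev = active;

	active = next;
	return prev;
}

/* Hypothetical arch "prepare" callbacks: one succeeds, one fails. */
int prepare_ok(int value)
{
	(void)value;
	return 0;
}

int prepare_fail(int value)
{
	(void)value;
	return -ENOMEM;
}

/*
 * 'work' is a private copy of the active table.  Publish it with the
 * invalid flag set, run the prepare step, then either commit the change
 * into the original table or put the original back untouched.
 */
int update(struct table *work, int value, int (*prepare)(int))
{
	int r;

	work->invalid = 1;
	work = install(work);	/* 'work' now points at the original table */

	r = prepare(value);
	if (r) {
		work = install(work);	/* reinstall the original */
		free(work);		/* free the invalidated copy */
		return r;
	}

	work->value = value;	/* reuse the original as the final table */
	work = install(work);
	free(work);		/* free the invalidated copy */
	return 0;
}
```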
[PATCH v4 08/19] KVM: Refactor error handling for setting memory region
Replace a big pile o' gotos with returns to make it more obvious what error code is being returned, and to prepare for refactoring the functional, i.e. post-checks, portion of __kvm_set_memory_region(). Reviewed-by: Janosch Frank Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 40 ++-- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6b2261a9e139..9c488c653257 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1008,34 +1008,33 @@ int __kvm_set_memory_region(struct kvm *kvm, r = check_memory_region_flags(mem); if (r) - goto out; + return r; - r = -EINVAL; as_id = mem->slot >> 16; id = (u16)mem->slot; /* General sanity checks */ if (mem->memory_size & (PAGE_SIZE - 1)) - goto out; + return -EINVAL; if (mem->guest_phys_addr & (PAGE_SIZE - 1)) - goto out; + return -EINVAL; /* We can read the guest memory with __xxx_user() later on. */ if ((id < KVM_USER_MEM_SLOTS) && ((mem->userspace_addr & (PAGE_SIZE - 1)) || !access_ok((void __user *)(unsigned long)mem->userspace_addr, mem->memory_size))) - goto out; + return -EINVAL; if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM) - goto out; + return -EINVAL; if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) - goto out; + return -EINVAL; slot = id_to_memslot(__kvm_memslots(kvm, as_id), id); base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; npages = mem->memory_size >> PAGE_SHIFT; if (npages > KVM_MEM_MAX_NR_PAGES) - goto out; + return -EINVAL; new = old = *slot; @@ -1052,20 +1051,18 @@ int __kvm_set_memory_region(struct kvm *kvm, if ((new.userspace_addr != old.userspace_addr) || (npages != old.npages) || ((new.flags ^ old.flags) & KVM_MEM_READONLY)) - goto out; + return -EINVAL; if (base_gfn != old.base_gfn) change = KVM_MR_MOVE; else if (new.flags != old.flags) change = KVM_MR_FLAGS_ONLY; - else { /* Nothing to change. */ - r = 0; - goto out; - } + else /* Nothing to change. 
*/ + return 0; } } else { if (!old.npages) - goto out; + return -EINVAL; change = KVM_MR_DELETE; new.base_gfn = 0; @@ -1074,29 +1071,29 @@ int __kvm_set_memory_region(struct kvm *kvm, if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) { /* Check for overlaps */ - r = -EEXIST; kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) { if (slot->id == id) continue; if (!((base_gfn + npages <= slot->base_gfn) || (base_gfn >= slot->base_gfn + slot->npages))) - goto out; + return -EEXIST; } } - r = -ENOMEM; - /* Allocate/free page dirty bitmap as needed */ if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES)) new.dirty_bitmap = NULL; else if (!new.dirty_bitmap) { - if (kvm_create_dirty_bitmap(&new) < 0) - goto out; + r = kvm_create_dirty_bitmap(&new); + if (r) + return r; } slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); - if (!slots) + if (!slots) { + r = -ENOMEM; goto out_bitmap; + } memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots)); if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) { @@ -1147,7 +1144,6 @@ int __kvm_set_memory_region(struct kvm *kvm, out_bitmap: if (new.dirty_bitmap && !old.dirty_bitmap) kvm_destroy_dirty_bitmap(&new); -out: return r; } EXPORT_SYMBOL_GPL(__kvm_set_memory_region); -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
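As a rough illustration of the "return instead of goto" style this patch moves to, here is a userspace sketch of three of the sanity checks. It is an approximation under assumptions: struct region, check_region() and SKETCH_PAGE_SIZE (fixed at 4096) are hypothetical, and only the alignment and wrap-around checks are modelled. Each failing check names its error code at the point of failure, so there is no shared 'r' to trace back.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption for the sketch: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical subset of struct kvm_userspace_memory_region. */
struct region {
	uint64_t guest_phys_addr;
	uint64_t memory_size;
};

/* Returns 0 if the region passes the checks, -EINVAL otherwise. */
int check_region(const struct region *mem)
{
	/* Size and guest physical address must be page aligned. */
	if (mem->memory_size & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	if (mem->guest_phys_addr & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;

	/* Reject ranges that wrap around the end of the address space. */
	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
		return -EINVAL;

	return 0;
}
```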
[PATCH v4 00/19] KVM: Dynamically size memslot arrays
The end goal of this series is to dynamically size the memslot array so
that KVM allocates memory based on the number of memslots in use, as
opposed to unconditionally allocating memory for the maximum number of
memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
the memory footprint from 90k to ~2.6k bytes.

The changes required to support dynamic sizing are relatively small,
e.g. are essentially contained in patches 17/19 and 18/19.

Patches 2-16 clean up the memslot code, which has gotten quite crusty,
especially __kvm_set_memory_region(). The clean up is likely not strictly
necessary to switch to dynamic sizing, but I didn't have a remotely
reasonable level of confidence in the correctness of the dynamic sizing
without first doing the clean up.

The only functional change in v4 is the addition of an x86-specific bug
fix in x86's handling of KVM_MR_MOVE. The bug fix is not directly related
to dynamically allocating memslots, but it has subtle and hidden conflicts
with the cleanup patches, and the fix is higher priority than anything
else in the series, i.e. should be merged first.

On non-x86 architectures, v3 and v4 should be functionally equivalent,
the only non-x86 change in v4 is the dropping of a "const" in
kvm_arch_commit_memory_region().

v4:
  - Add patch 01 to fix an x86 rmap/lpage bug, and patches 10 and 11 to
    resolve hidden conflicts with the bug fix.
  - Collect tags [Christian, Marc, Philippe].
  - Rebase to kvm/queue, commit e41a90be9659 ("KVM: x86/mmu: WARN if
    root_hpa is invalid when handling a page fault").

v3:
  - Fix build errors on PPC and MIPS due to missed params during
    refactoring [kbuild test robot].
  - Rename the helpers for update_memslots() and add comments describing
    the new algorithm and how it interacts with searching [Paolo].
  - Remove the unnecessary and obnoxious warning regarding memslots being
    a flexible array [Paolo].
  - Fix typos in the changelog of patch 09/15 [Christoffer].
  - Collect tags [Christoffer].

v2:
  - Split "Drop kvm_arch_create_memslot()" into three patches to move
    minor functional changes to standalone patches [Janosch].
  - Rebase to latest kvm/queue (f0574a1cea5b, "KVM: x86: fix ...")
  - Collect an Acked-by and a Reviewed-by

Sean Christopherson (19):
  KVM: x86: Allocate new rmap and large page tracking when moving memslot
  KVM: Reinstall old memslots if arch preparation fails
  KVM: Don't free new memslot if allocation of said memslot fails
  KVM: PPC: Move memslot memory allocation into prepare_memory_region()
  KVM: x86: Allocate memslot resources during prepare_memory_region()
  KVM: Drop kvm_arch_create_memslot()
  KVM: Explicitly free allocated-but-unused dirty bitmap
  KVM: Refactor error handling for setting memory region
  KVM: Move setting of memslot into helper routine
  KVM: Drop "const" attribute from old memslot in commit_memory_region()
  KVM: x86: Free arrays for old memslot when moving memslot's base gfn
  KVM: Move memslot deletion to helper function
  KVM: Simplify kvm_free_memslot() and all its descendents
  KVM: Clean up local variable usage in __kvm_set_memory_region()
  KVM: Provide common implementation for generic dirty log functions
  KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
  KVM: Terminate memslot walks via used_slots
  KVM: Dynamically size memslot array based on number of used slots
  KVM: selftests: Add test for KVM_SET_USER_MEMORY_REGION

 arch/mips/include/asm/kvm_host.h              |   2
 arch/mips/kvm/mips.c                          |  71
 arch/powerpc/include/asm/kvm_ppc.h            |  17
 arch/powerpc/kvm/book3s.c                     |  22
 arch/powerpc/kvm/book3s_hv.c                  |  36
 arch/powerpc/kvm/book3s_pr.c                  |  20
 arch/powerpc/kvm/booke.c                      |  17
 arch/powerpc/kvm/powerpc.c                    |  15
 arch/s390/include/asm/kvm_host.h              |   2
 arch/s390/kvm/kvm-s390.c                      |  23
 arch/x86/include/asm/kvm_page_track.h         |   3
 arch/x86/kvm/mmu/page_track.c                 |  15
 arch/x86/kvm/x86.c                            | 114
 include/linux/kvm_host.h                      |  48
 tools/testing/selftests/kvm/.gitignore        |   1
 tools/testing/selftests/kvm/Makefile          |   3
 .../testing/selftests/kvm/include/kvm_util.h  |   1
 tools/testing/selftests/kvm/lib/kvm_util.c    |  30
 .../selftests/kvm/set_memory_region_test.c    | 142
 virt/kvm/arm/arm.c                            |  48
 virt/kvm/arm/mmu.c                            |  20
 virt/kvm/kvm_main.c                           | 621
 22 files changed, 736 insertions(+), 535 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/set_memory_region_test.c

-- 2
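The footprint figures quoted in the cover letter can be checked with a few lines of arithmetic. The helper below is purely illustrative (memslot_bytes() is a hypothetical name, and it counts only the per-slot array, not any fixed header in struct kvm_memslots): 2 address spaces of 512 slots at 88 bytes each comes to 90112 bytes, and 30 slots at 88 bytes comes to 2640 bytes, which matches the "~90k" and "~2.6k" in the text.

```c
#include <stddef.h>

/*
 * Bytes consumed by the memslot array alone, given the size of one
 * memslot and the number of memslots allocated.
 */
size_t memslot_bytes(size_t slot_size, size_t nr_slots)
{
	return slot_size * nr_slots;
}
```

With the x86 numbers from the cover letter, memslot_bytes(88, 2 * 512) gives the static footprint and memslot_bytes(88, 30) gives the dynamically sized one.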
[PATCH v4 11/19] KVM: x86: Free arrays for old memslot when moving memslot's base gfn
Explicitly free the metadata arrays (stored in slot->arch) in the old
memslot structure when moving the memslot's base gfn is committed. This
eliminates x86's dependency on kvm_free_memslot() being called when a
memslot move is committed, and paves the way for removing the funky code
in kvm_free_memslot() that conditionally frees structures based on its
@dont param.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0911b2f634c5..5f890812fac3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9884,6 +9884,10 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
 	 */
 	if (change != KVM_MR_DELETE)
 		kvm_mmu_slot_apply_flags(kvm, (struct kvm_memory_slot *) new);
+
+	/* Free the arrays associated with the old memslot. */
+	if (change == KVM_MR_MOVE)
+		kvm_arch_free_memslot(kvm, old, NULL);
 }
 
 void kvm_arch_flush_shadow_all(struct kvm *kvm)
--
2.24.1
[PATCH v4 04/19] KVM: PPC: Move memslot memory allocation into prepare_memory_region()
Allocate the rmap array during kvm_arch_prepare_memory_region() to pave the way for removing kvm_arch_create_memslot() altogether. Moving PPC's memory allocation only changes the order of kernel memory allocations between PPC and common KVM code. No functional change intended. Acked-by: Paul Mackerras Signed-off-by: Sean Christopherson --- arch/powerpc/include/asm/kvm_ppc.h | 11 --- arch/powerpc/kvm/book3s.c | 12 arch/powerpc/kvm/book3s_hv.c | 25 - arch/powerpc/kvm/book3s_pr.c | 11 ++- arch/powerpc/kvm/booke.c | 9 ++--- arch/powerpc/kvm/powerpc.c | 4 ++-- 6 files changed, 26 insertions(+), 46 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index 3d2f871241a8..4df042355356 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -203,12 +203,10 @@ extern void kvmppc_core_destroy_vm(struct kvm *kvm); extern void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, struct kvm_memory_slot *dont); -extern int kvmppc_core_create_memslot(struct kvm *kvm, - struct kvm_memory_slot *slot, - unsigned long npages); extern int kvmppc_core_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, - const struct kvm_userspace_memory_region *mem); + const struct kvm_userspace_memory_region *mem, + enum kvm_mr_change change); extern void kvmppc_core_commit_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, const struct kvm_memory_slot *old, @@ -281,7 +279,8 @@ struct kvmppc_ops { void (*flush_memslot)(struct kvm *kvm, struct kvm_memory_slot *memslot); int (*prepare_memory_region)(struct kvm *kvm, struct kvm_memory_slot *memslot, -const struct kvm_userspace_memory_region *mem); +const struct kvm_userspace_memory_region *mem, +enum kvm_mr_change change); void (*commit_memory_region)(struct kvm *kvm, const struct kvm_userspace_memory_region *mem, const struct kvm_memory_slot *old, @@ -295,8 +294,6 @@ struct kvmppc_ops { void 
(*mmu_destroy)(struct kvm_vcpu *vcpu); void (*free_memslot)(struct kvm_memory_slot *free, struct kvm_memory_slot *dont); - int (*create_memslot)(struct kvm_memory_slot *slot, - unsigned long npages); int (*init_vm)(struct kvm *kvm); void (*destroy_vm)(struct kvm *kvm); int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info); diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c index 58a59ee998e2..2480150646ae 100644 --- a/arch/powerpc/kvm/book3s.c +++ b/arch/powerpc/kvm/book3s.c @@ -815,12 +815,6 @@ void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, kvm->arch.kvm_ops->free_memslot(free, dont); } -int kvmppc_core_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages) -{ - return kvm->arch.kvm_ops->create_memslot(slot, npages); -} - void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot) { kvm->arch.kvm_ops->flush_memslot(kvm, memslot); @@ -828,9 +822,11 @@ void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot) int kvmppc_core_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, - const struct kvm_userspace_memory_region *mem) + const struct kvm_userspace_memory_region *mem, + enum kvm_mr_change change) { - return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem); + return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem, + change); } void kvmppc_core_commit_memory_region(struct kvm *kvm, diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c index dc53578193ee..b265d7511f3f 100644 --- a/arch/powerpc/kvm/book3s_hv.c +++ b/arch/powerpc/kvm/book3s_hv.c @@ -4466,20 +4466,20 @@ static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free, } } -static int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot, -unsigned long npages) -{ - slot->arch.rmap = vzalloc(array_size(npages,
[PATCH v4 05/19] KVM: x86: Allocate memslot resources during prepare_memory_region()
Allocate the various metadata structures associated with a new memslot
during kvm_arch_prepare_memory_region(), which paves the way for
removing kvm_arch_create_memslot() altogether. Moving x86's memory
allocation only changes the order of kernel memory allocations between
x86 and common KVM code.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/x86.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 04d1bf89da0e..8c815b3587b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9700,6 +9700,12 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
 
 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
 			    unsigned long npages)
+{
+	return 0;
+}
+
+static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+				      unsigned long npages)
 {
 	int i;
 
@@ -9784,10 +9790,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 				const struct kvm_userspace_memory_region *mem,
 				enum kvm_mr_change change)
 {
-	if (change == KVM_MR_MOVE)
-		return kvm_arch_create_memslot(kvm, memslot,
-					       mem->memory_size >> PAGE_SHIFT);
-
+	if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
+		return kvm_alloc_memslot_metadata(memslot,
+						  mem->memory_size >> PAGE_SHIFT);
 	return 0;
 }
 
--
2.24.1
[PATCH v4 14/19] KVM: Clean up local variable usage in __kvm_set_memory_region()
Clean up __kvm_set_memory_region() to achieve several goals: - Remove local variables that serve no real purpose - Improve the readability of the code - Better show the relationship between the 'old' and 'new' memslot - Prepare for dynamically sizing memslots. Note, using 'tmp' to hold the initial memslot is not strictly necessary at this juncture, e.g. 'old' could be directly copied from id_to_memslot(), but keep the pointer usage as id_to_memslot() will be able to return a NULL pointer once memslots are dynamically sized. Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 47 +++-- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 2fa40c3e7961..b3e732078ab2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1074,13 +1074,11 @@ static int kvm_delete_memslot(struct kvm *kvm, int __kvm_set_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem) { - int r; - gfn_t base_gfn; - unsigned long npages; - struct kvm_memory_slot *slot; struct kvm_memory_slot old, new; - int as_id, id; + struct kvm_memory_slot *tmp; enum kvm_mr_change change; + int as_id, id; + int r; r = check_memory_region_flags(mem); if (r) @@ -1105,52 +1103,55 @@ int __kvm_set_memory_region(struct kvm *kvm, if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) return -EINVAL; - slot = id_to_memslot(__kvm_memslots(kvm, as_id), id); - base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; - npages = mem->memory_size >> PAGE_SHIFT; - - if (npages > KVM_MEM_MAX_NR_PAGES) - return -EINVAL; - /* * Make a full copy of the old memslot, the pointer will become stale * when the memslots are re-sorted by update_memslots(). 
*/ - old = *slot; + tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id); + old = *tmp; + tmp = NULL; + if (!mem->memory_size) return kvm_delete_memslot(kvm, mem, &old, as_id); - new = old; - new.id = id; - new.base_gfn = base_gfn; - new.npages = npages; + new.base_gfn = mem->guest_phys_addr >> PAGE_SHIFT; + new.npages = mem->memory_size >> PAGE_SHIFT; new.flags = mem->flags; new.userspace_addr = mem->userspace_addr; + if (new.npages > KVM_MEM_MAX_NR_PAGES) + return -EINVAL; + if (!old.npages) { change = KVM_MR_CREATE; + new.dirty_bitmap = NULL; + memset(&new.arch, 0, sizeof(new.arch)); } else { /* Modify an existing slot. */ if ((new.userspace_addr != old.userspace_addr) || - (npages != old.npages) || + (new.npages != old.npages) || ((new.flags ^ old.flags) & KVM_MEM_READONLY)) return -EINVAL; - if (base_gfn != old.base_gfn) + if (new.base_gfn != old.base_gfn) change = KVM_MR_MOVE; else if (new.flags != old.flags) change = KVM_MR_FLAGS_ONLY; else /* Nothing to change. */ return 0; + + /* Copy dirty_bitmap and arch from the current memslot. */ + new.dirty_bitmap = old.dirty_bitmap; + memcpy(&new.arch, &old.arch, sizeof(new.arch)); } if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) { /* Check for overlaps */ - kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) { - if (slot->id == id) + kvm_for_each_memslot(tmp, __kvm_memslots(kvm, as_id)) { + if (tmp->id == id) continue; - if (!((base_gfn + npages <= slot->base_gfn) || - (base_gfn >= slot->base_gfn + slot->npages))) + if (!((new.base_gfn + new.npages <= tmp->base_gfn) || + (new.base_gfn >= tmp->base_gfn + tmp->npages))) return -EEXIST; } } -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
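The overlap test in the final hunk is the usual half-open interval check. A minimal standalone sketch, with hypothetical names (struct range, ranges_overlap()) and gfn_t assumed to be a 64-bit unsigned type: two ranges [base, base + npages) are disjoint only if one ends at or before the point where the other begins, and the memslot code rejects the request with -EEXIST in every other case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for the sketch: guest frame numbers are 64-bit unsigned. */
typedef uint64_t gfn_t;

/* Hypothetical stand-in for the base_gfn/npages pair of a memslot. */
struct range {
	gfn_t base_gfn;
	uint64_t npages;
};

/*
 * Half-open ranges [base_gfn, base_gfn + npages) overlap unless one of
 * them ends at or before the start of the other.
 */
bool ranges_overlap(const struct range *a, const struct range *b)
{
	return !((a->base_gfn + a->npages <= b->base_gfn) ||
		 (a->base_gfn >= b->base_gfn + b->npages));
}
```

Ranges that merely touch, e.g. one ending at gfn 14 and the next starting at gfn 14, do not count as overlapping.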
[PATCH v4 06/19] KVM: Drop kvm_arch_create_memslot()
Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot(). Cc: Janosch Frank Acked-by: Christian Borntraeger Signed-off-by: Sean Christopherson --- arch/mips/kvm/mips.c | 6 -- arch/powerpc/kvm/powerpc.c | 6 -- arch/s390/kvm/kvm-s390.c | 6 -- arch/x86/kvm/x86.c | 6 -- include/linux/kvm_host.h | 2 -- virt/kvm/arm/mmu.c | 6 -- virt/kvm/kvm_main.c| 21 +++-- 7 files changed, 7 insertions(+), 46 deletions(-) diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 1109924560d8..713e5465edb0 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -188,12 +188,6 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, return -ENOIOCTLCMD; } -int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages) -{ - return 0; -} - void kvm_arch_flush_shadow_all(struct kvm *kvm) { /* Flush whole GPA */ diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index b0e6b33b476d..c922711a6dd8 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -691,12 +691,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, kvmppc_core_free_memslot(kvm, free, dont); } -int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages) -{ - return 0; -} - int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, const struct kvm_userspace_memory_region *mem, diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index d9e6bf3d54f0..1be45bad7849 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4491,12 +4491,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) return VM_FAULT_SIGBUS; } -int kvm_arch_create_memslot(struct kvm *kvm, struct 
kvm_memory_slot *slot, - unsigned long npages) -{ - return 0; -} - /* Section: memory related */ int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8c815b3587b4..4892ded361b3 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -9698,12 +9698,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, kvm_page_track_free_memslot(free, dont); } -int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages) -{ - return 0; -} - static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot, unsigned long npages) { diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 339de08e5fa2..46dd713da634 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -674,8 +674,6 @@ int __kvm_set_memory_region(struct kvm *kvm, const struct kvm_userspace_memory_region *mem); void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, struct kvm_memory_slot *dont); -int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages); void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen); int kvm_arch_prepare_memory_region(struct kvm *kvm, struct kvm_memory_slot *memslot, diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 38b4c910b6c3..f264de85f648 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -2358,12 +2358,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, { } -int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, - unsigned long npages) -{ - return 0; -} - void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) { } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 7239e3b9dda0..d403e93e3028 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1043,12 +1043,13 @@ int __kvm_set_memory_region(struct kvm *kvm, new.base_gfn = base_gfn; new.npages = npages; 
new.flags = mem->flags; + new.userspace_addr = mem->userspace_addr; if (npages) { if (!old.npages) change = KVM_MR_CREATE; else { /* Modify an existing slot. */ - if ((mem->userspace_addr != old.userspace_addr) || + if ((new.userspace_addr != old.userspace_addr) || (npa
[PATCH v4 12/19] KVM: Move memslot deletion to helper function
Move memslot deletion into its own routine so that the success path for other memslot updates does not need to use kvm_free_memslot(), i.e. can explicitly destroy the dirty bitmap when necessary. This paves the way for dropping @dont from kvm_free_memslot(), i.e. all callers now pass NULL for @dont. Add a comment above the code to make a copy of the existing memslot prior to deletion, it is not at all obvious that the pointer will become stale during sorting and/or installation of new memslots. Note, kvm_arch_commit_memory_region() allows an architecture to free resources when moving a memslot or changing its flags, e.g. x86 frees its arch specific memslot metadata during commit_memory_region(). Acked-by: Christoffer Dall Tested-by: Christoffer Dall Signed-off-by: Sean Christopherson --- virt/kvm/kvm_main.c | 73 +++-- 1 file changed, 44 insertions(+), 29 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index acf52fa16500..50e5aec0c15c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1046,6 +1046,27 @@ static int kvm_set_memslot(struct kvm *kvm, return r; } +static int kvm_delete_memslot(struct kvm *kvm, + const struct kvm_userspace_memory_region *mem, + struct kvm_memory_slot *old, int as_id) +{ + struct kvm_memory_slot new; + int r; + + if (!old->npages) + return -EINVAL; + + memset(&new, 0, sizeof(new)); + new.id = old->id; + + r = kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE); + if (r) + return r; + + kvm_free_memslot(kvm, old, NULL); + return 0; +} + /* * Allocate some memory and give it an address in the guest physical address * space. @@ -1095,7 +1116,15 @@ int __kvm_set_memory_region(struct kvm *kvm, if (npages > KVM_MEM_MAX_NR_PAGES) return -EINVAL; - new = old = *slot; + /* +* Make a full copy of the old memslot, the pointer will become stale +* when the memslots are re-sorted by update_memslots(). 
+*/ + old = *slot; + if (!mem->memory_size) + return kvm_delete_memslot(kvm, mem, &old, as_id); + + new = old; new.id = id; new.base_gfn = base_gfn; @@ -1103,29 +1132,20 @@ int __kvm_set_memory_region(struct kvm *kvm, new.flags = mem->flags; new.userspace_addr = mem->userspace_addr; - if (npages) { - if (!old.npages) - change = KVM_MR_CREATE; - else { /* Modify an existing slot. */ - if ((new.userspace_addr != old.userspace_addr) || - (npages != old.npages) || - ((new.flags ^ old.flags) & KVM_MEM_READONLY)) - return -EINVAL; - - if (base_gfn != old.base_gfn) - change = KVM_MR_MOVE; - else if (new.flags != old.flags) - change = KVM_MR_FLAGS_ONLY; - else /* Nothing to change. */ - return 0; - } - } else { - if (!old.npages) + if (!old.npages) { + change = KVM_MR_CREATE; + } else { /* Modify an existing slot. */ + if ((new.userspace_addr != old.userspace_addr) || + (npages != old.npages) || + ((new.flags ^ old.flags) & KVM_MEM_READONLY)) return -EINVAL; - change = KVM_MR_DELETE; - new.base_gfn = 0; - new.flags = 0; + if (base_gfn != old.base_gfn) + change = KVM_MR_MOVE; + else if (new.flags != old.flags) + change = KVM_MR_FLAGS_ONLY; + else /* Nothing to change. */ + return 0; } if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) { @@ -1148,17 +1168,12 @@ int __kvm_set_memory_region(struct kvm *kvm, return r; } - /* actual memory is freed via old in kvm_free_memslot below */ - if (change == KVM_MR_DELETE) { - new.dirty_bitmap = NULL; - memset(&new.arch, 0, sizeof(new.arch)); - } - r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change); if (r) goto out_bitmap; - kvm_free_memslot(kvm, &old, &new); + if (old.dirty_bitmap && !new.dirty_bitmap) + kvm_destroy_dirty_bitmap(&old); return 0; out_bitmap: -- 2.24.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
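The new comment about the pointer going stale is worth a small demonstration. The sketch below is a userspace approximation with hypothetical names (struct slot, resort()); sorting by base_gfn in descending order via qsort() is an assumption of the sketch standing in for update_memslots(). A pointer taken into the array before the re-sort still points at the same array position afterwards, which may now hold a different slot, whereas a by-value copy taken up front keeps describing the slot it was copied from.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down memslot. */
struct slot {
	int id;
	unsigned long base_gfn;
};

/* Order slots by base_gfn, largest first (assumed sort order). */
static int cmp_base_gfn_desc(const void *a, const void *b)
{
	const struct slot *x = a;
	const struct slot *y = b;

	if (x->base_gfn < y->base_gfn)
		return 1;
	if (x->base_gfn > y->base_gfn)
		return -1;
	return 0;
}

/* Re-sort the array in place after a slot's base_gfn has changed. */
void resort(struct slot *slots, size_t n)
{
	qsort(slots, n, sizeof(*slots), cmp_base_gfn_desc);
}
```

For example, with slots {id 0 at 0x300, id 1 at 0x100}, moving id 1 to 0x500 and re-sorting puts id 1 first, so a pointer previously aimed at the second element now refers to id 0.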
[PATCH v4 07/19] KVM: Explicitly free allocated-but-unused dirty bitmap
Explicitly free an allocated-but-unused dirty bitmap instead of relying
on kvm_free_memslot() if an error occurs in __kvm_set_memory_region().
There is no longer a need to abuse kvm_free_memslot() to free arch
specific resources as arch specific code is now called only after the
common flow is guaranteed to succeed. Arch code can still fail, but
it's responsible for its own cleanup in that case.

Eliminating the error path's abuse of kvm_free_memslot() paves the way
for simplifying kvm_free_memslot(), i.e. dropping its @dont param.

Signed-off-by: Sean Christopherson
---
 virt/kvm/kvm_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d403e93e3028..6b2261a9e139 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1096,7 +1096,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 
 	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
 	if (!slots)
-		goto out_free;
+		goto out_bitmap;
 	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
 
 	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1144,8 +1144,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
 		slots = install_new_memslots(kvm, as_id, slots);
 	kvfree(slots);
-out_free:
-	kvm_free_memslot(kvm, &new, &old);
+out_bitmap:
+	if (new.dirty_bitmap && !old.dirty_bitmap)
+		kvm_destroy_dirty_bitmap(&new);
 out:
 	return r;
 }
--
2.24.1
Re: [PATCH v3 00/15] KVM: Dynamically size memslot arrays
On Mon, Dec 16, 2019 at 09:25:24AM +0100, Christian Borntraeger wrote:
>
> On 13.12.19 21:01, Sean Christopherson wrote:
> > Applies cleanly on the current kvm/queue and nothing caught fire in
> > testing (though I only re-tested the series as a whole).
>
> Do you have the latest version somewhere on a branch? The version on the
> list no longer applies cleanly.

Ah, I only tested with my sparse x86-only tree. The result with three-way
merging, i.e. 'git am -3', looks correct at a glance.

Regardless, I need to spin a new version to handle a conflict with an
unrelated in-flight memslots bug fix, I'll get that sent out today.
[PATCH V2] arm64: Introduce ID_ISAR6 CPU register
This adds basic building blocks required for ID_ISAR6 CPU register which identifies support for various instruction implementation on AArch32 state. Cc: Catalin Marinas Cc: Will Deacon Cc: Marc Zyngier Cc: James Morse Cc: Suzuki K Poulose Cc: Mark Rutland Cc: linux-ker...@vger.kernel.org Cc: kvmarm@lists.cs.columbia.edu Acked-by: Marc Zyngier Signed-off-by: Anshuman Khandual --- Changes in V2: - Added an explicit ftr_id_isar6[] instead of using ftr_generic_32bits per Mark - Dropped ID_ISAR6_SPECRES_SHIFT exposure in ftr_id_isar6[] per Mark - Reversed ID_ISAR6_* definitions sequence to meet existing pattern in the file arch/arm64/include/asm/cpu.h| 1 + arch/arm64/include/asm/sysreg.h | 9 + arch/arm64/kernel/cpufeature.c | 15 +++ arch/arm64/kernel/cpuinfo.c | 1 + arch/arm64/kvm/sys_regs.c | 2 +- 5 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index d72d995..b4a4053 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -39,6 +39,7 @@ struct cpuinfo_arm64 { u32 reg_id_isar3; u32 reg_id_isar4; u32 reg_id_isar5; + u32 reg_id_isar6; u32 reg_id_mmfr0; u32 reg_id_mmfr1; u32 reg_id_mmfr2; diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 6e919fa..7a176e1 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -146,6 +146,7 @@ #define SYS_ID_ISAR4_EL1 sys_reg(3, 0, 0, 2, 4) #define SYS_ID_ISAR5_EL1 sys_reg(3, 0, 0, 2, 5) #define SYS_ID_MMFR4_EL1 sys_reg(3, 0, 0, 2, 6) +#define SYS_ID_ISAR6_EL1 sys_reg(3, 0, 0, 2, 7) #define SYS_MVFR0_EL1 sys_reg(3, 0, 0, 3, 0) #define SYS_MVFR1_EL1 sys_reg(3, 0, 0, 3, 1) @@ -679,6 +680,14 @@ #define ID_ISAR5_AES_SHIFT 4 #define ID_ISAR5_SEVL_SHIFT0 +#define ID_ISAR6_I8MM_SHIFT24 +#define ID_ISAR6_BF16_SHIFT20 +#define ID_ISAR6_SPECRES_SHIFT 16 +#define ID_ISAR6_SB_SHIFT 12 +#define ID_ISAR6_FHM_SHIFT 8 +#define ID_ISAR6_DP_SHIFT 4 +#define ID_ISAR6_JSCVT_SHIFT 0 + #define 
MVFR0_FPROUND_SHIFT28 #define MVFR0_FPSHVEC_SHIFT24 #define MVFR0_FPSQRT_SHIFT 20 diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 04cf64e..6cec9aad 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -313,6 +313,16 @@ static const struct arm64_ftr_bits ftr_id_mmfr4[] = { ARM64_FTR_END, }; +static const struct arm64_ftr_bits ftr_id_isar6[] = { + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_I8MM_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_BF16_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_SB_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_FHM_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_DP_SHIFT, 4, 0), + ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_ISAR6_JSCVT_SHIFT, 4, 0), + ARM64_FTR_END, +}; + static const struct arm64_ftr_bits ftr_id_pfr0[] = { ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 12, 4, 0), /* State3 */ ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, 8, 4, 0), /* State2 */ @@ -396,6 +406,7 @@ static const struct __ftr_reg_entry { ARM64_FTR_REG(SYS_ID_ISAR4_EL1, ftr_generic_32bits), ARM64_FTR_REG(SYS_ID_ISAR5_EL1, ftr_id_isar5), ARM64_FTR_REG(SYS_ID_MMFR4_EL1, ftr_id_mmfr4), + ARM64_FTR_REG(SYS_ID_ISAR6_EL1, ftr_id_isar6), /* Op1 = 0, CRn = 0, CRm = 3 */ ARM64_FTR_REG(SYS_MVFR0_EL1, ftr_generic_32bits), @@ -600,6 +611,7 @@ void __init init_cpu_features(struct cpuinfo_arm64 *info) init_cpu_ftr_reg(SYS_ID_ISAR3_EL1, info->reg_id_isar3); init_cpu_ftr_reg(SYS_ID_ISAR4_EL1, info->reg_id_isar4); init_cpu_ftr_reg(SYS_ID_ISAR5_EL1, info->reg_id_isar5); + init_cpu_ftr_reg(SYS_ID_ISAR6_EL1, info->reg_id_isar6); init_cpu_ftr_reg(SYS_ID_MMFR0_EL1, info->reg_id_mmfr0); init_cpu_ftr_reg(SYS_ID_MMFR1_EL1, info->reg_id_mmfr1); init_cpu_ftr_reg(SYS_ID_MMFR2_EL1, info->reg_id_mmfr2); @@ -753,6 +765,8 @@ void 
update_cpu_features(int cpu, info->reg_id_isar4, boot->reg_id_isar4); taint |= check_update_ftr_reg(SYS_ID_ISAR5_EL1, cpu, info->reg_id_isar5, boot->reg_id_isar5); + taint |= check_update_ftr_reg(SYS_ID_ISAR6_E
Re: [PATCH 5/5] KVM: arm64: Support the vcpu preemption check
On Tue, Dec 17, 2019 at 01:55:49PM +, yezengr...@huawei.com wrote: > From: Zengruan Ye > > Support the vcpu_is_preempted() functionality under KVM/arm64. This will > enhance lock performance on overcommitted hosts (more runnable vcpus > than physical cpus in the system) as doing busy waits for preempted > vcpus will hurt system performance far worse than early yielding. > > unix benchmark result: > host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 cpus > guest: kernel 5.5.0-rc1, 16 vcpus > >test-case|after-patch| before-patch > +---+-- > Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps > Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 > MWIPS > Execl Throughput | 3662.1 lps | 2718.0 lps > File Copy 1024 bufsize 2000 maxblocks |432906.4 KBps |158011.8 KBps > File Copy 256 bufsize 500 maxblocks|116023.0 KBps | 37664.0 KBps > File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps |441108.8 KBps > Pipe Throughput| 6405029.6 lps | 6021457.6 lps > Pipe-based Context Switching |185872.7 lps |184255.3 lps > Process Creation | 4025.7 lps | 3706.6 lps > Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm > Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm > System Call Overhead | 3913363.1 lps | 3883287.8 lps > +---+-- > System Benchmarks Index Score | 1835.1 | 1327.6 > > Signed-off-by: Zengruan Ye > --- > arch/arm64/include/asm/paravirt.h | 3 + > arch/arm64/kernel/paravirt.c | 91 +++ > arch/arm64/kernel/setup.c | 2 + > include/linux/cpuhotplug.h| 1 + > 4 files changed, 97 insertions(+) > > diff --git a/arch/arm64/include/asm/paravirt.h > b/arch/arm64/include/asm/paravirt.h > index 7b1c81b544bb..a2cd0183bbef 100644 > --- a/arch/arm64/include/asm/paravirt.h > +++ b/arch/arm64/include/asm/paravirt.h > @@ -29,6 +29,8 @@ static inline u64 paravirt_steal_clock(int cpu) > > int __init pv_time_init(void); > > +int __init kvm_guest_init(void); > + This is a *very* generic name - I suggest something like pv_lock_init() so it's clear what the function 
actually does.

>  __visible bool __native_vcpu_is_preempted(int cpu);
>
>  static inline bool pv_vcpu_is_preempted(int cpu)
> @@ -39,6 +41,7 @@ static inline bool pv_vcpu_is_preempted(int cpu)
>  #else
>
>  #define pv_time_init() do {} while (0)
> +#define kvm_guest_init() do {} while (0)
>
>  #endif // CONFIG_PARAVIRT
>
> diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
> index d8f1ba8c22ce..a86dead40473 100644
> --- a/arch/arm64/kernel/paravirt.c
> +++ b/arch/arm64/kernel/paravirt.c
> @@ -22,6 +22,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  struct static_key paravirt_steal_enabled;
>  struct static_key paravirt_steal_rq_enabled;
> @@ -158,3 +159,93 @@ int __init pv_time_init(void)
>
>  	return 0;
>  }
> +
> +DEFINE_PER_CPU(struct pvlock_vcpu_state, pvlock_vcpu_region) __aligned(64);
> +EXPORT_PER_CPU_SYMBOL(pvlock_vcpu_region);
> +
> +static int pvlock_vcpu_state_dying_cpu(unsigned int cpu)
> +{
> +	struct pvlock_vcpu_state *reg;
> +
> +	reg = this_cpu_ptr(&pvlock_vcpu_region);
> +	if (!reg)
> +		return -EFAULT;
> +
> +	memset(reg, 0, sizeof(*reg));

I might be missing something obvious here - but I don't see the point of
this. The hypervisor might immediately overwrite the structure again.
Indeed you should consider a mechanism for the guest to "unregister" the
region - otherwise you will face issues with the likes of kexec.

For pv_time the memory is allocated by the hypervisor, not the guest, to
avoid lifetime issues around kexec.

> + > + return 0; > +} > + > +static int init_pvlock_vcpu_state(unsigned int cpu) > +{ > + struct pvlock_vcpu_state *reg; > + struct arm_smccc_res res; > + > + reg = this_cpu_ptr(&pvlock_vcpu_region); > + if (!reg) > + return -EFAULT; > + > + /* Pass the memory address to host via hypercall */ > + arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_LOCK_PREEMPTED, > + virt_to_phys(reg), &res); > + > + return 0; > +} > + > +static bool kvm_vcpu_is_preempted(int cpu) > +{ > + struct pvlock_vcpu_state *reg = &per_cpu(pvlock_vcpu_region, cpu); > + > + if (reg) > + return !!(reg->preempted & 1); > + > + return false; > +} > + > +static int kvm_arm_init_pvlock(void) > +{ > + int ret; > + > + ret = cpuhp_setup_state(CPUHP_AP_ARM_KVM_PVLOCK_STARTING, > + "hypervisor/arm/pvlock:starting", > +
Re: [PATCH 3/5] KVM: arm64: Support pvlock preempted via shared structure
On Tue, Dec 17, 2019 at 01:55:47PM +, yezengr...@huawei.com wrote: > From: Zengruan Ye > > Implement the service call for configuring a shared structure between a > vcpu and the hypervisor in which the hypervisor can tell the vcpu is > running or not. > > The preempted field is zero if 1) some old KVM deos not support this filed. > 2) the vcpu is not preempted. Other values means the vcpu has been preempted. > > Signed-off-by: Zengruan Ye > --- > arch/arm/include/asm/kvm_host.h | 13 + > arch/arm64/include/asm/kvm_host.h | 17 + > arch/arm64/kvm/Makefile | 1 + > virt/kvm/arm/arm.c| 8 > virt/kvm/arm/hypercalls.c | 4 > virt/kvm/arm/pvlock.c | 21 + > 6 files changed, 64 insertions(+) > create mode 100644 virt/kvm/arm/pvlock.c > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h > index 556cd818eccf..098375f1c89e 100644 > --- a/arch/arm/include/asm/kvm_host.h > +++ b/arch/arm/include/asm/kvm_host.h > @@ -356,6 +356,19 @@ static inline bool kvm_arm_is_pvtime_enabled(struct > kvm_vcpu_arch *vcpu_arch) > return false; > } > > +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch > *vcpu_arch) > +{ > +} > + > +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch > *vcpu_arch) > +{ > + return false; > +} > + > +static inline void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 > preempted) > +{ > +} > + > void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot); > > struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); > diff --git a/arch/arm64/include/asm/kvm_host.h > b/arch/arm64/include/asm/kvm_host.h > index c61260cf63c5..d9b2a21a87ac 100644 > --- a/arch/arm64/include/asm/kvm_host.h > +++ b/arch/arm64/include/asm/kvm_host.h > @@ -354,6 +354,11 @@ struct kvm_vcpu_arch { > u64 last_steal; > gpa_t base; > } steal; > + > + /* Guest PV lock state */ > + struct { > + gpa_t base; > + } pv; > }; > > /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */ > @@ -515,6 +520,18 @@ 
static inline bool kvm_arm_is_pvtime_enabled(struct > kvm_vcpu_arch *vcpu_arch) > return (vcpu_arch->steal.base != GPA_INVALID); > } > > +static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch > *vcpu_arch) > +{ > + vcpu_arch->pv.base = GPA_INVALID; > +} > + > +static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch > *vcpu_arch) > +{ > + return (vcpu_arch->pv.base != GPA_INVALID); > +} > + > +void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted); > + > void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome); > > struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr); > diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile > index 5ffbdc39e780..e4591f56d5f1 100644 > --- a/arch/arm64/kvm/Makefile > +++ b/arch/arm64/kvm/Makefile > @@ -15,6 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o > $(KVM)/arm/mmu.o $(KVM)/arm/mmio. > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o > kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o > +kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvlock.o > > kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o > kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c > index 12e0280291ce..c562f62fdd45 100644 > --- a/virt/kvm/arm/arm.c > +++ b/virt/kvm/arm/arm.c > @@ -383,6 +383,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) > > kvm_arm_pvtime_vcpu_init(&vcpu->arch); > > + kvm_arm_pvlock_preempted_init(&vcpu->arch); > + > return kvm_vgic_vcpu_init(vcpu); > } > > @@ -421,6 +423,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu) > vcpu_set_wfx_traps(vcpu); > > vcpu_ptrauth_setup_lazy(vcpu); > + > + if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch)) > + kvm_update_pvlock_preempted(vcpu, 0); > } > > void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) > @@ -434,6 +439,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu 
*vcpu) > vcpu->cpu = -1; > > kvm_arm_set_running_vcpu(NULL); > + > + if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch)) > + kvm_update_pvlock_preempted(vcpu, 1); > } > > static void vcpu_power_off(struct kvm_vcpu *vcpu) > diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c > index ff13871fd85a..5964982ccd05 100644 > --- a/virt/kvm/arm/hypercalls.c > +++ b/virt/kvm/arm/hypercalls.c > @@ -65,6 +65,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) > if (gpa != GPA_INVALID) > val = gpa; > break; > + case ARM_SMCCC_HV_PV_LOCK_PREEMPTED: > + vcpu->arch.pv.base = smccc_get_
Re: [PATCH 2/5] KVM: arm64: Implement PV_LOCK_FEATURES call
On Tue, Dec 17, 2019 at 01:55:46PM +, yezengr...@huawei.com wrote:
> From: Zengruan Ye
>
> This provides a mechanism for querying which paravirtualized lock
> features are available in this hypervisor.
>
> Also add the header file which defines the ABI for the paravirtualized
> lock features we're about to add.
>
> Signed-off-by: Zengruan Ye
> ---
>  arch/arm64/include/asm/pvlock-abi.h | 16
>  include/linux/arm-smccc.h           | 13 +
>  virt/kvm/arm/hypercalls.c           |  3 +++
>  3 files changed, 32 insertions(+)
>  create mode 100644 arch/arm64/include/asm/pvlock-abi.h
>
> diff --git a/arch/arm64/include/asm/pvlock-abi.h b/arch/arm64/include/asm/pvlock-abi.h
> new file mode 100644
> index ..06e0c3d7710a
> --- /dev/null
> +++ b/arch/arm64/include/asm/pvlock-abi.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright(c) 2019 Huawei Technologies Co., Ltd
> + * Author: Zengruan Ye
> + */
> +
> +#ifndef __ASM_PVLOCK_ABI_H
> +#define __ASM_PVLOCK_ABI_H
> +
> +struct pvlock_vcpu_state {
> +	__le64 preempted;

Somewhere we need to document what 'preempted' is. It looks like it's a
1-bit field from the later patches.

> + /* Structure must be 64 byte aligned, pad to that size */ > + u8 padding[56]; > +} __packed; > + > +#endif > diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h > index 59494df0f55b..59e65a951959 100644 > --- a/include/linux/arm-smccc.h > +++ b/include/linux/arm-smccc.h > @@ -377,5 +377,18 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, > unsigned long a1, > ARM_SMCCC_OWNER_STANDARD_HYP,\ > 0x21) > > +/* Paravirtualised lock calls */ > +#define ARM_SMCCC_HV_PV_LOCK_FEATURES\ > + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ > +ARM_SMCCC_SMC_64,\ > +ARM_SMCCC_OWNER_STANDARD_HYP,\ > +0x40) > + > +#define ARM_SMCCC_HV_PV_LOCK_PREEMPTED \ > + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ > +ARM_SMCCC_SMC_64,\ > +ARM_SMCCC_OWNER_STANDARD_HYP,\ > +0x41) > + > #endif /*__ASSEMBLY__*/ > #endif /*__LINUX_ARM_SMCCC_H*/ > diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c > index 550dfa3e53cd..ff13871fd85a 100644 > --- a/virt/kvm/arm/hypercalls.c > +++ b/virt/kvm/arm/hypercalls.c > @@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) > case ARM_SMCCC_HV_PV_TIME_FEATURES: > val = SMCCC_RET_SUCCESS; > break; > + case ARM_SMCCC_HV_PV_LOCK_FEATURES: > + val = SMCCC_RET_SUCCESS; > + break; Ideally you wouldn't report that PV_LOCK_FEATURES exists until the actual hypercalls are wired up to avoid breaking a bisect. Steve > } > break; > case ARM_SMCCC_HV_PV_TIME_FEATURES: > -- > 2.19.1 > > ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
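As a cross-check on the two function IDs in the patch quoted above, here is a minimal userspace sketch of how an SMCCC function ID is assembled from its fields. The field positions follow the SMC Calling Convention; the helper names (`smccc_call_val`, `pv_lock_features_id`, `pv_lock_preempted_id`) are hypothetical and only mirror what `ARM_SMCCC_CALL_VAL()` computes, they are not kernel API.

```c
#include <stdint.h>

/* Field positions as laid out by the SMC Calling Convention. */
#define SMCCC_TYPE_SHIFT		31	/* 1 = fast call */
#define SMCCC_CALL_CONV_SHIFT		30	/* 1 = 64-bit convention */
#define SMCCC_OWNER_SHIFT		24
#define SMCCC_OWNER_MASK		0x3Fu
#define SMCCC_FUNC_MASK			0xFFFFu

/* Owner number of the "Standard Hypervisor Service Calls" range. */
#define SMCCC_OWNER_STANDARD_HYP	5u

/* Hypothetical helper: same arithmetic as ARM_SMCCC_CALL_VAL(). */
static inline uint32_t smccc_call_val(uint32_t fast, uint32_t conv64,
				      uint32_t owner, uint32_t func)
{
	return (fast << SMCCC_TYPE_SHIFT) |
	       (conv64 << SMCCC_CALL_CONV_SHIFT) |
	       ((owner & SMCCC_OWNER_MASK) << SMCCC_OWNER_SHIFT) |
	       (func & SMCCC_FUNC_MASK);
}

/* Fast call, 64-bit, standard hypervisor owner, function 0x40. */
static inline uint32_t pv_lock_features_id(void)
{
	return smccc_call_val(1, 1, SMCCC_OWNER_STANDARD_HYP, 0x40);
}

/* Fast call, 64-bit, standard hypervisor owner, function 0x41. */
static inline uint32_t pv_lock_preempted_id(void)
{
	return smccc_call_val(1, 1, SMCCC_OWNER_STANDARD_HYP, 0x41);
}
```

With these inputs the two IDs come out as 0xC5000040 and 0xC5000041, and the owner field decodes back to 5, which is the range the review of patch 1/5 asks about.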
Re: [PATCH 1/5] KVM: arm64: Document PV-lock interface
On Tue, Dec 17, 2019 at 01:55:45PM +, yezengr...@huawei.com wrote:
> From: Zengruan Ye
>
> Introduce a paravirtualization interface for KVM/arm64 to find out
> whether the vcpu is currently running or not.
>
> A hypercall interface is provided for the guest to interrogate the
> hypervisor's support for this interface and the location of the shared
> memory structures.
>
> Signed-off-by: Zengruan Ye
> ---
>  Documentation/virt/kvm/arm/pvlock.rst | 31 +++
>  1 file changed, 31 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
>
> diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
> new file mode 100644
> index ..eec0c36edf17
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvlock.rst
> @@ -0,0 +1,31 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Paravirtualized lock support for arm64
> +======================================
> +
> +KVM/arm64 provides some hypervisor service calls to support a paravirtualized
> +guest finding out whether the vcpu is currently running or not.
> +
> +Two new SMCCC compatible hypercalls are defined:
> +
> +* PV_LOCK_FEATURES:   0xC5000040
> +* PV_LOCK_PREEMPTED:  0xC5000041

These values are in the "Standard Hypervisor Service Calls" section of
SMCCC - so is there a document that describes this feature such that
other OSes or hypervisors can implement it? I'm also not entirely sure of
the process of ensuring that the IDs picked are non-conflicting.

Otherwise if this is a KVM specific interface this should probably belong
within the "Vendor Specific Hypervisor Service Calls" section along with
some probing that the hypervisor is actually KVM. Although I don't see
anything KVM specific.

> +
> +The existence of the PV_LOCK hypercall should be probed using the SMCCC 1.1
> +ARCH_FEATURES mechanism before calling it.
> +
> +PV_LOCK_FEATURES
> +    ============= ========    ==========
> +    Function ID:  (uint32)    0xC5000040
> +    PV_call_id:   (uint32)    The function to query for support.
> +    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
> +                              PV-lock feature is supported by the hypervisor.
> +    ============= ========    ==========
> +
> +PV_LOCK_PREEMPTED
> +    ============= ========    ==========
> +    Function ID:  (uint32)    0xC5000041
> +    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the IPA of
> +                              this vcpu's pv data structure is configured by
> +                              the hypervisor.
> +    ============= ========    ==========

From the code it looks like there's another argument for this SMC - the
physical address (or IPA) of a struct pvlock_vcpu_state. This structure
also needs to be described as it is part of the ABI.

Steve
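To make the ABI point in the review above concrete, here is a minimal userspace sketch of the shared structure with its layout pinned down by compile-time checks. The field names come from patch 2/5; the fixed-width userspace types, the `pvlock_is_preempted()` helper, and the little-endian-host assumption are illustrative additions, not part of the series.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mirror of the guest/host shared structure from patch 2/5.
 * In the kernel the first field is __le64; this sketch assumes a
 * little-endian host so a plain uint64_t reads the same value.
 */
struct pvlock_vcpu_state {
	uint64_t preempted;	/* per patch 5/5 only bit 0 is tested */
	uint8_t padding[56];	/* pad the structure out to 64 bytes */
} __attribute__((packed));

/* The layout is ABI, so pin it down at compile time. */
_Static_assert(sizeof(struct pvlock_vcpu_state) == 64,
	       "pvlock_vcpu_state must be exactly 64 bytes");
_Static_assert(offsetof(struct pvlock_vcpu_state, preempted) == 0,
	       "preempted must be the first field");

/* Hypothetical reader: mirrors the '& 1' test in kvm_vcpu_is_preempted(). */
static inline int pvlock_is_preempted(const struct pvlock_vcpu_state *st)
{
	return (st->preempted & 1) != 0;
}
```

A written description of the structure would state the same three facts the assertions encode: total size, field offset, and which bits of `preempted` carry meaning.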
[PATCH 1/5] KVM: arm64: Document PV-lock interface
From: Zengruan Ye

Introduce a paravirtualization interface for KVM/arm64 to find out
whether the vcpu is currently running or not.

A hypercall interface is provided for the guest to interrogate the
hypervisor's support for this interface and the location of the shared
memory structures.

Signed-off-by: Zengruan Ye
---
 Documentation/virt/kvm/arm/pvlock.rst | 31 +++
 1 file changed, 31 insertions(+)
 create mode 100644 Documentation/virt/kvm/arm/pvlock.rst

diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
new file mode 100644
index ..eec0c36edf17
--- /dev/null
+++ b/Documentation/virt/kvm/arm/pvlock.rst
@@ -0,0 +1,31 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Paravirtualized lock support for arm64
+======================================
+
+KVM/arm64 provides some hypervisor service calls to support a paravirtualized
+guest finding out whether the vcpu is currently running or not.
+
+Two new SMCCC compatible hypercalls are defined:
+
+* PV_LOCK_FEATURES:   0xC5000040
+* PV_LOCK_PREEMPTED:  0xC5000041
+
+The existence of the PV_LOCK hypercall should be probed using the SMCCC 1.1
+ARCH_FEATURES mechanism before calling it.
+
+PV_LOCK_FEATURES
+    ============= ========    ==========
+    Function ID:  (uint32)    0xC5000040
+    PV_call_id:   (uint32)    The function to query for support.
+    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
+                              PV-lock feature is supported by the hypervisor.
+    ============= ========    ==========
+
+PV_LOCK_PREEMPTED
+    ============= ========    ==========
+    Function ID:  (uint32)    0xC5000041
+    Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the IPA of
+                              this vcpu's pv data structure is configured by
+                              the hypervisor.
+    ============= ========    ==========
-- 
2.19.1
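The probing order described in the document above can be sketched in userspace as follows. This is an illustration under stated assumptions: `hvc_fn` is a hypothetical stand-in for the real hypercall conduit, the two `mock_hvc_*` functions are made-up hypervisors for testing, and the second query step follows the documented `PV_call_id` parameter rather than the guest code in patch 5/5, which only performs the first step.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMCCC_RET_SUCCESS		0
#define SMCCC_RET_NOT_SUPPORTED		(-1)

#define SMCCC_ARCH_FEATURES_ID		0x80000001u
#define PV_LOCK_FEATURES_ID		0xC5000040u
#define PV_LOCK_PREEMPTED_ID		0xC5000041u

/* Hypothetical conduit: stands in for an HVC into the hypervisor. */
typedef int64_t (*hvc_fn)(uint32_t func_id, uint64_t arg1);

/*
 * Step 1: ask ARCH_FEATURES whether PV_LOCK_FEATURES exists.
 * Step 2: ask PV_LOCK_FEATURES whether PV_LOCK_PREEMPTED is supported.
 */
static bool pv_lock_probe(hvc_fn hvc)
{
	if (hvc(SMCCC_ARCH_FEATURES_ID, PV_LOCK_FEATURES_ID) != SMCCC_RET_SUCCESS)
		return false;

	return hvc(PV_LOCK_FEATURES_ID, PV_LOCK_PREEMPTED_ID) == SMCCC_RET_SUCCESS;
}

/* Mock hypervisor that implements both probing calls. */
static int64_t mock_hvc_with_pvlock(uint32_t func_id, uint64_t arg1)
{
	switch (func_id) {
	case SMCCC_ARCH_FEATURES_ID:
		return arg1 == PV_LOCK_FEATURES_ID ?
			SMCCC_RET_SUCCESS : SMCCC_RET_NOT_SUPPORTED;
	case PV_LOCK_FEATURES_ID:
		return arg1 == PV_LOCK_PREEMPTED_ID ?
			SMCCC_RET_SUCCESS : SMCCC_RET_NOT_SUPPORTED;
	default:
		return SMCCC_RET_NOT_SUPPORTED;
	}
}

/* Mock hypervisor that knows nothing about PV lock. */
static int64_t mock_hvc_without_pvlock(uint32_t func_id, uint64_t arg1)
{
	(void)func_id;
	(void)arg1;
	return SMCCC_RET_NOT_SUPPORTED;
}
```

Only after both steps succeed would a guest go on to register its per-vcpu structure with PV_LOCK_PREEMPTED.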
[PATCH 3/5] KVM: arm64: Support pvlock preempted via shared structure
From: Zengruan Ye

Implement the service call for configuring a shared structure between a
vcpu and the hypervisor, through which the hypervisor can indicate
whether the vcpu is running or not.

The preempted field is zero if 1) an older KVM does not support this
field, or 2) the vcpu is not preempted. Any other value means the vcpu
has been preempted.

Signed-off-by: Zengruan Ye
---
 arch/arm/include/asm/kvm_host.h   | 13 +
 arch/arm64/include/asm/kvm_host.h | 17 +
 arch/arm64/kvm/Makefile           |  1 +
 virt/kvm/arm/arm.c                |  8
 virt/kvm/arm/hypercalls.c         |  4
 virt/kvm/arm/pvlock.c             | 21 +
 6 files changed, 64 insertions(+)
 create mode 100644 virt/kvm/arm/pvlock.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 556cd818eccf..098375f1c89e 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -356,6 +356,19 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
 	return false;
 }
 
+static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+}
+
+static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
+{
+	return false;
+}
+
+static inline void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted)
+{
+}
+
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c61260cf63c5..d9b2a21a87ac 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -354,6 +354,11 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+	/* Guest PV lock state */
+	struct {
+		gpa_t base;
+	} pv;
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -515,6 +520,18 @@ static inline bool kvm_arm_is_pvtime_enabled(struct kvm_vcpu_arch *vcpu_arch)
 	return (vcpu_arch->steal.base != GPA_INVALID);
 }
 
+static inline void kvm_arm_pvlock_preempted_init(struct kvm_vcpu_arch *vcpu_arch)
+{
+	vcpu_arch->pv.base = GPA_INVALID;
+}
+
+static inline bool kvm_arm_is_pvlock_preempted_ready(struct kvm_vcpu_arch *vcpu_arch)
+{
+	return (vcpu_arch->pv.base != GPA_INVALID);
+}
+
+void kvm_update_pvlock_preempted(struct kvm_vcpu *vcpu, u64 preempted);
+
 void kvm_set_sei_esr(struct kvm_vcpu *vcpu, u64 syndrome);
 
 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5ffbdc39e780..e4591f56d5f1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -15,6 +15,7 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arm.o $(KVM)/arm/mmu.o $(KVM)/arm/mmio.
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/psci.o $(KVM)/arm/perf.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hypercalls.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvtime.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/pvlock.o
 
 kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o va_layout.o
 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 12e0280291ce..c562f62fdd45 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -383,6 +383,8 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 	kvm_arm_pvtime_vcpu_init(&vcpu->arch);
 
+	kvm_arm_pvlock_preempted_init(&vcpu->arch);
+
 	return kvm_vgic_vcpu_init(vcpu);
 }
 
@@ -421,6 +423,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	vcpu_set_wfx_traps(vcpu);
 
 	vcpu_ptrauth_setup_lazy(vcpu);
+
+	if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
+		kvm_update_pvlock_preempted(vcpu, 0);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -434,6 +439,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 
 	kvm_arm_set_running_vcpu(NULL);
+
+	if (kvm_arm_is_pvlock_preempted_ready(&vcpu->arch))
+		kvm_update_pvlock_preempted(vcpu, 1);
 }
 
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c
index ff13871fd85a..5964982ccd05 100644
--- a/virt/kvm/arm/hypercalls.c
+++ b/virt/kvm/arm/hypercalls.c
@@ -65,6 +65,10 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		if (gpa != GPA_INVALID)
 			val = gpa;
 		break;
+	case ARM_SMCCC_HV_PV_LOCK_PREEMPTED:
+		vcpu->arch.pv.base = smccc_get_arg1(vcpu);
+		val = SMCCC_RET_SUCCESS;
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/virt/kvm/arm/pvlock.c b/virt/kvm/arm/pvlock.c
new file mode 100644
index ..c3464958b0f5
--- /dev/null
+++ b/virt/kvm/arm/pvlock.c
[PATCH 0/5] KVM: arm64: vcpu preempted check support
From: Zengruan Ye

This patch set aims to support the vcpu_is_preempted() functionality
under KVM/arm64, which allows the guest to find out whether a vcpu is
currently running or not. This will enhance lock performance on
overcommitted hosts (more runnable vcpus than physical cpus in the
system) as doing busy waits for preempted vcpus will hurt system
performance far worse than early yielding.

We have observed some performance improvements in unix benchmark tests.

unix benchmark result:
  host:  kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 cpus
  guest: kernel 5.5.0-rc1, 16 vcpus

test-case                             | after-patch       | before-patch
--------------------------------------+-------------------+------------------
Dhrystone 2 using register variables  | 334600751.0 lps   | 335319028.3 lps
Double-Precision Whetstone            |     32856.1 MWIPS |     32849.6 MWIPS
Execl Throughput                      |      3662.1 lps   |      2718.0 lps
File Copy 1024 bufsize 2000 maxblocks |    432906.4 KBps  |    158011.8 KBps
File Copy 256 bufsize 500 maxblocks   |    116023.0 KBps  |     37664.0 KBps
File Copy 4096 bufsize 8000 maxblocks |   1432769.8 KBps  |    441108.8 KBps
Pipe Throughput                       |   6405029.6 lps   |   6021457.6 lps
Pipe-based Context Switching          |    185872.7 lps   |    184255.3 lps
Process Creation                      |      4025.7 lps   |      3706.6 lps
Shell Scripts (1 concurrent)          |      6745.6 lpm   |      6436.1 lpm
Shell Scripts (8 concurrent)          |       998.7 lpm   |       931.1 lpm
System Call Overhead                  |   3913363.1 lps   |   3883287.8 lps
--------------------------------------+-------------------+------------------
System Benchmarks Index Score         |      1835.1       |      1327.6

Zengruan Ye (5):
  KVM: arm64: Document PV-lock interface
  KVM: arm64: Implement PV_LOCK_FEATURES call
  KVM: arm64: Support pvlock preempted via shared structure
  KVM: arm64: Add interface to support vcpu preempted check
  KVM: arm64: Support the vcpu preemption check

 Documentation/virt/kvm/arm/pvlock.rst  | 31 +
 arch/arm/include/asm/kvm_host.h        | 13
 arch/arm64/include/asm/kvm_host.h      | 17 +
 arch/arm64/include/asm/paravirt.h      | 15
 arch/arm64/include/asm/pvlock-abi.h    | 16 +
 arch/arm64/include/asm/spinlock.h      |  7 ++
 arch/arm64/kernel/Makefile             |  2 +-
 arch/arm64/kernel/paravirt-spinlocks.c | 13
 arch/arm64/kernel/paravirt.c           | 95 +-
 arch/arm64/kernel/setup.c              |  2 +
 arch/arm64/kvm/Makefile                |  1 +
 include/linux/arm-smccc.h              | 13
 include/linux/cpuhotplug.h             |  1 +
 virt/kvm/arm/arm.c                     |  8 +++
 virt/kvm/arm/hypercalls.c              |  7 ++
 virt/kvm/arm/pvlock.c                  | 21 ++
 16 files changed, 260 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
 create mode 100644 arch/arm64/include/asm/pvlock-abi.h
 create mode 100644 arch/arm64/kernel/paravirt-spinlocks.c
 create mode 100644 virt/kvm/arm/pvlock.c

-- 
2.19.1
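To put the benchmark figures above in relative terms, the following small sketch computes the percentage change for a few rows. The numbers are copied from the cover letter's table; the `bench_row` type and `percent_change()` helper are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* One row of the cover letter's table (values copied from the posting). */
struct bench_row {
	const char *name;
	double after;
	double before;
};

static const struct bench_row rows[] = {
	{ "Dhrystone 2 using register variables",  334600751.0, 335319028.3 },
	{ "Execl Throughput",                      3662.1,      2718.0 },
	{ "File Copy 1024 bufsize 2000 maxblocks", 432906.4,    158011.8 },
	{ "System Benchmarks Index Score",         1835.1,      1327.6 },
};

/* Relative change of 'after' versus 'before', in percent. */
static double percent_change(double after, double before)
{
	return (after / before - 1.0) * 100.0;
}
```

On these inputs the overall index score improves by roughly 38%, the 1024-byte file copy test by roughly 2.7x, and the CPU-bound Dhrystone test moves by a fraction of a percent, which suggests the gains come from lock-heavy workloads rather than raw compute.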
[PATCH 2/5] KVM: arm64: Implement PV_LOCK_FEATURES call
From: Zengruan Ye This provides a mechanism for querying which paravirtualized lock features are available in this hypervisor. Also add the header file which defines the ABI for the paravirtualized lock features we're about to add. Signed-off-by: Zengruan Ye --- arch/arm64/include/asm/pvlock-abi.h | 16 include/linux/arm-smccc.h | 13 + virt/kvm/arm/hypercalls.c | 3 +++ 3 files changed, 32 insertions(+) create mode 100644 arch/arm64/include/asm/pvlock-abi.h diff --git a/arch/arm64/include/asm/pvlock-abi.h b/arch/arm64/include/asm/pvlock-abi.h new file mode 100644 index ..06e0c3d7710a --- /dev/null +++ b/arch/arm64/include/asm/pvlock-abi.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright(c) 2019 Huawei Technologies Co., Ltd + * Author: Zengruan Ye + */ + +#ifndef __ASM_PVLOCK_ABI_H +#define __ASM_PVLOCK_ABI_H + +struct pvlock_vcpu_state { + __le64 preempted; + /* Structure must be 64 byte aligned, pad to that size */ + u8 padding[56]; +} __packed; + +#endif diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index 59494df0f55b..59e65a951959 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -377,5 +377,18 @@ asmlinkage void __arm_smccc_hvc(unsigned long a0, unsigned long a1, ARM_SMCCC_OWNER_STANDARD_HYP,\ 0x21) +/* Paravirtualised lock calls */ +#define ARM_SMCCC_HV_PV_LOCK_FEATURES \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_64,\ + ARM_SMCCC_OWNER_STANDARD_HYP,\ + 0x40) + +#define ARM_SMCCC_HV_PV_LOCK_PREEMPTED \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_64,\ + ARM_SMCCC_OWNER_STANDARD_HYP,\ + 0x41) + #endif /*__ASSEMBLY__*/ #endif /*__LINUX_ARM_SMCCC_H*/ diff --git a/virt/kvm/arm/hypercalls.c b/virt/kvm/arm/hypercalls.c index 550dfa3e53cd..ff13871fd85a 100644 --- a/virt/kvm/arm/hypercalls.c +++ b/virt/kvm/arm/hypercalls.c @@ -52,6 +52,9 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu) case ARM_SMCCC_HV_PV_TIME_FEATURES: val = SMCCC_RET_SUCCESS; break; + case 
ARM_SMCCC_HV_PV_LOCK_FEATURES: + val = SMCCC_RET_SUCCESS; + break; } break; case ARM_SMCCC_HV_PV_TIME_FEATURES: -- 2.19.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH 5/5] KVM: arm64: Support the vcpu preemption check
From: Zengruan Ye Support the vcpu_is_preempted() functionality under KVM/arm64. This will enhance lock performance on overcommitted hosts (more runnable vcpus than physical cpus in the system) as doing busy waits for preempted vcpus will hurt system performance far worse than early yielding. unix benchmark result: host: kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 cpus guest: kernel 5.5.0-rc1, 16 vcpus test-case|after-patch| before-patch +---+-- Dhrystone 2 using register variables | 334600751.0 lps | 335319028.3 lps Double-Precision Whetstone | 32856.1 MWIPS | 32849.6 MWIPS Execl Throughput | 3662.1 lps | 2718.0 lps File Copy 1024 bufsize 2000 maxblocks |432906.4 KBps |158011.8 KBps File Copy 256 bufsize 500 maxblocks|116023.0 KBps | 37664.0 KBps File Copy 4096 bufsize 8000 maxblocks | 1432769.8 KBps |441108.8 KBps Pipe Throughput| 6405029.6 lps | 6021457.6 lps Pipe-based Context Switching |185872.7 lps |184255.3 lps Process Creation | 4025.7 lps | 3706.6 lps Shell Scripts (1 concurrent) | 6745.6 lpm | 6436.1 lpm Shell Scripts (8 concurrent) | 998.7 lpm | 931.1 lpm System Call Overhead | 3913363.1 lps | 3883287.8 lps +---+-- System Benchmarks Index Score | 1835.1 | 1327.6 Signed-off-by: Zengruan Ye --- arch/arm64/include/asm/paravirt.h | 3 + arch/arm64/kernel/paravirt.c | 91 +++ arch/arm64/kernel/setup.c | 2 + include/linux/cpuhotplug.h| 1 + 4 files changed, 97 insertions(+) diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h index 7b1c81b544bb..a2cd0183bbef 100644 --- a/arch/arm64/include/asm/paravirt.h +++ b/arch/arm64/include/asm/paravirt.h @@ -29,6 +29,8 @@ static inline u64 paravirt_steal_clock(int cpu) int __init pv_time_init(void); +int __init kvm_guest_init(void); + __visible bool __native_vcpu_is_preempted(int cpu); static inline bool pv_vcpu_is_preempted(int cpu) @@ -39,6 +41,7 @@ static inline bool pv_vcpu_is_preempted(int cpu) #else #define pv_time_init() do {} while (0) +#define kvm_guest_init() do {} while (0) #endif // 
CONFIG_PARAVIRT diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c index d8f1ba8c22ce..a86dead40473 100644 --- a/arch/arm64/kernel/paravirt.c +++ b/arch/arm64/kernel/paravirt.c @@ -22,6 +22,7 @@ #include #include #include +#include struct static_key paravirt_steal_enabled; struct static_key paravirt_steal_rq_enabled; @@ -158,3 +159,93 @@ int __init pv_time_init(void) return 0; } + +DEFINE_PER_CPU(struct pvlock_vcpu_state, pvlock_vcpu_region) __aligned(64); +EXPORT_PER_CPU_SYMBOL(pvlock_vcpu_region); + +static int pvlock_vcpu_state_dying_cpu(unsigned int cpu) +{ + struct pvlock_vcpu_state *reg; + + reg = this_cpu_ptr(&pvlock_vcpu_region); + if (!reg) + return -EFAULT; + + memset(reg, 0, sizeof(*reg)); + + return 0; +} + +static int init_pvlock_vcpu_state(unsigned int cpu) +{ + struct pvlock_vcpu_state *reg; + struct arm_smccc_res res; + + reg = this_cpu_ptr(&pvlock_vcpu_region); + if (!reg) + return -EFAULT; + + /* Pass the memory address to host via hypercall */ + arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_LOCK_PREEMPTED, +virt_to_phys(reg), &res); + + return 0; +} + +static bool kvm_vcpu_is_preempted(int cpu) +{ + struct pvlock_vcpu_state *reg = &per_cpu(pvlock_vcpu_region, cpu); + + if (reg) + return !!(reg->preempted & 1); + + return false; +} + +static int kvm_arm_init_pvlock(void) +{ + int ret; + + ret = cpuhp_setup_state(CPUHP_AP_ARM_KVM_PVLOCK_STARTING, + "hypervisor/arm/pvlock:starting", + init_pvlock_vcpu_state, + pvlock_vcpu_state_dying_cpu); + if (ret < 0) + return ret; + + pv_ops.lock.vcpu_is_preempted = kvm_vcpu_is_preempted; + + pr_info("using PV-lock preempted\n"); + + return 0; +} + +static bool has_kvm_pvlock(void) +{ + struct arm_smccc_res res; + + /* To detect the presence of PV lock support we require SMCCC 1.1+ */ + if (psci_ops.smccc_version < SMCCC_VERSION_1_1) + return false; + + arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID, +ARM_SMCCC_HV_PV_LOCK_FEATURES, &res); + + if (res.a0 != SMCCC_RET_SUCCESS) + return false; + 
+ return true; +} + +int __init kv
[PATCH 4/5] KVM: arm64: Add interface to support vcpu preempted check
From: Zengruan Ye

This is to fix some lock holder preemption issues. Some lock
implementations do a spin loop before acquiring the lock itself. The
kernel currently has an interface, bool vcpu_is_preempted(int cpu), which
takes the cpu as a parameter and returns true if that cpu is preempted.
The kernel can then break out of the spin loops based on the return value
of vcpu_is_preempted().

As the kernel already uses this interface, let's support it.

Signed-off-by: Zengruan Ye
---
 arch/arm64/include/asm/paravirt.h      | 12
 arch/arm64/include/asm/spinlock.h      |  7 +++
 arch/arm64/kernel/Makefile             |  2 +-
 arch/arm64/kernel/paravirt-spinlocks.c | 13 +
 arch/arm64/kernel/paravirt.c           |  4 +++-
 5 files changed, 36 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/kernel/paravirt-spinlocks.c

diff --git a/arch/arm64/include/asm/paravirt.h b/arch/arm64/include/asm/paravirt.h
index cf3a0fd7c1a7..7b1c81b544bb 100644
--- a/arch/arm64/include/asm/paravirt.h
+++ b/arch/arm64/include/asm/paravirt.h
@@ -11,8 +11,13 @@ struct pv_time_ops {
 	unsigned long long (*steal_clock)(int cpu);
 };
 
+struct pv_lock_ops {
+	bool (*vcpu_is_preempted)(int cpu);
+};
+
 struct paravirt_patch_template {
 	struct pv_time_ops time;
+	struct pv_lock_ops lock;
 };
 
 extern struct paravirt_patch_template pv_ops;
@@ -24,6 +29,13 @@ static inline u64 paravirt_steal_clock(int cpu)
 
 int __init pv_time_init(void);
 
+__visible bool __native_vcpu_is_preempted(int cpu);
+
+static inline bool pv_vcpu_is_preempted(int cpu)
+{
+	return pv_ops.lock.vcpu_is_preempted(cpu);
+}
+
 #else
 
 #define pv_time_init() do {} while (0)
diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index b093b287babf..45ff1b2949a6 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -7,8 +7,15 @@
 
 #include
 #include
+#include
 
 /* See include/linux/spinlock.h */
 #define smp_mb__after_spinlock()	smp_mb()
 
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(long cpu)
+{
+	return pv_vcpu_is_preempted(cpu);
+}
+
 #endif /* __ASM_SPINLOCK_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index fc6488660f64..b23cdae433a4 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -50,7 +50,7 @@ obj-$(CONFIG_ARMV8_DEPRECATED)	+= armv8_deprecated.o
 obj-$(CONFIG_ACPI)			+= acpi.o
 obj-$(CONFIG_ACPI_NUMA)			+= acpi_numa.o
 obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
-obj-$(CONFIG_PARAVIRT)			+= paravirt.o
+obj-$(CONFIG_PARAVIRT)			+= paravirt.o paravirt-spinlocks.o
 obj-$(CONFIG_RANDOMIZE_BASE)		+= kaslr.o
 obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
 obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
diff --git a/arch/arm64/kernel/paravirt-spinlocks.c b/arch/arm64/kernel/paravirt-spinlocks.c
new file mode 100644
index ..718aa773d45c
--- /dev/null
+++ b/arch/arm64/kernel/paravirt-spinlocks.c
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright(c) 2019 Huawei Technologies Co., Ltd
+ * Author: Zengruan Ye
+ */
+
+#include
+#include
+
+__visible bool __native_vcpu_is_preempted(int cpu)
+{
+	return false;
+}
diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 1ef702b0be2d..d8f1ba8c22ce 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -26,7 +26,9 @@
 struct static_key paravirt_steal_enabled;
 struct static_key paravirt_steal_rq_enabled;
 
-struct paravirt_patch_template pv_ops;
+struct paravirt_patch_template pv_ops = {
+	.lock.vcpu_is_preempted		= __native_vcpu_is_preempted,
+};
 EXPORT_SYMBOL_GPL(pv_ops);
 
 struct pv_time_stolen_time_region {
-- 
2.19.1
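The dispatch pattern in the patch above, and the way a spin loop can use it, can be sketched in plain userspace C as follows. This is an illustration only: `pv_lock`, `fake_preempted`, `fake_kvm_vcpu_is_preempted()` and `spin_on_owner()` are hypothetical names, and the bounded loop is a simplification of what real optimistic-spinning code does.

```c
#include <stdbool.h>

#define NR_CPUS 4

/* Same shape as the pv_lock_ops added by the patch. */
struct pv_lock_ops {
	bool (*vcpu_is_preempted)(int cpu);
};

/* Bare-metal default: a physical cpu is never "preempted". */
static bool native_vcpu_is_preempted(int cpu)
{
	(void)cpu;
	return false;
}

static struct pv_lock_ops pv_lock = {
	.vcpu_is_preempted = native_vcpu_is_preempted,
};

/* Hypothetical stand-in for the per-vcpu state a hypervisor would publish. */
static bool fake_preempted[NR_CPUS];

static bool fake_kvm_vcpu_is_preempted(int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS && fake_preempted[cpu];
}

/*
 * Spin for at most max_spins iterations waiting for *locked to clear.
 * Returns true if the lock was seen free, false if the waiter gave up
 * because the owner's vcpu is preempted or the spin budget ran out.
 */
static bool spin_on_owner(const volatile int *locked, int owner_cpu,
			  int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (!*locked)
			return true;
		if (pv_lock.vcpu_is_preempted(owner_cpu))
			return false;	/* owner cannot make progress; stop spinning */
	}
	return false;
}
```

A guest that detects hypervisor support would install its own callback (here `pv_lock.vcpu_is_preempted = fake_kvm_vcpu_is_preempted;`), after which a waiter stops spinning as soon as the lock holder's vcpu is reported as preempted.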
[PATCH v2] KVM: arm/arm64: Re-check VMA on detecting a poisoned page
When we check for a poisoned page, we use the VMA to tell userspace
about the looming disaster. But we pass a pointer to this VMA
after having released the mmap_sem, which isn't a good idea.

Instead, stash the shift value that goes with this pfn while
we are holding the mmap_sem.

Reported-by: Marc Zyngier
Signed-off-by: James Morse
---
Based on Marc's patch:
Link: lore.kernel.org/r/20191211165651.7889-3-...@kernel.org

 virt/kvm/arm/mmu.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 38b4c910b6c3..bb0f8d648678 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1591,16 +1591,8 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 	__invalidate_icache_guest_page(pfn, size);
 }
 
-static void kvm_send_hwpoison_signal(unsigned long address,
-				     struct vm_area_struct *vma)
+static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
 {
-	short lsb;
-
-	if (is_vm_hugetlb_page(vma))
-		lsb = huge_page_shift(hstate_vma(vma));
-	else
-		lsb = PAGE_SHIFT;
-
 	send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb, current);
 }
 
@@ -1673,6 +1665,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
 	struct vm_area_struct *vma;
+	short vma_shift;
 	kvm_pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
@@ -1696,7 +1689,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	vma_pagesize = vma_kernel_pagesize(vma);
+	if (is_vm_hugetlb_page(vma))
+		vma_shift = huge_page_shift(hstate_vma(vma));
+	else
+		vma_shift = PAGE_SHIFT;
+
+	vma_pagesize = 1ULL << vma_shift;
 	if (logging_active ||
 	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
 		force_pte = true;
@@ -1735,7 +1733,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
-		kvm_send_hwpoison_signal(hva, vma_shift);
+		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
 	}
 	if (is_error_noslot_pfn(pfn))
--
2.24.0
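
The pattern the patch relies on (copy out the values you need while the lock is held, then use only the copies afterwards) can be sketched in plain userspace C roughly as follows. Everything here is a simplified assumption for illustration: struct vma, struct toy_lock, struct fault_info and snapshot_vma() are hypothetical names, the toy lock merely stands in for mmap_sem, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical, heavily simplified stand-in for a VMA. */
struct vma {
	bool is_hugetlb;
	short huge_shift;	/* only meaningful when is_hugetlb is set */
};

/* Toy lock standing in for mmap_sem; it only records whether it is held. */
struct toy_lock {
	bool held;
};

static void lock_read(struct toy_lock *l)   { l->held = true; }
static void unlock_read(struct toy_lock *l) { l->held = false; }

/* Values derived from the VMA that remain valid after the lock is dropped. */
struct fault_info {
	short shift;
	unsigned long long pagesize;
};

/*
 * Compute the shift and page size while the lock is held and return
 * them by value. After this returns, callers must use only the
 * snapshot: the VMA itself may have been changed or freed.
 */
static struct fault_info snapshot_vma(struct toy_lock *lock,
				      const struct vma *vma)
{
	struct fault_info info;

	assert(lock && vma);

	lock_read(lock);
	info.shift = vma->is_hugetlb ? vma->huge_shift : PAGE_SHIFT;
	info.pagesize = 1ULL << info.shift;
	unlock_read(lock);

	return info;
}
```

A later error path, such as the hwpoison signal in the patch, would then be handed info.shift rather than a pointer to the VMA.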