[PATCH v4 2/2] memblock: make memblock_find_in_range method private
From: Mike Rapoport There are a lot of uses of memblock_find_in_range() along with memblock_reserve() from the times memblock allocation APIs did not exist. memblock_find_in_range() is the very core of memblock allocations, so any future changes to its internal behaviour would mandate updates of all the users outside memblock. Replace the calls to memblock_find_in_range() with an equivalent calls to memblock_phys_alloc() and memblock_phys_alloc_range() and make memblock_find_in_range() private method of memblock. This simplifies the callers, ensures that (unlikely) errors in memblock_reserve() are handled and improves maintainability of memblock_find_in_range(). Signed-off-by: Mike Rapoport Acked-by: Kirill A. Shutemov Acked-by: Rafael J. Wysocki Acked-by: Russell King (Oracle) Acked-by: Nick Kossifidis Reviewed-by: Catalin Marinas --- arch/arm/kernel/setup.c | 20 +- arch/arm64/kvm/hyp/reserved_mem.c | 9 +++ arch/arm64/mm/init.c | 36 - arch/mips/kernel/setup.c | 14 +- arch/riscv/mm/init.c | 44 ++- arch/s390/kernel/setup.c | 10 --- arch/x86/kernel/aperture_64.c | 5 ++-- arch/x86/mm/init.c| 11 arch/x86/mm/numa.c| 5 ++-- arch/x86/mm/numa_emulation.c | 5 ++-- arch/x86/realmode/init.c | 2 +- drivers/acpi/tables.c | 5 ++-- drivers/base/arch_numa.c | 5 +--- drivers/of/of_reserved_mem.c | 12 ++--- include/linux/memblock.h | 2 -- mm/memblock.c | 2 +- 16 files changed, 71 insertions(+), 116 deletions(-) diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c index f97eb2371672..284a80c0b6e1 100644 --- a/arch/arm/kernel/setup.c +++ b/arch/arm/kernel/setup.c @@ -1012,31 +1012,25 @@ static void __init reserve_crashkernel(void) unsigned long long lowmem_max = __pa(high_memory - 1) + 1; if (crash_max > lowmem_max) crash_max = lowmem_max; - crash_base = memblock_find_in_range(CRASH_ALIGN, crash_max, - crash_size, CRASH_ALIGN); + + crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, + CRASH_ALIGN, crash_max); if (!crash_base) { pr_err("crashkernel reservation 
failed - No suitable area found.\n"); return; } } else { + unsigned long long crash_max = crash_base + crash_size; unsigned long long start; - start = memblock_find_in_range(crash_base, - crash_base + crash_size, - crash_size, SECTION_SIZE); - if (start != crash_base) { + start = memblock_phys_alloc_range(crash_size, SECTION_SIZE, + crash_base, crash_max); + if (!start) { pr_err("crashkernel reservation failed - memory is in use.\n"); return; } } - ret = memblock_reserve(crash_base, crash_size); - if (ret < 0) { - pr_warn("crashkernel reservation failed - memory is in use (0x%lx)\n", - (unsigned long)crash_base); - return; - } - pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n", (unsigned long)(crash_size >> 20), (unsigned long)(crash_base >> 20), diff --git a/arch/arm64/kvm/hyp/reserved_mem.c b/arch/arm64/kvm/hyp/reserved_mem.c index d654921dd09b..578670e3f608 100644 --- a/arch/arm64/kvm/hyp/reserved_mem.c +++ b/arch/arm64/kvm/hyp/reserved_mem.c @@ -92,12 +92,10 @@ void __init kvm_hyp_reserve(void) * this is unmapped from the host stage-2, and fallback to PAGE_SIZE. */ hyp_mem_size = hyp_mem_pages << PAGE_SHIFT; - hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(), - ALIGN(hyp_mem_size, PMD_SIZE), - PMD_SIZE); + hyp_mem_base = memblock_phys_alloc(ALIGN(hyp_mem_size, PMD_SIZE), + PMD_SIZE); if (!hyp_mem_base) - hyp_mem_base = memblock_find_in_range(0, memblock_end_of_DRAM(), - hyp_mem_size, PAGE_SIZE); + hyp_mem_base = memblock_phys_alloc(hyp_mem_size, PAGE_SIZE); else hyp_mem_size = ALIGN(hyp_mem_size, PMD_SIZE); @@ -105,7 +103,6 @@ void __init kvm_hyp_reserve(void) kvm_err("Failed to reserve hyp memory\n"); return; } - me
[PATCH v4 1/2] x86/mm: memory_map_top_down: remove spurious reservation of upper 2M
From: Mike Rapoport The memory_map_top_down() function skips the upper 2M at the beginning and maps them at the end because "xen has big range in reserved near end of ram, skip it at first" It appears, though, that the root cause was that there was not enough memory in the range [min_pfn_mapped, max_pfn_mapped] to allocate page tables from that range in alloc_low_pages(), because min_pfn_mapped didn't reflect the actual minimal pfn that was already mapped but remained close to the end of the range being mapped by memory_map_top_down(). This happened because min_pfn_mapped is updated at every iteration of the loop in memory_map_top_down(), but there is another loop in init_range_memory_mapping() that maps several regions below the current min_pfn_mapped without updating this variable. Move the update of min_pfn_mapped to add_pfn_range_mapped() next to the update of max_pfn_mapped, so that every time a new range is mapped both limits are updated accordingly, and remove the spurious "reservation" of the upper 2M. 
Signed-off-by: Mike Rapoport --- arch/x86/mm/init.c | 16 +--- 1 file changed, 5 insertions(+), 11 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 75ef19aa8903..87150961fdca 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -486,6 +486,7 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn) nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_MAX_ENTRIES); max_pfn_mapped = max(max_pfn_mapped, end_pfn); + min_pfn_mapped = min(min_pfn_mapped, start_pfn); if (start_pfn < (1UL<<(32-PAGE_SHIFT))) max_low_pfn_mapped = max(max_low_pfn_mapped, @@ -605,20 +606,14 @@ static unsigned long __init get_new_step_size(unsigned long step_size) static void __init memory_map_top_down(unsigned long map_start, unsigned long map_end) { - unsigned long real_end, last_start; - unsigned long step_size; - unsigned long addr; + unsigned long real_end = ALIGN_DOWN(map_end, PMD_SIZE); + unsigned long last_start = real_end; + /* step_size need to be small so pgt_buf from BRK could cover it */ + unsigned long step_size = PMD_SIZE; unsigned long mapped_ram_size = 0; - /* xen has big range in reserved near end of ram, skip it at first.*/ - addr = memblock_find_in_range(map_start, map_end, PMD_SIZE, PMD_SIZE); - real_end = addr + PMD_SIZE; - - /* step_size need to be small so pgt_buf from BRK could cover it */ - step_size = PMD_SIZE; max_pfn_mapped = 0; /* will get exact value next */ min_pfn_mapped = real_end >> PAGE_SHIFT; - last_start = real_end; /* * We start from the top (end of memory) and go to the bottom. @@ -638,7 +633,6 @@ static void __init memory_map_top_down(unsigned long map_start, mapped_ram_size += init_range_memory_mapping(start, last_start); last_start = start; - min_pfn_mapped = last_start >> PAGE_SHIFT; if (mapped_ram_size >= step_size) step_size = get_new_step_size(step_size); } -- 2.28.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v4 0/2] memblock: make memblock_find_in_range method private
From: Mike Rapoport Hi, This is v4 of "memblock: make memblock_find_in_range method private" patch that essentially replaces memblock_find_in_range() + memblock_reserve() calls with equivalent calls to memblock_phys_alloc() and prevents usage of memblock_find_in_range() outside memblock itself. The patch uncovered an issue with top down memory mapping on x86 and this version has a preparation patch that addresses this issue. Guenter, I didn't add your Tested-by because the patch that addresses the crashes differs from the one you've tested. v4: * Add patch that prevents the crashes reported by Guenter Roeck on x86/i386 on QEMU with 256M or 512M of memory and EFI boot enabled. * Add Acked-by and Reviewed-by, thanks everybody! v3: https://lore.kernel.org/lkml/20210803064218.6611-1-r...@kernel.org * simplify check for exact crash kernel allocation on arm, per Rob * make crash_max unsigned long long on arm64, per Rob v2: https://lore.kernel.org/lkml/20210802063737.22733-1-r...@kernel.org * don't change error message in arm::reserve_crashkernel(), per Russell v1: https://lore.kernel.org/lkml/20210730104039.7047-1-r...@kernel.org Mike Rapoport (2): x86/mm: memory_map_top_down: remove spurious reservation of upper 2M memblock: make memblock_find_in_range method private arch/arm/kernel/setup.c | 20 +- arch/arm64/kvm/hyp/reserved_mem.c | 9 +++ arch/arm64/mm/init.c | 36 - arch/mips/kernel/setup.c | 14 +- arch/riscv/mm/init.c | 44 ++- arch/s390/kernel/setup.c | 10 --- arch/x86/kernel/aperture_64.c | 5 ++-- arch/x86/mm/init.c| 27 +++ arch/x86/mm/numa.c| 5 ++-- arch/x86/mm/numa_emulation.c | 5 ++-- arch/x86/realmode/init.c | 2 +- drivers/acpi/tables.c | 5 ++-- drivers/base/arch_numa.c | 5 +--- drivers/of/of_reserved_mem.c | 12 ++--- include/linux/memblock.h | 2 -- mm/memblock.c | 2 +- 16 files changed, 76 insertions(+), 127 deletions(-) base-commit: ff1176468d368232b684f75e82563369208bc371 -- 2.28.0
[PATCH V2 5/5] KVM: arm64: Drop unused REQUIRES_VIRT
This seems to be a residue from the past: REQUIRES_VIRT is no longer available, so it can simply be dropped along with the related code section. Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/arm64/kvm/arm.c | 4 1 file changed, 4 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index ee9b1166f330..d779a29c5607 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -42,10 +42,6 @@ #include #include -#ifdef REQUIRES_VIRT -__asm__(".arch_extension virt"); -#endif - static enum kvm_mode kvm_mode = KVM_MODE_DEFAULT; DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized); -- 2.20.1
[PATCH V2 4/5] KVM: arm64: Drop check_kvm_target_cpu() based percpu probe
kvm_target_cpu() never returns a negative error code, so check_kvm_target() would never have 'ret' filled with a negative error code. Hence the percpu probe via check_kvm_target_cpu() does not make sense as its never going to find an unsupported CPU, forcing kvm_arch_init() to exit early. Hence lets just drop this percpu probe (and also check_kvm_target_cpu()) altogether. While here, this also changes kvm_target_cpu() return type to a u32, making it explicit that an error code will not be returned from this function. Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Acked-by: Will Deacon Signed-off-by: Anshuman Khandual --- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/kvm/arm.c | 16 +--- arch/arm64/kvm/guest.c| 4 ++-- 3 files changed, 4 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 41911585ae0c..26bbd84a02de 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -66,7 +66,7 @@ DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); extern unsigned int kvm_sve_max_vl; int kvm_arm_init_sve(void); -int __attribute_const__ kvm_target_cpu(void); +u32 __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 19560e457c11..ee9b1166f330 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1035,7 +1035,7 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu, const struct kvm_vcpu_init *init) { unsigned int i, ret; - int phys_target = kvm_target_cpu(); + u32 phys_target = kvm_target_cpu(); if (init->target != phys_target) return -EINVAL; @@ -2010,11 +2010,6 @@ static int finalize_hyp_mode(void) return 0; } -static void check_kvm_target_cpu(void 
*ret) -{ - *(int *)ret = kvm_target_cpu(); -} - struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr) { struct kvm_vcpu *vcpu; @@ -2074,7 +2069,6 @@ void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *cons) int kvm_arch_init(void *opaque) { int err; - int ret, cpu; bool in_hyp_mode; if (!is_hyp_mode_available()) { @@ -2089,14 +2083,6 @@ int kvm_arch_init(void *opaque) kvm_info("Guests without required CPU erratum workarounds can deadlock system!\n" \ "Only trusted guests should be used on this system.\n"); - for_each_online_cpu(cpu) { - smp_call_function_single(cpu, check_kvm_target_cpu, &ret, 1); - if (ret < 0) { - kvm_err("Error, CPU %d not supported!\n", cpu); - return -ENODEV; - } - } - err = kvm_set_ipa_limit(); if (err) return err; diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c index 1dfb83578277..8ce850fa7b49 100644 --- a/arch/arm64/kvm/guest.c +++ b/arch/arm64/kvm/guest.c @@ -842,7 +842,7 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu, return 0; } -int __attribute_const__ kvm_target_cpu(void) +u32 __attribute_const__ kvm_target_cpu(void) { unsigned long implementor = read_cpuid_implementor(); unsigned long part_number = read_cpuid_part_number(); @@ -874,7 +874,7 @@ int __attribute_const__ kvm_target_cpu(void) int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init) { - int target = kvm_target_cpu(); + u32 target = kvm_target_cpu(); if (target < 0) return -ENODEV; -- 2.20.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH V2 3/5] KVM: arm64: Drop init_common_resources()
Drop the additional indirection via init_common_resources() by just calling kvm_set_ipa_limit() directly instead. Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/arm64/kvm/arm.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index e9a2b8f27792..19560e457c11 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1696,11 +1696,6 @@ static bool init_psci_relay(void) return true; } -static int init_common_resources(void) -{ - return kvm_set_ipa_limit(); -} - static int init_subsystems(void) { int err = 0; @@ -2102,7 +2097,7 @@ int kvm_arch_init(void *opaque) } } - err = init_common_resources(); + err = kvm_set_ipa_limit(); if (err) return err; -- 2.20.1
[PATCH V2 2/5] KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA
Drop the hard-coded value for the minimum supported IPA range bits (i.e. 32). Instead, use ARM64_MIN_PARANGE_BITS, which improves code readability. Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/arm64/kvm/reset.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c index cba7872d69a8..acf309698f33 100644 --- a/arch/arm64/kvm/reset.c +++ b/arch/arm64/kvm/reset.c @@ -369,7 +369,7 @@ int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type) phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type); if (phys_shift) { if (phys_shift > kvm_ipa_limit || - phys_shift < 32) + phys_shift < ARM64_MIN_PARANGE_BITS) return -EINVAL; } else { phys_shift = KVM_PHYS_SHIFT; -- 2.20.1
[PATCH V2 1/5] arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros
Currently there are macros only for 48 and 52 bits parange value extracted from the ID_AA64MMFR0.PARANGE field. This change completes the enumeration and updates the helper id_aa64mmfr0_parange_to_phys_shift(). While here it also defines ARM64_MIN_PARANGE_BITS as the absolute minimum shift value PA range which could be supported on a given platform. Cc: Marc Zyngier Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/arm64/include/asm/cpufeature.h | 14 +++--- arch/arm64/include/asm/sysreg.h | 7 +++ 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index 9bb9d11750d7..8633bdb21f33 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -781,13 +781,13 @@ extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt); static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) { switch (parange) { - case 0: return 32; - case 1: return 36; - case 2: return 40; - case 3: return 42; - case 4: return 44; - case 5: return 48; - case 6: return 52; + case ID_AA64MMFR0_PARANGE_32: return 32; + case ID_AA64MMFR0_PARANGE_36: return 36; + case ID_AA64MMFR0_PARANGE_40: return 40; + case ID_AA64MMFR0_PARANGE_42: return 42; + case ID_AA64MMFR0_PARANGE_44: return 44; + case ID_AA64MMFR0_PARANGE_48: return 48; + case ID_AA64MMFR0_PARANGE_52: return 52; /* * A future PE could use a value unknown to the kernel. 
* However, by the "D10.1.4 Principles of the ID scheme diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 7b9c3acba684..504e909129ea 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -853,9 +853,16 @@ #define ID_AA64MMFR0_TGRAN64_SUPPORTED 0x0 #define ID_AA64MMFR0_TGRAN16_NI0x0 #define ID_AA64MMFR0_TGRAN16_SUPPORTED 0x1 +#define ID_AA64MMFR0_PARANGE_320x0 +#define ID_AA64MMFR0_PARANGE_360x1 +#define ID_AA64MMFR0_PARANGE_400x2 +#define ID_AA64MMFR0_PARANGE_420x3 +#define ID_AA64MMFR0_PARANGE_440x4 #define ID_AA64MMFR0_PARANGE_480x5 #define ID_AA64MMFR0_PARANGE_520x6 +#define ARM64_MIN_PARANGE_BITS 32 + #define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT 0x0 #define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE0x1 #define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN 0x2 -- 2.20.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH V2 0/5] KVM: arm64: General cleanups
This series contains mostly unrelated general cleanups. This series applies on v5.14-rc5 and has been boot tested with different page sized guests. Changes in V2: - Dropped the first patch regarding PAGE_[SHIFT|SIZE] per Marc - Added remaining ID_AA64MMFR0_PARANGE_ macros and ARM64_MIN_PARANGE_BITS per Marc - Dropped memory and cycle reference from commit message in [PATCH 3/5] - Changed return value as u32 in kvm_target_cpu() per Will Changes in V1: https://lore.kernel.org/lkml/1628578961-29097-1-git-send-email-anshuman.khand...@arm.com/ Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Anshuman Khandual (5): arm64/mm: Add remaining ID_AA64MMFR0_PARANGE_ macros KVM: arm64: Use ARM64_MIN_PARANGE_BITS as the minimum supported IPA KVM: arm64: Drop init_common_resources() KVM: arm64: Drop check_kvm_target_cpu() based percpu probe KVM: arm64: Drop unused REQUIRES_VIRT arch/arm64/include/asm/cpufeature.h | 14 +++--- arch/arm64/include/asm/kvm_host.h | 2 +- arch/arm64/include/asm/sysreg.h | 7 +++ arch/arm64/kvm/arm.c| 27 ++- arch/arm64/kvm/guest.c | 4 ++-- arch/arm64/kvm/reset.c | 2 +- 6 files changed, 20 insertions(+), 36 deletions(-) -- 2.20.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state
On Wed, Aug 11, 2021 at 6:05 AM Paolo Bonzini wrote: > > On 04/08/21 10:57, Oliver Upton wrote: > > KVM's current means of saving/restoring system counters is plagued with > > temporal issues. At least on ARM64 and x86, we migrate the guest's > > system counter by-value through the respective guest system register > > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is > > brittle as the state is not idempotent: the host system counter is still > > oscillating between the attempted save and restore. Furthermore, VMMs > > may wish to transparently live migrate guest VMs, meaning that they > > include the elapsed time due to live migration blackout in the guest > > system counter view. The VMM thread could be preempted for any number of > > reasons (scheduler, L0 hypervisor under nested) between the time that > > it calculates the desired guest counter value and when KVM actually sets > > this counter state. > > > > Despite the value-based interface that we present to userspace, KVM > > actually has idempotent guest controls by way of system counter offsets. > > We can avoid all of the issues associated with a value-based interface > > by abstracting these offset controls in new ioctls. This series > > introduces new vCPU device attributes to provide userspace access to the > > vCPU's system counter offset. > > > > Patch 1 addresses a possible race in KVM_GET_CLOCK where > > use_master_clock is read outside of the pvclock_gtod_sync_lock. > > > > Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK > > ioctls to provide userspace with a (host_tsc, realtime) instant. This is > > essential for a VMM to perform precise migration of the guest's system > > counters. > > > > Patches 3-4 are some preparatory changes for exposing the TSC offset to > > userspace. Patch 5 provides a vCPU attribute to provide userspace access > > to the TSC offset. > > > > Patches 6-7 implement a test for the new additions to > > KVM_{GET,SET}_CLOCK. 
> > > > Patch 8 fixes some assertions in the kvm device attribute helpers. > > > > Patches 9-10 implement at test for the tsc offset attribute introduced in > > patch 5. > > The x86 parts look good, except that patch 3 is a bit redundant with my > idea of altogether getting rid of the pvclock_gtod_sync_lock. That said > I agree that patches 1 and 2 (and extracting kvm_vm_ioctl_get_clock and > kvm_vm_ioctl_set_clock) should be done before whatever locking changes > have to be done. Following up on patch 3. > Time is ticking for 5.15 due to my vacation, I'll see if I have some > time to look at it further next week. > > I agree that arm64 can be done separately from x86. Marc, just a disclaimer: I'm going to separate these two series, although there will still exist dependencies in the selftests changes. Otherwise, kernel changes are disjoint. -- Thanks, Oliver ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH] KVM: arm64: Return -EPERM from __pkvm_host_share_hyp()
Fix the error code returned by __pkvm_host_share_hyp() when the host attempts to share with EL2 a page that has already been shared with another entity. Reported-by: Will Deacon Signed-off-by: Quentin Perret --- This patch fixes a bug introduced in the stage-2 ownership series which is already queued in kvmarm/next: https://lore.kernel.org/lkml/20210809152448.1810400-1-qper...@google.com/ I've stared at this code for hours, but that clearly was not enough. Oh well ... --- arch/arm64/kvm/hyp/nvhe/mem_protect.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index 8165390d3ec9..6ec695311498 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -404,7 +404,7 @@ int __pkvm_host_share_hyp(u64 pfn) cur = kvm_pgtable_hyp_pte_prot(pte); prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED); if (!check_prot(cur, prot, ~prot)) - ret = EPERM; + ret = -EPERM; goto unlock; map_shared: -- 2.32.0.605.g8dce9f2422-goog
Re: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state
On Wed, 11 Aug 2021 19:56:22 +0100, Oliver Upton wrote: [...] > > Time is ticking for 5.15 due to my vacation, I'll see if I have some > > time to look at it further next week. > > > > I agree that arm64 can be done separately from x86. > > Marc, just a disclaimer: > > I'm going to separate these two series, although there will still > exist dependencies in the selftests changes. Otherwise, kernel changes > are disjoint. No problem. The selftests can even sit in a third series if that makes it easier. Thanks, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH] KVM: arm64: Return -EPERM from __pkvm_host_share_hyp()
On Wed, 11 Aug 2021 18:36:25 +0100, Quentin Perret wrote: > Fix the error code returned by __pkvm_host_share_hyp() when the > host attempts to share with EL2 a page that has already been shared with > another entity. Applied to next, thanks! [1/1] KVM: arm64: Return -EPERM from __pkvm_host_share_hyp() commit: 12593568d7319c34c72038ea799ab1bd0f0eb01c Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v6 13/21] KVM: arm64: Allow userspace to configure a vCPU's virtual offset
On Tue, 10 Aug 2021 10:44:01 +0100, Oliver Upton wrote: > > On Tue, Aug 10, 2021 at 2:35 AM Marc Zyngier wrote: > > > > On Wed, 04 Aug 2021 09:58:11 +0100, > > Oliver Upton wrote: > > > > > > Allow userspace to access the guest's virtual counter-timer offset > > > through the ONE_REG interface. The value read or written is defined to > > > be an offset from the guest's physical counter-timer. Add some > > > documentation to clarify how a VMM should use this and the existing > > > CNTVCT_EL0. > > > > > > Signed-off-by: Oliver Upton > > > --- > > > Documentation/virt/kvm/api.rst| 10 ++ > > > arch/arm64/include/uapi/asm/kvm.h | 1 + > > > arch/arm64/kvm/arch_timer.c | 11 +++ > > > arch/arm64/kvm/guest.c| 6 +- > > > include/kvm/arm_arch_timer.h | 1 + > > > 5 files changed, 28 insertions(+), 1 deletion(-) > > > > > > diff --git a/Documentation/virt/kvm/api.rst > > > b/Documentation/virt/kvm/api.rst > > > index 8d4a3471ad9e..28a65dc89985 100644 > > > --- a/Documentation/virt/kvm/api.rst > > > +++ b/Documentation/virt/kvm/api.rst > > > @@ -2487,6 +2487,16 @@ arm64 system registers have the following id bit > > > patterns:: > > > derived from the register encoding for CNTV_CVAL_EL0. As this is > > > API, it must remain this way. > > > > > > +.. warning:: > > > + > > > + The value of KVM_REG_ARM_TIMER_OFFSET is defined as an offset from > > > + the guest's view of the physical counter-timer. > > > + > > > + Userspace should use either KVM_REG_ARM_TIMER_OFFSET or > > > + KVM_REG_ARM_TIMER_CVAL to pause and resume a guest's virtual > > > > You probably mean KVM_REG_ARM_TIMER_CNT here, despite the broken > > encoding. > > Indeed I do! > > > > > > + counter-timer. Mixed use of these registers could result in an > > > + unpredictable guest counter value. 
> > > + > > > arm64 firmware pseudo-registers have the following bit pattern:: > > > > > >0x6030 0014 > > > diff --git a/arch/arm64/include/uapi/asm/kvm.h > > > b/arch/arm64/include/uapi/asm/kvm.h > > > index b3edde68bc3e..949a31bc10f0 100644 > > > --- a/arch/arm64/include/uapi/asm/kvm.h > > > +++ b/arch/arm64/include/uapi/asm/kvm.h > > > @@ -255,6 +255,7 @@ struct kvm_arm_copy_mte_tags { > > > #define KVM_REG_ARM_TIMER_CTLARM64_SYS_REG(3, 3, 14, 3, > > > 1) > > > #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, > > > 2) > > > #define KVM_REG_ARM_TIMER_CNTARM64_SYS_REG(3, 3, 14, 3, > > > 2) > > > +#define KVM_REG_ARM_TIMER_OFFSET ARM64_SYS_REG(3, 4, 14, 0, 3) > > > > I don't think we can use the encoding for CNTPOFF_EL2 here, as it will > > eventually clash with a NV guest using the same feature for its own > > purpose. We don't want this offset to overlap with any of the existing > > features. > > > > I actually liked your previous proposal of controlling the physical > > offset via a device property, as it clearly indicated that you were > > dealing with non-architectural state. > > That's actually exactly what I did here :) That said, the macro name > is horribly obfuscated from CNTVOFF_EL2. I did this for the sake of > symmetry with other virtual counter-timer registers above, though this > may warrant special casing given the fact that we have a similarly > named device attribute to handle the physical offset. Gah, you are of course right. Ignore my rambling. The name is fine (or at least in keeping with existing quality level of the making). For the physical offset, something along the lines of KVM_ARM_VCPU_TIMER_PHYS_OFFSET is probably right (but feel free to be creative, I'm terrible at this stuff [1]). Thanks, M. [1] https://twitter.com/codinghorror/status/506010907021828096?lang=en -- Without deviation from the norm, progress is not possible. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH V2] KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
On Wed, 11 Aug 2021 16:41:15 +0530, Anshuman Khandual wrote: > Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot > be enabled as guest IPA size on 4K or 16K page size configurations. Hence > kvm_ipa_limit must be restricted to 48 bits. This change achieves required > IPA capping. > > Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA > size based on host configuration"), the problem here would have been just > latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit), > which remains capped at 48 bits on 4K and 16K configs. Applied to next, thanks! [1/1] KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size commit: 5e5df9571c319fb107d7a523cc96fcc99961ee70 Cheers, M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v6 00/21] KVM: Add idempotent controls for migrating system counter state
On 04/08/21 10:57, Oliver Upton wrote: KVM's current means of saving/restoring system counters is plagued with temporal issues. At least on ARM64 and x86, we migrate the guest's system counter by-value through the respective guest system register values (cntvct_el0, ia32_tsc). Restoring system counters by-value is brittle as the state is not idempotent: the host system counter is still oscillating between the attempted save and restore. Furthermore, VMMs may wish to transparently live migrate guest VMs, meaning that they include the elapsed time due to live migration blackout in the guest system counter view. The VMM thread could be preempted for any number of reasons (scheduler, L0 hypervisor under nested) between the time that it calculates the desired guest counter value and when KVM actually sets this counter state. Despite the value-based interface that we present to userspace, KVM actually has idempotent guest controls by way of system counter offsets. We can avoid all of the issues associated with a value-based interface by abstracting these offset controls in new ioctls. This series introduces new vCPU device attributes to provide userspace access to the vCPU's system counter offset. Patch 1 addresses a possible race in KVM_GET_CLOCK where use_master_clock is read outside of the pvclock_gtod_sync_lock. Patch 2 adopts Paolo's suggestion, augmenting the KVM_{GET,SET}_CLOCK ioctls to provide userspace with a (host_tsc, realtime) instant. This is essential for a VMM to perform precise migration of the guest's system counters. Patches 3-4 are some preparatory changes for exposing the TSC offset to userspace. Patch 5 provides a vCPU attribute to provide userspace access to the TSC offset. Patches 6-7 implement a test for the new additions to KVM_{GET,SET}_CLOCK. Patch 8 fixes some assertions in the kvm device attribute helpers. Patches 9-10 implement at test for the tsc offset attribute introduced in patch 5. 
The x86 parts look good, except that patch 3 is a bit redundant with my idea of altogether getting rid of the pvclock_gtod_sync_lock. That said I agree that patches 1 and 2 (and extracting kvm_vm_ioctl_get_clock and kvm_vm_ioctl_set_clock) should be done before whatever locking changes have to be done. Time is ticking for 5.15 due to my vacation, I'll see if I have some time to look at it further next week. I agree that arm64 can be done separately from x86. Paolo ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
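The brittleness of value-based restore that the cover letter describes can be illustrated with a standalone toy model (all names below are illustrative, not KVM's actual interface; the real series exposes the offset through a vCPU device attribute):

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: the guest counter always reads as host counter + per-vCPU offset. */
struct vcpu {
	int64_t counter_offset;
};

static int64_t guest_counter(const struct vcpu *v, int64_t host_counter)
{
	return host_counter + v->counter_offset;
}

/* Value-based restore: derive a fresh offset from a sampled guest value.
 * The host counter keeps running between save and restore, so the elapsed
 * interval is silently dropped from the guest's view of time. */
static void restore_by_value(struct vcpu *v, int64_t saved_guest_value,
			     int64_t host_counter_now)
{
	v->counter_offset = saved_guest_value - host_counter_now;
}

/* Offset-based restore: idempotent, independent of when it runs. */
static void restore_by_offset(struct vcpu *v, int64_t saved_offset)
{
	v->counter_offset = saved_offset;
}
```

Restoring by value always reproduces the sampled guest value at restore time, which is exactly the problem: any host-counter movement between save and restore is lost, while restoring the offset is exact no matter when it happens.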
Re: [PATCH v6 01/21] KVM: x86: Fix potential race in KVM_GET_CLOCK
On 04/08/21 10:57, Oliver Upton wrote: Sean noticed that KVM_GET_CLOCK was checking kvm_arch.use_master_clock outside of the pvclock sync lock. This is problematic, as the clock value written to the user may or may not actually correspond to a stable TSC. Fix the race by populating the entire kvm_clock_data structure behind the pvclock_gtod_sync_lock. Suggested-by: Sean Christopherson Signed-off-by: Oliver Upton --- arch/x86/kvm/x86.c | 39 --- 1 file changed, 28 insertions(+), 11 deletions(-) I had a completely independent patch that fixed the same race. It unifies the read sides of tsc_write_lock and pvclock_gtod_sync_lock into a seqcount (and replaces pvclock_gtod_sync_lock with kvm->lock on the write side). I attach it now (based on https://lore.kernel.org/kvm/20210811102356.3406687-1-pbonz...@redhat.com/T/#t), but the testing was extremely light so I'm not sure I will be able to include it in 5.15. Paolo -- 8< - From: Paolo Bonzini Date: Thu, 8 Apr 2021 05:03:44 -0400 Subject: [PATCH] kvm: x86: protect masterclock with a seqcount Protect the reference point for kvmclock with a seqcount, so that kvmclock updates for all vCPUs can proceed in parallel. Xen runstate updates will also run in parallel and not bounce the kvmclock cacheline. This also makes it possible to use KVM_REQ_CLOCK_UPDATE (which will block on the seqcount) to prevent entering in the guests until pvclock_update_vm_gtod_copy is complete, and thus to get rid of KVM_REQ_MCLOCK_INPROGRESS. nr_vcpus_matched_tsc is updated outside pvclock_update_vm_gtod_copy though, so a spinlock must be kept for that one. Signed-off-by: Paolo Bonzini diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst index 8138201efb09..ed41da172e02 100644 --- a/Documentation/virt/kvm/locking.rst +++ b/Documentation/virt/kvm/locking.rst @@ -29,6 +29,8 @@ The acquisition orders for mutexes are as follows: On x86: +- the seqcount kvm->arch.pvclock_sc is written under kvm->lock. 
+ - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock - kvm->arch.mmu_lock is an rwlock. kvm->arch.tdp_mmu_pages_lock is diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 20daaf67a5bf..6889aab92362 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -68,8 +68,7 @@ #define KVM_REQ_PMIKVM_ARCH_REQ(11) #define KVM_REQ_SMIKVM_ARCH_REQ(12) #define KVM_REQ_MASTERCLOCK_UPDATE KVM_ARCH_REQ(13) -#define KVM_REQ_MCLOCK_INPROGRESS \ - KVM_ARCH_REQ_FLAGS(14, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) +/* 14 unused */ #define KVM_REQ_SCAN_IOAPIC \ KVM_ARCH_REQ_FLAGS(15, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_GLOBAL_CLOCK_UPDATEKVM_ARCH_REQ(16) @@ -1067,6 +1066,11 @@ struct kvm_arch { unsigned long irq_sources_bitmap; s64 kvmclock_offset; + + /* +* This also protects nr_vcpus_matched_tsc which is read from a +* preemption-disabled region, so it must be a raw spinlock. +*/ raw_spinlock_t tsc_write_lock; u64 last_tsc_nsec; u64 last_tsc_write; @@ -1077,7 +1081,7 @@ struct kvm_arch { u64 cur_tsc_generation; int nr_vcpus_matched_tsc; - spinlock_t pvclock_gtod_sync_lock; + seqcount_mutex_t pvclock_sc; bool use_master_clock; u64 master_kernel_ns; u64 master_cycle_now; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 74145a3fd4f2..7d73c5560260 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2533,9 +2533,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; kvm_vcpu_write_tsc_offset(vcpu, offset); - raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); - spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); if (!matched) { kvm->arch.nr_vcpus_matched_tsc = 0; } else if (!already_matched) { @@ -2543,7 +2541,7 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) } kvm_track_tsc_matching(vcpu); - spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); + 
raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); } static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, @@ -2730,9 +2728,7 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) struct kvm_arch *ka = &kvm->arch; int vclock_mode; bool host_tsc_clocksource, vcpus_matched; - - vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == - atomic_read(&kvm->online_vcpus)); + unsigned long flags; /* * If the host uses TSC clock, then passthrough TSC as stable @@ -2742,9 +2738,14 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm) &ka->ma
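The read/retry protocol that Paolo's seqcount conversion relies on can be sketched in a simplified, single-threaded form (this is only the protocol shape; the kernel's seqcount_mutex_t additionally serializes writers and issues memory barriers, which this model omits):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified seqcount: an odd value means a writer is mid-update. */
struct seqcount {
	uint32_t seq;
};

static uint32_t read_begin(const struct seqcount *s)
{
	uint32_t seq;

	do {
		seq = s->seq;
	} while (seq & 1);	/* wait until no write is in progress */
	return seq;
}

static int read_retry(const struct seqcount *s, uint32_t start)
{
	return s->seq != start;	/* a write happened meanwhile: reread the data */
}

static void write_begin(struct seqcount *s) { s->seq++; }	/* even -> odd */
static void write_end(struct seqcount *s)   { s->seq++; }	/* odd -> even */
```

Readers never block each other, which is why converting pvclock_gtod_sync_lock to a seqcount lets kvmclock updates for all vCPUs proceed in parallel: a reader only loops back if a write to the masterclock reference point raced with its snapshot.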
Re: [PATCH v4 00/21] Track shared pages at EL2 in protected mode
On Mon, 9 Aug 2021 16:24:27 +0100, Quentin Perret wrote:
> This is v4 of the patch series previously posted here:
>
> https://lore.kernel.org/lkml/20210729132818.4091769-1-qper...@google.com/
>
> This series aims to improve how the nVHE hypervisor tracks ownership of
> memory pages when running in protected mode ("kvm-arm.mode=protected" on
> the kernel command line).
>
> [...]

Applied to next, thanks!

[01/21] KVM: arm64: Add hyp_spin_is_locked() for basic locking assertions at EL2
        commit: d21292f13f1f0721d60e8122e2db46bea8cf6950
[02/21] KVM: arm64: Introduce hyp_assert_lock_held()
        commit: 8e049e0daf23aa380c264e5e15e4c64ea5497ed7
[03/21] KVM: arm64: Provide the host_stage2_try() helper macro
        commit: 1bac49d490cbc813f407a5c9806e464bf4a300c9
[05/21] KVM: arm64: Expose page-table helpers
        commit: 51add457733bbc4a442fc280d73d14bfe262e4a0
[06/21] KVM: arm64: Optimize host memory aborts
        commit: c4f0935e4d957bfcea25ad76860445660a60f3fd
[07/21] KVM: arm64: Rename KVM_PTE_LEAF_ATTR_S2_IGNORED
        commit: 178cac08d588e7406a09351a992f57892d8d9cc9
[08/21] KVM: arm64: Don't overwrite software bits with owner id
        commit: 8a0282c68121e53ab17413283cfed408a47e1a2a
[09/21] KVM: arm64: Tolerate re-creating hyp mappings to set software bits
        commit: b53846c5f279cb5329b82f19a7d313f02cb9d21c
[10/21] KVM: arm64: Enable forcing page-level stage-2 mappings
        commit: 5651311941105ca077d3ab74dd4a92e646ecf7fb
[11/21] KVM: arm64: Allow populating software bits
        commit: 4505e9b624cefafa4b75d8a28e72f32076c33375
[12/21] KVM: arm64: Add helpers to tag shared pages in SW bits
        commit: ec250a67ea8db6209918a389554cf3aec0395b1f
[13/21] KVM: arm64: Expose host stage-2 manipulation helpers
        commit: 39257da0e04e5cdb1e4a3ca715dc3d949fe8b059
[14/21] KVM: arm64: Expose pkvm_hyp_id
        commit: 2d77e238badb022adb364332b7d6a1d627f77145
[15/21] KVM: arm64: Introduce addr_is_memory()
        commit: e009dce1292c37cf8ee7c33e0887ad3c642f980f
[16/21] KVM: arm64: Enable retrieving protections attributes of PTEs
        commit: 9024b3d0069ab4b8ef70cf55f0ee09e61f3a0747
[17/21] KVM: arm64: Mark host bss and rodata section as shared
        commit: 2c50166c62ba7f3c23c1bbdbb9324db462ddc97b
[18/21] KVM: arm64: Remove __pkvm_mark_hyp
        commit: ad0e0139a8e163245d8f44ab4f6ec3bc9b08034d
[19/21] KVM: arm64: Refactor protected nVHE stage-1 locking
        commit: f9370010e92638f66473baf342e19de940403362
[20/21] KVM: arm64: Restrict EL2 stage-1 changes in protected mode
        commit: 66c57edd3bc79e3527daaae8123f72ecd1e3fa25
[21/21] KVM: arm64: Make __pkvm_create_mappings static
        commit: 64a80fb766f9a91e26930bfc56d8e7c12425df12

Note that patch #4 has been used as the base for this series, and is already part of the mapping level rework.

Cheers,

M.
--
Without deviation from the norm, progress is not possible.
[PATCH V2] KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot
be enabled as guest IPA size on 4K or 16K page size configurations. Hence
kvm_ipa_limit must be restricted to 48 bits. This change achieves required
IPA capping.

Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA
size based on host configuration"), the problem here would have been just
latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit),
which remains capped at 48 bits on 4K and 16K configs.

Cc: Marc Zyngier
Cc: James Morse
Cc: Alexandru Elisei
Cc: Suzuki K Poulose
Cc: Catalin Marinas
Cc: Will Deacon
Cc: linux-arm-ker...@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-ker...@vger.kernel.org
Fixes: c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on host configuration")
Signed-off-by: Anshuman Khandual
---
This applies on v5.14-rc5

Changes in V2:

- Replaced IS_ENABLED() based check with PAGE_SIZE based one per Marc
- Moved the conditional code block near parange assignment per Marc

Changes in V1:

https://lore.kernel.org/lkml/1628657549-27584-1-git-send-email-anshuman.khand...@arm.com/

 arch/arm64/kvm/reset.c | 8
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index cba7872..78d4bd8 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -317,6 +317,14 @@ int kvm_set_ipa_limit(void)
 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	parange = cpuid_feature_extract_unsigned_field(mmfr0,
 				ID_AA64MMFR0_PARANGE_SHIFT);
+	/*
+	 * IPA size beyond 48 bits could not be supported
+	 * on either 4K or 16K page size. Hence let's cap
+	 * it to 48 bits, in case it's reported as larger
+	 * on the system.
+	 */
+	if (PAGE_SIZE != SZ_64K)
+		parange = min(parange, (unsigned int)ID_AA64MMFR0_PARANGE_48);
 
 	/*
 	 * Check with ARMv8.5-GTG that our PAGE_SIZE is supported at
-- 
2.7.4
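The effect of the cap can be modelled standalone (PARANGE encodings and the 48-bit encoding value 0x5 are from the architecture; the helper names below are illustrative, not the kernel's):

```c
#include <assert.h>

/* ID_AA64MMFR0_EL1.PARANGE encodings per the ARM ARM. */
static unsigned int parange_to_phys_shift(unsigned int parange)
{
	switch (parange) {
	case 0:  return 32;
	case 1:  return 36;
	case 2:  return 40;
	case 3:  return 42;
	case 4:  return 44;
	case 5:  return 48;	/* ID_AA64MMFR0_PARANGE_48 */
	case 6:  return 52;
	default: return 0;	/* reserved */
	}
}

/* The patch's cap, modelled standalone: only 64K pages may use 52-bit PAs. */
static unsigned int cap_parange(unsigned int parange, int is_64k_pages)
{
	if (!is_64k_pages && parange > 5)
		parange = 5;
	return parange;
}
```

So a system reporting the 52-bit encoding still ends up with a 48-bit kvm_ipa_limit on 4K and 16K configurations, while smaller reported values pass through untouched.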
Re: [PATCH] KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
On 8/11/21 3:06 PM, Marc Zyngier wrote: > On Wed, 11 Aug 2021 05:52:29 +0100, > Anshuman Khandual wrote: >> >> Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot >> be enabled as guest IPA size on 4K or 16K page size configurations. Hence >> kvm_ipa_limit must be restricted to 48 bits. This change achieves required >> IPA capping. >> >> Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA >> size based on host configuration"), the problem here would have been just >> latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit), >> which remains capped at 48 bits on 4K and 16K configs. >> >> Cc: Marc Zyngier >> Cc: James Morse >> Cc: Alexandru Elisei >> Cc: Suzuki K Poulose >> Cc: Catalin Marinas >> Cc: Will Deacon >> Cc: linux-arm-ker...@lists.infradead.org >> Cc: kvmarm@lists.cs.columbia.edu >> Cc: linux-ker...@vger.kernel.org >> Fixes: c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on >> host configuration") >> Signed-off-by: Anshuman Khandual >> --- >> This applies on v5.14-rc5 >> >> arch/arm64/kvm/reset.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c >> index 20588220fe66..e66974c4b9d3 100644 >> --- a/arch/arm64/kvm/reset.c >> +++ b/arch/arm64/kvm/reset.c >> @@ -337,6 +337,15 @@ int kvm_set_ipa_limit(void) >> return -EINVAL; >> } >> >> +/* >> + * IPA size beyond 48 bits could not be supported >> + * on either 4K or 16K page size. Hence let's cap >> + * it to 48 bits, in case it's reported as larger >> + * on the system. >> + */ >> +if (!IS_ENABLED(CONFIG_ARM64_64K_PAGES)) > > As per our earlier discussion, please use (PAGE_SIZE != SZ_64K) > instead. This is in keeping with the rest of the file. Sure, will change. > >> +parange = min(parange, (unsigned int)ID_AA64MMFR0_PARANGE_48); >> + > > Also, please move it next to the point where we assign parange. Sure, will move. 
> >> 	kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
> >> 	kvm_info("IPA Size Limit: %d bits%s\n", kvm_ipa_limit,
> >> 		 ((kvm_ipa_limit < KVM_PHYS_SHIFT) ?
>
> Thanks,
>
> M.
Re: [PATCH] arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT
On Tue, 10 Aug 2021 09:59:42 +0530, Anshuman Khandual wrote:
> Streamline the Stage-2 TGRAN value extraction from ID_AA64MMFR0 register by
> adding a page size agnostic ID_AA64MMFR0_TGRAN_2_SHIFT. This is similar to
> the existing Stage-1 TGRAN shift i.e ID_AA64MMFR0_TGRAN_SHIFT.

Applied to kvm-arm64/misc-5.15, thanks!

[1/1] arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT
      commit: 9efb41493ddfb19c7b3d0a21d68be6279520144f

Cheers,

M.
--
Without deviation from the norm, progress is not possible.
Re: [PATCH 1/5] KVM: arm64: Drop direct PAGE_[SHIFT|SIZE] usage as page size
On Wed, 11 Aug 2021 10:37:36 +0100, Anshuman Khandual wrote: > > > > On 8/11/21 1:41 PM, Marc Zyngier wrote: > > On Wed, 11 Aug 2021 06:34:46 +0100, > > Anshuman Khandual wrote: > >> > >> > >> > >> On 8/10/21 7:03 PM, Marc Zyngier wrote: > >>> On 2021-08-10 08:02, Anshuman Khandual wrote: > All instances here could just directly test against > CONFIG_ARM64_XXK_PAGES > instead of evaluating via PAGE_SHIFT or PAGE_SIZE. With this change, > there > will be no such usage left. > > Cc: Marc Zyngier > Cc: James Morse > Cc: Alexandru Elisei > Cc: Suzuki K Poulose > Cc: Catalin Marinas > Cc: Will Deacon > Cc: linux-arm-ker...@lists.infradead.org > Cc: kvmarm@lists.cs.columbia.edu > Cc: linux-ker...@vger.kernel.org > Signed-off-by: Anshuman Khandual > --- > arch/arm64/kvm/hyp/pgtable.c | 6 +++--- > arch/arm64/mm/mmu.c | 2 +- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c > index 05321f4165e3..a6112b6d6ef6 100644 > --- a/arch/arm64/kvm/hyp/pgtable.c > +++ b/arch/arm64/kvm/hyp/pgtable.c > @@ -85,7 +85,7 @@ static bool kvm_level_supports_block_mapping(u32 level) > * Reject invalid block mappings and don't bother with 4TB mappings > for > * 52-bit PAs. 
> */ > - return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1)); > + return !(level == 0 || (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) && level > == 1)); > } > > static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, > u32 level) > @@ -155,7 +155,7 @@ static u64 kvm_pte_to_phys(kvm_pte_t pte) > { > u64 pa = pte & KVM_PTE_ADDR_MASK; > > - if (PAGE_SHIFT == 16) > + if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) > pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48; > > return pa; > @@ -165,7 +165,7 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa) > { > kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK; > > - if (PAGE_SHIFT == 16) > + if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) > pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48); > > return pte; > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index 9ff0de1b2b93..8fdfca179815 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -296,7 +296,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, > unsigned long addr, > static inline bool use_1G_block(unsigned long addr, unsigned long next, > unsigned long phys) > { > - if (PAGE_SHIFT != 12) > + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES)) > return false; > > if (((addr | next | phys) & ~PUD_MASK) != 0) > >>> > >>> I personally find it a lot less readable. > >>> > >>> Also, there is no evaluation whatsoever. All the code guarded > >>> by a PAGE_SIZE/PAGE_SHIFT that doesn't match the configuration > >>> is dropped at compile time. > >> > >> The primary idea here is to unify around IS_ENABLED(CONFIG_ARM64_XXK_PAGES) > >> usage in arm64, rather than having multiple methods to test page size when > >> ever required. > > > > I'm sorry, but I find the idiom extremely painful to parse. If you are > > Okay, it was not explained very well. My bad. > > > annoyed with the 'PAGE_SHIFT == 12/14/16', consider replacing it with > > 'PAGE_SIZE == SZ_4/16/64K' instead. > > Sure, understood. 
But the problem here is not with PAGE_SHIFT/PAGE_SIZE > based tests but rather having multiple ways of doing the same thing in > arm64 tree. Please find further explanation below. > > > > > IS_ENABLED(CONFIG_ARM64_XXK_PAGES) also gives the wrong impression > > that *multiple* page sizes can be selected at any given time. That's > > obviously not the case, which actually makes PAGE_SIZE a much better > > choice. > > PAGE_SHIFT and PAGE_SIZE are derived from CONFIG_ARM64_XXK_PAGES. Hence > why not just directly use the original user selected config option that > eventually decides PAGE_SHIFT and PAGE_SIZE. > > config ARM64_PAGE_SHIFT > int > default 16 if ARM64_64K_PAGES > default 14 if ARM64_16K_PAGES > default 12 > > arch/arm64/include/asm/page-def.h:#define PAGE_SHIFT CONFIG_ARM64_PAGE_SHIFT > arch/arm64/include/asm/page-def.h:#define PAGE_SIZE (_AC(1, UL) << > PAGE_SHIFT) I'm sorry, but that's only a build system artefact. PAGE_SIZE/SHIFT is what we use in the kernel at large, not IS_ENABLED(BLAH). It is short, to the point, and it is guaranteed to be what it says on the tin. If by some miracle you were going to enable multiple *simultaneous* page sizes support in the arm64 kernel, I'd certainly look at things differently. Thankfully, this isn't the case. > Also there are already similar IS_ENABLED() instance
Re: [PATCH] arm64/mm: Define ID_AA64MMFR0_TGRAN_2_SHIFT
On Tue, Aug 10, 2021 at 09:59:42AM +0530, Anshuman Khandual wrote:
> Streamline the Stage-2 TGRAN value extraction from ID_AA64MMFR0 register by
> adding a page size agnostic ID_AA64MMFR0_TGRAN_2_SHIFT. This is similar to
> the existing Stage-1 TGRAN shift i.e ID_AA64MMFR0_TGRAN_SHIFT.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Marc Zyngier
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual
> ---
> This applies on v5.14-rc5.
>
> arch/arm64/include/asm/sysreg.h | 3 +++
> arch/arm64/kvm/reset.c | 17 ++---
> 2 files changed, 5 insertions(+), 15 deletions(-)

Acked-by: Will Deacon

Will
Re: [PATCH V2] KVM: arm64: perf: Replace '0xf' instances with ID_AA64DFR0_PMUVER_IMP_DEF
On Wed, 11 Aug 2021 04:27:06 +0100, Anshuman Khandual wrote:
>
> ID_AA64DFR0_PMUVER_IMP_DEF which indicate implementation defined PMU, never
> actually gets used although there are '0xf' instances scattered all around.
> Just do the macro replacement to improve readability.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Mark Rutland
> Cc: Peter Zijlstra
> Cc: Marc Zyngier
> Cc: linux-perf-us...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: kvmarm@lists.cs.columbia.edu
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Anshuman Khandual

For some reason I don't understand, b4 doesn't want to cherry-pick this
patch from the list. I've applied it manually to kvm-arm64/misc-5.15.

Thanks,

M.
--
Without deviation from the norm, progress is not possible.
Re: [PATCH 1/5] KVM: arm64: Drop direct PAGE_[SHIFT|SIZE] usage as page size
On 8/11/21 1:41 PM, Marc Zyngier wrote: > On Wed, 11 Aug 2021 06:34:46 +0100, > Anshuman Khandual wrote: >> >> >> >> On 8/10/21 7:03 PM, Marc Zyngier wrote: >>> On 2021-08-10 08:02, Anshuman Khandual wrote: All instances here could just directly test against CONFIG_ARM64_XXK_PAGES instead of evaluating via PAGE_SHIFT or PAGE_SIZE. With this change, there will be no such usage left. Cc: Marc Zyngier Cc: James Morse Cc: Alexandru Elisei Cc: Suzuki K Poulose Cc: Catalin Marinas Cc: Will Deacon Cc: linux-arm-ker...@lists.infradead.org Cc: kvmarm@lists.cs.columbia.edu Cc: linux-ker...@vger.kernel.org Signed-off-by: Anshuman Khandual --- arch/arm64/kvm/hyp/pgtable.c | 6 +++--- arch/arm64/mm/mmu.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c index 05321f4165e3..a6112b6d6ef6 100644 --- a/arch/arm64/kvm/hyp/pgtable.c +++ b/arch/arm64/kvm/hyp/pgtable.c @@ -85,7 +85,7 @@ static bool kvm_level_supports_block_mapping(u32 level) * Reject invalid block mappings and don't bother with 4TB mappings for * 52-bit PAs. 
*/ - return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1)); + return !(level == 0 || (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) && level == 1)); } static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level) @@ -155,7 +155,7 @@ static u64 kvm_pte_to_phys(kvm_pte_t pte) { u64 pa = pte & KVM_PTE_ADDR_MASK; - if (PAGE_SHIFT == 16) + if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48; return pa; @@ -165,7 +165,7 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa) { kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK; - if (PAGE_SHIFT == 16) + if (IS_ENABLED(CONFIG_ARM64_64K_PAGES)) pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48); return pte; diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 9ff0de1b2b93..8fdfca179815 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -296,7 +296,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr, static inline bool use_1G_block(unsigned long addr, unsigned long next, unsigned long phys) { - if (PAGE_SHIFT != 12) + if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES)) return false; if (((addr | next | phys) & ~PUD_MASK) != 0) >>> >>> I personally find it a lot less readable. >>> >>> Also, there is no evaluation whatsoever. All the code guarded >>> by a PAGE_SIZE/PAGE_SHIFT that doesn't match the configuration >>> is dropped at compile time. >> >> The primary idea here is to unify around IS_ENABLED(CONFIG_ARM64_XXK_PAGES) >> usage in arm64, rather than having multiple methods to test page size when >> ever required. > > I'm sorry, but I find the idiom extremely painful to parse. If you are Okay, it was not explained very well. My bad. > annoyed with the 'PAGE_SHIFT == 12/14/16', consider replacing it with > 'PAGE_SIZE == SZ_4/16/64K' instead. Sure, understood. But the problem here is not with PAGE_SHIFT/PAGE_SIZE based tests but rather having multiple ways of doing the same thing in arm64 tree. Please find further explanation below. 
> > IS_ENABLED(CONFIG_ARM64_XXK_PAGES) also gives the wrong impression > that *multiple* page sizes can be selected at any given time. That's > obviously not the case, which actually makes PAGE_SIZE a much better > choice. PAGE_SHIFT and PAGE_SIZE are derived from CONFIG_ARM64_XXK_PAGES. Hence why not just directly use the original user selected config option that eventually decides PAGE_SHIFT and PAGE_SIZE. config ARM64_PAGE_SHIFT int default 16 if ARM64_64K_PAGES default 14 if ARM64_16K_PAGES default 12 arch/arm64/include/asm/page-def.h:#define PAGE_SHIFTCONFIG_ARM64_PAGE_SHIFT arch/arm64/include/asm/page-def.h:#define PAGE_SIZE (_AC(1, UL) << PAGE_SHIFT) Also there are already similar IS_ENABLED() instances which do not create much confusion. The point here being, to have just a single method that checks compiled page size support, instead of three different ways of doing the same thing. - IS_ENABLED(CONFIG_ARM64_XXK_PAGES) - if (PAGE_SHIFT == XX) - if (PAGE_SIZE == XX) $git grep IS_ENABLED arch/arm64/ | grep PAGES arch/arm64/include/asm/vmalloc.h: return IS_ENABLED(CONFIG_ARM64_4K_PAGES) && arch/arm64/mm/mmu.c:BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES)); arch/arm64/mm/mmu.c:BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES)); > > As things stand, I don't plan to take such a patch. Sure, will drop it from the series if the above explanation and the rationale for the patch
Re: [PATCH] KVM: arm64: Restrict IPA size to maximum 48 bits on 4K and 16K page size
On Wed, 11 Aug 2021 05:52:29 +0100, Anshuman Khandual wrote: > > Even though ID_AA64MMFR0.PARANGE reports 52 bit PA size support, it cannot > be enabled as guest IPA size on 4K or 16K page size configurations. Hence > kvm_ipa_limit must be restricted to 48 bits. This change achieves required > IPA capping. > > Before the commit c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA > size based on host configuration"), the problem here would have been just > latent via PHYS_MASK_SHIFT (which earlier in turn capped kvm_ipa_limit), > which remains capped at 48 bits on 4K and 16K configs. > > Cc: Marc Zyngier > Cc: James Morse > Cc: Alexandru Elisei > Cc: Suzuki K Poulose > Cc: Catalin Marinas > Cc: Will Deacon > Cc: linux-arm-ker...@lists.infradead.org > Cc: kvmarm@lists.cs.columbia.edu > Cc: linux-ker...@vger.kernel.org > Fixes: c9b69a0cf0b4 ("KVM: arm64: Don't constrain maximum IPA size based on > host configuration") > Signed-off-by: Anshuman Khandual > --- > This applies on v5.14-rc5 > > arch/arm64/kvm/reset.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c > index 20588220fe66..e66974c4b9d3 100644 > --- a/arch/arm64/kvm/reset.c > +++ b/arch/arm64/kvm/reset.c > @@ -337,6 +337,15 @@ int kvm_set_ipa_limit(void) > return -EINVAL; > } > > + /* > + * IPA size beyond 48 bits could not be supported > + * on either 4K or 16K page size. Hence let's cap > + * it to 48 bits, in case it's reported as larger > + * on the system. > + */ > + if (!IS_ENABLED(CONFIG_ARM64_64K_PAGES)) As per our earlier discussion, please use (PAGE_SIZE != SZ_64K) instead. This is in keeping with the rest of the file. > + parange = min(parange, (unsigned int)ID_AA64MMFR0_PARANGE_48); > + Also, please move it next to the point where we assign parange. > kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange); > kvm_info("IPA Size Limit: %d bits%s\n", kvm_ipa_limit, >((kvm_ipa_limit < KVM_PHYS_SHIFT) ? Thanks, M. 
--
Without deviation from the norm, progress is not possible.
Re: [PATCH v6 17/21] KVM: arm64: Allow userspace to configure a guest's counter-timer offset
On Tue, 10 Aug 2021 18:55:12 +0100, Oliver Upton wrote:
>
> On Tue, Aug 10, 2021 at 3:56 AM Marc Zyngier wrote:
> >
> > On Wed, 04 Aug 2021 09:58:15 +0100,
> > Oliver Upton wrote:

[...]

> > > diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
> > > index 73c7139c866f..7252ffa3d675 100644
> > > --- a/include/clocksource/arm_arch_timer.h
> > > +++ b/include/clocksource/arm_arch_timer.h
> > > @@ -21,6 +21,7 @@
> > >  #define CNTHCTL_EVNTEN (1 << 2)
> > >  #define CNTHCTL_EVNTDIR (1 << 3)
> > >  #define CNTHCTL_EVNTI (0xF << 4)
> > > +#define CNTHCTL_ECV (1 << 12)
> > >
> > >  enum arch_timer_reg {
> > >  ARCH_TIMER_REG_CTRL,
>
> > You also want to document that SCR_EL3.ECVEn has to be set to 1 for
> > this to work (see Documentation/arm64/booting.txt). And if it isn't,
> > the firmware better handle the CNTPOFF_EL2 traps correctly...
>
> I'll grab the popcorn now ;-) Adding docs for this, good idea.
>
> > What firmware did you use for this? I think we need to update the boot
> > wrapper, but that's something that can be done in parallel.
>
> I had actually just done a direct boot from ARM-TF -> Linux, nothing
> else in between.

Ah, right. I tend to use the boot-wrapper[1] to build a single binary
that contains the 'boot loader', DT and kernel. Using ATF is probably
more representative of the final thing, but the boot-wrapper is dead
easy to hack on...

Thanks,

M.

[1] git://git.kernel.org/pub/scm/linux/kernel/git/mark/boot-wrapper-aarch64.git
--
Without deviation from the norm, progress is not possible.
RE: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU schedule out
Hi Will,

> -----Original Message-----
> From: Will Deacon [mailto:w...@kernel.org]
> Sent: 03 August 2021 16:31
> To: Shameerali Kolothum Thodi
> Cc: linux-arm-ker...@lists.infradead.org; kvmarm@lists.cs.columbia.edu;
> linux-ker...@vger.kernel.org; m...@kernel.org; catalin.mari...@arm.com;
> james.mo...@arm.com; julien.thierry.k...@gmail.com;
> suzuki.poul...@arm.com; jean-phili...@linaro.org;
> alexandru.eli...@arm.com; qper...@google.com; Linuxarm
> Subject: Re: [PATCH v3 4/4] KVM: arm64: Clear active_vmids on vCPU
> schedule out

[...]

> I think we have to be really careful not to run into the "suspended
> animation" problem described in ae120d9edfe9 ("ARM: 7767/1: let the ASID
> allocator handle suspended animation") if we go down this road.
>
> Maybe something along the lines of:
>
> ROLLOVER
>
>   * Take lock
>   * Inc generation
>     => This will force everybody down the slow path
>   * Record active VMIDs
>   * Broadcast TLBI
>     => Only active VMIDs can be dirty
>     => Reserve active VMIDs and mark as allocated
>
> VCPU SCHED IN
>
>   * Set active VMID
>   * Check generation
>   * If mismatch then:
>     * Take lock
>     * Try to match a reserved VMID
>     * If no reserved VMID, allocate new
>
> VCPU SCHED OUT
>
>   * Clear active VMID
>
> but I'm not daft enough to think I got it right first time. I think it
> needs both implementing *and* modelling in TLA+ before we merge it!

I attempted to implement the above algo as below. It seems to be working
in both 16-bit vmid and 4-bit vmid test setups. Though I am not quite sure
this is exactly what you had in mind above and covers all corner cases.
Please take a look and let me know.
(The diff below is against this v3 series)

Thanks,
Shameer

--->8<

--- a/arch/arm64/kvm/vmid.c
+++ b/arch/arm64/kvm/vmid.c
@@ -43,7 +43,7 @@ static void flush_context(void)
 	bitmap_clear(vmid_map, 0, NUM_USER_VMIDS);
 
 	for_each_possible_cpu(cpu) {
-		vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0);
+		vmid = atomic64_read(&per_cpu(active_vmids, cpu));
 
 		/* Preserve reserved VMID */
 		if (vmid == 0)
@@ -125,32 +125,17 @@ void kvm_arm_vmid_clear_active(void)
 void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid)
 {
 	unsigned long flags;
-	u64 vmid, old_active_vmid;
+	u64 vmid;
 
 	vmid = atomic64_read(&kvm_vmid->id);
-
-	/*
-	 * Please refer comments in check_and_switch_context() in
-	 * arch/arm64/mm/context.c.
-	 */
-	old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids));
-	if (old_active_vmid && vmid_gen_match(vmid) &&
-	    atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids),
-				     old_active_vmid, vmid))
+	if (vmid_gen_match(vmid)) {
+		atomic64_set(this_cpu_ptr(&active_vmids), vmid);
 		return;
-
-	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
-
-	/* Check that our VMID belongs to the current generation. */
-	vmid = atomic64_read(&kvm_vmid->id);
-	if (!vmid_gen_match(vmid)) {
-		vmid = new_vmid(kvm_vmid);
-		atomic64_set(&kvm_vmid->id, vmid);
 	}
-
+	raw_spin_lock_irqsave(&cpu_vmid_lock, flags);
+	vmid = new_vmid(kvm_vmid);
+	atomic64_set(&kvm_vmid->id, vmid);
 	atomic64_set(this_cpu_ptr(&active_vmids), vmid);
 	raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags);
 }

--->8<
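The generation check at the heart of the fast path above can be modelled in a simplified, single-threaded form (a toy sketch only: it ignores the per-CPU atomics, locking, and TLB invalidation that the real allocator needs, and all names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define VMID_BITS	8
#define VMID_MASK	((1ULL << VMID_BITS) - 1)

/* The id packs a generation in the upper bits and the VMID in the lower. */
static uint64_t vmid_generation = 1ULL << VMID_BITS;
static uint64_t next_vmid = 1;	/* VMID 0 stays reserved */

static int vmid_gen_match(uint64_t id)
{
	return (id & ~VMID_MASK) == vmid_generation;
}

static uint64_t new_vmid(void)
{
	if (next_vmid > VMID_MASK) {	/* rollover: bump the generation,  */
		vmid_generation += 1ULL << VMID_BITS;	/* invalidating every id */
		next_vmid = 1;
	}
	return vmid_generation | next_vmid++;
}

/* Sched-in: keep the id on a generation match, otherwise reallocate. */
static uint64_t vmid_update(uint64_t id)
{
	return vmid_gen_match(id) ? id : new_vmid();
}
```

Bumping the generation on rollover is what "forces everybody down the slow path" in Will's description: every previously handed-out id stops matching at once, so each vCPU must reallocate (or rematch a reserved VMID) under the lock on its next sched-in.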
Re: [PATCH 1/5] KVM: arm64: Drop direct PAGE_[SHIFT|SIZE] usage as page size
On Wed, 11 Aug 2021 06:34:46 +0100, Anshuman Khandual wrote:
>
> On 8/10/21 7:03 PM, Marc Zyngier wrote:
> > On 2021-08-10 08:02, Anshuman Khandual wrote:
> >> All instances here could just directly test against CONFIG_ARM64_XXK_PAGES
> >> instead of evaluating via PAGE_SHIFT or PAGE_SIZE. With this change, there
> >> will be no such usage left.
> >>
> >> Cc: Marc Zyngier
> >> Cc: James Morse
> >> Cc: Alexandru Elisei
> >> Cc: Suzuki K Poulose
> >> Cc: Catalin Marinas
> >> Cc: Will Deacon
> >> Cc: linux-arm-ker...@lists.infradead.org
> >> Cc: kvmarm@lists.cs.columbia.edu
> >> Cc: linux-ker...@vger.kernel.org
> >> Signed-off-by: Anshuman Khandual
> >> ---
> >>  arch/arm64/kvm/hyp/pgtable.c | 6 +++---
> >>  arch/arm64/mm/mmu.c          | 2 +-
> >>  2 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> >> index 05321f4165e3..a6112b6d6ef6 100644
> >> --- a/arch/arm64/kvm/hyp/pgtable.c
> >> +++ b/arch/arm64/kvm/hyp/pgtable.c
> >> @@ -85,7 +85,7 @@ static bool kvm_level_supports_block_mapping(u32 level)
> >>  	 * Reject invalid block mappings and don't bother with 4TB mappings for
> >>  	 * 52-bit PAs.
> >>  	 */
> >> -	return !(level == 0 || (PAGE_SIZE != SZ_4K && level == 1));
> >> +	return !(level == 0 || (!IS_ENABLED(CONFIG_ARM64_4K_PAGES) && level == 1));
> >>  }
> >>
> >>  static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> >> @@ -155,7 +155,7 @@ static u64 kvm_pte_to_phys(kvm_pte_t pte)
> >>  {
> >>  	u64 pa = pte & KVM_PTE_ADDR_MASK;
> >>
> >> -	if (PAGE_SHIFT == 16)
> >> +	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> >>  		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
> >>
> >>  	return pa;
> >> @@ -165,7 +165,7 @@ static kvm_pte_t kvm_phys_to_pte(u64 pa)
> >>  {
> >>  	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> >>
> >> -	if (PAGE_SHIFT == 16)
> >> +	if (IS_ENABLED(CONFIG_ARM64_64K_PAGES))
> >>  		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> >>
> >>  	return pte;
> >> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >> index 9ff0de1b2b93..8fdfca179815 100644
> >> --- a/arch/arm64/mm/mmu.c
> >> +++ b/arch/arm64/mm/mmu.c
> >> @@ -296,7 +296,7 @@ static void alloc_init_cont_pmd(pud_t *pudp, unsigned long addr,
> >>  static inline bool use_1G_block(unsigned long addr, unsigned long next,
> >>  				unsigned long phys)
> >>  {
> >> -	if (PAGE_SHIFT != 12)
> >> +	if (!IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> >>  		return false;
> >>
> >>  	if (((addr | next | phys) & ~PUD_MASK) != 0)
> >
> > I personally find it a lot less readable.
> >
> > Also, there is no evaluation whatsoever. All the code guarded
> > by a PAGE_SIZE/PAGE_SHIFT that doesn't match the configuration
> > is dropped at compile time.
>
> The primary idea here is to unify around IS_ENABLED(CONFIG_ARM64_XXK_PAGES)
> usage in arm64, rather than having multiple methods to test page size
> whenever required.

I'm sorry, but I find the idiom extremely painful to parse. If you are
annoyed with the 'PAGE_SHIFT == 12/14/16', consider replacing it with
'PAGE_SIZE == SZ_4/16/64K' instead.

IS_ENABLED(CONFIG_ARM64_XXK_PAGES) also gives the wrong impression that
*multiple* page sizes can be selected at any given time. That's obviously
not the case, which actually makes PAGE_SIZE a much better choice.

As things stand, I don't plan to take such a patch.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
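Marc's observation that "there is no evaluation whatsoever" holds for both idioms: IS_ENABLED() and a PAGE_SIZE comparison each fold to an integer constant, so the compiler discards the dead branch either way; the disagreement is purely about readability. The sketch below is a simplified userspace model of the preprocessor trick behind the kernel's IS_ENABLED() (loosely following include/linux/kconfig.h, without the module handling); the CONFIG_ and SZ_ definitions here are supplied for the demo:

```c
#include <assert.h>

/* IS_ENABLED(option) expands to 1 only when `option` is defined to 1,
 * and to 0 otherwise. The trick: if the option is defined to 1, the
 * token paste produces __ARG_PLACEHOLDER_1, which expands to "0," and
 * shifts the literal 1 into the "val" slot of __take_second_arg(). */
#define __ARG_PLACEHOLDER_1 0,
#define __take_second_arg(__ignored, val, ...) val
#define ____is_defined(arg1_or_junk) __take_second_arg(arg1_or_junk 1, 0)
#define ___is_defined(val) ____is_defined(__ARG_PLACEHOLDER_##val)
#define __is_defined(x) ___is_defined(x)
#define IS_ENABLED(option) __is_defined(option)

#define CONFIG_ARM64_4K_PAGES 1   /* pretend 4K pages are selected */
/* CONFIG_ARM64_64K_PAGES deliberately left undefined */

#define SZ_4K 4096
#define PAGE_SIZE 4096

/* Both guards fold to constants; neither costs anything at run time. */
static int use_1G_block_config(void)
{
	return IS_ENABLED(CONFIG_ARM64_4K_PAGES);
}

static int use_1G_block_pagesz(void)
{
	return PAGE_SIZE == SZ_4K;
}
```

Since both functions compile down to `return 1;`, the choice between the two styles is, as the thread argues, one of taste and of how clearly the source communicates that exactly one page size is in effect.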
Re: [PATCH v3] memblock: make memblock_find_in_range method private
On Tue, Aug 10, 2021 at 12:21:46PM -0700, Guenter Roeck wrote:
> On 8/10/21 11:55 AM, Mike Rapoport wrote:
> > On Mon, Aug 09, 2021 at 12:06:41PM -0700, Guenter Roeck wrote:
> > > On Tue, Aug 03, 2021 at 09:42:18AM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport
> > > >
> > > > There are a lot of uses of memblock_find_in_range() along with
> > > > memblock_reserve() from the times memblock allocation APIs did not
> > > > exist.
> > > >
> > > > memblock_find_in_range() is the very core of memblock allocations, so any
> > > > future changes to its internal behaviour would mandate updates of all the
> > > > users outside memblock.
> > > >
> > > > Replace the calls to memblock_find_in_range() with equivalent calls to
> > > > memblock_phys_alloc() and memblock_phys_alloc_range() and make
> > > > memblock_find_in_range() a private method of memblock.
> > > >
> > > > This simplifies the callers, ensures that (unlikely) errors in
> > > > memblock_reserve() are handled and improves maintainability of
> > > > memblock_find_in_range().
> > > >
> > > > Signed-off-by: Mike Rapoport
> > >
> > > I see a number of crashes in next-20210806 when booting x86 images from
> > > efi.
> > >
> > > [    0.000000] efi: EFI v2.70 by EDK II
> > > [    0.000000] efi: SMBIOS=0x1fbcc000 ACPI=0x1fbfa000 ACPI 2.0=0x1fbfa014 MEMATTR=0x1f25f018
> > > [    0.000000] SMBIOS 2.8 present.
> > > [    0.000000] DMI: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > [    0.000000] last_pfn = 0x1ff50 max_arch_pfn = 0x4
> > > [    0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
> > > [    0.000000] Kernel panic - not syncing: alloc_low_pages: can not alloc memory
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.14.0-rc4-next-20210806 #1
> > > [    0.000000] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> > > [    0.000000] Call Trace:
> > > [    0.000000]  ? dump_stack_lvl+0x57/0x7d
> > > [    0.000000]  ? panic+0xfc/0x2c6
> > > [    0.000000]  ? alloc_low_pages+0x117/0x156
> > > [    0.000000]  ? phys_pmd_init+0x234/0x342
> > > [    0.000000]  ? phys_pud_init+0x171/0x337
> > > [    0.000000]  ? __kernel_physical_mapping_init+0xec/0x276
> > > [    0.000000]  ? init_memory_mapping+0x1ea/0x2aa
> > > [    0.000000]  ? init_range_memory_mapping+0xdf/0x12e
> > > [    0.000000]  ? init_mem_mapping+0x1e9/0x26f
> > > [    0.000000]  ? setup_arch+0x5ff/0xb6d
> > > [    0.000000]  ? start_kernel+0x71/0x6b4
> > > [    0.000000]  ? secondary_startup_64_no_verify+0xc2/0xcb
> > >
> > > Bisect points to this patch. Reverting it fixes the problem. Key seems to
> > > be the amount of memory configured in qemu; the problem is not seen if
> > > there is 1G or more of memory, but it is seen with all test boots with
> > > 512M or 256M of memory. It is also seen with almost all 32-bit efi boots.
> > >
> > > The problem is not seen when booting without efi.
> >
> > It looks like this change uncovered a problem in
> > x86::memory_map_top_down().
> >
> > The allocation in alloc_low_pages() is limited by min_pfn_mapped and
> > max_pfn_mapped. The min_pfn_mapped is updated at every iteration of the
> > loop in memory_map_top_down(), but there is another loop in
> > init_range_memory_mapping() that maps several regions below the current
> > min_pfn_mapped without updating this variable.
> >
> > The memory layout in qemu with 256M of RAM and EFI enabled causes
> > exhaustion of the memory limited by min_pfn_mapped and max_pfn_mapped
> > before min_pfn_mapped is updated.
> >
> > Before this commit there was an unconditional "reservation" of 2M at the
> > end of the memory that moved the initial min_pfn_mapped below the memory
> > reserved by EFI. The addition of the check for xen_domain() removed this
> > reservation for !XEN and made alloc_low_pages() use the range already busy
> > with EFI data.
> >
> > The patch below moves the update of min_pfn_mapped near the update of
> > max_pfn_mapped so that every time a new range is mapped both limits will be
> > updated accordingly.
> >
> > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> > index 1152a29ce109..be279f6e5a0a 100644
> > --- a/arch/x86/mm/init.c
> > +++ b/arch/x86/mm/init.c
> > @@ -1,3 +1,4 @@
> > +#define DEBUG
> >  #include
> >  #include
> >  #include
> > @@ -485,6 +486,7 @@ static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
> >  	nr_pfn_mapped = clean_sort_range(pfn_mapped, E820_MAX_ENTRIES);
> >
> >  	max_pfn_mapped = max(max_pfn_mapped, end_pfn);
> > +	min_pfn_mapped = min(min_pfn_mapped, start_pfn);
> >
> >  	if (start_pfn < (1UL<<(32-PAGE_SHIFT)))
> >  		max_low_pfn_mapped = max(max_low_pfn_mapped,
> > @@ -643,7 +645,6 @@ static void __init memory_map_top_down(unsigned long map_start,
> >  		mapped_ram_size += init_range_memory_mapping(start,
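The invariant Mike's fix restores — the lower bound of the already-mapped window must shrink together with every newly mapped range, so alloc_low_pages() never hands out memory outside [min_pfn_mapped, max_pfn_mapped) — can be modelled in a few lines of userspace C. Variable and function names mirror arch/x86/mm/init.c, but the bookkeeping here is only a toy:

```c
#include <assert.h>

/* Toy model of the pfn-mapped window maintained by add_pfn_range_mapped().
 * alloc_low_pages() may only allocate from pfns inside this window, so
 * both bounds must be kept in sync with what has actually been mapped. */
static unsigned long min_pfn_mapped = ~0UL;  /* no pfns mapped yet */
static unsigned long max_pfn_mapped;

static void add_pfn_range_mapped(unsigned long start_pfn, unsigned long end_pfn)
{
	if (end_pfn > max_pfn_mapped)
		max_pfn_mapped = end_pfn;
	/* The proposed fix: update the lower bound here too, so a range
	 * mapped below the current minimum (as init_range_memory_mapping()
	 * can do) immediately widens the allocation window. */
	if (start_pfn < min_pfn_mapped)
		min_pfn_mapped = start_pfn;
}

/* Models the check alloc_low_pages() relies on. */
static int pfn_is_allocatable(unsigned long pfn)
{
	return pfn >= min_pfn_mapped && pfn < max_pfn_mapped;
}
```

Before the fix, only the top-down loop in memory_map_top_down() lowered min_pfn_mapped, so ranges mapped by init_range_memory_mapping() below the current minimum stayed invisible to the allocator; with small EFI layouts that exhausted the visible window and triggered the "can not alloc memory" panic in the report above.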