Re: [v2 0/6] KVM: arm64: implement vcpu_is_preempted check
Hi Usama,

Usama Arif writes:

> This patchset adds support for vcpu_is_preempted in arm64, which allows
> the guest to check if a vcpu was scheduled out, which is useful to know
> in case it was holding a lock. vcpu_is_preempted can be used to improve
> performance in locking (see owner_on_cpu usage in mutex_spin_on_owner,
> mutex_can_spin_on_owner, rtmutex_spin_on_owner and osq_lock) and
> scheduling (see available_idle_cpu, which is used in several places in
> kernel/sched/fair.c, e.g. in wake_affine to determine which CPU can run
> soonest).
>
> This patchset shows improvement on overcommitted hosts (vCPUs > pCPUs),
> as waiting for preempted vCPUs reduces performance.
>
> This patchset is inspired by the para_steal_clock implementation and by
> the work originally done by Zengruan Ye:
> https://lore.kernel.org/linux-arm-kernel/20191226135833.1052-1-yezengr...@huawei.com/
>
> All the results in the below experiments are done on an aws r6g.metal
> instance which has 64 pCPUs.
>
> The following table shows the index results of UnixBench running on a
> 128 vCPU VM with (6.0.0+vcpu_is_preempted) and without (6.0.0 base) the
> patchset.
>
> TestName                               6.0.0 base  6.0.0+vcpu_is_preempted  % improvement
> Dhrystone 2 using register variables   187761      191274.7                 1.871368389
> Double-Precision Whetstone             96743.6     98414.4                  1.727039308
> Execl Throughput                       689.3       10426                    1412.548963
> File Copy 1024 bufsize 2000 maxblocks  549.5       3165                     475.978162
> File Copy 256 bufsize 500 maxblocks    400.7       2084.7                   420.2645371
> File Copy 4096 bufsize 8000 maxblocks  894.3       5003.2                   459.4543218
> Pipe Throughput                        76819.5     78601.5                  2.319723508
> Pipe-based Context Switching           3444.8      13414.5                  289.4130283
> Process Creation                       301.1       293.4                    -2.557289937
> Shell Scripts (1 concurrent)           1248.1      28300.6                  2167.494592
> Shell Scripts (8 concurrent)           781.2       26222.3                  3256.669227
> System Call Overhead                   3426        3729.4                   8.855808523
>
> System Benchmarks Index Score          3053        11534                    277.7923354
>
> This shows a 277% overall improvement using these patches.
>
> The biggest improvement is in the shell scripts benchmark, which forks a
> lot of processes. This acquires the rwsem lock, where a large chunk of
> time is spent in the base 6.0.0 kernel. This can be seen from one of the
> call stacks of the perf output of the shell scripts benchmark on 6.0.0
> base (pseudo NMI enabled for the perf numbers below):
>
> - 33.79% el0_svc
>    - 33.43% do_el0_svc
>       - 33.43% el0_svc_common.constprop.3
>          - 33.30% invoke_syscall
>             - 17.27% __arm64_sys_clone
>                - 17.27% __do_sys_clone
>                   - 17.26% kernel_clone
>                      - 16.73% copy_process
>                         - 11.91% dup_mm
>                            - 11.82% dup_mmap
>                               - 9.15% down_write
>                                  - 8.87% rwsem_down_write_slowpath
>                                     - 8.48% osq_lock
>
> Just under 50% of the total time in the shell script benchmarks ends up
> being spent in osq_lock in the base 6.0.0 kernel:
>
> Children   Self    Command  Shared Object      Symbol
> 17.19%     10.71%  sh       [kernel.kallsyms]  [k] osq_lock
>  6.17%      4.04%  sort     [kernel.kallsyms]  [k] osq_lock
>  4.20%      2.60%  multi.   [kernel.kallsyms]  [k] osq_lock
>  3.77%      2.47%  grep     [kernel.kallsyms]  [k] osq_lock
>  3.50%      2.24%  expr     [kernel.kallsyms]  [k] osq_lock
>  3.41%      2.23%  od       [kernel.kallsyms]  [k] osq_lock
>  3.36%      2.15%  rm       [kernel.kallsyms]  [k] osq_lock
>  3.28%      2.12%  tee      [kernel.kallsyms]  [k] osq_lock
>  3.16%      2.02%  wc       [kernel.kallsyms]  [k] osq_lock
>  0.21%      0.13%  looper   [kernel.kallsyms]  [k] osq_lock
>  0.01%      0.00%  Run      [kernel.kallsyms]  [k] osq_lock
>
> and this comes down to less than 1% total with the
> 6.0.0+vcpu_is_preempted kernel:
>
> Children   Self    Command  Shared Object      Symbol
>  0.26%      0.21%  sh       [kernel.kallsyms]  [k] osq_lock
>  0.10%      0.08%  multi.   [kernel.kallsyms]  [k] osq_lock
>  0.04%      0.04%  sort     [kernel.kallsyms]  [k] osq_lock
>  0.02%      0.01%  grep     [kernel.kallsyms]  [k] osq_lock
>  0.02%      0.02%  od       [kernel.kallsyms]  [k] osq_lock
>  0.01%      0.01%  tee      [kernel.kallsyms]  [k] osq_lock
>  0.01%      0.00%  expr     [kernel.kallsyms]  [k] osq_lock
>  0.01%
Re: [v2 3/6] KVM: arm64: Support pvlock preempted via shared structure
Usama Arif writes:

> Implement the service call for configuring a shared structure between a
> VCPU and the hypervisor in which the hypervisor can tell whether the
> VCPU is running or not.
>
> The preempted field is zero if the VCPU is not preempted.
> Any other value means the VCPU has been preempted.
>
> Signed-off-by: Zengruan Ye
> Signed-off-by: Usama Arif
> ---
>  Documentation/virt/kvm/arm/hypercalls.rst |  3 ++
>  arch/arm64/include/asm/kvm_host.h         | 18 ++
>  arch/arm64/include/uapi/asm/kvm.h         |  1 +
>  arch/arm64/kvm/Makefile                   |  2 +-
>  arch/arm64/kvm/arm.c                      |  8 +
>  arch/arm64/kvm/hypercalls.c               |  8 +
>  arch/arm64/kvm/pvlock.c                   | 43 +++
>  tools/arch/arm64/include/uapi/asm/kvm.h   |  1 +
>  8 files changed, 83 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/pvlock.c
>
> diff --git a/Documentation/virt/kvm/arm/hypercalls.rst b/Documentation/virt/kvm/arm/hypercalls.rst
> index 3e23084644ba..872a16226ace 100644
> --- a/Documentation/virt/kvm/arm/hypercalls.rst
> +++ b/Documentation/virt/kvm/arm/hypercalls.rst
> @@ -127,6 +127,9 @@ The pseudo-firmware bitmap register are as follows:
>    Bit-1: KVM_REG_ARM_VENDOR_HYP_BIT_PTP:
>      The bit represents the Precision Time Protocol KVM service.
>
> +  Bit-2: KVM_REG_ARM_VENDOR_HYP_BIT_PV_LOCK:
> +    The bit represents the Paravirtualized lock service.
> +
>  Errors:
>
>  ===  =
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 45e2136322ba..18303b30b7e9 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -417,6 +417,11 @@ struct kvm_vcpu_arch {
>  		u64 last_steal;
>  		gpa_t base;
>  	} steal;
> +
> +	/* Guest PV lock state */
> +	struct {
> +		gpa_t base;
> +	} pv;

Using "pv" for the structure isn't quite describing the usage well. It'd
be better to call it "pv_lock" or "pvlock" at the least.

[...]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [v2 2/6] KVM: arm64: Add SMCCC paravirtualised lock calls
Usama Arif writes:

> Add a new SMCCC compatible hypercall for PV lock features:
> ARM_SMCCC_KVM_FUNC_PV_LOCK: 0xC602
>
> Also add the header file which defines the ABI for the paravirtualized
> lock features we're about to add.
>
> Signed-off-by: Zengruan Ye
> Signed-off-by: Usama Arif
> ---
>  arch/arm64/include/asm/pvlock-abi.h | 17 +
>  include/linux/arm-smccc.h           |  8
>  tools/include/linux/arm-smccc.h     |  8
>  3 files changed, 33 insertions(+)
>  create mode 100644 arch/arm64/include/asm/pvlock-abi.h
>
> diff --git a/arch/arm64/include/asm/pvlock-abi.h b/arch/arm64/include/asm/pvlock-abi.h
> new file mode 100644
> index ..3f4574071679
> --- /dev/null
> +++ b/arch/arm64/include/asm/pvlock-abi.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright(c) 2019 Huawei Technologies Co., Ltd
> + * Author: Zengruan Ye
> + *         Usama Arif
> + */
> +
> +#ifndef __ASM_PVLOCK_ABI_H
> +#define __ASM_PVLOCK_ABI_H
> +
> +struct pvlock_vcpu_state {
> +	__le64 preempted;
> +	/* Structure must be 64 byte aligned, pad to that size */
> +	u8 padding[56];
> +} __packed;

For structure alignment, I'd have expected to see the use of the
"aligned" attribute. Is there any benefitit in using padding to achieve
alignment?

[...]
Re: [v2 1/6] KVM: arm64: Document PV-lock interface
Hi Usama,

Usama Arif writes:

> Introduce a paravirtualization interface for KVM/arm64 to obtain whether
> the VCPU is currently running or not.
>
> The PV lock structure of the guest is allocated by user space.
>
> A hypercall interface is provided for the guest to interrogate the
> location of the shared memory structures.
>
> Signed-off-by: Zengruan Ye
> Signed-off-by: Usama Arif
> ---
>  Documentation/virt/kvm/arm/index.rst    |  1 +
>  Documentation/virt/kvm/arm/pvlock.rst   | 52 +
>  Documentation/virt/kvm/devices/vcpu.rst | 25
>  3 files changed, 78 insertions(+)
>  create mode 100644 Documentation/virt/kvm/arm/pvlock.rst
>
> diff --git a/Documentation/virt/kvm/arm/index.rst b/Documentation/virt/kvm/arm/index.rst
> index e84848432158..b8499dc00a6a 100644
> --- a/Documentation/virt/kvm/arm/index.rst
> +++ b/Documentation/virt/kvm/arm/index.rst
> @@ -10,4 +10,5 @@ ARM
>     hyp-abi
>     hypercalls
>     pvtime
> +   pvlock
>     ptp_kvm
>
> diff --git a/Documentation/virt/kvm/arm/pvlock.rst b/Documentation/virt/kvm/arm/pvlock.rst
> new file mode 100644
> index ..d3c391b16d36
> --- /dev/null
> +++ b/Documentation/virt/kvm/arm/pvlock.rst
> @@ -0,0 +1,52 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +Paravirtualized lock support for arm64
> +======================================
> +
> +KVM/arm64 provides a hypervisor service call for paravirtualized guests to
> +determine whether a VCPU is currently running or not.
> +
> +A new SMCCC compatible hypercall is defined:
> +
> +* ARM_SMCCC_VENDOR_HYP_KVM_PV_LOCK_FUNC_ID: 0xC602
> +
> +ARM_SMCCC_VENDOR_HYP_KVM_PV_LOCK_FUNC_ID
> +
> +    =============    ========    ==========================================
> +    Function ID:     (uint32)    0xC602
> +    Return value:    (int64)     IPA of the pv lock data structure for this
> +                                 VCPU. On failure:
> +                                 NOT_SUPPORTED (-1)
> +    =============    ========    ==========================================
> +
> +The IPA returned by PV_LOCK_PREEMPTED should be mapped by the guest as normal
> +memory with inner and outer write back caching attributes, in the inner
> +shareable domain.
> +
> +PV_LOCK_PREEMPTED returns the structure for the calling VCPU.
> +
> +PV lock state
> +-------------
> +
> +The structure pointed to by the PV_LOCK_PREEMPTED hypercall is as follows:
> +
> ++-----------+-------------+-------------+---------------------------------+
> +| Field     | Byte Length | Byte Offset | Description                     |
> ++===========+=============+=============+=================================+
> +| preempted | 8           | 0           | Indicate if the VCPU that owns  |
> +|           |             |             | this struct is running or not.  |
> +|           |             |             | Non-zero values mean the VCPU   |
> +|           |             |             | has been preempted. Zero means  |
> +|           |             |             | the VCPU is not preempted.      |
> ++-----------+-------------+-------------+---------------------------------+
> +
> +The preempted field will be updated to 1 by the hypervisor prior to scheduling
> +a VCPU. When the VCPU is scheduled out, the preempted field will be updated
> +to 0 by the hypervisor.

The text above doesn't match the description in the table. Please update
the text to align it with the code.

[...]
Re: [PATCH v2] KVM: arm64: Try PMD block mappings if PUD mappings are not supported
Hi Alex,

Alexandru Elisei writes:

> Hi Punit,
>
> Thank you for having a look!
>
> On 9/11/20 9:34 AM, Punit Agrawal wrote:
>> Hi Alexandru,
>>
>> Alexandru Elisei writes:
>>
>>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>>> use the same block size to map the faulting IPA in stage 2. If stage 2
>>> cannot use the same block mapping because the block size doesn't fit in the
>>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>>> back to a page mapping, regardless of the block size. We can do better for
>>> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
>>> deciding to use a page.
>>
>> I think this was discussed in the past.
>>
>> I have a vague recollection of there being a problem if the user and
>> stage 2 mappings go out of sync - can't recall the exact details.
>
> I'm not sure what you mean by the two tables going out of sync. I'm
> looking at Documentation/vm/unevictable-lru.rst and this is what it says
> regarding hugetlbfs:
>
> "VMAs mapping hugetlbfs page are already effectively pinned into memory. We
> neither need nor want to mlock() these pages. However, to preserve the prior
> behavior of mlock() - before the unevictable/mlock changes - mlock_fixup() will
> call make_pages_present() in the hugetlbfs VMA range to allocate the huge pages
> and populate the ptes."
>
> Please correct me if I'm wrong, but my interpretation is that once a
> hugetlbfs page has been mapped in a process' address space, the only way
> to unmap it is via munmap. If that's the case, the KVM mmu notifier
> should take care of unmapping from stage 2 the entire memory range
> addressed by the hugetlbfs pages, right?

You're right - I managed to confuse myself. Thinking about it with a bit
more context, I don't see a problem with what the patch is doing.

Apologies for the noise.

>> Putting it out there in case anybody else on the thread can recall the
>> details of the previous discussion (offlist).
>>
>> Though things may have changed and if it passes testing - then maybe I
>> am mis-remembering. I'll take a closer look at the patch and shout out
>> if I notice anything.
>
> The test I ran was to boot a VM and run ltp (with printk's sprinkled in
> the host kernel to see what page size and where it gets mapped/unmapped
> at stage 2). Do you mind recommending other tests that I might run?

You may want to put the changes through VM save / restore and / or live
migration. It should help catch any issues with transitioning from
hugepages to regular pages.

Hope that helps.

Thanks,
Punit

[...]
Re: [PATCH v2] KVM: arm64: Try PMD block mappings if PUD mappings are not supported
Hi Alexandru,

Alexandru Elisei writes:

> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
> use the same block size to map the faulting IPA in stage 2. If stage 2
> cannot use the same block mapping because the block size doesn't fit in the
> memslot or the memslot is not properly aligned, user_mem_abort() will fall
> back to a page mapping, regardless of the block size. We can do better for
> PUD backed hugetlbfs by checking if a PMD block mapping is supported before
> deciding to use a page.

I think this was discussed in the past.

I have a vague recollection of there being a problem if the user and
stage 2 mappings go out of sync - can't recall the exact details.

Putting it out there in case anybody else on the thread can recall the
details of the previous discussion (offlist).

Though things may have changed and if it passes testing - then maybe I
am mis-remembering. I'll take a closer look at the patch and shout out
if I notice anything.

Thanks,
Punit

> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
> its value.
>
> Signed-off-by: Alexandru Elisei
> ---
> Tested on a rockpro64 with 4K pages and hugetlbfs hugepagesz=1G (PUD sized
> block mappings). First test, guest RAM starts at 0x8100
> (memslot->base_gfn not aligned to 1GB); second test, guest RAM starts at
> 0x8000, but is only 512 MB. In both cases using PUD mappings is not
> possible because either the memslot base address is not aligned, or the
> mapping would extend beyond the memslot.
>
> Without the changes, user_mem_abort() uses 4K pages to map the guest IPA.
> With the patches, user_mem_abort() uses PMD block mappings (2MB) to map the
> guest RAM, which means less TLB pressure and fewer stage 2 aborts.
>
> Changes since v1 [1]:
> - Rebased on top of Will's stage 2 page table handling rewrite, version 4
>   of the series [2]. His series is missing the patch "KVM: arm64: Update
>   page shift if stage 2 block mapping not supported" and there might be a
>   conflict (it's straightforward to fix).
>
> [1] https://www.spinics.net/lists/arm-kernel/msg834015.html
> [2] https://www.spinics.net/lists/arm-kernel/msg835806.html
>
>  arch/arm64/kvm/mmu.c | 19 ++-
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 1041be1fafe4..39c539d4d4cb 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -776,16 +776,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	else
>  		vma_shift = PAGE_SHIFT;
>
> -	vma_pagesize = 1ULL << vma_shift;
>  	if (logging_active ||
> -	    (vma->vm_flags & VM_PFNMAP) ||
> -	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> +	    (vma->vm_flags & VM_PFNMAP)) {
>  		force_pte = true;
> -		vma_pagesize = PAGE_SIZE;
> +		vma_shift = PAGE_SHIFT;
> +	}
> +
> +	if (vma_shift == PUD_SHIFT &&
> +	    !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +		vma_shift = PMD_SHIFT;
> +
> +	if (vma_shift == PMD_SHIFT &&
> +	    !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> +		force_pte = true;
> +		vma_shift = PAGE_SHIFT;
>  	}
>
> +	vma_pagesize = 1UL << vma_shift;
>  	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> -		fault_ipa &= huge_page_mask(hstate_vma(vma));
> +		fault_ipa &= ~(vma_pagesize - 1);
>
>  	gfn = fault_ipa >> PAGE_SHIFT;
>  	mmap_read_unlock(current->mm);
Re: [PATCH 7/9] KVM: arm64: Do not try to map PUDs when they are folded into PMD
Hi Marc,

Noticed this patch while catching up with the lists.

Marc Zyngier writes:

> For the obscure cases where PMD and PUD are the same size
> (64kB pages with 42bit VA, for example, which results in only
> two levels of page tables), we can't map anything as a PUD,
> because there is... erm... no PUD to speak of. Everything is
> either a PMD or a PTE.
>
> So let's only try and map a PUD when its size is different from
> that of a PMD.
>
> Cc: sta...@vger.kernel.org
> Fixes: b8e0ba7c8bea ("KVM: arm64: Add support for creating PUD hugepages at stage 2")
> Reported-by: Gavin Shan
> Reported-by: Eric Auger
> Reviewed-by: Alexandru Elisei
> Reviewed-by: Gavin Shan
> Tested-by: Gavin Shan
> Tested-by: Eric Auger
> Tested-by: Alexandru Elisei
> Signed-off-by: Marc Zyngier
> ---
>  arch/arm64/kvm/mmu.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 0121ef2c7c8d..16b8660ddbcc 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1964,7 +1964,12 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	       (fault_status == FSC_PERM &&
>  		stage2_is_exec(mmu, fault_ipa, vma_pagesize));
>
> -	if (vma_pagesize == PUD_SIZE) {
> +	/*
> +	 * If PUD_SIZE == PMD_SIZE, there is no real PUD level, and
> +	 * all we have is a 2-level page table. Trying to map a PUD in
> +	 * this case would be fatally wrong.
> +	 */
> +	if (PUD_SIZE != PMD_SIZE && vma_pagesize == PUD_SIZE) {
>  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
>
>  		new_pud = kvm_pud_mkhuge(new_pud);

Good catch! Missed the 64kb / 42b VA case while adding the initial
support. Thanks for fixing it.

Punit
Re: [PATCH v10 0/8] kvm: arm64: Support PUD hugepage at stage 2
Christoffer Dall writes:

> On Tue, Dec 11, 2018 at 05:10:33PM +, Suzuki K Poulose wrote:
>> This series is an update to the PUD hugepage support previously posted
>> at [0]. This patchset adds support for PUD hugepages at stage 2, a
>> feature that is useful on cores that have support for large sized TLB
>> mappings (e.g., 1GB for 4K granule).
>>
>> The patches are based on v4.20-rc4.
>>
>> The patches have been tested on an AMD Seattle system with the following
>> hugepage sizes - 2M and 1G.
>>
>> Right now the PUD hugepage for stage2 is only supported if the stage2
>> has 4 levels, i.e., with an IPA size of minimum 44bits with 4K pages.
>> This could be relaxed to stage2 with 3 levels, with the stage1 PUD huge
>> page mapped in the entry level of the stage2 (i.e., pgd). I have not
>> added the change here to keep this version stable w.r.t the previous
>> version. I could post a patch later after further discussions in the
>> list.
>>
>
> For the series:
>
> Reviewed-by: Christoffer Dall

Thanks a lot for reviewing the patches and the tag. And to Suzuki for
picking up the patchset.

(I was happy to see this while catching up with the lists after an
extended break!)
[PATCH v9 8/8] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page
handling routines are updated, extend the stage 2 fault handling to map
in PUD hugepages.

Addition of PUD hugepage support enables additional page sizes (e.g.,
1G with 4K granule) which can be useful on cores that support mapping
larger block sizes in the TLB entries.

Signed-off-by: Punit Agrawal
Reviewed-by: Suzuki Poulose
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h         |  20 +
 arch/arm/include/asm/stage2_pgtable.h  |   5 ++
 arch/arm64/include/asm/kvm_mmu.h       |  16
 arch/arm64/include/asm/pgtable-hwdef.h |   2 +
 arch/arm64/include/asm/pgtable.h       |   2 +
 virt/kvm/arm/mmu.c                     | 104 +++--
 6 files changed, 143 insertions(+), 6 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e62f0913ce7d..6336319a0d5b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void);

 #define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot)	(__pud(0))

 #define kvm_pud_pfn(pud)	({ BUG(); 0; })

 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
+/* No support for pud hugepages */
+#define kvm_pud_mkhuge(pud)	( {BUG(); pud; })

 /*
  * The following kvm_*pud*() functions are provided strictly to allow
@@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
 	return false;
 }

+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
+{
+	BUG();
+}
+
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+	BUG();
+	return pud;
+}
+
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+	BUG();
+	return pud;
+}
+
 static inline bool kvm_s2pud_exec(pud_t *pud)
 {
 	BUG();

diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index f6a7ea805232..f9017167a8d1 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -68,4 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 #define stage2_pmd_table_empty(kvm, pmdp)	kvm_page_empty(pmdp)
 #define stage2_pud_table_empty(kvm, pudp)	false

+static inline bool kvm_stage2_has_pud(struct kvm *kvm)
+{
+	return false;
+}
+
 #endif	/* __ARM_S2_PGTABLE_H_ */

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 9f941f70775c..8af4b1befa42 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_mk_pgd(pudp)					\
 	__pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE)

+#define kvm_set_pud(pudp, pud)		set_pud(pudp, pud)
+
 #define kvm_pfn_pte(pfn, prot)		pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)		pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot)		pfn_pud(pfn, prot)

 #define kvm_pud_pfn(pud)		pud_pfn(pud)

 #define kvm_pmd_mkhuge(pmd)		pmd_mkhuge(pmd)
+#define kvm_pud_mkhuge(pud)		pud_mkhuge(pud)

 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
@@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }

+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+	pud_val(pud) |= PUD_S2_RDWR;
+	return pud;
+}
+
 static inline pte_t kvm_s2pte_mkexec(pte_t pte)
 {
 	pte_val(pte) &= ~PTE_S2_XN;
@@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
 	return pmd;
 }

+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+	pud_val(pud) &= ~PUD_S2_XN;
+	return pud;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *ptep)
 {
 	pteval_t old_pteval, pteval;

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 336e24cddc87..6f1c187f1c86 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */

+#define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
+#define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
 #define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */

 /*

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index bb0f3f17a7a9..576128635f3c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -390,6 +390,8 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud)		pte_write(p
[PATCH v9 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries are used to perform
i-cache invalidation on first execute.

Provide trivial implementations of arm32 helpers to allow sharing of
code.

Signed-off-by: Punit Agrawal
Reviewed-by: Suzuki K Poulose
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h         |  6 +++
 arch/arm64/include/asm/kvm_mmu.h       |  5 +++
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 virt/kvm/arm/mmu.c                     | 53 +++---
 4 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37bf85d39607..839a619873d3 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
 	return false;
 }

+static inline bool kvm_s2pud_exec(pud_t *pud)
+{
+	BUG();
+	return false;
+}
+
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
 	pte_val(pte) |= L_PTE_S2_RDWR;

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 8da6d1b2a196..c755b37b3f92 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
 	return kvm_s2pte_readonly((pte_t *)pudp);
 }

+static inline bool kvm_s2pud_exec(pud_t *pudp)
+{
+	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

 #ifdef __PAGETABLE_PMD_FOLDED

diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 1d7d8da2ef9b..336e24cddc87 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */

+#define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1c669c3c1208..8e44dccd1b47 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }

-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+/*
+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
+ * true if a valid and present leaf-entry is found. A pointer to the
+ * leaf-entry is returned in the appropriate level variable - pudpp,
+ * pmdpp, ptepp.
+ */
+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
 {
+	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;

-	pmdp = stage2_get_pmd(kvm, NULL, addr);
+	*pudpp = NULL;
+	*pmdpp = NULL;
+	*ptepp = NULL;
+
+	pudp = stage2_get_pud(kvm, NULL, addr);
+	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
+		return false;
+
+	if (stage2_pud_huge(kvm, *pudp)) {
+		*pudpp = pudp;
+		return true;
+	}
+
+	pmdp = stage2_pmd_offset(kvm, pudp, addr);
 	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
 		return false;

-	if (pmd_thp_or_huge(*pmdp))
-		return kvm_s2pmd_exec(pmdp);
+	if (pmd_thp_or_huge(*pmdp)) {
+		*pmdpp = pmdp;
+		return true;
+	}

 	ptep = pte_offset_kernel(pmdp, addr);
 	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
 		return false;

-	return kvm_s2pte_exec(ptep);
+	*ptepp = ptep;
+	return true;
+}
+
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ptep;
+	bool found;
+
+	found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+	if (!found)
+		return false;
+
+	if (pudp)
+		return kvm_s2pud_exec(pudp);
+	else if (pmdp)
+		return kvm_s2pmd_exec(pmdp);
+	else
+		return kvm_s2pte_exec(ptep);
 }

 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
--
2.19.1
[PATCH v9 4/8] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for
write protecting PUD hugepages when they are encountered. Write
protecting guest tables is used to track dirty pages when migrating
VMs.

Also, provide trivial implementations of required kvm_s2pud_* helpers
to allow sharing of code with arm32.

Signed-off-by: Punit Agrawal
Reviewed-by: Christoffer Dall
Reviewed-by: Suzuki K Poulose
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h   | 15 +++
 arch/arm64/include/asm/kvm_mmu.h | 10 ++
 virt/kvm/arm/mmu.c               | 11 +++
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index e6eff8bf5d7f..37bf85d39607 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void);

 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)

+/*
+ * The following kvm_*pud*() functions are provided strictly to allow
+ * sharing code with arm64. They should never be called in practice.
+ */
+static inline void kvm_set_s2pud_readonly(pud_t *pud)
+{
+	BUG();
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pud)
+{
+	BUG();
+	return false;
+}
+
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
 	pte_val(pte) |= L_PTE_S2_RDWR;

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 13d482710292..8da6d1b2a196 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
 	return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
 }

+static inline void kvm_set_s2pud_readonly(pud_t *pudp)
+{
+	kvm_set_s2pte_readonly((pte_t *)pudp);
+}
+
+static inline bool kvm_s2pud_readonly(pud_t *pudp)
+{
+	return kvm_s2pte_readonly((pte_t *)pudp);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)

 #ifdef __PAGETABLE_PMD_FOLDED

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index fb5325f7a1ac..1c669c3c1208 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
 	do {
 		next = stage2_pud_addr_end(kvm, addr, end);
 		if (!stage2_pud_none(kvm, *pud)) {
-			/* TODO:PUD not supported, revisit later if supported */
-			BUG_ON(stage2_pud_huge(kvm, *pud));
-			stage2_wp_pmds(kvm, pud, addr, next);
+			if (stage2_pud_huge(kvm, *pud)) {
+				if (!kvm_s2pud_readonly(pud))
+					kvm_set_s2pud_readonly(pud);
+			} else {
+				stage2_wp_pmds(kvm, pud, addr, next);
+			}
 		}
 	} while (pud++, addr = next, addr != end);
 }
@@ -1392,7 +1395,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
  *
  * Called to start logging dirty pages after memory region
  * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
- * all present PMD and PTEs are write protected in the memory region.
+ * all present PUD, PMD and PTEs are write protected in the memory region.
  * Afterwards read of dirty page log can be called.
  *
  * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
--
2.19.1
[PATCH v9 7/8] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 + arch/arm64/include/asm/kvm_mmu.h | 5 arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 39 4 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index fea5e723e3ac..e62f0913ce7d 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; } +static inline bool kvm_s2pud_young(pud_t pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 612032bbb428..9f941f70775c 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); } +static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index f51e2271e6a3..bb0f3f17a7a9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -386,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) 
pte_write(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index bd749601195f..3893ea6a50bf 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); } +static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1932,42 +1937,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return stage2_pudp_test_and_clear_young(pud); + else if (pmd) return stage2_pmdp_test_and_clear_young(pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (pte_none(*pte)) - return 0; - - return stage2_ptep_test_and_clear_young(pte); + else + return stage2_ptep_test_and_clear_young(pte); } static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return kvm_s2pud_young(*pud); + else if (pmd) return pmd_young(*pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (!pte_none(*pte))/* Just a page...
*/ + else return pte_young(*pte); - - return 0; } int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) -- 2.19.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
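The updated handlers collapse the three page-table levels into a single walk-then-dispatch shape: the walker returns at most one non-NULL level pointer, and the handler branches on which one is set. A rough userspace model of that shape (the `entry_t`/`struct leaf` types and the AF bit position are invented for illustration, not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative Access Flag position; arm64 descriptors keep AF at bit 10. */
#define AF_BIT (1ULL << 10)

typedef uint64_t entry_t;

/* Stands in for the pud/pmd/pte out-parameters of the leaf-entry walk:
 * at most one pointer is non-NULL after a successful walk. */
struct leaf {
    entry_t *pud, *pmd, *pte;
};

/* Age the leaf entry, whatever its level (cf. kvm_age_hva_handler):
 * report whether it was young, and clear the flag. */
static bool test_and_clear_young(struct leaf *l)
{
    entry_t *e = l->pud ? l->pud : l->pmd ? l->pmd : l->pte;
    bool young;

    if (!e)
        return false;   /* nothing mapped */
    young = *e & AF_BIT;
    *e &= ~AF_BIT;
    return young;
}
```

The point of the refactor is that adding PUD support only adds one arm to the dispatch instead of a second copy of the walk.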
[PATCH v9 6/8] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 9 + arch/arm64/include/asm/kvm_mmu.h | 7 +++ arch/arm64/include/asm/pgtable.h | 6 ++ virt/kvm/arm/mmu.c | 22 +++--- 4 files changed, 33 insertions(+), 11 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 839a619873d3..fea5e723e3ac 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) ({ BUG(); 0; }) + + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) /* @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c755b37b3f92..612032bbb428 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef 
__PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 50b1ef8584c0..f51e2271e6a3 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); } +static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -381,6 +386,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 8e44dccd1b47..bd749601195f 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1698,6 +1698,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { + pud_t *pud; pmd_t *pmd; pte_t *pte; kvm_pfn_t pfn; @@ -1707,24 +1708,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) spin_lock(&vcpu->kvm->mmu_lock); - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte)) goto out; - if (pmd_thp_or_huge(*pmd)) {/* THP, HugeTLB */ + if (pud) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + } else if (pmd) { /* THP, HugeTLB */ *pmd = pmd_mkyoung(*pmd); pfn = pmd_pfn(*pmd); pfn_valid = true; - goto out; + } else { + *pte = pte_mkyoung(*pte); /* Just a page...
*/ + pfn = pte_pfn(*pte); + pfn_valid = true; } - pte = pte_offset_kernel(pmd, fault_ipa); - if (pte_none(*pte)) /* Nothing there either */ - goto out; - - *pte = pte_mkyoung(*pte); /* Just a page... */ - pfn = pte_pfn(*pte); - pfn_valid = true; out: spin_unlock(&vcpu->kvm->mmu_lock); if (pfn_valid) -- 2.19.1
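The pgtable.h hunk implements the new PUD helper by round-tripping through the existing pte helpers — `pte_pud(pte_mkyoung(pud_pte(pud)))` — which works because the levels share a descriptor layout. A standalone sketch of that pattern, with a made-up bit layout standing in for the real descriptors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor layout: one Access Flag bit, shared by all
 * levels, mirroring how arm64 reuses a single layout across levels. */
#define AF_BIT (1ULL << 10)

typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pud_t;

/* Representation conversions: a no-op reinterpretation of the bits. */
static pte_t pud_pte(pud_t pud) { return (pte_t){ pud.val }; }
static pud_t pte_pud(pte_t pte) { return (pud_t){ pte.val }; }

static bool pte_young(pte_t pte) { return pte.val & AF_BIT; }
static pte_t pte_mkyoung(pte_t pte) { pte.val |= AF_BIT; return pte; }

/* PUD helpers built entirely out of the pte helpers, as in the patch:
 * #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) */
static bool pud_young(pud_t pud) { return pte_young(pud_pte(pud)); }
static pud_t pud_mkyoung(pud_t pud)
{
    return pte_pud(pte_mkyoung(pud_pte(pud)));
}
```

The design choice here is to never duplicate bit-twiddling per level — only the type conversions are level-specific.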
[PATCH v9 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 59595207c5e1..6912529946fb 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1475,7 +1475,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, force_pte = false; + bool write_fault, writable, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1598,19 +1599,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault) invalidate_icache_guest_page(pfn, vma_pagesize); + /* +* If we took an execution fault we have made the +* icache/dcache coherent above and should now let the s2 +* mapping be executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. 
+*/ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { @@ -1621,13 +1628,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mark_page_dirty(kvm, gfn); } - if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pte = kvm_s2pte_mkexec(new_pte); - } ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags); } -- 2.19.1
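The refactor boils down to computing one predicate up front and applying it to whichever entry size ends up being installed. In isolation, with `FSC_PERM` and `stage2_is_exec()` reduced to plain booleans for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Should the new stage 2 entry be marked executable?
 *
 * - Execution fault: the caches were made coherent just above, so the
 *   mapping may be made executable.
 * - Permission (e.g. write) fault: orthogonal to execute permission,
 *   so preserve whatever exec state the entry already had.
 */
static bool needs_exec(bool exec_fault, bool perm_fault, bool was_exec)
{
    return exec_fault || (perm_fault && was_exec);
}
```

With this hoisted out, the PMD and PTE branches (and later the PUD branch) each shrink to a single `if (needs_exec)` line instead of repeating the full condition.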
[PATCH v9 0/8] KVM: Support PUD hugepage at stage 2
This series is an update to the PUD hugepage support previously posted at [0]. This patchset adds support for PUD hugepages at stage 2 a feature that is useful on cores that have support for large sized TLB mappings (e.g., 1GB for 4K granule). The patches are based on the latest upstream kernel. The patches have been tested on AMD Seattle system with the following hugepage sizes - 2M and 1G. Thanks, Punit [0] https://patchwork.kernel.org/cover/10622379/ v8 -> v9 * Dropped bugfix patch 1 which has been merged v7 -> v8 * Add kvm_stage2_has_pud() helper on arm32 * Rebased to v6 of 52bit dynamic IPA support v6 -> v7 * Restrict thp check to exclude hugetlbfs pages - Patch 1 * Don't update PUD entry if there's no change - Patch 9 * Add check for PUD level in stage 2 - Patch 9 v5 -> v6 * Split Patch 1 to move out the refactoring of exec permissions on page table entries. * Patch 4 - Initialise p*dpp pointers in stage2_get_leaf_entry() * Patch 5 - Trigger a BUG() in kvm_pud_pfn() on arm v4 -> v5: * Patch 1 - Drop helper stage2_should_exec() and refactor the condition to decide if a page table entry should be marked executable * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and latter patches * Patch 7 - Use stage 2 accessors instead of using the page table helpers directly * Patch 7 - Add a note to update the PUD hugepage support when number of levels of stage 2 tables differs from stage 1 v3 -> v4: * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled * Patch 4-5 - Add PUD hugepage support for exec and access faults * Patch 6 - PUD hugepage support for aging page table entries v2 -> v3: * Update vma_pagesize directly if THP [1/4]. 
Previously this was done indirectly via hugetlb * Added review tag [4/4] v1 -> v2: * Create helper to check if the page should have exec permission [1/4] * Fix broken condition to detect THP hugepage [1/4] * Fix incorrect hunk resulting from a rebase [4/4] Punit Agrawal (8): KVM: arm/arm64: Share common code in user_mem_abort() KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault KVM: arm/arm64: Introduce helpers to manipulate page table entries KVM: arm64: Support dirty page tracking for PUD hugepages KVM: arm64: Support PUD hugepage in stage2_is_exec() KVM: arm64: Support handling access faults for PUD hugepages KVM: arm64: Update age handlers to support PUD hugepages KVM: arm64: Add support for creating PUD hugepages at stage 2 arch/arm/include/asm/kvm_mmu.h | 61 + arch/arm/include/asm/stage2_pgtable.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 48 arch/arm64/include/asm/pgtable-hwdef.h | 4 + arch/arm64/include/asm/pgtable.h | 9 + virt/kvm/arm/mmu.c | 312 ++--- 6 files changed, 360 insertions(+), 79 deletions(-) -- 2.19.1
[PATCH v9 1/8] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 49 -- 1 file changed, 30 insertions(+), 19 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5eca48bdb1a6..59595207c5e1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1475,7 +1475,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, exec_fault, writable, force_pte = false; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1484,7 +1484,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1504,10 +1504,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { - hugetlb = true; + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { + /* +* Fallback to PTE if it's not one of the Stage 2 +* supported hugepage sizes +*/ + vma_pagesize = PAGE_SIZE; + /* * Pages belonging to memslots that don't have the same * alignment for userspace and IPA cannot be mapped using @@ -1573,23 +1579,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, 
if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) - hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + if (vma_pagesize == PAGE_SIZE && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. This code will need to be +* updated to support other THP sizes. +*/ + if (transparent_hugepage_adjust(&pfn, &fault_ipa)) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); - if (hugetlb) { + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + if (vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); if (exec_fault) { new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) @@ -1602,16 +1618,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - if (exec_fault) { new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_guest_page(pfn, PAGE_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) -- 2.19.1
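The effect of the hoisting is that the dirty-marking and cache maintenance run exactly once, sized by `vma_pagesize`, before the size-specific entry install. This sketch records which maintenance operations the common path would perform (the structs are invented; the comments name the real kernel helpers each field stands for):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL
#define PMD_SIZE  (2UL * 1024 * 1024)

/* Inputs to the common path, reduced to plain flags. */
struct fault {
    bool writable, exec_fault, perm_fault;
    size_t vma_pagesize;
};

/* What the common path did, for inspection. */
struct ops {
    bool set_dirty;          /* kvm_set_pfn_dirty() */
    size_t dcache_clean;     /* clean_dcache_guest_page() length */
    size_t icache_inval;     /* invalidate_icache_guest_page() length */
};

/* Model of the shared bookkeeping hoisted out of the PMD/PTE branches. */
static struct ops common_path(struct fault f)
{
    struct ops o = { 0 };

    if (f.writable)
        o.set_dirty = true;
    if (!f.perm_fault)                    /* fault_status != FSC_PERM */
        o.dcache_clean = f.vma_pagesize;
    if (f.exec_fault)
        o.icache_inval = f.vma_pagesize;
    return o;
}
```

Before the patch, each of these three operations appeared twice — once per page size — with the only difference being the hard-coded `PMD_SIZE`/`PAGE_SIZE` length.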
[PATCH v9 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 14 -- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 1098ffc3d54b..e6eff8bf5d7f 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -82,6 +82,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pud(pmdp) __pud(__pa(pmdp) | PMD_TYPE_TABLE) #define kvm_mk_pgd(pudp) ({ BUILD_BUG(); 0; }) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 658657367f2f..13d482710292 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,6 +184,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 6912529946fb..fb5325f7a1ac 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -607,7 +607,7 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start, addr = start; do { 
pte = pte_offset_kernel(pmd, addr); - kvm_set_pte(pte, pfn_pte(pfn, prot)); + kvm_set_pte(pte, kvm_pfn_pte(pfn, prot)); get_page(virt_to_page(pte)); pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1202,7 +1202,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, pfn = __phys_to_pfn(pa); for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) { - pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE); + pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE); if (writable) pte = kvm_s2pte_mkwrite(pte); @@ -1611,8 +1611,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); if (vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1621,7 +1623,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); @@ -1878,7 +1880,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) * just like a translation fault and clean the cache to the PoC. */ clean_dcache_guest_page(pfn, PAGE_SIZE); - stage2_pte = kvm_pfn_pte(pfn, PAGE_S2); handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte); } -- 2.19.1
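On arm64 the new `kvm_pfn_pte()`-style wrappers are thin aliases over the generic helpers; the indirection exists so arm32 can supply stubs for levels it cannot map. A minimal model of the aliasing half of the pattern (the 12-bit shift and the pte layout are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

typedef struct { uint64_t val; } pte_t;

/* Generic helper: compose a pte from a frame number and protection bits. */
static pte_t pfn_pte(uint64_t pfn, uint64_t prot)
{
    return (pte_t){ (pfn << PAGE_SHIFT) | prot };
}

/* The KVM wrapper is a plain alias where the arch supports the level;
 * the unsupported-arch variant would be a BUG()-style stub instead. */
#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot)
```

Call sites then use only the `kvm_` spelling, so adding a new level (PUD) means adding one wrapper per architecture rather than touching every caller.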
Re: [PATCH v8 1/9] KVM: arm/arm64: Ensure only THP is candidate for adjustment
Punit Agrawal writes: > Christoffer Dall writes: > >> On Mon, Oct 01, 2018 at 04:54:35PM +0100, Punit Agrawal wrote: >>> PageTransCompoundMap() returns true for hugetlbfs and THP >>> hugepages. This behaviour incorrectly leads to stage 2 faults for >>> unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be >>> treated as THP faults. >>> >>> Tighten the check to filter out hugetlbfs pages. This also leads to >>> consistently mapping all unsupported hugepage sizes as PTE level >>> entries at stage 2. >>> >>> Signed-off-by: Punit Agrawal >>> Reviewed-by: Suzuki Poulose >>> Cc: Christoffer Dall >>> Cc: Marc Zyngier >>> Cc: sta...@vger.kernel.org # v4.13+ >> >> >> Hmm, this function is only actually called from user_mem_abort() if we >> have (!hugetlb), so I'm not sure the cc stable here was actually >> warranted, nor that this patch is strictly necessary. >> >> It doesn't hurt, and makes the code potentially more robust for the >> future though. >> >> Am I missing something? > > !hugetlb is only true for hugepage sizes supported at stage 2. Of course I meant "hugetlb" above (Note the lack of "!"). > The function also got called for unsupported hugepage size at stage 2, > e.g., 64k hugepage with 4k page size, which then ended up doing the > wrong thing. > > Hope that adds some context. I should've added this to the commit log. 
> >> >> Thanks, >> >> Christoffer >> >>> --- >>> virt/kvm/arm/mmu.c | 8 +++- >>> 1 file changed, 7 insertions(+), 1 deletion(-) >>> >>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >>> index 7e477b3cae5b..c23a1b323aad 100644 >>> --- a/virt/kvm/arm/mmu.c >>> +++ b/virt/kvm/arm/mmu.c >>> @@ -1231,8 +1231,14 @@ static bool transparent_hugepage_adjust(kvm_pfn_t >>> *pfnp, phys_addr_t *ipap) >>> { >>> kvm_pfn_t pfn = *pfnp; >>> gfn_t gfn = *ipap >> PAGE_SHIFT; >>> + struct page *page = pfn_to_page(pfn); >>> >>> - if (PageTransCompoundMap(pfn_to_page(pfn))) { >>> + /* >>> +* PageTransCompoungMap() returns true for THP and >>> +* hugetlbfs. Make sure the adjustment is done only for THP >>> +* pages. >>> +*/ >>> + if (!PageHuge(page) && PageTransCompoundMap(page)) { >>> unsigned long mask; >>> /* >>> * The address we faulted on is backed by a transparent huge >>> -- >>> 2.18.0 >>> > ___ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 1/9] KVM: arm/arm64: Ensure only THP is candidate for adjustment
Christoffer Dall writes: > On Mon, Oct 01, 2018 at 04:54:35PM +0100, Punit Agrawal wrote: >> PageTransCompoundMap() returns true for hugetlbfs and THP >> hugepages. This behaviour incorrectly leads to stage 2 faults for >> unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be >> treated as THP faults. >> >> Tighten the check to filter out hugetlbfs pages. This also leads to >> consistently mapping all unsupported hugepage sizes as PTE level >> entries at stage 2. >> >> Signed-off-by: Punit Agrawal >> Reviewed-by: Suzuki Poulose >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: sta...@vger.kernel.org # v4.13+ > > > Hmm, this function is only actually called from user_mem_abort() if we > have (!hugetlb), so I'm not sure the cc stable here was actually > warranted, nor that this patch is strictly necessary. > > It doesn't hurt, and makes the code potentially more robust for the > future though. > > Am I missing something? !hugetlb is only true for hugepage sizes supported at stage 2. The function also got called for unsupported hugepage size at stage 2, e.g., 64k hugepage with 4k page size, which then ended up doing the wrong thing. Hope that adds some context. I should've added this to the commit log. > > Thanks, > > Christoffer > >> --- >> virt/kvm/arm/mmu.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 7e477b3cae5b..c23a1b323aad 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1231,8 +1231,14 @@ static bool transparent_hugepage_adjust(kvm_pfn_t >> *pfnp, phys_addr_t *ipap) >> { >> kvm_pfn_t pfn = *pfnp; >> gfn_t gfn = *ipap >> PAGE_SHIFT; >> +struct page *page = pfn_to_page(pfn); >> >> -if (PageTransCompoundMap(pfn_to_page(pfn))) { >> +/* >> + * PageTransCompoungMap() returns true for THP and >> + * hugetlbfs. Make sure the adjustment is done only for THP >> + * pages. 
>> + */ >> +if (!PageHuge(page) && PageTransCompoundMap(page)) { >> unsigned long mask; >> /* >> * The address we faulted on is backed by a transparent huge >> -- >> 2.18.0 >> ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
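The tightened condition — filter out hugetlbfs before trusting `PageTransCompoundMap()` — can be captured as a two-flag predicate. A toy model (this is not the real `struct page`; the two booleans stand in for `PageHuge()` and `PageTransCompoundMap()`):

```c
#include <assert.h>
#include <stdbool.h>

struct page {
    bool is_hugetlbfs;        /* PageHuge() */
    bool is_compound_mapped;  /* PageTransCompoundMap(): true for BOTH
                               * THP and hugetlbfs pages, hence the bug */
};

/* Only anonymous transparent hugepages are candidates for the stage 2
 * adjustment; hugetlbfs pages of unsupported sizes must fall back to
 * PTE-level mappings instead of being treated as THP. */
static bool thp_adjust_candidate(const struct page *page)
{
    return !page->is_hugetlbfs && page->is_compound_mapped;
}
```

This mirrors the patch's `!PageHuge(page) && PageTransCompoundMap(page)` check; the original single-predicate version misclassified e.g. a 64K hugetlbfs page on a 4K-granule host as a THP fault.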
Re: [PATCH v2] KVM: arm/arm64: Check memslot bounds before mapping hugepages
Hi Lukas, Lukas Braun writes: > Userspace can create a memslot with memory backed by (transparent) > hugepages, but with bounds that do not align with hugepages. > In that case, we cannot map the entire region in the guest as hugepages > without exposing additional host memory to the guest and potentially > interfering with other memslots. > Consequently, this patch adds a bounds check when populating guest page > tables and forces the creation of regular PTEs if mapping an entire > hugepage would violate the memslots bounds. > > Signed-off-by: Lukas Braun > --- > > Hi everyone, > > for v2, in addition to writing the condition the way Marc suggested, I > moved the whole check so it also catches the problem when the hugepage > was allocated explicitly, not only for THPs. Ok, that makes sense. Memslot bounds could be exceeded for hugetlbfs pages as well. > The second line is quite long, but splitting it up would make things > rather ugly IMO, so I left it as it is. Let's try to do better - user_mem_abort() is quite hard to follow as it is.
> > Regards, > Lukas > > > virt/kvm/arm/mmu.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c > index ed162a6c57c5..ba77339e23ec 100644 > --- a/virt/kvm/arm/mmu.c > +++ b/virt/kvm/arm/mmu.c > @@ -1500,7 +1500,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, > phys_addr_t fault_ipa, > return -EFAULT; > } > > - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { > + if ((fault_ipa & S2_PMD_MASK) < (memslot->base_gfn << PAGE_SHIFT) || > + ALIGN(fault_ipa, S2_PMD_SIZE) >= ((memslot->base_gfn + > memslot->npages) << PAGE_SHIFT)) { > + /* PMD entry would map something outside of the memslot */ > + force_pte = true; > + } else if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { > hugetlb = true; > gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; > } else { For the purpose of this fix, using a helper to check whether the mapping fits in the memslot makes things clearer (imo) (untested patch below) - diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ed162a6c57c5..8bca141eb45e 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1466,6 +1466,18 @@ static void kvm_send_hwpoison_signal(unsigned long address, send_sig_info(SIGBUS, &info, current); } +static bool mapping_in_memslot(struct kvm_memory_slot *memslot, +phys_addr_t fault_ipa, unsigned long mapping_size) +{ + gfn_t start_gfn = (fault_ipa & ~(mapping_size - 1)) >> PAGE_SHIFT; + gfn_t end_gfn = ALIGN(fault_ipa, mapping_size) >> PAGE_SHIFT; + + WARN_ON(!is_power_of_2(mapping_size)); + + return memslot->base_gfn <= start_gfn && + end_gfn < memslot->base_gfn + memslot->npages; +} + static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_memory_slot *memslot, unsigned long hva, unsigned long fault_status) @@ -1480,7 +1492,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); -
unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1500,7 +1512,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + /* Is the mapping contained in the memslot? */ + if (!mapping_in_memslot(memslot, fault_ipa, vma_pagesize)) { + /* memslot should be aligned to page size */ + vma_pagesize = PAGE_SIZE; + force_pte = true; + } + + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { Thoughts? ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
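The proposed helper is plain gfn arithmetic and easy to check in isolation. A userspace restatement (computing the exclusive end gfn directly from the block's aligned start, rather than via the kernel's ALIGN() macro, which would leave an already block-aligned fault_ipa unchanged):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12

struct memslot {
    uint64_t base_gfn;   /* first guest frame number in the slot */
    uint64_t npages;     /* number of pages in the slot */
};

/* Would a block mapping of mapping_size bytes covering fault_ipa stay
 * entirely inside the memslot?  mapping_size must be a power of two. */
static bool mapping_in_memslot(const struct memslot *slot,
                               uint64_t fault_ipa, uint64_t mapping_size)
{
    uint64_t block_start = fault_ipa & ~(mapping_size - 1);
    uint64_t start_gfn = block_start >> PAGE_SHIFT;
    uint64_t end_gfn = (block_start + mapping_size) >> PAGE_SHIFT; /* exclusive */

    return slot->base_gfn <= start_gfn &&
           end_gfn <= slot->base_gfn + slot->npages;
}
```

If this returns false, the fault handler falls back to PTE mappings (`force_pte = true`), exactly as in Lukas's original condition but with the intent spelled out.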
Re: [PATCH v8 1/9] KVM: arm/arm64: Ensure only THP is candidate for adjustment
Marc Zyngier writes: > On 01/10/18 16:54, Punit Agrawal wrote: >> PageTransCompoundMap() returns true for hugetlbfs and THP >> hugepages. This behaviour incorrectly leads to stage 2 faults for >> unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be >> treated as THP faults. >> >> Tighten the check to filter out hugetlbfs pages. This also leads to >> consistently mapping all unsupported hugepage sizes as PTE level >> entries at stage 2. >> >> Signed-off-by: Punit Agrawal >> Reviewed-by: Suzuki Poulose >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: sta...@vger.kernel.org # v4.13+ > > FWIW, I've cherry-picked that single patch from the series and queued > it for 4.20. Thanks for picking up the fix. ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v8 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
Suzuki K Poulose writes: > On 10/01/2018 04:54 PM, Punit Agrawal wrote: >> KVM only supports PMD hugepages at stage 2. Now that the various page >> handling routines are updated, extend the stage 2 fault handling to >> map in PUD hugepages. >> >> Addition of PUD hugepage support enables additional page sizes (e.g., >> 1G with 4K granule) which can be useful on cores that support mapping >> larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon >> --- >> arch/arm/include/asm/kvm_mmu.h | 20 + >> arch/arm/include/asm/stage2_pgtable.h | 9 +++ >> arch/arm64/include/asm/kvm_mmu.h | 16 >> arch/arm64/include/asm/pgtable-hwdef.h | 2 + >> arch/arm64/include/asm/pgtable.h | 2 + >> virt/kvm/arm/mmu.c | 106 +++-- >> 6 files changed, 149 insertions(+), 6 deletions(-) >> > > ... > >> diff --git a/arch/arm/include/asm/stage2_pgtable.h >> b/arch/arm/include/asm/stage2_pgtable.h >> index f6a7ea805232..a4ec25360e50 100644 >> --- a/arch/arm/include/asm/stage2_pgtable.h >> +++ b/arch/arm/include/asm/stage2_pgtable.h >> @@ -68,4 +68,13 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, >> phys_addr_t end) >> #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) >> #define stage2_pud_table_empty(kvm, pudp) false >> +static inline bool kvm_stage2_has_pud(struct kvm *kvm) >> +{ >> +#if CONFIG_PGTABLE_LEVELS > 3 >> +return true; >> +#else >> +return false; >> +#endif > > nit: We can only have PGTABLE_LEVELS=3 on ARM with LPAE. > AFAIT, this can be set to false always for ARM. I debated this and veered towards being generic but not committed either ways. I've updated this locally but will wait for further comments before re-posting. > >> +} >> + > > ... 
> >> @@ -1669,7 +1752,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> needs_exec = exec_fault || >> (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); >> - if (hugetlb && vma_pagesize == PMD_SIZE) { >> +if (hugetlb && vma_pagesize == PUD_SIZE) { >> +pud_t new_pud = kvm_pfn_pud(pfn, mem_type); >> + >> +new_pud = kvm_pud_mkhuge(new_pud); >> +if (writable) >> +new_pud = kvm_s2pud_mkwrite(new_pud); >> + >> +if (needs_exec) >> +new_pud = kvm_s2pud_mkexec(new_pud); >> + >> +ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud); >> +} else if (hugetlb && vma_pagesize == PMD_SIZE) { >> pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); >> new_pmd = kvm_pmd_mkhuge(new_pmd); >> > > > Reviewed-by: Suzuki K Poulose Thanks a lot for going through the series.
[PATCH v8 7/9] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 9 + arch/arm64/include/asm/kvm_mmu.h | 7 +++ arch/arm64/include/asm/pgtable.h | 6 ++ virt/kvm/arm/mmu.c | 22 +++--- 4 files changed, 33 insertions(+), 11 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 26a2ab05b3f6..95b34aad0dc8 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) ({ BUG(); 0; }) + + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) /* @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c06ef3be8ca9..b93e5167728f 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef 
__PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 1bdeca8918a6..a64a5c35beb1 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); } +static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5fd1eae7d964..1401dc015a22 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1706,6 +1706,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { + pud_t *pud; pmd_t *pmd; pte_t *pte; kvm_pfn_t pfn; @@ -1715,24 +1716,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) spin_lock(>kvm->mmu_lock); - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, , , )) goto out; - if (pmd_thp_or_huge(*pmd)) {/* THP, HugeTLB */ + if (pud) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + } else if (pmd) { /* THP, HugeTLB */ *pmd = pmd_mkyoung(*pmd); pfn = pmd_pfn(*pmd); pfn_valid = true; - goto out; + } else { + *pte = pte_mkyoung(*pte); /* Just a page... 
*/ + pfn = pte_pfn(*pte); + pfn_valid = true; } - pte = pte_offset_kernel(pmd, fault_ipa); - if (pte_none(*pte)) /* Nothing there either */ - goto out; - - *pte = pte_mkyoung(*pte); /* Just a page... */ - pfn = pte_pfn(*pte); - pfn_valid = true; out: spin_unlock(>kvm->mmu_lock); if (pfn_valid) -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
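The rewritten handle_access_fault() above leans on the contract that stage2_get_leaf_entry() returns exactly one non-NULL level pointer for a present mapping. A toy userspace model of that contract and the young-marking dispatch (the entry type, level encoding, and AF bit position are illustrative stand-ins, not the kernel's):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

typedef uint64_t entry_t;
#define AF_BIT (UINT64_C(1) << 10)   /* arm64 access flag lives at bit 10 */

/* Toy lookup: exactly one of *pudpp/*pmdpp/*ptepp is set on success,
 * mirroring the stage2_get_leaf_entry() contract. */
static bool get_leaf_entry(entry_t *entry, int leaf_level,
			   entry_t **pudpp, entry_t **pmdpp, entry_t **ptepp)
{
	*pudpp = *pmdpp = *ptepp = NULL;
	if (!*entry)                       /* nothing mapped */
		return false;
	if (leaf_level == 1)
		*pudpp = entry;
	else if (leaf_level == 2)
		*pmdpp = entry;
	else
		*ptepp = entry;
	return true;
}

/* Mirrors handle_access_fault(): mark whichever leaf was found young. */
static bool mark_young(entry_t *entry, int leaf_level)
{
	entry_t *pud, *pmd, *pte;

	if (!get_leaf_entry(entry, leaf_level, &pud, &pmd, &pte))
		return false;
	if (pud)
		*pud |= AF_BIT;
	else if (pmd)
		*pmd |= AF_BIT;
	else
		*pte |= AF_BIT;
	return true;
}
```

This is why the patch can drop the separate pte_offset_kernel() walk: the single helper already descends to the correct leaf.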
[PATCH v8 3/9] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5b76ee204000..ec64d21c6571 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1481,7 +1481,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, writable, hugetlb = false, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1606,19 +1607,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault) invalidate_icache_guest_page(pfn, vma_pagesize); + /* +* If we took an execution fault we have made the +* icache/dcache coherent above and should now let the s2 +* mapping be executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. 
+*/ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, _pmd); } else { @@ -1629,13 +1636,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mark_page_dirty(kvm, gfn); } - if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pte = kvm_s2pte_mkexec(new_pte); - } ret = stage2_set_pte(kvm, memcache, fault_ipa, _pte, flags); } -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
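The refactor above collapses the duplicated exec-preservation branches into a single predicate. Its decision table can be checked in isolation (the FSC_PERM value here is an illustrative stand-in for the kernel constant):

```c
#include <stdbool.h>

#define FSC_PERM 0x0C   /* permission-fault status code; value illustrative */

/* Mirrors the needs_exec computation in user_mem_abort(): the mapping
 * should be executable if we faulted on an exec, or if a permission
 * fault must preserve an already-executable stage 2 entry. */
static bool needs_exec(bool exec_fault, unsigned int fault_status,
		       bool stage2_was_exec)
{
	return exec_fault ||
	       (fault_status == FSC_PERM && stage2_was_exec);
}
```

Both the PMD and PTE paths (and later the PUD path) then reduce to a single `if (needs_exec)` test.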
[PATCH v8 5/9] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal Reviewed-by: Christoffer Dall Reviewed-by: Suzuki K Poulose Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 15 +++ arch/arm64/include/asm/kvm_mmu.h | 10 ++ virt/kvm/arm/mmu.c | 11 +++ 3 files changed, 32 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index e77212e53e77..9ec09f4cc284 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void); #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice. 
+ */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + BUG(); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index baabea0cbb66..3cc342177474 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } +static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 21079eb5bc15..9c48f2ca6583 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, do { next = stage2_pud_addr_end(kvm, addr, end); if (!stage2_pud_none(kvm, *pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(kvm, *pud)); - stage2_wp_pmds(kvm, pud, addr, next); + if (stage2_pud_huge(kvm, *pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(kvm, pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1392,7 +1395,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. 
Called with kvm->slots_lock mutex acquired, -- 2.18.0
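The stage2_wp_puds() hunk above only write-protects a huge PUD when it is not already read-only. A sketch of that transition on the stage 2 HAP bits (bit positions per the series' PUD_S2_* definitions; pud_t here is a plain integer stand-in):

```c
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t pud_t;
#define PUD_S2_RDONLY (UINT64_C(1) << 6)   /* HAP[2:1] = 01: read-only  */
#define PUD_S2_RDWR   (UINT64_C(3) << 6)   /* HAP[2:1] = 11: read/write */

static bool kvm_s2pud_readonly(const pud_t *pud)
{
	return (*pud & PUD_S2_RDWR) == PUD_S2_RDONLY;
}

static void kvm_set_s2pud_readonly(pud_t *pud)
{
	*pud = (*pud & ~PUD_S2_RDWR) | PUD_S2_RDONLY;
}

/* Mirrors the stage2_wp_puds() hunk: only touch writable entries, so
 * repeated write-protection passes are idempotent. */
static void wp_pud(pud_t *pud)
{
	if (!kvm_s2pud_readonly(pud))
		kvm_set_s2pud_readonly(pud);
}
```

Subsequent guest writes to a protected block then fault, which is what lets dirty logging record the page.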
[PATCH v8 4/9] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 14 -- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 5ad1a54f98dc..e77212e53e77 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -82,6 +82,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pud(pmdp) __pud(__pa(pmdp) | PMD_TYPE_TABLE) #define kvm_mk_pgd(pudp) ({ BUILD_BUG(); 0; }) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 77b1af9e64db..baabea0cbb66 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,6 +184,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ec64d21c6571..21079eb5bc15 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -607,7 +607,7 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start, addr = start; do { 
pte = pte_offset_kernel(pmd, addr); - kvm_set_pte(pte, pfn_pte(pfn, prot)); + kvm_set_pte(pte, kvm_pfn_pte(pfn, prot)); get_page(virt_to_page(pte)); pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1202,7 +1202,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, pfn = __phys_to_pfn(pa); for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) { - pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE); + pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE); if (writable) pte = kvm_s2pte_mkwrite(pte); @@ -1619,8 +1619,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); if (hugetlb && vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1629,7 +1631,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, _pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); @@ -1886,7 +1888,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) * just like a translation fault and clean the cache to the PoC. */ clean_dcache_guest_page(pfn, PAGE_SIZE); - stage2_pte = pfn_pte(pfn, PAGE_S2); + stage2_pte = kvm_pfn_pte(pfn, PAGE_S2); handle_hva_to_gpa(kvm, hva, end, _set_spte_handler, _pte); } -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v8 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 20 + arch/arm/include/asm/stage2_pgtable.h | 9 +++ arch/arm64/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 106 +++-- 6 files changed, 149 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index a42b9505c9a7..da5f078ae68c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) ({ BUG(); 0; }) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)( {BUG(); pud; }) /* * The following kvm_*pud*() functions are provided strictly to allow @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} + static inline bool kvm_s2pud_exec(pud_t *pud) { BUG(); diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h index f6a7ea805232..a4ec25360e50 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -68,4 +68,13 @@ 
stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) #define stage2_pud_table_empty(kvm, pudp) false +static inline bool kvm_stage2_has_pud(struct kvm *kvm) +{ +#if CONFIG_PGTABLE_LEVELS > 3 + return true; +#else + return false; +#endif +} + #endif /* __ARM_S2_PGTABLE_H_ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 3baf72705dcc..b4e9c2cceecb 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) + #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY 
(_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud
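The pgtable-hwdef.h additions in the patch above encode the stage 2 permission fields as shifted constants. A quick standalone check of those encodings (the kernel's _AT() macro reduces to a typed cast when compiled as C; reproduced here under that assumption):

```c
#include <stdint.h>

typedef uint64_t pudval_t;
#define _AT(t, v) ((t)(v))   /* kernel uapi _AT() is a cast in C code */

#define PUD_S2_RDONLY	(_AT(pudval_t, 1) << 6)	 /* HAP[2:1] */
#define PUD_S2_RDWR	(_AT(pudval_t, 3) << 6)	 /* HAP[2:1] */
#define PUD_S2_XN	(_AT(pudval_t, 2) << 53) /* XN[1:0] */
```

Because pudval_t is 64-bit, the XN shift by 53 is well-defined; with a 32-bit type it would overflow, which is one reason these constants live behind the per-arch typedefs.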
[PATCH v8 6/9] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 +++ arch/arm64/include/asm/kvm_mmu.h | 5 +++ arch/arm64/include/asm/pgtable-hwdef.h | 2 + virt/kvm/arm/mmu.c | 53 +++--- 4 files changed, 61 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 9ec09f4cc284..26a2ab05b3f6 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline bool kvm_s2pud_exec(pud_t *pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 3cc342177474..c06ef3be8ca9 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) return kvm_s2pte_readonly((pte_t *)pudp); } +static inline bool kvm_s2pud_exec(pud_t *pudp) +{ + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index fd208eac9f2a..10ae592b78b8 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define 
PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ + /* * Memory Attribute override for Stage-2 (MemAttr[3:0]) */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 9c48f2ca6583..5fd1eae7d964 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache return 0; } -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +/* + * stage2_get_leaf_entry - walk the stage2 VM page tables and return + * true if a valid and present leaf-entry is found. A pointer to the + * leaf-entry is returned in the appropriate level variable - pudpp, + * pmdpp, ptepp. + */ +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp) { + pud_t *pudp; pmd_t *pmdp; pte_t *ptep; - pmdp = stage2_get_pmd(kvm, NULL, addr); + *pudpp = NULL; + *pmdpp = NULL; + *ptepp = NULL; + + pudp = stage2_get_pud(kvm, NULL, addr); + if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp)) + return false; + + if (stage2_pud_huge(kvm, *pudp)) { + *pudpp = pudp; + return true; + } + + pmdp = stage2_pmd_offset(kvm, pudp, addr); if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) return false; - if (pmd_thp_or_huge(*pmdp)) - return kvm_s2pmd_exec(pmdp); + if (pmd_thp_or_huge(*pmdp)) { + *pmdpp = pmdp; + return true; + } ptep = pte_offset_kernel(pmdp, addr); if (!ptep || pte_none(*ptep) || !pte_present(*ptep)) return false; - return kvm_s2pte_exec(ptep); + *ptepp = ptep; + return true; +} + +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +{ + pud_t *pudp; + pmd_t *pmdp; + pte_t *ptep; + bool found; + + found = stage2_get_leaf_entry(kvm, addr, , , ); + if (!found) + return false; + + if (pudp) + return kvm_s2pud_exec(pudp); + else if (pmdp) + return kvm_s2pmd_exec(pmdp); + else + return kvm_s2pte_exec(ptep); } static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, -- 2.18.0 ___ kvmarm mailing 
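After the walk in the patch above, stage2_is_exec() reduces to the same XN-bit test at whichever level the walk stopped, since the kvm_s2p*_exec() helpers all share one shape. A minimal model (bit position per the S2 XN definitions; READ_ONCE() is approximated with a volatile read):

```c
#include <stdint.h>
#include <stdbool.h>

#define S2_XN (UINT64_C(2) << 53)   /* XN field: execute-never when set */

/* Same test at every level: executable iff XN is clear. The kernel uses
 * READ_ONCE(); a volatile read is the closest userspace stand-in. */
static bool s2_exec(const uint64_t *entryp)
{
	return !(*(const volatile uint64_t *)entryp & S2_XN);
}
```

With stage2_get_leaf_entry() picking the level, the three exec helpers become one-line wrappers around this test.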
[PATCH v8 8/9] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 + arch/arm64/include/asm/kvm_mmu.h | 5 arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 39 4 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 95b34aad0dc8..a42b9505c9a7 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; } +static inline bool kvm_s2pud_young(pud_t pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b93e5167728f..3baf72705dcc 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); } +static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index a64a5c35beb1..4d9476e420d9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) 
pte_write(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1401dc015a22..1cf84507bbd6 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); } +static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1940,42 +1945,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, , , )) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return stage2_pudp_test_and_clear_young(pud); + else if (pmd) return stage2_pmdp_test_and_clear_young(pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (pte_none(*pte)) - return 0; - - return stage2_ptep_test_and_clear_young(pte); + else + return stage2_ptep_test_and_clear_young(pte); } static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, , , )) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return kvm_s2pud_young(*pud); + else if (pmd) return pmd_young(*pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (!pte_none(*pte))/* Just a page... 
*/ + else return pte_young(*pte); - - return 0; } int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
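stage2_pudp_test_and_clear_young() in the patch above simply reuses the PTE helper through a cast, which works because the access flag sits at the same bit position in block and page descriptors. A model of that test-and-clear (entry type and AF bit are stand-ins):

```c
#include <stdint.h>

#define AF (UINT64_C(1) << 10)   /* access flag: same bit at every level */

/* Mirrors stage2_ptep_test_and_clear_young(): report whether the entry
 * was young, clearing the flag as a side effect. */
static int test_and_clear_young(uint64_t *entry)
{
	int young = !!(*entry & AF);

	*entry &= ~AF;
	return young;
}

/* The PUD/PMD variants are just casts onto the PTE helper, as in the
 * kernel's (pte_t *)pud cast. */
static int pudp_test_and_clear_young(uint64_t *pud)
{
	return test_and_clear_young(pud);
}
```

This shared-bit-layout trick is what keeps the age handlers short even after adding a third level.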
[PATCH v8 2/9] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Cc: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 45 + 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index c23a1b323aad..5b76ee204000 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1490,7 +1490,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1510,10 +1510,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { + /* +* Fallback to PTE if it's not one of the Stage 2 +* supported hugepage sizes +*/ + vma_pagesize = PAGE_SIZE; + /* * Pages belonging to memslots that don't have the same * alignment for userspace and IPA cannot be mapped using @@ -1579,23 +1586,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) + if (!hugetlb && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. This code will need to be +* updated to support other THP sizes. 
+*/ hugetlb = transparent_hugepage_adjust(, _ipa); + if (hugetlb) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); - if (hugetlb) { + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); if (exec_fault) { new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) @@ -1608,16 +1626,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - if (exec_fault) { new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_guest_page(pfn, PAGE_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
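The refactor above hoists the pfn-dirtying and cache maintenance out of the per-size branches by keying them on vma_pagesize. The resulting control flow, modeled with stub maintenance hooks (names and counters are illustrative, not the kernel helpers):

```c
#include <stdbool.h>

#define PAGE_SIZE 4096UL
#define PMD_SIZE  (2UL * 1024 * 1024)

/* Stub hooks standing in for kvm_set_pfn_dirty() and the guest cache
 * maintenance helpers; counters let us observe what ran. */
static unsigned long dirtied, cleaned, invalidated;
static void set_pfn_dirty(void)                   { dirtied++; }
static void clean_dcache(unsigned long size)      { cleaned = size; }
static void invalidate_icache(unsigned long size) { invalidated = size; }

/* Mirrors the refactored user_mem_abort(): the common operations run
 * once, sized by vma_pagesize, before the PMD/PTE-specific mapping. */
static void common_ops(bool writable, bool perm_fault, bool exec_fault,
		       unsigned long vma_pagesize)
{
	if (writable)
		set_pfn_dirty();
	if (!perm_fault)
		clean_dcache(vma_pagesize);
	if (exec_fault)
		invalidate_icache(vma_pagesize);
}
```

With the maintenance hoisted like this, adding the PUD branch later only has to build the entry, not repeat the cache handling.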
[PATCH v8 1/9] KVM: arm/arm64: Ensure only THP is candidate for adjustment
PageTransCompoundMap() returns true for hugetlbfs and THP hugepages. This behaviour incorrectly leads to stage 2 faults for unsupported hugepage sizes (e.g., a 64K hugepage with 4K pages) being treated as THP faults.

Tighten the check to filter out hugetlbfs pages. This also leads to consistently mapping all unsupported hugepage sizes as PTE level entries at stage 2.

Signed-off-by: Punit Agrawal
Reviewed-by: Suzuki Poulose
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: sta...@vger.kernel.org # v4.13+
---
 virt/kvm/arm/mmu.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7e477b3cae5b..c23a1b323aad 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1231,8 +1231,14 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap)
 {
 	kvm_pfn_t pfn = *pfnp;
 	gfn_t gfn = *ipap >> PAGE_SHIFT;
+	struct page *page = pfn_to_page(pfn);

-	if (PageTransCompoundMap(pfn_to_page(pfn))) {
+	/*
+	 * PageTransCompoundMap() returns true for THP and
+	 * hugetlbfs. Make sure the adjustment is done only for THP
+	 * pages.
+	 */
+	if (!PageHuge(page) && PageTransCompoundMap(page)) {
 		unsigned long mask;
 		/*
 		 * The address we faulted on is backed by a transparent huge
--
2.18.0
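When transparent_hugepage_adjust() accepts a page under the tightened gate above, the surrounding code retargets the fault to the head of the covering 2M block. The gate and the alignment arithmetic, modeled standalone (the page-flag checks become plain booleans, and the constants assume a 4K granule):

```c
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT   12
#define PTRS_PER_PMD UINT64_C(512)              /* 4K granule: 2M / 4K */
#define PMD_SIZE     (PTRS_PER_PMD << PAGE_SHIFT)
#define PMD_MASK     (~(PMD_SIZE - 1))

/* Mirrors the tightened check: only genuine THP qualifies, hugetlbfs
 * pages are filtered out even though PageTransCompoundMap() is true
 * for them too. */
static bool thp_candidate(bool page_huge, bool trans_compound_map)
{
	return !page_huge && trans_compound_map;
}

/* Align the pfn and its IPA down to the covering PMD block, as the
 * body of transparent_hugepage_adjust() does. */
static void thp_adjust(uint64_t *pfn, uint64_t *ipa)
{
	*pfn &= ~(PTRS_PER_PMD - 1);
	*ipa &= PMD_MASK;
}
```

Pages that fail the gate fall back to PTE-level mappings, which is the "consistently mapping all unsupported hugepage sizes as PTE level entries" behaviour the commit message describes.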
[PATCH v8 0/9] KVM: Support PUD hugepage at stage 2
This series is an update to the PUD hugepage support previously posted at [0]. This patchset adds support for PUD hugepages at stage 2 a feature that is useful on cores that have support for large sized TLB mappings (e.g., 1GB for 4K granule). The only change in this version is to update the kvm_stage2_has_pud() helper for arm to use CONFIG_PGTABLE_LEVELS. The patches are based on v6 of the dynamic IPA support. The patches have been tested on AMD Seattle system with the following hugepage sizes - 64K, 32M, 1G. Thanks, Punit v7 -> v8 * Add kvm_stage2_has_pud() helper on arm32 * Rebased to v6 of 52bit dynamic IPA support v6 -> v7 * Restrict thp check to exclude hugetlbfs pages - Patch 1 * Don't update PUD entry if there's no change - Patch 9 * Add check for PUD level in stage 2 - Patch 9 v5 -> v6 * Split Patch 1 to move out the refactoring of exec permissions on page table entries. * Patch 4 - Initialise p*dpp pointers in stage2_get_leaf_entry() * Patch 5 - Trigger a BUG() in kvm_pud_pfn() on arm v4 -> v5: * Patch 1 - Drop helper stage2_should_exec() and refactor the condition to decide if a page table entry should be marked executable * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and latter patches * Patch 7 - Use stage 2 accessors instead of using the page table helpers directly * Patch 7 - Add a note to update the PUD hugepage support when number of levels of stage 2 tables differs from stage 1 v3 -> v4: * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled * Patch 4-5 - Add PUD hugepage support for exec and access faults * Patch 6 - PUD hugepage support for aging page table entries v2 -> v3: * Update vma_pagesize directly if THP [1/4]. 
Previously this was done indirectly via hugetlb * Added review tag [4/4] v1 -> v2: * Create helper to check if the page should have exec permission [1/4] * Fix broken condition to detect THP hugepage [1/4] * Fix incorrect hunk resulting from a rebase [4/4] [0] https://www.spinics.net/lists/kvm-arm/msg32753.html [1] https://lkml.org/lkml/2018/9/26/936 Punit Agrawal (9): KVM: arm/arm64: Ensure only THP is candidate for adjustment KVM: arm/arm64: Share common code in user_mem_abort() KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault KVM: arm/arm64: Introduce helpers to manipulate page table entries KVM: arm64: Support dirty page tracking for PUD hugepages KVM: arm64: Support PUD hugepage in stage2_is_exec() KVM: arm64: Support handling access faults for PUD hugepages KVM: arm64: Update age handlers to support PUD hugepages KVM: arm64: Add support for creating PUD hugepages at stage 2 arch/arm/include/asm/kvm_mmu.h | 61 + arch/arm/include/asm/stage2_pgtable.h | 9 + arch/arm64/include/asm/kvm_mmu.h | 48 arch/arm64/include/asm/pgtable-hwdef.h | 4 + arch/arm64/include/asm/pgtable.h | 9 + virt/kvm/arm/mmu.c | 320 +++-- 6 files changed, 373 insertions(+), 78 deletions(-) -- 2.18.0
Re: [PATCH v7.1 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
Punit Agrawal writes: > KVM only supports PMD hugepages at stage 2. Now that the various page > handling routines are updated, extend the stage 2 fault handling to > map in PUD hugepages. > > Addition of PUD hugepage support enables additional page sizes (e.g., > 1G with 4K granule) which can be useful on cores that support mapping > larger block sizes in the TLB entries. > > Signed-off-by: Punit Agrawal > Cc: Christoffer Dall > Cc: Marc Zyngier > Cc: Russell King > Cc: Catalin Marinas > Cc: Will Deacon > --- > > v7 -> v7.1 > > * Added arm helper kvm_stage2_has_pud() > * Added check for PUD level present at stage 2 > * Dropped redundant comment > * Fixed up kvm_pud_mkhuge() to complain on arm > > arch/arm/include/asm/kvm_mmu.h | 20 + > arch/arm/include/asm/stage2_pgtable.h | 5 ++ > arch/arm64/include/asm/kvm_mmu.h | 16 > arch/arm64/include/asm/pgtable-hwdef.h | 2 + > arch/arm64/include/asm/pgtable.h | 2 + > virt/kvm/arm/mmu.c | 106 +++-- > 6 files changed, 145 insertions(+), 6 deletions(-) > [...] > diff --git a/arch/arm/include/asm/stage2_pgtable.h > b/arch/arm/include/asm/stage2_pgtable.h > index f6a7ea805232..ec1567d9eb4b 100644 > --- a/arch/arm/include/asm/stage2_pgtable.h > +++ b/arch/arm/include/asm/stage2_pgtable.h > @@ -68,4 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, > phys_addr_t end) > #define stage2_pmd_table_empty(kvm, pmdp)kvm_page_empty(pmdp) > #define stage2_pud_table_empty(kvm, pudp)false > > +static inline bool kvm_stage2_has_pud(struct kvm *kvm) > +{ > + return KVM_VTCR_SL0 == VTCR_SL_L1; > +} > + Turns out this isn't quite the right check. On arm32, the maximum number of supported levels is 3 with LPAE - effectively the helper should always return false. I've updated the check locally to key off of CONFIG_PGTABLE_LEVELS. I'll post these patches later today. Thanks, Punit ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v7.1 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- v7 -> v7.1 * Added arm helper kvm_stage2_has_pud() * Added check for PUD level present at stage 2 * Dropped redundant comment * Fixed up kvm_pud_mkhuge() to complain on arm arch/arm/include/asm/kvm_mmu.h | 20 + arch/arm/include/asm/stage2_pgtable.h | 5 ++ arch/arm64/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 106 +++-- 6 files changed, 145 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index a42b9505c9a7..da5f078ae68c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) ({ BUG(); 0; }) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)( {BUG(); pud; }) /* * The following kvm_*pud*() functions are provided strictly to allow @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} + static inline bool kvm_s2pud_exec(pud_t *pud) { BUG(); diff --git a/arch/arm/include/asm/stage2_pgtable.h 
b/arch/arm/include/asm/stage2_pgtable.h index f6a7ea805232..ec1567d9eb4b 100644 --- a/arch/arm/include/asm/stage2_pgtable.h +++ b/arch/arm/include/asm/stage2_pgtable.h @@ -68,4 +68,9 @@ stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) #define stage2_pmd_table_empty(kvm, pmdp) kvm_page_empty(pmdp) #define stage2_pud_table_empty(kvm, pudp) false +static inline bool kvm_stage2_has_pud(struct kvm *kvm) +{ + return KVM_VTCR_SL0 == VTCR_SL_L1; +} + #endif /* __ARM_S2_PGTABLE_H_ */ diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 3baf72705dcc..b4e9c2cceecb 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) + #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ 
#define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,
Re: [PATCH v7 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
Suzuki K Poulose writes: > Hi Punit, > > > On 09/24/2018 06:45 PM, Punit Agrawal wrote: >> KVM only supports PMD hugepages at stage 2. Now that the various page >> handling routines are updated, extend the stage 2 fault handling to >> map in PUD hugepages. >> >> Addition of PUD hugepage support enables additional page sizes (e.g., >> 1G with 4K granule) which can be useful on cores that support mapping >> larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon > > >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index a42b9505c9a7..a8e86b926ee0 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void); >> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) >> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) >> +#define kvm_pfn_pud(pfn, prot) (__pud(0)) >> #define kvm_pud_pfn(pud) ({ BUG(); 0; }) >> #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) >> +/* No support for pud hugepages */ >> +#define kvm_pud_mkhuge(pud) (pud) >> > > shouldn't this be BUG() like other PUD huge helpers for arm32 ? > >> /* >>* The following kvm_*pud*() functions are provided strictly to allow >> @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) >> return false; >> } >> +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) >> +{ >> +BUG(); >> +} >> + >> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) >> +{ >> +BUG(); >> +return pud; >> +} >> + >> +static inline pud_t kvm_s2pud_mkexec(pud_t pud) >> +{ >> +BUG(); >> +return pud; >> +} >> + > > >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 3ff7ebb262d2..5b8163537bc2 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c > > ... 
> > >> @@ -1669,7 +1746,28 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> needs_exec = exec_fault || >> (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); >> - if (hugetlb && vma_pagesize == PMD_SIZE) { >> +if (hugetlb && vma_pagesize == PUD_SIZE) { >> +/* >> + * Assuming that PUD level always exists at Stage 2 - >> + * this is true for 4k pages with 40 bits IPA >> + * currently supported. >> + * >> + * When using 64k pages, 40bits of IPA results in >> + * using only 2-levels at Stage 2. Overlooking this >> + * problem for now as a PUD hugepage with 64k pages is >> + * too big (4TB) to be practical. >> + */ >> +pud_t new_pud = kvm_pfn_pud(pfn, mem_type); > > Is this based on the Dynamic IPA series ? The cover letter seems > to suggest that it is. But I don't see the check to make sure we have > stage2 PUD level here before we go ahead and try PUD huge page at > stage2. Also the comment above seems outdated in that case. It is indeed based on the Dynamic IPA series but I seem to have lost the actual changes introducing the checks for PUD level. Let me fix that up and post an update. Sorry for the noise. Punit > >> + >> +new_pud = kvm_pud_mkhuge(new_pud); >> +if (writable) >> +new_pud = kvm_s2pud_mkwrite(new_pud); >> + >> +if (needs_exec) >> +new_pud = kvm_s2pud_mkexec(new_pud); >> + >> +ret = stage2_set_pud_huge(kvm, memcache, fault_ipa, &new_pud); >> +} else if (hugetlb && vma_pagesize == PMD_SIZE) { >> pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); >> new_pmd = kvm_pmd_mkhuge(new_pmd); >> > > > Suzuki
[PATCH v7 9/9] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 20 + arch/arm64/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 108 +++-- 5 files changed, 143 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index a42b9505c9a7..a8e86b926ee0 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -84,11 +84,14 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) ({ BUG(); 0; }) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)(pud) /* * The following kvm_*pud*() functions are provided strictly to allow @@ -105,6 +108,23 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} + static inline bool kvm_s2pud_exec(pud_t *pud) { BUG(); diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 3baf72705dcc..b4e9c2cceecb 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,12 +184,16 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ 
__pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) + #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -203,6 +207,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -215,6 +225,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) +#define pud_mkhuge(pud)(__pud(pud_val(pud) & ~PUD_TABLE_BIT)) + #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) 
#define __phys_to_pud_val(phys)__phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 3ff7ebb262d2..5b8163537bc2 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -115,6 +115,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) put_page(virt_to_page(pmd)); } +/** + * stage2_dissolve_pud() - clear and flush hug
[PATCH v7 8/9] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 + arch/arm64/include/asm/kvm_mmu.h | 5 arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 39 4 files changed, 32 insertions(+), 19 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 95b34aad0dc8..a42b9505c9a7 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -117,6 +117,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; } +static inline bool kvm_s2pud_young(pud_t pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index b93e5167728f..3baf72705dcc 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -273,6 +273,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); } +static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index a64a5c35beb1..4d9476e420d9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) 
pte_write(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e2487e5fff37..3ff7ebb262d2 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1225,6 +1225,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); } +static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1940,42 +1945,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return stage2_pudp_test_and_clear_young(pud); + else if (pmd) return stage2_pmdp_test_and_clear_young(pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (pte_none(*pte)) - return 0; - - return stage2_ptep_test_and_clear_young(pte); + else + return stage2_ptep_test_and_clear_young(pte); } static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte)) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return kvm_s2pud_young(*pud); + else if (pmd) return pmd_young(*pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (!pte_none(*pte))/* Just a page... 
*/ + else return pte_young(*pte); - - return 0; } int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) -- 2.18.0 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v7 0/9] KVM: Support PUD hugepages at stage 2
This series is an update to the PUD hugepage support previously posted at [0]. This patchset adds support for PUD hugepages at stage 2, a feature that is useful on cores that have support for large-sized TLB mappings (e.g., 1GB for 4K granule). This version fixes two bugs - * Corrects stage 2 fault handling for unsupported hugepage sizes (new patch 1/9). This is a long standing bug and needs backporting to earlier kernels * Ensures that multiple vcpus faulting on the same hugepage don't hamper forward progress (patch 9) The patches are based on dynamic IPA support which could lead to a situation where the guest doesn't have a PUD level. In this case, the patches have been updated to fall back to PTE level mappings at stage 2. The patches have been tested on an AMD Seattle system with the following hugepage sizes - 64K, 32M, 1G. Thanks, Punit v6 -> v7 * Restrict thp check to exclude hugetlbfs pages - Patch 1 * Don't update PUD entry if there's no change - Patch 9 * Add check for PUD level in stage 2 - Patch 9 v5 -> v6 * Split Patch 1 to move out the refactoring of exec permissions on page table entries. * Patch 4 - Initialise p*dpp pointers in stage2_get_leaf_entry() * Patch 5 - Trigger a BUG() in kvm_pud_pfn() on arm v4 -> v5: * Patch 1 - Drop helper stage2_should_exec() and refactor the condition to decide if a page table entry should be marked executable * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and later patches * Patch 7 - Use stage 2 accessors instead of using the page table helpers directly * Patch 7 - Add a note to update the PUD hugepage support when number of levels of stage 2 tables differs from stage 1 v3 -> v4: * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled * Patch 4-5 - Add PUD hugepage support for exec and access faults * Patch 6 - PUD hugepage support for aging page table entries v2 -> v3: * Update vma_pagesize directly if THP [1/4]. 
Previously this was done indirectly via hugetlb * Added review tag [4/4] v1 -> v2: * Create helper to check if the page should have exec permission [1/4] * Fix broken condition to detect THP hugepage [1/4] * Fix incorrect hunk resulting from a rebase [4/4] [0] https://www.spinics.net/lists/kvm-arm/msg32241.html [1] https://www.spinics.net/lists/kvm-arm/msg32641.html Punit Agrawal (9): KVM: arm/arm64: Ensure only THP is candidate for adjustment KVM: arm/arm64: Share common code in user_mem_abort() KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault KVM: arm/arm64: Introduce helpers to manipulate page table entries KVM: arm64: Support dirty page tracking for PUD hugepages KVM: arm64: Support PUD hugepage in stage2_is_exec() KVM: arm64: Support handling access faults for PUD hugepages KVM: arm64: Update age handlers to support PUD hugepages KVM: arm64: Add support for creating PUD hugepages at stage 2 arch/arm/include/asm/kvm_mmu.h | 61 + arch/arm64/include/asm/kvm_mmu.h | 48 arch/arm64/include/asm/pgtable-hwdef.h | 4 + arch/arm64/include/asm/pgtable.h | 9 + virt/kvm/arm/mmu.c | 324 +++-- 5 files changed, 368 insertions(+), 78 deletions(-) -- 2.18.0
[PATCH v7 1/9] KVM: arm/arm64: Ensure only THP is candidate for adjustment
PageTransCompoundMap() returns true for hugetlbfs and THP hugepages. This behaviour incorrectly causes stage 2 faults for unsupported hugepage sizes (e.g., a 64K hugepage with 4K pages) to be treated as THP faults. Tighten the check to filter out hugetlbfs pages. This also leads to consistently mapping all unsupported hugepage sizes as PTE level entries at stage 2. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Suzuki Poulose Cc: sta...@vger.kernel.org # v4.13+ --- virt/kvm/arm/mmu.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 7e477b3cae5b..c23a1b323aad 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1231,8 +1231,14 @@ static bool transparent_hugepage_adjust(kvm_pfn_t *pfnp, phys_addr_t *ipap) { kvm_pfn_t pfn = *pfnp; gfn_t gfn = *ipap >> PAGE_SHIFT; + struct page *page = pfn_to_page(pfn); - if (PageTransCompoundMap(pfn_to_page(pfn))) { + /* +* PageTransCompoundMap() returns true for THP and +* hugetlbfs. Make sure the adjustment is done only for THP +* pages. +*/ + if (!PageHuge(page) && PageTransCompoundMap(page)) { unsigned long mask; /* * The address we faulted on is backed by a transparent huge -- 2.18.0
[PATCH v7 4/9] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 14 -- 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 5ad1a54f98dc..e77212e53e77 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -82,6 +82,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pud(pmdp) __pud(__pa(pmdp) | PMD_TYPE_TABLE) #define kvm_mk_pgd(pudp) ({ BUILD_BUG(); 0; }) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 77b1af9e64db..baabea0cbb66 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -184,6 +184,11 @@ void kvm_clear_hyp_idmap(void); #define kvm_mk_pgd(pudp) \ __pgd(__phys_to_pgd_val(__pa(pudp)) | PUD_TYPE_TABLE) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ec64d21c6571..21079eb5bc15 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -607,7 +607,7 @@ static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start, addr = start; do { 
pte = pte_offset_kernel(pmd, addr); - kvm_set_pte(pte, pfn_pte(pfn, prot)); + kvm_set_pte(pte, kvm_pfn_pte(pfn, prot)); get_page(virt_to_page(pte)); pfn++; } while (addr += PAGE_SIZE, addr != end); @@ -1202,7 +1202,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa, pfn = __phys_to_pfn(pa); for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) { - pte_t pte = pfn_pte(pfn, PAGE_S2_DEVICE); + pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE); if (writable) pte = kvm_s2pte_mkwrite(pte); @@ -1619,8 +1619,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); if (hugetlb && vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1629,7 +1631,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); @@ -1886,7 +1888,7 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) * just like a translation fault and clean the cache to the PoC. */ clean_dcache_guest_page(pfn, PAGE_SIZE); - stage2_pte = pfn_pte(pfn, PAGE_S2); + stage2_pte = kvm_pfn_pte(pfn, PAGE_S2); handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte); } -- 2.18.0
[PATCH v7 3/9] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
The Stage 2 fault handler marks a page as executable if it is handling an execution fault, or if it was a permission fault, in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced, refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5b76ee204000..ec64d21c6571 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1481,7 +1481,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, writable, hugetlb = false, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1606,19 +1607,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault) invalidate_icache_guest_page(pfn, vma_pagesize); + /* +* If we took an execution fault we have made the +* icache/dcache coherent above and should now let the s2 +* mapping be executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. 
+*/ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { @@ -1629,13 +1636,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mark_page_dirty(kvm, gfn); } - if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pte = kvm_s2pte_mkexec(new_pte); - } ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, flags); } -- 2.18.0
[PATCH v7 6/9] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries are used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 +++ arch/arm64/include/asm/kvm_mmu.h | 5 +++ arch/arm64/include/asm/pgtable-hwdef.h | 2 + virt/kvm/arm/mmu.c | 53 +++--- 4 files changed, 61 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 9ec09f4cc284..26a2ab05b3f6 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -102,6 +102,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline bool kvm_s2pud_exec(pud_t *pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 3cc342177474..c06ef3be8ca9 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -261,6 +261,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) return kvm_s2pte_readonly((pte_t *)pudp); } +static inline bool kvm_s2pud_exec(pud_t *pudp) +{ + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index fd208eac9f2a..10ae592b78b8 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define 
PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ + /* * Memory Attribute override for Stage-2 (MemAttr[3:0]) */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 9c48f2ca6583..5fd1eae7d964 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1083,23 +1083,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache return 0; } -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +/* + * stage2_get_leaf_entry - walk the stage2 VM page tables and return + * true if a valid and present leaf-entry is found. A pointer to the + * leaf-entry is returned in the appropriate level variable - pudpp, + * pmdpp, ptepp. + */ +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp) { + pud_t *pudp; pmd_t *pmdp; pte_t *ptep; - pmdp = stage2_get_pmd(kvm, NULL, addr); + *pudpp = NULL; + *pmdpp = NULL; + *ptepp = NULL; + + pudp = stage2_get_pud(kvm, NULL, addr); + if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp)) + return false; + + if (stage2_pud_huge(kvm, *pudp)) { + *pudpp = pudp; + return true; + } + + pmdp = stage2_pmd_offset(kvm, pudp, addr); if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) return false; - if (pmd_thp_or_huge(*pmdp)) - return kvm_s2pmd_exec(pmdp); + if (pmd_thp_or_huge(*pmdp)) { + *pmdpp = pmdp; + return true; + } ptep = pte_offset_kernel(pmdp, addr); if (!ptep || pte_none(*ptep) || !pte_present(*ptep)) return false; - return kvm_s2pte_exec(ptep); + *ptepp = ptep; + return true; +} + +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +{ + pud_t *pudp; + pmd_t *pmdp; + pte_t *ptep; + bool found; + + found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep); + if (!found) + return false; + + if (pudp) + return kvm_s2pud_exec(pudp); + else if (pmdp) + return kvm_s2pmd_exec(pmdp); + else + return kvm_s2pte_exec(ptep); } static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, -- 2.18.0 ___ kvmarm mailing
list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v7 5/9] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal Reviewed-by: Christoffer Dall Reviewed-by: Suzuki K Poulose Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 15 +++ arch/arm64/include/asm/kvm_mmu.h | 10 ++ virt/kvm/arm/mmu.c | 11 +++ 3 files changed, 32 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index e77212e53e77..9ec09f4cc284 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -87,6 +87,21 @@ void kvm_clear_hyp_idmap(void); #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice. 
+ */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + BUG(); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + BUG(); + return false; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index baabea0cbb66..3cc342177474 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -251,6 +251,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } +static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef __PAGETABLE_PMD_FOLDED diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 21079eb5bc15..9c48f2ca6583 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1347,9 +1347,12 @@ static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd, do { next = stage2_pud_addr_end(kvm, addr, end); if (!stage2_pud_none(kvm, *pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(kvm, *pud)); - stage2_wp_pmds(kvm, pud, addr, next); + if (stage2_pud_huge(kvm, *pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(kvm, pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1392,7 +1395,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. 
Called with kvm->slots_lock mutex acquired, -- 2.18.0
[PATCH v7 7/9] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 9 + arch/arm64/include/asm/kvm_mmu.h | 7 +++ arch/arm64/include/asm/pgtable.h | 6 ++ virt/kvm/arm/mmu.c | 22 +++--- 4 files changed, 33 insertions(+), 11 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 26a2ab05b3f6..95b34aad0dc8 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -85,6 +85,9 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) ({ BUG(); 0; }) + + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) /* @@ -108,6 +111,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= L_PTE_S2_RDWR; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c06ef3be8ca9..b93e5167728f 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -187,6 +187,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -266,6 +268,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep) #ifdef 
__PAGETABLE_PMD_FOLDED diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 1bdeca8918a6..a64a5c35beb1 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); } +static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 5fd1eae7d964..e2487e5fff37 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1706,6 +1706,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { + pud_t *pud; pmd_t *pmd; pte_t *pte; kvm_pfn_t pfn; @@ -1715,24 +1716,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) spin_lock(&vcpu->kvm->mmu_lock); - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte)) goto out; - if (pmd_thp_or_huge(*pmd)) {/* THP, HugeTLB */ + if (pud) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + } else if (pmd) { /* THP, HugeTLB */ *pmd = pmd_mkyoung(*pmd); pfn = pmd_pfn(*pmd); pfn_valid = true; - goto out; + } else { + *pte = pte_mkyoung(*pte); /* Just a page...
*/ + pfn = pte_pfn(*pte); + pfn_valid = true; } - pte = pte_offset_kernel(pmd, fault_ipa); - if (pte_none(*pte)) /* Nothing there either */ - goto out; - - *pte = pte_mkyoung(*pte); /* Just a page... */ - pfn = pte_pfn(*pte); - pfn_valid = true; out: spin_unlock(&vcpu->kvm->mmu_lock); if (pfn_valid) -- 2.18.0
[PATCH v7 2/9] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Cc: Suzuki K Poulose Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 45 + 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index c23a1b323aad..5b76ee204000 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1490,7 +1490,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1510,10 +1510,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { + /* +* Fallback to PTE if it's not one of the Stage 2 +* supported hugepage sizes +*/ + vma_pagesize = PAGE_SIZE; + /* * Pages belonging to memslots that don't have the same * alignment for userspace and IPA cannot be mapped using @@ -1579,23 +1586,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) + if (!hugetlb && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. This code will need to be +* updated to support other THP sizes. 
+*/ hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + if (hugetlb) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); - if (hugetlb) { + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); if (exec_fault) { new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) @@ -1608,16 +1626,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - if (exec_fault) { new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_guest_page(pfn, PAGE_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) -- 2.18.0
Re: [PATCH] KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
Christoffer Dall writes: > On Mon, Sep 03, 2018 at 06:29:30PM +0100, Punit Agrawal wrote: >> Christoffer Dall writes: >> >> > [Adding Andrea and Steve in CC] >> > >> > On Thu, Aug 23, 2018 at 04:33:42PM +0100, Marc Zyngier wrote: >> >> When triggering a CoW, we unmap the RO page via an MMU notifier >> >> (invalidate_range_start), and then populate the new PTE using another >> >> one (change_pte). In the meantime, we'll have copied the old page >> >> into the new one. >> >> >> >> The problem is that the data for the new page is sitting in the >> >> cache, and should the guest have an uncached mapping to that page >> >> (or its MMU off), following accesses will bypass the cache. >> >> >> >> In a way, this is similar to what happens on a translation fault: >> >> We need to clean the page to the PoC before mapping it. So let's just >> >> do that. >> >> >> >> This fixes a KVM unit test regression observed on a HiSilicon platform, >> >> and subsequently reproduced on Seattle. >> >> >> >> Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on >> >> translation fault") >> >> Reported-by: Mike Galbraith >> >> Signed-off-by: Marc Zyngier >> >> --- >> >> virt/kvm/arm/mmu.c | 9 - >> >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> >> index 1d90d79706bd..287c8e274655 100644 >> >> --- a/virt/kvm/arm/mmu.c >> >> +++ b/virt/kvm/arm/mmu.c >> >> @@ -1811,13 +1811,20 @@ static int kvm_set_spte_handler(struct kvm *kvm, >> >> gpa_t gpa, u64 size, void *data >> >> void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) >> >> { >> >> unsigned long end = hva + PAGE_SIZE; >> >> + kvm_pfn_t pfn = pte_pfn(pte); >> >> pte_t stage2_pte; >> >> >> >> if (!kvm->arch.pgd) >> >> return; >> >> >> >> trace_kvm_set_spte_hva(hva); >> >> - stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2); >> >> + >> >> + /* >> >> + * We've moved a page around, probably through CoW, so let's treat >> >> + * just like a translation fault 
and clean the cache to the PoC. >> >> + */ >> >> + clean_dcache_guest_page(pfn, PAGE_SIZE); >> >> + stage2_pte = pfn_pte(pfn, PAGE_S2); >> >> handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte); >> >> } >> > >> > How does this work for pmd mappings? >> >> kvm_set_spte_hva() isn't called for PMD mappings. But... >> >> > >> > Are we guaranteed that a pmd mapping (hugetlbfs or THP) is split before >> > a CoW happens? >> > >> > Steve tells me that we share THP mappings on fork and that we back THPs >> > by a zero page, so CoW with THP should be possible. >> > >> >> ...the problem seems to affect handling write permission faults for CoW >> or zero pages. >> >> The memory gets unmapped at stage 2 due to the invalidate notifier (in >> hugetlb_cow() for hugetlbfs and do_huge_pmd_wp_page() for THP) > > So just to make sure I get this right. For a pte CoW, Linux calls the > set_spte function to simply change the pte mapping, without doing any > unmapping at stage 2, No. I hadn't checked into the PTE CoW for zero pages when replying but was relying on Marc's commit log. I've outlined the flow below. > but with pmd, we unmap and wait to take another fault as an > alternative method? Having looked at the handling of CoW for the different page sizes, here's my understanding of the steps for CoW faults - note the slight variance when dealing with PTE entries. * Guest takes a stage 2 permission fault (user_mem_abort()) * The host mapping is updated to point to another page (either zeroed or contents copied). This happens via the get_user_pages_unlocked() invoked via gfn_to_pfn_prot(). * For all page sizes, mmu_notifier_invalidate_range_start() notifiers are called which will unmap the memory at stage 2. * For PTE (wp_page_copy), set_pte_at_notify() is called which eventually calls kvm_set_spte_hva() modified in $SUBJECT.
For hugepages (hugetlb_cow) and anonymous THP (do_huge_pmd_wp_page) there are no notifying versions of page table entry updaters so stage 2 entries remain unmapped. * mmu_notifier_invalidate_range_end() is called. This updates mmu_notifier_seq which will abort
Re: [PATCH] KVM: arm/arm64: Clean dcache to PoC when changing PTE due to CoW
Christoffer Dall writes: > [Adding Andrea and Steve in CC] > > On Thu, Aug 23, 2018 at 04:33:42PM +0100, Marc Zyngier wrote: >> When triggering a CoW, we unmap the RO page via an MMU notifier >> (invalidate_range_start), and then populate the new PTE using another >> one (change_pte). In the meantime, we'll have copied the old page >> into the new one. >> >> The problem is that the data for the new page is sitting in the >> cache, and should the guest have an uncached mapping to that page >> (or its MMU off), following accesses will bypass the cache. >> >> In a way, this is similar to what happens on a translation fault: >> We need to clean the page to the PoC before mapping it. So let's just >> do that. >> >> This fixes a KVM unit test regression observed on a HiSilicon platform, >> and subsequently reproduced on Seattle. >> >> Fixes: a9c0e12ebee5 ("KVM: arm/arm64: Only clean the dcache on translation >> fault") >> Reported-by: Mike Galbraith >> Signed-off-by: Marc Zyngier >> --- >> virt/kvm/arm/mmu.c | 9 - >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 1d90d79706bd..287c8e274655 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1811,13 +1811,20 @@ static int kvm_set_spte_handler(struct kvm *kvm, >> gpa_t gpa, u64 size, void *data >> void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) >> { >> unsigned long end = hva + PAGE_SIZE; >> +kvm_pfn_t pfn = pte_pfn(pte); >> pte_t stage2_pte; >> >> if (!kvm->arch.pgd) >> return; >> >> trace_kvm_set_spte_hva(hva); >> -stage2_pte = pfn_pte(pte_pfn(pte), PAGE_S2); >> + >> +/* >> + * We've moved a page around, probably through CoW, so let's treat >> + * just like a translation fault and clean the cache to the PoC. >> + */ >> +clean_dcache_guest_page(pfn, PAGE_SIZE); >> +stage2_pte = pfn_pte(pfn, PAGE_S2); >> handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte); >> } > > How does this work for pmd mappings?
kvm_set_spte_hva() isn't called for PMD mappings. But... > > Are we guaranteed that a pmd mapping (hugetlbfs or THP) is split before > a CoW happens? > > Steve tells me that we share THP mappings on fork and that we back THPs > by a zero page, so CoW with THP should be possible. > ...the problem seems to affect handling write permission faults for CoW or zero pages. The memory gets unmapped at stage 2 due to the invalidate notifier (in hugetlb_cow() for hugetlbfs and do_huge_pmd_wp_page() for THP) while the cache maintenance for the newly allocated page will be skipped due to the !FSC_PERM. Hmm... smells like there might be a problem here. I'll try and put together a fix. > Thanks, > Christoffer
[PATCH v3 2/2] KVM: arm/arm64: Skip updating PTE entry if no change
When there is contention on faulting in a particular page table entry at stage 2, the break-before-make requirement of the architecture can lead to additional refaulting due to TLB invalidation. Avoid this by skipping a page table update if the new value of the PTE matches the previous value. Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup") Signed-off-by: Punit Agrawal Reviewed-by: Suzuki Poulose Cc: Marc Zyngier Cc: Christoffer Dall Cc: sta...@vger.kernel.org --- virt/kvm/arm/mmu.c | 4 1 file changed, 4 insertions(+) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 2bb0b5dba412..c2b95a22959b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1118,6 +1118,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, /* Create 2nd stage page table mapping - Level 3 */ old_pte = *pte; if (pte_present(old_pte)) { + /* Skip page table update if there is no change */ + if (pte_val(old_pte) == pte_val(*new_pte)) + return 0; + kvm_set_pte(pte, __pte(0)); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { -- 2.18.0
[PATCH v3 1/2] KVM: arm/arm64: Skip updating PMD entry if no change
Contention on updating a PMD entry by a large number of vcpus can lead to duplicate work when handling stage 2 page faults. As the page table update follows the break-before-make requirement of the architecture, it can lead to repeated refaults due to clearing the entry and flushing the tlbs. This problem is more likely when - * there are large number of vcpus * the mapping is large block mapping such as when using PMD hugepages (512MB) with 64k pages. Fix this by skipping the page table update if there is no change in the entry being updated. Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages") Signed-off-by: Punit Agrawal Reviewed-by: Suzuki Poulose Cc: Marc Zyngier Cc: Christoffer Dall Cc: sta...@vger.kernel.org --- virt/kvm/arm/mmu.c | 38 +++--- 1 file changed, 27 insertions(+), 11 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1d90d79706bd..2bb0b5dba412 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1015,19 +1015,35 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache pmd = stage2_get_pmd(kvm, cache, addr); VM_BUG_ON(!pmd); - /* -* Mapping in huge pages should only happen through a fault. If a -* page is merged into a transparent huge page, the individual -* subpages of that huge page should be unmapped through MMU -* notifiers before we get here. -* -* Merging of CompoundPages is not supported; they should become -* splitting first, unmapped, merged, and mapped back in on-demand. -*/ - VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd)); - old_pmd = *pmd; if (pmd_present(old_pmd)) { + /* +* Multiple vcpus faulting on the same PMD entry, can +* lead to them sequentially updating the PMD with the +* same value. Following the break-before-make +* (pmd_clear() followed by tlb_flush()) process can +* hinder forward progress due to refaults generated +* on missing translations. +* +* Skip updating the page table if the entry is +* unchanged. 
+*/ + if (pmd_val(old_pmd) == pmd_val(*new_pmd)) + return 0; + + /* +* Mapping in huge pages should only happen through a +* fault. If a page is merged into a transparent huge +* page, the individual subpages of that huge page +* should be unmapped through MMU notifiers before we +* get here. +* +* Merging of CompoundPages is not supported; they +* should become splitting first, unmapped, merged, +* and mapped back in on-demand. +*/ + VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); + pmd_clear(pmd); kvm_tlb_flush_vmid_ipa(kvm, addr); } else { -- 2.18.0
[PATCH v3 0/2] KVM: Fix refaulting due to page table update
Contention when updating a page table entry can lead to unnecessary refaults. The issue was reported by a user when testing PUD hugepage support[1] but also exists for PMD and PTE updates though with a lower probability. This version - * fixes a nit reported by Suzuki * Re-orders the checks when setting PMD hugepage * drops mistakenly introduced Change-id in the commit message Thanks, Punit [1] https://lkml.org/lkml/2018/7/16/482 Punit Agrawal (2): KVM: arm/arm64: Skip updating PMD entry if no change KVM: arm/arm64: Skip updating PTE entry if no change virt/kvm/arm/mmu.c | 43 --- 1 file changed, 32 insertions(+), 11 deletions(-) -- 2.18.0
Re: [PATCH v2 1/2] KVM: arm/arm64: Skip updating PMD entry if no change
Marc Zyngier writes: > Hi Punit, > > On 13/08/18 10:40, Punit Agrawal wrote: [...] >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 1d90d79706bd..2ab977edc63c 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1015,19 +1015,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, >> struct kvm_mmu_memory_cache >> pmd = stage2_get_pmd(kvm, cache, addr); >> VM_BUG_ON(!pmd); >> >> -/* >> - * Mapping in huge pages should only happen through a fault. If a >> - * page is merged into a transparent huge page, the individual >> - * subpages of that huge page should be unmapped through MMU >> - * notifiers before we get here. >> - * >> - * Merging of CompoundPages is not supported; they should become >> - * splitting first, unmapped, merged, and mapped back in on-demand. >> - */ >> -VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd)); >> - >> old_pmd = *pmd; >> + >> if (pmd_present(old_pmd)) { >> +/* >> + * Mapping in huge pages should only happen through a >> + * fault. If a page is merged into a transparent huge >> + * page, the individual subpages of that huge page >> + * should be unmapped through MMU notifiers before we >> + * get here. >> + * >> + * Merging of CompoundPages is not supported; they >> + * should become splitting first, unmapped, merged, >> + * and mapped back in on-demand. >> + */ >> +VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); >> + >> +/* >> + * Multiple vcpus faulting on the same PMD entry, can >> + * lead to them sequentially updating the PMD with the >> + * same value. Following the break-before-make >> + * (pmd_clear() followed by tlb_flush()) process can >> + * hinder forward progress due to refaults generated >> + * on missing translations. >> + * >> + * Skip updating the page table if the entry is >> + * unchanged. 
>> + */ >> +if (pmd_val(old_pmd) == pmd_val(*new_pmd)) >> +goto out; > I think the order of these two checks should be reversed: the first one > is clearly a subset of the second one, so it'd make sense to have the > global comparison before having the more specific one. Not that it > matter much in practice, but I just find it easier to reason about. Makes sense. I've reordered the checks for the next version. Thanks, Punit > >> + >> pmd_clear(pmd); >> kvm_tlb_flush_vmid_ipa(kvm, addr); >> } else { >> @@ -1035,6 +1052,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct >> kvm_mmu_memory_cache >> } >> >> kvm_set_pmd(pmd, *new_pmd); >> +out: >> return 0; >> } >> >> > > Thanks, > > M.
Re: [PATCH v2 1/2] KVM: arm/arm64: Skip updating PMD entry if no change
Suzuki K Poulose writes: > On 08/13/2018 10:40 AM, Punit Agrawal wrote: >> Contention on updating a PMD entry by a large number of vcpus can lead >> to duplicate work when handling stage 2 page faults. As the page table >> update follows the break-before-make requirement of the architecture, >> it can lead to repeated refaults due to clearing the entry and >> flushing the tlbs. >> >> This problem is more likely when - >> >> * there are large number of vcpus >> * the mapping is large block mapping >> >> such as when using PMD hugepages (512MB) with 64k pages. >> >> Fix this by skipping the page table update if there is no change in >> the entry being updated. >> >> Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages") >> Change-Id: Ib417957c842ef67a6f4b786f68df62048d202c24 >> Signed-off-by: Punit Agrawal >> Cc: Marc Zyngier >> Cc: Christoffer Dall >> Cc: Suzuki Poulose >> Cc: sta...@vger.kernel.org >> --- >> virt/kvm/arm/mmu.c | 40 +--- >> 1 file changed, 29 insertions(+), 11 deletions(-) >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 1d90d79706bd..2ab977edc63c 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1015,19 +1015,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, >> struct kvm_mmu_memory_cache >> pmd = stage2_get_pmd(kvm, cache, addr); >> VM_BUG_ON(!pmd); >> - /* >> - * Mapping in huge pages should only happen through a fault. If a >> - * page is merged into a transparent huge page, the individual >> - * subpages of that huge page should be unmapped through MMU >> - * notifiers before we get here. >> - * >> - * Merging of CompoundPages is not supported; they should become >> - * splitting first, unmapped, merged, and mapped back in on-demand. >> - */ >> -VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd)); >> - >> old_pmd = *pmd; >> + >> if (pmd_present(old_pmd)) { >> +/* >> + * Mapping in huge pages should only happen through a >> + * fault. 
If a page is merged into a transparent huge >> + * page, the individual subpages of that huge page >> + * should be unmapped through MMU notifiers before we >> + * get here. >> + * >> + * Merging of CompoundPages is not supported; they >> + * should become splitting first, unmapped, merged, >> + * and mapped back in on-demand. >> + */ >> +VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd)); >> + >> +/* >> + * Multiple vcpus faulting on the same PMD entry, can >> + * lead to them sequentially updating the PMD with the >> + * same value. Following the break-before-make >> + * (pmd_clear() followed by tlb_flush()) process can >> + * hinder forward progress due to refaults generated >> + * on missing translations. >> + * >> + * Skip updating the page table if the entry is >> + * unchanged. >> + */ >> +if (pmd_val(old_pmd) == pmd_val(*new_pmd)) >> +goto out; > minor nit: You could as well return here, as there are no other users > for the label and there are no clean up actions. Ok - I'll do a quick respin for the maintainers to pick up if they are happy with the other aspects of the patch. > > Either way, > > Reviewed-by: Suzuki K Poulose Thanks Suzuki. > > >> + >> pmd_clear(pmd); >> kvm_tlb_flush_vmid_ipa(kvm, addr); >> } else { >> @@ -1035,6 +1052,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct >> kvm_mmu_memory_cache >> } >> kvm_set_pmd(pmd, *new_pmd); >> +out: >> return 0; >> } >>
[PATCH v2 1/2] KVM: arm/arm64: Skip updating PMD entry if no change
Contention on updating a PMD entry by a large number of vcpus can lead
to duplicate work when handling stage 2 page faults. As the page table
update follows the break-before-make requirement of the architecture,
it can lead to repeated refaults due to clearing the entry and
flushing the tlbs.

This problem is more likely when -

* there are a large number of vcpus
* the mapping is a large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Fixes: ad361f093c1e ("KVM: ARM: Support hugetlbfs backed huge pages")
Change-Id: Ib417957c842ef67a6f4b786f68df62048d202c24
Signed-off-by: Punit Agrawal
Cc: Marc Zyngier
Cc: Christoffer Dall
Cc: Suzuki Poulose
Cc: sta...@vger.kernel.org
---
 virt/kvm/arm/mmu.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..2ab977edc63c 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1015,19 +1015,36 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	pmd = stage2_get_pmd(kvm, cache, addr);
 	VM_BUG_ON(!pmd);
 
-	/*
-	 * Mapping in huge pages should only happen through a fault. If a
-	 * page is merged into a transparent huge page, the individual
-	 * subpages of that huge page should be unmapped through MMU
-	 * notifiers before we get here.
-	 *
-	 * Merging of CompoundPages is not supported; they should become
-	 * splitting first, unmapped, merged, and mapped back in on-demand.
-	 */
-	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
-
 	old_pmd = *pmd;
+	if (pmd_present(old_pmd)) {
+		/*
+		 * Mapping in huge pages should only happen through a
+		 * fault. If a page is merged into a transparent huge
+		 * page, the individual subpages of that huge page
+		 * should be unmapped through MMU notifiers before we
+		 * get here.
+		 *
+		 * Merging of CompoundPages is not supported; they
+		 * should become splitting first, unmapped, merged,
+		 * and mapped back in on-demand.
+		 */
+		VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
+
+		/*
+		 * Multiple vcpus faulting on the same PMD entry, can
+		 * lead to them sequentially updating the PMD with the
+		 * same value. Following the break-before-make
+		 * (pmd_clear() followed by tlb_flush()) process can
+		 * hinder forward progress due to refaults generated
+		 * on missing translations.
+		 *
+		 * Skip updating the page table if the entry is
+		 * unchanged.
+		 */
+		if (pmd_val(old_pmd) == pmd_val(*new_pmd))
+			goto out;
+
 		pmd_clear(pmd);
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
 	} else {
@@ -1035,6 +1052,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	}
 	kvm_set_pmd(pmd, *new_pmd);
+out:
 	return 0;
 }
--
2.18.0
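The effect of the check added by this patch can be illustrated outside the kernel. The toy model below (all names hypothetical, not the kernel API) follows the break-before-make sequence — clear the entry, flush, write the new value — but skips the expensive clear-and-flush when the new value matches the old one, which is exactly the racing-vcpus case the patch targets:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a stage 2 block-entry update (names hypothetical). */
static uint64_t entry;       /* the "PMD" descriptor */
static int tlb_flushes;      /* count of break-before-make cycles */

static void set_entry_huge(uint64_t new)
{
	if (entry) {             /* entry already present */
		if (entry == new)    /* no change: skip break-before-make */
			return;
		entry = 0;           /* break ... */
		tlb_flushes++;       /* ... flush ... */
	}
	entry = new;             /* ... make */
}
```

With the check, a second vcpu that faulted concurrently and computed the same descriptor is a no-op instead of clearing a live translation; only a genuine change pays for a flush.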
[PATCH v2 2/2] KVM: arm/arm64: Skip updating PTE entry if no change
When there is contention on faulting in a particular page table entry
at stage 2, the break-before-make requirement of the architecture can
lead to additional refaulting due to TLB invalidation.

Avoid this by skipping a page table update if the new value of the PTE
matches the previous value.

Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Change-Id: I28e17daf394a4821b13c2cf8726bf72bf30434f9
Signed-off-by: Punit Agrawal
Cc: Marc Zyngier
Cc: Christoffer Dall
Cc: Suzuki Poulose
Cc: sta...@vger.kernel.org
---
 virt/kvm/arm/mmu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 2ab977edc63c..d0a9dccc3793 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1120,6 +1120,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	/* Create 2nd stage page table mapping - Level 3 */
 	old_pte = *pte;
 	if (pte_present(old_pte)) {
+		/* Skip page table update if there is no change */
+		if (pte_val(old_pte) == pte_val(*new_pte))
+			goto out;
+
 		kvm_set_pte(pte, __pte(0));
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
 	} else {
@@ -1127,6 +1131,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	}
 	kvm_set_pte(pte, *new_pte);
+out:
 	return 0;
 }
--
2.18.0
[PATCH v2 0/2] KVM: Fix refaulting due to page table update
Hi,

Here's a couple of patches to fix an issue when multiple vcpus fault
on a page table entry[0]. The issue was reported by a user when
testing PUD hugepage support[1] but also exists for PMD and PTE
updates, though with a lower probability.

In this version -

* the fix has been split for PMD hugepage and PTE update
* refactored the PMD fix
* applied fixes tag and cc'ing to stable

Thanks,
Punit

[0] https://lkml.org/lkml/2018/8/10/256
[1] https://lkml.org/lkml/2018/7/16/482

Punit Agrawal (2):
  KVM: arm/arm64: Skip updating PMD entry if no change
  KVM: arm/arm64: Skip updating PTE entry if no change

 virt/kvm/arm/mmu.c | 45 ++++++++++++++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 11 deletions(-)

--
2.18.0
Re: [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
Marc Zyngier writes:

> Hi Punit,
>
> On Fri, 10 Aug 2018 12:13:00 +0100,
> Punit Agrawal wrote:
>>
>> Contention on updating a page table entry by a large number of vcpus
>> can lead to duplicate work when handling stage 2 page faults. As the
>> page table update follows the break-before-make requirement of the
>> architecture, it can lead to repeated refaults due to clearing the
>> entry and flushing the tlbs.
>>
>> This problem is more likely when -
>>
>> * there are a large number of vcpus
>> * the mapping is a large block mapping
>>
>> such as when using PMD hugepages (512MB) with 64k pages.
>>
>> Fix this by skipping the page table update if there is no change in
>> the entry being updated.
>>
>> Signed-off-by: Punit Agrawal
>> Cc: Marc Zyngier
>> Cc: Christoffer Dall
>> Cc: Suzuki Poulose
>
> This definitely deserves a Cc to stable, and a Fixes: tag.

Agreed.

For PMD, the issue exists since commit ad361f093c1e ("KVM: ARM:
Support hugetlbfs backed huge pages") when the file lived in
arch/arm/kvm. (v3.12+)

For PTE, the issue exists since commit d5d8184d35c9 ("KVM: ARM:
Memory virtualization setup"). (v3.8+)

I'll split the fix into two patches and add a cc to stable.

>> --
>> Hi,
>>
>> This problem was reported by a user when testing PUD hugepages. During
>> VM restore when all threads are running a cpu intensive workload, the
>> refaulting was causing the VM to not make any forward progress.
>>
>> This patch fixes the problem for PMD and PTE page fault handling.
>>
>> Thanks,
>> Punit
>>
>> Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21

Just noticed this. Looks like the commit hook has escaped its worktree.

>> ---
>> virt/kvm/arm/mmu.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1d90d79706bd..a66a5441ca2f 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
>>
>>  	old_pmd = *pmd;
>> +	/*
>> +	 * Multiple vcpus faulting on the same PMD entry, can lead to
>> +	 * them sequentially updating the PMD with the same
>> +	 * value. Following the break-before-make (pmd_clear()
>> +	 * followed by tlb_flush()) process can hinder forward
>> +	 * progress due to refaults generated on missing translations.
>> +	 *
>> +	 * Skip updating the page table if the entry is unchanged.
>> +	 */
>> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
>> +		return 0;
>
> Shouldn't you take this opportunity to also refactor it with the above
> VM_BUG_ON and the below pmd_present? At the moment, we end-up testing
> pmd_present twice, and your patch is awfully similar to the VM_BUG_ON
> one.

I went for the minimal change keeping a backport in mind.

The VM_BUG_ON() is enabled only when CONFIG_DEBUG_VM is selected and
checks for the pfn ignoring the attributes, while this fix checks the
entire entry. Maybe I'm missing something, but I can't see an obvious
way to combine the checks.

I've eliminated the duplicate pmd_present() check and will post the
updated patches.

Thanks,
Punit

>> +
>>  	if (pmd_present(old_pmd)) {
>>  		pmd_clear(pmd);
>>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
>> @@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>
>>  	/* Create 2nd stage page table mapping - Level 3 */
>>  	old_pte = *pte;
>> +	/* Skip page table update if there is no change */
>> +	if (pte_val(old_pte) == pte_val(*new_pte))
>> +		return 0;
>> +
>>  	if (pte_present(old_pte)) {
>>  		kvm_set_pte(pte, __pte(0));
>>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> Thanks,
>
> M.
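Punit's point about why the VM_BUG_ON() and the no-change check cannot be folded together — the BUG_ON compares only the pfn, while the skip needs the whole entry value, attributes included — can be made concrete with a toy descriptor layout. Everything here (the bit layout, the names) is hypothetical, not the real arm64 descriptor format:

```c
#include <assert.h>
#include <stdint.h>

/* Toy descriptor (hypothetical layout): pfn in bits 12 and up,
 * permission/attribute bits below. */
#define TOY_PAGE_SHIFT 12

static uint64_t toy_pfn(uint64_t desc)
{
	return desc >> TOY_PAGE_SHIFT;
}

/* What the VM_BUG_ON checks: same backing pfn. */
static int same_pfn(uint64_t a, uint64_t b)
{
	return toy_pfn(a) == toy_pfn(b);
}

/* What the skip-if-unchanged test needs: identical entry, attributes
 * included. A permission change keeps the pfn but alters the value,
 * so that update must NOT be skipped. */
static int same_entry(uint64_t a, uint64_t b)
{
	return a == b;
}
```

A read-only and a read-write mapping of the same page agree on the pfn but differ as entries, so the two comparisons really are distinct checks.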
[PATCH] KVM: arm/arm64: Skip updating page table entry if no change
Contention on updating a page table entry by a large number of vcpus
can lead to duplicate work when handling stage 2 page faults. As the
page table update follows the break-before-make requirement of the
architecture, it can lead to repeated refaults due to clearing the
entry and flushing the tlbs.

This problem is more likely when -

* there are a large number of vcpus
* the mapping is a large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Signed-off-by: Punit Agrawal
Cc: Marc Zyngier
Cc: Christoffer Dall
Cc: Suzuki Poulose

--

Hi,

This problem was reported by a user when testing PUD hugepages. During
VM restore when all threads are running a cpu intensive workload, the
refaulting was causing the VM to not make any forward progress.

This patch fixes the problem for PMD and PTE page fault handling.

Thanks,
Punit

Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21
---
 virt/kvm/arm/mmu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..a66a5441ca2f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
 
 	old_pmd = *pmd;
+	/*
+	 * Multiple vcpus faulting on the same PMD entry, can lead to
+	 * them sequentially updating the PMD with the same
+	 * value. Following the break-before-make (pmd_clear()
+	 * followed by tlb_flush()) process can hinder forward
+	 * progress due to refaults generated on missing translations.
+	 *
+	 * Skip updating the page table if the entry is unchanged.
+	 */
+	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
+		return 0;
+
 	if (pmd_present(old_pmd)) {
 		pmd_clear(pmd);
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
@@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 
 	/* Create 2nd stage page table mapping - Level 3 */
 	old_pte = *pte;
+	/* Skip page table update if there is no change */
+	if (pte_val(old_pte) == pte_val(*new_pte))
+		return 0;
+
 	if (pte_present(old_pte)) {
 		kvm_set_pte(pte, __pte(0));
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
--
2.18.0
Re: [PATCH v6 1/8] KVM: arm/arm64: Share common code in user_mem_abort()
Hi Suzuki,

Suzuki K Poulose writes:

> On 16/07/18 12:08, Punit Agrawal wrote:
>> The code for operations such as marking the pfn as dirty, and
>> dcache/icache maintenance during stage 2 fault handling is duplicated
>> between normal pages and PMD hugepages.
>>
>> Instead of creating another copy of the operations when we introduce
>> PUD hugepages, let's share them across the different pagesizes.
>>
>> Signed-off-by: Punit Agrawal
>> Cc: Christoffer Dall
>> Cc: Marc Zyngier
>
> Thanks for splitting the patch. It looks much simpler this way.
>
> Reviewed-by: Suzuki K Poulose

Thanks for reviewing the patches.

Punit
[PATCH v6 7/8] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support
to the age handling notifiers for PUD hugepages when encountered.

Provide trivial helpers for arm32 to allow sharing code.

Signed-off-by: Punit Agrawal
Reviewed-by: Suzuki K Poulose
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h   |  6 +
 arch/arm64/include/asm/kvm_mmu.h |  5
 arch/arm64/include/asm/pgtable.h |  1 +
 virt/kvm/arm/mmu.c               | 39
 4 files changed, 32 insertions(+), 19 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 13c0ee73756e..8225ec15cae7 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -111,6 +111,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
 	return pud;
 }
 
+static inline bool kvm_s2pud_young(pud_t pud)
+{
+	BUG();
+	return false;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
 	*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 4d2780c588b0..c542052fb199 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -261,6 +261,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
 	return pud_mkyoung(pud);
 }
 
+static inline bool kvm_s2pud_young(pud_t pud)
+{
+	return pud_young(pud);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a64a5c35beb1..4d9476e420d9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pfn_pmd(pfn,prot)	__pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
+#define pud_young(pud)		pte_young(pud_pte(pud))
 #define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud)		pte_write(pud_pte(pud))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 6dc40b710d0d..c00155fe05c3 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1176,6 +1176,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
 	return stage2_ptep_test_and_clear_young((pte_t *)pmd);
 }
 
+static int stage2_pudp_test_and_clear_young(pud_t *pud)
+{
+	return stage2_ptep_test_and_clear_young((pte_t *)pud);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -1880,42 +1885,38 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
-	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
-	pmd = stage2_get_pmd(kvm, NULL, gpa);
-	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
 		return 0;
 
-	if (pmd_thp_or_huge(*pmd))	/* THP, HugeTLB */
+	if (pud)
+		return stage2_pudp_test_and_clear_young(pud);
+	else if (pmd)
 		return stage2_pmdp_test_and_clear_young(pmd);
-
-	pte = pte_offset_kernel(pmd, gpa);
-	if (pte_none(*pte))
-		return 0;
-
-	return stage2_ptep_test_and_clear_young(pte);
+	else
+		return stage2_ptep_test_and_clear_young(pte);
 }
 
 static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 
-	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE);
-	pmd = stage2_get_pmd(kvm, NULL, gpa);
-	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
+	if (!stage2_get_leaf_entry(kvm, gpa, &pud, &pmd, &pte))
 		return 0;
 
-	if (pmd_thp_or_huge(*pmd))	/* THP, HugeTLB */
+	if (pud)
+		return kvm_s2pud_young(*pud);
+	else if (pmd)
 		return pmd_young(*pmd);
-
-	pte = pte_offset_kernel(pmd, gpa);
-	if (!pte_none(*pte))		/* Just a page... */
+	else
 		return pte_young(*pte);
-
-	return 0;
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
--
2.17.1
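The reason the PUD and PMD helpers above can simply cast to the PTE variant is that "young" handling reduces to testing and clearing a single access-flag bit in the descriptor, at whatever level the leaf sits. A toy version of that pattern (the bit position and names are hypothetical, not the arm64 encoding):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical access-flag bit in a toy descriptor. */
#define TOY_AF (UINT64_C(1) << 10)

/* Age-handler primitive: report whether the entry was recently
 * accessed and clear the flag so a future access sets it again.
 * Works on any level's descriptor, which is why one helper can be
 * shared across PTE/PMD/PUD. */
static int test_and_clear_young(uint64_t *desc)
{
	int young = !!(*desc & TOY_AF);

	*desc &= ~TOY_AF;
	return young;
}
```

The non-clearing variant used by kvm_test_age_hva_handler() is just the test half: `desc & TOY_AF` without the store.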
[PATCH v6 5/8] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for
detecting execute permissions on PUD page table entries. Faults due to
lack of execute permissions on page table entries are used to perform
i-cache invalidation on first execute.

Provide trivial implementations of arm32 helpers to allow sharing of
code.

Signed-off-by: Punit Agrawal
Reviewed-by: Suzuki K Poulose
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h         |  6 +++
 arch/arm64/include/asm/kvm_mmu.h       |  5 +++
 arch/arm64/include/asm/pgtable-hwdef.h |  2 +
 virt/kvm/arm/mmu.c                     | 53 +++---
 4 files changed, 61 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index c3ac7a76fb69..ec0c58e139da 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -96,6 +96,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
 }
 
+static inline bool kvm_s2pud_exec(pud_t *pud)
+{
+	BUG();
+	return false;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
 	*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 84051930ddfe..15bc1be8f82f 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -249,6 +249,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp)
 	return kvm_s2pte_readonly((pte_t *)pudp);
 }
 
+static inline bool kvm_s2pud_exec(pud_t *pudp)
+{
+	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index fd208eac9f2a..10ae592b78b8 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,8 @@
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
+#define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index ed8f8271c389..3839d0e3766d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1038,23 +1038,66 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+/*
+ * stage2_get_leaf_entry - walk the stage2 VM page tables and return
+ * true if a valid and present leaf-entry is found. A pointer to the
+ * leaf-entry is returned in the appropriate level variable - pudpp,
+ * pmdpp, ptepp.
+ */
+static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
+				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
 {
+	pud_t *pudp;
 	pmd_t *pmdp;
 	pte_t *ptep;
 
-	pmdp = stage2_get_pmd(kvm, NULL, addr);
+	*pudpp = NULL;
+	*pmdpp = NULL;
+	*ptepp = NULL;
+
+	pudp = stage2_get_pud(kvm, NULL, addr);
+	if (!pudp || pud_none(*pudp) || !pud_present(*pudp))
+		return false;
+
+	if (pud_huge(*pudp)) {
+		*pudpp = pudp;
+		return true;
+	}
+
+	pmdp = stage2_pmd_offset(pudp, addr);
 	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
 		return false;
 
-	if (pmd_thp_or_huge(*pmdp))
-		return kvm_s2pmd_exec(pmdp);
+	if (pmd_thp_or_huge(*pmdp)) {
+		*pmdpp = pmdp;
+		return true;
+	}
 
 	ptep = pte_offset_kernel(pmdp, addr);
 	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
 		return false;
 
-	return kvm_s2pte_exec(ptep);
+	*ptepp = ptep;
+	return true;
+}
+
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+{
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ptep;
+	bool found;
+
+	found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep);
+	if (!found)
+		return false;
+
+	if (pudp)
+		return kvm_s2pud_exec(pudp);
+	else if (pmdp)
+		return kvm_s2pmd_exec(pmdp);
+	else
+		return kvm_s2pte_exec(ptep);
 }
 
 static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
--
2.17.1
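The shape of stage2_get_leaf_entry() — one walk, exactly one of the three out-pointers set, callers then dispatching on which level held the leaf — can be modelled in miniature. All the types and names below are hypothetical; each toy level either holds a leaf (a block or page mapping) or points one level down:

```c
#include <assert.h>
#include <stddef.h>

enum leaf_level { LEAF_NONE, LEAF_PUD, LEAF_PMD, LEAF_PTE };

/* Toy table entry: present flag, leaf flag, and a pointer to the
 * next-level entry when it is a table rather than a leaf. */
struct ent {
	int present;
	int is_leaf;
	struct ent *next;
};

/* Mirror of the single-walk contract: clear all out-pointers up
 * front, then set exactly one when a present leaf is found. */
static enum leaf_level get_leaf_entry(struct ent *pud, struct ent **pudpp,
				      struct ent **pmdpp, struct ent **ptepp)
{
	*pudpp = *pmdpp = *ptepp = NULL;

	if (!pud || !pud->present)
		return LEAF_NONE;
	if (pud->is_leaf) {
		*pudpp = pud;
		return LEAF_PUD;
	}

	struct ent *pmd = pud->next;
	if (!pmd || !pmd->present)
		return LEAF_NONE;
	if (pmd->is_leaf) {
		*pmdpp = pmd;
		return LEAF_PMD;
	}

	struct ent *pte = pmd->next;
	if (!pte || !pte->present)
		return LEAF_NONE;
	*ptepp = pte;
	return LEAF_PTE;
}
```

This is why the callers in the series (stage2_is_exec(), the age handlers, handle_access_fault()) can test `if (pud) ... else if (pmd) ... else ...` safely: the walk guarantees at most one pointer is non-NULL.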
[PATCH v6 8/8] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 19 + arch/arm64/include/asm/kvm_mmu.h | 15 arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 98 -- 5 files changed, 131 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8225ec15cae7..665c746c46ce 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -77,11 +77,14 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) ({ BUG(); 0; }) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)(pud) /* * The following kvm_*pud*() functions are provided strictly to allow @@ -98,6 +101,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} static inline bool kvm_s2pud_exec(pud_t *pud) { diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c542052fb199..dd8a23159463 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -171,13 +171,16 @@ void kvm_clear_hyp_idmap(void); #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) 
#definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -191,6 +194,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -203,6 +212,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) +#define pud_mkhuge(pud)(__pud(pud_val(pud) & ~PUD_TABLE_BIT)) + #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) #define 
__phys_to_pud_val(phys)__phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index c00155fe05c3..552fceb0521b 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -116,6 +116,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) put_page(virt_to_page(pmd)); } +/** + * stage2_dissolve_pud() - clear and flush huge PUD entry + * @kvm:
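The block sizes this series talks about — 1GB PUD hugepages with a 4K granule here, and the 512MB PMD hugepages with 64K pages mentioned earlier in the thread — follow from the granule arithmetic: with 8-byte descriptors, a granule of 2^g bytes holds 2^(g-3) entries, so each table level above the PTE multiplies the mapped size by 2^(g-3). A sketch (function name hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Size mapped by one block entry `levels_above_pte` levels above the
 * page level, for a granule of (1 << granule_shift) bytes. Each level
 * resolves (granule_shift - 3) bits because descriptors are 8 bytes. */
static uint64_t block_size(unsigned granule_shift,
			   unsigned levels_above_pte)
{
	return UINT64_C(1) << (granule_shift +
			       levels_above_pte * (granule_shift - 3));
}
```

So a PMD block is 2MB at 4K granule but 512MB at 64K granule, and a PUD block at 4K granule is the 1GB mapping this patch enables.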
[PATCH v6 6/8] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the
access fault handling at Stage 2 to support PUD hugepages when
encountered.

Provide trivial helpers for arm32 to allow sharing of code.

Signed-off-by: Punit Agrawal
Cc: Christoffer Dall
Cc: Marc Zyngier
Cc: Russell King
Cc: Catalin Marinas
Cc: Will Deacon
---
 arch/arm/include/asm/kvm_mmu.h   |  9 +
 arch/arm64/include/asm/kvm_mmu.h |  7 +++
 arch/arm64/include/asm/pgtable.h |  6 ++
 virt/kvm/arm/mmu.c               | 22 +++---
 4 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index ec0c58e139da..13c0ee73756e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -78,6 +78,9 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
 
+#define kvm_pud_pfn(pud)	({ BUG(); 0; })
+
+
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
 
 /*
@@ -102,6 +105,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud)
 	return false;
 }
 
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+	BUG();
+	return pud;
+}
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
 	*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 15bc1be8f82f..4d2780c588b0 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -175,6 +175,8 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
 
+#define kvm_pud_pfn(pud)	pud_pfn(pud)
+
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
 
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
@@ -254,6 +256,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
 	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
 }
 
+static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
+{
+	return pud_mkyoung(pud);
+}
+
 static inline bool kvm_page_empty(void *ptr)
 {
 	struct page *ptr_page = virt_to_page(ptr);
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 1bdeca8918a6..a64a5c35beb1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud)
 	return __pte(pud_val(pud));
 }
 
+static inline pud_t pte_pud(pte_t pte)
+{
+	return __pud(pte_val(pte));
+}
+
 static inline pmd_t pud_pmd(pud_t pud)
 {
 	return __pmd(pud_val(pud));
@@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pfn_pmd(pfn,prot)	__pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
 #define mk_pmd(page,prot)	pfn_pmd(page_to_pfn(page),prot)
 
+#define pud_mkyoung(pud)	pte_pud(pte_mkyoung(pud_pte(pud)))
 #define pud_write(pud)		pte_write(pud_pte(pud))
 
 #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 3839d0e3766d..6dc40b710d0d 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1641,6 +1641,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
+	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
 	kvm_pfn_t pfn;
@@ -1650,24 +1651,23 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 
 	spin_lock(&vcpu->kvm->mmu_lock);
 
-	pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
-	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+	if (!stage2_get_leaf_entry(vcpu->kvm, fault_ipa, &pud, &pmd, &pte))
 		goto out;
 
-	if (pmd_thp_or_huge(*pmd)) {	/* THP, HugeTLB */
+	if (pud) {			/* HugeTLB */
+		*pud = kvm_s2pud_mkyoung(*pud);
+		pfn = kvm_pud_pfn(*pud);
+		pfn_valid = true;
+	} else if (pmd) {		/* THP, HugeTLB */
 		*pmd = pmd_mkyoung(*pmd);
 		pfn = pmd_pfn(*pmd);
 		pfn_valid = true;
-		goto out;
+	} else {
+		*pte = pte_mkyoung(*pte);	/* Just a page... */
+		pfn = pte_pfn(*pte);
+		pfn_valid = true;
 	}
 
-	pte = pte_offset_kernel(pmd, fault_ipa);
-	if (pte_none(*pte))		/* Nothing there either */
-		goto out;
-
-	*pte = pte_mkyoung(*pte);	/* Just a page... */
-	pfn = pte_pfn(*pte);
-	pfn_valid = true;
 out:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	if (pfn_valid)
--
2.17.1
[PATCH v6 4/8] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal Reviewed-by: Christoffer Dall Reviewed-by: Suzuki K Poulose Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/kvm_mmu.h | 10 ++ virt/kvm/arm/mmu.c | 11 +++ 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d095c2d0b284..c3ac7a76fb69 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void); #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice. 
+ */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + BUG(); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + BUG(); + return false; +} + + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 689def9bb9d5..84051930ddfe 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -239,6 +239,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } +static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e131b7f9b7d7..ed8f8271c389 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1288,9 +1288,12 @@ static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end) do { next = stage2_pud_addr_end(addr, end); if (!stage2_pud_none(*pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(*pud)); - stage2_wp_pmds(pud, addr, next); + if (stage2_pud_huge(*pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1333,7 +1336,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. 
Called with kvm->slots_lock mutex acquired, -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
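The dirty-tracking mechanism that the write-protection in patch 4/8 serves can be modelled compactly: starting dirty logging write-protects every present mapping, and a subsequent guest write faults, at which point the handler records the page as dirty and restores write permission so later writes to the same page are fault-free. Everything below is a hypothetical toy, not the KVM API:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NPAGES 4

static bool writable[TOY_NPAGES];
static bool dirty[TOY_NPAGES];

/* Analogue of stage2_wp_range(): start of dirty logging, clear write
 * permission on every mapping (PUD/PMD/PTE alike in the real walk). */
static void wp_range(void)
{
	for (int i = 0; i < TOY_NPAGES; i++)
		writable[i] = false;
}

/* A guest write to a write-protected page faults once; the handler
 * logs the page as dirty and restores write permission. */
static void guest_write(int page)
{
	if (!writable[page]) {
		dirty[page] = true;
		writable[page] = true;
	}
}
```

The cost model is visible here too: only the first write to each page after logging starts takes a fault, which is why write-protecting huge PUD blocks wholesale (rather than refusing them) keeps migration rounds cheap.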
[PATCH v6 0/8] KVM: Support PUD hugepages at stage 2
This series is an update to the PUD hugepage support previously posted
at [0]. This patchset adds support for PUD hugepages at stage 2, a
feature that is useful on cores that have support for large sized TLB
mappings (e.g., 1GB for 4K granule).

This version adds tags and addresses feedback received on v5.
Additionally, Patch 1 (1 & 2 in this version) has been split to help
make it easy to review.

Support is added to code that is shared between arm and arm64. Dummy
helpers for arm are provided as the port does not support PUD hugepage
sizes.

The patches have been tested on an A57 based system. The patchset is
based on v4.18-rc5. There are a few conflicts with the support for 52
bit IPA[1] - I'll work with Suzuki to resolve this once the code is in
the right shape.

Thanks,
Punit

v5 -> v6

* Split Patch 1 to move out the refactoring of exec permissions on
  page table entries.
* Patch 4 - Initialise p*dpp pointers in stage2_get_leaf_entry()
* Patch 5 - Trigger a BUG() in kvm_pud_pfn() on arm

v4 -> v5:

* Patch 1 - Drop helper stage2_should_exec() and refactor the
  condition to decide if a page table entry should be marked
  executable
* Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this
  and latter patches
* Patch 7 - Use stage 2 accessors instead of using the page table
  helpers directly
* Patch 7 - Add a note to update the PUD hugepage support when number
  of levels of stage 2 tables differs from stage 1

v3 -> v4:

* Patch 1 and 7 - Don't put down hugepages pte if logging is enabled
* Patch 4-5 - Add PUD hugepage support for exec and access faults
* Patch 6 - PUD hugepage support for aging page table entries

v2 -> v3:

* Update vma_pagesize directly if THP [1/4]. Previously this was done
  indirectly via hugetlb
* Added review tag [4/4]

v1 -> v2:

* Create helper to check if the page should have exec permission [1/4]
* Fix broken condition to detect THP hugepage [1/4]
* Fix in-correct hunk resulting from a rebase [4/4]

[0] https://www.spinics.net/lists/arm-kernel/msg664276.html
[1] https://www.spinics.net/lists/kvm/msg171065.html

Punit Agrawal (8):
  KVM: arm/arm64: Share common code in user_mem_abort()
  KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
  KVM: arm/arm64: Introduce helpers to manipulate page table entries
  KVM: arm64: Support dirty page tracking for PUD hugepages
  KVM: arm64: Support PUD hugepage in stage2_is_exec()
  KVM: arm64: Support handling access faults for PUD hugepages
  KVM: arm64: Update age handlers to support PUD hugepages
  KVM: arm64: Add support for creating PUD hugepages at stage 2

 arch/arm/include/asm/kvm_mmu.h         |  61 +
 arch/arm64/include/asm/kvm_mmu.h       |  47
 arch/arm64/include/asm/pgtable-hwdef.h |   4 +
 arch/arm64/include/asm/pgtable.h       |   9 +
 virt/kvm/arm/mmu.c                     | 294 ++---
 5 files changed, 341 insertions(+), 74 deletions(-)

--
2.17.1
[PATCH v6 3/8] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Reviewed-by: Suzuki K Poulose Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 8 +--- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68b7c8a..d095c2d0b284 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void); int kvm_mmu_init(void); void kvm_clear_hyp_idmap(void); +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index fb9a7127bb75..689def9bb9d5 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -172,6 +172,11 @@ void kvm_clear_hyp_idmap(void); #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ea3d992e4fb7..e131b7f9b7d7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1554,8 +1554,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && 
stage2_is_exec(kvm, fault_ipa)); if (hugetlb && vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1564,7 +1566,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, _pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v6 2/8] KVM: arm/arm64: Re-factor setting the Stage 2 entry to exec on fault
Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1c8d407a92ce..ea3d992e4fb7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1422,7 +1422,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, writable, hugetlb = false, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1541,19 +1542,25 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (exec_fault) invalidate_icache_guest_page(pfn, vma_pagesize); + /* +* If we took an execution fault we have made the +* icache/dcache coherent above and should now let the s2 +* mapping be executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. 
+*/ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, _pmd); } else { @@ -1564,13 +1571,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, mark_page_dirty(kvm, gfn); } - if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pte = kvm_s2pte_mkexec(new_pte); - } ret = stage2_set_pte(kvm, memcache, fault_ipa, _pte, flags); } -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v6 1/8] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 39 +++ 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1d90d79706bd..1c8d407a92ce 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1431,7 +1431,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1451,7 +1451,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { @@ -1520,23 +1521,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) + if (!hugetlb && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. This code will need to be +* updated to support other THP sizes. 
+*/ hugetlb = transparent_hugepage_adjust(, _ipa); + if (hugetlb) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); + + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); - if (hugetlb) { + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); if (exec_fault) { new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) @@ -1549,16 +1561,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - if (exec_fault) { new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_guest_page(pfn, PAGE_SIZE); } else if (fault_status == FSC_PERM) { /* Preserve execute if XN was already cleared */ if (stage2_is_exec(kvm, fault_ipa)) -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v5 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2
Suzuki K Poulose writes: > On 11/07/18 17:05, Punit Agrawal wrote: >> Suzuki K Poulose writes: >> >>> On 09/07/18 15:41, Punit Agrawal wrote: >>>> KVM only supports PMD hugepages at stage 2. Now that the various page >>>> handling routines are updated, extend the stage 2 fault handling to >>>> map in PUD hugepages. >>>> >>>> Addition of PUD hugepage support enables additional page sizes (e.g., >>>> 1G with 4K granule) which can be useful on cores that support mapping >>>> larger block sizes in the TLB entries. >>>> >>>> Signed-off-by: Punit Agrawal >>>> Cc: Christoffer Dall >>>> Cc: Marc Zyngier >>>> Cc: Russell King >>>> Cc: Catalin Marinas >>>> Cc: Will Deacon >>>> --- >>>>arch/arm/include/asm/kvm_mmu.h | 19 +++ >>>>arch/arm64/include/asm/kvm_mmu.h | 15 + >>>>arch/arm64/include/asm/pgtable-hwdef.h | 2 + >>>>arch/arm64/include/asm/pgtable.h | 2 + >>>>virt/kvm/arm/mmu.c | 78 -- >>>>5 files changed, 112 insertions(+), 4 deletions(-) >>>> >> >> [...] >> >>>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >>>> index a6d3ac9d7c7a..d8e2497e5353 100644 >>>> --- a/virt/kvm/arm/mmu.c >>>> +++ b/virt/kvm/arm/mmu.c >> >> [...] >> >>>> @@ -1100,6 +1139,7 @@ static int stage2_set_pte(struct kvm *kvm, struct >>>> kvm_mmu_memory_cache *cache, >>>> phys_addr_t addr, const pte_t *new_pte, >>>> unsigned long flags) >>>>{ >>>> + pud_t *pud; >>>>pmd_t *pmd; >>>>pte_t *pte, old_pte; >>>>bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP; >>>> @@ -1108,6 +1148,22 @@ static int stage2_set_pte(struct kvm *kvm, struct >>>> kvm_mmu_memory_cache *cache, >>>>VM_BUG_ON(logging_active && !cache); >>>>/* Create stage-2 page table mapping - Levels 0 and 1 */ >>>> + pud = stage2_get_pud(kvm, cache, addr); >>>> + if (!pud) { >>>> + /* >>>> + * Ignore calls from kvm_set_spte_hva for unallocated >>>> + * address ranges. >>>> + */ >>>> + return 0; >>>> + } >>>> + >>>> + /* >>>> + * While dirty page logging - dissolve huge PUD, then continue >>>> + * on to allocate page. 
>>> >>> Punit, >>> >>> We don't seem to allocate a page here for the PUD entry, in case if it is >>> dissolved >>> or empty (i.e, stage2_pud_none(*pud) is true.). >> >> I was trying to avoid duplicating the PUD allocation by reusing the >> functionality in stage2_get_pmd(). >> >> Does the below updated comment help? >> >> /* >> * While dirty page logging - dissolve huge PUD, it'll be >> * allocated in stage2_get_pmd(). >> */ >> >> The other option is to duplicate the stage2_pud_none() case from >> stage2_get_pmd() here. > > I think the explicit check for stage2_pud_none() suits better here. > That would make it explicit that we are tearing down the entries > from top to bottom. Also, we may be able to short cut for case > where we know we just allocated a PUD page and hence we need another > PMD level page. Ok, I'll add the PUD allocation code here. > > Also, you are missing the comment about the assumption that stage2 PUD > level always exist with 4k fixed IPA. Hmm... I'm quite sure I wrote a comment to that effect but can't find it now. I'll include it in the next version. Thanks, Punit > > Cheers > Suzuki > ___ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v5 2/7] KVM: arm/arm64: Introduce helpers to manupulate page table entries
Suzuki K Poulose writes:

> On 09/07/18 15:41, Punit Agrawal wrote:
>> Introduce helpers to abstract architectural handling of the conversion
>> of pfn to page table entries and marking a PMD page table entry as a
>> block entry.
>>
>> The helpers are introduced in preparation for supporting PUD hugepages
>> at stage 2 - which are supported on arm64 but do not exist on arm.
>>
>> Signed-off-by: Punit Agrawal
>> Acked-by: Christoffer Dall
>> Cc: Marc Zyngier
>> Cc: Russell King
>> Cc: Catalin Marinas
>> Cc: Will Deacon
>> ---
>
> Reviewed-by: Suzuki K Poulose

Other than the query on Patch 7, I've incorporated all your suggestions locally. Thanks a lot for reviewing the patches.

Punit

ps: Just noticed the typo (manupulate) in the subject. I've fixed it up locally.
Re: [PATCH v5 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2
Suzuki K Poulose writes: > On 09/07/18 15:41, Punit Agrawal wrote: >> KVM only supports PMD hugepages at stage 2. Now that the various page >> handling routines are updated, extend the stage 2 fault handling to >> map in PUD hugepages. >> >> Addition of PUD hugepage support enables additional page sizes (e.g., >> 1G with 4K granule) which can be useful on cores that support mapping >> larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon >> --- >> arch/arm/include/asm/kvm_mmu.h | 19 +++ >> arch/arm64/include/asm/kvm_mmu.h | 15 + >> arch/arm64/include/asm/pgtable-hwdef.h | 2 + >> arch/arm64/include/asm/pgtable.h | 2 + >> virt/kvm/arm/mmu.c | 78 -- >> 5 files changed, 112 insertions(+), 4 deletions(-) >> [...] >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index a6d3ac9d7c7a..d8e2497e5353 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c [...] >> @@ -1100,6 +1139,7 @@ static int stage2_set_pte(struct kvm *kvm, struct >> kvm_mmu_memory_cache *cache, >>phys_addr_t addr, const pte_t *new_pte, >>unsigned long flags) >> { >> +pud_t *pud; >> pmd_t *pmd; >> pte_t *pte, old_pte; >> bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP; >> @@ -1108,6 +1148,22 @@ static int stage2_set_pte(struct kvm *kvm, struct >> kvm_mmu_memory_cache *cache, >> VM_BUG_ON(logging_active && !cache); >> /* Create stage-2 page table mapping - Levels 0 and 1 */ >> +pud = stage2_get_pud(kvm, cache, addr); >> +if (!pud) { >> +/* >> + * Ignore calls from kvm_set_spte_hva for unallocated >> + * address ranges. >> + */ >> +return 0; >> +} >> + >> +/* >> + * While dirty page logging - dissolve huge PUD, then continue >> + * on to allocate page. > > Punit, > > We don't seem to allocate a page here for the PUD entry, in case if it is > dissolved > or empty (i.e, stage2_pud_none(*pud) is true.). 
I was trying to avoid duplicating the PUD allocation by reusing the functionality in stage2_get_pmd(). Does the below updated comment help? /* * While dirty page logging - dissolve huge PUD, it'll be * allocated in stage2_get_pmd(). */ The other option is to duplicate the stage2_pud_none() case from stage2_get_pmd() here. What do you think? Thanks, Punit >> + */ >> +if (logging_active) >> +stage2_dissolve_pud(kvm, addr, pud); >> + >> pmd = stage2_get_pmd(kvm, cache, addr); >> if (!pmd) { > > And once you add an entry, pmd is just the matter of getting > stage2_pmd_offset() from your pud. > No need to start again from the top-level with stage2_get_pmd(). > > Cheers > Suzuki > > ___ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v5 4/7] KVM: arm64: Support PUD hugepage in stage2_is_exec()
Suzuki K Poulose writes: > On 09/07/18 15:41, Punit Agrawal wrote: >> In preparation for creating PUD hugepages at stage 2, add support for >> detecting execute permissions on PUD page table entries. Faults due to >> lack of execute permissions on page table entries is used to perform >> i-cache invalidation on first execute. >> >> Provide trivial implementations of arm32 helpers to allow sharing of >> code. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon >> --- >> arch/arm/include/asm/kvm_mmu.h | 6 >> arch/arm64/include/asm/kvm_mmu.h | 5 +++ >> arch/arm64/include/asm/pgtable-hwdef.h | 2 ++ >> virt/kvm/arm/mmu.c | 49 +++--- >> 4 files changed, 57 insertions(+), 5 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index c23722f75d5c..d05c8986e495 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -96,6 +96,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) >> } >> +static inline bool kvm_s2pud_exec(pud_t *pud) >> +{ >> +BUG(); >> +return false; >> +} >> + >> static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) >> { >> *pmd = new_pmd; >> diff --git a/arch/arm64/include/asm/kvm_mmu.h >> b/arch/arm64/include/asm/kvm_mmu.h >> index 84051930ddfe..15bc1be8f82f 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -249,6 +249,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) >> return kvm_s2pte_readonly((pte_t *)pudp); >> } >> +static inline bool kvm_s2pud_exec(pud_t *pudp) >> +{ >> +return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); >> +} >> + >> static inline bool kvm_page_empty(void *ptr) >> { >> struct page *ptr_page = virt_to_page(ptr); >> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h >> b/arch/arm64/include/asm/pgtable-hwdef.h >> index fd208eac9f2a..10ae592b78b8 100644 >> --- a/arch/arm64/include/asm/pgtable-hwdef.h >> +++ 
b/arch/arm64/include/asm/pgtable-hwdef.h >> @@ -193,6 +193,8 @@ >> #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ >> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ >> +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] >> */ >> + >> /* >>* Memory Attribute override for Stage-2 (MemAttr[3:0]) >>*/ >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index ed8f8271c389..e73909a31e02 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1038,23 +1038,62 @@ static int stage2_set_pmd_huge(struct kvm *kvm, >> struct kvm_mmu_memory_cache >> return 0; >> } >> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) >> +/* >> + * stage2_get_leaf_entry - walk the stage2 VM page tables and return >> + * true if a valid and present leaf-entry is found. A pointer to the >> + * leaf-entry is returned in the appropriate level variable - pudpp, >> + * pmdpp, ptepp. >> + */ >> +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, >> + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp) >> { >> +pud_t *pudp; >> pmd_t *pmdp; >> pte_t *ptep; > > nit: As mentioned in the other thread, you may initialize the reference > pointers to NULL to make sure we start clean and avoid the initialization > everywhere this is called. I took the approach to not touch the pointers unless they are being assigned a valid pointer. I'll initialise the incoming pointers (p*dpp) before proceeding with the table walk. Thanks, Punit > >> - pmdp = stage2_get_pmd(kvm, NULL, addr); >> +pudp = stage2_get_pud(kvm, NULL, addr); >> +if (!pudp || pud_none(*pudp) || !pud_present(*pudp)) >> +return false; >> + >> +if (pud_huge(*pudp)) { >> +*pudpp = pudp; >> +return true; >> +} >> + >> +pmdp = stage2_pmd_offset(pudp, addr); >> if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) >> return false; >> - if (pmd_thp_or_huge(*pmdp)) >> -return kvm_s2pmd_exec(pmdp); >> +if (pmd_thp_or_huge(*pmdp)) { >> +*pmdpp = pmdp; >> +re
Re: [PATCH v5 0/7] KVM: Support PUD hugepages at stage 2
Please ignore this cover letter. Apologies for the duplicate cover letter and the somewhat funky threading (I blame an unsaved emacs buffer). The patches appear to be intact, so don't let the threading get in the way of review.

Punit Agrawal writes:

> This series is an update to the PUD hugepage support previously posted
> at [0]. This patchset adds support for PUD hugepages at stage 2. This
> feature is useful on cores that have support for large sized TLB
> mappings (e.g., 1GB for 4K granule).
>
> The biggest change in this version is to replace repeated instances of
> walking the page tables to get to a leaf-entry with a function to
> return the appropriate entry. This was suggested by Suzuki and should
> help reduce the amount of churn resulting from future changes. It also
> addresses other feedback on the previous version.
>
> Support is added to code that is shared between arm and arm64. Dummy
> helpers for arm are provided as the port does not support PUD hugepage
> sizes.
>
> The patches have been tested on an A57 based system. The patchset is
> based on v4.18-rc4. There are a few conflicts with the support for 52
> bit IPA [1] due to the change in the number of parameters for
> stage2_pmd_offset().
>
> Thanks,
> Punit
>
> v4 -> v5:
> * Patch 1 - Drop helper stage2_should_exec() and refactor the
>   condition to decide if a page table entry should be marked executable
> * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and
>   later patches
> * Patch 7 - Use stage 2 accessors instead of using the page table
>   helpers directly
> * Patch 7 - Add a note to update the PUD hugepage support when the
>   number of levels of stage 2 tables differs from stage 1
>
> v3 -> v4:
> * Patch 1 and 7 - Don't put down hugepage ptes if logging is enabled
> * Patch 4-5 - Add PUD hugepage support for exec and access faults
> * Patch 6 - PUD hugepage support for aging page table entries
>
> v2 -> v3:
> * Update vma_pagesize directly if THP [1/4]. Previously this was done
>   indirectly via hugetlb
> * Added review tag [4/4]
>
> v1 -> v2:
> * Create helper to check if the page should have exec permission [1/4]
> * Fix broken condition to detect THP hugepages [1/4]
> * Fix incorrect hunk resulting from a rebase [4/4]
>
> [0] https://www.spinics.net/lists/arm-kernel/msg663562.html
> [1] https://www.spinics.net/lists/kvm/msg171065.html
>
> Punit Agrawal (7):
>   KVM: arm/arm64: Share common code in user_mem_abort()
>   KVM: arm/arm64: Introduce helpers to manupulate page table entries
>   KVM: arm64: Support dirty page tracking for PUD hugepages
>   KVM: arm64: Support PUD hugepage in stage2_is_exec()
>   KVM: arm64: Support handling access faults for PUD hugepages
>   KVM: arm64: Update age handlers to support PUD hugepages
>   KVM: arm64: Add support for creating PUD hugepages at stage 2
>
>  arch/arm/include/asm/kvm_mmu.h         | 60 +
>  arch/arm64/include/asm/kvm_mmu.h       | 47
>  arch/arm64/include/asm/pgtable-hwdef.h |  4 +
>  arch/arm64/include/asm/pgtable.h       |  9 +
>  virt/kvm/arm/mmu.c                     | 289 ++---
>  5 files changed, 330 insertions(+), 79 deletions(-)
[PATCH v5 5/7] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 8 arch/arm64/include/asm/kvm_mmu.h | 7 +++ arch/arm64/include/asm/pgtable.h | 6 ++ virt/kvm/arm/mmu.c | 29 - 4 files changed, 37 insertions(+), 13 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d05c8986e495..a4298d429efc 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -78,6 +78,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) /* @@ -102,6 +104,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 15bc1be8f82f..4d2780c588b0 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -175,6 +175,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -254,6 +256,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = 
virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 1bdeca8918a6..a64a5c35beb1 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); } +static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e73909a31e02..d2c705e31584 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1637,33 +1637,36 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { - pmd_t *pmd; - pte_t *pte; + pud_t *pud = NULL; + pmd_t *pmd = NULL; + pte_t *pte = NULL; kvm_pfn_t pfn; - bool pfn_valid = false; + bool found, pfn_valid = false; trace_kvm_access_fault(fault_ipa); spin_lock(>kvm->mmu_lock); - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + found = stage2_get_leaf_entry(vcpu->kvm, fault_ipa, , , ); + if (!found) goto out; - if (pmd_thp_or_huge(*pmd)) {/* THP, HugeTLB */ + if (pud) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + goto out; + } else if (pmd) { /* THP, HugeTLB */ *pmd = pmd_mkyoung(*pmd); pfn = pmd_pfn(*pmd); pfn_valid = true; goto out; + } else { + *pte = pte_mkyoung(*pte); /* Just a page... 
*/ + pfn = pte_pfn(*pte); + pfn_valid = true; } - pte = pte_offset_kernel(pmd, fault_ipa); - if (pte_none(*pte)) /* Nothing there either */ - goto out; - - *pte = pte_mkyoung(*pte); /* Just a page... */ - pfn = pte_pfn(*pte); - pfn_valid = true; out: spin_unlock(>kvm->mmu_lock); if (pfn_valid) -- 2.17.1
[PATCH v5 6/7] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 arch/arm64/include/asm/kvm_mmu.h | 5 arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 51 ++-- 4 files changed, 40 insertions(+), 23 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index a4298d429efc..8e1e8aee229e 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -110,6 +110,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; } +static inline bool kvm_s2pud_young(pud_t pud) +{ + BUG(); + return false; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 4d2780c588b0..c542052fb199 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -261,6 +261,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); } +static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index a64a5c35beb1..4d9476e420d9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) diff --git 
a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index d2c705e31584..a6d3ac9d7c7a 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1172,6 +1172,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); } +static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1879,42 +1884,42 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { - pmd_t *pmd; - pte_t *pte; + pud_t *pud = NULL; + pmd_t *pmd = NULL; + pte_t *pte = NULL; + bool found; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + found = stage2_get_leaf_entry(kvm, gpa, , , ); + if (!found) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return stage2_pudp_test_and_clear_young(pud); + else if (pmd) return stage2_pmdp_test_and_clear_young(pmd); - - pte = pte_offset_kernel(pmd, gpa); - if (pte_none(*pte)) - return 0; - - return stage2_ptep_test_and_clear_young(pte); + else + return stage2_ptep_test_and_clear_young(pte); } static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { - pmd_t *pmd; - pte_t *pte; + pud_t *pud = NULL; + pmd_t *pmd = NULL; + pte_t *pte = NULL; + bool found; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); - if (!pmd || pmd_none(*pmd)) /* Nothing there */ + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + found = stage2_get_leaf_entry(kvm, gpa, , , ); + if (!found) return 0; - if (pmd_thp_or_huge(*pmd)) /* THP, HugeTLB */ + if (pud) + return kvm_s2pud_young(*pud); + else if (pmd) return pmd_young(*pmd); - - pte = 
pte_offset_kernel(pmd, gpa); - if (!pte_none(*pte))/* Just a page... */ + else return pte_young(*pte); - - return 0; } int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end) -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v5 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 19 +++ arch/arm64/include/asm/kvm_mmu.h | 15 + arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 78 -- 5 files changed, 112 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8e1e8aee229e..787baf9ec994 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -77,10 +77,13 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)(pud) /* * The following kvm_*pud*() functions are provided strictly to allow @@ -97,6 +100,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} static inline bool kvm_s2pud_exec(pud_t *pud) { diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c542052fb199..dd8a23159463 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -171,13 +171,16 @@ void kvm_clear_hyp_idmap(void); 
#definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -191,6 +194,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -203,6 +212,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) +#define pud_mkhuge(pud)(__pud(pud_val(pud) & ~PUD_TABLE_BIT)) + #define 
__pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) #define __phys_to_pud_val(phys)__phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index a6d3ac9d7c7a..d8e2497e5353 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -116,6 +116,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) put_page(virt_to_page(pmd)); } +/** +
[PATCH v5 4/7] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries are used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 arch/arm64/include/asm/kvm_mmu.h | 5 +++ arch/arm64/include/asm/pgtable-hwdef.h | 2 ++ virt/kvm/arm/mmu.c | 49 +++--- 4 files changed, 57 insertions(+), 5 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index c23722f75d5c..d05c8986e495 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -96,6 +96,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) } +static inline bool kvm_s2pud_exec(pud_t *pud) +{ + BUG(); + return false; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 84051930ddfe..15bc1be8f82f 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -249,6 +249,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) return kvm_s2pte_readonly((pte_t *)pudp); } +static inline bool kvm_s2pud_exec(pud_t *pudp) +{ + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index fd208eac9f2a..10ae592b78b8 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ + 
/* * Memory Attribute override for Stage-2 (MemAttr[3:0]) */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ed8f8271c389..e73909a31e02 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1038,23 +1038,62 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache return 0; } -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +/* + * stage2_get_leaf_entry - walk the stage2 VM page tables and return + * true if a valid and present leaf-entry is found. A pointer to the + * leaf-entry is returned in the appropriate level variable - pudpp, + * pmdpp, ptepp. + */ +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, + pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp) { + pud_t *pudp; pmd_t *pmdp; pte_t *ptep; - pmdp = stage2_get_pmd(kvm, NULL, addr); + pudp = stage2_get_pud(kvm, NULL, addr); + if (!pudp || pud_none(*pudp) || !pud_present(*pudp)) + return false; + + if (pud_huge(*pudp)) { + *pudpp = pudp; + return true; + } + + pmdp = stage2_pmd_offset(pudp, addr); if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) return false; - if (pmd_thp_or_huge(*pmdp)) - return kvm_s2pmd_exec(pmdp); + if (pmd_thp_or_huge(*pmdp)) { + *pmdpp = pmdp; + return true; + } ptep = pte_offset_kernel(pmdp, addr); if (!ptep || pte_none(*ptep) || !pte_present(*ptep)) return false; - return kvm_s2pte_exec(ptep); + *ptepp = ptep; + return true; +} + +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +{ + pud_t *pudp = NULL; + pmd_t *pmdp = NULL; + pte_t *ptep = NULL; + bool found; + + found = stage2_get_leaf_entry(kvm, addr, &pudp, &pmdp, &ptep); + if (!found) + return false; + + if (pudp) + return kvm_s2pud_exec(pudp); + else if (pmdp) + return kvm_s2pmd_exec(pmdp); + else + return kvm_s2pte_exec(ptep); } static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache, -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v5 2/7] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 8 +--- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68b7c8a..d095c2d0b284 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void); int kvm_mmu_init(void); void kvm_clear_hyp_idmap(void); +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index fb9a7127bb75..689def9bb9d5 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -172,6 +172,11 @@ void kvm_clear_hyp_idmap(void); #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ea3d992e4fb7..e131b7f9b7d7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1554,8 +1554,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); if (hugetlb 
&& vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1564,7 +1566,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v5 3/7] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal Reviewed-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/kvm_mmu.h | 10 ++ virt/kvm/arm/mmu.c | 11 +++ 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d095c2d0b284..c23722f75d5c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void); #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice. 
+ */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + BUG(); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + BUG(); + return false; +} + + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 689def9bb9d5..84051930ddfe 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -239,6 +239,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } +static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index e131b7f9b7d7..ed8f8271c389 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1288,9 +1288,12 @@ static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end) do { next = stage2_pud_addr_end(addr, end); if (!stage2_pud_none(*pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(*pud)); - stage2_wp_pmds(pud, addr, next); + if (stage2_pud_huge(*pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1333,7 +1336,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. 
Called with kvm->slots_lock mutex acquired, -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v5 1/7] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 67 ++ 1 file changed, 38 insertions(+), 29 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1d90d79706bd..ea3d992e4fb7 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1422,7 +1422,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, unsigned long fault_status) { int ret; - bool write_fault, exec_fault, writable, hugetlb = false, force_pte = false; + bool write_fault, writable, hugetlb = false, force_pte = false; + bool exec_fault, needs_exec; unsigned long mmu_seq; gfn_t gfn = fault_ipa >> PAGE_SHIFT; struct kvm *kvm = vcpu->kvm; @@ -1431,7 +1432,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1451,7 +1452,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { @@ -1520,28 +1522,45 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) + if (!hugetlb && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. 
This code will need to be +* updated to support other THP sizes. +*/ hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + if (hugetlb) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); + + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); - if (hugetlb) { + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + /* +* If we took an execution fault we have made the +* icache/dcache coherent above and should now let the s2 +* mapping be executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. +*/ + needs_exec = exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa)); + + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); - - if (exec_fault) { + if (needs_exec) new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { @@ -1549,21 +1568,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - - if (exec_fault) { + if (needs_exec) new_pte = kvm_s2pte_mkexec(new_pte); - invalidate_icache_gues
[PATCH v5 0/7] KVM: Support PUD hugepages at stage 2
This series is an update to the PUD hugepage support previously posted at [0]. This patchset adds support for PUD hugepages at stage 2. This feature is useful on cores that have support for large sized TLB mappings (e.g., 1GB for 4K granule). The biggest change in this version is to replace repeated instances of walking the page tables to get to a leaf-entry with a function to return the appropriate entry. This was suggested by Suzuki and should help reduce the amount of churn resulting from future changes. It also addresses other feedback on the previous version. Support is added to code that is shared between arm and arm64. Dummy helpers for arm are provided as the port does not support PUD hugepage sizes. The patches have been tested on an A57 based system. The patchset is based on v4.18-rc4. There are a few conflicts with the support for 52 bit IPA[1] due to a change in the number of parameters for stage2_pmd_offset(). Thanks, Punit v4 -> v5: * Patch 1 - Drop helper stage2_should_exec() and refactor the condition to decide if a page table entry should be marked executable * Patch 4-6 - Introduce stage2_get_leaf_entry() and use it in this and later patches * Patch 7 - Use stage 2 accessors instead of using the page table helpers directly * Patch 7 - Add a note to update the PUD hugepage support when number of levels of stage 2 tables differs from stage 1 v3 -> v4: * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled * Patch 4-5 - Add PUD hugepage support for exec and access faults * Patch 6 - PUD hugepage support for aging page table entries v2 -> v3: * Update vma_pagesize directly if THP [1/4]. 
Previously this was done indirectly via hugetlb * Added review tag [4/4] v1 -> v2: * Create helper to check if the page should have exec permission [1/4] * Fix broken condition to detect THP hugepage [1/4] * Fix incorrect hunk resulting from a rebase [4/4] [0] https://www.spinics.net/lists/arm-kernel/msg663562.html [1] https://www.spinics.net/lists/kvm/msg171065.html Punit Agrawal (7): KVM: arm/arm64: Share common code in user_mem_abort() KVM: arm/arm64: Introduce helpers to manipulate page table entries KVM: arm64: Support dirty page tracking for PUD hugepages KVM: arm64: Support PUD hugepage in stage2_is_exec() KVM: arm64: Support handling access faults for PUD hugepages KVM: arm64: Update age handlers to support PUD hugepages KVM: arm64: Add support for creating PUD hugepages at stage 2 arch/arm/include/asm/kvm_mmu.h | 60 + arch/arm64/include/asm/kvm_mmu.h | 47 arch/arm64/include/asm/pgtable-hwdef.h | 4 + arch/arm64/include/asm/pgtable.h | 9 + virt/kvm/arm/mmu.c | 289 ++--- 5 files changed, 330 insertions(+), 79 deletions(-) -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v4 4/7] KVM: arm64: Support PUD hugepage in stage2_is_exec()
Suzuki K Poulose writes: > Hi Punit, > > On 05/07/18 15:08, Punit Agrawal wrote: >> In preparation for creating PUD hugepages at stage 2, add support for >> detecting execute permissions on PUD page table entries. Faults due to >> lack of execute permissions on page table entries is used to perform >> i-cache invalidation on first execute. >> >> Provide trivial implementations of arm32 helpers to allow sharing of >> code. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon >> --- >> arch/arm/include/asm/kvm_mmu.h | 6 ++ >> arch/arm64/include/asm/kvm_mmu.h | 5 + >> arch/arm64/include/asm/pgtable-hwdef.h | 2 ++ >> virt/kvm/arm/mmu.c | 10 +- >> 4 files changed, 22 insertions(+), 1 deletion(-) >> [...] >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index db04b18218c1..ccdea0edabb3 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1040,10 +1040,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, >> struct kvm_mmu_memory_cache >> static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) >> { >> +pud_t *pudp; >> pmd_t *pmdp; >> pte_t *ptep; >> - pmdp = stage2_get_pmd(kvm, NULL, addr); >> +pudp = stage2_get_pud(kvm, NULL, addr); >> +if (!pudp || pud_none(*pudp) || !pud_present(*pudp)) >> +return false; >> + >> +if (pud_huge(*pudp)) >> +return kvm_s2pud_exec(pudp); >> + >> +pmdp = stage2_pmd_offset(pudp, addr); >> if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) >> return false; > > I am wondering if we need a slightly better way to deal with this > kind of operation. We seem to duplicate the above operation (here and > in the following patches), i.e, finding the "leaf entry" for a given > address and follow the checks one level up at a time. We definitely need a better way to walk the page tables - for stage 2 but also stage 1 and hugetlbfs. 
As things stand, there is a lot of repetitive pattern with small differences at some levels (hugepage and/or THP, p*d_none(), p*d_present(), ...) > So instead of doing, stage2_get_pud() and walking down everywhere this > is needed, how about adding : > > /* Returns true if the leaf entry is found and updates the relevant pointer */ > found = stage2_get_leaf_entry(kvm, NULL, addr, &pudp, &pmdp, &ptep) > > which could set the appropriate entry and we could check the result > here. I prototyped with the above approach but found that it could not be used in all places due to the specific semantics of the walk. Also, then we end up with the following pattern. if (pudp) { ... } else if (pmdp) { ... } else { ... } At the end of the conversion, the resulting code is the same size as well (see diff below for changes). Another idea might be to build a page table walker passing in callbacks - but this makes more sense if we have unified modifiers for the levels. I think this is something we should explore but would like to do outside the context of this series. Hope that's ok. 
Thanks for having a look, Punit -- >8 -- diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index eddb74a7fac3..ea5c99f6dfab 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1077,31 +1077,56 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac return 0; } -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) +static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr, pud_t **pudp, + pmd_t **pmdp, pte_t **ptep) { - pud_t *pudp; - pmd_t *pmdp; - pte_t *ptep; + pud_t *lpudp; + pmd_t *lpmdp; + pte_t *lptep; - pudp = stage2_get_pud(kvm, NULL, addr); - if (!pudp || pud_none(*pudp) || !pud_present(*pudp)) + lpudp = stage2_get_pud(kvm, NULL, addr); + if (!lpudp || pud_none(*lpudp) || !pud_present(*lpudp)) return false; - if (pud_huge(*pudp)) - return kvm_s2pud_exec(pudp); + if (pud_huge(*lpudp)) { + *pudp = lpudp; + return true; + } - pmdp = stage2_pmd_offset(pudp, addr); - if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) + lpmdp = stage2_pmd_offset(lpudp, addr); + if (!lpmdp || pmd_none(*lpmdp) || !pmd_present(*lpmdp)) return false; - if (pmd_thp_or_huge(*pmdp)) - return kvm_s2pmd_exec(pmdp); + if (pmd_thp_or_huge(*lpmdp)) { + *pmdp = lpmdp; +
Re: [PATCH v4 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2
Suzuki K Poulose writes: > Hi Punit, > > On 05/07/18 15:08, Punit Agrawal wrote: >> KVM only supports PMD hugepages at stage 2. Now that the various page >> handling routines are updated, extend the stage 2 fault handling to >> map in PUD hugepages. >> >> Addition of PUD hugepage support enables additional page sizes (e.g., >> 1G with 4K granule) which can be useful on cores that support mapping >> larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> Cc: Russell King >> Cc: Catalin Marinas >> Cc: Will Deacon > >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 0c04c64e858c..5912210e94d9 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -116,6 +116,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, >> phys_addr_t addr, pmd_t *pmd) >> put_page(virt_to_page(pmd)); >> } >> +/** >> + * stage2_dissolve_pud() - clear and flush huge PUD entry >> + * @kvm:pointer to kvm structure. >> + * @addr: IPA >> + * @pud:pud pointer for IPA >> + * >> + * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs. Marks >> all >> + * pages in the range dirty. >> + */ >> +static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t >> *pud) >> +{ >> +if (!pud_huge(*pud)) >> +return; >> + >> +pud_clear(pud); > > You need to use the stage2_ accessors here. The stage2_dissolve_pmd() uses > "pmd_" helpers as the PTE entries (level 3) are always guaranteed to exist. I've fixed this and other uses of the PUD helpers to go via the stage2_ accessors. I've still not quite come to terms with the lack of certain levels at stage 2 vis-a-vis stage 1. I'll be more careful about this going forward. 
> >> +kvm_tlb_flush_vmid_ipa(kvm, addr); >> +put_page(virt_to_page(pud)); >> +} >> + >> static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, >>int min, int max) >> { >> @@ -993,7 +1012,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct >> kvm_mmu_memory_cache *cache >> pmd_t *pmd; >> pud = stage2_get_pud(kvm, cache, addr); >> -if (!pud) >> +if (!pud || pud_huge(*pud)) >> return NULL; > > Same here. > >> if (stage2_pud_none(*pud)) { > > Like this ^ > >> @@ -1038,6 +1057,26 @@ static int stage2_set_pmd_huge(struct kvm *kvm, >> struct kvm_mmu_memory_cache >> return 0; >> } >> +static int stage2_set_pud_huge(struct kvm *kvm, struct >> kvm_mmu_memory_cache *cache, >> + phys_addr_t addr, const pud_t *new_pud) >> +{ >> +pud_t *pud, old_pud; >> + >> +pud = stage2_get_pud(kvm, cache, addr); >> +VM_BUG_ON(!pud); >> + >> +old_pud = *pud; >> +if (pud_present(old_pud)) { >> +pud_clear(pud); >> +kvm_tlb_flush_vmid_ipa(kvm, addr); > > Same here. > >> +} else { >> +get_page(virt_to_page(pud)); >> +} >> + >> +kvm_set_pud(pud, *new_pud); >> +return 0; >> +} >> + >> static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) >> { >> pud_t *pudp; [...] >> @@ -1572,7 +1631,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> if (exec_fault) >> invalidate_icache_guest_page(pfn, vma_pagesize); >> - if (hugetlb && vma_pagesize == PMD_SIZE) { >> +if (hugetlb && vma_pagesize == PUD_SIZE) { > > I think we may need to check if the stage2 indeed has 3 levels of > tables to use stage2 PUD. Otherwise, fall back to PTE level mapping > or even PMD huge pages. Also, this cannot be triggered right now, > as we only get PUD hugepages with 4K and we are guaranteed to have > at least 3 levels with 40bit IPA. May be I can take care of it in > the Dynamic IPA series, when we run a guest with say 32bit IPA. > So for now, it is worth adding a comment here. Good point. I've added the following comment. 
/*
 * PUD level may not exist if the guest boots with two
 * levels at Stage 2. This configuration is currently
 * not supported due to IPA size supported by KVM.
 *
 * Revisit the assumptions about PUD levels when
 * additional IPA sizes are supported by KVM.
 */

Let me know if it looks OK to you. Thanks a lot for reviewing the patches. Punit > >> +pud_t new_pud
Re: [PATCH v4 1/7] KVM: arm/arm64: Share common code in user_mem_abort()
Marc Zyngier writes: > Hi Punit, > > On 05/07/18 15:08, Punit Agrawal wrote: >> The code for operations such as marking the pfn as dirty, and >> dcache/icache maintenance during stage 2 fault handling is duplicated >> between normal pages and PMD hugepages. >> >> Instead of creating another copy of the operations when we introduce >> PUD hugepages, let's share them across the different pagesizes. >> >> Signed-off-by: Punit Agrawal >> Cc: Christoffer Dall >> Cc: Marc Zyngier >> --- >> virt/kvm/arm/mmu.c | 68 +++--- >> 1 file changed, 40 insertions(+), 28 deletions(-) >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 1d90d79706bd..dd14cc36c51c 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1398,6 +1398,21 @@ static void invalidate_icache_guest_page(kvm_pfn_t >> pfn, unsigned long size) >> __invalidate_icache_guest_page(pfn, size); >> } >> >> +static bool stage2_should_exec(struct kvm *kvm, phys_addr_t addr, >> + bool exec_fault, unsigned long fault_status) > > I find this "should exec" very confusing. > >> +{ >> +/* >> + * If we took an execution fault we will have made the >> + * icache/dcache coherent and should now let the s2 mapping be >> + * executable. >> + * >> + * Write faults (!exec_fault && FSC_PERM) are orthogonal to >> + * execute permissions, and we preserve whatever we have. 
>> + */ >> +return exec_fault || >> +(fault_status == FSC_PERM && stage2_is_exec(kvm, addr)); >> +} >> + >> static void kvm_send_hwpoison_signal(unsigned long address, >> struct vm_area_struct *vma) >> { >> @@ -1431,7 +1446,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> kvm_pfn_t pfn; >> pgprot_t mem_type = PAGE_S2; >> bool logging_active = memslot_is_logging(memslot); >> -unsigned long flags = 0; >> +unsigned long vma_pagesize, flags = 0; >> >> write_fault = kvm_is_write_fault(vcpu); >> exec_fault = kvm_vcpu_trap_is_iabt(vcpu); >> @@ -1451,7 +1466,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> return -EFAULT; >> } >> >> -if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { >> +vma_pagesize = vma_kernel_pagesize(vma); >> +if (vma_pagesize == PMD_SIZE && !logging_active) { >> hugetlb = true; >> gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; >> } else { >> @@ -1520,28 +1536,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> if (mmu_notifier_retry(kvm, mmu_seq)) >> goto out_unlock; >> >> -if (!hugetlb && !force_pte) >> +if (!hugetlb && !force_pte) { >> +/* >> + * Only PMD_SIZE transparent hugepages(THP) are >> + * currently supported. This code will need to be >> + * updated to support other THP sizes. 
>> + */ >> hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); >> +if (hugetlb) >> +vma_pagesize = PMD_SIZE; >> +} >> + >> +if (writable) >> +kvm_set_pfn_dirty(pfn); >> -if (hugetlb) { >> +if (fault_status != FSC_PERM) >> +clean_dcache_guest_page(pfn, vma_pagesize); >> + >> +if (exec_fault) >> +invalidate_icache_guest_page(pfn, vma_pagesize); >> + >> +if (hugetlb && vma_pagesize == PMD_SIZE) { >> pmd_t new_pmd = pfn_pmd(pfn, mem_type); >> new_pmd = pmd_mkhuge(new_pmd); >> -if (writable) { >> +if (writable) >> new_pmd = kvm_s2pmd_mkwrite(new_pmd); >> -kvm_set_pfn_dirty(pfn); >> -} >> >> -if (fault_status != FSC_PERM) >> -clean_dcache_guest_page(pfn, PMD_SIZE); >> - >> -if (exec_fault) { >> +if (stage2_should_exec(kvm, fault_ipa, exec_fault, >> fault_status)) >> new_pmd = kvm_s2pmd_mkexec(new_pmd); > > OK, I find this absolutely horrid... ;-) > > The rest of the function deals with discrete flags, and all of a sudden > we have a function call with a bunch of seemingly unrelated parameters. > And you are repeating it for each vma_pagesize... > > How about something like: > > bool needs_exec; > > [...] > > needs_exec = exec_fault || (fault_status == FSC_PERM && > stage2_is_exec(kvm, fault_ipa)); > > And then you just check needs_exec to update the pte/pmd. And you drop > this helper. That does look a lot better. I'll roll the change into the next version. Thanks, Punit [...] ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
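Marc's suggested needs_exec expression is easy to sanity-check in isolation. The sketch below restates it as a pure function; mapping_was_exec stands in for the stage2_is_exec() table walk, and FSC_PERM uses the arm64 fault-status encoding. This is an illustration of the predicate, not kernel code.

```c
#include <stdbool.h>

/* Permission-fault status code, per the arm64 ESR encoding. */
#define FSC_PERM 0x0c

/*
 * Restatement of the suggested needs_exec computation.
 * mapping_was_exec models stage2_is_exec(kvm, fault_ipa), which walks
 * the stage 2 tables in the real code.
 */
static bool needs_exec(bool exec_fault, unsigned long fault_status,
		       bool mapping_was_exec)
{
	return exec_fault ||
	       (fault_status == FSC_PERM && mapping_was_exec);
}
```

The point of the refactor is visible here: a write permission fault preserves an existing executable mapping but never grants execute permission on its own.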
[PATCH v4 6/7] KVM: arm64: Update age handlers to support PUD hugepages
In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 ++ arch/arm64/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/pgtable.h | 1 + virt/kvm/arm/mmu.c | 29 + 4 files changed, 37 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index a4298d429efc..8e1e8aee229e 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -110,6 +110,12 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud; } +static inline bool kvm_s2pud_young(pud_t pud) +{ + BUG(); + return false; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 4d2780c588b0..c542052fb199 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -261,6 +261,11 @@ static inline pud_t kvm_s2pud_mkyoung(pud_t pud) return pud_mkyoung(pud); } +static inline bool kvm_s2pud_young(pud_t pud) +{ + return pud_young(pud); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index a64a5c35beb1..4d9476e420d9 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -385,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_young(pud) pte_young(pud_pte(pud)) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) diff --git 
a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 94a91bcdd152..0c04c64e858c 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1141,6 +1141,11 @@ static int stage2_pmdp_test_and_clear_young(pmd_t *pmd) return stage2_ptep_test_and_clear_young((pte_t *)pmd); } +static int stage2_pudp_test_and_clear_young(pud_t *pud) +{ + return stage2_ptep_test_and_clear_young((pte_t *)pud); +} + /** * kvm_phys_addr_ioremap - map a device range to guest IPA * @@ -1860,11 +1865,19 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte) static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + pud = stage2_get_pud(kvm, NULL, gpa); + if (!pud || pud_none(*pud)) /* Nothing there */ + return 0; + + if (pud_huge(*pud)) /* HugeTLB */ + return stage2_pudp_test_and_clear_young(pud); + + pmd = stage2_pmd_offset(pud, gpa); if (!pmd || pmd_none(*pmd)) /* Nothing there */ return 0; @@ -1880,11 +1893,19 @@ static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data) { + pud_t *pud; pmd_t *pmd; pte_t *pte; - WARN_ON(size != PAGE_SIZE && size != PMD_SIZE); - pmd = stage2_get_pmd(kvm, NULL, gpa); + WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE); + pud = stage2_get_pud(kvm, NULL, gpa); + if (!pud || pud_none(*pud)) /* Nothing there */ + return 0; + + if (pud_huge(*pud)) /* HugeTLB */ + return kvm_s2pud_young(*pud); + + pmd = stage2_pmd_offset(pud, gpa); if (!pmd || pmd_none(*pmd)) /* Nothing there */ return 0; -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v4 5/7] KVM: arm64: Support handling access faults for PUD hugepages
In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 8 arch/arm64/include/asm/kvm_mmu.h | 7 +++ arch/arm64/include/asm/pgtable.h | 6 ++ virt/kvm/arm/mmu.c | 14 +- 4 files changed, 34 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d05c8986e495..a4298d429efc 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -78,6 +78,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) /* @@ -102,6 +104,12 @@ static inline bool kvm_s2pud_exec(pud_t *pud) return false; } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + BUG(); + return pud; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 15bc1be8f82f..4d2780c588b0 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -175,6 +175,8 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pud_pfn(pud) pud_pfn(pud) + #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) @@ -254,6 +256,11 @@ static inline bool kvm_s2pud_exec(pud_t *pudp) return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); } +static inline pud_t kvm_s2pud_mkyoung(pud_t pud) +{ + return pud_mkyoung(pud); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = 
virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 1bdeca8918a6..a64a5c35beb1 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -314,6 +314,11 @@ static inline pte_t pud_pte(pud_t pud) return __pte(pud_val(pud)); } +static inline pud_t pte_pud(pte_t pte) +{ + return __pud(pte_val(pte)); +} + static inline pmd_t pud_pmd(pud_t pud) { return __pmd(pud_val(pud)); @@ -380,6 +385,7 @@ static inline int pmd_protnone(pmd_t pmd) #define pfn_pmd(pfn,prot) __pmd(__phys_to_pmd_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot)) #define mk_pmd(page,prot) pfn_pmd(page_to_pfn(page),prot) +#define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index ccdea0edabb3..94a91bcdd152 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1609,6 +1609,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, */ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) { + pud_t *pud; pmd_t *pmd; pte_t *pte; kvm_pfn_t pfn; @@ -1618,7 +1619,18 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa) spin_lock(&vcpu->kvm->mmu_lock); - pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa); + pud = stage2_get_pud(vcpu->kvm, NULL, fault_ipa); + if (!pud || pud_none(*pud)) + goto out; /* Nothing there */ + + if (pud_huge(*pud)) { /* HugeTLB */ + *pud = kvm_s2pud_mkyoung(*pud); + pfn = kvm_pud_pfn(*pud); + pfn_valid = true; + goto out; + } + + pmd = stage2_pmd_offset(pud, fault_ipa); if (!pmd || pmd_none(*pmd)) /* Nothing there */ goto out; -- 2.17.1
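The arm32 stub of kvm_pud_pfn() open-codes the mask-and-shift that arm64 gets from pud_pfn(). The standalone sketch below reproduces that arithmetic on a raw descriptor value, with assumed 4K-granule constants (PAGE_SHIFT 12, PUD_SHIFT 30, 48-bit physical mask); the real values come from pgtable-hwdef.h and this is illustrative only.

```c
#include <stdint.h>

/* Assumed 4K-granule values; taken from pgtable-hwdef.h in the kernel. */
#define PAGE_SHIFT 12
#define PUD_SHIFT  30
#define PUD_MASK   (~((1ULL << PUD_SHIFT) - 1))
#define PHYS_MASK  ((1ULL << 48) - 1)

/* Mirrors the kvm_pud_pfn() arithmetic on the access-fault path: mask
 * the descriptor down to its 1G-aligned output address, drop any bits
 * above the physical address range, then shift to a page frame number. */
static uint64_t pud_to_pfn(uint64_t pud_desc)
{
	return ((pud_desc & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT;
}
```

Masking with PUD_MASK is what strips the low attribute bits of the block descriptor before the conversion, so a 1G block starting at 0x40000000 yields pfn 0x40000 regardless of its permission bits.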
[PATCH v4 7/7] KVM: arm64: Add support for creating PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 19 +++ arch/arm64/include/asm/kvm_mmu.h | 15 + arch/arm64/include/asm/pgtable-hwdef.h | 2 + arch/arm64/include/asm/pgtable.h | 2 + virt/kvm/arm/mmu.c | 78 -- 5 files changed, 112 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8e1e8aee229e..787baf9ec994 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -77,10 +77,13 @@ void kvm_clear_hyp_idmap(void); #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) (__pud(0)) #define kvm_pud_pfn(pud) (((pud_val(pud) & PUD_MASK) & PHYS_MASK) >> PAGE_SHIFT) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* No support for pud hugepages */ +#define kvm_pud_mkhuge(pud)(pud) /* * The following kvm_*pud*() functionas are provided strictly to allow @@ -97,6 +100,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) return false; } +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) +{ + BUG(); +} + +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + BUG(); + return pud; +} + +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + BUG(); + return pud; +} static inline bool kvm_s2pud_exec(pud_t *pud) { diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index c542052fb199..dd8a23159463 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -171,13 +171,16 @@ void kvm_clear_hyp_idmap(void); 
#definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) #define kvm_pud_pfn(pud) pud_pfn(pud) #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +#define kvm_pud_mkhuge(pud)pud_mkhuge(pud) static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { @@ -191,6 +194,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) +{ + pud_val(pud) |= PUD_S2_RDWR; + return pud; +} + static inline pte_t kvm_s2pte_mkexec(pte_t pte) { pte_val(pte) &= ~PTE_S2_XN; @@ -203,6 +212,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) return pmd; } +static inline pud_t kvm_s2pud_mkexec(pud_t pud) +{ + pud_val(pud) &= ~PUD_S2_XN; + return pud; +} + static inline void kvm_set_s2pte_readonly(pte_t *ptep) { pteval_t old_pteval, pteval; diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index 10ae592b78b8..e327665e94d1 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* HAP[2:1] */ +#define PUD_S2_RDWR(_AT(pudval_t, 3) << 6) /* HAP[2:1] */ #define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ /* diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index 4d9476e420d9..0afc34f94ff5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -389,6 +389,8 @@ static inline int pmd_protnone(pmd_t pmd) #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud))) #define pud_write(pud) pte_write(pud_pte(pud)) +#define pud_mkhuge(pud)(__pud(pud_val(pud) & ~PUD_TABLE_BIT)) + #define 
__pud_to_phys(pud) __pte_to_phys(pud_pte(pud)) #define __phys_to_pud_val(phys)__phys_to_pte_val(phys) #define pud_pfn(pud) ((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 0c04c64e858c..5912210e94d9 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -116,6 +116,25 @@ static void stage2_dissolve_pmd(struct kvm *kvm, phys_addr_t addr, pmd_t *pmd) put_page(virt_to_page(pmd)); } +/** +
[PATCH v4 4/7] KVM: arm64: Support PUD hugepage in stage2_is_exec()
In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 6 ++ arch/arm64/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/pgtable-hwdef.h | 2 ++ virt/kvm/arm/mmu.c | 10 +- 4 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index c23722f75d5c..d05c8986e495 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -96,6 +96,12 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) } +static inline bool kvm_s2pud_exec(pud_t *pud) +{ + BUG(); + return false; +} + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 84051930ddfe..15bc1be8f82f 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -249,6 +249,11 @@ static inline bool kvm_s2pud_readonly(pud_t *pudp) return kvm_s2pte_readonly((pte_t *)pudp); } +static inline bool kvm_s2pud_exec(pud_t *pudp) +{ + return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h index fd208eac9f2a..10ae592b78b8 100644 --- a/arch/arm64/include/asm/pgtable-hwdef.h +++ b/arch/arm64/include/asm/pgtable-hwdef.h @@ -193,6 +193,8 @@ #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ +#define PUD_S2_XN (_AT(pudval_t, 2) << 53) /* XN[1:0] */ + /* * 
Memory Attribute override for Stage-2 (MemAttr[3:0]) */ diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index db04b18218c1..ccdea0edabb3 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1040,10 +1040,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr) { + pud_t *pudp; pmd_t *pmdp; pte_t *ptep; - pmdp = stage2_get_pmd(kvm, NULL, addr); + pudp = stage2_get_pud(kvm, NULL, addr); + if (!pudp || pud_none(*pudp) || !pud_present(*pudp)) + return false; + + if (pud_huge(*pudp)) + return kvm_s2pud_exec(pudp); + + pmdp = stage2_pmd_offset(pudp, addr); if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp)) return false; -- 2.17.1 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
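The execute check added by this patch reduces to testing one field of the raw descriptor. Below is a standalone restatement of kvm_s2pud_exec() on a plain 64-bit value, using the PUD_S2_XN encoding from the hunk above (value 2 in the XN[1:0] field at bits 54:53 marks the mapping non-executable); the in-kernel helper reads the live entry with READ_ONCE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stage 2 execute-never field, matching the PUD_S2_XN definition added
 * by the patch: XN[1:0] at bits 54:53, value 2 = not executable. */
#define PUD_S2_XN (2ULL << 53)

/* Mirrors kvm_s2pud_exec(): the mapping is executable iff XN is clear. */
static bool s2pud_exec(uint64_t pud_desc)
{
	return !(pud_desc & PUD_S2_XN);
}
```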
[PATCH v4 3/7] KVM: arm64: Support dirty page tracking for PUD hugepages
In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: Punit Agrawal Reviewed-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 16 arch/arm64/include/asm/kvm_mmu.h | 10 ++ virt/kvm/arm/mmu.c | 11 +++ 3 files changed, 33 insertions(+), 4 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index d095c2d0b284..c23722f75d5c 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -80,6 +80,22 @@ void kvm_clear_hyp_idmap(void); #define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) +/* + * The following kvm_*pud*() functions are provided strictly to allow + * sharing code with arm64. They should never be called in practice.
+ */ +static inline void kvm_set_s2pud_readonly(pud_t *pud) +{ + BUG(); +} + +static inline bool kvm_s2pud_readonly(pud_t *pud) +{ + BUG(); + return false; +} + + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index 689def9bb9d5..84051930ddfe 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -239,6 +239,16 @@ static inline bool kvm_s2pmd_exec(pmd_t *pmdp) return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN); } +static inline void kvm_set_s2pud_readonly(pud_t *pudp) +{ + kvm_set_s2pte_readonly((pte_t *)pudp); +} + +static inline bool kvm_s2pud_readonly(pud_t *pudp) +{ + return kvm_s2pte_readonly((pte_t *)pudp); +} + static inline bool kvm_page_empty(void *ptr) { struct page *ptr_page = virt_to_page(ptr); diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 040cd0bce5e1..db04b18218c1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1288,9 +1288,12 @@ static void stage2_wp_puds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end) do { next = stage2_pud_addr_end(addr, end); if (!stage2_pud_none(*pud)) { - /* TODO:PUD not supported, revisit later if supported */ - BUG_ON(stage2_pud_huge(*pud)); - stage2_wp_pmds(pud, addr, next); + if (stage2_pud_huge(*pud)) { + if (!kvm_s2pud_readonly(pud)) + kvm_set_s2pud_readonly(pud); + } else { + stage2_wp_pmds(pud, addr, next); + } } } while (pud++, addr = next, addr != end); } @@ -1333,7 +1336,7 @@ static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end) * * Called to start logging dirty pages after memory region * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns - * all present PMD and PTEs are write protected in the memory region. + * all present PUD, PMD and PTEs are write protected in the memory region. * Afterwards read of dirty page log can be called. * * Acquires kvm_mmu_lock. 
Called with kvm->slots_lock mutex acquired, -- 2.17.1
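The read-only test and update used by stage2_wp_puds() both operate on the stage 2 HAP[2:1] access-permission field. A standalone model of kvm_s2pud_readonly() and the effect of kvm_set_s2pud_readonly() on raw descriptor values, using the HAP encodings visible in pgtable-hwdef.h (01 = read-only, 11 = read/write); the real setter modifies the live entry, with atomic updates where dirty-logging races are possible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stage 2 HAP[2:1] encodings at bits 7:6: 01 = read-only, 11 = r/w. */
#define PUD_S2_RDONLY (1ULL << 6)
#define PUD_S2_RDWR   (3ULL << 6)

/* Mirrors kvm_s2pud_readonly(): read-only iff only the read bit of
 * HAP[2:1] is set. */
static bool s2pud_readonly(uint64_t pud_desc)
{
	return (pud_desc & PUD_S2_RDWR) == PUD_S2_RDONLY;
}

/* Models kvm_set_s2pud_readonly() as a pure value transform: clear the
 * write permission while keeping read access. */
static uint64_t s2pud_mkreadonly(uint64_t pud_desc)
{
	return (pud_desc & ~PUD_S2_RDWR) | PUD_S2_RDONLY;
}
```

This is why the hunk above can write-protect a huge PUD directly instead of descending to PMDs: one field flip on the block entry covers the whole 1G range for dirty tracking.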
[PATCH v4 2/7] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: Punit Agrawal Acked-by: Christoffer Dall Cc: Marc Zyngier Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon --- arch/arm/include/asm/kvm_mmu.h | 5 + arch/arm64/include/asm/kvm_mmu.h | 5 + virt/kvm/arm/mmu.c | 8 +--- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h index 8553d68b7c8a..d095c2d0b284 100644 --- a/arch/arm/include/asm/kvm_mmu.h +++ b/arch/arm/include/asm/kvm_mmu.h @@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void); int kvm_mmu_init(void); void kvm_clear_hyp_idmap(void); +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) { *pmd = new_pmd; diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h index fb9a7127bb75..689def9bb9d5 100644 --- a/arch/arm64/include/asm/kvm_mmu.h +++ b/arch/arm64/include/asm/kvm_mmu.h @@ -172,6 +172,11 @@ void kvm_clear_hyp_idmap(void); #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) +#define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) + +#define kvm_pmd_mkhuge(pmd)pmd_mkhuge(pmd) + static inline pte_t kvm_s2pte_mkwrite(pte_t pte) { pte_val(pte) |= PTE_S2_RDWR; diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index dd14cc36c51c..040cd0bce5e1 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1557,8 +1557,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, invalidate_icache_guest_page(pfn, vma_pagesize); if (hugetlb && 
vma_pagesize == PMD_SIZE) { - pmd_t new_pmd = pfn_pmd(pfn, mem_type); - new_pmd = pmd_mkhuge(new_pmd); + pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); + + new_pmd = kvm_pmd_mkhuge(new_pmd); + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); @@ -1567,7 +1569,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { - pte_t new_pte = pfn_pte(pfn, mem_type); + pte_t new_pte = kvm_pfn_pte(pfn, mem_type); if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); -- 2.17.1
[PATCH v4 1/7] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: Punit Agrawal Cc: Christoffer Dall Cc: Marc Zyngier --- virt/kvm/arm/mmu.c | 68 +++--- 1 file changed, 40 insertions(+), 28 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index 1d90d79706bd..dd14cc36c51c 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -1398,6 +1398,21 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size) __invalidate_icache_guest_page(pfn, size); } +static bool stage2_should_exec(struct kvm *kvm, phys_addr_t addr, + bool exec_fault, unsigned long fault_status) +{ + /* +* If we took an execution fault we will have made the +* icache/dcache coherent and should now let the s2 mapping be +* executable. +* +* Write faults (!exec_fault && FSC_PERM) are orthogonal to +* execute permissions, and we preserve whatever we have. 
+*/ + return exec_fault || + (fault_status == FSC_PERM && stage2_is_exec(kvm, addr)); +} + static void kvm_send_hwpoison_signal(unsigned long address, struct vm_area_struct *vma) { @@ -1431,7 +1446,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, kvm_pfn_t pfn; pgprot_t mem_type = PAGE_S2; bool logging_active = memslot_is_logging(memslot); - unsigned long flags = 0; + unsigned long vma_pagesize, flags = 0; write_fault = kvm_is_write_fault(vcpu); exec_fault = kvm_vcpu_trap_is_iabt(vcpu); @@ -1451,7 +1466,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, return -EFAULT; } - if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) { + vma_pagesize = vma_kernel_pagesize(vma); + if (vma_pagesize == PMD_SIZE && !logging_active) { hugetlb = true; gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT; } else { @@ -1520,28 +1536,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (mmu_notifier_retry(kvm, mmu_seq)) goto out_unlock; - if (!hugetlb && !force_pte) + if (!hugetlb && !force_pte) { + /* +* Only PMD_SIZE transparent hugepages(THP) are +* currently supported. This code will need to be +* updated to support other THP sizes. 
+*/ hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa); + if (hugetlb) + vma_pagesize = PMD_SIZE; + } + + if (writable) + kvm_set_pfn_dirty(pfn); - if (hugetlb) { + if (fault_status != FSC_PERM) + clean_dcache_guest_page(pfn, vma_pagesize); + + if (exec_fault) + invalidate_icache_guest_page(pfn, vma_pagesize); + + if (hugetlb && vma_pagesize == PMD_SIZE) { pmd_t new_pmd = pfn_pmd(pfn, mem_type); new_pmd = pmd_mkhuge(new_pmd); - if (writable) { + if (writable) new_pmd = kvm_s2pmd_mkwrite(new_pmd); - kvm_set_pfn_dirty(pfn); - } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PMD_SIZE); - - if (exec_fault) { + if (stage2_should_exec(kvm, fault_ipa, exec_fault, fault_status)) new_pmd = kvm_s2pmd_mkexec(new_pmd); - invalidate_icache_guest_page(pfn, PMD_SIZE); - } else if (fault_status == FSC_PERM) { - /* Preserve execute if XN was already cleared */ - if (stage2_is_exec(kvm, fault_ipa)) - new_pmd = kvm_s2pmd_mkexec(new_pmd); - } ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd); } else { @@ -1549,21 +1571,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, if (writable) { new_pte = kvm_s2pte_mkwrite(new_pte); - kvm_set_pfn_dirty(pfn); mark_page_dirty(kvm, gfn); } - if (fault_status != FSC_PERM) - clean_dcache_guest_page(pfn, PAGE_SIZE); - - if (exec_fault) { + if (stage2_should_exec(kvm, fault_ipa, exec_fault, fault_status)) new_pte = kvm_s2pte_mkexec(new_pte); -
[PATCH v4 0/7] KVM: Support PUD hugepages at stage 2
This series is an update to the PUD hugepage support previously posted at [0][1][2][3]. This patchset adds support for PUD hugepages at stage 2. This feature is useful on cores that have support for large sized TLB mappings (e.g., 1GB for 4K granule). There are three new patches to support PUD hugepages for execute permission faults, access faults and handling aging of PUD page table entries (patches 4-6). This addresses Suzuki's feedback on the previous version. Also, live migration didn't work with earlier versions. This has now been addressed by updating patches 1 & 7 to ensure that hugepages are dissolved correctly when dirty logging is enabled. Support is added to code that is shared between arm and arm64. Dummy helpers for arm are provided as the port does not support PUD hugepage sizes. The patches have been tested on an A57 based system. The patchset is based on v4.18-rc3. There are a few conflicts with the support for 52 bit IPA[4] due to the change in the number of parameters for stage2_pmd_offset(). Thanks, Punit v3 -> v4: * Patch 1 and 7 - Don't put down hugepages pte if logging is enabled * Patch 4-5 - Add PUD hugepage support for exec and access faults * Patch 6 - PUD hugepage support for aging page table entries v2 -> v3: * Update vma_pagesize directly if THP [1/4].
Previously this was done indirectly via hugetlb * Added review tag [4/4] v1 -> v2: * Create helper to check if the page should have exec permission [1/4] * Fix broken condition to detect THP hugepage [1/4] * Fix incorrect hunk resulting from a rebase [4/4] [0] https://lkml.org/lkml/2018/5/14/907 [1] https://www.spinics.net/lists/arm-kernel/msg628053.html [2] https://lkml.org/lkml/2018/4/20/566 [3] https://lkml.org/lkml/2018/5/1/133 [4] https://www.spinics.net/lists/kvm/msg171065.html Punit Agrawal (7): KVM: arm/arm64: Share common code in user_mem_abort() KVM: arm/arm64: Introduce helpers to manipulate page table entries KVM: arm64: Support dirty page tracking for PUD hugepages KVM: arm64: Support PUD hugepage in stage2_is_exec() KVM: arm64: Support handling access faults for PUD hugepages KVM: arm64: Update age handlers to support PUD hugepages KVM: arm64: Add support for creating PUD hugepages at stage 2 arch/arm/include/asm/kvm_mmu.h | 60 +++ arch/arm64/include/asm/kvm_mmu.h | 47 ++ arch/arm64/include/asm/pgtable-hwdef.h | 4 + arch/arm64/include/asm/pgtable.h | 9 ++ virt/kvm/arm/mmu.c | 214 - 5 files changed, 291 insertions(+), 43 deletions(-) -- 2.17.1
Re: [PATCH v3 4/4] KVM: arm64: Add support for PUD hugepages at stage 2
Suzuki K Poulose <suzuki.poul...@arm.com> writes: > On 05/14/2018 03:43 PM, Punit Agrawal wrote: >> KVM only supports PMD hugepages at stage 2. Extend the stage 2 fault >> handling to add support for PUD hugepages. >> >> Addition of pud hugepage support enables additional hugepage >> sizes (e.g., 1G with 4K granule) which can be useful on cores that >> support mapping larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal <punit.agra...@arm.com> >> Reviewed-by: Christoffer Dall <christoffer.d...@arm.com> >> Cc: Marc Zyngier <marc.zyng...@arm.com> >> Cc: Russell King <li...@armlinux.org.uk> >> Cc: Catalin Marinas <catalin.mari...@arm.com> >> Cc: Will Deacon <will.dea...@arm.com> >> --- >> arch/arm/include/asm/kvm_mmu.h | 19 >> arch/arm64/include/asm/kvm_mmu.h | 15 ++ >> arch/arm64/include/asm/pgtable-hwdef.h | 4 +++ >> arch/arm64/include/asm/pgtable.h | 2 ++ >> virt/kvm/arm/mmu.c | 40 -- >> 5 files changed, 77 insertions(+), 3 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index 224c22c0a69c..155916dbdd7e 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -77,8 +77,11 @@ void kvm_clear_hyp_idmap(void); >> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) >> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) >> +#define kvm_pfn_pud(pfn, prot) (__pud(0)) >> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) >> +/* No support for pud hugepages */ >> +#define kvm_pud_mkhuge(pud) (pud) >> /* >>* The following kvm_*pud*() functionas are provided strictly to allow >> @@ -95,6 +98,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud) >> return false; >> } >> +static inline void kvm_set_pud(pud_t *pud, pud_t new_pud) >> +{ >> +BUG(); >> +} >> + >> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) >> +{ >> +BUG(); >> +return pud; >> +} >> + >> +static inline pud_t kvm_s2pud_mkexec(pud_t pud) >> +{ >> +BUG(); >> +return pud; >> +} >> static inline void kvm_set_pmd(pmd_t 
*pmd, pmd_t new_pmd) >> { >> diff --git a/arch/arm64/include/asm/kvm_mmu.h >> b/arch/arm64/include/asm/kvm_mmu.h >> index f440cf216a23..f49a68fcbf26 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -172,11 +172,14 @@ void kvm_clear_hyp_idmap(void); >> #define kvm_set_pte(ptep, pte) set_pte(ptep, pte) >> #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) >> +#define kvm_set_pud(pudp, pud) set_pud(pudp, pud) >> #define kvm_pfn_pte(pfn, prot) pfn_pte(pfn, prot) >> #define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) >> +#define kvm_pfn_pud(pfn, prot) pfn_pud(pfn, prot) >> #define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) >> +#define kvm_pud_mkhuge(pud) pud_mkhuge(pud) >> static inline pte_t kvm_s2pte_mkwrite(pte_t pte) >> { >> @@ -190,6 +193,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd) >> return pmd; >> } >> +static inline pud_t kvm_s2pud_mkwrite(pud_t pud) >> +{ >> +pud_val(pud) |= PUD_S2_RDWR; >> +return pud; >> +} >> + >> static inline pte_t kvm_s2pte_mkexec(pte_t pte) >> { >> pte_val(pte) &= ~PTE_S2_XN; >> @@ -202,6 +211,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd) >> return pmd; >> } >> +static inline pud_t kvm_s2pud_mkexec(pud_t pud) >> +{ >> +pud_val(pud) &= ~PUD_S2_XN; >> +return pud; >> +} >> + >> static inline void kvm_set_s2pte_readonly(pte_t *ptep) >> { >> pteval_t old_pteval, pteval; >> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h >> b/arch/arm64/include/asm/pgtable-hwdef.h >> index fd208eac9f2a..e327665e94d1 100644 >> --- a/arch/arm64/include/asm/pgtable-hwdef.h >> +++ b/arch/arm64/include/asm/pgtable-hwdef.h >> @@ -193,6 +193,10 @@ >> #define PMD_S2_RDWR(_AT(pmdval_t, 3) << 6) /* HAP[2:1] */ >> #define PMD_S2_XN (_AT(pmdval_t, 2) << 53) /* XN[1:0] */ >> +#define PUD_S2_RDONLY (_AT(pudval_t, 1) << 6) /* >> HAP[2:1] */ >> +#define PUD_S2_RDWR (_AT(pudval_t, 3) << 6) /* HAP[2:1
Re: [PATCH v2 4/4] KVM: arm64: Add support for PUD hugepages at stage 2
Catalin Marinas <catalin.mari...@arm.com> writes: > On Tue, May 01, 2018 at 11:26:59AM +0100, Punit Agrawal wrote: >> KVM currently supports PMD hugepages at stage 2. Extend the stage 2 >> fault handling to add support for PUD hugepages. >> >> Addition of pud hugepage support enables additional hugepage >> sizes (e.g., 1G with 4K granule) which can be useful on cores that >> support mapping larger block sizes in the TLB entries. >> >> Signed-off-by: Punit Agrawal <punit.agra...@arm.com> >> Cc: Christoffer Dall <christoffer.d...@arm.com> >> Cc: Marc Zyngier <marc.zyng...@arm.com> >> Cc: Russell King <li...@armlinux.org.uk> >> Cc: Catalin Marinas <catalin.mari...@arm.com> >> Cc: Will Deacon <will.dea...@arm.com> >> --- >> arch/arm/include/asm/kvm_mmu.h | 19 >> arch/arm64/include/asm/kvm_mmu.h | 15 ++ >> arch/arm64/include/asm/pgtable-hwdef.h | 4 +++ >> arch/arm64/include/asm/pgtable.h | 2 ++ >> virt/kvm/arm/mmu.c | 40 -- >> 5 files changed, 77 insertions(+), 3 deletions(-) > > Since this patch touches a couple of core arm64 files: > > Acked-by: Catalin Marinas <catalin.mari...@arm.com> Thanks Catalin. I've posted a v3 with minor changes yesterday[0]. Can you comment there? Or maybe Marc can apply the tag while merging the patches. [0] https://lkml.org/lkml/2018/5/14/912 ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
[PATCH v3 4/4] KVM: arm64: Add support for PUD hugepages at stage 2
KVM only supports PMD hugepages at stage 2. Extend the stage 2 fault
handling to add support for PUD hugepages.

Addition of pud hugepage support enables additional hugepage
sizes (e.g., 1G with 4K granule) which can be useful on cores that
support mapping larger block sizes in the TLB entries.

Signed-off-by: Punit Agrawal <punit.agra...@arm.com>
Reviewed-by: Christoffer Dall <christoffer.d...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h         | 19 
 arch/arm64/include/asm/kvm_mmu.h       | 15 ++
 arch/arm64/include/asm/pgtable-hwdef.h |  4 +++
 arch/arm64/include/asm/pgtable.h       |  2 ++
 virt/kvm/arm/mmu.c                     | 40 --
 5 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 224c22c0a69c..155916dbdd7e 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -77,8 +77,11 @@ void kvm_clear_hyp_idmap(void);
 #define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot)	(__pud(0))
 
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
+/* No support for pud hugepages */
+#define kvm_pud_mkhuge(pud)	(pud)
 
 /*
  * The following kvm_*pud*() functions are provided strictly to allow
@@ -95,6 +98,22 @@ static inline bool kvm_s2pud_readonly(pud_t *pud)
 	return false;
 }
 
+static inline void kvm_set_pud(pud_t *pud, pud_t new_pud)
+{
+	BUG();
+}
+
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+	BUG();
+	return pud;
+}
+
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+	BUG();
+	return pud;
+}
 
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index f440cf216a23..f49a68fcbf26 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -172,11 +172,14 @@ void kvm_clear_hyp_idmap(void);
 #define	kvm_set_pte(ptep, pte)	set_pte(ptep, pte)
 #define	kvm_set_pmd(pmdp, pmd)	set_pmd(pmdp, pmd)
+#define kvm_set_pud(pudp, pud)	set_pud(pudp, pud)
 
 #define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
 #define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
+#define kvm_pfn_pud(pfn, prot)	pfn_pud(pfn, prot)
 
 #define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
+#define kvm_pud_mkhuge(pud)	pud_mkhuge(pud)
 
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
@@ -190,6 +193,12 @@ static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
 	return pmd;
 }
 
+static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
+{
+	pud_val(pud) |= PUD_S2_RDWR;
+	return pud;
+}
+
 static inline pte_t kvm_s2pte_mkexec(pte_t pte)
 {
 	pte_val(pte) &= ~PTE_S2_XN;
@@ -202,6 +211,12 @@ static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
 	return pmd;
 }
 
+static inline pud_t kvm_s2pud_mkexec(pud_t pud)
+{
+	pud_val(pud) &= ~PUD_S2_XN;
+	return pud;
+}
+
 static inline void kvm_set_s2pte_readonly(pte_t *ptep)
 {
 	pteval_t old_pteval, pteval;
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index fd208eac9f2a..e327665e94d1 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -193,6 +193,10 @@
 #define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
 #define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
 
+#define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
+#define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
+#define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */
+
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7c4c8f318ba9..31ea9fda07e3 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -386,6 +386,8 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pud_write(pud)		pte_write(pud_pte(pud))
 
+#define pud_mkhuge(pud)		(__pud(pud_val(pud) & ~PUD_TABLE_BIT))
+
 #define __pud_to_phys(pud)	__pte_to_phys(pud_pte(pud))
 #define __phys_to_pud_val(phys)	__phys_to_pte_val(phys)
 #define pud_pfn(pud)		((__pud_to_phys(pud) & PUD_MASK) >> PAGE_SHIFT)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 671d3c0825f2..b0931fa2d64e 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1036,6 +1036,26 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	return 0;
[PATCH v3 1/4] KVM: arm/arm64: Share common code in user_mem_abort()
The code for operations such as marking the pfn as dirty, and
dcache/icache maintenance during stage 2 fault handling is duplicated
between normal pages and PMD hugepages.

Instead of creating another copy of the operations when we introduce
PUD hugepages, let's share them across the different pagesizes.

Signed-off-by: Punit Agrawal <punit.agra...@arm.com>
Reviewed-by: Christoffer Dall <christoffer.d...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
---
 virt/kvm/arm/mmu.c | 69 +++---
 1 file changed, 40 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 7f6a944db23d..07ae1e003762 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1396,6 +1396,21 @@ static void invalidate_icache_guest_page(kvm_pfn_t pfn, unsigned long size)
 	__invalidate_icache_guest_page(pfn, size);
 }
 
+static bool stage2_should_exec(struct kvm *kvm, phys_addr_t addr,
+			       bool exec_fault, unsigned long fault_status)
+{
+	/*
+	 * If we took an execution fault we will have made the
+	 * icache/dcache coherent and should now let the s2 mapping be
+	 * executable.
+	 *
+	 * Write faults (!exec_fault && FSC_PERM) are orthogonal to
+	 * execute permissions, and we preserve whatever we have.
+	 */
+	return exec_fault ||
+		(fault_status == FSC_PERM && stage2_is_exec(kvm, addr));
+}
+
 static void kvm_send_hwpoison_signal(unsigned long address,
 				     struct vm_area_struct *vma)
 {
@@ -1428,7 +1443,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	kvm_pfn_t pfn;
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long flags = 0;
+	unsigned long vma_pagesize, flags = 0;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1448,7 +1463,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 	}
 
-	if (vma_kernel_pagesize(vma) == PMD_SIZE && !logging_active) {
+	vma_pagesize = vma_kernel_pagesize(vma);
+	if (vma_pagesize == PMD_SIZE && !logging_active) {
 		hugetlb = true;
 		gfn = (fault_ipa & PMD_MASK) >> PAGE_SHIFT;
 	} else {
@@ -1517,28 +1533,33 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
 
-	if (!hugetlb && !force_pte)
-		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
+	if (!hugetlb && !force_pte) {
+		/*
+		 * Only PMD_SIZE transparent hugepages(THP) are
+		 * currently supported. This code will need to be
+		 * updated to support other THP sizes.
+		 */
+		if (transparent_hugepage_adjust(&pfn, &fault_ipa))
+			vma_pagesize = PMD_SIZE;
+	}
+
+	if (writable)
+		kvm_set_pfn_dirty(pfn);
 
-	if (hugetlb) {
+	if (fault_status != FSC_PERM)
+		clean_dcache_guest_page(pfn, vma_pagesize);
+
+	if (exec_fault)
+		invalidate_icache_guest_page(pfn, vma_pagesize);
+
+	if (vma_pagesize == PMD_SIZE) {
 		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
 		new_pmd = pmd_mkhuge(new_pmd);
-		if (writable) {
+		if (writable)
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
-			kvm_set_pfn_dirty(pfn);
-		}
 
-		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(pfn, PMD_SIZE);
-
-		if (exec_fault) {
+		if (stage2_should_exec(kvm, fault_ipa, exec_fault, fault_status))
 			new_pmd = kvm_s2pmd_mkexec(new_pmd);
-			invalidate_icache_guest_page(pfn, PMD_SIZE);
-		} else if (fault_status == FSC_PERM) {
-			/* Preserve execute if XN was already cleared */
-			if (stage2_is_exec(kvm, fault_ipa))
-				new_pmd = kvm_s2pmd_mkexec(new_pmd);
-		}
 
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
@@ -1546,21 +1567,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 		if (writable) {
 			new_pte = kvm_s2pte_mkwrite(new_pte);
-			kvm_set_pfn_dirty(pfn);
 			mark_page_dirty(kvm, gfn);
 		}
 
-		if (fault_status != FSC_PERM)
-			clean_dcache_guest_page(pfn, PAGE_SIZE);
-
-		if (exec_fault) {
+		if (stage2_should_exec
[PATCH v3 2/4] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Introduce helpers to abstract architectural handling of the conversion
of pfn to page table entries and marking a PMD page table entry as a
block entry.

The helpers are introduced in preparation for supporting PUD hugepages
at stage 2 - which are supported on arm64 but do not exist on arm.

Signed-off-by: Punit Agrawal <punit.agra...@arm.com>
Acked-by: Christoffer Dall <christoffer.d...@arm.com>
Cc: Marc Zyngier <marc.zyng...@arm.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Catalin Marinas <catalin.mari...@arm.com>
Cc: Will Deacon <will.dea...@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 5 +
 arch/arm64/include/asm/kvm_mmu.h | 5 +
 virt/kvm/arm/mmu.c               | 7 ---
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 707a1f06dc5d..5907a81ad5c1 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
 void kvm_clear_hyp_idmap(void);
 
+#define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
+
 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd)
 {
 	*pmd = new_pmd;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 082110993647..d962508ce4b3 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -173,6 +173,11 @@ void kvm_clear_hyp_idmap(void);
 #define	kvm_set_pte(ptep, pte)	set_pte(ptep, pte)
 #define	kvm_set_pmd(pmdp, pmd)	set_pmd(pmdp, pmd)
 
+#define kvm_pfn_pte(pfn, prot)	pfn_pte(pfn, prot)
+#define kvm_pfn_pmd(pfn, prot)	pfn_pmd(pfn, prot)
+
+#define kvm_pmd_mkhuge(pmd)	pmd_mkhuge(pmd)
+
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
 	pte_val(pte) |= PTE_S2_RDWR;
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 07ae1e003762..0beefcc5e090 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1553,8 +1553,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	invalidate_icache_guest_page(pfn, vma_pagesize);
 
 	if (vma_pagesize == PMD_SIZE) {
-		pmd_t new_pmd = pfn_pmd(pfn, mem_type);
-		new_pmd = pmd_mkhuge(new_pmd);
+		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
+
+		new_pmd = kvm_pmd_mkhuge(new_pmd);
 		if (writable)
 			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
@@ -1563,7 +1564,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, &new_pmd);
 	} else {
-		pte_t new_pte = pfn_pte(pfn, mem_type);
+		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
 
 		if (writable) {
 			new_pte = kvm_s2pte_mkwrite(new_pte);
-- 
2.17.0
[PATCH v3 0/4] KVM: Support PUD hugepages at stage 2
Hi,

This patchset adds support for PUD hugepages at stage 2. This feature
is useful on cores that have support for large sized TLB
mappings (e.g., 1GB for 4K granule).

Previous postings can be found at [0][1][2].

Support is added to code that is shared between arm and arm64. Dummy
helpers for arm are provided as the port does not support PUD hugepage
sizes.

There is a small conflict with the series to add support for 52 bit
IPA[3].

The patches have been functionally tested on an A57 based system. The
patchset is based on v4.17-rc5 and incorporates feedback received on
the previous version.

Thanks,
Punit

v2 -> v3:
* Update vma_pagesize directly if THP [1/4]. Previously this was done
  indirectly via hugetlb
* Added review tag [4/4]

v1 -> v2:
* Create helper to check if the page should have exec permission [1/4]
* Fix broken condition to detect THP hugepage [1/4]
* Fix incorrect hunk resulting from a rebase [4/4]

[0] https://www.spinics.net/lists/arm-kernel/msg628053.html
[1] https://lkml.org/lkml/2018/4/20/566
[2] https://lkml.org/lkml/2018/5/1/133
[3] https://lwn.net/Articles/750176/

Punit Agrawal (4):
  KVM: arm/arm64: Share common code in user_mem_abort()
  KVM: arm/arm64: Introduce helpers to manipulate page table entries
  KVM: arm64: Support dirty page tracking for PUD hugepages
  KVM: arm64: Add support for PUD hugepages at stage 2

 arch/arm/include/asm/kvm_mmu.h         |  40 
 arch/arm64/include/asm/kvm_mmu.h       |  30 ++
 arch/arm64/include/asm/pgtable-hwdef.h |   4 +
 arch/arm64/include/asm/pgtable.h       |   2 +
 virt/kvm/arm/mmu.c                     | 121 +
 5 files changed, 161 insertions(+), 36 deletions(-)

-- 
2.17.0
Re: [PATCH v2 1/4] KVM: arm/arm64: Share common code in user_mem_abort()
Christoffer Dall <christoffer.d...@arm.com> writes: > On Tue, May 01, 2018 at 11:26:56AM +0100, Punit Agrawal wrote: >> The code for operations such as marking the pfn as dirty, and >> dcache/icache maintenance during stage 2 fault handling is duplicated >> between normal pages and PMD hugepages. >> >> Instead of creating another copy of the operations when we introduce >> PUD hugepages, let's share them across the different pagesizes. >> >> Signed-off-by: Punit Agrawal <punit.agra...@arm.com> >> Reviewed-by: Christoffer Dall <christoffer.d...@arm.com> >> Cc: Marc Zyngier <marc.zyng...@arm.com> >> --- >> virt/kvm/arm/mmu.c | 66 +++--- >> 1 file changed, 39 insertions(+), 27 deletions(-) >> >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 7f6a944db23d..686fc6a4b866 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c [...] >> @@ -1517,28 +1533,34 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> if (mmu_notifier_retry(kvm, mmu_seq)) >> goto out_unlock; >> >> -if (!hugetlb && !force_pte) >> +if (!hugetlb && !force_pte) { >> hugetlb = transparent_hugepage_adjust(, _ipa); >> +/* >> + * Only PMD_SIZE transparent hugepages(THP) are >> + * currently supported. This code will need to be >> + * updated to support other THP sizes. >> + */ >> +if (hugetlb) >> +vma_pagesize = PMD_SIZE; > > nit: this is a bit of a trap waiting to happen, as the suggested > semantics of hugetlb is now hugetlbfs and not THP. > > It may be slightly nicer to do do: > > if (transparent_hugepage_adjust(, _ipa)) > vma_pagesize = PMD_SIZE; I should've noticed this. I'll incorporate your suggestion and update the condition below using hugetlb to rely on vma_pagesize instead. 
Thanks, Punit > >> +} >> + >> +if (writable) >> +kvm_set_pfn_dirty(pfn); >> + >> +if (fault_status != FSC_PERM) >> +clean_dcache_guest_page(pfn, vma_pagesize); >> + >> +if (exec_fault) >> +invalidate_icache_guest_page(pfn, vma_pagesize); >> >> if (hugetlb) { >> pmd_t new_pmd = pfn_pmd(pfn, mem_type); >> new_pmd = pmd_mkhuge(new_pmd); >> -if (writable) { >> +if (writable) >> new_pmd = kvm_s2pmd_mkwrite(new_pmd); >> -kvm_set_pfn_dirty(pfn); >> -} >> >> -if (fault_status != FSC_PERM) >> -clean_dcache_guest_page(pfn, PMD_SIZE); >> - >> -if (exec_fault) { >> +if (stage2_should_exec(kvm, fault_ipa, exec_fault, >> fault_status)) >> new_pmd = kvm_s2pmd_mkexec(new_pmd); >> -invalidate_icache_guest_page(pfn, PMD_SIZE); >> -} else if (fault_status == FSC_PERM) { >> -/* Preserve execute if XN was already cleared */ >> -if (stage2_is_exec(kvm, fault_ipa)) >> -new_pmd = kvm_s2pmd_mkexec(new_pmd); >> -} >> >> ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, _pmd); >> } else { >> @@ -1546,21 +1568,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> >> if (writable) { >> new_pte = kvm_s2pte_mkwrite(new_pte); >> -kvm_set_pfn_dirty(pfn); >> mark_page_dirty(kvm, gfn); >> } >> >> -if (fault_status != FSC_PERM) >> -clean_dcache_guest_page(pfn, PAGE_SIZE); >> - >> -if (exec_fault) { >> +if (stage2_should_exec(kvm, fault_ipa, exec_fault, >> fault_status)) >> new_pte = kvm_s2pte_mkexec(new_pte); >> -invalidate_icache_guest_page(pfn, PAGE_SIZE); >> -} else if (fault_status == FSC_PERM) { >> -/* Preserve execute if XN was already cleared */ >> -if (stage2_is_exec(kvm, fault_ipa)) >> -new_pte = kvm_s2pte_mkexec(new_pte); >> -} >> >> ret = stage2_set_pte(kvm, memcache, fault_ipa, _pte, flags); >> } >> -- >> 2.17.0 >> > > Otherwise looks good. 
> > Thanks, > -Christoffer > ___ > kvmarm mailing list > kvmarm@lists.cs.columbia.edu > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 2/4] KVM: arm/arm64: Introduce helpers to manipulate page table entries
Hi Suzuki, Thanks for having a look. Suzuki K Poulose <suzuki.poul...@arm.com> writes: > On 01/05/18 11:26, Punit Agrawal wrote: >> Introduce helpers to abstract architectural handling of the conversion >> of pfn to page table entries and marking a PMD page table entry as a >> block entry. >> >> The helpers are introduced in preparation for supporting PUD hugepages >> at stage 2 - which are supported on arm64 but do not exist on arm. > > Punit, > > The change are fine by me. However, we usually do not define kvm_* > accessors for something which we know matches with the host variant. > i.e, PMD and PTE helpers, which are always present and we make use > of them directly. (see unmap_stage2_pmds for e.g) In general, I agree - it makes sense to avoid duplication. Having said that, the helpers here allow following a common pattern for handling the various page sizes - pte, pmd and pud - during stage 2 fault handling (see patch 4). As you've said you're OK with this change, I'd prefer to keep this patch but will drop it if any others reviewers are concerned about the duplication as well. 
Thanks, Punit > > Cheers > Suzuki > >> >> Signed-off-by: Punit Agrawal <punit.agra...@arm.com> >> Acked-by: Christoffer Dall <christoffer.d...@arm.com> >> Cc: Marc Zyngier <marc.zyng...@arm.com> >> Cc: Russell King <li...@armlinux.org.uk> >> Cc: Catalin Marinas <catalin.mari...@arm.com> >> Cc: Will Deacon <will.dea...@arm.com> >> --- >> arch/arm/include/asm/kvm_mmu.h | 5 + >> arch/arm64/include/asm/kvm_mmu.h | 5 + >> virt/kvm/arm/mmu.c | 7 --- >> 3 files changed, 14 insertions(+), 3 deletions(-) >> >> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h >> index 707a1f06dc5d..5907a81ad5c1 100644 >> --- a/arch/arm/include/asm/kvm_mmu.h >> +++ b/arch/arm/include/asm/kvm_mmu.h >> @@ -75,6 +75,11 @@ phys_addr_t kvm_get_idmap_vector(void); >> int kvm_mmu_init(void); >> void kvm_clear_hyp_idmap(void); >> +#define kvm_pfn_pte(pfn, prot)pfn_pte(pfn, prot) >> +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) >> + >> +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) >> + >> static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) >> { >> *pmd = new_pmd; >> diff --git a/arch/arm64/include/asm/kvm_mmu.h >> b/arch/arm64/include/asm/kvm_mmu.h >> index 082110993647..d962508ce4b3 100644 >> --- a/arch/arm64/include/asm/kvm_mmu.h >> +++ b/arch/arm64/include/asm/kvm_mmu.h >> @@ -173,6 +173,11 @@ void kvm_clear_hyp_idmap(void); >> #definekvm_set_pte(ptep, pte) set_pte(ptep, pte) >> #definekvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) >> +#define kvm_pfn_pte(pfn, prot)pfn_pte(pfn, prot) >> +#define kvm_pfn_pmd(pfn, prot) pfn_pmd(pfn, prot) >> + >> +#define kvm_pmd_mkhuge(pmd) pmd_mkhuge(pmd) >> + >> static inline pte_t kvm_s2pte_mkwrite(pte_t pte) >> { >> pte_val(pte) |= PTE_S2_RDWR; >> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c >> index 686fc6a4b866..74750236f445 100644 >> --- a/virt/kvm/arm/mmu.c >> +++ b/virt/kvm/arm/mmu.c >> @@ -1554,8 +1554,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, >> phys_addr_t fault_ipa, >> 
invalidate_icache_guest_page(pfn, vma_pagesize); >> if (hugetlb) { >> -pmd_t new_pmd = pfn_pmd(pfn, mem_type); >> -new_pmd = pmd_mkhuge(new_pmd); >> +pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type); >> + >> +new_pmd = kvm_pmd_mkhuge(new_pmd); >> if (writable) >> new_pmd = kvm_s2pmd_mkwrite(new_pmd); >> @@ -1564,7 +1565,7 @@ static int user_mem_abort(struct kvm_vcpu >> *vcpu, phys_addr_t fault_ipa, >> ret = stage2_set_pmd_huge(kvm, memcache, fault_ipa, >> _pmd); >> } else { >> -pte_t new_pte = pfn_pte(pfn, mem_type); >> +pte_t new_pte = kvm_pfn_pte(pfn, mem_type); >> if (writable) { >> new_pte = kvm_s2pte_mkwrite(new_pte); >> ___ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm