Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
On Thu, 22 Apr 2021 03:25:23 +0100, Gavin Shan wrote:
>
> Hi Keqian,
>
> On 4/21/21 4:36 PM, Keqian Zhu wrote:
> > On 2021/4/21 15:52, Gavin Shan wrote:
> >> On 4/16/21 12:03 AM, Keqian Zhu wrote:
> >>> The MMIO region of a device may be huge (GB level), so try to use
> >>> block mapping in stage 2 to speed up both map and unmap.
> >>>
> >>> Compared to normal memory mapping, we should consider two more
> >>> points when trying block mapping for an MMIO region:
> >>>
> >>> 1. For normal memory mapping, the PA (host physical address) and
> >>> HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
> >>> the HVA to request a hugepage, so we don't need to consider PA
> >>> alignment when verifying block mapping. But for device memory
> >>> mapping, the PA and HVA may have different alignment.
> >>>
> >>> 2. For normal memory mapping, we are sure the hugepage size properly
> >>> fits into the vma, so we don't check whether the mapping size exceeds
> >>> the boundary of the vma. But for device memory mapping, we should pay
> >>> attention to this.
> >>>
> >>> This adds get_vma_page_shift() to get the page shift for both normal
> >>> memory and device MMIO regions, and checks these two points when
> >>> selecting the block mapping size for an MMIO region.
> >>>
> >>> Signed-off-by: Keqian Zhu
> >>> ---
> >>>  arch/arm64/kvm/mmu.c | 61
> >>>  1 file changed, 51 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> >>> index c59af5ca01b0..5a1cc7751e6d 100644
> >>> --- a/arch/arm64/kvm/mmu.c
> >>> +++ b/arch/arm64/kvm/mmu.c
> >>> @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >>>          return PAGE_SIZE;
> >>>  }
> >>>
> >>> +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
> >>> +{
> >>> +        unsigned long pa;
> >>> +
> >>> +        if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
> >>> +                return huge_page_shift(hstate_vma(vma));
> >>> +
> >>> +        if (!(vma->vm_flags & VM_PFNMAP))
> >>> +                return PAGE_SHIFT;
> >>> +
> >>> +        VM_BUG_ON(is_vm_hugetlb_page(vma));
> >>> +
> >>
> >> I don't understand how VM_PFNMAP is set for a hugetlbfs related vma.
> >> I think they are exclusive, meaning the flag is never set for a
> >> hugetlbfs vma. If that's true, VM_PFNMAP needn't be checked on a
> >> hugetlbfs vma and the VM_BUG_ON() becomes unnecessary.
> >
> > Yes, but we're not sure all drivers follow this rule. Adding a BUG_ON()
> > is a way to catch the issue.
> >
>
> I think I didn't make things clear. What I meant is that VM_PFNMAP can't
> be set for hugetlbfs VMAs. So the checks here can be simplified as
> below if you agree:
>
>         if (is_vm_hugetlb_page(vma))
>                 return huge_page_shift(hstate_vma(vma));
>
>         if (!(vma->vm_flags & VM_PFNMAP))
>                 return PAGE_SHIFT;
>
>         VM_BUG_ON(is_vm_hugetlb_page(vma));     /* Can be dropped */

No. If this case happens, I want to see it. I have explicitly asked
for it, and this check stays.

        M.

--
Without deviation from the norm, progress is not possible.

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
On Thu, 22 Apr 2021 03:02:00 +0100, Gavin Shan wrote:
>
> Hi Marc,
>
> On 4/21/21 9:59 PM, Marc Zyngier wrote:
> > On Wed, 21 Apr 2021 07:17:44 +0100,
> > Keqian Zhu wrote:
> >> On 2021/4/21 14:20, Gavin Shan wrote:
> >>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
> >>>> On 2020/10/22 0:16, Santosh Shukla wrote:
> >>>>> Commit 6d674e28 introduces a notion to detect and handle the
> >>>>> device mapping. The commit checks whether the VM_PFNMAP flag is
> >>>>> set in vma->flags, and if set, marks force_pte as true, so that
> >>>>> when force_pte is true the THP check
> >>>>> (transparent_hugepage_adjust()) is skipped.
> >>>>>
> >>>>> There could be an issue with the VM_PFNMAP flag setting and
> >>>>> checking. For example, consider a case where the mdev vendor
> >>>>> driver registers the vma_fault handler named vma_mmio_fault(),
> >>>>> which maps the host MMIO region and in turn calls
> >>>>> remap_pfn_range() to map the MMIO's vma space.
> >>>>> remap_pfn_range() implicitly sets the VM_PFNMAP flag in
> >>>>> vma->flags.
> >>>> Could you give the name of the mdev vendor driver that triggers
> >>>> this issue? I failed to find one according to your description.
> >>>> Thanks.
> >>>
> >>> I think it would be fixed on the driver side by setting VM_PFNMAP in
> >>> its mmap() callback (call_mmap()), like the vfio PCI driver does.
> >>> It means it won't be delayed until a page fault is issued and
> >>> remap_pfn_range() is called. It's determined from the beginning
> >>> that the vma associated with the mdev vendor driver serves a
> >>> PFN remapping purpose. So the vma should be populated completely,
> >>> including the VM_PFNMAP flag, before it becomes visible to user
> >>> space.
> >
> > Why should that be a requirement? Lazy populating of the VMA should be
> > perfectly acceptable if the fault can only happen on the CPU side.
>
> It isn't a requirement and the drivers needn't follow it strictly. I
> checked several drivers before looking into the patch and found almost
> all the drivers have VM_PFNMAP set at mmap() time. In
> drivers/vfio/vfio-pci.c, there is a comment as below, but it doesn't
> reveal too much about why we can't set VM_PFNMAP at fault time.
>
>     static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>     {
>             :
>             /*
>              * See remap_pfn_range(), called from vfio_pci_fault() but we can't
>              * change vm_flags within the fault handler.  Set them now.
>              */
>             vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>             vma->vm_ops = &vfio_pci_mmap_ops;
>
>             return 0;
>     }
>
> Setting these flags in advance does have advantages. For example,
> VM_DONTEXPAND prevents the vma from being merged with another one.
> VM_DONTDUMP makes this vma ineligible for coredump. Otherwise, the
> address space associated with the vma is accessed and unnecessary page
> faults are triggered on coredump. VM_IO and VM_PFNMAP avoid walking
> the page frames associated with the vma since we don't have valid PFNs
> in the mapping.

But PCI clearly isn't the case we are dealing with here, and not
everything is VFIO either. I can *today* create a driver that
implements a mmap+fault handler, call mmap() on it, pass the result
to a memslot, and get to the exact same result Santosh describes.

No PCI, no VFIO, just a random driver. We are *required* to handle
that.

        M.

--
Without deviation from the norm, progress is not possible.
[PATCH v3 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
From: Mike Rapoport

The intended semantics of pfn_valid() is to verify whether there is a
struct page for the pfn in question and nothing else.

Yet, on arm64 it is used to distinguish memory areas that are mapped in
the linear map vs those that require ioremap() to access them.

Introduce a dedicated pfn_is_map_memory() wrapper for
memblock_is_map_memory() to perform such check and use it where
appropriate. Using a wrapper allows to avoid cyclic include
dependencies.

While here also update style of pfn_valid() so that both pfn_valid()
and pfn_is_map_memory() declarations will be consistent.

Signed-off-by: Mike Rapoport
---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  3 ++-
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 12
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 6 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 0aabc3be9a75..194f9f993d30 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x)
 #define virt_addr_valid(addr)   ({                                      \
         __typeof__(addr) __addr = __tag_reset(addr);                    \
-        __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr));      \
+        __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); \
 })

 void dump_mem_limit(void);

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 012cffc574e8..75ddfe671393 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -37,7 +37,8 @@ void copy_highpage(struct page *to, struct page *from);

 typedef struct page *pgtable_t;

-extern int pfn_valid(unsigned long);
+int pfn_valid(unsigned long pfn);
+int pfn_is_map_memory(unsigned long pfn);

 #include

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8711894db8c2..23dd99e29b23 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)

 static bool kvm_is_device_pfn(unsigned long pfn)
 {
-        return !pfn_valid(pfn);
+        return !pfn_is_map_memory(pfn);
 }

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 3685e12aba9b..966a7a18d528 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -258,6 +258,18 @@ int pfn_valid(unsigned long pfn)
 }
 EXPORT_SYMBOL(pfn_valid);

+int pfn_is_map_memory(unsigned long pfn)
+{
+        phys_addr_t addr = PFN_PHYS(pfn);
+
+        /* avoid false positives for bogus PFNs, see comment in pfn_valid() */
+        if (PHYS_PFN(addr) != pfn)
+                return 0;
+
+        return memblock_is_map_memory(addr);
+}
+EXPORT_SYMBOL(pfn_is_map_memory);
+
 static phys_addr_t memory_limit = PHYS_ADDR_MAX;

diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
index b5e83c46b23e..b7c81dacabf0 100644
--- a/arch/arm64/mm/ioremap.c
+++ b/arch/arm64/mm/ioremap.c
@@ -43,7 +43,7 @@ static void __iomem *__ioremap_caller(phys_addr_t phys_addr, size_t size,
         /*
          * Don't allow RAM to be mapped.
          */
-        if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
+        if (WARN_ON(pfn_is_map_memory(__phys_to_pfn(phys_addr))))
                 return NULL;

         area = get_vm_area_caller(size, VM_IOREMAP, caller);
@@ -84,7 +84,7 @@ EXPORT_SYMBOL(iounmap);
 void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size)
 {
         /* For normal memory we already have a cacheable mapping. */
-        if (pfn_valid(__phys_to_pfn(phys_addr)))
+        if (pfn_is_map_memory(__phys_to_pfn(phys_addr)))
                 return (void __iomem *)__phys_to_virt(phys_addr);

         return __ioremap_caller(phys_addr, size, __pgprot(PROT_NORMAL),

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5d9550fdb9cf..26045e9adbd7 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -81,7 +81,7 @@ void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
                               unsigned long size, pgprot_t vma_prot)
 {
-        if (!pfn_valid(pfn))
+        if (!pfn_is_map_memory(pfn))
                 return pgprot_noncached(vma_prot);
         else if (file->f_flags & O_SYNC)
                 return pgprot_writecombine(vma_prot);
--
2.28.0
[PATCH v3 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
From: Mike Rapoport

The arm64's version of pfn_valid() differs from the generic because of
two reasons:

* Parts of the memory map are freed during boot. This makes it
  necessary to verify that there is actual physical memory that
  corresponds to a pfn which is done by querying memblock.

* There are NOMAP memory regions. These regions are not mapped in the
  linear map and until the previous commit the struct pages representing
  these areas had default values.

As the consequence of absence of the special treatment of NOMAP regions
in the memory map it was necessary to use memblock_is_map_memory() in
pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so
that generic mm functionality would not treat a NOMAP page as a normal
page.

Since the NOMAP regions are now marked as PageReserved(), pfn walkers
and the rest of core mm will treat them as unusable memory and thus
pfn_valid_within() is no longer required at all and can be disabled by
removing CONFIG_HOLES_IN_ZONE on arm64.

pfn_valid() can be slightly simplified by replacing
memblock_is_map_memory() with memblock_is_memory().

Signed-off-by: Mike Rapoport
Acked-by: David Hildenbrand
---
 arch/arm64/Kconfig   | 3 ---
 arch/arm64/mm/init.c | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index e4e1b6550115..58e439046d05 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
         def_bool y
         depends on NUMA

-config HOLES_IN_ZONE
-        def_bool y
-
 source "kernel/Kconfig.hz"

 config ARCH_SPARSEMEM_ENABLE

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 966a7a18d528..f431b38d0837 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
         /*
          * ZONE_DEVICE memory does not have the memblock entries.
-         * memblock_is_map_memory() check for ZONE_DEVICE based
+         * memblock_is_memory() check for ZONE_DEVICE based
          * addresses will always fail. Even the normal hotplugged
          * memory will never have MEMBLOCK_NOMAP flag set in their
          * memblock entries. Skip memblock search for all non early
@@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
                 return pfn_section_valid(ms, pfn);
         }
 #endif
-        return memblock_is_map_memory(addr);
+        return memblock_is_memory(addr);
 }
 EXPORT_SYMBOL(pfn_valid);
--
2.28.0
[PATCH v3 2/4] memblock: update initialization of reserved pages
From: Mike Rapoport

The struct pages representing a reserved memory region are initialized
using reserve_bootmem_range() function. This function is called for
each reserved region just before the memory is freed from memblock to
the buddy page allocator.

The struct pages for MEMBLOCK_NOMAP regions are kept with the default
values set by the memory map initialization which makes it necessary to
have a special treatment for such pages in pfn_valid() and
pfn_valid_within().

Split out initialization of the reserved pages to a function with a
meaningful name and treat the MEMBLOCK_NOMAP regions the same way as
the reserved regions and mark struct pages for the NOMAP regions as
PageReserved.

Signed-off-by: Mike Rapoport
Reviewed-by: David Hildenbrand
Reviewed-by: Anshuman Khandual
---
 include/linux/memblock.h |  4
 mm/memblock.c            | 28
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 5984fff3f175..1b4c97c151ae 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
  * @MEMBLOCK_NONE: no special request
  * @MEMBLOCK_HOTPLUG: hotpluggable region
  * @MEMBLOCK_MIRROR: mirrored region
- * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
+ * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
+ * reserved in the memory map; refer to memblock_mark_nomap() description
+ * for further details
  */
 enum memblock_flags {
         MEMBLOCK_NONE = 0x0,    /* No special request */

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..3abf2c3fea7f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
  * @base: the base phys addr of the region
  * @size: the size of the region
  *
+ * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
+ * direct mapping of the physical memory. These regions will still be
+ * covered by the memory map. The struct page representing NOMAP memory
+ * frames in the memory map will be PageReserved()
+ *
  * Return: 0 on success, -errno on failure.
  */
 int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
@@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
         return end_pfn - start_pfn;
 }

+static void __init memmap_init_reserved_pages(void)
+{
+        struct memblock_region *region;
+        phys_addr_t start, end;
+        u64 i;
+
+        /* initialize struct pages for the reserved regions */
+        for_each_reserved_mem_range(i, &start, &end)
+                reserve_bootmem_region(start, end);
+
+        /* and also treat struct pages for the NOMAP regions as PageReserved */
+        for_each_mem_region(region) {
+                if (memblock_is_nomap(region)) {
+                        start = region->base;
+                        end = start + region->size;
+                        reserve_bootmem_region(start, end);
+                }
+        }
+}
+
 static unsigned long __init free_low_memory_core_early(void)
 {
         unsigned long count = 0;
@@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
         memblock_clear_hotplug(0, -1);

-        for_each_reserved_mem_range(i, &start, &end)
-                reserve_bootmem_region(start, end);
+        memmap_init_reserved_pages();

         /*
          * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id
--
2.28.0
[PATCH v3 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
From: Mike Rapoport

Add comment describing the semantics of pfn_valid() that clarifies that
pfn_valid() only checks for availability of a memory map entry (i.e.
struct page) for a PFN rather than availability of usable memory
backing that PFN.

The most "generic" version of pfn_valid() used by the configurations
with SPARSEMEM enabled resides in include/linux/mmzone.h so this is the
most suitable place for documentation about semantics of pfn_valid().

Suggested-by: Anshuman Khandual
Signed-off-by: Mike Rapoport
Reviewed-by: Anshuman Khandual
---
 include/linux/mmzone.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 47946cec7584..961f0eeefb62 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
 #endif

 #ifndef CONFIG_HAVE_ARCH_PFN_VALID
+/**
+ * pfn_valid - check if there is a valid memory map entry for a PFN
+ * @pfn: the page frame number to check
+ *
+ * Check if there is a valid memory map entry aka struct page for the @pfn.
+ * Note, that availability of the memory map entry does not imply that
+ * there is actual usable memory at that @pfn. The struct page may
+ * represent a hole or an unusable page frame.
+ *
+ * Return: 1 for PFNs that have memory map entries and 0 otherwise
+ */
 static inline int pfn_valid(unsigned long pfn)
 {
         struct mem_section *ms;
--
2.28.0
[PATCH v3 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
From: Mike Rapoport

Hi,

These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially
hardwire pfn_valid_within() to 1.

The idea is to mark NOMAP pages as reserved in the memory map and
restore the intended semantics of pfn_valid() to designate availability
of struct page for a pfn. With this the core mm will be able to cope
with the fact that it cannot use NOMAP pages and the holes created by
NOMAP ranges within MAX_ORDER blocks will be treated correctly even
without the need for pfn_valid_within.

The patches are only boot tested on qemu-system-aarch64 so I'd really
appreciate memory stress tests on real hardware.

If this actually works we'll be one step closer to drop custom
pfn_valid() on arm64 altogether.

v3:
* Fix minor issues found by Anshuman
* Freshen up the declaration of pfn_valid() to make it consistent with
  pfn_is_map_memory()
* Add more Acked-by and Reviewed-by tags, thanks Anshuman and David

v2: Link: https://lore.kernel.org/lkml/20210421065108.1987-1-r...@kernel.org
* Add check for PFN overflow in pfn_is_map_memory()
* Add Acked-by and Reviewed-by tags, thanks David.

v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-r...@kernel.org
* Add comment about the semantics of pfn_valid() as Anshuman suggested
* Extend comments about MEMBLOCK_NOMAP, per Anshuman
* Use pfn_is_map_memory() name for the exported wrapper for
  memblock_is_map_memory(). It is still local to arch/arm64 in the end
  because of header dependency issues.

rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-r...@kernel.org

Mike Rapoport (4):
  include/linux/mmzone.h: add documentation for pfn_valid()
  memblock: update initialization of reserved pages
  arm64: decouple check whether pfn is in linear map from pfn_valid()
  arm64: drop pfn_valid_within() and simplify pfn_valid()

 arch/arm64/Kconfig              |  3 ---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  3 ++-
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 16
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 include/linux/memblock.h        |  4
 include/linux/mmzone.h          | 11 +++++++++++
 mm/memblock.c                   | 28
 10 files changed, 61 insertions(+), 14 deletions(-)

base-commit: e49d033bddf5b565044e2abe4241353959bc9120
--
2.28.0
Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
Hi Keqian,

On 4/21/21 4:36 PM, Keqian Zhu wrote:
> On 2021/4/21 15:52, Gavin Shan wrote:
>> On 4/16/21 12:03 AM, Keqian Zhu wrote:
>>> The MMIO region of a device may be huge (GB level), so try to use
>>> block mapping in stage 2 to speed up both map and unmap.
>>>
>>> Compared to normal memory mapping, we should consider two more
>>> points when trying block mapping for an MMIO region:
>>>
>>> 1. For normal memory mapping, the PA (host physical address) and
>>> HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
>>> the HVA to request a hugepage, so we don't need to consider PA
>>> alignment when verifying block mapping. But for device memory
>>> mapping, the PA and HVA may have different alignment.
>>>
>>> 2. For normal memory mapping, we are sure the hugepage size properly
>>> fits into the vma, so we don't check whether the mapping size exceeds
>>> the boundary of the vma. But for device memory mapping, we should pay
>>> attention to this.
>>>
>>> This adds get_vma_page_shift() to get the page shift for both normal
>>> memory and device MMIO regions, and checks these two points when
>>> selecting the block mapping size for an MMIO region.
>>>
>>> Signed-off-by: Keqian Zhu
>>> ---
>>>  arch/arm64/kvm/mmu.c | 61
>>>  1 file changed, 51 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index c59af5ca01b0..5a1cc7751e6d 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
>>>          return PAGE_SIZE;
>>>  }
>>>
>>> +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
>>> +{
>>> +        unsigned long pa;
>>> +
>>> +        if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
>>> +                return huge_page_shift(hstate_vma(vma));
>>> +
>>> +        if (!(vma->vm_flags & VM_PFNMAP))
>>> +                return PAGE_SHIFT;
>>> +
>>> +        VM_BUG_ON(is_vm_hugetlb_page(vma));
>>> +
>>
>> I don't understand how VM_PFNMAP is set for a hugetlbfs related vma.
>> I think they are exclusive, meaning the flag is never set for a
>> hugetlbfs vma. If that's true, VM_PFNMAP needn't be checked on a
>> hugetlbfs vma and the VM_BUG_ON() becomes unnecessary.
>
> Yes, but we're not sure all drivers follow this rule. Adding a BUG_ON()
> is a way to catch the issue.

I think I didn't make things clear. What I meant is that VM_PFNMAP can't
be set for hugetlbfs VMAs. So the checks here can be simplified as
below if you agree:

    if (is_vm_hugetlb_page(vma))
        return huge_page_shift(hstate_vma(vma));

    if (!(vma->vm_flags & VM_PFNMAP))
        return PAGE_SHIFT;

    VM_BUG_ON(is_vm_hugetlb_page(vma));    /* Can be dropped */

>>> +        pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
>>> +
>>> +#ifndef __PAGETABLE_PMD_FOLDED
>>> +        if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
>>> +            ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
>>> +            ALIGN(hva, PUD_SIZE) <= vma->vm_end)
>>> +                return PUD_SHIFT;
>>> +#endif
>>> +
>>> +        if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
>>> +            ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
>>> +            ALIGN(hva, PMD_SIZE) <= vma->vm_end)
>>> +                return PMD_SHIFT;
>>> +
>>> +        return PAGE_SHIFT;
>>> +}
>>> +
>>
>> There is a "switch(...)" fallback mechanism in user_mem_abort().
>> PUD_SIZE/PMD_SIZE can be downgraded accordingly if the addresses fail
>> the alignment check by fault_supports_stage2_huge_mapping(). I think
>> it would make user_mem_abort() simpler if the logic can be moved to
>> get_vma_page_shift().
>>
>> Another question is whether we need the check from
>> fault_supports_stage2_huge_mapping() if the VM_PFNMAP area is going
>> to be covered by block mapping. If so, the "switch(...)" fallback
>> mechanism needs to be part of get_vma_page_shift().
>
> Yes, good suggestion. My idea is that we can keep this series simpler
> and do further optimization in another patch series. Do you mind
> sending a patch?

Yeah, it's fine to keep this series as it is. There are 3 checks applied
here for an MMIO region: hva/hpa/ipa, and they're distributed in two
functions, making the code a bit hard to follow. I can post a patch to
simplify it after your series gets merged :)

Thanks,
Gavin
Re: [PATCH v4 1/2] kvm/arm64: Remove the creation time's mapping of MMIO regions
Hi Keqian,

On 4/21/21 4:28 PM, Keqian Zhu wrote:
> On 2021/4/21 14:38, Gavin Shan wrote:
>> On 4/16/21 12:03 AM, Keqian Zhu wrote:
>>> The MMIO regions may be unmapped for many reasons and can be
>>> remapped by the stage2 fault path. Mapping MMIO regions at creation
>>> time becomes a minor optimization and makes these two mapping paths
>>> hard to sync.
>>>
>>> Remove the mapping code while keeping the useful sanity check.
>>>
>>> Signed-off-by: Keqian Zhu
>>> ---
>>>  arch/arm64/kvm/mmu.c | 38
>>>  1 file changed, 3 insertions(+), 35 deletions(-)
>>
>> After removing the logic to create the stage2 mapping for a VM_PFNMAP
>> region, I think the "do { } while" loop becomes unnecessary and can
>> be dropped completely. It means the only sanity check is to see if
>> the memory slot overflows the IPA space or not. In that case,
>> KVM_MR_FLAGS_ONLY can be ignored because the memory slot's base
>> address and length aren't changed when we have KVM_MR_FLAGS_ONLY.
>
> Maybe not exactly. Here we do an important sanity check that we
> shouldn't log dirty pages for memslots with VM_PFNMAP.

Yeah, sorry that I missed that part. Something associated with Santosh's
patch: the flag may not exist until the page fault happens on the vma.
In this case, the check may not work properly.

    [PATCH] KVM: arm64: Correctly handle the mmio faulting

[...]

Thanks,
Gavin

>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 8711894db8c2..c59af5ca01b0 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1301,7 +1301,6 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>>  {
>>>          hva_t hva = mem->userspace_addr;
>>>          hva_t reg_end = hva + mem->memory_size;
>>> -        bool writable = !(mem->flags & KVM_MEM_READONLY);
>>>          int ret = 0;
>>>
>>>          if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
>>> @@ -1318,8 +1317,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>>          mmap_read_lock(current->mm);
>>>          /*
>>>           * A memory region could potentially cover multiple VMAs, and any holes
>>> -         * between them, so iterate over all of them to find out if we can map
>>> -         * any of them right now.
>>> +         * between them, so iterate over all of them.
>>>           *
>>> @@ -1330,50 +1328,20 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>>>           */
>>>          do {
>>>                  struct vm_area_struct *vma = find_vma(current->mm, hva);
>>> -                hva_t vm_start, vm_end;
>>>
>>>                  if (!vma || vma->vm_start >= reg_end)
>>>                          break;
>>>
>>> -                /*
>>> -                 * Take the intersection of this VMA with the memory region
>>> -                 */
>>> -                vm_start = max(hva, vma->vm_start);
>>> -                vm_end = min(reg_end, vma->vm_end);
>>> -
>>>                  if (vma->vm_flags & VM_PFNMAP) {
>>> -                        gpa_t gpa = mem->guest_phys_addr +
>>> -                                    (vm_start - mem->userspace_addr);
>>> -                        phys_addr_t pa;
>>> -
>>> -                        pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
>>> -                        pa += vm_start - vma->vm_start;
>>> -
>>>                          /* IO region dirty page logging not allowed */
>>>                          if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
>>>                                  ret = -EINVAL;
>>> -                                goto out;
>>> -                        }
>>> -
>>> -                        ret = kvm_phys_addr_ioremap(kvm, gpa, pa,
>>> -                                                    vm_end - vm_start,
>>> -                                                    writable);
>>> -                        if (ret)
>>>                                  break;
>>> +                        }
>>>                  }
>>> -                hva = vm_end;
>>> +                hva = min(reg_end, vma->vm_end);
>>>          } while (hva < reg_end);
>>>
>>> -        if (change == KVM_MR_FLAGS_ONLY)
>>> -                goto out;
>>> -
>>> -        spin_lock(&kvm->mmu_lock);
>>> -        if (ret)
>>> -                unmap_stage2_range(&kvm->arch.mmu, mem->guest_phys_addr, mem->memory_size);
>>> -        else if (!cpus_have_final_cap(ARM64_HAS_STAGE2_FWB))
>>> -                stage2_flush_memslot(kvm, memslot);
>>> -        spin_unlock(&kvm->mmu_lock);
>>> -out:
>>>          mmap_read_unlock(current->mm);
>>>          return ret;
>>>  }
Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Hi Marc,

On 4/21/21 9:59 PM, Marc Zyngier wrote:
> On Wed, 21 Apr 2021 07:17:44 +0100,
> Keqian Zhu wrote:
>> On 2021/4/21 14:20, Gavin Shan wrote:
>>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>>>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>>>> Commit 6d674e28 introduces a notion to detect and handle the
>>>>> device mapping. The commit checks whether the VM_PFNMAP flag is
>>>>> set in vma->flags, and if set, marks force_pte as true, so that
>>>>> when force_pte is true the THP check
>>>>> (transparent_hugepage_adjust()) is skipped.
>>>>>
>>>>> There could be an issue with the VM_PFNMAP flag setting and
>>>>> checking. For example, consider a case where the mdev vendor
>>>>> driver registers the vma_fault handler named vma_mmio_fault(),
>>>>> which maps the host MMIO region and in turn calls
>>>>> remap_pfn_range() to map the MMIO's vma space.
>>>>> remap_pfn_range() implicitly sets the VM_PFNMAP flag in
>>>>> vma->flags.
>>>> Could you give the name of the mdev vendor driver that triggers
>>>> this issue? I failed to find one according to your description.
>>>> Thanks.
>>>
>>> I think it would be fixed on the driver side by setting VM_PFNMAP in
>>> its mmap() callback (call_mmap()), like the vfio PCI driver does.
>>> It means it won't be delayed until a page fault is issued and
>>> remap_pfn_range() is called. It's determined from the beginning
>>> that the vma associated with the mdev vendor driver serves a
>>> PFN remapping purpose. So the vma should be populated completely,
>>> including the VM_PFNMAP flag, before it becomes visible to user
>>> space.
>
> Why should that be a requirement? Lazy populating of the VMA should be
> perfectly acceptable if the fault can only happen on the CPU side.

It isn't a requirement and the drivers needn't follow it strictly. I
checked several drivers before looking into the patch and found almost
all the drivers have VM_PFNMAP set at mmap() time. In
drivers/vfio/vfio-pci.c, there is a comment as below, but it doesn't
reveal too much about why we can't set VM_PFNMAP at fault time.

    static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
    {
            :
            /*
             * See remap_pfn_range(), called from vfio_pci_fault() but we can't
             * change vm_flags within the fault handler.  Set them now.
             */
            vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
            vma->vm_ops = &vfio_pci_mmap_ops;

            return 0;
    }

Setting these flags in advance does have advantages. For example,
VM_DONTEXPAND prevents the vma from being merged with another one.
VM_DONTDUMP makes this vma ineligible for coredump. Otherwise, the
address space associated with the vma is accessed and unnecessary page
faults are triggered on coredump. VM_IO and VM_PFNMAP avoid walking
the page frames associated with the vma since we don't have valid PFNs
in the mapping.

>>> The example can be found from the vfio driver in
>>> drivers/vfio/pci/vfio_pci.c:
>>>
>>>     vfio_pci_mmap:       VM_PFNMAP is set for the vma
>>>     vfio_pci_mmap_fault: remap_pfn_range() is called
>>
>> Right. I have discussed the above with Marc. I want to find the
>> driver to fix it. However, AFAICS, there is no driver that matches
>> the description...
>
> I have the feeling this is an out-of-tree driver (and Santosh's email
> address is bouncing, so I guess we won't have much information from
> him).
>
> However, the simple fact that any odd driver can provide a fault
> handler and populate the VMA on demand makes me think that we need to
> support this case.

Yeah, Santosh's email isn't reachable. I think the VM_PFNMAP flag needs
to be rechecked after gup if we're going to support this case.

Thanks,
Gavin
Re: [PATCH] KVM: arm64: build perf support only when enabled
On Wed, Apr 21, 2021 at 3:56 PM Marc Zyngier wrote:
> On Wed, 21 Apr 2021 14:49:01 +0100, Arnd Bergmann wrote:
>
> I think a better way is to get rid of perf_num_counters() entirely,
> see [1]. If someone acks the second and last patches, I'll even take
> the whole series in (nudge nudge...).

Makes sense. I like your diffstat, too.

        Arnd
[PATCH] KVM: arm64: build perf support only when enabled
From: Arnd Bergmann The perf_num_counters() function is only defined when CONFIG_PERF_EVENTS is enabled: arch/arm64/kvm/perf.c: In function 'kvm_perf_init': arch/arm64/kvm/perf.c:58:43: error: implicit declaration of function 'perf_num_counters'; did you mean 'dec_mm_counter'? [-Werror=implicit-function-declaration] 58 | if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0 | ^ Use conditional compilation to disable this feature entirely when CONFIG_PERF_EVENTS is disabled in the host. Fixes: 6b5b368fccd7 ("KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key") Signed-off-by: Arnd Bergmann --- Not sure if this is the correct symbol to check for, but this is one way to avoid the build failure. --- arch/arm64/kvm/Makefile | 4 +++- arch/arm64/kvm/arm.c| 8 +--- include/kvm/arm_pmu.h | 7 +++ 3 files changed, 15 insertions(+), 4 deletions(-) diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile index 589921392cb1..9adf12ba5047 100644 --- a/arch/arm64/kvm/Makefile +++ b/arch/arm64/kvm/Makefile @@ -12,7 +12,7 @@ obj-$(CONFIG_KVM) += hyp/ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ $(KVM)/vfio.o $(KVM)/irqchip.o \ -arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \ +arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \ inject_fault.o va_layout.o handle_exit.o \ guest.o debug.o reset.o sys_regs.o \ vgic-sys-reg-v3.o fpsimd.o pmu.o \ @@ -24,4 +24,6 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \ vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \ vgic/vgic-its.o vgic/vgic-debug.o +kvm-$(CONFIG_PERF_EVENTS) += perf.o + kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 4808aca8c87c..720e075c70f9 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -1694,7 +1694,8 @@ static int init_subsystems(void) if (err) goto out; - kvm_perf_init(); + if (IS_ENABLED(CONFIG_PERF_EVENTS)) + kvm_perf_init(); kvm_sys_reg_table_init(); out: @@ -1899,7 +1900,8 @@ 
static int init_hyp_mode(void) return 0; out_err: - teardown_hyp_mode(); + if (IS_ENABLED(CONFIG_PERF_EVENTS)) + teardown_hyp_mode(); kvm_err("error initializing Hyp mode: %d\n", err); return err; } @@ -2101,7 +2103,7 @@ int kvm_arch_init(void *opaque) out_hyp: hyp_cpu_pm_exit(); - if (!in_hyp_mode) + if (!IS_ENABLED(CONFIG_PERF_EVENTS) && in_hyp_mode) teardown_hyp_mode(); out_err: return err; diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index 6fd3cda608e4..d84307a1ebd5 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -13,12 +13,19 @@ #define ARMV8_PMU_CYCLE_IDX(ARMV8_PMU_MAX_COUNTERS - 1) #define ARMV8_PMU_MAX_COUNTER_PAIRS((ARMV8_PMU_MAX_COUNTERS + 1) >> 1) +#ifdef CONFIG_PERF_EVENTS DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available); static __always_inline bool kvm_arm_support_pmu_v3(void) { return static_branch_likely(&kvm_arm_pmu_available); } +#else +static __always_inline bool kvm_arm_support_pmu_v3(void) +{ + return 0; +} +#endif #ifdef CONFIG_HW_PERF_EVENTS -- 2.29.2
Re: [PATCH] KVM: arm64: build perf support only when enabled
On Wed, 21 Apr 2021 14:49:01 +0100, Arnd Bergmann wrote: > > From: Arnd Bergmann > > The perf_num_counters() function is only defined when CONFIG_PERF_EVENTS > is enabled: > > arch/arm64/kvm/perf.c: In function 'kvm_perf_init': > arch/arm64/kvm/perf.c:58:43: error: implicit declaration of function > 'perf_num_counters'; did you mean 'dec_mm_counter'? > [-Werror=implicit-function-declaration] >58 | if (IS_ENABLED(CONFIG_ARM_PMU) && perf_num_counters() > 0 > | ^ > > Use conditional compilation to disable this feature entirely when > CONFIG_PERF_EVENTS is disabled in the host. > > Fixes: 6b5b368fccd7 ("KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static > key") > Signed-off-by: Arnd Bergmann > --- > Not sure if this is the correct symbol to check for, but this is > one way to avoid the build failure. I think a better way is to get rid of perf_num_counters() entirely, see [1]. If someone acks the second and last patches, I'll even take the whole series in (nudge nudge...). Thanks, M. [1] https://lore.kernel.org/r/20210414134409.1266357-1-...@kernel.org -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
On 4/21/21 5:54 PM, Mike Rapoport wrote: > On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote: >> >> On 4/21/21 12:21 PM, Mike Rapoport wrote: >>> From: Mike Rapoport >>> >>> The arm64's version of pfn_valid() differs from the generic because of two >>> reasons: >>> >>> * Parts of the memory map are freed during boot. This makes it necessary to >>> verify that there is actual physical memory that corresponds to a pfn >>> which is done by querying memblock. >>> >>> * There are NOMAP memory regions. These regions are not mapped in the >>> linear map and until the previous commit the struct pages representing >>> these areas had default values. >>> >>> As the consequence of absence of the special treatment of NOMAP regions in >>> the memory map it was necessary to use memblock_is_map_memory() in >>> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that >>> generic mm functionality would not treat a NOMAP page as a normal page. >>> >>> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and >>> the rest of core mm will treat them as unusable memory and thus >>> pfn_valid_within() is no longer required at all and can be disabled by >>> removing CONFIG_HOLES_IN_ZONE on arm64. >> >> This makes sense. >> >>> >>> pfn_valid() can be slightly simplified by replacing >>> memblock_is_map_memory() with memblock_is_memory(). >>> >>> Signed-off-by: Mike Rapoport >>> --- >>> arch/arm64/Kconfig | 3 --- >>> arch/arm64/mm/init.c | 4 ++-- >>> 2 files changed, 2 insertions(+), 5 deletions(-) >>> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >>> index e4e1b6550115..58e439046d05 100644 >>> --- a/arch/arm64/Kconfig >>> +++ b/arch/arm64/Kconfig >>> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK >>> def_bool y >>> depends on NUMA >>> >>> -config HOLES_IN_ZONE >>> - def_bool y >>> - >> >> Right. 
>> >>> source "kernel/Kconfig.hz" >>> >>> config ARCH_SPARSEMEM_ENABLE >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c >>> index dc03bdc12c0f..eb3f56fb8c7c 100644 >>> --- a/arch/arm64/mm/init.c >>> +++ b/arch/arm64/mm/init.c >>> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn) >>> >>> /* >>> * ZONE_DEVICE memory does not have the memblock entries. >>> -* memblock_is_map_memory() check for ZONE_DEVICE based >>> +* memblock_is_memory() check for ZONE_DEVICE based >>> * addresses will always fail. Even the normal hotplugged >>> * memory will never have MEMBLOCK_NOMAP flag set in their >>> * memblock entries. Skip memblock search for all non early >>> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn) >>> return pfn_section_valid(ms, pfn); >>> } >>> #endif >>> - return memblock_is_map_memory(addr); >>> + return memblock_is_memory(addr); >> >> Wondering if MEMBLOCK_NOMAP is now being treated similarly to other >> memory pfns for page table walking purpose but with PageReserved(), >> why memblock_is_memory() is still required ? At this point, should >> not we just return valid for early_section() memory. As pfn_valid() >> now just implies that pfn has a struct page backing which has been >> already verified with valid_section() etc. > > memblock_is_memory() is required because arm64 frees unused parts of the > memory map. So, for instance, if we have 64M out of 128M populated in a > section the section based calculation would return 1 for a pfn in the > second half of the section, but there would be no memory map there. Understood.
Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
On 4/21/21 5:49 PM, Mike Rapoport wrote: > On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote: >> >> On 4/21/21 12:21 PM, Mike Rapoport wrote: >>> From: Mike Rapoport >>> >>> The intended semantics of pfn_valid() is to verify whether there is a >>> struct page for the pfn in question and nothing else. >>> >>> Yet, on arm64 it is used to distinguish memory areas that are mapped in the >>> linear map vs those that require ioremap() to access them. >>> >>> Introduce a dedicated pfn_is_map_memory() wrapper for >>> memblock_is_map_memory() to perform such check and use it where >>> appropriate. >>> >>> Using a wrapper allows to avoid cyclic include dependencies. >>> >>> Signed-off-by: Mike Rapoport >>> --- >>> arch/arm64/include/asm/memory.h | 2 +- >>> arch/arm64/include/asm/page.h | 1 + >>> arch/arm64/kvm/mmu.c| 2 +- >>> arch/arm64/mm/init.c| 11 +++ >>> arch/arm64/mm/ioremap.c | 4 ++-- >>> arch/arm64/mm/mmu.c | 2 +- >>> 6 files changed, 17 insertions(+), 5 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/memory.h >>> b/arch/arm64/include/asm/memory.h >>> index 0aabc3be9a75..194f9f993d30 100644 >>> --- a/arch/arm64/include/asm/memory.h >>> +++ b/arch/arm64/include/asm/memory.h >>> @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x) >>> >>> #define virt_addr_valid(addr) ({ >>> \ >>> __typeof__(addr) __addr = __tag_reset(addr);\ >>> - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \ >>> + __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); >>> \ >>> }) >>> >>> void dump_mem_limit(void); >>> diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h >>> index 012cffc574e8..99a6da91f870 100644 >>> --- a/arch/arm64/include/asm/page.h >>> +++ b/arch/arm64/include/asm/page.h >>> @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from); >>> typedef struct page *pgtable_t; >>> >>> extern int pfn_valid(unsigned long); >>> +extern int pfn_is_map_memory(unsigned long); >> >> Check 
patch is complaining about this. >> >> WARNING: function definition argument 'unsigned long' should also have an >> identifier name >> #50: FILE: arch/arm64/include/asm/page.h:41: >> +extern int pfn_is_map_memory(unsigned long); >> >> >>> >>> #include >>> >>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c >>> index 8711894db8c2..23dd99e29b23 100644 >>> --- a/arch/arm64/kvm/mmu.c >>> +++ b/arch/arm64/kvm/mmu.c >>> @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm) >>> >>> static bool kvm_is_device_pfn(unsigned long pfn) >>> { >>> - return !pfn_valid(pfn); >>> + return !pfn_is_map_memory(pfn); >>> } >>> >>> /* >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c >>> index 3685e12aba9b..dc03bdc12c0f 100644 >>> --- a/arch/arm64/mm/init.c >>> +++ b/arch/arm64/mm/init.c >>> @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn) >>> } >>> EXPORT_SYMBOL(pfn_valid); >>> >>> +int pfn_is_map_memory(unsigned long pfn) >>> +{ >>> + phys_addr_t addr = PFN_PHYS(pfn); >>> + >> >> Should also bring with it, the comment regarding upper bits in >> the pfn from arm64 pfn_valid(). > > I think a reference to the comment in pfn_valid() will suffice. Okay. > > BTW, I wonder how is that other architectures do not need this check? Trying to move that into generic pfn_valid() in mmzone.h, will resend the RFC patch after this series. https://patchwork.kernel.org/project/linux-mm/patch/1615174073-10520-1-git-send-email-anshuman.khand...@arm.com/ > >>> + if (PHYS_PFN(addr) != pfn) >>> + return 0; >>> + >> >> ^ trailing spaces here. >> >> ERROR: trailing whitespace >> #81: FILE: arch/arm64/mm/init.c:263: >> +^I$ > > Oops :) > >>> + return memblock_is_map_memory(addr); >>> +} >>> +EXPORT_SYMBOL(pfn_is_map_memory); >>> + >> >> Is the EXPORT_SYMBOL() required to build drivers which will use >> pfn_is_map_memory() but currently use pfn_valid() ? > > Yes, this is required for virt_addr_valid() that is used by modules. 
> There will be two adjacent EXPORT_SYMBOL()s, one for pfn_valid() and one for pfn_is_map_memory(). But it's okay I guess, can't help it.
Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
On Wed, Apr 21, 2021 at 04:36:46PM +0530, Anshuman Khandual wrote: > > On 4/21/21 12:21 PM, Mike Rapoport wrote: > > From: Mike Rapoport > > > > The arm64's version of pfn_valid() differs from the generic because of two > > reasons: > > > > * Parts of the memory map are freed during boot. This makes it necessary to > > verify that there is actual physical memory that corresponds to a pfn > > which is done by querying memblock. > > > > * There are NOMAP memory regions. These regions are not mapped in the > > linear map and until the previous commit the struct pages representing > > these areas had default values. > > > > As the consequence of absence of the special treatment of NOMAP regions in > > the memory map it was necessary to use memblock_is_map_memory() in > > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that > > generic mm functionality would not treat a NOMAP page as a normal page. > > > > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and > > the rest of core mm will treat them as unusable memory and thus > > pfn_valid_within() is no longer required at all and can be disabled by > > removing CONFIG_HOLES_IN_ZONE on arm64. > > This makes sense. > > > > > pfn_valid() can be slightly simplified by replacing > > memblock_is_map_memory() with memblock_is_memory(). > > > > Signed-off-by: Mike Rapoport > > --- > > arch/arm64/Kconfig | 3 --- > > arch/arm64/mm/init.c | 4 ++-- > > 2 files changed, 2 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > > index e4e1b6550115..58e439046d05 100644 > > --- a/arch/arm64/Kconfig > > +++ b/arch/arm64/Kconfig > > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK > > def_bool y > > depends on NUMA > > > > -config HOLES_IN_ZONE > > - def_bool y > > - > > Right. 
> > > source "kernel/Kconfig.hz" > > > > config ARCH_SPARSEMEM_ENABLE > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index dc03bdc12c0f..eb3f56fb8c7c 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn) > > > > /* > > * ZONE_DEVICE memory does not have the memblock entries. > > -* memblock_is_map_memory() check for ZONE_DEVICE based > > +* memblock_is_memory() check for ZONE_DEVICE based > > * addresses will always fail. Even the normal hotplugged > > * memory will never have MEMBLOCK_NOMAP flag set in their > > * memblock entries. Skip memblock search for all non early > > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn) > > return pfn_section_valid(ms, pfn); > > } > > #endif > > - return memblock_is_map_memory(addr); > > + return memblock_is_memory(addr); > > Wondering if MEMBLOCK_NOMAP is now being treated similarly to other > memory pfns for page table walking purpose but with PageReserved(), > why memblock_is_memory() is still required ? At this point, should > not we just return valid for early_section() memory. As pfn_valid() > now just implies that pfn has a struct page backing which has been > already verified with valid_section() etc. memblock_is_memory() is required because arm64 frees unused parts of the memory map. So, for instance, if we have 64M out of 128M populated in a section the section based calculation would return 1 for a pfn in the second half of the section, but there would be no memory map there. > > } > > EXPORT_SYMBOL(pfn_valid); > > > > -- Sincerely yours, Mike.
Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
On Wed, Apr 21, 2021 at 04:29:48PM +0530, Anshuman Khandual wrote: > > On 4/21/21 12:21 PM, Mike Rapoport wrote: > > From: Mike Rapoport > > > > The intended semantics of pfn_valid() is to verify whether there is a > > struct page for the pfn in question and nothing else. > > > > Yet, on arm64 it is used to distinguish memory areas that are mapped in the > > linear map vs those that require ioremap() to access them. > > > > Introduce a dedicated pfn_is_map_memory() wrapper for > > memblock_is_map_memory() to perform such check and use it where > > appropriate. > > > > Using a wrapper allows to avoid cyclic include dependencies. > > > > Signed-off-by: Mike Rapoport > > --- > > arch/arm64/include/asm/memory.h | 2 +- > > arch/arm64/include/asm/page.h | 1 + > > arch/arm64/kvm/mmu.c| 2 +- > > arch/arm64/mm/init.c| 11 +++ > > arch/arm64/mm/ioremap.c | 4 ++-- > > arch/arm64/mm/mmu.c | 2 +- > > 6 files changed, 17 insertions(+), 5 deletions(-) > > > > diff --git a/arch/arm64/include/asm/memory.h > > b/arch/arm64/include/asm/memory.h > > index 0aabc3be9a75..194f9f993d30 100644 > > --- a/arch/arm64/include/asm/memory.h > > +++ b/arch/arm64/include/asm/memory.h > > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x) > > > > #define virt_addr_valid(addr) ({ > > \ > > __typeof__(addr) __addr = __tag_reset(addr);\ > > - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \ > > + __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); > > \ > > }) > > > > void dump_mem_limit(void); > > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h > > index 012cffc574e8..99a6da91f870 100644 > > --- a/arch/arm64/include/asm/page.h > > +++ b/arch/arm64/include/asm/page.h > > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from); > > typedef struct page *pgtable_t; > > > > extern int pfn_valid(unsigned long); > > +extern int pfn_is_map_memory(unsigned long); > > Check patch is complaining about this. 
> > WARNING: function definition argument 'unsigned long' should also have an > identifier name > #50: FILE: arch/arm64/include/asm/page.h:41: > +extern int pfn_is_map_memory(unsigned long); > > > > > > #include > > > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > > index 8711894db8c2..23dd99e29b23 100644 > > --- a/arch/arm64/kvm/mmu.c > > +++ b/arch/arm64/kvm/mmu.c > > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm) > > > > static bool kvm_is_device_pfn(unsigned long pfn) > > { > > - return !pfn_valid(pfn); > > + return !pfn_is_map_memory(pfn); > > } > > > > /* > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 3685e12aba9b..dc03bdc12c0f 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn) > > } > > EXPORT_SYMBOL(pfn_valid); > > > > +int pfn_is_map_memory(unsigned long pfn) > > +{ > > + phys_addr_t addr = PFN_PHYS(pfn); > > + > > Should also bring with it, the comment regarding upper bits in > the pfn from arm64 pfn_valid(). I think a reference to the comment in pfn_valid() will suffice. BTW, I wonder how it is that other architectures do not need this check? > > + if (PHYS_PFN(addr) != pfn) > > + return 0; > > + > > ^ trailing spaces here. > > ERROR: trailing whitespace > #81: FILE: arch/arm64/mm/init.c:263: > +^I$ Oops :) > > + return memblock_is_map_memory(addr); > > +} > > +EXPORT_SYMBOL(pfn_is_map_memory); > > + > > Is the EXPORT_SYMBOL() required to build drivers which will use > pfn_is_map_memory() but currently use pfn_valid() ? Yes, this is required for virt_addr_valid() that is used by modules. -- Sincerely yours, Mike.
Re: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU part)
Hi Eric, On 4/11/21 4:42 PM, Eric Auger wrote: SMMUv3 Nested Stage Setup (IOMMU part) [snip] Eric Auger (12): iommu: Introduce attach/detach_pasid_table API iommu: Introduce bind/unbind_guest_msi iommu/smmuv3: Allow s1 and s2 configs to coexist iommu/smmuv3: Get prepared for nested stage support iommu/smmuv3: Implement attach/detach_pasid_table iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs iommu/smmuv3: Implement cache_invalidate dma-iommu: Implement NESTED_MSI cookie iommu/smmuv3: Nested mode single MSI doorbell per domain enforcement iommu/smmuv3: Enforce incompatibility between nested mode and HW MSI regions iommu/smmuv3: Implement bind/unbind_guest_msi iommu/smmuv3: report additional recoverable faults [snip] I noticed that the patch[1]: [PATCH v13 15/15] iommu/smmuv3: Add PASID cache invalidation per PASID has been dropped in the v14 and v15 of this series. Is this planned to be part of any future series, or did I miss a discussion about dropping the patch? :-) [1] https://patchwork.kernel.org/project/kvm/patch/20201118112151.25412-16-eric.au...@redhat.com/ Best regards Vivek
Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
On Wed, 21 Apr 2021 07:17:44 +0100, Keqian Zhu wrote: > > Hi Gavin, > > On 2021/4/21 14:20, Gavin Shan wrote: > > Hi Keqian and Santosh, > > > > On 4/21/21 12:59 PM, Keqian Zhu wrote: > >> On 2020/10/22 0:16, Santosh Shukla wrote: > >>> The Commit:6d674e28 introduces a notion to detect and handle the > >>> device mapping. The commit checks for the VM_PFNMAP flag is set > >>> in vma->flags and if set then marks force_pte to true such that > >>> if force_pte is true then ignore the THP function check > >>> (/transparent_hugepage_adjust()). > >>> > >>> There could be an issue with the VM_PFNMAP flag setting and checking. > >>> For example consider a case where the mdev vendor driver register's > >>> the vma_fault handler named vma_mmio_fault(), which maps the > >>> host MMIO region in-turn calls remap_pfn_range() and maps > >>> the MMIO's vma space. Where, remap_pfn_range implicitly sets > >>> the VM_PFNMAP flag into vma->flags. > >> Could you give the name of the mdev vendor driver that triggers this issue? > >> I failed to find one according to your description. Thanks. > >> > > > > I think it would be fixed in driver side to set VM_PFNMAP in > > its mmap() callback (call_mmap()), like vfio PCI driver does. > > It means it won't be delayed until page fault is issued and > > remap_pfn_range() is called. It's determined from the beginning > > that the vma associated the mdev vendor driver is serving as > > PFN remapping purpose. So the vma should be populated completely, > > including the VM_PFNMAP flag before it becomes visible to user > > space. Why should that be a requirement? Lazy populating of the VMA should be perfectly acceptable if the fault can only happen on the CPU side. > > > > The example can be found from vfio driver in drivers/vfio/pci/vfio_pci.c: > > vfio_pci_mmap: VM_PFNMAP is set for the vma > > vfio_pci_mmap_fault: remap_pfn_range() is called > Right. I have discussed the above with Marc. I want to find the driver > to fix it. 
However, AFAICS, there is no driver that matches the description... I have the feeling this is an out-of-tree driver (and Santosh's email address is bouncing, so I guess we won't have much information from him). However, the simple fact that any odd driver can provide a fault handler and populate the VMA on demand makes me think that we need to support this case. M. -- Without deviation from the norm, progress is not possible.
Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
On 4/21/21 12:21 PM, Mike Rapoport wrote: > From: Mike Rapoport > > The arm64's version of pfn_valid() differs from the generic because of two > reasons: > > * Parts of the memory map are freed during boot. This makes it necessary to > verify that there is actual physical memory that corresponds to a pfn > which is done by querying memblock. > > * There are NOMAP memory regions. These regions are not mapped in the > linear map and until the previous commit the struct pages representing > these areas had default values. > > As the consequence of absence of the special treatment of NOMAP regions in > the memory map it was necessary to use memblock_is_map_memory() in > pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that > generic mm functionality would not treat a NOMAP page as a normal page. > > Since the NOMAP regions are now marked as PageReserved(), pfn walkers and > the rest of core mm will treat them as unusable memory and thus > pfn_valid_within() is no longer required at all and can be disabled by > removing CONFIG_HOLES_IN_ZONE on arm64. This makes sense. > > pfn_valid() can be slightly simplified by replacing > memblock_is_map_memory() with memblock_is_memory(). > > Signed-off-by: Mike Rapoport > --- > arch/arm64/Kconfig | 3 --- > arch/arm64/mm/init.c | 4 ++-- > 2 files changed, 2 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index e4e1b6550115..58e439046d05 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK > def_bool y > depends on NUMA > > -config HOLES_IN_ZONE > - def_bool y > - Right. > source "kernel/Kconfig.hz" > > config ARCH_SPARSEMEM_ENABLE > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index dc03bdc12c0f..eb3f56fb8c7c 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn) > > /* >* ZONE_DEVICE memory does not have the memblock entries. 
> - * memblock_is_map_memory() check for ZONE_DEVICE based > + * memblock_is_memory() check for ZONE_DEVICE based >* addresses will always fail. Even the normal hotplugged >* memory will never have MEMBLOCK_NOMAP flag set in their >* memblock entries. Skip memblock search for all non early > @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn) > return pfn_section_valid(ms, pfn); > } > #endif > - return memblock_is_map_memory(addr); > + return memblock_is_memory(addr); Wondering if MEMBLOCK_NOMAP is now being treated similarly to other memory pfns for page table walking purpose but with PageReserved(), why memblock_is_memory() is still required ? At this point, should not we just return valid for early_section() memory. As pfn_valid() now just implies that pfn has a struct page backing which has been already verified with valid_section() etc. > } > EXPORT_SYMBOL(pfn_valid); > >
Re: [PATCH v2 3/4] arm64: decouple check whether pfn is in linear map from pfn_valid()
On 4/21/21 12:21 PM, Mike Rapoport wrote: > From: Mike Rapoport > > The intended semantics of pfn_valid() is to verify whether there is a > struct page for the pfn in question and nothing else. > > Yet, on arm64 it is used to distinguish memory areas that are mapped in the > linear map vs those that require ioremap() to access them. > > Introduce a dedicated pfn_is_map_memory() wrapper for > memblock_is_map_memory() to perform such check and use it where > appropriate. > > Using a wrapper allows to avoid cyclic include dependencies. > > Signed-off-by: Mike Rapoport > --- > arch/arm64/include/asm/memory.h | 2 +- > arch/arm64/include/asm/page.h | 1 + > arch/arm64/kvm/mmu.c| 2 +- > arch/arm64/mm/init.c| 11 +++ > arch/arm64/mm/ioremap.c | 4 ++-- > arch/arm64/mm/mmu.c | 2 +- > 6 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h > index 0aabc3be9a75..194f9f993d30 100644 > --- a/arch/arm64/include/asm/memory.h > +++ b/arch/arm64/include/asm/memory.h > @@ -351,7 +351,7 @@ static inline void *phys_to_virt(phys_addr_t x) > > #define virt_addr_valid(addr)({ > \ > __typeof__(addr) __addr = __tag_reset(addr);\ > - __is_lm_address(__addr) && pfn_valid(virt_to_pfn(__addr)); \ > + __is_lm_address(__addr) && pfn_is_map_memory(virt_to_pfn(__addr)); > \ > }) > > void dump_mem_limit(void); > diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h > index 012cffc574e8..99a6da91f870 100644 > --- a/arch/arm64/include/asm/page.h > +++ b/arch/arm64/include/asm/page.h > @@ -38,6 +38,7 @@ void copy_highpage(struct page *to, struct page *from); > typedef struct page *pgtable_t; > > extern int pfn_valid(unsigned long); > +extern int pfn_is_map_memory(unsigned long); Check patch is complaining about this. 
WARNING: function definition argument 'unsigned long' should also have an identifier name #50: FILE: arch/arm64/include/asm/page.h:41: +extern int pfn_is_map_memory(unsigned long); > > #include > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c > index 8711894db8c2..23dd99e29b23 100644 > --- a/arch/arm64/kvm/mmu.c > +++ b/arch/arm64/kvm/mmu.c > @@ -85,7 +85,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm) > > static bool kvm_is_device_pfn(unsigned long pfn) > { > - return !pfn_valid(pfn); > + return !pfn_is_map_memory(pfn); > } > > /* > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 3685e12aba9b..dc03bdc12c0f 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -258,6 +258,17 @@ int pfn_valid(unsigned long pfn) > } > EXPORT_SYMBOL(pfn_valid); > > +int pfn_is_map_memory(unsigned long pfn) > +{ > + phys_addr_t addr = PFN_PHYS(pfn); > + Should also bring with it, the comment regarding upper bits in the pfn from arm64 pfn_valid(). > + if (PHYS_PFN(addr) != pfn) > + return 0; > + ^ trailing spaces here. ERROR: trailing whitespace #81: FILE: arch/arm64/mm/init.c:263: +^I$ > + return memblock_is_map_memory(addr); > +} > +EXPORT_SYMBOL(pfn_is_map_memory); > + Is the EXPORT_SYMBOL() required to build drivers which will use pfn_is_map_memory() but currently use pfn_valid() ?
Re: [PATCH v2 2/4] memblock: update initialization of reserved pages
Re: [PATCH v2 2/4] memblock: update initialization of reserved pages
On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport
> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c            | 28 ++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..634c1a578db8 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for futher details

Small nit - s/futher/further

>   */
>  enum memblock_flags {
>  	MEMBLOCK_NONE = 0x0,	/* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  	return end_pfn - start_pfn;
>  }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}

I guess there is no possible method to unify both these loops.

> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>  	unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
>
>  	memblock_clear_hotplug(0, -1);
>
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>
>  	/*
>  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id

Reviewed-by: Anshuman Khandual

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Re: [PATCH v2 1/4] include/linux/mmzone.h: add documentation for pfn_valid()
On 4/21/21 12:21 PM, Mike Rapoport wrote:
> From: Mike Rapoport
>
> Add comment describing the semantics of pfn_valid() that clarifies that
> pfn_valid() only checks for availability of a memory map entry (i.e. struct
> page) for a PFN rather than availability of usable memory backing that PFN.
>
> The most "generic" version of pfn_valid() used by the configurations with
> SPARSEMEM enabled resides in include/linux/mmzone.h so this is the most
> suitable place for documentation about semantics of pfn_valid().
>
> Suggested-by: Anshuman Khandual
> Signed-off-by: Mike Rapoport

Reviewed-by: Anshuman Khandual

> ---
>  include/linux/mmzone.h | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 47946cec7584..961f0eeefb62 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1410,6 +1410,17 @@ static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
>  #endif
>
>  #ifndef CONFIG_HAVE_ARCH_PFN_VALID
> +/**
> + * pfn_valid - check if there is a valid memory map entry for a PFN
> + * @pfn: the page frame number to check
> + *
> + * Check if there is a valid memory map entry aka struct page for the @pfn.
> + * Note, that availability of the memory map entry does not imply that
> + * there is actual usable memory at that @pfn. The struct page may
> + * represent a hole or an unusable page frame.
> + *
> + * Return: 1 for PFNs that have memory map entries and 0 otherwise
> + */
>  static inline int pfn_valid(unsigned long pfn)
>  {
>  	struct mem_section *ms;
Re: [PATCH v2 2/4] memblock: update initialization of reserved pages
On 21.04.21 08:51, Mike Rapoport wrote:
> From: Mike Rapoport
>
> The struct pages representing a reserved memory region are initialized
> using reserve_bootmem_range() function. This function is called for each
> reserved region just before the memory is freed from memblock to the buddy
> page allocator.
>
> The struct pages for MEMBLOCK_NOMAP regions are kept with the default
> values set by the memory map initialization which makes it necessary to
> have a special treatment for such pages in pfn_valid() and
> pfn_valid_within().
>
> Split out initialization of the reserved pages to a function with a
> meaningful name and treat the MEMBLOCK_NOMAP regions the same way as the
> reserved regions and mark struct pages for the NOMAP regions as
> PageReserved.
>
> Signed-off-by: Mike Rapoport
> ---
>  include/linux/memblock.h |  4 +++-
>  mm/memblock.c            | 28 ++--
>  2 files changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 5984fff3f175..634c1a578db8 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -30,7 +30,9 @@ extern unsigned long long max_possible_pfn;
>   * @MEMBLOCK_NONE: no special request
>   * @MEMBLOCK_HOTPLUG: hotpluggable region
>   * @MEMBLOCK_MIRROR: mirrored region
> - * @MEMBLOCK_NOMAP: don't add to kernel direct mapping
> + * @MEMBLOCK_NOMAP: don't add to kernel direct mapping and treat as
> + * reserved in the memory map; refer to memblock_mark_nomap() description
> + * for futher details
>   */
>  enum memblock_flags {
>  	MEMBLOCK_NONE = 0x0,	/* No special request */
> diff --git a/mm/memblock.c b/mm/memblock.c
> index afaefa8fc6ab..3abf2c3fea7f 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -906,6 +906,11 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
>   * @base: the base phys addr of the region
>   * @size: the size of the region
>   *
> + * The memory regions marked with %MEMBLOCK_NOMAP will not be added to the
> + * direct mapping of the physical memory. These regions will still be
> + * covered by the memory map. The struct page representing NOMAP memory
> + * frames in the memory map will be PageReserved()
> + *
>   * Return: 0 on success, -errno on failure.
>   */
>  int __init_memblock memblock_mark_nomap(phys_addr_t base, phys_addr_t size)
> @@ -2002,6 +2007,26 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
>  	return end_pfn - start_pfn;
>  }
>
> +static void __init memmap_init_reserved_pages(void)
> +{
> +	struct memblock_region *region;
> +	phys_addr_t start, end;
> +	u64 i;
> +
> +	/* initialize struct pages for the reserved regions */
> +	for_each_reserved_mem_range(i, &start, &end)
> +		reserve_bootmem_region(start, end);
> +
> +	/* and also treat struct pages for the NOMAP regions as PageReserved */
> +	for_each_mem_region(region) {
> +		if (memblock_is_nomap(region)) {
> +			start = region->base;
> +			end = start + region->size;
> +			reserve_bootmem_region(start, end);
> +		}
> +	}
> +}
> +
>  static unsigned long __init free_low_memory_core_early(void)
>  {
>  	unsigned long count = 0;
> @@ -2010,8 +2035,7 @@ static unsigned long __init free_low_memory_core_early(void)
>
>  	memblock_clear_hotplug(0, -1);
>
> -	for_each_reserved_mem_range(i, &start, &end)
> -		reserve_bootmem_region(start, end);
> +	memmap_init_reserved_pages();
>
>  	/*
>  	 * We need to use NUMA_NO_NODE instead of NODE_DATA(0)->node_id

Reviewed-by: David Hildenbrand

--
Thanks,

David / dhildenb
Re: [PATCH v2 4/4] arm64: drop pfn_valid_within() and simplify pfn_valid()
On 21.04.21 08:51, Mike Rapoport wrote:
> From: Mike Rapoport
>
> The arm64's version of pfn_valid() differs from the generic because of
> two reasons:
>
> * Parts of the memory map are freed during boot. This makes it necessary to
>   verify that there is actual physical memory that corresponds to a pfn
>   which is done by querying memblock.
>
> * There are NOMAP memory regions. These regions are not mapped in the
>   linear map and until the previous commit the struct pages representing
>   these areas had default values.
>
> As the consequence of absence of the special treatment of NOMAP regions in
> the memory map it was necessary to use memblock_is_map_memory() in
> pfn_valid() and to have pfn_valid_within() aliased to pfn_valid() so that
> generic mm functionality would not treat a NOMAP page as a normal page.
>
> Since the NOMAP regions are now marked as PageReserved(), pfn walkers and
> the rest of core mm will treat them as unusable memory and thus
> pfn_valid_within() is no longer required at all and can be disabled by
> removing CONFIG_HOLES_IN_ZONE on arm64.
>
> pfn_valid() can be slightly simplified by replacing
> memblock_is_map_memory() with memblock_is_memory().
>
> Signed-off-by: Mike Rapoport
> ---
>  arch/arm64/Kconfig   | 3 ---
>  arch/arm64/mm/init.c | 4 ++--
>  2 files changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index e4e1b6550115..58e439046d05 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -1040,9 +1040,6 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
>
> -config HOLES_IN_ZONE
> -	def_bool y
> -
>  source "kernel/Kconfig.hz"
>
>  config ARCH_SPARSEMEM_ENABLE
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index dc03bdc12c0f..eb3f56fb8c7c 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -243,7 +243,7 @@ int pfn_valid(unsigned long pfn)
>  	/*
>  	 * ZONE_DEVICE memory does not have the memblock entries.
> -	 * memblock_is_map_memory() check for ZONE_DEVICE based
> +	 * memblock_is_memory() check for ZONE_DEVICE based
>  	 * addresses will always fail. Even the normal hotplugged
>  	 * memory will never have MEMBLOCK_NOMAP flag set in their
>  	 * memblock entries. Skip memblock search for all non early
> @@ -254,7 +254,7 @@ int pfn_valid(unsigned long pfn)
>  		return pfn_section_valid(ms, pfn);
>  	}
>  #endif
> -	return memblock_is_map_memory(addr);
> +	return memblock_is_memory(addr);
>  }
>  EXPORT_SYMBOL(pfn_valid);

Acked-by: David Hildenbrand

--
Thanks,

David / dhildenb