On 6/4/25 16:19, Lorenzo Stoakes wrote: > The walk_page_range_novma() function is rather confusing - it supports two > modes, one used often, the other used only for debugging. > > The first mode is the common case of traversal of kernel page tables, which > is what nearly all callers use this for. > > Secondly it provides an unusual debugging interface that allows for the > traversal of page tables in a userland range of memory even for that memory > which is not described by a VMA. > > It is far from certain that such page tables should even exist, but perhaps > this is precisely why it is useful as a debugging mechanism. > > As a result, this is utilised by ptdump only. Historically, things were > reversed - ptdump was the only user, and other parts of the kernel evolved > to use the kernel page table walking here. > > Since we have some complicated and confusing locking rules for the novma > case, it makes sense to separate the two usages into their own functions. > > Doing this also provide self-documentation as to the intent of the caller - > are they doing something rather unusual or are they simply doing a standard > kernel page table walk? > > We therefore establish two separate functions - walk_page_range_debug() for > this single usage, and walk_kernel_page_table_range() for general kernel > page table walking. > > We additionally make walk_page_range_debug() internal to mm. > > Note that ptdump uses the precise same function for kernel walking as a
IMHO it's not clear at this point what "the precise same function" means. > convenience, so we permit this but make it very explicit by having > walk_page_range_novma() invoke walk_kernel_page_table_range() in this case. ^ walk_page_range_debug() Maybe this could be reworded in the sense (AFAIU) that walk_page_range_debug() can be used for both user space page table walking or kernel depending on what mm is passed, so in the case of init_mm it invokes walk_kernel_page_table_range() internally. > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoa...@oracle.com> > Acked-by: Mike Rapoport (Microsoft) <r...@kernel.org> > --- > v2: > * Renamed walk_page_range_novma() to walk_page_range_debug() as per David. > * Moved walk_page_range_debug() definition to mm/internal.h as per Mike. > * Renamed walk_page_range_kernel() to walk_kernel_page_table_range() as > per David. > > v1 resend: > * Actually cc'd lists... > * Fixed mistake in walk_page_range_novma() not handling kernel mappings and > update commit message to referene. > * Added Mike's off-list Acked-by. > * Fixed up comments as per Mike. > * Add some historic flavour to the commit message as per Mike. > https://lore.kernel.org/all/20250603192213.182931-1-lorenzo.stoa...@oracle.com/ > > v1: > (accidentally sent off-list due to error in scripting) > > arch/loongarch/mm/pageattr.c | 2 +- > arch/openrisc/kernel/dma.c | 4 +- > arch/riscv/mm/pageattr.c | 8 +-- > include/linux/pagewalk.h | 7 ++- > mm/hugetlb_vmemmap.c | 2 +- > mm/internal.h | 4 ++ > mm/pagewalk.c | 98 ++++++++++++++++++++++++------------ > mm/ptdump.c | 3 +- > 8 files changed, 82 insertions(+), 46 deletions(-) > > diff --git a/arch/loongarch/mm/pageattr.c b/arch/loongarch/mm/pageattr.c > index 99165903908a..f5e910b68229 100644 > --- a/arch/loongarch/mm/pageattr.c > +++ b/arch/loongarch/mm/pageattr.c > @@ -118,7 +118,7 @@ static int __set_memory(unsigned long addr, int numpages, > pgprot_t set_mask, pgp > return 0; > > mmap_write_lock(&init_mm); > - ret = walk_page_range_novma(&init_mm, start, end, &pageattr_ops, NULL, > &masks); > + ret = walk_kernel_page_table_range(start, end, &pageattr_ops, NULL, > &masks); > mmap_write_unlock(&init_mm); You've removed init_mm from walk_page_range_novma() but I see most callers do the locking of init_mm immediately around it. This suggests a version handling that automatically? A bit complicated by the read/write possibilities, so maybe not worth wrapping? Just a thought, as David says ;) > > flush_tlb_kernel_range(start, end); > diff --git a/arch/openrisc/kernel/dma.c b/arch/openrisc/kernel/dma.c > index 3a7b5baaa450..af932a4ad306 100644 > --- a/arch/openrisc/kernel/dma.c > +++ b/arch/openrisc/kernel/dma.c > @@ -72,7 +72,7 @@ void *arch_dma_set_uncached(void *cpu_addr, size_t size) > * them and setting the cache-inhibit bit. > */ > mmap_write_lock(&init_mm); > - error = walk_page_range_novma(&init_mm, va, va + size, > + error = walk_kernel_page_table_range(va, va + size, > &set_nocache_walk_ops, NULL, NULL); > mmap_write_unlock(&init_mm); > > @@ -87,7 +87,7 @@ void arch_dma_clear_uncached(void *cpu_addr, size_t size) > > mmap_write_lock(&init_mm); > /* walk_page_range shouldn't be able to fail here */ > - WARN_ON(walk_page_range_novma(&init_mm, va, va + size, > + WARN_ON(walk_kernel_page_table_range(va, va + size, > &clear_nocache_walk_ops, NULL, NULL)); > mmap_write_unlock(&init_mm); > } > diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c > index d815448758a1..3f76db3d2769 100644 > --- a/arch/riscv/mm/pageattr.c > +++ b/arch/riscv/mm/pageattr.c > @@ -299,7 +299,7 @@ static int __set_memory(unsigned long addr, int numpages, > pgprot_t set_mask, > if (ret) > goto unlock; > > - ret = walk_page_range_novma(&init_mm, lm_start, lm_end, > + ret = walk_kernel_page_table_range(lm_start, lm_end, > &pageattr_ops, NULL, > &masks); Note this and other places break the second line's arguments alignment on the opening bracket. Maybe it just shows it's a bit fragile style...