On Thu, Jun 5, 2025 at 10:23 PM David Hildenbrand <da...@redhat.com> wrote:
> On 05.06.25 21:19, Jann Horn wrote:
> > On Wed, Jun 4, 2025 at 4:21 PM Lorenzo Stoakes
> > <lorenzo.stoa...@oracle.com> wrote:
> >> The walk_page_range_novma() function is rather confusing - it supports two
> >> modes, one used often, the other used only for debugging.
> >>
> >> The first mode is the common case of traversal of kernel page tables, which
> >> is what nearly all callers use this for.
> >>
> >> Secondly it provides an unusual debugging interface that allows for the
> >> traversal of page tables in a userland range of memory even for that memory
> >> which is not described by a VMA.
> >>
> >> It is far from certain that such page tables should even exist, but perhaps
> >> this is precisely why it is useful as a debugging mechanism.
> >>
> >> As a result, this is utilised by ptdump only. Historically, things were
> >> reversed - ptdump was the only user, and other parts of the kernel evolved
> >> to use the kernel page table walking here.
> >
> > Just for the record, copy-pasting my comment on v1 that was
> > accidentally sent off-list:
> > ```
> > Sort of a tangential comment: I wonder if it would make sense to give
> > ptdump a different page table walker that uses roughly the same safety
> > contract as gup_fast() - turn off IRQs and then walk the page tables
> > locklessly. We'd need basically no locking and no special cases
> > (regarding userspace mappings at least), at the cost of having to
> > write the walker code such that we periodically restart the walk from
> > scratch and not being able to inspect referenced pages. (That might
> > also be nicer for debugging, since it wouldn't block on locks...)
> > ```
>
> I assume we don't have to dump more than pte values etc? So
> pte_special() and friends are not relevant to get it right.
>
> GUP-fast depend on CONFIG_HAVE_GUP_FAST, not sure if that would be a
> concern for now.

Ah, good point, that's annoying... maaaybe we should just gate this entire feature on CONFIG_HAVE_GUP_FAST to make sure the userspace mappings are designed to be walkable in this way? It's in debugfs, which _theoretically_ (https://docs.kernel.org/filesystems/debugfs.html) means there are no stability guarantees, and I think it is normally used on architectures that define CONFIG_HAVE_GUP_FAST...
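
To make the "periodically restart the walk from scratch" part concrete, here is a rough userspace model, not kernel code. Everything in it is made up for illustration: the two-level `struct top`/`struct leaf` layout, the `ENTRIES` and `BATCH` values, and `sim_irq_off()`/`sim_irq_on()`, which only stand in for local_irq_save()/local_irq_restore() and do nothing real. The one property it tries to show is that each batch descends from the root again, so no pointer to a lower-level table is carried across a window in which IRQs were enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define ENTRIES 4	/* entries per table level (toy value) */
#define BATCH   3	/* leaf entries read per simulated IRQs-off section */

/* Toy two-level page table: each top-level slot is a leaf table or NULL. */
struct leaf { unsigned long pte[ENTRIES]; };
struct top  { struct leaf *leaf[ENTRIES]; };

static int irq_off_sections;	/* number of simulated IRQs-off sections */

/* Hypothetical stand-ins for local_irq_save()/local_irq_restore(). */
static void sim_irq_off(void) { irq_off_sections++; }
static void sim_irq_on(void) { }

/* Example callback: print the raw entry value and nothing else. */
static void print_entry(size_t idx, unsigned long pte)
{
	printf("idx %zu: pte %#lx\n", idx, pte);
}

/*
 * Visit every index in [0, ENTRIES * ENTRIES) in batches. The lookup for
 * each index starts at the root, so nothing learned in one batch is
 * trusted in the next. Returns the number of entries passed to cb.
 */
static int walk(const struct top *root,
		void (*cb)(size_t idx, unsigned long pte))
{
	const size_t end = (size_t)ENTRIES * ENTRIES;
	size_t idx = 0;
	int visited = 0;

	assert(root && cb);

	while (idx < end) {
		size_t stop = idx + BATCH < end ? idx + BATCH : end;

		sim_irq_off();
		for (; idx < stop; idx++) {
			/* Restarted lookup: always descend from the root. */
			const struct leaf *l = root->leaf[idx / ENTRIES];

			if (!l)
				continue;	/* hole: no leaf table here */
			cb(idx, l->pte[idx % ENTRIES]);
			visited++;
		}
		sim_irq_on();
	}
	return visited;
}
```

Calling `walk(&root, print_entry)` on a `struct top` with some slots left NULL skips the holes and reports only the populated entries. A real walker would additionally depend on the architecture deferring page table freeing while IRQs are off, which is the property that gating on CONFIG_HAVE_GUP_FAST would be meant to guarantee.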