On Tue, Jun 17, 2025 at 11:30:20AM +0200, David Hildenbrand wrote: > On 17.06.25 11:25, David Hildenbrand wrote: > > On 16.06.25 13:58, Alistair Popple wrote: > > > Previously dax pages were skipped by the pagewalk code as pud_special() or > > > vm_normal_page{_pmd}() would be false for DAX pages. Now that dax pages > > > are > > > refcounted normally that is no longer the case, so the pagewalk code will > > > start returning them. > > > > > > Most callers already explicitly filter for DAX or zone device pages so > > > don't need updating. However some don't, so add checks to those callers. > > > > > > Signed-off-by: Alistair Popple <apop...@nvidia.com> > > > > > > --- > > > > > > Changes since v1: > > > > > > - Dropped "mm/pagewalk: Skip dax pages in pagewalk" and replaced it > > > with this new patch for v2 > > > > > > - As suggested by David and Jason we can filter the folios in the > > > callers instead of doing it in folio_start_walk(). Most callers > > > already do this (see below). > > > > > > I audited all callers of folio_walk_start() and found the following: > > > > > > mm/ksm.c: > > > > > > break_ksm() - doesn't need to filter zone_device pages because the can > > > never be KSM pages. > > > > > > get_mergeable_page() - already filters out zone_device pages. > > > scan_get_next_rmap_iterm() - already filters out zone_device_pages. > > > > > > mm/huge_memory.c: > > > > > > split_huge_pages_pid() - already checks for DAX with > > > vma_not_suitable_for_thp_split() > > > > > > mm/rmap.c: > > > > > > make_device_exclusive() - only works on anonymous pages, although > > > there'd be no issue with finding a DAX page even if support was extended > > > to file-backed pages. > > > > > > mm/migrate.c: > > > > > > add_folio_for_migration() - already checks the vma with vma_migratable() > > > do_pages_stat_array() - explicitly checks for zone_device folios > > > > > > kernel/event/uprobes.c: > > > > > > uprobe_write_opcode() - only works on anonymous pages, not sure if > > > zone_device could ever work so add an explicit check > > > > > > arch/s390/mm/fault.c: > > > > > > do_secure_storage_access() - not sure so be conservative and add a check > > > > > > arch/s390/kernel/uv.c: > > > > > > make_hva_secure() - not sure so be conservative and add a check > > > --- > > > arch/s390/kernel/uv.c | 2 +- > > > arch/s390/mm/fault.c | 2 +- > > > kernel/events/uprobes.c | 2 +- > > > 3 files changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c > > > index b99478e..55aa280 100644 > > > --- a/arch/s390/kernel/uv.c > > > +++ b/arch/s390/kernel/uv.c > > > @@ -424,7 +424,7 @@ int make_hva_secure(struct mm_struct *mm, unsigned > > > long hva, struct uv_cb_header > > > return -EFAULT; > > > } > > > folio = folio_walk_start(&fw, vma, hva, 0); > > > - if (!folio) { > > > + if (!folio || folio_is_zone_device(folio)) { > > > mmap_read_unlock(mm); > > > return -ENXIO; > > > } > > > diff --git a/arch/s390/mm/fault.c b/arch/s390/mm/fault.c > > > index e1ad05b..df1a067 100644 > > > --- a/arch/s390/mm/fault.c > > > +++ b/arch/s390/mm/fault.c > > > @@ -449,7 +449,7 @@ void do_secure_storage_access(struct pt_regs *regs) > > > if (!vma) > > > return handle_fault_error(regs, SEGV_MAPERR); > > > folio = folio_walk_start(&fw, vma, addr, 0); > > > - if (!folio) { > > > + if (!folio || folio_is_zone_device(folio)) { > > > mmap_read_unlock(mm); > > > return; > > > } > > > > Curious, does s390 even support ZONE_DEVICE and could trigger this?
In thoery yes. Now that we don't need the DEVMAP PTE bit someone could enable ZONE_DEVICE on s390 as it supports the rest of the prerequisites AFAICT: config ZONE_DEVICE bool "Device memory (pmem, HMM, etc...) hotplug support" depends on MEMORY_HOTPLUG depends on MEMORY_HOTREMOVE depends on SPARSEMEM_VMEMMAP > Ah, I see you raised this above. Even if it could be triggered (which I > don't think), I wonder if there would actually be a problem with zone_device > folios in here? Yes, I'm not sure either - it seems unlikely but I know nothing about how secure storage works on s390 so was trying to be be conservative. > I think these two can be dropped for now Ok. > > I wonder if __uprobe_write_opcode() would just work with anon device folios? > > > > We only modify page content, and conditionally zap the page. Would there > > be a problem with anon device folios? The two main types of anon device folios I know of are DEVICE_COHERENT and DEVICE_PRIVATE. I doubt it would be a problem for the former, but it would definitely be a problem for the latter as the actual page content is unaddressable from the CPU. So we could probably make the check specific to DEVICE_PRIVATE, although it's hard to imagine anyone caring about uprobes from DEVICE_COHERENT memory. > -- > Cheers, > > David / dhildenb >