On Mon, Feb 23, 2026 at 02:07:15PM +0100, David Hildenbrand (Arm) wrote:
> >
> > Gregory Price (27):
> > numa: introduce N_MEMORY_PRIVATE node state
> > mm,cpuset: gate allocations from N_MEMORY_PRIVATE behind __GFP_PRIVATE
> > mm/page_alloc: add numa_zone_allowed() and wire it up
> > mm/page_alloc: Add private node handling to build_zonelists
> > mm: introduce folio_is_private_managed() unified predicate
> > mm/mlock: skip mlock for managed-memory folios
> > mm/madvise: skip madvise for managed-memory folios
> > mm/ksm: skip KSM for managed-memory folios
> > mm/khugepaged: skip private node folios when trying to collapse.
> > mm/swap: add free_folio callback for folio release cleanup
> > mm/huge_memory.c: add private node folio split notification callback
> > mm/migrate: NP_OPS_MIGRATION - support private node user migration
> > mm/mempolicy: NP_OPS_MEMPOLICY - support private node mempolicy
> > mm/memory-tiers: NP_OPS_DEMOTION - support private node demotion
> > mm/mprotect: NP_OPS_PROTECT_WRITE - gate PTE/PMD write-upgrades
>
> I'm concerned about adding more special-casing (similar to what we already
> added for ZONE_DEVICE) all over the place.
>
> Like the whole folio_managed_() stuff in mprotect.c
>
> Having that said, sounds like a reasonable topic to discuss.
>
It's a valid concern - and it's why I tried to re-use as many of the
zone_device hooks as possible. zone_device doesn't seem to have quite
the same semantics for a case like this, so I had to add something new.
DEVICE_COHERENT injects a temporary swap entry so the device can do a
large atomic operation - then the page table entry is restored and the
CPU is free to change entries as it pleases.
Another option would be to add the hook to vma_wants_writenotify()
instead of the page table code - and mask off MM_CP_TRY_CHANGE_WRITABLE.
This would require adding a vma flag - or maybe a count of protected /
device pages:
int mprotect_fixup()
{
        ...
        if (vma_wants_manual_pte_write_upgrade(vma))
                mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
        ...
}

bool vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
{
        if (vma->managed_wrprotect)
                return true;
        ...
}
That would localize the change to folio_managed_fixup_migration_pte():

static inline pte_t folio_managed_fixup_migration_pte(struct page *new,
                                                      pte_t pte, pte_t old_pte,
                                                      struct vm_area_struct *vma)
{
        ...
        } else if (folio_managed_wrprotect(page_folio(new))) {
                pte = pte_wrprotect(pte);
+               atomic_inc(&vma->managed_wrprotect);
        }
        return pte;
}
This would cover both the huge_memory.c and mprotect cases, and maybe
that's just generally cleaner? I'll give it a try and see if it
actually works.
~Gregory