On Thu, Feb 26, 2026 at 12:54:08AM -0500, Gregory Price wrote:
> On Thu, Feb 26, 2026 at 02:27:24PM +1100, Alistair Popple wrote:
>
> > > If NUMA is the interface we want, then NODE_DATA is the right direction
> > > regardless of struct page's future or what zone it lives in.
> > >
> > > There's no reason to keep per-page pgmap w/ device-to-node mappings.
> >
> > In reality I suspect that's already the case today. I'm not sure we need
> > per-page pgmap.
> >
>
> Probably, and maybe there's a good argument for stealing 80-90% of the
> common surface here, shunting ZONE_DEVICE to use this instead of pgmap
> before we go all the way to private nodes.
>
Out of curiosity I went digging through the existing users. It seems like
the average driver has 1-8 discrete pgmaps, with Nouveau being an outlier
that registers ad-hoc in 256MB chunks. The relevant annoyances are the
percpu_ref used to track each pgmap's lifetime, and the fact that a
driver's pgmaps can be non-contiguous.

tl;dr: a 1-to-1 mapping of node-to-pgmap isn't realistic for most existing
ZONE_DEVICE users, which means a 1-op lookup (page->pgmap) turns into a
multi-op pointer chase and range comparison. Not sure that turns out well
for anyone. (The extra cost would only hit ZONE_DEVICE / managed-node
users; all traditional nodes still have a simple pgdat or page->flags
lookup to check membership.)

There's an argument for doing this just for the sake of getting pgmap out
of struct page/folio, but that only deals with the problem on NUMA systems.
On non-NUMA systems the pgmap probably still ends up in folio_ext (assuming
we get there), and even that might not be sufficient to get LRU back.
Might need Willy's opinion here.

~Gregory
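A rough userspace sketch of the lookup-cost difference described above, for
illustration only. Every type and function in it (struct pgmap_sketch,
struct node_pgmaps_sketch, node_add_pgmap(), lookup_by_node(), etc.) is
hypothetical and heavily simplified; none of it is the kernel's real
struct dev_pagemap or pgdat layout. It assumes each node keeps a small
table of possibly non-contiguous pfn ranges, one entry per pgmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pgmap: one pfn range owned by a driver. */
struct pgmap_sketch {
	uint64_t start_pfn;	/* first pfn covered (inclusive) */
	uint64_t end_pfn;	/* one past the last pfn covered */
	int owner_id;		/* stand-in for the owning driver */
};

#define MAX_PGMAPS_PER_NODE 64	/* arbitrary limit for the sketch */

/* Hypothetical per-node table: several pgmaps may share one node. */
struct node_pgmaps_sketch {
	size_t nr;
	struct pgmap_sketch maps[MAX_PGMAPS_PER_NODE];
};

/* Hypothetical page: carries both a node id and a direct pgmap pointer. */
struct page_sketch {
	uint64_t pfn;
	int nid;
	struct pgmap_sketch *pgmap;
};

/* Today's model: the page points at its pgmap, so lookup is one load. */
static struct pgmap_sketch *lookup_direct(const struct page_sketch *page)
{
	return page->pgmap;
}

/* Register a range on a node; ranges need not be contiguous. */
static struct pgmap_sketch *node_add_pgmap(struct node_pgmaps_sketch *node,
					   uint64_t start, uint64_t end,
					   int owner)
{
	assert(start < end);
	if (node->nr >= MAX_PGMAPS_PER_NODE)
		return NULL;
	struct pgmap_sketch *m = &node->maps[node->nr++];
	m->start_pfn = start;
	m->end_pfn = end;
	m->owner_id = owner;
	return m;
}

/*
 * Node-based model without page->pgmap: chase to the node's table, then
 * range-compare each registered pgmap until one contains the pfn.
 */
static struct pgmap_sketch *lookup_by_node(struct node_pgmaps_sketch *nodes,
					   const struct page_sketch *page)
{
	struct node_pgmaps_sketch *node = &nodes[page->nid];

	for (size_t i = 0; i < node->nr; i++) {
		struct pgmap_sketch *m = &node->maps[i];

		if (page->pfn >= m->start_pfn && page->pfn < m->end_pfn)
			return m;
	}
	return NULL;
}
```

The linear scan is the simplest version. A sorted array or interval
structure per node would bring the search down to logarithmic time, but it
is still a pointer chase plus comparisons rather than the single load that
page->pgmap gives today.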
