> On 22 Feb 2026, at 09:48, Gregory Price <[email protected]> wrote:
>
> Topic type: MM
>
> Presenter: Gregory Price <[email protected]>
>
> This series introduces N_MEMORY_PRIVATE, a NUMA node state for memory
> managed by the buddy allocator but excluded from normal allocations.
>
> I present it with an end-to-end Compressed RAM service (mm/cram.c)
> that would otherwise not be possible (or would be considerably more
> difficult, be device-specific, and add to the ZONE_DEVICE boondoggle).
>
>
> TL;DR
> ===
>
> N_MEMORY_PRIVATE is all about isolating NUMA nodes and then punching
> explicit holes in that isolation to do useful things we couldn't do
> before without re-implementing entire portions of mm/ in a driver.
>
>
> /* This is my memory. There are many like it, but this one is mine. */
> rc = add_private_memory_driver_managed(nid, start, size, name, flags,
>                                        online_type, private_context);
>
> page = alloc_pages_node(nid, __GFP_PRIVATE, 0);
>
> /* OK, but I want to do something useful with it */
> static const struct node_private_ops ops = {
>     .migrate_to = my_migrate_to,
>     .folio_migrate = my_folio_migrate,
>     .flags = NP_OPS_MIGRATION | NP_OPS_MEMPOLICY,
> };
> node_private_set_ops(nid, &ops);
>
> /* And now I can use mempolicy with my memory */
> buf = mmap(...);
> mbind(buf, len, mode, private_node, ...);
> buf[0] = 0xdeadbeef;   /* Faults onto private node */
>
> /* And to be clear, no one else gets my memory */
> buf2 = malloc(4096);   /* Standard allocation */
> buf2[0] = 0xdeadbeef;  /* Can never land on private node */
>
> /* But I can choose to migrate it to the private node */
> move_pages(0, 1, &buf, &private_node, NULL, ...);
>
> /* And more fun things like this */
>
>
> Patchwork
> ===
> A fully working branch based on cxl/next can be found here:
> https://github.com/gourryinverse/linux/tree/private_compression
>
> A QEMU device which can inject high/low interrupts can be found here:
> https://github.com/gourryinverse/qemu/tree/compressed_cxl_clean
>
> The additional patches on these branches are CXL and DAX driver
> housecleaning only tangentially relevant to this RFC, so I've
> omitted them here to keep things somewhat clean. Those patches
> should (hopefully) be going upstream anyway.
>
> Patches 1-22: Core Private Node Infrastructure
>
> Patch 1: Introduce N_MEMORY_PRIVATE scaffolding
> Patch 2: Introduce __GFP_PRIVATE
> Patch 3: Apply allocation isolation mechanisms
> Patch 4: Add N_MEMORY nodes to private fallback lists
> Patches 5-9: Filter operations not yet supported
> Patch 10: free_folio callback
> Patch 11: split_folio callback
> Patches 12-20: mm/ service opt-ins:
>     Migration, Mempolicy, Demotion, Write Protect,
>     Reclaim, OOM, NUMA Balancing, Compaction,
>     LongTerm Pinning
> Patch 21: memory_failure callback
> Patch 22: Memory hotplug plumbing for private nodes
>
> Patch 23: mm/cram -- Compressed RAM Management
>
> Patches 24-27: CXL Driver examples
>     Sysram Regions with Private node support
>     Basic Driver Example: (MIGRATION | MEMPOLICY)
>     Compression Driver Example (Generic)

Hi,
As I think this is about to be discussed at the conference, I thought
I'd share some high-level comments. I have tested this for some time on
a device with compression (after some fixes, which Greg helped me with,
needed to make CXL RCD work).

Overall, the isolation property this provides is something I deem
necessary for this technology. Others are better placed to judge the MM
plumbing itself, but I wanted to say that this functionality is an
important piece of the puzzle from the device/use-case side.

For cram itself, as it stands in this RFC, I think there is still
performance and value left on the table (as noted in the description),
but I fully understand Gregory's premise in approaching it this way.

<snip>

> Future CRAM: Loosening the read-only constraint
> ===
>
> The read-only model is safe but conservative. For workloads where
> compressed pages are occasionally written, the promotion fault adds
> latency. A future optimization could allow a tunable fraction of
> compressed pages to be mapped writable, accepting some risk of
> write-driven decompression in exchange for lower overhead.
>
> The private node ops make this straightforward:
>
> - Adjust fixup_migration_pte to selectively skip write-protection.
> - Use the backpressure system to either revoke writable mappings,
>   deny additional demotions, or evict when device pressure rises.

I have some quick hacks playing with these ideas, but I haven't had
time to test them thoroughly or get to something robust yet. I saw in
another thread that there is a follow-up cooking, which looks
interesting.

Thanks, Greg, for pushing this, and I'm happy to test more on HW in our
lab.

Best,
/Yiannis
