Jason Gunthorpe wrote:
> On Tue, Jan 09, 2024 at 06:02:03PM +0100, David Hildenbrand wrote:
> > > Given that, an alternative proposal that I think would work
> > > for you would be to add a 'placeholder' memory node definition
> > > in SRAT (so allow 0 size explicitly - might need a new SRAT
> > > entry to avoid backwards compat issues).
> >
> > Putting all the PCI/GI/... complexity aside, I'll just raise again that for
> > virtio-mem something simple like that might be helpful as well, IIUC.
> >
> >     -numa node,nodeid=2 \
> >     ...
> >     -device virtio-mem-pci,node=2,... \
> >
> > All we need is the OS to prepare for an empty node that will get populated
> > with memory later.
>
> That is all this is doing too, the NUMA relationship of the actual
> memory is described already by the PCI device since it is a BAR on the
> device.
>
> The only purpose is to get the empty nodes into Linux :(
>
> > So if that's what a "placeholder" node definition in srat could achieve as
> > well, even without all of the other acpi-generic-initiator stuff, that would
> > be great.
>
> Seems like there are two quite similar use cases.. virtio-mem is going
> to be calling the same family of kernel API I suspect :)

It seems sad that we, as an industry, went through all of this trouble
to define a dynamically enumerable CXL device model only to turn around
and require static ACPI tables to tell us how to enumerate it.

A similar problem exists on the memory target side, and the approach
taken there was to have Linux statically reserve at least enough NUMA
node numbers for all the platform CXL memory ranges (defined in the
ACPI.CEDT.CFMWS), but with the promise to come back and broach the
dynamic node creation problem "if the need arises".

This initiator-node enumeration case seems like that occasion where the
need has arisen to get Linux out of the mode of needing to declare all
possible NUMA nodes early in boot, and to allow for nodes to be
discoverable post NUMA-init.

One strawman scheme that comes to mind is, instead of "add nodes early"
in boot, "delete unused nodes late" in boot, after the device topology
has been enumerated. Otherwise, requiring static ACPI tables to further
enumerate an industry-standard dynamically enumerated bus seems to be
going in the wrong direction.
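
To make the "delete unused nodes late" strawman a bit more concrete,
here is a toy model of the ordering, written in Python purely for
illustration. It is not kernel code and none of these names are kernel
APIs: MAX_NODES, the helper functions, and the discovered-device tuples
are all made up. The `keep` set is my own addition, assumed here to
cover the virtio-mem case above where a node has to stay present but
empty until memory shows up later.

```python
# Toy model only: none of these names are kernel APIs.
MAX_NODES = 8  # hypothetical cap, standing in for a build-time node limit


def reserve_all_nodes(max_nodes=MAX_NODES):
    """Early boot: mark every node ID as possible, with no resources yet."""
    return {nid: {"memory_mb": 0, "initiators": 0} for nid in range(max_nodes)}


def enumerate_devices(nodes, discovered):
    """Device enumeration: attach whatever the bus walk found to its node."""
    for nid, memory_mb, initiators in discovered:
        nodes[nid]["memory_mb"] += memory_mb
        nodes[nid]["initiators"] += initiators


def delete_unused_nodes(nodes, keep=()):
    """Late boot: drop nodes that nothing claimed, except explicitly kept ones.

    `keep` is an assumption of this sketch: nodes that must stay present
    but empty because memory will be hot-added to them later.
    """
    return {
        nid: res
        for nid, res in nodes.items()
        if res["memory_mb"] or res["initiators"] or nid in keep
    }


nodes = reserve_all_nodes()
# (node id, memory in MB, initiator count) from a made-up bus walk
enumerate_devices(nodes, [(0, 4096, 1), (1, 0, 1), (3, 2048, 0)])
nodes = delete_unused_nodes(nodes, keep={2})
print(sorted(nodes))
```

In this model node 1 survives with an initiator and no memory, node 2
survives only because it was explicitly kept, and nodes 4-7 go away
once enumeration has shown nothing landed on them. Whether the real
NUMA init code can tolerate that ordering is exactly the open question.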