On Tue, Apr 19, 2022 at 05:47:56PM -0700, Dan Williams wrote:
> On Tue, Apr 19, 2022 at 4:04 PM Jason Gunthorpe <[email protected]> wrote:
> >
> > On Tue, Apr 19, 2022 at 02:59:46PM -0700, Dan Williams wrote:
> >
> > > ...or are you suggesting to represent CXL free memory capacity in
> > > iomem_resource and augment the FW list early with CXL ranges. That
> > > seems doable, but it would only represent the free CXL ranges in
> > > iomem_resource as the populated CXL ranges cannot have their resources
> > > reparented after the fact, and there is plenty of code that expects
> > > "System RAM" to be a top-level resource.
> >
> > Yes, something more like this. iomem_resource should represent stuff
> > actually in use and CXL shouldn't leave behind an 'IOW' for address
> > space it isn't actually able to currently use.
>
> So that's the problem, these gigantic windows need to support someone
> showing up unannounced with a stack of multi-terabyte devices to add
> to the system.

In my experience PCIe hotplug is already extremely rare; you may need
to do this reservation on systems with hotplug slots, but not
generally. In the PCIe world the BIOS often figures this out and bridge
windows are not significantly over-allocated on non-hotplug HW.

(though even PCIe has the resizable BAR extension and other things that
are quite like hotplug and do trigger huge resource requirements)

> > Your whole description sounds like the same problems PCI hotplug has
> > adjusting the bridge windows.
>
> ...but even there the base bounds (AFAICS) are coming from FW (_CRS
> entries for ACPI described PCIe host bridges). So if CXL follows that
> model then the entire unmapped portion of the CXL ranges should be
> marked as an idle resource in iomem_resource.

And possibly yes, because part of the point of this stuff is to
declare where HW is actually using the address space. So if FW has
left a host bridge decoder set up to actually consume this space then
it really has to be set aside to prevent hotplug of other bus types
from trying to claim the same address space for their own usages.

If no actual decoder is set up then maybe it shouldn't be left as an
IOW in the resource tree. In this case it might be better to teach the
io resource allocator to leave gaps for future hotplug.

> The improvement that offers over this current proposal is that it
> allows for global visibility of CXL hotplug resources, but it does set
> up a discontinuity between FW mapped and OS mapped CXL. FW mapped will
> have top-level "System RAM" resources indistinguishable from typical
> DRAM while OS mapped CXL will look like this:

Maybe this can be retroactively fixed up in the resource tree?

Jason
