On 20.10.25 11:07, fanhuang wrote:
Hi David and Igor,

I apologize for the delayed response. Thank you very much for your thoughtful questions and feedback on the SPM patch series.

Before addressing your questions, I'd like to briefly mention what the new QEMU patch series additionally resolves:

1. **Corrected SPM terminology**: Fixed the description error from the previous version. The correct expansion of the acronym is "Specific Purpose Memory" (not "special purpose memory" as previously stated).

2. **Fixed overlapping E820 entries**: Updated the implementation to properly handle overlapping E820 RAM entries before adding E820_SOFT_RESERVED regions. The previous implementation created overlapping E820 entries: it first added a large E820_RAM entry covering the entire above-4GB memory range, then added E820_SOFT_RESERVED entries for SPM regions that overlapped with that RAM entry. This violated the E820 specification and caused OVMF/UEFI firmware to receive conflicting memory type information for the same physical addresses. The new implementation processes the SPM regions first to identify the reserved areas, then adds RAM entries around the SPM regions, producing a clean, non-overlapping E820 map (see the sketch below the quoted text).

Now, regarding your questions:

========================================================================
Why SPM Must Be Boot Memory
========================================================================

SPM cannot be implemented as hotplug memory (DIMM/NVDIMM). The primary goal of SPM is to ensure that the memory is managed by guest device drivers, not by the guest OS. This requires boot-time discovery for the following key reasons:

1. SPM regions must appear in the E820 memory map as `E820_SOFT_RESERVED` during firmware initialization, before the OS starts.

2. Hotplug memory is integrated into kernel memory management, making it unavailable for device-specific use.

========================================================================
Detailed Use Case
========================================================================

**Background**

Unified address space for CPU and GPU: modern heterogeneous computing architectures implement a coherent, unified address space shared between CPUs and GPUs. Unlike traditional discrete GPU designs with a dedicated frame buffer, these accelerators connect CPU and GPU through high-speed interconnects (e.g., XGMI):

- **HBM (High Bandwidth Memory)**: physically attached to each GPU, reported to the OS as driver-managed system memory
- **XGMI (eXternal Global Memory Interconnect, aka Infinity Fabric)**: maintains data coherence between CPU and GPU, enabling direct CPU access to GPU HBM without data copying

In this architecture, GPU HBM is reported to the OS as system memory, but it needs to be managed exclusively by the GPU driver rather than by the general OS memory allocator. This driver-managed memory provides optimal performance for GPU workloads while enabling coherent CPU-GPU data sharing over XGMI. This is where SPM (Specific Purpose Memory) becomes essential.

**Virtualization Scenario**

In virtualization, the hypervisor needs to expose this memory topology to guest VMs while maintaining the same driver-managed vs. OS-managed distinction.
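To make the non-overlapping layout described in item 2 concrete, here is a minimal sketch of that ordering (place the SPM regions first, then fill RAM entries into the gaps around them). The struct, the helper names, and the E820_SOFT_RESERVED value are illustrative assumptions, not the actual code or constants from the patch series:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define E820_RAM           1u
#define E820_SOFT_RESERVED 0xefffffffu   /* illustrative marker, not the patch's constant */

struct region {
    uint64_t base;
    uint64_t size;
};

/* Emit one E820 entry; skip zero-length gaps. */
static void add_entry(uint64_t base, uint64_t size, uint32_t type)
{
    if (size) {
        printf("e820: [0x%016" PRIx64 " - 0x%016" PRIx64 "] type 0x%" PRIx32 "\n",
               base, base + size - 1, type);
    }
}

/*
 * ram: the whole above-4G RAM range
 * spm: SPM regions inside that range, sorted by base and non-overlapping
 *
 * SPM regions become soft-reserved entries, and only the gaps between
 * them become RAM entries, so no two E820 entries overlap.
 */
static void build_e820_above_4g(struct region ram,
                                const struct region *spm, int nr_spm)
{
    uint64_t cursor = ram.base;

    for (int i = 0; i < nr_spm; i++) {
        add_entry(cursor, spm[i].base - cursor, E820_RAM);        /* RAM gap before SPM */
        add_entry(spm[i].base, spm[i].size, E820_SOFT_RESERVED);  /* the SPM region */
        cursor = spm[i].base + spm[i].size;
    }
    add_entry(cursor, ram.base + ram.size - cursor, E820_RAM);    /* trailing RAM */
}

int main(void)
{
    struct region ram = { 0x100000000ULL, 0x200000000ULL };   /* 4 GiB .. 12 GiB */
    struct region spm[] = {
        { 0x180000000ULL, 0x40000000ULL },                    /* 1 GiB of SPM at 6 GiB */
    };

    build_e820_above_4g(ram, spm, 1);
    return 0;
}
```

For the example inputs above (8 GiB of above-4G RAM with 1 GiB of SPM at 6 GiB), this prints a RAM entry, a soft-reserved entry, and a second RAM entry, with no overlapping ranges.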
Just wondering, could device hotplug in that model ever work? I guess we wouldn't expose the memory at all in e820 (after all, it gets hotplugged later) and instead the device driver in the guest would have to detect+hotplug that memory.
But that sounds weird, because the device driver in the VM shouldn't have to do anything virt-specific.
Which raises the question: how is device hotplug of such GPUs handled on bare metal? Or does it simply not work? :)
--
Cheers

David / dhildenb
