On 5/14/2026 9:05 PM, Igor Mammedov wrote:
> On Fri, 6 Mar 2026 16:27:35 +0800
> fanhuang <[email protected]> wrote:
> > Add a 'memmap-type' option to NUMA node configuration that allows
> > specifying the memory type for a NUMA node.
> >
> > Supported values:
> > - normal: Regular system RAM (E820 type 1, default)
> > - spm: Specific Purpose Memory (E820 type 0xEFFFFFFF)
> > - reserved: Reserved memory (E820 type 2)
> >
> > The 'spm' type indicates Specific Purpose Memory - a hint to the guest
> > that this memory might be managed by device drivers based on guest policy.
> > The 'reserved' type marks memory as not usable as RAM.
> >
> > Note: This option is only supported on x86 platforms.
> >
> > Usage:
> > -numa node,nodeid=1,memdev=m1,memmap-type=spm
>
> in short:
> don't do it this way
> I'm against merging it as is, till you convince me otherwise.
>
> more detailed answer:
>
> * mandatory bashing chapter:
> the more I look at it, the hackier this approach looks to me,
> and what's even worse, that nonsense propagates to firmware.
>
> Judging by the commit message, the goal is to expose some RAM as
> E820 SPM to the guest (that's it).
> You however picked -numa node as a way to achieve that,
> and then hacked the numa code not to generate numa data for it (SRAT)
> and massaged e820 to exclude SPM from RAM entries.
>
> But at this stage I don't really see a good justification for the hack(s)
> this patch introduces (it's definitely not in the commit message nor the
> cover letter).
> And until an alternative approach is explored and proven to be worse,
> I'm against merging this patch.
>
> * suggestion chapter:
> I don't recall, but I likely asked before
> why not use device memory instead for it (aka a DIMM device or some device
> derived from the device memory object, and then add an e820 entry for it).
> It would be a much simpler approach and impl., without the need to resplit
> anything in e820.
> And no need to mess with firmware (SeaBIOS: RamSizeOver4G patch) nor EDK2.
Hi Igor,
Thanks for taking the time to review this -- and for the candor in
the bashing chapter. Before going into the bigger picture, let me
re-establish one factual point that v7 didn't carry forward from
the v6 cover letter.
On SRAT generation:
v7 only suppresses SRAT for memmap-type=reserved; memmap-type=spm
nodes get a normal SRAT Memory Affinity entry. This was shown
explicitly in the v6 cover letter, and since v7 is a single-patch
series, that part didn't carry forward. For the spm case:
[ 0.042582] ACPI: SRAT: Node 1 PXM 1 [mem 0x280000000-0x47fffffff]
Full transcript with all three memmap-type variants side by side:
https://lore.kernel.org/qemu-devel/[email protected]/
The bigger picture -- real-world context that drove the design:
The use case is GPU/accelerator HBM exposed to the OS as SPM. On
bare metal, the platform firmware:
- emits E820 type 0xEFFFFFFF (SOFT_RESERVED) for the HBM region;
- emits ACPI SRAT memory affinity entries that bind HBM to a
dedicated proximity domain (NUMA node);
- tags the accelerator's PCI device with _PXM matching that node.
That gives the device driver a stable lookup chain at runtime:
dev -> pci_dev_to_node(dev) -> SRAT walk -> HBM GPA range
NUMA node here is not incidental -- it is the OS-exposed
intermediary ID that the device driver uses to find its own HBM.
This is the in-tree path used by accelerator drivers today.
The "-numa node + memmap-type=spm + E820 SOFT_RESERVED" combo in
v7 is a direct 1:1 model of this bare-metal topology. The E820
retyping in the patch is exactly what makes the guest-visible E820
match what bare-metal firmware emits for the same kind of region.
On the DIMM / device-memory alternative:
David pointed this out in the v6 thread, and Gregory's reply in
this thread reinforces the same point -- DIMM / NVDIMM ranges are
described in E820 only as the hotplug area. SPM needs to be in
the boot E820 from the start so the OS classifies it as Specific
Purpose memory and treats it accordingly. Going via DIMM would also detach the
memory from the NUMA topology (no SRAT entry tied to the device's
_PXM), which breaks the dev -> node -> SRAT -> HBM lookup the
driver relies on.
Happy to dig into any of this further, or to reshape parts you
still see as too hacky.
Best regards,
FangSheng Huang (Jerry)