Hi,

This series[0] attempts to add PCDIMM support to QEMU for the ARM/virt platform. It has stumbled upon an issue: it is not clear (at least from the QEMU/EDK2 point of view) how the kernel handles hotpluggable memory in the physical world.

The proposed implementation in QEMU builds the SRAT and DSDT parts and uses the GED device to trigger the hotplug. This works fine. But when we added the DT node corresponding to the PCDIMM (cold-plug scenario), we noticed that the guest kernel sees this memory during early boot even when we are booting with ACPI. Because of this, hotpluggable memory may end up in ZONE_NORMAL, which makes it impossible to hot-unplug even though the guest boots with ACPI.

Further discussions[1] revealed that EDK2 UEFI has no means to interpret the ACPI content from QEMU (this is by design) and uses the DT info to build the GetMemoryMap() output. To solve this, we introduced a "hotpluggable" property for the DT memory node (patches #7 & #8 from [0]) so that UEFI can differentiate the nodes and exclude the hotpluggable ones from GetMemoryMap().

But then Laszlo rightly pointed out that, in order to accommodate the changes in UEFI, we need to know exactly how Linux expects/handles all the hotpluggable memory scenarios. Please find the discussion here[2].

For ease, I am just copying the relevant comment from Laszlo below:

/******
"Given patches #7 and #8, as I understand them, the firmware cannot distinguish hotpluggable & present, from hotpluggable & absent. The firmware can only skip both hotpluggable cases. That's fine in that the firmware will hog neither type -- but is that OK for the OS as well, for both ACPI boot and DT boot?

Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap will not include the range despite it being present at boot. Presumably, ACPI will refer to the range somehow, however. Will that not confuse the OS?

When Igor raised this earlier, I suggested that hotpluggable-and-present should be added by the firmware, but also allocated immediately, as EfiBootServicesData type memory.
This will prevent other drivers in the firmware from allocating AcpiNVS or Reserved chunks from the same memory range, the UEFI memmap will contain the range as EfiBootServicesData, and then the OS can release that allocation in one go early during boot.

But this really has to be clarified from the Linux kernel's expectations. Please formalize all of the following cases:

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ?                                ?
DT                 absent              ?                                ?
ACPI               present             ?                                ?
ACPI               absent              ?                                ?

Again, this table is dictated by Linux."
******/

Could you please take a look at this and let us know what is expected here from a Linux kernel viewpoint?

(Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid points above.)

Thanks,
Shameer

[0] https://patchwork.kernel.org/cover/10890919/
[1] https://patchwork.kernel.org/patch/10863299/
[2] https://patchwork.kernel.org/patch/10890937/
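
P.S. For anyone who has not looked at patches #7 and #8 in [0], the firmware-side behaviour they aim for can be sketched roughly as below. This is only an illustration in Python, not the actual EDK2 code: the MemoryNode type, the ranges_for_memory_map() helper and the addresses are all made up for the example. The only thing taken from the series is that DT memory nodes carrying the "hotpluggable" property are left out of the memory map.

```python
from dataclasses import dataclass


@dataclass
class MemoryNode:
    """Hypothetical, simplified view of a DT memory node."""
    base: int
    size: int
    hotpluggable: bool = False  # True if the node carries the "hotpluggable" property


def ranges_for_memory_map(nodes):
    """Return the (base, size) ranges the firmware would add to its memory map.

    Every node marked hotpluggable is skipped. The firmware has no way to
    tell "hotpluggable & present" from "hotpluggable & absent" here.
    """
    return [(n.base, n.size) for n in nodes if not n.hotpluggable]


# Made-up example layout: 1 GiB of boot RAM plus one cold-plugged PCDIMM.
nodes = [
    MemoryNode(base=0x4000_0000, size=0x4000_0000),
    MemoryNode(base=0x8000_0000, size=0x4000_0000, hotpluggable=True),
]

for base, size in ranges_for_memory_map(nodes):
    print(f"base={base:#x} size={size:#x}")
```

Note that the sketch drops the PCDIMM range even though it is present at boot, which is exactly the "hotpluggable & present" case Laszlo's comment asks about.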