Hi Shameer,
On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
Hi,
This series here[0] attempts to add support for PCDIMM in QEMU for
ARM/Virt platform and has stumbled upon an issue, as it is not clear (at least
from the QEMU/EDK2 point of view) how, in the physical world, hotpluggable
memory is handled by the kernel.
The proposed implementation in QEMU builds the SRAT and DSDT parts
and uses the GED device to trigger the hotplug. This works fine.
But when we added the DT node corresponding to the PCDIMM (cold-plug
scenario), we noticed that the guest kernel sees this memory during early boot
even when booting with ACPI. Because of this, hotpluggable memory
may end up in ZONE_NORMAL and become non-hot-unpluggable even if the guest
boots with ACPI.
Further discussions[1] revealed that EDK2 UEFI has no means to interpret the
ACPI content from QEMU (this is by design) and uses DT info to
build the GetMemoryMap(). To solve this, we introduced a "hotpluggable" property
on the DT memory node (patches #7 & #8 from [0]) so that UEFI can differentiate
the nodes and exclude the hotpluggable ones from GetMemoryMap().
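As a rough illustration of the filtering described above, here is a minimal Python model, assuming each DT memory node is reduced to a base, a size and the proposed "hotpluggable" flag. `DtMemoryNode` and `build_memory_map()` are hypothetical names used only for this sketch; it is not EDK2 code and does not reflect how the firmware actually parses the DT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DtMemoryNode:
    """Hypothetical model of a DT memory node (not an EDK2 or QEMU type)."""
    base: int
    size: int
    hotpluggable: bool = False  # the proposed "hotpluggable" property


def build_memory_map(nodes):
    """Return the (base, size) ranges the firmware would report via
    GetMemoryMap(), skipping every node marked hotpluggable."""
    return sorted((n.base, n.size) for n in nodes if not n.hotpluggable)


nodes = [
    DtMemoryNode(base=0x40000000, size=0x40000000),  # boot RAM
    DtMemoryNode(base=0x100000000, size=0x40000000, hotpluggable=True),  # cold-plugged PCDIMM
]
print([(hex(b), hex(s)) for b, s in build_memory_map(nodes)])
```

Note that this model cannot tell a present hotpluggable range from an absent one, which is exactly the limitation raised in the comment quoted below.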
But then Laszlo rightly pointed out that, in order to accommodate the changes
in UEFI, we need to know exactly how Linux expects/handles all the
hotpluggable memory scenarios. Please find the discussion here[2].
For convenience, I am copying the relevant comment from Laszlo below:
/******
"Given patches #7 and #8, as I understand them, the firmware cannot distinguish
hotpluggable & present from hotpluggable & absent. The firmware can only
skip both hotpluggable cases. That's fine in that the firmware will hog
neither type -- but is that OK for the OS as well, for both ACPI boot and DT boot?
Consider in particular the "hotpluggable & present, ACPI boot" case. Assuming
we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
will not include the range despite it being present at boot. Presumably, ACPI
will refer to the range somehow, however. Will that not confuse the OS?
When Igor raised this earlier, I suggested that hotpluggable-and-present should
be added by the firmware, but also allocated immediately, as EfiBootServicesData
type memory. This will prevent other drivers in the firmware from allocating
AcpiNVS or Reserved chunks from the same memory range, the UEFI memmap will
contain the range as EfiBootServicesData, and then the OS can release that
allocation in one go early during boot.
But this really has to be clarified from the Linux kernel's expectations. Please
formalize all of the following cases:
OS boot (DT/ACPI)   hotpluggable & ...   GetMemoryMap() should report as   DT/ACPI should report as
-----------------   ------------------   -------------------------------   ------------------------
DT                  present              ?                                 ?
DT                  absent               ?                                 ?
ACPI                present              ?                                 ?
ACPI                absent               ?                                 ?
Again, this table is dictated by Linux."
******/
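Laszlo's suggestion for the hotpluggable-and-present case can be sketched in the same way. This is a hypothetical Python model: `MemRange`, `firmware_memmap()` and `os_reclaimable()` are illustrative names, not EDK2 or kernel APIs, and the only real identifiers are the UEFI memory type names. It also assumes that absent ranges are simply left out of the map, which is one of the open questions in the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRange:
    """Hypothetical description of a RAM range known to the firmware."""
    base: int
    size: int
    hotpluggable: bool = False
    present: bool = True


def firmware_memmap(ranges):
    """Suggested policy: hotpluggable & present ranges are added and
    immediately allocated as EfiBootServicesData, so no other firmware
    driver can carve AcpiNVS/Reserved chunks out of them.
    Assumption: absent ranges are not reported at all."""
    entries = []
    for r in ranges:
        if not r.present:
            continue  # assumed: nothing to report for absent memory
        kind = "EfiBootServicesData" if r.hotpluggable else "EfiConventionalMemory"
        entries.append((r.base, r.size, kind))
    return sorted(entries)


def os_reclaimable(memmap):
    """Ranges the OS could release in one go once it no longer needs
    boot services (EfiBootServicesData becomes usable RAM then)."""
    return [(b, s) for b, s, kind in memmap if kind == "EfiBootServicesData"]


ranges = [
    MemRange(0x40000000, 0x40000000),                                   # boot RAM
    MemRange(0x100000000, 0x40000000, hotpluggable=True),               # present PCDIMM
    MemRange(0x140000000, 0x40000000, hotpluggable=True, present=False),  # empty slot
]
memmap = firmware_memmap(ranges)
for base, size, kind in memmap:
    print(hex(base), hex(size), kind)
print(os_reclaimable(memmap))
```

Whether Linux would then treat the released range as hot-unpluggable is precisely the kernel-side expectation being asked about here.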
Could you please take a look at this and let us know what is expected here from
a Linux kernel point of view?
For arm64, so far we've not even been considering DT-based hotplug - as
far as I'm aware there would still be a big open question there around
notification mechanisms and how to describe them. The DT stuff so far
has come from the PowerPC folks, so it's probably worth seeing what
their ideas are.
ACPI-wise I've always assumed/hoped that hotplug-related things should
be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
would be enough for us.
Robin.
(Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed any valid
points above).
Thanks,
Shameer
[0] https://patchwork.kernel.org/cover/10890919/
[1] https://patchwork.kernel.org/patch/10863299/
[2] https://patchwork.kernel.org/patch/10890937/