Hi Laszlo, > -----Original Message----- > From: Laszlo Ersek [mailto:ler...@redhat.com] > Sent: 03 May 2019 15:14 > To: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>; > qemu-devel@nongnu.org; qemu-...@nongnu.org; eric.au...@redhat.com; > imamm...@redhat.com > Cc: peter.mayd...@linaro.org; sa...@linux.intel.com; > ard.biesheu...@linaro.org; Linuxarm <linux...@huawei.com>; > shannon.zha...@gmail.com; sebastien.bo...@intel.com; xuwei (O) > <xuw...@huawei.com> > Subject: Re: [Qemu-devel] [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM > nodes in the DT > > Hi Shameer, > > On 05/03/19 15:35, Shameerali Kolothum Thodi wrote: > > > > > >> -----Original Message----- > >> From: Linuxarm [mailto:linuxarm-boun...@huawei.com] On Behalf Of > >> Shameerali Kolothum Thodi > >> Sent: 10 April 2019 09:49 > >> To: Laszlo Ersek <ler...@redhat.com>; qemu-devel@nongnu.org; > >> qemu-...@nongnu.org; eric.au...@redhat.com; imamm...@redhat.com > >> Cc: peter.mayd...@linaro.org; sa...@linux.intel.com; > >> ard.biesheu...@linaro.org; Linuxarm <linux...@huawei.com>; > >> shannon.zha...@gmail.com; sebastien.bo...@intel.com; xuwei (O) > >> <xuw...@huawei.com> > >> Subject: RE: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in > >> the DT > >> > >> > >>> -----Original Message----- > >>> From: Laszlo Ersek [mailto:ler...@redhat.com] > >>> Sent: 09 April 2019 16:09 > >>> To: Shameerali Kolothum Thodi > >>> <shameerali.kolothum.th...@huawei.com>; > >>> qemu-devel@nongnu.org; qemu-...@nongnu.org; > eric.au...@redhat.com; > >>> imamm...@redhat.com > >>> Cc: peter.mayd...@linaro.org; shannon.zha...@gmail.com; > >>> sa...@linux.intel.com; sebastien.bo...@intel.com; xuwei (O) > >>> <xuw...@huawei.com>; ard.biesheu...@linaro.org; Linuxarm > >>> <linux...@huawei.com> > >>> Subject: Re: [PATCH v4 8/8] hw/arm/boot: Expose the PC-DIMM nodes in > >>> the DT > >>> > >>> On 04/09/19 12:29, Shameer Kolothum wrote: > >>>> This patch adds memory nodes corresponding to PC-DIMM regions. > >>>> This will enable support for cold plugged device memory for Guests > >>>> with DT boot. > >>>> > >>>> Signed-off-by: Shameer Kolothum > >> <shameerali.kolothum.th...@huawei.com> > >>>> Signed-off-by: Eric Auger <eric.au...@redhat.com> > >>>> --- > >>>> hw/arm/boot.c | 42 > ++++++++++++++++++++++++++++++++++++++++++ > >>>> 1 file changed, 42 insertions(+) > >>>> > >>>> diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 8c840ba..150e1ed > >>>> 100644 > >>>> --- a/hw/arm/boot.c > >>>> +++ b/hw/arm/boot.c > >>>> @@ -19,6 +19,7 @@ > >>>> #include "sysemu/numa.h" > >>>> #include "hw/boards.h" > >>>> #include "hw/loader.h" > >>>> +#include "hw/mem/memory-device.h" > >>>> #include "elf.h" > >>>> #include "sysemu/device_tree.h" > >>>> #include "qemu/config-file.h" > >>>> @@ -538,6 +539,41 @@ static void fdt_add_psci_node(void *fdt) > >>>> qemu_fdt_setprop_cell(fdt, "/psci", "migrate", migrate_fn); } > >>>> > >>>> +static int fdt_add_hotpluggable_memory_nodes(void *fdt, > >>>> + uint32_t acells, > >>> uint32_t scells) { > >>>> + MemoryDeviceInfoList *info, *info_list = > qmp_memory_device_list(); > >>>> + MemoryDeviceInfo *mi; > >>>> + int ret = 0; > >>>> + > >>>> + for (info = info_list; info != NULL; info = info->next) { > >>>> + mi = info->value; > >>>> + switch (mi->type) { > >>>> + case MEMORY_DEVICE_INFO_KIND_DIMM: > >>>> + { > >>>> + PCDIMMDeviceInfo *di = mi->u.dimm.data; > >>>> + > >>>> + ret = fdt_add_memory_node(fdt, acells, di->addr, scells, > >>>> + di->size, di->node, true); > >>>> + if (ret) { > >>>> + fprintf(stderr, > >>>> + "couldn't add PCDIMM > >> /memory@%"PRIx64" > >>> node\n", > >>>> + di->addr); > >>>> + goto out; > >>>> + } > >>>> + break; > >>>> + } > >>>> + default: > >>>> + fprintf(stderr, "%s memory nodes are not yet > supported\n", > >>>> + MemoryDeviceInfoKind_str(mi->type)); > >>>> + ret = -ENOENT; > >>>> + goto out; > >>>> + } > >>>> + } > >>>> +out: > >>>> + qapi_free_MemoryDeviceInfoList(info_list); > >>>> + return ret; > >>>> +} > >>>> + > >>>> int arm_load_dtb(hwaddr addr, const struct arm_boot_info *binfo, > >>>> hwaddr addr_limit, AddressSpace *as) { @@ > -637,6 > >>>> +673,12 @@ int arm_load_dtb(hwaddr addr, const struct > >>> arm_boot_info *binfo, > >>>> } > >>>> } > >>>> > >>>> + rc = fdt_add_hotpluggable_memory_nodes(fdt, acells, scells); > >>>> + if (rc < 0) { > >>>> + fprintf(stderr, "couldn't add hotpluggable memory nodes\n"); > >>>> + goto fail; > >>>> + } > >>>> + > >>>> rc = fdt_path_offset(fdt, "/chosen"); > >>>> if (rc < 0) { > >>>> qemu_fdt_add_subnode(fdt, "/chosen"); > >>>> > >>> > >>> > >>> Given patches #7 and #8, as I understand them, the firmware cannot > >>> distinguish hotpluggable & present, from hotpluggable & absent. The > >> firmware > >>> can only skip both hotpluggable cases. That's fine in that the > >>> firmware will > >> hog > >>> neither type -- but is that OK for the OS as well, for both ACPI > >>> boot and DT boot? > >> > >> Right. This only handles the hotpluggable-and-present condition. > >> > >>> Consider in particular the "hotpluggable & present, ACPI boot" case. > >> Assuming > >>> we modify the firmware to skip "hotpluggable" altogether, the UEFI > >>> memmap will not include the range despite it being present at boot. > >>> Presumably, ACPI will refer to the range somehow, however. Will that not > confuse the OS? > >> > >> From my testing so far, without patches #7 and #8(ie, no UEFI memmap > >> entry), ACPI boots fine. I think ACPI only relies on aml and SRAT. > >> > >>> When Igor raised this earlier, I suggested that > >>> hotpluggable-and-present should be added by the firmware, but also > >>> allocated immediately, as EfiBootServicesData type memory. This will > >>> prevent other drivers in the firmware from allocating AcpiNVS or > >>> Reserved chunks from the same memory range, the UEFI memmap will > >>> contain the range as EfiBootServicesData, and then the OS can release > that allocation in one go early during boot. > >> > >> Ok. Agree that hotpluggable-and-present case it may make sense to > >> make use of that memory rather than just hiding it altogether. > >> > >> On another note, Does hotpluggable-and-absent case make any valid use > >> case for a DT boot? I am not sure how we can hot-add memory in the > >> case of DT boot later. > >> I have not verified the sysfs probe interface yet and there are > >> discussions of dropping that interface altogether. > >> > >>> But this really has to be clarified from the Linux kernel's > >>> expectations. Please formalize all of the following cases: > >> > >> Sure. I will wait for suggestions here and work on it. > > > > To continue the discussion on this, this is my proposal, > > > > [...] > > I didn't miss your last update, on 10 April. The reason I didn't respond then > was > that, the table that you create here, needs to be approved by Linux > developers. > In other words, the table should summarize how Linux expects DT/ACPI to look, > for the given use cases. It's not something that I can comment on. The > requirements come from Linux, and we should attempt (in QEMU and the fw) > to satisfy them. > > If those use cases / requirements haven't been *designed* in Linux, in the > first > place, then the discussion belongs even more on a kernel development list. (I > really can't say what Linux *should* expect, and even if I had input on that, > discussing it *just* on qemu-devel would be > futile.) > > I mean, considering ACPI and the UEFI memmap at least, can we take > examples from the physical world (I guess x86 too)? What does Linux (and > maybe Windows) expect wrt. hotpluggable memory areas, in ACPI and in the > UEFI memmap? > > I find it hard to believe that these are such use cases that we have to > *invent* now. It seems more likely that OSes already handle these use cases, > they have expectations, and we should *collect* them.
Right. I have just sent out a mail seeking clarification on this to the wider Linux community (CCd all of you as well) here, http://lists.infradead.org/pipermail/linux-arm-kernel/2019-May/651360.html Hope, we can come to a conclusion soon. Thanks, Shameer