Hi Eric,

> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 27 February 2019 10:27
> To: Igor Mammedov <imamm...@redhat.com>
> Cc: peter.mayd...@linaro.org; drjo...@redhat.com; da...@redhat.com;
> dgilb...@redhat.com; Shameerali Kolothum Thodi
> <shameerali.kolothum.th...@huawei.com>; qemu-devel@nongnu.org;
> qemu-...@nongnu.org; eric.auger....@gmail.com;
> da...@gibson.dropbear.id.au
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] ARM virt: Initial RAM expansion
> and PCDIMM/NVDIMM support
>
> Hi Igor, Shameer,
>
> On 2/27/19 11:10 AM, Igor Mammedov wrote:
> > On Tue, 26 Feb 2019 18:53:24 +0100
> > Auger Eric <eric.au...@redhat.com> wrote:
> >
> >> Hi Igor,
> >>
> >> On 2/26/19 5:56 PM, Igor Mammedov wrote:
> >>> On Tue, 26 Feb 2019 14:11:58 +0100
> >>> Auger Eric <eric.au...@redhat.com> wrote:
> >>>
> >>>> Hi Igor,
> >>>>
> >>>> On 2/26/19 9:40 AM, Auger Eric wrote:
> >>>>> Hi Igor,
> >>>>>
> >>>>> On 2/25/19 10:42 AM, Igor Mammedov wrote:
> >>>>>> On Fri, 22 Feb 2019 18:35:26 +0100
> >>>>>> Auger Eric <eric.au...@redhat.com> wrote:
> >>>>>>
> >>>>>>> Hi Igor,
> >>>>>>>
> >>>>>>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> >>>>>>>> On Wed, 20 Feb 2019 23:39:46 +0100
> >>>>>>>> Eric Auger <eric.au...@redhat.com> wrote:
> >>>>>>>>
> >>>>>>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> >>>>>>>>> support device memory in general, and especially PCDIMM/NVDIMM.
> >>>>>>>>>
> >>>>>>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >>>>>>>>> grow up to 255GB. From 256GB onwards we find IO regions such as
> >>>>>>>>> the additional GICv3 RDIST region, the high PCIe ECAM region and
> >>>>>>>>> the high PCIe MMIO region. The address map was 1TB large. This
> >>>>>>>>> corresponded to the max IPA capacity KVM was able to manage.
> >>>>>>>>>
> >>>>>>>>> Since kernel 4.20, the host is able to support a larger and
> >>>>>>>>> dynamic IPA range, so the guest physical address can go beyond
> >>>>>>>>> 1TB. The max GPA size depends on the host kernel configuration
> >>>>>>>>> and the physical CPUs.
> >>>>>>>>>
> >>>>>>>>> In this series we use this feature and allow the RAM to grow
> >>>>>>>>> without any limit other than the one imposed by the host kernel.
> >>>>>>>>>
> >>>>>>>>> The RAM still starts at 1GB. First comes the initial RAM (-m) of
> >>>>>>>>> size ram_size, and then comes the device memory (,maxmem) of size
> >>>>>>>>> maxram_size - ram_size. The device memory is potentially
> >>>>>>>>> hotpluggable, depending on the instantiated memory objects.
> >>>>>>>>>
> >>>>>>>>> IO regions previously located between 256GB and 1TB are moved
> >>>>>>>>> after the RAM. Their offset is computed dynamically and depends
> >>>>>>>>> on ram_size and maxram_size. Size alignment is enforced.
> >>>>>>>>>
> >>>>>>>>> If the maxmem value is below 255GB, the legacy memory map is
> >>>>>>>>> still used. The change of memory map becomes effective from 4.0
> >>>>>>>>> onwards.
> >>>>>>>>>
> >>>>>>>>> As we keep the initial RAM at the 1GB base address, we do not
> >>>>>>>>> need to make invasive changes in the EDK2 FW. It seems nobody is
> >>>>>>>>> eager to do that job at the moment.
> >>>>>>>>>
> >>>>>>>>> Since device memory is put just after the initial RAM, it is
> >>>>>>>>> possible to get access to this feature while keeping a 1TB
> >>>>>>>>> address map.
> >>>>>>>>>
> >>>>>>>>> This series reuses/rebases patches initially submitted by Shameer
> >>>>>>>>> in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM parts.
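A minimal sketch of the floating layout described above; the names, example sizes, and alignment values are illustrative assumptions, not the actual hw/arm/virt.c code:

    /* Sketch: device memory follows the initial RAM, and the high IO
     * regions float above the top of device memory instead of sitting
     * at a fixed 256GB. Alignment values are assumptions. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GiB (1024ULL * 1024 * 1024)
    #define RAM_BASE (1 * GiB)        /* initial RAM still starts at 1GB */

    /* Align addr up to a power-of-two boundary. */
    static uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }

    int main(void)
    {
        uint64_t ram_size = 512 * GiB;    /* -m */
        uint64_t maxram_size = 768 * GiB; /* ,maxmem= */

        /* Device memory (PC-DIMM/NVDIMM) comes just after the initial RAM. */
        uint64_t device_mem_base = align_up(RAM_BASE + ram_size, 1 * GiB);
        uint64_t device_mem_size = maxram_size - ram_size;

        /* High IO regions (extra GICv3 redistributors, high ECAM, high
         * PCIe MMIO) are placed after the end of device memory. */
        uint64_t high_io_base = align_up(device_mem_base + device_mem_size,
                                         1 * GiB);

        printf("device memory @ 0x%" PRIx64 ", high IO @ 0x%" PRIx64 "\n",
               device_mem_base, high_io_base);
        return 0;
    }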
> >>>>>>>>>
> >>>>>>>>> Functionally, the series is split into 3 parts:
> >>>>>>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> >>>>>>>>>    the memory map
> >>>>>>>>
> >>>>>>>>> 2) Support of PC-DIMM [10 - 13]
> >>>>>>>> Is this part complete ACPI-wise (for coldplug)? I haven't noticed
> >>>>>>>> DSDT AML here nor E820 changes, so ACPI-wise the pc-dimm shouldn't
> >>>>>>>> be visible to the guest. It might be that DT is masking the
> >>>>>>>> problem, but that won't work on ACPI-only guests.
> >>>>>>>
> >>>>>>> The guest /proc/meminfo or "lshw -class memory" reflects the amount
> >>>>>>> of memory added with the DIMM slots.
> >>>>>> The question is how does it get there? Does it come from DT or from
> >>>>>> firmware via UEFI interfaces?
> >>>>>>
> >>>>>>> So it looks fine to me. Isn't E820 a pure x86 matter?
> >>>>>> Sorry for misleading you, I meant UEFI GetMemoryMap().
> >>>>>> On x86, I'm wary of adding PC-DIMMs to E820, which then gets exposed
> >>>>>> via UEFI GetMemoryMap(), as the guest kernel might start using that
> >>>>>> memory as normal memory early at boot and later put it into
> >>>>>> ZONE_NORMAL, hence making it non-hot-unpluggable. The same concerns
> >>>>>> apply to DT-based means of discovery.
> >>>>>> (That's a guest issue, but it's easy to work around it by not
> >>>>>> putting hotpluggable memory into UEFI GetMemoryMap() or DT and
> >>>>>> letting the DSDT describe it properly.)
> >>>>>> That way the memory doesn't get (ab)used by firmware or early boot
> >>>>>> kernel stages and doesn't get locked up.
> >>>>>>
> >>>>>>> What else would you expect in the DSDT?
> >>>>>> Memory device descriptions; look for the code that adds PNP0C80 with
> >>>>>> _CRS describing the memory ranges.
> >>>>>
> >>>>> OK, thank you for the explanations. I will work on the PNP0C80
> >>>>> addition then. Does it mean that in ACPI mode we must not output the
> >>>>> DT hotplug memory nodes, or can we assume that a properly described
> >>>>> PNP0C80 will "override" the DT description?
> >>>>
> >>>> After further investigation, I think the pieces you pointed out are
> >>>> added by Shameer's series, i.e. through the build_memory_hotplug_aml()
> >>>> call. So I suggest we separate the concerns: this series brings
> >>>> support for DIMM coldplug; hotplug, including all the relevant ACPI
> >>>> structures, will be added later on by Shameer.
> >>>
> >>> Maybe we should not put pc-dimms in DT for this series until it becomes
> >>> clear whether it conflicts with ACPI in some way.
> >>
> >> I guess you mean removing the DT hotpluggable memory nodes only in ACPI
> >> mode? Otherwise you simply remove the DIMM feature, right?
> > Something like this, so that DT won't get in conflict with ACPI.
> > Only we don't have a switch for it; something like -machine fdt=on (with
> > default off).
> >
> >> I double-checked, and if you remove the hotpluggable memory DT nodes in
> >> ACPI mode:
> >> - you do not see the PC-DIMM slots in the guest /proc/meminfo anymore.
> >>   So I guess you're right: if the DT nodes are available, that memory
> >>   is considered as not unpluggable by the guest.
> >> - you can still see the NVDIMM slots using "ndctl list -u" and you can
> >>   mount a DAX file system.
> >>
> >> Hotplug/unplug is clearly not supported by this series, and any attempt
> >> results in "memory hotplug is not supported". Is it really an issue if
> >> the guest does not consider the DIMM slot memory as hot-unpluggable? I
> >> am not even sure the guest kernel would support unplugging that memory.
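As background for the PNP0C80 discussion above, a condensed sketch (not the verbatim QEMU code) of the per-slot memory devices that build_memory_hotplug_aml() emits into the DSDT, written with QEMU's aml_* helpers; the "MCRS"/"MRST" helper method names follow hw/acpi/memory_hotplug.c, everything else is simplified:

    #include "qemu/osdep.h"
    #include "hw/acpi/aml-build.h"

    /* Emit one PNP0C80 memory device per hotpluggable slot under 'scope'.
     * _CRS and _STA defer to shared helper methods that read the slot's
     * address range and status from the memory-hotplug MMIO region. */
    static void sketch_memory_devices(Aml *scope, uint32_t nr_mem)
    {
        uint32_t i;

        for (i = 0; i < nr_mem; i++) {
            Aml *dev = aml_device("MP%02X", i);
            Aml *crs = aml_method("_CRS", 0, AML_NOTSERIALIZED);
            Aml *sta = aml_method("_STA", 0, AML_NOTSERIALIZED);

            aml_append(dev, aml_name_decl("_UID", aml_string("0x%02X", i)));
            aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C80")));

            /* _CRS returns the slot's memory range. */
            aml_append(crs, aml_return(aml_call1("MCRS", aml_int(i))));
            aml_append(dev, crs);
            /* _STA reports whether the slot is present/enabled. */
            aml_append(sta, aml_return(aml_call1("MRST", aml_int(i))));
            aml_append(dev, sta);

            aml_append(scope, dev);
        }
    }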
> >>
> >> In case we want all the ACPI tables to be ready so that this memory is
> >> seen as hot-unpluggable, we need some of Shameer's patches on top of
> >> this series.
> > Maybe we should push for this way (into 4.0); it's just several patches
> > after all. Or we could even merge them into your series (I'd guess they
> > would need to be rebased on top of your latest work).
>
> Shameer, would you agree if we merge PATCH 1 of your RFC hotplug series
> (without the reduced hw_reduced_acpi flag) into this series and isolate
> in a second patch the acpi_memory_hotplug_init() +
> build_memory_hotplug_aml() called in the virt code?
Sure, that's fine with me. So what would you use for the
event_handler_method in build_memory_hotplug_aml()? The GPO0 device?

Thanks,
Shameer

> Then would remain the actual GED/GPIO integration.
>
> Thanks,
>
> Eric
>
> >> Also, don't DIMM slots already make sense in DT mode? Usually we accept
> >> adding one feature in DT first and then in ACPI. For instance we can
> > Usually it doesn't conflict (at least I'm not aware of it conflicting),
> > but I see a problem with it in this case.
> >
> >> benefit from nvdimm in DT mode, right? So, considering an incremental
> >> approach, I would be in favour of keeping the DT nodes.
> > I'd guess it is the same as for DIMMs: ACPI support for NVDIMMs is much
> > more versatile.
> >
> > I consider the target application of arm/virt to be a board that is
> > mostly used to run generic ACPI-capable guests in production, with
> > various DT-only guests as secondary use cases. It's hard to make both
> > use cases happy with the defaults (that's probably one of the reasons
> > why the 'sbsa' board is being added).
> >
> > So I'd give priority to ACPI-based arm/virt over DT where defaults are
> > concerned.
> >
> >> Thanks
> >>
> >> Eric
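For context on the event_handler_method question above, a sketch of how the call might look from virt's DSDT build code; the scope and the GPO0 event-method name are placeholders for exactly the open question in this thread, not a settled choice:

    /* Hypothetical wiring, assuming build_memory_hotplug_aml()'s current
     * signature (table, nr_mem, res_root, event_handler_method). On x86
     * the last argument is "\\_GPE._E03"; on virt it would name whatever
     * GPIO/GED-signaled event method gets chosen, e.g. one hanging off
     * the GPO0 controller. "ms" is the MachineState. */
    build_memory_hotplug_aml(dsdt, ms->ram_slots, "\\_SB",
                             "\\_SB.GPO0._E02" /* placeholder */);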