On Tue, 26 Feb 2019 14:11:58 +0100 Auger Eric <eric.au...@redhat.com> wrote:
> Hi Igor,
> 
> On 2/26/19 9:40 AM, Auger Eric wrote:
> > Hi Igor,
> > 
> > On 2/25/19 10:42 AM, Igor Mammedov wrote:
> >> On Fri, 22 Feb 2019 18:35:26 +0100
> >> Auger Eric <eric.au...@redhat.com> wrote:
> >> 
> >>> Hi Igor,
> >>> 
> >>> On 2/22/19 5:27 PM, Igor Mammedov wrote:
> >>>> On Wed, 20 Feb 2019 23:39:46 +0100
> >>>> Eric Auger <eric.au...@redhat.com> wrote:
> >>>> 
> >>>>> This series aims to bump the 255GB RAM limit in machvirt and to
> >>>>> support device memory in general, especially PCDIMM/NVDIMM.
> >>>>> 
> >>>>> In machvirt versions < 4.0, the initial RAM starts at 1GB and can
> >>>>> grow up to 255GB. From 256GB onwards we find IO regions such as
> >>>>> the additional GICv3 RDIST region, the high PCIe ECAM region and
> >>>>> the high PCIe MMIO region. The address map was 1TB large, which
> >>>>> corresponded to the max IPA capacity KVM was able to manage.
> >>>>> 
> >>>>> Since Linux 4.20, the host kernel supports a larger, dynamic IPA
> >>>>> range, so the guest physical address space can go beyond 1TB.
> >>>>> The max GPA size depends on the host kernel configuration and on
> >>>>> the physical CPUs.
> >>>>> 
> >>>>> This series uses that feature and allows the RAM to grow with no
> >>>>> limit other than the one imposed by the host kernel.
> >>>>> 
> >>>>> The RAM still starts at 1GB. First comes the initial RAM (-m) of
> >>>>> size ram_size, followed by the device memory (,maxmem) of size
> >>>>> maxram_size - ram_size. The device memory is potentially
> >>>>> hotpluggable, depending on the instantiated memory objects.
> >>>>> 
> >>>>> IO regions previously located between 256GB and 1TB are moved
> >>>>> after the RAM. Their offsets are computed dynamically from
> >>>>> ram_size and maxram_size, and size alignment is enforced.
> >>>>> 
> >>>>> If the maxmem value is below 255GB, the legacy memory map is
> >>>>> still used. The memory map change becomes effective from 4.0
> >>>>> onwards.
> >>>>> 
> >>>>> As we keep the initial RAM at the 1GB base address, we do not
> >>>>> need to make invasive changes in the EDK2 FW. It seems nobody is
> >>>>> eager to do that job at the moment.
> >>>>> 
> >>>>> Since the device memory is placed just after the initial RAM, it
> >>>>> is possible to use this feature while keeping a 1TB address map.
> >>>>> 
> >>>>> This series reuses/rebases patches initially submitted by
> >>>>> Shameer in [1] and Kwangwoo in [2] for the PC-DIMM and NV-DIMM
> >>>>> parts.
> >>>>> 
> >>>>> Functionally, the series is split into 3 parts:
> >>>>> 1) bump of the initial RAM limit [1 - 9] and change in
> >>>>> the memory map
> >>>> 
> >>>>> 2) Support of PC-DIMM [10 - 13]
> >>>> Is this part complete ACPI-wise (for coldplug)? I haven't noticed
> >>>> any DSDT AML here nor E820 changes, so ACPI-wise pc-dimm
> >>>> shouldn't be visible to the guest. It might be that DT is masking
> >>>> the problem, but that won't work on ACPI-only guests.
> >>> 
> >>> The guest's /proc/meminfo or "lshw -class memory" reflects the
> >>> amount of memory added with the DIMM slots.
> >> Question is how does it get there? Does it come from DT or from
> >> firmware via UEFI interfaces?
> >> 
> >>> So it looks fine to me. Isn't E820 a pure x86 matter?
> >> Sorry for the confusion, I meant UEFI GetMemoryMap().
> >> On x86, I'm wary of adding PC-DIMMs to E820, which then gets
> >> exposed via UEFI GetMemoryMap(), as the guest kernel might start
> >> using that memory as normal memory early at boot and later put it
> >> into zone normal, hence making it non-hot-un-pluggable. The same
> >> concerns apply to DT based means of discovery.
> >> (That's a guest issue, but it is easy to work around by not putting
> >> hotpluggable memory into UEFI GetMemoryMap() or DT, and letting the
> >> DSDT describe it properly.)
> >> That way the memory doesn't get (ab)used by firmware or early boot
> >> kernel stages, and doesn't get locked up.
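
For illustration, the dynamic placement the cover letter above
describes amounts to roughly the following C sketch; the region sizes
and alignments are hypothetical stand-ins, not the series' actual
values:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    #define GiB (1024ULL * 1024 * 1024)

    /* Align addr up to the next multiple of align (a power of two). */
    static uint64_t align_up(uint64_t addr, uint64_t align)
    {
        return (addr + align - 1) & ~(align - 1);
    }

    int main(void)
    {
        uint64_t ram_size    = 8 * GiB;    /* -m */
        uint64_t maxram_size = 512 * GiB;  /* ,maxmem= */
        uint64_t ram_base    = 1 * GiB;    /* initial RAM still at 1GB */

        /* Device memory of size maxram_size - ram_size follows the
         * initial RAM (1GiB alignment of its base is an assumption). */
        uint64_t device_mem_base = align_up(ram_base + ram_size, GiB);
        uint64_t cursor = device_mem_base + (maxram_size - ram_size);

        /* Hypothetical sizes for the relocated IO regions; each one
         * is aligned on its own size. */
        uint64_t redist2_size = 1 * GiB;
        uint64_t ecam_size    = 256 * 1024 * 1024ULL;
        uint64_t mmio_size    = 512 * GiB;

        uint64_t high_redist2 = align_up(cursor, redist2_size);
        cursor = high_redist2 + redist2_size;
        uint64_t high_ecam = align_up(cursor, ecam_size);
        cursor = high_ecam + ecam_size;
        uint64_t high_mmio = align_up(cursor, mmio_size);
        cursor = high_mmio + mmio_size; /* top of guest address map */

        printf("device memory    @ 0x%" PRIx64 "\n", device_mem_base);
        printf("high GICv3 RDIST @ 0x%" PRIx64 "\n", high_redist2);
        printf("high PCIe ECAM   @ 0x%" PRIx64 "\n", high_ecam);
        printf("high PCIe MMIO   @ 0x%" PRIx64 "\n", high_mmio);
        printf("top of map       = 0x%" PRIx64 "\n", cursor);
        return 0;
    }

With maxmem below 255GB nothing moves and the legacy 1TB map is kept,
as the cover letter states.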
> >> 
> >>> What else would you expect in the dsdt?
> >> Memory device descriptions; look for the code that adds PNP0C80
> >> devices with a _CRS describing the memory ranges.
> > 
> > OK, thank you for the explanations. I will work on the PNP0C80
> > addition then. Does it mean that in ACPI mode we must not output
> > the DT hotplug memory nodes, or can we assume that, as long as
> > PNP0C80 is properly described, it will "override" the DT
> > description?
> 
> After further investigations, I think the pieces you pointed out are
> added by Shameer's series, i.e. through the build_memory_hotplug_aml()
> call. So I suggest we separate the concerns: this series brings
> support for DIMM coldplug; hotplug, including all the relevant ACPI
> structures, will be added later on by Shameer. Maybe we should not
> put pc-dimms in the DT for this series until it is clear whether that
> conflicts with ACPI in some way.
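
For reference, a minimal sketch of what a static (coldplug-only)
PNP0C80 description could look like with QEMU's aml-build helpers.
The "MP%02X" device naming and the fixed _CRS are assumptions made
for the example, not what build_memory_hotplug_aml() actually emits:

    #include "qemu/osdep.h"
    #include "hw/acpi/aml-build.h"

    /* Describe one coldplugged DIMM as a PNP0C80 memory device with a
     * static _CRS covering its GPA range. A real implementation would
     * iterate over the present pc-dimms and also provide _STA. */
    static void build_static_dimm_device(Aml *scope, unsigned slot,
                                         uint64_t addr, uint64_t size)
    {
        Aml *dev = aml_device("MP%02X", slot);
        Aml *crs = aml_resource_template();

        aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0C80")));
        aml_append(dev, aml_name_decl("_UID", aml_int(slot)));

        aml_append(crs,
            aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
                             AML_MAX_FIXED, AML_CACHEABLE,
                             AML_READ_WRITE,
                             0x0,              /* granularity */
                             addr,             /* range minimum */
                             addr + size - 1,  /* range maximum */
                             0x0,              /* translation offset */
                             size));           /* length */
        aml_append(dev, aml_name_decl("_CRS", crs));

        aml_append(scope, dev);
    }

A guest booted in ACPI mode would then enumerate the DIMMs from these
devices; whether the corresponding DT memory nodes should still be
emitted at the same time is exactly the open question above.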