> -----Original Message-----
> From: Auger Eric [mailto:eric.au...@redhat.com]
> Sent: 05 July 2018 13:18
> To: David Hildenbrand <da...@redhat.com>; eric.auger....@gmail.com;
> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> imamm...@redhat.com
> Cc: w...@redhat.com; drjo...@redhat.com; da...@gibson.dropbear.id.au;
> dgilb...@redhat.com; ag...@suse.de
> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate
> device_memory
>
> Hi David,
>
> On 07/05/2018 02:09 PM, David Hildenbrand wrote:
> > On 05.07.2018 14:00, Auger Eric wrote:
> >> Hi David,
> >>
> >> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
> >>> On 05.07.2018 13:42, Auger Eric wrote:
> >>>> Hi David,
> >>>>
> >>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
> >>>>> On 03.07.2018 21:27, Auger Eric wrote:
> >>>>>> Hi David,
> >>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
> >>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
> >>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>> Its base is at 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>> memory region is max 2TB.
> >>>>>>>
> >>>>>>> Maybe a stupid question, but why exactly does it have to start at 2TB
> >>>>>>> (and not e.g. at 1TB)?
> >>>>>> not a stupid question. See tentative answer below.
> >>>>>>>
> >>>>>>>> This is largely inspired by device memory initialization in
> >>>>>>>> pc machine code.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
> >>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo....@sk.com>
> >>>>>>>> ---
> >>>>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
> >>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>
> >>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> >>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>
> >>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>
> >>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
> >>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>> - * address space unallocated and free for future use between 256G and 512G.
> >>>>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
> >>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
> >>>>>> I acknowledge this comment was the main justification. Now if you look at
> >>>>>>
> >>>>>> Principles of ARM Memory Maps
> >>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
> >>>>>> chapter 2.3, you will find that when adding PA bits, you always leave
> >>>>>> space for reserved regions and mapped IO.
> >>>>>
> >>>>> Thanks for the pointer!
> >>>>>
> >>>>> So we can fit
> >>>>>
> >>>>> a) 2GB at 2GB
> >>>>> b) 32GB at 32GB
> >>>>> c) 512GB at 512GB
> >>>>> d) 8TB at 8TB
> >>>>> e) 128TB at 128TB
> >>>>>
> >>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>
> >>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>
> >>>>> Depending on the ram_size, we might have simpler setups and can support
> >>>>> more configurations, no?
> >>>>>
> >>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
> >>>>> -> move ram into a) and b)
> >>>>> -> move device memory into c)
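(As an aside, to make that rule of thumb concrete: each slot is 16 times the
previous one and naturally aligned, i.e. a slot of size S sits at base S. A
rough sketch in C; the helper name is made up and this is just an
illustration of the convention, not anything from the patch:

    #include <stdint.h>

    #define GiB (1ULL << 30)

    /*
     * Hypothetical helper: return the base of the smallest "size at size"
     * slot (2GB@2GB, 32GB@32GB, 512GB@512GB, 8TB@8TB, 128TB@128TB) that
     * can hold @size bytes, or 0 if even the 128TB slot is too small.
     */
    static uint64_t slot_base_for(uint64_t size)
    {
        uint64_t slot = 2 * GiB;              /* first slot: 2GB at 2GB */

        while (slot <= 128ULL * 1024 * GiB) { /* up to 128TB at 128TB */
            if (size <= slot) {
                return slot;                  /* base equals slot size */
            }
            slot *= 16;                       /* next slot in the series */
        }
        return 0;
    }

With that, a device_memory region of, say, 100GB would land in the
512GB@512GB slot.)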
> >>>> The issue is that machvirt doesn't comply with that document.
> >>>> At the moment we have:
> >>>> 0 -> 1GB MMIO
> >>>> 1GB -> 256GB RAM
> >>>> 256GB -> 512GB theoretically reserved for IO, but most of it is free.
> >>>> 512GB -> 1TB reserved for the ECAM MMIO range. This is the top of our
> >>>> existing 40b GPA space.
> >>>>
> >>>> We don't want to change this address map due to legacy reasons.
> >>>
> >>> Thanks, good to know!
> >>>
> >>>> Another question: do you know if it would be possible to have the
> >>>> device_memory region split into several discontinuous segments?
> >>>
> >>> It can be implemented for sure, but I would try to avoid that, as it
> >>> makes certain configurations impossible (and is very unfriendly to the
> >>> end user).
> >>>
> >>> E.g. (numbers completely made up, but they should show what I mean)
> >>>
> >>> -m 20G,maxmem=120G:
> >>> -> Try to add a DIMM with 100G -> error.
> >>> -> But we can add e.g. two DIMMs with 40G and 60G.
> >>>
> >>> This exposes internal details to the end user, and the end user has no
> >>> idea what is going on.
> >>>
> >>> So I think we should try our best to keep that area contiguous.
> >>
> >> Actually I didn't sufficiently detail the context. I would like
> >> 1) one segment to be exposed to the end user through the slots|maxmem
> >> options (what this series targets), and
> >> 2) another segment used to instantiate PC-DIMMs for internal needs, as a
> >> replacement for part of the 1GB -> 256GB static RAM. This was the
> >> purpose of Shameer's original series [1].
> >
> > I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >
> >> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >> http://patchwork.ozlabs.org/cover/914694/
> >> This approach is not yet validated though.
> >>
> >> The rationale is that sometimes you must have "holes" in RAM because
> >> some GPAs match reserved IOVAs for assigned devices.
> >
> > So if I understand it correctly, all you want is some memory region that
> > a) contains only initially defined memory
> > b) can have some holes in it
> >
> > This is exactly what x86 already does (pc_memory_init): simply construct
> > your own memory region, leaving holes in it.
> >
> > memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >                          0, pcms->below_4g_mem_size);
> > memory_region_add_subregion(system_memory, 0, ram_below_4g);
> > ...
> > if (pcms->above_4g_mem_size > 0)
> >     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> > ...
> >     memory_region_add_subregion(system_memory, 0x100000000ULL,
> > ...
> >
> > These different GPA areas are then "indicated" to the guest via the e820
> > map.
> >
> > Would that also work for you?
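(For reference, a rough sketch of what that alias trick might look like on
virt. The function name, the single backing block, and the 64GB..65GB hole
are all made up for illustration; it also assumes ram_size is large enough
to reach past the hole:

    #include "qemu/osdep.h"
    #include "qemu/units.h"
    #include "exec/memory.h"
    #include "hw/boards.h"

    /*
     * Illustration only: carve one reserved-IOVA hole out of the
     * 1GB -> 256GB RAM area using aliases, in the spirit of
     * pc_memory_init(). The hole boundaries are invented values.
     */
    static void virt_memory_init_with_hole(MachineState *machine,
                                           MemoryRegion *sysmem)
    {
        MemoryRegion *ram = g_new(MemoryRegion, 1);
        MemoryRegion *below = g_new(MemoryRegion, 1);
        MemoryRegion *above = g_new(MemoryRegion, 1);
        const hwaddr ram_base = GiB;           /* VIRT_MEM base, 1GB */
        const hwaddr hole_start = 64 * GiB;    /* assumed reserved IOVA */
        const hwaddr hole_end = 65 * GiB;
        uint64_t below_size = hole_start - ram_base;

        /* one backing block for all of the initial RAM */
        memory_region_allocate_system_memory(ram, NULL, "mach-virt.ram",
                                             machine->ram_size);

        /* the part below the hole stays at its usual GPA */
        memory_region_init_alias(below, NULL, "ram-below-hole", ram,
                                 0, below_size);
        memory_region_add_subregion(sysmem, ram_base, below);

        /* the remainder is remapped above the hole */
        memory_region_init_alias(above, NULL, "ram-above-hole", ram,
                                 below_size,
                                 machine->ram_size - below_size);
        memory_region_add_subregion(sysmem, hole_end, above);
    }

The guest would then have to learn about the two banks via the DT memory
nodes or ACPI, which would be the arm64 counterpart of the e820 trick
above.)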
> I would tentatively say yes. Effectively, I am not sure that, if we were
> to actually put holes in the 1GB -> 256GB RAM segment, PC-DIMM would be
> the natural choice. Also, the reserved IOVA issue impacts the
> device_memory region area, I think. I am skeptical about whether we can
> put holes in the static RAM and device_memory regions like that.
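(To make the overlap concern concrete: whichever region the holes end up in,
something has to check candidate GPA ranges against the host's reserved
windows before placing RAM or device_memory there. A minimal sketch; the
ReservedRegion struct is invented here, however the windows are actually
obtained from the host:

    #include <stdbool.h>
    #include <stdint.h>

    /* hypothetical reserved-IOVA window of an assigned device */
    typedef struct {
        uint64_t start;
        uint64_t end;      /* inclusive */
    } ReservedRegion;

    /* true if [base, base + size) collides with any reserved window */
    static bool range_hits_reserved(uint64_t base, uint64_t size,
                                    const ReservedRegion *res, int nr_res)
    {
        for (int i = 0; i < nr_res; i++) {
            if (base <= res[i].end && res[i].start < base + size) {
                return true;   /* cannot back this GPA range with RAM */
            }
        }
        return false;
    }
)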
The first approach [1] we had for addressing the holes in memory was the
memory alias way mentioned above. Based on Drew's review, the PC-DIMM way
of handling it was introduced; I think the main argument was that it would
be useful when we eventually support hotplug. But since hotplug support is
added anyway as part of this series, I am not sure we gain any other
benefit from modeling it as a PC-DIMM. Maybe I am missing something here.

Thanks,
Shameer

[1] https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html

> Thanks!
>
> Eric
>
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>
> >>>
> >
> >