On Thu, 5 Jul 2018 16:27:05 +0200 Auger Eric <eric.au...@redhat.com> wrote:
> Hi Shameer,
>
> On 07/05/2018 03:19 PM, Shameerali Kolothum Thodi wrote:
> >
> >> -----Original Message-----
> >> From: Auger Eric [mailto:eric.au...@redhat.com]
> >> Sent: 05 July 2018 13:18
> >> To: David Hildenbrand <da...@redhat.com>; eric.auger....@gmail.com;
> >> qemu-devel@nongnu.org; qemu-...@nongnu.org; peter.mayd...@linaro.org;
> >> Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>;
> >> imamm...@redhat.com
> >> Cc: w...@redhat.com; drjo...@redhat.com; da...@gibson.dropbear.id.au;
> >> dgilb...@redhat.com; ag...@suse.de
> >> Subject: Re: [Qemu-devel] [RFC v3 06/15] hw/arm/virt: Allocate device_memory
> >>
> >> Hi David,
> >>
> >> On 07/05/2018 02:09 PM, David Hildenbrand wrote:
> >>> On 05.07.2018 14:00, Auger Eric wrote:
> >>>> Hi David,
> >>>>
> >>>> On 07/05/2018 01:54 PM, David Hildenbrand wrote:
> >>>>> On 05.07.2018 13:42, Auger Eric wrote:
> >>>>>> Hi David,
> >>>>>>
> >>>>>> On 07/04/2018 02:05 PM, David Hildenbrand wrote:
> >>>>>>> On 03.07.2018 21:27, Auger Eric wrote:
> >>>>>>>> Hi David,
> >>>>>>>> On 07/03/2018 08:25 PM, David Hildenbrand wrote:
> >>>>>>>>> On 03.07.2018 09:19, Eric Auger wrote:
> >>>>>>>>>> We define a new hotpluggable RAM region (aka. device memory).
> >>>>>>>>>> Its base is 2TB GPA. This obviously requires 42b IPA support
> >>>>>>>>>> in KVM/ARM, FW and guest kernel. At the moment the device
> >>>>>>>>>> memory region is max 2TB.
> >>>>>>>>>
> >>>>>>>>> Maybe a stupid question, but why exactly does it have to start at
> >>>>>>>>> 2TB (and not e.g. at 1TB)?
> >>>>>>>> not a stupid question. See tentative answer below.
> >>>>>>>>>
> >>>>>>>>>> This is largely inspired of device memory initialization in
> >>>>>>>>>> pc machine code.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Eric Auger <eric.au...@redhat.com>
> >>>>>>>>>> Signed-off-by: Kwangwoo Lee <kwangwoo....@sk.com>
> >>>>>>>>>> ---
> >>>>>>>>>>  hw/arm/virt.c         | 104 ++++++++++++++++++++++++++++++++++++--------------
> >>>>>>>>>>  include/hw/arm/arm.h  |   2 +
> >>>>>>>>>>  include/hw/arm/virt.h |   1 +
> >>>>>>>>>>  3 files changed, 79 insertions(+), 28 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >>>>>>>>>> index 5a4d0bf..6fefb78 100644
> >>>>>>>>>> --- a/hw/arm/virt.c
> >>>>>>>>>> +++ b/hw/arm/virt.c
> >>>>>>>>>> @@ -59,6 +59,7 @@
> >>>>>>>>>>  #include "qapi/visitor.h"
> >>>>>>>>>>  #include "standard-headers/linux/input.h"
> >>>>>>>>>>  #include "hw/arm/smmuv3.h"
> >>>>>>>>>> +#include "hw/acpi/acpi.h"
> >>>>>>>>>>
> >>>>>>>>>>  #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
> >>>>>>>>>>      static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
> >>>>>>>>>> @@ -94,34 +95,25 @@
> >>>>>>>>>>
> >>>>>>>>>>  #define PLATFORM_BUS_NUM_IRQS 64
> >>>>>>>>>>
> >>>>>>>>>> -/* RAM limit in GB. Since VIRT_MEM starts at the 1GB mark, this means
> >>>>>>>>>> - * RAM can go up to the 256GB mark, leaving 256GB of the physical
> >>>>>>>>>> - * address space unallocated and free for future use between 256G and 512G.
> >>>>>>>>>> - * If we need to provide more RAM to VMs in the future then we need to:
> >>>>>>>>>> - *  * allocate a second bank of RAM starting at 2TB and working up
> >>>>>>>> I acknowledge this comment was the main justification. Now if you look at
> >>>>>>>>
> >>>>>>>> Principles of ARM Memory Maps
> >>>>>>>> http://infocenter.arm.com/help/topic/com.arm.doc.den0001c/DEN0001C_principles_of_arm_memory_maps.pdf
> >>>>>>>> chapter 2.3 you will find that when adding PA bits, you always leave
> >>>>>>>> space for reserved space and mapped IO.
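
As an aside, to make the "42b IPA" requirement quoted above concrete: the proposed
device memory region spans 2TB..4TB, and 4TB is 2^42, so the guest needs at least a
42-bit IPA space. A standalone arithmetic check of that, plain C and nothing
QEMU-specific (gcc/clang; it prints "needs 42 IPA bits"):

    /* Standalone check of the "42b IPA" claim: 2TB base + 2TB size = 4TB = 2^42. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t base = 2ULL << 40;                     /* 2TB device memory base   */
        uint64_t size = 2ULL << 40;                     /* 2TB maximum region size  */
        uint64_t top  = base + size;                    /* 4TB                      */
        unsigned bits = 64 - __builtin_clzll(top - 1);  /* bits needed for [0, top) */

        printf("device memory top = 0x%" PRIx64 " -> needs %u IPA bits\n", top, bits);
        return 0;
    }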
> >>>>>>> Thanks for the pointer!
> >>>>>>>
> >>>>>>> So ... we can fit
> >>>>>>>
> >>>>>>> a) 2GB at 2GB
> >>>>>>> b) 32GB at 32GB
> >>>>>>> c) 512GB at 512GB
> >>>>>>> d) 8TB at 8TB
> >>>>>>> e) 128TB at 128TB
> >>>>>>>
> >>>>>>> (this is a nice rule of thumb if I understand it correctly :) )
> >>>>>>>
> >>>>>>> We should strive for device memory (maxram_size - ram_size) to fit
> >>>>>>> exactly into one of these slots (otherwise things get nasty).
> >>>>>>>
> >>>>>>> Depending on the ram_size, we might have simpler setups and can
> >>>>>>> support more configurations, no?
> >>>>>>>
> >>>>>>> E.g. ram_size <= 34GB, device_memory <= 512GB
> >>>>>>> -> move ram into a) and b)
> >>>>>>> -> move device memory into c)
> >>>>>>
> >>>>>> The issue is machvirt doesn't comply with that document.
> >>>>>> At the moment we have
> >>>>>> 0 -> 1GB MMIO
> >>>>>> 1GB -> 256GB RAM
> >>>>>> 256GB -> 512GB is theoretically reserved for IO but most is free.
> >>>>>> 512GB -> 1T is reserved for ECAM MMIO range. This is the top of our
> >>>>>> existing 40b GPA space.
> >>>>>>
> >>>>>> We don't want to change this address map due to legacy reasons.
> >>>>>
> >>>>> Thanks, good to know!
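
For readers skimming the thread: the GPA layout Eric describes, plus the 2TB device
memory region this series proposes, looks roughly like the sketch below. The struct
and the names are illustrative only, not the actual VIRT_* memmap entries in
hw/arm/virt.c:

    /* Sketch of the GPA map discussed above; not the actual hw/arm/virt.c tables. */
    #include <stdint.h>

    #define GIB (1024ULL * 1024 * 1024)
    #define TIB (1024ULL * GIB)

    static const struct {
        const char *name;
        uint64_t base;
        uint64_t size;
    } virt_gpa_sketch[] = {
        { "MMIO",                          0,          1 * GIB },  /*   0GB ..   1GB */
        { "mach-virt.ram (static RAM)",    1 * GIB,  255 * GIB },  /*   1GB .. 256GB */
        { "reserved for IO, mostly free",  256 * GIB, 256 * GIB }, /* 256GB .. 512GB */
        { "reserved for ECAM MMIO",        512 * GIB, 512 * GIB }, /* 512GB .. 1TB, top of the 40b GPA space */
        { "device memory (this series)",   2 * TIB,   2 * TIB },   /*   2TB .. 4TB, hence the 42b IPA        */
    };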
> >>>>>> Another question! do you know if it would be possible to have the
> >>>>>> device_memory region split into several discontinuous segments?
> >>>>>
> >>>>> It can be implemented for sure, but I would try to avoid that, as it
> >>>>> makes certain configurations impossible (and very end user unfriendly).
> >>>>>
> >>>>> E.g. (numbers completely made up, but it should show what I mean)
> >>>>>
> >>>>> -m 20G,maxmem=120G:
> >>>>> -> Try to add a DIMM with 100G -> error.
> >>>>> -> But we can add e.g. two DIMMs with 40G and 60G.
> >>>>>
> >>>>> This exposes internal details to the end user. And the end user has no
> >>>>> idea what is going on.
> >>>>>
> >>>>> So I think we should try our best to keep that area consecutive.
> >>>>
> >>>> Actually I didn't sufficiently detail the context. I would like
> >>>> 1) one segment to be exposed to the end-user through the slot|maxmem stuff
> >>>> (what this series targets) and
> >>>> 2) another segment used to instantiate PC-DIMMs for internal needs as a
> >>>> replacement for part of the 1GB -> 256GB static RAM. This was the purpose
> >>>> of Shameer's original series [1].
> >>>
> >>> I am not sure if PC-DIMMs are exactly what you want for internal purposes.
> >>>
> >>>> [1] [RFC v2 0/6] hw/arm: Add support for non-contiguous iova regions
> >>>> http://patchwork.ozlabs.org/cover/914694/
> >>>> This approach is not yet validated though.
> >>>>
> >>>> The rationale is sometimes you must have "holes" in RAM as some GPAs
> >>>> match reserved IOVAs for assigned devices.
> >>>
> >>> So if I understand it correctly, all you want is some memory region that
> >>> a) contains only initially defined memory
> >>> b) can have some holes in it
> >>>
> >>> This is exactly what x86 already does (pc_memory_init): Simply construct
> >>> your own memory region leaving holes in it.
> >>>
> >>> memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >>>                          0, pcms->below_4g_mem_size);
> >>> memory_region_add_subregion(system_memory, 0, ram_below_4g);
> >>> ...
> >>> if (pcms->above_4g_mem_size > 0)
> >>>     memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram,
> >>>                              ...
> >>>     memory_region_add_subregion(system_memory, 0x100000000ULL,
> >>>                                 ...
> >>>
> >>> They "indicate" these different GPA areas to the guest using the e820 map.
> >>>
> >>> Would that also work for you?
> >>
> >> I would tentatively say yes. Actually I am not sure that, if we were to
> >> put holes in the 1GB-256GB RAM segment, PC-DIMM would be the natural
> >> choice. Also the reserved IOVA issue impacts the device_memory region
> >> area, I think. I am skeptical about the fact that we can put holes in
> >> the static RAM and device_memory regions like that.

Could we just use a single device_memory region for both initial and hotpluggable
RAM if we make the base RAM address dynamic? In that case RAM could start wherever
there is free space for maxmem (if there is free space in lowmem, put device_memory
there, otherwise put it somewhere in high mem) and we wouldn't care whether there
is a reserved IOVA there or not.

*I don't have a clue about iommus, so here goes a stupid question*
I agree with Peter that the whole IOVA thing looks broken when the host layout
dictates the guest's one. Shouldn't there be an IOMMU somewhere that would remap
the host map into a guest-specific one? (so the guest would model the board we
need and be migratable instead of mimicking the host hw)

> > The first approach [1] we had to address the holes in memory was using
> > the memory alias way mentioned above. And based on Drew's review, the
> > pc-dimm way of handling it was introduced. I think the main argument was
> > that it will be useful when we eventually support hotplug.
>
> That's my understanding too.

Not only hotplug: a RAM memory region that's split by aliases is difficult to
handle, as it creates a nonlinear GPA<->HVA mapping instead of the 1:1 mapping
of a pc-dimm. If one needs to build an HVA<->GPA map for a given MemoryRegion,
in the alias case one has to get the list of MemorySections that belong to it
and build the map from that, vs a simple (addr + offset) in the 1:1 case. It
also complicates machine-specific SRAT/e820 code due to the holes
(grep for 'the memory map is a bit tricky').

> > But since that is added
> > anyway as part of this series, I am not sure we have any other benefit in
> > modeling it as pc-dimm. Maybe I am missing something here.
>
> I tentatively agree with you. I was trying to understand if the
> device_memory region was fitting the original needs too, but I think
> the standard alias approach is more adapted to hole creation.

Aliases are an easy way to start with, but as compat knobs grow (based on PC
experience, grep for 'Calculate ram split') it's quite a pain to maintain a
manual, implicit alias layout without breaking it by accident. We probably
won't be able to get rid of aliases on PC for legacy reasons, but why introduce
the same pain to the virt board?

Well, a magical conversion from -m X to 2..y memory regions (aliases or not)
isn't going to be easy in either case, especially if one takes "-numa memdev|mem"
into account. I'd rather use a single pc-dimm approach for both /initial and
hotpluggable RAM/ and then use the device_memory framework to enumerate RAM
wherever needed (ACPI/DT) in a uniform way.
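
To make that direction a bit more concrete, a rough sketch of a single
device-memory container backing both initial and hotpluggable RAM. It hand-waves
the actual pc-dimm pre-plug path and the DT/ACPI enumeration; the function name,
the base parameter and the sizing are made up for illustration, and the
DeviceMemoryState fields are assumed to be as in QEMU around the 3.0 timeframe:

    /* Sketch only: one device-memory container for both initial and
     * hotpluggable RAM. virt_device_memory_init and dev_mem_base are
     * illustrative names, not code from this series.
     */
    #include "qemu/osdep.h"
    #include "exec/address-spaces.h"
    #include "hw/boards.h"

    static void virt_device_memory_init(MachineState *ms, hwaddr dev_mem_base)
    {
        /* one container sized for everything, since initial RAM lives here too */
        uint64_t dev_mem_size = ms->maxram_size;

        ms->device_memory = g_new0(DeviceMemoryState, 1);
        ms->device_memory->base = dev_mem_base;
        memory_region_init(&ms->device_memory->mr, OBJECT(ms),
                           "device-memory", dev_mem_size);
        memory_region_add_subregion(get_system_memory(), dev_mem_base,
                                    &ms->device_memory->mr);

        /*
         * Initial RAM would then be pre-plugged into this container as a
         * pc-dimm (or an equivalent memory device) instead of being mapped
         * as mach-virt.ram, and DT/ACPI would enumerate whatever ended up
         * in the container.
         */
    }

The point being that the board would only ever map one region, and whatever lands
inside it (initial or hotplugged) gets enumerated the same way.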
> Thanks
>
> Eric
>
> > Thanks,
> > Shameer
> >
> > [1]. https://lists.gnu.org/archive/html/qemu-arm/2018-04/msg00243.html
> >
> >> Thanks!
> >>
> >> Eric
> >>>>
> >>>> Thanks
> >>>>
> >>>> Eric
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> Eric