On 27.03.19 18:08, Igor Mammedov wrote:
> On Wed, 27 Mar 2019 15:01:37 +0100
> David Hildenbrand <da...@redhat.com> wrote:
>
>> On 27.03.19 10:09, Igor Mammedov wrote:
>>> On Wed, 27 Mar 2019 09:10:01 +0100
>>> David Hildenbrand <da...@redhat.com> wrote:
>>>
>>>> On 27.03.19 01:12, David Gibson wrote:
>>>>> On Tue, Mar 26, 2019 at 06:02:51PM +0100, David Hildenbrand wrote:
>>>>>> On 26.03.19 15:08, Igor Mammedov wrote:
>>>>>>> On Tue, 26 Mar 2019 14:50:58 +1100
>>>>>>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>>>>>>
>>>>>>>> qemu_getrampagesize() works out the minimum host page size backing any of
>>>>>>>> guest RAM. This is required in a few places, such as for POWER8 PAPR KVM
>>>>>>>> guests, because limitations of the hardware virtualization mean the guest
>>>>>>>> can't use pagesizes larger than the host pages backing its memory.
>>>>>>>>
>>>>>>>> However, it currently checks against *every* memory backend, whether or not
>>>>>>>> it is actually mapped into guest memory at the moment. This is incorrect.
>>>>>>>>
>>>>>>>> This can cause a problem attempting to add memory to a POWER8 pseries KVM
>>>>>>>> guest which is configured to allow hugepages in the guest (e.g.
>>>>>>>> -machine cap-hpt-max-page-size=16m). If you attempt to add non-hugepage
>>>>>>>> memory, you can (correctly) create a memory backend, however it (correctly)
>>>>>>>> will throw an error when you attempt to map that memory into the guest by
>>>>>>>> 'device_add'ing a pc-dimm.
>>>>>>>>
>>>>>>>> What's not correct is that if you then reset the guest, a startup check
>>>>>>>> against qemu_getrampagesize() will cause a fatal error because of the new
>>>>>>>> memory object, even though it's not mapped into the guest.
>>>>>>> I'd say that the backend should be removed by the mgmt app since device_add
>>>>>>> failed, instead of leaving it to hang around (but a fatal error is not a
>>>>>>> nice behavior on QEMU's part either).
>>>>>>
>>>>>> Indeed, it should be removed. Depending on the options (huge pages with
>>>>>> prealloc?) memory might be consumed for unused memory. Undesired.
>>>>>
>>>>> Right, but if the guest initiates a reboot before the management gets
>>>>> to that, we'll have a crash.
>>>>>
>>>>
>>>> Yes, I agree.
>>>>
>>>> At least on s390x (extending on what Igor said):
>>>>
>>>> mc->init() -> s390_memory_init() ->
>>>> memory_region_allocate_system_memory() -> host_memory_backend_set_mapped()
>>>>
>>>> ac->init_machine() -> kvm_arch_init() ->
>>>> kvm_s390_configure_mempath_backing() -> qemu_getrampagesize()
>>>>
>>>> And in vl.c
>>>>
>>>> configure_accelerator(current_machine, argv[0]);
>>> Looking more at it, it seems s390 is 'broken' anyway.
>>> We call qemu_getrampagesize() here with huge page backends on the CLI,
>>> but memory backends are initialized later
>>> (qemu_opts_foreach(..., object_create_delayed, ...)),
>>> so s390 doesn't currently take memory backends into account.
>>
>> BTW that might indeed be true, we only check against --mem-path.
>
> It's possible to break it with '-numa node,memdev=...' since we don't really
> have anything to block that call chain for s390 (but I'd argue it's invalid
> use of the CLI for s390; it's effectively a -mem-path-on-steroids alternative).
I remember that -numa on s390x is completely blocked, but my mind might be
playing tricks on me. Anyhow, detecting the biggest page size also has to be
fixed.

-- 

Thanks,

David / dhildenb