On Wed, 27 Mar 2019 09:10:01 +0100 David Hildenbrand <da...@redhat.com> wrote:
> On 27.03.19 01:12, David Gibson wrote: > > On Tue, Mar 26, 2019 at 06:02:51PM +0100, David Hildenbrand wrote: > >> On 26.03.19 15:08, Igor Mammedov wrote: > >>> On Tue, 26 Mar 2019 14:50:58 +1100 > >>> David Gibson <da...@gibson.dropbear.id.au> wrote: > >>> > >>>> qemu_getrampagesize() works out the minimum host page size backing any of > >>>> guest RAM. This is required in a few places, such as for POWER8 PAPR KVM > >>>> guests, because limitations of the hardware virtualization mean the guest > >>>> can't use pagesizes larger than the host pages backing its memory. > >>>> > >>>> However, it currently checks against *every* memory backend, whether or > >>>> not > >>>> it is actually mapped into guest memory at the moment. This is > >>>> incorrect. > >>>> > >>>> This can cause a problem attempting to add memory to a POWER8 pseries KVM > >>>> guest which is configured to allow hugepages in the guest (e.g. > >>>> -machine cap-hpt-max-page-size=16m). If you attempt to add non-hugepage, > >>>> you can (correctly) create a memory backend, however it (correctly) will > >>>> throw an error when you attempt to map that memory into the guest by > >>>> 'device_add'ing a pc-dimm. > >>>> > >>>> What's not correct is that if you then reset the guest a startup check > >>>> against qemu_getrampagesize() will cause a fatal error because of the new > >>>> memory object, even though it's not mapped into the guest. > >>> I'd say that backend should be remove by mgmt app since device_add failed > >>> instead of leaving it to hang around. (but fatal error either not a nice > >>> behavior on QEMU part) > >> > >> Indeed, it should be removed. Depending on the options (huge pages with > >> prealloc?) memory might be consumed for unused memory. Undesired. > > > > Right, but if the guest initiates a reboot before the management gets > > to that, we'll have a crash. > > > > Yes, I agree. > > At least on s390x (extending on what Igor said): > > mc->init() -> s390_memory_init() -> > memory_region_allocate_system_memory() -> host_memory_backend_set_mapped() > > > ac->init_machine() -> kvm_arch_init() -> > kvm_s390_configure_mempath_backing() -> qemu_getrampagesize() > > > And in vl.c > > configure_accelerator(current_machine, argv[0]); Looking more at it, it is seems s390 is 'broken' anyways. We call qemu_getrampagesize() here with huge page backends on CLI but memory-backends are initialized later qemu_opts_foreach(..., object_create_delayed, ...) so s390 doesn't take into account memory backends currently > ... > machine_run_board_init() > > So memory is indeed not mapped before calling qemu_getrampagesize(). > > > We *could* move the call to kvm_s390_configure_mempath_backing() to > s390_memory_init(). > > cap_hpage_1m is not needed before we create VCPUs, so this would work fine. > > We could than eventually make qemu_getrampagesize() asssert if no > backends are mapped at all, to catch other user that rely on this being > correct. Looks like a reasonable way to fix immediate crash in 4.0 with mandatory assert (but see my other reply, about getting rid of qemu_getrampagesize())