On Wed, Mar 27, 2019 at 02:19:41PM +0100, David Hildenbrand wrote: > On 27.03.19 10:03, David Gibson wrote: > > On Wed, Mar 27, 2019 at 09:10:01AM +0100, David Hildenbrand wrote: > >> On 27.03.19 01:12, David Gibson wrote: > >>> On Tue, Mar 26, 2019 at 06:02:51PM +0100, David Hildenbrand wrote: > >>>> On 26.03.19 15:08, Igor Mammedov wrote: > >>>>> On Tue, 26 Mar 2019 14:50:58 +1100 > >>>>> David Gibson <da...@gibson.dropbear.id.au> wrote: > >>>>> > >>>>>> qemu_getrampagesize() works out the minimum host page size backing any > >>>>>> of > >>>>>> guest RAM. This is required in a few places, such as for POWER8 PAPR > >>>>>> KVM > >>>>>> guests, because limitations of the hardware virtualization mean the > >>>>>> guest > >>>>>> can't use pagesizes larger than the host pages backing its memory. > >>>>>> > >>>>>> However, it currently checks against *every* memory backend, whether > >>>>>> or not > >>>>>> it is actually mapped into guest memory at the moment. This is > >>>>>> incorrect. > >>>>>> > >>>>>> This can cause a problem attempting to add memory to a POWER8 pseries > >>>>>> KVM > >>>>>> guest which is configured to allow hugepages in the guest (e.g. > >>>>>> -machine cap-hpt-max-page-size=16m). If you attempt to add > >>>>>> non-hugepage, > >>>>>> you can (correctly) create a memory backend, however it (correctly) > >>>>>> will > >>>>>> throw an error when you attempt to map that memory into the guest by > >>>>>> 'device_add'ing a pc-dimm. > >>>>>> > >>>>>> What's not correct is that if you then reset the guest a startup check > >>>>>> against qemu_getrampagesize() will cause a fatal error because of the > >>>>>> new > >>>>>> memory object, even though it's not mapped into the guest. > >>>>> I'd say that backend should be remove by mgmt app since device_add > >>>>> failed > >>>>> instead of leaving it to hang around. (but fatal error either not a nice > >>>>> behavior on QEMU part) > >>>> > >>>> Indeed, it should be removed. Depending on the options (huge pages with > >>>> prealloc?) memory might be consumed for unused memory. Undesired. > >>> > >>> Right, but if the guest initiates a reboot before the management gets > >>> to that, we'll have a crash. > >>> > >> > >> Yes, I agree. > >> > >> At least on s390x (extending on what Igor said): > >> > >> mc->init() -> s390_memory_init() -> > >> memory_region_allocate_system_memory() -> host_memory_backend_set_mapped() > >> > >> > >> ac->init_machine() -> kvm_arch_init() -> > >> kvm_s390_configure_mempath_backing() -> qemu_getrampagesize() > >> > >> > >> And in vl.c > >> > >> configure_accelerator(current_machine, argv[0]); > >> ... > >> machine_run_board_init() > >> > >> So memory is indeed not mapped before calling qemu_getrampagesize(). > >> > >> > >> We *could* move the call to kvm_s390_configure_mempath_backing() to > >> s390_memory_init(). > >> > >> cap_hpage_1m is not needed before we create VCPUs, so this would work fine. > >> > >> We could than eventually make qemu_getrampagesize() asssert if no > >> backends are mapped at all, to catch other user that rely on this being > >> correct. > > > > So.. I had a look at the usage in kvm_s390_configure_mempath_backing() > > and I'm pretty sure it's broken. It will work in the case where > > there's only one backend. And if that's the default -mem-path rather > > than an explicit memory backend then my patch won't break it any > > further. > > It works for the current scenarios, where you only have one (maximum > two) backings of the same kind. Your patch would break that.
Actually it wouldn't. My patch only affects checking of explicit backend objects - checking of the base -mem-path implicit backend remains the same. > > qemu_getrampagesize() returns the smallest host page size for any > > memory backend. That's what matters for pcc KVM (in several cases) > > because we need certain things to be host-contiguous, not just > > guest-contiguous. Bigger host page sizes are fine for that purpose, > > clearly. > > > > AFAICT on s390 you're looking to determine if any backend is using > > hugepages, because KVM may not support that. The minimum host page > > size isn't adequate to determine that, so qemu_getrampagesize() won't > > tell you what you need. > > Well, as long as we don't support DIMMS or anything like that it works > perfectly fine. But yes it is far from beautiful. > > First of all, I'll prepare a patch to do the call from a different > context. Then we can fine tune to using something else than > qemu_getrampagesize() > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature