On 27.03.19 10:09, Igor Mammedov wrote:
> On Wed, 27 Mar 2019 09:10:01 +0100
> David Hildenbrand <da...@redhat.com> wrote:
> 
>> On 27.03.19 01:12, David Gibson wrote:
>>> On Tue, Mar 26, 2019 at 06:02:51PM +0100, David Hildenbrand wrote:  
>>>> On 26.03.19 15:08, Igor Mammedov wrote:  
>>>>> On Tue, 26 Mar 2019 14:50:58 +1100
>>>>> David Gibson <da...@gibson.dropbear.id.au> wrote:
>>>>>  
>>>>>> qemu_getrampagesize() works out the minimum host page size backing any of
>>>>>> guest RAM.  This is required in a few places, such as for POWER8 PAPR KVM
>>>>>> guests, because limitations of the hardware virtualization mean the guest
>>>>>> can't use pagesizes larger than the host pages backing its memory.
>>>>>>
>>>>>> However, it currently checks against *every* memory backend, whether or 
>>>>>> not
>>>>>> it is actually mapped into guest memory at the moment.  This is 
>>>>>> incorrect.
>>>>>>
>>>>>> This can cause a problem attempting to add memory to a POWER8 pseries KVM
>>>>>> guest which is configured to allow hugepages in the guest (e.g.
>>>>>> -machine cap-hpt-max-page-size=16m).  If you attempt to add non-hugepage,
>>>>>> you can (correctly) create a memory backend, however it (correctly) will
>>>>>> throw an error when you attempt to map that memory into the guest by
>>>>>> 'device_add'ing a pc-dimm.
>>>>>>
>>>>>> What's not correct is that if you then reset the guest a startup check
>>>>>> against qemu_getrampagesize() will cause a fatal error because of the new
>>>>>> memory object, even though it's not mapped into the guest.  
>>>>> I'd say that backend should be remove by mgmt app since device_add failed
>>>>> instead of leaving it to hang around. (but fatal error either not a nice
>>>>> behavior on QEMU part)  
>>>>
>>>> Indeed, it should be removed. Depending on the options (huge pages with
>>>> prealloc?) memory might be consumed for unused memory. Undesired.  
>>>
>>> Right, but if the guest initiates a reboot before the management gets
>>> to that, we'll have a crash.
>>>   
>>
>> Yes, I agree.
>>
>> At least on s390x (extending on what Igor said):
>>
>> mc->init() -> s390_memory_init() ->
>> memory_region_allocate_system_memory() -> host_memory_backend_set_mapped()
>>
>>
>> ac->init_machine() -> kvm_arch_init() ->
>> kvm_s390_configure_mempath_backing() -> qemu_getrampagesize()
>>
>>
>> And in vl.c
>>
>> configure_accelerator(current_machine, argv[0]);
> Looking more at it, it is seems s390 is 'broken' anyways.
> We call qemu_getrampagesize() here with huge page backends on CLI
> but memory-backends are initialized later
>  qemu_opts_foreach(..., object_create_delayed, ...)
> so s390 doesn't take into account memory backends currently
> 
>> ...
>> machine_run_board_init()
>>
>> So memory is indeed not mapped before calling qemu_getrampagesize().
> 
> 
> 
>>
>>
>> We *could* move the call to kvm_s390_configure_mempath_backing() to
>> s390_memory_init().
>>
>> cap_hpage_1m is not needed before we create VCPUs, so this would work fine.
>>
>> We could than eventually make qemu_getrampagesize() asssert if no
>> backends are mapped at all, to catch other user that rely on this being
>> correct.
> Looks like a reasonable way to fix immediate crash in 4.0 with mandatory 
> assert
> (but see my other reply, about getting rid of qemu_getrampagesize())
> 

I'll send a patch to move the call for s390x. We can than decide how to
proceed with qemu_getrampagesize() in general.

-- 

Thanks,

David / dhildenb

Reply via email to