On 2019/5/8 21:20, Markus Armbruster wrote:
> Laszlo Ersek <ler...@redhat.com> writes:
>
>> Hi Markus,
>>
>> On 05/07/19 20:01, Markus Armbruster wrote:
>>> The subject is slightly misleading. Holes read as zero. So do
>>> non-holes full of zeroes. The patch avoids reading the former, but
>>> still reads the latter.
>>>
>>> Xiang Zheng <zhengxia...@huawei.com> writes:
>>>
>>>> Currently we fill the memory space with two 64MB NOR images when
>>>> using persistent UEFI variables on virt board. Actually we only use
>>>> a very small (non-zero) part of the memory, while the rest, a
>>>> significantly larger (zero) part, is wasted.
>>>
>>> Neglects to mention that the "virt board" is ARM.
>>>
>>>> So this patch checks the block status and only writes the non-zero part
>>>> into memory. This requires pflash devices to use sparse files for
>>>> backends.
>>>
>>> I started to draft an improved commit message, but then I realized this
>>> patch can't work.
>>>
>>> The pflash_cfi01 device allocates its device memory like this:
>>>
>>>     memory_region_init_rom_device(
>>>         &pfl->mem, OBJECT(dev),
>>>         &pflash_cfi01_ops,
>>>         pfl,
>>>         pfl->name, total_len, &local_err);
>>>
>>> pflash_cfi02 is similar.
>>>
>>> memory_region_init_rom_device() calls
>>> memory_region_init_rom_device_nomigrate() calls qemu_ram_alloc() calls
>>> qemu_ram_alloc_internal() calls g_malloc0(). Thus, all the device
>>> memory gets written to even with this patch.
>>
>> As far as I can see, qemu_ram_alloc_internal() calls g_malloc0() only to
>> allocate the new RAMBlock object called "new_block". The actual
>> guest RAM allocation occurs inside ram_block_add(), which is also called
>> by qemu_ram_alloc_internal().
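>>
>> Abridged from exec.c (my reading of the source, untested; details may
>> vary by version):
>>
>>     new_block = g_malloc0(sizeof(*new_block)); /* just the struct */
>>     new_block->used_length = size;
>>     new_block->max_length = max_size;
>>     new_block->host = host;          /* NULL from qemu_ram_alloc() */
>>     ...
>>     ram_block_add(new_block, &local_err, share);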
>
> You're right. I should've read more attentively.
>
>> One frame outward on the stack, qemu_ram_alloc() passes NULL to
>> qemu_ram_alloc_internal() for the 4th ("host") parameter. Therefore, in
>> qemu_ram_alloc_internal(), we set "new_block->host" to NULL as well.
>>
>> Then in ram_block_add(), we take the (!new_block->host) branch, and call
>> phys_mem_alloc().
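>>
>> That branch looks roughly like this (abridged):
>>
>>     if (!new_block->host) {
>>         ...
>>         new_block->host = phys_mem_alloc(new_block->max_length,
>>                                          &new_block->mr->align, shared);
>>         ...
>>     }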
>>
>> Unfortunately, "phys_mem_alloc" is a function pointer, set with
>> phys_mem_set_alloc(). The phys_mem_set_alloc() function is called from
>> "target/s390x/kvm.c" (setting the function pointer to
>> legacy_s390_alloc()), so it doesn't apply in this case. Therefore we end
>> up calling the default qemu_anon_ram_alloc() function, through the
>> funcptr. (I think anyway.)
>>
>> And qemu_anon_ram_alloc() boils down to mmap() + MAP_ANONYMOUS, in
>> qemu_ram_mmap(). (Even on PPC64 hosts, because qemu_anon_ram_alloc()
>> passes (-1) for "fd".)
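>>
>> The lazy-allocation behavior this relies on can be illustrated with a
>> tiny stand-alone program (a sketch, same caveat that I haven't run it):
>>
>>     #include <stdio.h>
>>     #include <string.h>
>>     #include <sys/mman.h>
>>
>>     int main(void)
>>     {
>>         size_t size = 64 * 1024 * 1024;
>>         /* Anonymous mapping: no physical pages are allocated yet. */
>>         unsigned char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
>>                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>>         if (p == MAP_FAILED) {
>>             return 1;
>>         }
>>         /* Touch only 1MiB; RSS should grow by ~1MiB, not 64MiB. */
>>         memset(p, 0xff, 1024 * 1024);
>>         getchar(); /* pause here: inspect /proc/<pid>/smaps now */
>>         munmap(p, size);
>>         return 0;
>>     }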
>>
>> I may have missed something, of course -- I obviously didn't test it,
>> just speculated from the source.
>
> Thanks for your sleuthing!
>
>>> I'm afraid you neglected to test.
>
> That accusation was unsupported. I apologize, and replace it with a
> question: have you observed the improvement you're trying to achieve,
> and if so, how?
>
Yes. To observe it, the pflash devices need sparse files as their backing
images. Such files can be created like this (the first dd merely sets the
file size to 64MiB without allocating any blocks; "ls -ls" confirms the
files stay sparse):
dd of="QEMU_EFI-pflash.raw" if="/dev/zero" bs=1M seek=64 count=0
dd of="QEMU_EFI-pflash.raw" if="QEMU_EFI.fd" conv=notrunc
dd of="empty_VARS.fd" if="/dev/zero" bs=1M seek=64 count=0
Then start a VM with the command line below:

  -drive file=/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/usr/share/edk2/aarch64/empty_VARS.fd,if=pflash,format=raw,unit=1 \

and observe the memory usage of the QEMU process (THP is enabled).
1) Without this patch:

# cat /proc/`pidof qemu-system-aarch64`/smaps | grep AnonHugePages: | grep -v ' 0 kB'
AnonHugePages:    706560 kB
AnonHugePages:      2048 kB
AnonHugePages:     65536 kB   // pflash memory device
AnonHugePages:     65536 kB   // pflash memory device
AnonHugePages:      2048 kB

# ps aux | grep qemu-system-aarch64
RSS: 879684 kB
2) After applying this patch:

# cat /proc/`pidof qemu-system-aarch64`/smaps | grep AnonHugePages: | grep -v ' 0 kB'
AnonHugePages:    700416 kB
AnonHugePages:      2048 kB
AnonHugePages:      2048 kB   // pflash memory device
AnonHugePages:      2048 kB   // pflash memory device
AnonHugePages:      2048 kB

# ps aux | grep qemu-system-aarch64
RSS: 744380 kB
So at least 100MiB of memory is saved for each guest
(879684kB - 744380kB = 135304kB, about 132MiB).
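
For reference, the approach in the patch is a read loop that queries the
block status and skips the ranges the block layer reports as zero, so the
corresponding pages of the (lazily allocated) device memory stay
untouched. Roughly (a sketch; the helper name is illustrative and error
handling is simplified):

    static int pread_nonzeroes(BlockBackend *blk, hwaddr size, void *buf)
    {
        int64_t offset = 0, bytes;
        BlockDriverState *bs = blk_bs(blk);

        while (offset < size) {
            bytes = MIN(size - offset, BDRV_REQUEST_MAX_BYTES);
            int ret = bdrv_block_status(bs, offset, bytes, &bytes,
                                        NULL, NULL);
            if (ret < 0) {
                return ret;
            }
            if (!(ret & BDRV_BLOCK_ZERO)) {
                /* Non-zero (or unknown) range: read it in. */
                ret = blk_pread(blk, offset, (uint8_t *)buf + offset,
                                bytes);
                if (ret < 0) {
                    return ret;
                }
            }
            /* else: hole/zeroes -- leave the buffer pages untouched */
            offset += bytes;
        }
        return 0;
    }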
--
Thanks,
Xiang