virt: use variable size of flash device to save memory

Xiang Zheng Mon, 08 Apr 2019 20:41:39 -0700


On 2019/4/9 0:14, Laszlo Ersek wrote:
> On 04/08/19 15:43, Xiang Zheng wrote:
>>
>> On 2019/4/3 23:35, Laszlo Ersek wrote:
>>>> I thought about your comments and wrote the following patch (just for test)
>>>> which uses a file mapping to replace the anonymous mapping. UEFI seems to 
>>>> work
>>>> fine. So why not use a file mapping to read or write a pflash device?
>>> Honestly I can't answer "why not do this". Maybe we should.
>>>
>>> Regarding "why not do this *exactly as shown below*" -- probably because
>>> then we'd have updates to the same underlying regular file via both
>>> mmap() and write(), and the interactions between those are really tricky
>>> (= best avoided).
>>>
>>> One of my favorite questions over the years used to be posting a matrix
>>> of possible mmap() and file descriptor r/w/sync interactions, with the
>>> outcomes as they were "implied" by POSIX, and then asking at the bottom,
>>> "is my understanding correct?" I've never ever received an answer to
>>> that. :)
>>
>> In my opinion, it's feasible to r/w/sync the memory devices which use a block
>> backend via mmap() and write().
> 
> Maybe. I think that would be a first in QEMU, and you'd likely have to
> describe and follow a semi-formal model between fd actions and mmap()
> actions.
> 
> Here's the (unconfirmed) table I referred to earlier.
> 
> +-------------+-----------------------------------------------------+
> | change made | change visible via                                  |
> | through     +-----------------+-------------+---------------------+
> |             | MAP_SHARED      | MAP_PRIVATE | read()              |
> +-------------+-----------------+-------------+---------------------+
> | MAP_SHARED  | yes             | unspecified | depends on MS_SYNC, |
> |             |                 |             | MS_ASYNC, or normal |
> |             |                 |             | system activity     |
> +-------------+-----------------+-------------+---------------------+
> | MAP_PRIVATE | no              | no          | no                  |
> +-------------+-----------------+-------------+---------------------+
> | write()     | depends on      | unspecified | yes                 |
> |             | MS_INVALIDATE,  |             |                     |
> |             | or the system's |             |                     |
> |             | read/write      |             |                     |
> |             | consistency     |             |                     |
> +-------------+-----------------+-------------+---------------------+
> 
> Presumably:
> 
> - a write() to some offset range of a regular file should be visible in
> a MAP_SHARED mapping of that range after a matching msync(...,
> MS_INVALIDATE) call;
> 
> - changes through a MAP_SHARED mapping to a regular file should be
> visible via read() after a matching msync(..., MS_SYNC) call returns.
> 
> I sent this table first in 2010 to the Austin Group mailing list, and
> received no comments. Then another person (on the same list) asked
> basically the same questions in 2013, to which I responded with the
> above assumptions / interpretations -- and received no comments from
> third parties again.
> 
> But, I'm really out of my depth here -- we're not even discussing
> generic read()/write()/mmap()/munmap()/msync() interactions, but how
> they would fit into the qemu block layer. And I have no idea.
> 
>>
>>>
>>> Also... although we don't really use them in practice, some QCOW2
>>> features for pflash block backends are -- or would be -- nice. (Not the
>>> internal snapshots or the live RAM dumps, of course.)
>>
>> Regarding sparse files, can we read actual backend size data for the 
>> read-only
>> pflash memory as the following steps shown?
>>
>> 1) Create a sparse file -- QEMU_EFI-test.raw:
>>    dd if=/dev/zero of=QEMU_EFI-test.raw bs=1M seek=64 count=0
>>
>> 2) Read from QEMU_EFI.fd and write to QEMU_EFI-test.raw:
>>    dd of="QEMU_EFI-test.raw" if="QEMU_EFI.fd" conv=notrunc
>>
>> 3) Use QEMU_EFI-test.raw as the read-only pflash and start qemu with the 
>> below
>> patch applied:
>>
>> ---8>---
>>
>> diff --git a/hw/block/block.c b/hw/block/block.c
>> index bf56c76..ad18d5e 100644
>> --- a/hw/block/block.c
>> +++ b/hw/block/block.c
>> @@ -30,7 +30,7 @@
>>  bool blk_check_size_and_read_all(BlockBackend *blk, void *buf, hwaddr size,
>>                                   Error **errp)
>>  {
>> -    int64_t blk_len;
>> +    int64_t blk_len, actual_len;
>>      int ret;
>>
>>      blk_len = blk_getlength(blk);
>> @@ -53,7 +53,9 @@ bool blk_check_size_and_read_all(BlockBackend *blk, void 
>> *buf, hwaddr size,
>>       * block device and read only on demand.
>>       */
>>      assert(size <= BDRV_REQUEST_MAX_BYTES);
>> -    ret = blk_pread(blk, 0, buf, size);
>> +    actual_len = bdrv_get_allocated_file_size(blk_bs(blk));
>> +    ret = blk_pread(blk, 0, buf,
>> +            (!blk_is_read_only(blk) || actual_len < 0) ? size : actual_len);
>>      if (ret < 0) {
>>          error_setg_errno(errp, -ret, "can't read block backend");
>>          return false;
>>
> 
> This hunk looks dubious for a general helper function. It seems to
> assume that the holes are punched at the end of the file.
> 
> Sorry, this discussion is making me uncomfortable. I don't want to
> ignore your questions, but I have no idea about the right answers. This
> really needs the attention of the block people.


I feel deeply sorry for making you uncomfortable. It's generous of you to give 
me so much
of your time.

-- 

Thanks,
Xiang

Re: [Qemu-devel] [RFC PATCH] hw/arm/virt: use variable size of flash device to save memory

Reply via email to