Sorry for the delay in replying - holiday then distractions. On Thursday, 2020-11-19 at 11:45:11 GMT, Alex Bennée wrote:
> Philippe Mathieu-Daudé <phi...@redhat.com> writes: > >> On 11/16/20 2:48 PM, Markus Armbruster wrote: >>> Philippe Mathieu-Daudé <phi...@redhat.com> writes: >>> >>>> Hi David, >>>> >>>> On 11/16/20 11:42 AM, David Edmondson wrote: >>>>> Currently ARM UEFI images are typically built as 2MB/768kB flash >>>>> images for code and variables respectively. These images are both then >>>>> padded out to 64MB before being loaded by QEMU. >>>>> >>>>> Because the images are 64MB each, QEMU allocates 128MB of memory to >>>>> read them, and then proceeds to read all 128MB from disk (dirtying the >>>>> memory). Of this 128MB less than 3MB is useful - the rest is zero >>>>> padding. >>>> >>>> 2 years ago I commented the same problem, and suggested to access the >>>> underlying storage by "block", as this is a "block storage". >>>> >>>> Back then the response was "do not try to fix something that works". >>>> This is why we choose the big hammer "do not accept image size not >>>> matching device size" way. >>>> >>>> While your series seems to help, it only postpone the same >>>> implementation problem. If what you want is use the least memory >>>> required, I still believe accessing the device by block is the >>>> best approach. >>> >>> "Do not try to fix something that works" is hard to disagree with. >>> However, at least some users seem to disagree with "this works". Enough >>> to overcome the resistance to change? >> >> Yeah, at least 3 big users so far: >> >> - Huawei >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg607292.html >> - Tencent >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg742066.html >> - Oracle >> (this series). >> >> Then Huawei tried the MicroVM approach: >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg680103.html >> >> I simply wanted to save David time by remembering this other approach, >> since Peter already commented on David's one (see Huawei link). > > IIRC the two questions that came up were: > > - what would reading memory not covered by a file look like (0's or > something more like real HW, 7f?). > > - what happens when the guest writes beyond the bounds of a backing > file? In the non-backward-compatible-case (pre virt-5.2 in the patches, but obviously not actually in 5.2, where the memory range is declared to the guest as the extent of the file) neither of these issues should arise. Reading/writing outside of the declared range should generate a fault, much like any other non-existent region. In the backward-compatible case the patches are obviously flawed - a write outside of the extent of the backing file would have nowhere to go. In order to figure out how to move forward, I think that the summary of the previous conversations is that either: - it's not possible to reduce the size of the memory region declared to the guest from 64M+64M because of $reasons, or - we don't want to reduce the size of the memory region declared to the guest from 64M+64M because of $reasons. If that's right, I'd be curious to better understand $reasons, but have no basis on which to argue. Presuming that it's not acceptable to reduce the declared extent of the flash regions, I will probably look to implement Philippe's suggestion: > If what you want is use the least memory required, I still believe > accessing the device by block is the best approach. ...which I interpret to mean that the pflash drivers should not allocate memory and read the images (later writing back modified blocks), but handle each IO request from the caller by reading and writing blocks from the underlying device as necessary. If this interpretation is wrong, please let me know :-) Looking at doing that, it seems that a new variant of memory_region_init_rom_device() that does not allocate memory will be required, unless there anything already available that performs a similar function? Booting a VM with AAVMF and writing some variables to the flash doesn't exercise much of the pflash functionality. In particular it would seem like the right time to fix the batch write omission in the current code. Is there another consumer of the pflash drivers that is likely to exercise more? (I can write tests that run in the guest, of course, presuming that the Linux driver can be pushed into using those paths.) > I'm guessing for these cloudy type applications no one cares about > persistence of EFI variables? Maybe we just need a formulation for the > second pflash which is explicit about writes being ephemeral while also > being accepted? Our current deployments on x86 do not have writable flash for VARS, though I believe that was problematic on arm64 until a recent change. It's my suspicion that assuming we can continue to present read-only VARS generally is going to be proven wrong before too long. dme. -- I'm not living in the real world, no more, no more.