Re: [edk2-devel] A problem with live migration of UEFI virtual machines

Andrew Fish via Tue, 25 Feb 2020 13:35:47 -0800

> On Feb 25, 2020, at 12:40 PM, Laszlo Ersek <ler...@redhat.com> wrote:
> 
> Hi Andrew,
> 
> On 02/25/20 19:56, Andrew Fish wrote:
>> Laszlo,
>> 
>> If I understand this correctly, is it not more complicated than just size? It 
>> also assumes the memory layout is the same?
> 
> Yes.
> 
>> The legacy BIOS used fixed magic address ranges, but UEFI uses dynamically 
>> allocated memory so addresses are not fixed. While the UEFI firmware does 
>> try to keep S3 and S4 layouts consistent between boots, I'm not aware of any 
>> mechanism to keep the memory map address the same between versions of the 
>> firmware?
> 
> It's not about RAM, but platform MMIO.
> 

Laszlo,

It makes sense that changing the flash offsets breaks things.

I now realize this is like updating the EFI ROM without rebooting the system.
Thus changes in how the new EFI code works are not the issue.

Is this migration event visible to the firmware? Traditionally the NVRAM is a
region in the FD, so if you update the FD you have to skip the NVRAM region or
save and restore it. Is that activity happening in this case? Even if the ROM
layout does not change, how do you not lose the contents of the NVRAM store when
the live migration happens? Sorry if this is a remedial question, but I'm trying
to learn how this migration works.

Thanks,

Andrew Fish

> The core of the issue here is that the -D FD_SIZE_4MB and -D FD_SIZE_2MB
> build options (or more directly, the different FD_SIZE_IN_KB macro
> settings) set a bunch of flash-related build-time constant macros, and
> PCDs, differently, in the following files:
> 
> - OvmfPkg/OvmfPkg.fdf.inc
> - OvmfPkg/VarStore.fdf.inc
> - OvmfPkg/OvmfPkg*.dsc
> 
> As a result, the OVMF_CODE.fd firmware binary will have different
> hard-coded references to the variable store pflash addresses.
> (Guest-physical MMIO addresses that point into the pflash range.)
> 
> If someone tries to combine an OVMF_CODE.fd firmware binary from e.g.
> the 4MB build, with a variable store file that was originally
> instantiated from an OVMF_VARS.fd varstore template from the 2MB build,
> then the firmware binary's physical address references and various size
> references will not match the contents / layout of the varstore pflash
> chip, which maps an incompatibly structured varstore file.
> 
> For example, "OvmfPkg/VarStore.fdf.inc" describes two incompatible
> EFI_FIRMWARE_VOLUME_HEADER structures (which "build" generates for the
> OVMF_VARS.fd template) between the 4MB (total size) build, and the
> 1MB/2MB (total size) build.
> 
> The commit message below summarizes the internal layout differences,
> from 1MB/2MB -> 4MB:
> 
> https://github.com/tianocore/edk2/commit/b24fca05751f
> 
> Excerpt (relevant for OVMF_VARS.fd):
> 
>  Description                Compression type                Size [KB]
>  -------------------------  -----------------  ----------------------
>  Non-volatile data storage  open-coded binary    128 ->   528 ( +400)
>                               data
>    Variable store                                 56 ->   256 ( +200)
>    Event log                                       4 ->     4 (   +0)
>    Working block                                   4 ->     4 (   +0)
>    Spare area                                     64 ->   264 ( +200)
> 
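
The size table above can be sanity-checked with a short sketch (not part of the original thread). The region sizes are copied from the excerpt; the helper names are hypothetical, and the offsets are computed under the assumption that the four regions are laid out contiguously in the listed order, which the thread does not state.

```python
# Region sizes in KB, copied from the excerpt: (1MB/2MB build, 4MB build).
REGIONS_KB = [
    ("Variable store", 56, 256),
    ("Event log", 4, 4),
    ("Working block", 4, 4),
    ("Spare area", 64, 264),
]


def total_kb(build):
    """Sum the region sizes for one build (0 = 1MB/2MB, 1 = 4MB)."""
    return sum(sizes[build] for _name, *sizes in REGIONS_KB)


def offsets_kb(build):
    """Start offset of each region, ASSUMING a contiguous layout in listed order."""
    result, cursor = {}, 0
    for name, *sizes in REGIONS_KB:
        result[name] = cursor
        cursor += sizes[build]
    return result


# Show how each region's assumed start offset moves between the two builds.
for name, _old, _new in REGIONS_KB:
    print(f"{name:15s} {offsets_kb(0)[name]:4d} KB -> {offsets_kb(1)[name]:4d} KB")
```

The totals match the "Non-volatile data storage" row (128 KB and 528 KB). Under the contiguity assumption every region after the variable store starts at a different offset in the two builds, which is consistent with the point that a firmware binary from one build cannot use a varstore file from the other.
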
> Thanks
> Laszlo
> 
> 
>>> On Feb 25, 2020, at 9:53 AM, Laszlo Ersek <ler...@redhat.com> wrote:
>>> 
>>> On 02/24/20 16:28, Daniel P. Berrangé wrote:
>>>> On Tue, Feb 11, 2020 at 05:39:59PM +0000, Alex Bennée wrote:
>>>>> 
>>>>> wuchenye1995 <wuchenye1...@gmail.com> writes:
>>>>> 
>>>>>> Hi all,
>>>>>>  We found a problem with live migration of UEFI virtual machines
>>>>>>  caused by a change in the size of OVMF.fd.
>>>>>>  Specifically, OVMF.fd is 2MB in older edk2 versions such as
>>>>>>  edk-2.0-25, while it is 4MB in newer versions such as
>>>>>>  edk-2.0-30.
>>>>>>  When we migrate a UEFI virtual machine from a host with the older
>>>>>>  version of edk2 to a host with the newer one, QEMU
>>>>>>  reports an error in the function qemu_ram_resize while
>>>>>> checking the size of ovmf_pcbios: Length mismatch: pc.bios: 0x200000 in
>>>>>> != 0x400000: Invalid argument.
>>>>>>  We want to know how to solve this problem after updating the
>>>>>>  version of edk2.
>>>>> 
>>>>> You can only migrate a machine that is identical - so instantiating an
>>>>> empty machine with a different EDK image is bound to cause a problem
>>>>> because the machines don't match.
>>>> 
>>>> I don't believe we are that strict for firmware in general. The
>>>> firmware is loaded when QEMU starts, but that only matters for the
>>>> original source host QEMU. During live migration, the memory content
>>>> of the original firmware will be copied, overwriting whatever the
>>>> target QEMU loaded off disk. This works... provided the memory region
>>>> is the same size on source & target host, which is where the problem
>>>> arises in this case.
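
The size precondition described above can be illustrated with a minimal sketch (not part of the original thread). The function name `check_region_length` is hypothetical and this is not QEMU's code; it only mimics the shape of the error message quoted in the original report.

```python
def check_region_length(name: str, incoming_len: int, local_len: int) -> None:
    """Reject an incoming memory region whose length differs from the local one.

    Hypothetical helper: it models only the comparison, not QEMU's real logic.
    """
    if incoming_len != local_len:
        raise ValueError(
            f"Length mismatch: {name}: {incoming_len:#x} in != {local_len:#x}"
        )


# Source host loaded a 2MB firmware image, target host loaded a 4MB one.
try:
    check_region_length("pc.bios", 0x200000, 0x400000)
except ValueError as err:
    print(err)
```

With equal lengths the check passes silently, which corresponds to the case where the source firmware contents simply overwrite what the target loaded from disk.
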
>>>> 
>>>> If there's a risk that newer firmware will be larger than old firmware
>>>> there are really only two options:
>>>> 
>>>> - Keep all firmware images forever, each with a unique versioned
>>>>   filename. This ensures target QEMU will always load the original
>>>>   smaller firmware
>>>> 
>>>> - Add padding to the firmware images. IOW, if the firmware is 2 MB,
>>>>   add zero-padding to the end of the image to round it up to 4 MB
>>>>   (whatever you anticipate the largest size will be in future).
>>>> 
>>>> Distros have often taken the latter approach for QEMU firmware in the
>>>> past. The main issue is that you have to plan ahead of time and get
>>>> this padding right from the very start. You can't add the padding
>>>> after the fact on an existing VM.
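
The padding option described above can be sketched as follows (not part of the original thread). The function name `pad_image` and the sizes in the usage lines are illustrative assumptions; whether end-padding is appropriate for a given firmware image is not covered here.

```python
def pad_image(image: bytes, target_size: int) -> bytes:
    """Zero-pad a firmware image at the end so it is exactly target_size bytes."""
    if len(image) > target_size:
        raise ValueError(
            f"image is {len(image)} bytes, larger than target {target_size}"
        )
    return image + b"\x00" * (target_size - len(image))


MB = 1024 * 1024

# Stand-in for a 2 MB firmware image; a real image would be read from a file.
firmware = b"\xff" * (2 * MB)
padded = pad_image(firmware, 4 * MB)
print(len(padded) == 4 * MB)
```

As the quoted text notes, the target size has to be chosen before the first VM is started with the image; padding applied later changes the region size and reintroduces the mismatch.
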
>>> 
>>> Following up here *too*, just for completeness.
>>> 
>>> The query in this thread has been posted three times now (and I have
>>> zero idea why). Each time it generated a different set of responses. For
>>> completeness, I'm now going to link the other two threads here (because the
>>> present thread seems to have gotten the most feedback).
>>> 
>>> To the OP:
>>> 
>>> - please do *NOT* repost the same question once you get an answer. It
>>> only fragments the discussion and creates confusion. It also doesn't
>>> hurt if you *confirm* that you understood the answer.
>>> 
>>> - Yet further, if your email address has @gmail.com for domain, but your
>>> msgids contain "tencent", that raises some eyebrows (mine for sure).
>>> You say "we" in the query, but never identify the organization behind
>>> the plural pronoun.
>>> 
>>> (I've been fuming about the triple-posting of the question for a while
>>> now, but it's only now that, upon seeing how much work Dan has put into
>>> his answer, I've decided that dishing out a bit of netiquette would be
>>> in order.)
>>> 
>>> * First posting:
>>> - msgid:      <tencent_f1295f826e46edff3d778...@qq.com>
>>> - edk2-devel: https://edk2.groups.io/g/devel/message/54146
>>> - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02419.html
>>> 
>>> * my response:
>>>   - msgid:      <12553.1581366059422195...@groups.io>
>>>   - edk2-devel: https://edk2.groups.io/g/devel/message/54161
>>>   - qemu-devel: none, because (as an exception) I used the stupid
>>>                 groups.io web interface to respond, and so my response
>>>                 never reached qemu-devel
>>> 
>>> * Second posting (~4 hours after the first)
>>> - msgid:      <tencent_3cd8845ec159f01617258...@qq.com>
>>> - edk2-devel: https://edk2.groups.io/g/devel/message/54147
>>> - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02415.html
>>> 
>>> * Dave's response:
>>>   - msgid:      <20200220154742.GC2882@work-vm>
>>>   - edk2-devel: https://edk2.groups.io/g/devel/message/54681
>>>   - qemu-devel: https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg05632.html
>>> 
>>> * Third posting (next day, present thread) -- cross posted to yet
>>> another list (!), because apparently Dave's feedback and mine had not
>>> been enough:
>>> - msgid:        <tencent_bc7fd00363690990994e9...@qq.com>
>>> - edk2-devel:   https://edk2.groups.io/g/devel/message/54220
>>> - edk2-discuss: https://edk2.groups.io/g/discuss/message/135
>>> - qemu-devel:   https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg02735.html
>>> 
>>> Back on topic: see my response again. The answer is, you can't solve the
>>> problem (specifically with OVMF), and QEMU in fact does you a service by
>>> preventing the migration.
>>> 
>>> Laszlo
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> -=-=-=-=-=-=-=-=-=-=-=-
> Groups.io Links: You receive all messages sent to this group.
> 
> View/Reply Online (#54796): https://edk2.groups.io/g/devel/message/54796
> -=-=-=-=-=-=-=-=-=-=-=-
>