On 01/12/16 18:50, Kevin Wolf wrote:
> Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kw...@redhat.com> writes:
>>
>>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>>>>
>>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>>>> state saving.
>>>>>>>>>
>>>>>>>>> For now "pflash" is a read-write raw image, as configured by
>>>>>>>>> libvirt. Thus there are no such images in the field and we can
>>>>>>>>> safely disable the ability to save state to those images inside
>>>>>>>>> QEMU.
>>>>>>>> This is obviously broken. If you write to the pflash, then it needs
>>>>>>>> to be snapshotted in order to keep a consistent state.
>>>>>>>>
>>>>>>>> If you want to avoid snapshotting the image, make it read-only and
>>>>>>>> it will be skipped even today.
>>>>>>> Sort of. The point of having flash is to _not_ make it read-only, so
>>>>>>> that is not a solution.
>>>>>>>
>>>>>>> Flash is already being snapshotted as part of saving RAM state. In
>>>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>>>> shared storage.
>>>>>>> [...]
>>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>>>> suggest instead, or whether to just accept it. However, it does work.
>>>>>>>
>>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or
>>>>>>> a device callback saying "not snapshotting this is fine"?
>>>>>> Boy, is this ugly...
>>>>>>
>>>>>> What do you do with disk-only snapshots?
>>>>>> The recovery only works as long as you have VM state.
>>>>>>
>>>>>> Kevin
>>>>> actually I am in a bit of trouble :(
>>>>>
>>>>> I understand that this is ugly, but I would like to make 'virsh
>>>>> snapshot' work for OVMF VMs. This is necessary for us to make
>>>>> a release.
>>>>>
>>>>> Currently the libvirt guys generate XML in the following way:
>>>>>
>>>>>   <os>
>>>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>>>   </os>
>>>>>
>>>>> This results in:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>>>
>>>>> This obviously cannot pass the check in bdrv_all_can_snapshot(),
>>>>> as 'pflash' is RW and raw, i.e. it cannot be snapshotted.
>>>>>
>>>>> They have discussed switching to the following command line:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>>>
>>>>> and say that in this case the VM state could end up in the pflash
>>>>> drive, which should stay small, as the location of the file is
>>>>> different. This means that I am doomed here.
>>>>>
>>>>> Either I should force the libvirt people to give up their opinion
>>>>> that pflash should be small, which I am unable to do, or I should
>>>>> invent a way to ban VM state saving into pflash.
>>>>>
>>>>> OK. There are 2 options:
>>>>>
>>>>> 1) Ban pflash, as it was done.
>>>>> 2) Add a 'no-vmstate' flag to -drive (invented just now).
>>>>>
>>>> something like this:
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 3e1877d..8900589 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>>>              .help = "Block driver to use for the node",
>>>>          },
>>>>          {
>>>> +            .name = "novmstate",
>>>> +            .type = QEMU_OPT_BOOL,
>>>> +            .help = "Ignore for selecting to save VM state",
>>>> +        },
>>>> +        {
>>>>              .name = BDRV_OPT_CACHE_WB,
>>>>              .type = QEMU_OPT_BOOL,
>>>>              .help = "Enable writeback mode",
>>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>>>>      bs->request_alignment = 512;
>>>>      bs->zero_beyond_eof = true;
>>>>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>>>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>>>
>>>>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>>>          error_setg(errp,
>>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>>> index 2d86b88..33cdd86 100644
>>>> --- a/block/snapshot.c
>>>> +++ b/block/snapshot.c
>>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>>>      while (not_found && (bs = bdrv_next(bs))) {
>>>>          AioContext *ctx = bdrv_get_aio_context(bs);
>>>>
>>>> +        if (bs->disable_vmstate_save) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>          aio_context_acquire(ctx);
>>>>          not_found = !bdrv_can_snapshot(bs);
>>>>          aio_context_release(ctx);
>>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>>> index 256609d..855a209 100644
>>>> --- a/include/block/block_int.h
>>>> +++ b/include/block/block_int.h
>>>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>>>      /* do we need to tell the quest if we have a volatile write cache? */
>>>>      int enable_write_cache;
>>>>
>>>> +    /* skip this BDS searching for one to save VM state */
>>>> +    bool disable_vmstate_save;
>>>> +
>>>>      /* the following member gives a name to every node on the bs graph. */
>>>>      char node_name[32];
>>>>      /* element of the list of named nodes building the graph */
>>>
>>> That sounds like an option. (No pun intended.)
>>>
>>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>>> better?) and variable names (I'd prefer them to match the option name);
>>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>>> these are minor points and the idea seems sane.
>>
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards. Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>>
>> The *user* should pick it.
>
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.
>
> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.
Sounds great!

> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)

What if there are several images with vmstate in them? (OTOH that doesn't
seem to be a well-defined case even now, so nothing would amount to a
regression.)

Thanks
Laszlo

>
> Kevin
>