On 01/12/16 18:50, Kevin Wolf wrote:
> Am 12.01.2016 um 18:40 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kw...@redhat.com> writes:
>>
>>> Am 12.01.2016 um 17:35 hat Denis V. Lunev geschrieben:
>>>> On 01/12/2016 06:47 PM, Denis V. Lunev wrote:
>>>>> On 01/12/2016 06:20 PM, Kevin Wolf wrote:
>>>>>> Am 12.01.2016 um 15:59 hat Paolo Bonzini geschrieben:
>>>>>>>
>>>>>>> On 12/01/2016 15:16, Kevin Wolf wrote:
>>>>>>>>> Thus we should avoid selection of "pflash" drives for VM
>>>>>>>>> state saving.
>>>>>>>>>
>>>>>>>>> For now "pflash" is a read-write raw image, as configured by
>>>>>>>>> libvirt. Thus there are no such images in the field and we can
>>>>>>>>> safely disable the ability to save state to those images inside
>>>>>>>>> QEMU.
>>>>>>>> This is obviously broken. If you write to the pflash, then it needs
>>>>>>>> to be snapshotted in order to keep a consistent state.
>>>>>>>>
>>>>>>>> If you want to avoid snapshotting the image, make it read-only and
>>>>>>>> it will be skipped even today.
>>>>>>> Sort of. The point of having flash is to _not_ make it read-only, so
>>>>>>> that is not a solution.
>>>>>>>
>>>>>>> Flash is already being snapshotted as part of saving RAM state. In
>>>>>>> fact, for this reason the device (at least the one used with OVMF; I
>>>>>>> haven't checked other pflash devices) can simply save it back to disk
>>>>>>> on the migration destination, without the need to use "migrate -b" or
>>>>>>> shared storage.
>>>>>>> [...]
>>>>>>> I don't like very much using IF_PFLASH this way, which is why I hadn't
>>>>>>> replied to the patch so far---I hadn't made up my mind about *what* to
>>>>>>> suggest instead, or whether to just accept it. However, it does work.
>>>>>>>
>>>>>>> Perhaps a separate "I know what I am doing" skip-snapshot option? Or
>>>>>>> a device callback saying "not snapshotting this is fine"?
>>>>>> Boy, is this ugly...
>>>>>>
>>>>>> What do you do with disk-only snapshots?
>>>>>> The recovery only works as long as you have VM state.
>>>>>>
>>>>>> Kevin
>>>>> actually I am in a bit of trouble :(
>>>>>
>>>>> I understand that this is ugly, but I would like to make 'virsh
>>>>> snapshot' work for OVMF VMs. This is necessary for us to make
>>>>> a release.
>>>>>
>>>>> Currently the libvirt guys generate XML in the following way:
>>>>>
>>>>>   <os>
>>>>>     <type arch='x86_64' machine='pc-i440fx-2.3'>hvm</type>
>>>>>     <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE_new.fd</loader>
>>>>>     <nvram>/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd</nvram>
>>>>>   </os>
>>>>>
>>>>> This results in:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd,if=pflash,format=raw,unit=1
>>>>>
>>>>> This obviously cannot pass the check in bdrv_all_can_snapshot(),
>>>>> as 'pflash' is RW and raw, i.e. it cannot be snapshotted.
>>>>>
>>>>> They have discussed switching to the following command line:
>>>>>
>>>>> qemu -drive file=/usr/share/OVMF/OVMF_CODE_new.fd,if=pflash,format=raw,unit=0,readonly=on \
>>>>>      -drive file=/var/lib/libvirt/qemu/nvram/f20efi_VARS.fd.qcow2,if=pflash,format=qcow2,unit=1
>>>>>
>>>>> and say that in this case the VM state could end up in the pflash
>>>>> drive, which should stay small, as the location of the file is
>>>>> different. This means that I am doomed here.
>>>>>
>>>>> Either I should force the libvirt people to give up their opinion
>>>>> that pflash should be small, which I am unable to do, or I should
>>>>> invent a way to ban VM state saving into pflash.
>>>>>
>>>>> OK. There are 2 options:
>>>>>
>>>>> 1) Ban pflash, as it was done.
>>>>> 2) Add a 'no-vmstate' flag to -drive (invented just now).
>>>>>
>>>> something like this:
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 3e1877d..8900589 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -881,6 +881,11 @@ static QemuOptsList bdrv_runtime_opts = {
>>>>              .help = "Block driver to use for the node",
>>>>          },
>>>>          {
>>>> +            .name = "novmstate",
>>>> +            .type = QEMU_OPT_BOOL,
>>>> +            .help = "Ignore for selecting to save VM state",
>>>> +        },
>>>> +        {
>>>>              .name = BDRV_OPT_CACHE_WB,
>>>>              .type = QEMU_OPT_BOOL,
>>>>              .help = "Enable writeback mode",
>>>> @@ -957,6 +962,7 @@ static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
>>>>      bs->request_alignment = 512;
>>>>      bs->zero_beyond_eof = true;
>>>>      bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
>>>> +    bs->disable_vmstate_save = qemu_opt_get_bool(opts, "novmstate", false);
>>>>
>>>>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv, bs->read_only)) {
>>>>          error_setg(errp,
>>>> diff --git a/block/snapshot.c b/block/snapshot.c
>>>> index 2d86b88..33cdd86 100644
>>>> --- a/block/snapshot.c
>>>> +++ b/block/snapshot.c
>>>> @@ -483,6 +483,10 @@ BlockDriverState *bdrv_all_find_vmstate_bs(void)
>>>>      while (not_found && (bs = bdrv_next(bs))) {
>>>>          AioContext *ctx = bdrv_get_aio_context(bs);
>>>>
>>>> +        if (bs->disable_vmstate_save) {
>>>> +            continue;
>>>> +        }
>>>> +
>>>>          aio_context_acquire(ctx);
>>>>          not_found = !bdrv_can_snapshot(bs);
>>>>          aio_context_release(ctx);
>>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>>> index 256609d..855a209 100644
>>>> --- a/include/block/block_int.h
>>>> +++ b/include/block/block_int.h
>>>> @@ -438,6 +438,9 @@ struct BlockDriverState {
>>>>      /* do we need to tell the quest if we have a volatile write cache? */
>>>>      int enable_write_cache;
>>>>
>>>> +    /* skip this BDS searching for one to save VM state */
>>>> +    bool disable_vmstate_save;
>>>> +
>>>>      /* the following member gives a name to every node on the bs graph. */
>>>>      char node_name[32];
>>>>      /* element of the list of named nodes building the graph */
>>>
>>> That sounds like an option. (No pun intended.)
>>>
>>> We can discuss the option name (perhaps "vmstate" defaulting to "on" is
>>> better?) and variable names (I'd prefer them to match the option name);
>>> also you'll need to extend the QAPI schema for blockdev-add. But all of
>>> these are minor points and the idea seems sane.
>>
>> I've always thought that QEMU picking the image to take the VM state is
>> backwards. Adding means to guide that pick like "don't pick this one,
>> please" may help ease the pain, but it's still backwards.
>>
>> The *user* should pick it.
>
> Designing the API now when it has been in use for ten years is
> backwards, too. We have to take it as is and make the best of it.
>
> We could add an optional argument to savevm that tells which image to
> save the VM state to. But if it's missing, we still need to make a pick.
> Of course, libvirt should then always use that option and then we don't
> need a separate vmstate=[on|off] option.
Sounds great!

> If we go that way, we need to improve loadvm to get VM state from any of
> the images of a VM, because the user could have saved the state to any.
> (Making that improvement is probably a good idea anyway.)

What if there are several images with vmstate in them? (OTOH that doesn't
seem to be a well-defined case even now, so nothing would amount to a
regression.)

Thanks
Laszlo

>
> Kevin
>