Re: [pve-devel] [RFC qemu] savevm-async: improve check for blockers

2024-05-17 Thread Thomas Lamprecht
subject might be improved by being less general/ambiguous, something like:

savevm-async: improve coverage by also checking for migration blockers

or 

savevm-async: block snapshot also if migration would fail

or

savevm-async: reuse migration blocker check for snapshots

Would have helped me to have a better initial context for reading this commit
(message).

Am 17/05/2024 um 13:39 schrieb Fiona Ebner:
> Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
> savevm: consult migration blockers"), migration and (async) snapshot
> are essentially the same operation and thus snapshot also needs to
> check for migration blockers. For example, this catches passed-through
> PCI devices, where the driver does not support migration.
> 
> However, the commit notes:
> 
>> There is really no difference between live migration and savevm, except
>> that savevm does not require bdrv_invalidate_cache to be implemented
>> by all disks.  However, it is unlikely that savevm is used with anything
>> except qcow2 disks, so the penalty is small and worth the improvement
>> in catching bad usage of savevm.
> 
> and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
> and would be broken by simply using migration_is_blocked(). To keep
> this working, introduce a new helper that filters blockers with the
> prefix used by the VMDK migration blocker.
> 
> The function qemu_savevm_state_blocked() is called as part of
> migration_is_blocked_allow_prefix() so no check is lost with this
> patch.
> 
> Signed-off-by: Fiona Ebner 
> ---
> 
> An alternative would be to mark the VMDK blocker as a
> "live-migration-only" blocker and extending migration_is_blocked() or
> using separate helpers to check for migration and snapshot blockers
> differently. But that requires touching more machinery and probably
> needs more adaptation going forward than the approach here.
> 
>  migration/migration.c| 22 ++
>  migration/migration.h|  1 +
>  migration/savevm-async.c |  7 ++-
>  3 files changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index b8d7e471a4..6235309a00 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1897,6 +1897,28 @@ void qmp_migrate_pause(Error **errp)
> "during postcopy-active or postcopy-recover state");
>  }
>  
> +/*
> + * HACK to allow hibernation in Proxmox VE even when VMDK image is present.
> + */
> +bool migration_is_blocked_allow_prefix(Error **errp, const char *prefix)
> +{
> +GSList *blockers = migration_blockers[migrate_mode()];
> +
> +if (qemu_savevm_state_blocked(errp)) {
> +return true;
> +}
> +
> +while (blockers) {
> +if (!g_str_has_prefix(error_get_pretty(blockers->data), prefix)) {
> +error_propagate(errp, error_copy(blockers->data));
> +return true;
> +}
> +blockers = g_slist_next(blockers);
> +}
> +
> +return false;
> +}
> +
>  bool migration_is_blocked(Error **errp)
>  {
>  GSList *blockers = migration_blockers[migrate_mode()];
> diff --git a/migration/migration.h b/migration/migration.h
> index 8045e39c26..575805a8e2 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -484,6 +484,7 @@ int migration_call_notifiers(MigrationState *s, 
> MigrationEventType type,
>   Error **errp);
>  
>  int migrate_init(MigrationState *s, Error **errp);
> +bool migration_is_blocked_allow_prefix(Error **errp, const char *prefix);
>  bool migration_is_blocked(Error **errp);
>  /* True if outgoing migration has entered postcopy phase */
>  bool migration_in_postcopy(void);
> diff --git a/migration/savevm-async.c b/migration/savevm-async.c
> index bf36fc06d2..33085446e1 100644
> --- a/migration/savevm-async.c
> +++ b/migration/savevm-async.c
> @@ -363,7 +363,12 @@ void qmp_savevm_start(const char *statefile, Error 
> **errp)
>  return;
>  }
>  
> -if (qemu_savevm_state_blocked(errp)) {
> +/*
> + * The limitation for VMDK images only applies to live-migration, not
> + * snapshots, see commit 5aaac46793 ("migration: savevm: consult 
> migration
> + * blockers").
> + */
> +if (migration_is_blocked_allow_prefix(errp, "The vmdk format used by 
> node")) {

meh, not a big fan of matching strings here, especially as that is not
stable ABI, I mean I did not check, but I would be surprised if that's
the case – maybe we could factor out that string here and when its added
as blocker into a common constant so that we'd notice if it changes.

And if we only uses this here then why add a generic "ignore one specific
blocker" helper, might be better to at least contain that detail in a
"qemu_savevm_async_state_blocked" one that takes only the `errp` as
parameter, as hacks should IMO always be quite specific to avoid the
spread of them (I know you would check in detail before doing so, but
not everybody does).

>  

Re: [pve-devel] [RFC qemu] savevm-async: improve check for blockers

2024-05-17 Thread Fiona Ebner
Am 17.05.24 um 13:39 schrieb Fiona Ebner:
> Same rationale as with upstream QEMU commit 5aaac46793 ("migration:
> savevm: consult migration blockers"), migration and (async) snapshot
> are essentially the same operation and thus snapshot also needs to
> check for migration blockers. For example, this catches passed-through
> PCI devices, where the driver does not support migration.
> 
> However, the commit notes:
> 
>> There is really no difference between live migration and savevm, except
>> that savevm does not require bdrv_invalidate_cache to be implemented
>> by all disks.  However, it is unlikely that savevm is used with anything
>> except qcow2 disks, so the penalty is small and worth the improvement
>> in catching bad usage of savevm.
> 
> and for Proxmox VE, suspend-to-disk with VMDK does use savevm-async
> and would be broken by simply using migration_is_blocked(). To keep
> this working, introduce a new helper that filters blockers with the
> prefix used by the VMDK migration blocker.
> 
> The function qemu_savevm_state_blocked() is called as part of
> migration_is_blocked_allow_prefix() so no check is lost with this
> patch.
> 
> Signed-off-by: Fiona Ebner 
> ---
> 
> An alternative would be to mark the VMDK blocker as a
> "live-migration-only" blocker and extending migration_is_blocked() or
> using separate helpers to check for migration and snapshot blockers
> differently. But that requires touching more machinery and probably
> needs more adaptation going forward than the approach here.
> 

So, this also "breaks" and at the same time fixes snapshot and hibernate
with VNC clipboard by preventing it. Currently, we do not have any
checks in the snapshot and hibernate API calls for the VNC clipboard and
they "work", but the clipboard will be broken after restore (and I mean
broken, not just the contents lost). So some users might consider adding
such checks (and this patch) a breaking change even if it's technically
correct to prevent snapshot and hibernate with VNC clipboard. And other
users might (rightly) complain about broken clipboard.


___
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel