jmsperu commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3508213029
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +122,104 @@ backup_running_vm() {
mount_operation
mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit
1; }
+ # Determine effective mode for this run.
+ # Legacy callers (no -M argument) get the original full-only behavior with
no checkpoint.
+ # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required
args before
+ # invoking the script; the case below is a defensive fallback for direct
invocations.
+ local effective_mode="${MODE:-legacy-full}"
+ local make_checkpoint=0
+ case "$effective_mode" in
+ incremental|full)
+ make_checkpoint=1
+ ;;
+ legacy-full)
+ make_checkpoint=0
+ ;;
+ *)
+ echo "Unknown mode: $effective_mode"
+ cleanup
+ exit 1
+ ;;
+ esac
+
+ # When incremental, make sure the parent checkpoint is registered with
libvirt. CloudStack
+ # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory
checkpoint
+ # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it
on start). A
+ # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already
exists" — so the
+ # parent must be re-registered with --redefine. libvirt only needs the
checkpoint name and a
+ # creationTime for a redefine (the value need not be accurate — checkpoints
are ephemeral),
+ # so we synthesize a minimal XML on the fly instead of persisting the full
checkpoint dump.
+ #
+ # First verify the parent bitmap actually exists on the running qcow2 — it
can be absent after
+ # a migration even though the orchestrator's active_checkpoint says it
should be there. If it
+ # is gone, fall back to a full backup rather than letting backup-begin fail
below.
+ if [[ "$effective_mode" == "incremental" ]]; then
+ # The parent bitmap must be present on EVERY disk's qcow2, not just one of
them. A volume
+ # snapshot restore (or a partial migration) can wipe the bitmap on some
disks while leaving
+ # it on others; a plain "is the name anywhere in query-block" check passes
in that case and
+ # backup-begin then fails on the disk that is missing the bitmap. Require
the bitmap on all
+ # disks: compare the disk count to the number of disks reporting the
bitmap (tests 17/19).
+ disk_count=$(virsh -c qemu:///system domblklist "$VM" --details
2>/dev/null | awk '$2=="disk"{c++} END{print c+0}')
+ bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM"
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l
| tr -d ' ')
Review Comment:
Good catch, fixed in ae2a6b2. The probe no longer aborts on a no-match: I
replaced the `grep | wc` count with a per-device parse that returns 0 when
nothing matches and ends in `|| echo 0`, so a snapshot restore that clears the
bitmap on all disks now falls through to the full-backup fallback instead of
exiting under `set -eo pipefail`. Verified tests 17 and 18.
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +122,104 @@ backup_running_vm() {
mount_operation
mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit
1; }
+ # Determine effective mode for this run.
+ # Legacy callers (no -M argument) get the original full-only behavior with
no checkpoint.
+ # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required
args before
+ # invoking the script; the case below is a defensive fallback for direct
invocations.
+ local effective_mode="${MODE:-legacy-full}"
+ local make_checkpoint=0
+ case "$effective_mode" in
+ incremental|full)
+ make_checkpoint=1
+ ;;
+ legacy-full)
+ make_checkpoint=0
+ ;;
+ *)
+ echo "Unknown mode: $effective_mode"
+ cleanup
+ exit 1
+ ;;
+ esac
+
+ # When incremental, make sure the parent checkpoint is registered with
libvirt. CloudStack
+ # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory
checkpoint
+ # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it
on start). A
+ # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already
exists" — so the
+ # parent must be re-registered with --redefine. libvirt only needs the
checkpoint name and a
+ # creationTime for a redefine (the value need not be accurate — checkpoints
are ephemeral),
+ # so we synthesize a minimal XML on the fly instead of persisting the full
checkpoint dump.
+ #
+ # First verify the parent bitmap actually exists on the running qcow2 — it
can be absent after
+ # a migration even though the orchestrator's active_checkpoint says it
should be there. If it
+ # is gone, fall back to a full backup rather than letting backup-begin fail
below.
+ if [[ "$effective_mode" == "incremental" ]]; then
+ # The parent bitmap must be present on EVERY disk's qcow2, not just one of
them. A volume
+ # snapshot restore (or a partial migration) can wipe the bitmap on some
disks while leaving
+ # it on others; a plain "is the name anywhere in query-block" check passes
in that case and
+ # backup-begin then fails on the disk that is missing the bitmap. Require
the bitmap on all
+ # disks: compare the disk count to the number of disks reporting the
bitmap (tests 17/19).
+ disk_count=$(virsh -c qemu:///system domblklist "$VM" --details
2>/dev/null | awk '$2=="disk"{c++} END{print c+0}')
+ bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM"
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l
| tr -d ' ')
Review Comment:
Right, `query-block` lists the bitmap under more than one node per disk, so
`grep -o name | wc -l` double-counted and the 1-of-2 case read as 2. ae2a6b2
now counts one per distinct `inserted.file` whose `dirty-bitmaps` holds the
parent, so that case returns 1 < 2 disks and correctly falls back to full.
Verified test 19.
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +122,104 @@ backup_running_vm() {
mount_operation
mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit
1; }
+ # Determine effective mode for this run.
+ # Legacy callers (no -M argument) get the original full-only behavior with
no checkpoint.
+ # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required
args before
+ # invoking the script; the case below is a defensive fallback for direct
invocations.
+ local effective_mode="${MODE:-legacy-full}"
+ local make_checkpoint=0
+ case "$effective_mode" in
+ incremental|full)
+ make_checkpoint=1
+ ;;
+ legacy-full)
+ make_checkpoint=0
+ ;;
+ *)
+ echo "Unknown mode: $effective_mode"
+ cleanup
+ exit 1
+ ;;
+ esac
+
+ # When incremental, make sure the parent checkpoint is registered with
libvirt. CloudStack
+ # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory
checkpoint
+ # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it
on start). A
+ # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already
exists" — so the
+ # parent must be re-registered with --redefine. libvirt only needs the
checkpoint name and a
+ # creationTime for a redefine (the value need not be accurate — checkpoints
are ephemeral),
+ # so we synthesize a minimal XML on the fly instead of persisting the full
checkpoint dump.
+ #
+ # First verify the parent bitmap actually exists on the running qcow2 — it
can be absent after
+ # a migration even though the orchestrator's active_checkpoint says it
should be there. If it
+ # is gone, fall back to a full backup rather than letting backup-begin fail
below.
+ if [[ "$effective_mode" == "incremental" ]]; then
+ # The parent bitmap must be present on EVERY disk's qcow2, not just one of
them. A volume
+ # snapshot restore (or a partial migration) can wipe the bitmap on some
disks while leaving
+ # it on others; a plain "is the name anywhere in query-block" check passes
in that case and
+ # backup-begin then fails on the disk that is missing the bitmap. Require
the bitmap on all
+ # disks: compare the disk count to the number of disks reporting the
bitmap (tests 17/19).
+ disk_count=$(virsh -c qemu:///system domblklist "$VM" --details
2>/dev/null | awk '$2=="disk"{c++} END{print c+0}')
+ bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM"
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l
| tr -d ' ')
Review Comment:
Done. The in-script probe now mirrors `getVmDiskPathHasFromCheckpointMap()`:
it parses `query-block` per device (`inserted.file` + `inserted.dirty-bitmaps`)
instead of counting raw name matches, so the push-mode path uses the same logic
as the pull-mode wrapper. ae2a6b2.
##########
server/src/main/java/org/apache/cloudstack/backup/BackupManagerImpl.java:
##########
@@ -1701,6 +1701,14 @@ private boolean deleteCheckedBackup(Boolean forced,
BackupProvider backupProvide
reservationDao, resourceLimitMgr)) {
boolean result = backupProvider.deleteBackup(backup, forced);
if (result) {
+ // Chain-aware providers (e.g. NAS) physically remove several
backups per call
+ // (leaf + swept delete-pending ancestors) and decrement
resource count/usage and
+ // remove each DB row themselves, exactly once per removed
backup. Decrementing or
+ // removing again here would double-handle and destroy
delete-pending tombstones,
+ // so defer entirely to the provider for those.
+ if (backupProvider.handlesChainDeleteResourceAccounting()) {
+ return true;
Review Comment:
Addressed in ae2a6b2, see the per-device count and set-e-safe fallback
described in the replies above.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]