abh1sar commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3507774229
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +122,104 @@ backup_running_vm() {
mount_operation
mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit
1; }
+ # Determine effective mode for this run.
+ # Legacy callers (no -M argument) get the original full-only behavior with
no checkpoint.
+ # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required
args before
+ # invoking the script; the case below is a defensive fallback for direct
invocations.
+ local effective_mode="${MODE:-legacy-full}"
+ local make_checkpoint=0
+ case "$effective_mode" in
+ incremental|full)
+ make_checkpoint=1
+ ;;
+ legacy-full)
+ make_checkpoint=0
+ ;;
+ *)
+ echo "Unknown mode: $effective_mode"
+ cleanup
+ exit 1
+ ;;
+ esac
+
+ # When incremental, make sure the parent checkpoint is registered with
libvirt. CloudStack
+ # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory
checkpoint
+ # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it
on start). A
+ # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already
exists" — so the
+ # parent must be re-registered with --redefine. libvirt only needs the
checkpoint name and a
+ # creationTime for a redefine (the value need not be accurate — checkpoints
are ephemeral),
+ # so we synthesize a minimal XML on the fly instead of persisting the full
checkpoint dump.
+ #
+ # First verify the parent bitmap actually exists on the running qcow2 — it
can be absent after
+ # a migration even though the orchestrator's active_checkpoint says it
should be there. If it
+ # is gone, fall back to a full backup rather than letting backup-begin fail
below.
+ if [[ "$effective_mode" == "incremental" ]]; then
+ # The parent bitmap must be present on EVERY disk's qcow2, not just one of
them. A volume
+ # snapshot restore (or a partial migration) can wipe the bitmap on some
disks while leaving
+ # it on others; a plain "is the name anywhere in query-block" check passes
in that case and
+ # backup-begin then fails on the disk that is missing the bitmap. Require
the bitmap on all
+ # disks: compare the disk count to the number of disks reporting the
bitmap (tests 17/19).
+ disk_count=$(virsh -c qemu:///system domblklist "$VM" --details
2>/dev/null | awk '$2=="disk"{c++} END{print c+0}')
+ bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM"
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l
| tr -d ' ')
Review Comment:
```suggestion
bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM"
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l
| tr -d ' ') || true
```
the bitmap-presence check breaks under set -e.
When the parent bitmap is not present on any disk (volume snapshot restore),
grep -o matches nothing and exits 1. With pipefail, that makes the whole
pipeline — and thus this assignment — exit 1, and set -e kills the script right
here, before the fallback-to-full check on line 164 ever runs. And backup fails
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]