abh1sar commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3507774229


##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +122,104 @@ backup_running_vm() {
   mount_operation
   mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit 
1; }
 
+  # Determine effective mode for this run.
+  # Legacy callers (no -M argument) get the original full-only behavior with 
no checkpoint.
+  # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required 
args before
+  # invoking the script; the case below is a defensive fallback for direct 
invocations.
+  local effective_mode="${MODE:-legacy-full}"
+  local make_checkpoint=0
+  case "$effective_mode" in
+    incremental|full)
+      make_checkpoint=1
+      ;;
+    legacy-full)
+      make_checkpoint=0
+      ;;
+    *)
+      echo "Unknown mode: $effective_mode"
+      cleanup
+      exit 1
+      ;;
+  esac
+
+  # When incremental, make sure the parent checkpoint is registered with 
libvirt. CloudStack
+  # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory 
checkpoint
+  # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it 
on start). A
+  # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already 
exists" — so the
+  # parent must be re-registered with --redefine. libvirt only needs the 
checkpoint name and a
+  # creationTime for a redefine (the value need not be accurate — checkpoints 
are ephemeral),
+  # so we synthesize a minimal XML on the fly instead of persisting the full 
checkpoint dump.
+  #
+  # First verify the parent bitmap actually exists on the running qcow2 — it 
can be absent after
+  # a migration even though the orchestrator's active_checkpoint says it 
should be there. If it
+  # is gone, fall back to a full backup rather than letting backup-begin fail 
below.
+  if [[ "$effective_mode" == "incremental" ]]; then
+    # The parent bitmap must be present on EVERY disk's qcow2, not just one of 
them. A volume
+    # snapshot restore (or a partial migration) can wipe the bitmap on some 
disks while leaving
+    # it on others; a plain "is the name anywhere in query-block" check passes 
in that case and
+    # backup-begin then fails on the disk that is missing the bitmap. Require 
the bitmap on all
+    # disks: compare the disk count to the number of disks reporting the 
bitmap (tests 17/19).
+    disk_count=$(virsh -c qemu:///system domblklist "$VM" --details 
2>/dev/null | awk '$2=="disk"{c++} END{print c+0}')
+    bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM" 
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l 
| tr -d ' ')

Review Comment:
   ```suggestion
       bitmap_count=$(virsh -c qemu:///system qemu-monitor-command "$VM" 
'{"execute":"query-block"}' 2>/dev/null | grep -o "\"$BITMAP_PARENT\"" | wc -l 
| tr -d ' ') || true
   ```
   the bitmap-presence check breaks under set -e.                               
                                                                                
                                   
   
   When the parent bitmap is not present on any disk (volume snapshot restore), 
grep -o matches nothing and exits 1. With pipefail, that makes the whole 
pipeline — and thus this assignment — exit 1, and set -e kills the script right 
here, before the fallback-to-full check on line 164 ever runs. And backup fails



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to