abh1sar commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3450216741


##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -324,6 +553,36 @@ while [[ $# -gt 0 ]]; do
       shift
       shift
       ;;
+    -M|--mode)
+      MODE="$2"
+      shift
+      shift
+      ;;
+    --bitmap-new)
+      BITMAP_NEW="$2"
+      shift
+      shift
+      ;;
+    --bitmap-parent)
+      BITMAP_PARENT="$2"
+      shift
+      shift
+      ;;
+    --parent-paths)
+      PARENT_PATHS="$2"
+      shift
+      shift
+      ;;
+    --rebase-target)
+      REBASE_TARGET="$2"
+      shift
+      shift
+      ;;
+    --rebase-new-backing)
+      REBASE_NEW_BACKING="$2"
+      shift
+      shift
+      ;;

Review Comment:
   remove these as well



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -335,8 +594,13 @@ while [[ $# -gt 0 ]]; do
   esac
 done
 
-# Perform Initial sanity checks
-sanity_checks
+# QEMU >= 4.2 and libvirt >= 7.2 are only required for backup-begin 
(incremental
+# checkpoints and per-bitmap exports). Legacy full-only backups, plus delete /
+# stats / rebase operations, run on older versions just fine. Gate the version
+# check to the paths that actually need it to preserve backward compatibility.
+if [ "$OP" = "backup" ] && [ -n "$MODE" ]; then
+  sanity_checks
+fi

Review Comment:
   This doesn't look right.
   `sanity_checks` was being called unconditionally before. It has nothing to 
do with incremental backups.
   This change should be removed. 



##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -168,6 +200,276 @@ protected Host getVMHypervisorHost(VirtualMachine vm) {
         return 
resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM,
 vm.getDataCenterId());
     }
 
+    /**
+     * Returned by {@link #decideChain(VirtualMachine)} to describe the next 
backup's place in
+     * the chain: full vs incremental, the bitmap name to create, and (for 
incrementals) the
+     * parent bitmap and parent file path.
+     */
+    static final class ChainDecision {
+        final String mode;            // "full" or "incremental"
+        final String bitmapNew;
+        final String bitmapParent;    // null for full
+        // Per-volume parent backup file paths, one per current VM volume in 
deviceId order.
+        // null/empty for full. Each volume needs its own parent file because 
backup files
+        // are named after each volume's own UUID (root.<uuid>.qcow2 / 
datadisk.<uuid>.qcow2).
+        final List<String> parentPaths;
+        final String chainId;         // chain identifier this backup belongs 
to
+        final int chainPosition;      // 0 for full, N for the Nth incremental 
in the chain
+
+        private ChainDecision(String mode, String bitmapNew, String 
bitmapParent, List<String> parentPaths,
+                              String chainId, int chainPosition) {
+            this.mode = mode;
+            this.bitmapNew = bitmapNew;
+            this.bitmapParent = bitmapParent;
+            this.parentPaths = parentPaths;
+            this.chainId = chainId;
+            this.chainPosition = chainPosition;
+        }
+
+        static ChainDecision fullStart(String bitmapName) {
+            return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName, 
null, null,
+                    UUID.randomUUID().toString(), 0);
+        }
+
+        static ChainDecision incremental(String bitmapNew, String 
bitmapParent, List<String> parentPaths,
+                                         String chainId, int chainPosition) {
+            return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL, 
bitmapNew, bitmapParent,
+                    parentPaths, chainId, chainPosition);
+        }
+
+        boolean isIncremental() {
+            return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode);
+        }
+    }
+
+    /**
+     * Decides whether the next backup for {@code vm} should be a fresh full 
or an incremental
+     * appended to the existing chain. Stopped VMs are always full (libvirt 
{@code backup-begin}
+     * requires a running QEMU process). The {@code nas.backup.full.every} 
ConfigKey controls
+     * how many backups (full + incrementals) form one chain before a new full 
is forced.
+     *
+     * <p>The decision is anchored on the VM's {@code 
nas.active_checkpoint_id} detail, which
+     * records the bitmap that currently exists on the running QEMU. After a 
restore that
+     * detail is cleared, so the next backup is automatically full — even 
though there may be
+     * a more recent "last backup taken" row in the database. The decision 
deliberately avoids
+     * relying on "last backup taken", because that row is misleading after a 
restore.</p>
+     */
+    protected ChainDecision decideChain(VirtualMachine vm) {
+        final String newBitmap = "backup-" + System.currentTimeMillis() / 
1000L;
+
+        // Master switch — when the operator disables incrementals at the zone 
level every
+        // backup is taken as a fresh full. Existing chains stay restorable 
because each
+        // backup's metadata is kept independently; restoring an incremental 
still walks its
+        // own chain (the per-backup chain_id / parent_backup_id details 
persist regardless
+        // of the live config). The next backup with this flag back on starts 
a new chain.
+        Boolean incrementalEnabled = 
NASBackupIncrementalEnabled.valueIn(vm.getDataCenterId());
+        if (incrementalEnabled == null || !incrementalEnabled) {
+            return ChainDecision.fullStart(newBitmap);

Review Comment:
   Also return mode as `legacy-full` as suggested in the other comment



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -113,20 +131,86 @@ backup_running_vm() {
   mount_operation
   mkdir -p "$dest" || { echo "Failed to create backup directory $dest"; exit 
1; }
 
+  # Determine effective mode for this run.
+  # Legacy callers (no -M argument) get the original full-only behavior with 
no checkpoint.
+  # The Java wrapper (LibvirtTakeBackupCommandWrapper) pre-validates required 
args before
+  # invoking the script; the case below is a defensive fallback for direct 
invocations.
+  local effective_mode="${MODE:-legacy-full}"
+  local make_checkpoint=0
+  case "$effective_mode" in
+    incremental|full)
+      make_checkpoint=1
+      ;;
+    legacy-full)
+      make_checkpoint=0
+      ;;
+    *)
+      echo "Unknown mode: $effective_mode"
+      cleanup
+      exit 1
+      ;;
+  esac
+
+  # When incremental, make sure the parent checkpoint is registered with 
libvirt. CloudStack
+  # rebuilds the domain XML on every VM start, which wipes libvirt's in-memory 
checkpoint
+  # registry, while the dirty bitmap persists on the qcow2 (QEMU re-loads it 
on start). A
+  # fresh checkpoint-create cannot be used then — QEMU reports "Bitmap already 
exists" — and
+  # qemu-img cannot drop the bitmap on a running disk (the image is 
write-locked). The parent
+  # must instead be re-registered with --redefine, using the FULL checkpoint 
XML this script
+  # saved alongside the parent backup when it was taken (a minimal/synthesized 
XML is rejected
+  # by libvirt's checkpoint schema on redefine, so the full dump is required).
+  if [[ "$effective_mode" == "incremental" ]]; then
+    if ! virsh -c qemu:///system checkpoint-list "$VM" --name 2>/dev/null | 
grep -qx "$BITMAP_PARENT"; then
+      parent_first="${PARENT_PATHS%%,*}"
+      parent_cp_xml="$(dirname "$parent_first")/$BITMAP_PARENT.checkpoint.xml"
+      if [[ -f "$parent_cp_xml" ]] && \
+         virsh -c qemu:///system checkpoint-create "$VM" --xmlfile 
"$parent_cp_xml" --redefine > /dev/null 2>&1; then
+        : # parent checkpoint re-registered; the incremental can proceed 
against it
+      else
+        # No saved checkpoint XML (e.g. a backup taken before this fix) or 
redefine failed.
+        # Signal the Java wrapper to retry as a full backup so the chain 
restarts cleanly
+        # instead of failing the backup. The wrapper is responsible for the 
retry and for
+        # recording incrementalFallback=true on the resulting BackupAnswer.
+        log -e "incremental: parent checkpoint $BITMAP_PARENT could not be 
re-registered — exiting $EXIT_INCREMENTAL_UNSUPPORTED for caller-driven 
fallback"
+        cleanup
+        exit $EXIT_INCREMENTAL_UNSUPPORTED
+      fi

Review Comment:
   Can't we just set $effective_mode="full" here instead of returning? I think 
that will make the code much simpler. 
   We can remove the below check as well since backup of stopped VM should not 
be sent using incremental mode anyway.
   ```
   backup_stopped_vm() {
     # Stopped VMs cannot use libvirt's backup-begin (no QEMU process). Take a 
full
     # backup via qemu-img convert. If the caller asked for incremental, signal 
the
     # Java wrapper to retry as full and record the fallback on the 
BackupAnswer.
     if [[ "$MODE" == "incremental" ]]; then
       log -e "incremental: VM stopped — exiting $EXIT_INCREMENTAL_UNSUPPORTED 
for caller-driven fallback to full"
       exit $EXIT_INCREMENTAL_UNSUPPORTED
     fi
   ```
   With these two changes we can remove EXIT_INCREMENTAL_UNSUPPORTED completely 
from the script and the wrapper 



##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -238,6 +413,51 @@ delete_backup() {
   rmdir $mount_point
 }
 
+# Rebase an existing backup qcow2 (e.g. a chain child) onto a new backing 
parent so the chain
+# stays valid after a middle backup is deleted. Both --target and 
--new-backing are passed as
+# paths relative to the NAS mount root; we resolve them under $mount_point and 
write the new
+# backing reference relative to the target file's directory (mount points are 
ephemeral).
+rebase_backup() {
+  mount_operation
+
+  if [[ -z "$REBASE_TARGET" || -z "$REBASE_NEW_BACKING" ]]; then
+    echo "rebase requires --rebase-target and --rebase-new-backing"
+    cleanup
+    exit 1
+  fi
+
+  local target_abs="$mount_point/$REBASE_TARGET"
+  local backing_abs="$mount_point/$REBASE_NEW_BACKING"
+  if [[ ! -f "$target_abs" ]]; then
+    echo "Rebase target file does not exist: $target_abs"
+    cleanup
+    exit 1
+  fi
+  if [[ ! -f "$backing_abs" ]]; then
+    echo "New backing file does not exist: $backing_abs"
+    cleanup
+    exit 1
+  fi
+  local target_dir
+  target_dir=$(dirname "$target_abs")
+  local backing_rel
+  backing_rel=$(realpath --relative-to="$target_dir" "$backing_abs")
+
+  # SAFE rebase (no -u): qemu-img reads blocks from the old chain and writes 
them into
+  # the target where the new chain doesn't cover them. This is the "merge 
into" semantic
+  # required when we're about to delete the old immediate parent — the target 
needs to
+  # absorb the to-be-deleted parent's blocks so the chain remains consistent 
against the
+  # new (further-back) backing.
+  if ! qemu-img rebase -b "$backing_rel" -F qcow2 "$target_abs" >> "$logFile" 
2> >(cat >&2); then
+    echo "qemu-img rebase failed for $target_abs onto $backing_rel"
+    cleanup
+    exit 1
+  fi
+  sync
+  umount $mount_point
+  rmdir $mount_point

Review Comment:
   rebase_backup() is not being called now. Remove all occurrences from script.



##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -205,12 +507,20 @@ public Pair<Boolean, Backup> takeBackup(final 
VirtualMachine vm, Boolean quiesce
         final String backupPath = String.format("%s/%s", vm.getInstanceName(),
                 new 
SimpleDateFormat("yyyy.MM.dd.HH.mm.ss").format(creationDate));
 
-        BackupVO backupVO = createBackupObject(vm, backupPath);
+        // Decide full vs incremental for this backup. Stopped VMs are always 
full
+        // (libvirt backup-begin requires a running QEMU process).
+        ChainDecision decision = decideChain(vm);
+
+        BackupVO backupVO = createBackupObject(vm, backupPath, 
decision.isIncremental() ? "INCREMENTAL" : "FULL");
         TakeBackupCommand command = new 
TakeBackupCommand(vm.getInstanceName(), backupPath);
         command.setBackupRepoType(backupRepository.getType());
         command.setBackupRepoAddress(backupRepository.getAddress());
         command.setMountOptions(backupRepository.getMountOptions());
         command.setQuiesce(quiesceVM);
+        command.setMode(decision.mode);
+        command.setBitmapNew(decision.bitmapNew);
+        command.setBitmapParent(decision.bitmapParent);
+        command.setParentPaths(decision.parentPaths);

Review Comment:
   Let's have 3 modes here as well: incremental | full and | legacy_full
   And not set any bitmap fields or parent path if the mode is legacy_full.



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/wrapper/LibvirtTakeBackupCommandWrapper.java:
##########
@@ -69,8 +84,60 @@ public Answer execute(TakeBackupCommand command, 
LibvirtComputingResource libvir
             }
         }
 
-        List<String[]> commands = new ArrayList<>();
-        commands.add(new String[]{
+        final String requestedMode = command.getMode();
+        Pair<Integer, String> result = 
runBackupScript(libvirtComputingResource, command, vmName, backupRepoType, 
backupRepoAddress,
+                mountOptions, backupPath, diskPaths, requestedMode,
+                command.getBitmapNew(), command.getBitmapParent(), 
command.getParentPaths(), timeout);
+
+        boolean incrementalFallback = false;
+        if (result.first() == EXIT_INCREMENTAL_UNSUPPORTED && 
MODE_INCREMENTAL.equals(requestedMode)) {
+            // Script told us the incremental can't proceed (parent checkpoint 
can't be
+            // re-registered, or VM is stopped). Re-invoke as full with the 
same --bitmap-new
+            // so the chain restarts cleanly. Drop --bitmap-parent + 
--parent-paths since
+            // they no longer apply.
+            logger.info("nasbackup.sh signalled incremental unsupported for VM 
" + vmName + " — retrying as full");
+            result = runBackupScript(libvirtComputingResource, command, 
vmName, backupRepoType, backupRepoAddress,
+                    mountOptions, backupPath, diskPaths, MODE_FULL,
+                    command.getBitmapNew(), null, null, timeout);
+            incrementalFallback = true;
+        }
+
+        boolean bitmapSeeded = true;
+        if (result.first() == EXIT_BITMAP_NOT_SEEDED) {
+            // Backup file is valid; the host just has no bitmap. Treat as 
success but
+            // mark bitmapCreated=null so the orchestrator clears 
active_checkpoint_id.
+            bitmapSeeded = false;
+        } else if (result.first() != 0) {
+            logger.debug("Failed to take VM backup: " + result.second());
+            BackupAnswer answer = new BackupAnswer(command, false, 
result.second().trim());
+            if (result.first() == EXIT_CLEANUP_FAILED) {
+                logger.debug("Backup cleanup failed");
+                answer.setNeedsCleanup(true);
+            }
+            return answer;
+        }
+
+        String stdout = result.second().trim();
+        long backupSize = parseBackupSize(stdout, diskPaths);
+
+        BackupAnswer answer = new BackupAnswer(command, true, stdout);
+        answer.setSize(backupSize);
+        // bitmapCreated mirrors what we asked the script to create — except 
when the
+        // script exited EXIT_BITMAP_NOT_SEEDED, in which case the host has no 
bitmap
+        // and the orchestrator must clear active_checkpoint_id.
+        answer.setBitmapCreated(bitmapSeeded ? command.getBitmapNew() : null);

Review Comment:
   bitmapSeeded is by default true. Event if incremental backup feature is not 
enabled and bitmaps are not actually created.
   
   This causes nas.active_checkpoint_id to be set in vm_details without the 
feature being enabled.
   
   I suggest returning an error if bitmaps cannot be created instead of 
EXIT_BITMAP_NOT_SEEDED.
   
   It is only being used in case of stopped VMs. bitmap --add operation should 
not fail ideally and it should be ok to fail the backup if bitmap --add fails 
for any disk, so that the underlying problem is dealt with instead of forcing 
full backups.
   
   And for the cases where the incremental feature is not enabled, we should 
not be sending `bitmap_new` in  the `TakeBackupCommand`



##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -168,6 +200,276 @@ protected Host getVMHypervisorHost(VirtualMachine vm) {
         return 
resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM,
 vm.getDataCenterId());
     }
 
+    /**
+     * Returned by {@link #decideChain(VirtualMachine)} to describe the next 
backup's place in
+     * the chain: full vs incremental, the bitmap name to create, and (for 
incrementals) the
+     * parent bitmap and parent file path.
+     */
+    static final class ChainDecision {
+        final String mode;            // "full" or "incremental"
+        final String bitmapNew;
+        final String bitmapParent;    // null for full
+        // Per-volume parent backup file paths, one per current VM volume in 
deviceId order.
+        // null/empty for full. Each volume needs its own parent file because 
backup files
+        // are named after each volume's own UUID (root.<uuid>.qcow2 / 
datadisk.<uuid>.qcow2).
+        final List<String> parentPaths;
+        final String chainId;         // chain identifier this backup belongs 
to
+        final int chainPosition;      // 0 for full, N for the Nth incremental 
in the chain
+
+        private ChainDecision(String mode, String bitmapNew, String 
bitmapParent, List<String> parentPaths,
+                              String chainId, int chainPosition) {
+            this.mode = mode;
+            this.bitmapNew = bitmapNew;
+            this.bitmapParent = bitmapParent;
+            this.parentPaths = parentPaths;
+            this.chainId = chainId;
+            this.chainPosition = chainPosition;
+        }
+
+        static ChainDecision fullStart(String bitmapName) {
+            return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName, 
null, null,
+                    UUID.randomUUID().toString(), 0);
+        }
+
+        static ChainDecision incremental(String bitmapNew, String 
bitmapParent, List<String> parentPaths,
+                                         String chainId, int chainPosition) {
+            return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL, 
bitmapNew, bitmapParent,
+                    parentPaths, chainId, chainPosition);
+        }
+
+        boolean isIncremental() {
+            return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode);
+        }
+    }
+
+    /**
+     * Decides whether the next backup for {@code vm} should be a fresh full 
or an incremental
+     * appended to the existing chain. Stopped VMs are always full (libvirt 
{@code backup-begin}
+     * requires a running QEMU process). The {@code nas.backup.full.every} 
ConfigKey controls
+     * how many backups (full + incrementals) form one chain before a new full 
is forced.
+     *
+     * <p>The decision is anchored on the VM's {@code 
nas.active_checkpoint_id} detail, which
+     * records the bitmap that currently exists on the running QEMU. After a 
restore that
+     * detail is cleared, so the next backup is automatically full — even 
though there may be
+     * a more recent "last backup taken" row in the database. The decision 
deliberately avoids
+     * relying on "last backup taken", because that row is misleading after a 
restore.</p>
+     */
+    protected ChainDecision decideChain(VirtualMachine vm) {
+        final String newBitmap = "backup-" + System.currentTimeMillis() / 
1000L;
+
+        // Master switch — when the operator disables incrementals at the zone 
level every
+        // backup is taken as a fresh full. Existing chains stay restorable 
because each
+        // backup's metadata is kept independently; restoring an incremental 
still walks its
+        // own chain (the per-backup chain_id / parent_backup_id details 
persist regardless
+        // of the live config). The next backup with this flag back on starts 
a new chain.
+        Boolean incrementalEnabled = 
NASBackupIncrementalEnabled.valueIn(vm.getDataCenterId());
+        if (incrementalEnabled == null || !incrementalEnabled) {
+            return ChainDecision.fullStart(newBitmap);

Review Comment:
   Don't set `newBitmap` if incremental backups are not enabled.
   Otherwise it causes the bitmap information being persisted in backup_details 
and vm_instance_details even though the bitmap didn't actually get persisted on 
disk.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to