Copilot commented on code in PR #13074:
URL: https://github.com/apache/cloudstack/pull/13074#discussion_r3304722954
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -191,24 +337,37 @@ backup_running_vm() {
virsh -c qemu:///system domblklist "$VM" --details 2>/dev/null | awk
'$2=="disk"{print $3, $4}'
)
- rm -f $dest/backup.xml
+ rm -f $dest/backup.xml $dest/checkpoint.xml
sync
# Print statistics
virsh -c qemu:///system domjobinfo $VM --completed
du -sb $dest | cut -f1
+ if [[ -n "$BITMAP_NEW" ]]; then
+ # Echo the bitmap name on its own line so the Java caller can capture it
for backup_details.
+ echo "BITMAP_CREATED=$BITMAP_NEW"
+ fi
umount $mount_point
rmdir $mount_point
}
backup_stopped_vm() {
+ # Stopped VMs cannot use libvirt's backup-begin (no QEMU process). Take a
full
+ # backup via qemu-img convert. If the caller asked for incremental, fall back
+ # to full and signal the fallback so the orchestrator can record it as a full
+ # in the chain.
+ if [[ "$MODE" == "incremental" ]]; then
+ echo "INCREMENTAL_FALLBACK=full (VM stopped — incremental requires running
VM)" >&2
+ fi
Review Comment:
INCREMENTAL_FALLBACK is echoed to stderr, but the KVM agent wrapper
captures/parses only stdout from Script.executePipedCommands (it reads the last
process' stdout only). As a result, incrementalFallback will never be detected
by LibvirtTakeBackupCommandWrapper. Emit this marker on stdout (or change the
wrapper to also consume stderr) so management can reliably record the backup as
FULL when fallback happens.
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -278,6 +505,13 @@ cleanup() {
function usage {
echo ""
echo "Usage: $0 -o <operation> -v|--vm <domain name> -t <storage type> -s
<storage address> -m <mount options> -p <backup path> -d <disks path>
-q|--quiesce <true|false>"
+ echo " [-M|--mode <full|incremental>] [--bitmap-new <name>]
[--bitmap-parent <name>] [--parent-path <path>]"
+ echo ""
+ echo "Incremental backup options (running VMs only; requires QEMU >= 4.2 and
libvirt >= 7.2):"
+ echo " -M|--mode full Take a full backup AND create a checkpoint
(--bitmap-new required) for future incrementals."
+ echo " -M|--mode incremental Take an incremental backup since
--bitmap-parent and create new checkpoint --bitmap-new."
+ echo " Requires --bitmap-parent, --bitmap-new, and
--parent-path (parent backup file for rebase)."
Review Comment:
The usage/help text advertises `--parent-path`, but the implemented option
is `--parent-paths` (plural, comma-separated list). This makes `-h/--help`
misleading and can lead to incorrect invocations. Update the help strings to
match the actual flag name and semantics.
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -85,6 +88,27 @@ public class NASBackupProvider extends AdapterBase
implements BackupProvider, Co
true,
BackupFrameworkEnabled.key());
+ ConfigKey<Integer> NASBackupFullEvery = new ConfigKey<>("Advanced",
Integer.class,
+ "nas.backup.full.every",
+ "10",
+ "Take a full NAS backup every Nth backup; remaining backups in
between are incremental. " +
+ "Counts backups, not days, so it works for hourly, daily,
and ad-hoc schedules. " +
+ "Set to 1 to disable incrementals (every backup is full).",
+ true,
+ ConfigKey.Scope.Zone,
+ BackupFrameworkEnabled.key());
+
+ ConfigKey<Boolean> NASBackupIncrementalEnabled = new
ConfigKey<>("Advanced", Boolean.class,
+ "nas.backup.incremental.enabled",
+ "true",
Review Comment:
`nas.backup.incremental.enabled` is introduced with default value "true".
The PR description/backwards-compatibility section describes incremental as
opt-in and legacy behavior as full-only unless enabled; defaulting to true
changes behavior for existing zones immediately after upgrade. Consider
defaulting this to "false" (and/or gating incrementals on an explicit zone
setting).
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupChainKeys.java:
##########
@@ -0,0 +1,67 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package org.apache.cloudstack.backup;
+
+/**
+ * Keys used by the NAS backup provider when storing incremental-chain metadata
+ * in the existing {@code backup_details} key/value table. Stored here (not on
+ * the {@code backups} table) so other providers do not need a schema change to
+ * support their own incremental implementations.
+ */
+public final class NASBackupChainKeys {
+
+ /** UUID of the parent backup (full or previous incremental). Empty for
full backups. */
+ public static final String PARENT_BACKUP_ID = "nas.parent_backup_id";
+
+ /** QEMU dirty-bitmap name created by this backup, used as the {@code
<incremental>} reference for the next one. */
+ public static final String BITMAP_NAME = "nas.bitmap_name";
+
+ /** Identifier shared by every backup in the same chain (the full anchors
a chain; its incrementals inherit the id). */
+ public static final String CHAIN_ID = "nas.chain_id";
+
+ /** Position within the chain: 0 for the full, 1 for the first
incremental, and so on. */
+ public static final String CHAIN_POSITION = "nas.chain_position";
+
+ /** Backup type marker: {@value #TYPE_FULL} or {@value #TYPE_INCREMENTAL}.
Mirrors {@code backups.type} for fast lookup without a join. */
+ public static final String TYPE = "nas.type";
+
+ public static final String TYPE_FULL = "full";
+ public static final String TYPE_INCREMENTAL = "incremental";
Review Comment:
NASBackupChainKeys.TYPE is documented as mirroring `backups.type`, but
NASBackupProvider persists lower-case values ("full"/"incremental") for
decision.mode while backups.type uses upper-case ("FULL"/"INCREMENTAL"). This
mismatch makes the comment inaccurate and can confuse future consumers of
backup_details. Either persist the same values as backups.type, or update the
docs/constants to reflect the intended casing.
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -222,10 +381,33 @@ backup_stopped_vm() {
echo "qemu-img convert failed for $disk $output"
cleanup
fi
+
+ # Pre-seed a persistent bitmap on the source disk so the NEXT backup (taken
+ # after this VM is started again) can be incremental against the qcow2 we
+ # just wrote. Without this, every backup after a stopped-VM backup would
+ # fall back to full because no parent bitmap exists on the host yet.
+ # Addresses abh1sar review at nasbackup.sh:513.
+ # Only applies to file-backed qcow2 sources — RBD/LINSTOR have their own
+ # snapshot mechanisms and qemu-img bitmap is not the right primitive there.
+ if [[ -n "$BITMAP_NEW" && "$disk" != rbd:* && "$disk" !=
/dev/drbd/by-res/* ]]; then
+ if qemu-img bitmap --add "$disk" "$BITMAP_NEW" 2>>"$logFile"; then
+ bitmap_seeded=1
+ else
+ echo "WARN: failed to pre-seed bitmap $BITMAP_NEW on $disk — next
backup will fall back to full" >&2
+ fi
+ fi
+
name="datadisk"
done
sync
+ # Surface the bitmap name we created so the orchestrator can persist it as
+ # the VM's active_checkpoint_id. Empty when sources weren't file-backed or
+ # qemu-img bitmap failed — orchestrator handles either case.
+ if [[ "$bitmap_seeded" == "1" ]]; then
+ echo "BITMAP_CREATED=$BITMAP_NEW" >&2
Review Comment:
In backup_stopped_vm, BITMAP_CREATED is echoed to stderr (`>&2`), but
LibvirtTakeBackupCommandWrapper only inspects stdout for BITMAP_CREATED=
markers (Script.executePipedCommands captures stdout only). This prevents
NASBackupProvider from persisting/updating nas.active_checkpoint_id after an
offline backup, breaking the intended stop/start -> incremental flow. Emit
BITMAP_CREATED on stdout (consistent with backup_running_vm).
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -135,10 +234,16 @@ backup_running_vm() {
fi
fi
- # Start push backup
+ # Start push backup, atomically registering the new checkpoint when
applicable.
local backup_begin=0
- if virsh -c qemu:///system backup-begin --domain $VM --backupxml
$dest/backup.xml 2>&1 > /dev/null; then
- backup_begin=1;
+ if [[ $make_checkpoint -eq 1 ]]; then
+ if virsh -c qemu:///system backup-begin --domain $VM --backupxml
$dest/backup.xml --checkpointxml $dest/checkpoint.xml 2>&1 > /dev/null; then
+ backup_begin=1;
Review Comment:
The redirection order `2>&1 > /dev/null` leaves stderr pointing at the
original stdout, so virsh error output may leak to the agent stderr and won’t
be captured in the Script.executePipedCommands stdout result. Swap to `>
/dev/null 2>&1` so failures return useful details and output handling is
consistent.
##########
plugins/backup/nas/src/main/java/org/apache/cloudstack/backup/NASBackupProvider.java:
##########
@@ -168,6 +198,284 @@ protected Host getVMHypervisorHost(VirtualMachine vm) {
return
resourceManager.findOneRandomRunningHostByHypervisor(Hypervisor.HypervisorType.KVM,
vm.getDataCenterId());
}
+ /**
+ * Returned by {@link #decideChain(VirtualMachine)} to describe the next
backup's place in
+ * the chain: full vs incremental, the bitmap name to create, and (for
incrementals) the
+ * parent bitmap and parent file path.
+ */
+ static final class ChainDecision {
+ final String mode; // "full" or "incremental"
+ final String bitmapNew;
+ final String bitmapParent; // null for full
+ // Per-volume parent backup file paths, one per current VM volume in
deviceId order.
+ // null/empty for full. Replaces the single parentPath field — each
volume needs its
+ // own parent file because backup files are named after each volume's
own UUID
+ // (root.<uuid>.qcow2 / datadisk.<uuid>.qcow2), abh1sar review at line
340.
+ final List<String> parentPaths;
+ final String chainId; // chain identifier this backup belongs
to
+ final int chainPosition; // 0 for full, N for the Nth incremental
in the chain
+
+ private ChainDecision(String mode, String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ this.mode = mode;
+ this.bitmapNew = bitmapNew;
+ this.bitmapParent = bitmapParent;
+ this.parentPaths = parentPaths;
+ this.chainId = chainId;
+ this.chainPosition = chainPosition;
+ }
+
+ static ChainDecision fullStart(String bitmapName) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_FULL, bitmapName,
null, null,
+ UUID.randomUUID().toString(), 0);
+ }
+
+ static ChainDecision incremental(String bitmapNew, String
bitmapParent, List<String> parentPaths,
+ String chainId, int chainPosition) {
+ return new ChainDecision(NASBackupChainKeys.TYPE_INCREMENTAL,
bitmapNew, bitmapParent,
+ parentPaths, chainId, chainPosition);
+ }
+
+ boolean isIncremental() {
+ return NASBackupChainKeys.TYPE_INCREMENTAL.equals(mode);
+ }
+ }
+
+ /**
+ * Decides whether the next backup for {@code vm} should be a fresh full
or an incremental
+ * appended to the existing chain. Stopped VMs are always full (libvirt
{@code backup-begin}
+ * requires a running QEMU process). The {@code nas.backup.full.every}
ConfigKey controls
+ * how many backups (full + incrementals) form one chain before a new full
is forced.
+ *
+ * <p>The decision is anchored on the VM's {@code
nas.active_checkpoint_id} detail, which
+ * records the bitmap that currently exists on the running QEMU. After a
restore that
+ * detail is cleared, so the next backup is automatically full — even
though there may be
+ * a more recent "last backup taken" row in the database. This matches the
prescription in
+ * the PR review (avoid relying on "last backup" because that breaks after
restore).</p>
+ */
+ protected ChainDecision decideChain(VirtualMachine vm) {
+ final String newBitmap = "backup-" + System.currentTimeMillis() /
1000L;
+
+ // Master switch — when the operator disables incrementals at the zone
level every
+ // backup is taken as a fresh full. Existing chains stay restorable
because each
+ // backup's metadata is kept independently; restoring an incremental
still walks its
+ // own chain (the per-backup chain_id / parent_backup_id details
persist regardless
+ // of the live config). The next backup with this flag back on starts
a new chain.
+ Boolean incrementalEnabled =
NASBackupIncrementalEnabled.valueIn(vm.getDataCenterId());
+ if (incrementalEnabled == null || !incrementalEnabled) {
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ // Stopped VMs cannot do incrementals — script will also fall back,
but we make the
+ // decision here so we register the right type up-front.
+ if (VirtualMachine.State.Stopped.equals(vm.getState())) {
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ Integer fullEvery = NASBackupFullEvery.valueIn(vm.getDataCenterId());
+ if (fullEvery == null || fullEvery <= 1) {
+ // Disabled or every-backup-is-full mode.
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ // 1. If the VM has no active_checkpoint_id, there is no bitmap on the
host to use as
+ // a parent. This is the case after restore (we clear it), after VM
was just assigned
+ // to the offering, or on the very first backup.
+ String activeCheckpoint = readVmActiveCheckpoint(vm.getId());
+ if (activeCheckpoint == null) {
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ // 2. Find the latest BackedUp backup of this VM whose BITMAP_NAME
matches the VM's
+ // active_checkpoint_id. Only that backup is a safe parent — any
earlier backup
+ // would have a bitmap that QEMU may have rotated out. Per the
review:
+ // "The latest backup should have the bitmap_name equal to the
VM's
+ // active_checkpoint_id which will become the parent backup. If
not, force full."
+ Backup parent = findLatestBackedUpBackupWithBitmap(vm.getId(),
activeCheckpoint);
+ if (parent == null) {
+ LOG.debug("VM {} has active_checkpoint_id={} but no matching
backup found — forcing full",
+ vm.getInstanceName(), activeCheckpoint);
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ String parentChainId = readDetail(parent, NASBackupChainKeys.CHAIN_ID);
+ int parentChainPosition = chainPosition(parent);
+ if (parentChainId == null || parentChainPosition == Integer.MAX_VALUE)
{
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ // Force a fresh full when the chain has reached the configured length.
+ if (parentChainPosition + 1 >= fullEvery) {
+ return ChainDecision.fullStart(newBitmap);
+ }
+
+ // The script needs the parent backup's on-NAS file path PER VOLUME so
it can rebase
+ // each new qcow2 onto the matching parent. The paths are stored
relative to the NAS
+ // mount root — the script resolves them inside its mount session.
When alignment
+ // fails (volume count changed, etc.) compose returns null and we fall
back to full
+ // so we don't risk corrupting the chain.
+ List<String> parentPaths = composeParentBackupPaths(parent,
vm.getId());
+ if (parentPaths == null) {
+ LOG.debug("VM {} parent backup {} volume layout no longer matches
current VM — forcing full",
+ vm.getInstanceName(), parent.getUuid());
+ return ChainDecision.fullStart(newBitmap);
+ }
+ return ChainDecision.incremental(newBitmap, activeCheckpoint,
parentPaths,
+ parentChainId, parentChainPosition + 1);
+ }
+
+ /**
+ * Read the {@code nas.active_checkpoint_id} VM detail. Returns {@code
null} when no detail
+ * exists (post-restore, first backup, or after explicit reset).
+ */
+ private String readVmActiveCheckpoint(long vmId) {
+ VMInstanceDetailVO d = vmInstanceDetailsDao.findDetail(vmId,
NASBackupChainKeys.VM_ACTIVE_CHECKPOINT_ID);
+ if (d == null) {
+ return null;
+ }
+ String v = d.getValue();
+ return (v == null || v.isEmpty()) ? null : v;
+ }
+
+ /**
+ * Locate the most-recent {@code BackedUp} backup for {@code vmId} whose
bitmap name equals
+ * {@code bitmapName}. Used by {@link #decideChain} to anchor the next
incremental.
+ */
+ private Backup findLatestBackedUpBackupWithBitmap(long vmId, String
bitmapName) {
+ List<Backup> history = backupDao.listByVmId(null, vmId);
+ if (history == null || history.isEmpty()) {
+ return null;
+ }
+ history.sort(Comparator.comparing(Backup::getDate).reversed());
+ for (Backup b : history) {
+ if (!Backup.Status.BackedUp.equals(b.getStatus())) {
+ continue;
+ }
+ if (bitmapName.equals(readDetail(b,
NASBackupChainKeys.BITMAP_NAME))) {
+ return b;
+ }
+ }
+ return null;
+ }
+
+ private String readDetail(Backup backup, String key) {
+ BackupDetailVO d = backupDetailsDao.findDetail(backup.getId(), key);
+ return d == null ? null : d.getValue();
+ }
+
+ /**
+ * Compose the on-NAS path of EVERY parent backup file (one per VM volume)
in the same
+ * order the script will iterate the current VM's disks (deviceId asc).
Relative to the
+ * NAS mount, matches the layout written by {@code nasbackup.sh}:
+ * first disk -> {@code <backupPath>/root.<volUuid>.qcow2}
+ * others -> {@code <backupPath>/datadisk.<volUuid>.qcow2}
+ *
+ * Returns {@code null} if the parent's stored volume list can't be
aligned with the
+ * current VM's volumes (count mismatch / different volume identities). In
that case the
+ * caller should force a fresh full so we don't accidentally rebase a data
disk onto the
+ * root parent — exactly the bug abh1sar flagged at line 340.
+ */
+ private List<String> composeParentBackupPaths(Backup parent, long vmId) {
+ // backupPath is stored as externalId by createBackupObject — e.g.
+ // "i-2-1234-VM/2026.04.27.13.45.00".
+ String dir = parent.getExternalId();
+ if (dir == null || dir.isEmpty()) {
+ return null;
+ }
+
+ // Read the parent's backed-up volume list (uuid order = deviceId
order at the time
+ // the parent was taken). The script names files as root.<uuid>.qcow2
for the first
+ // entry and datadisk.<uuid>.qcow2 for subsequent entries — match that
here.
+ List<Backup.VolumeInfo> parentVols = parent.getBackedUpVolumes();
+ if (parentVols == null || parentVols.isEmpty()) {
+ return null;
+ }
+
+ // Sanity: the current VM must have the same number of volumes. If it
doesn't (volume
+ // added or removed since the parent), positional alignment is unsafe
— fall back to
+ // full at the caller.
+ List<VolumeVO> currentVols = volumeDao.findByInstance(vmId);
+ if (currentVols == null || currentVols.size() != parentVols.size()) {
+ return null;
+ }
+
+ List<String> paths = new ArrayList<>(parentVols.size());
+ for (int i = 0; i < parentVols.size(); i++) {
+ String volUuid = parentVols.get(i).getUuid();
+ if (volUuid == null || volUuid.isEmpty()) {
+ return null;
+ }
Review Comment:
composeParentBackupPaths() only checks that the current VM has the same
number of volumes as the parent backup, but it doesn’t verify identity/order
alignment. Since VolumeDao.findByInstance() has no ordering guarantee,
positional alignment can be wrong and can rebase a disk onto the wrong parent
file (chain corruption). Sort current volumes by deviceId and verify their UUID
sequence matches parent.getBackedUpVolumes() before returning parentPaths;
otherwise return null to force a full.
##########
scripts/vm/hypervisor/kvm/nasbackup.sh:
##########
@@ -135,10 +234,16 @@ backup_running_vm() {
fi
fi
- # Start push backup
+ # Start push backup, atomically registering the new checkpoint when
applicable.
local backup_begin=0
- if virsh -c qemu:///system backup-begin --domain $VM --backupxml
$dest/backup.xml 2>&1 > /dev/null; then
- backup_begin=1;
+ if [[ $make_checkpoint -eq 1 ]]; then
+ if virsh -c qemu:///system backup-begin --domain $VM --backupxml
$dest/backup.xml --checkpointxml $dest/checkpoint.xml 2>&1 > /dev/null; then
+ backup_begin=1;
+ fi
+ else
+ if virsh -c qemu:///system backup-begin --domain $VM --backupxml
$dest/backup.xml 2>&1 > /dev/null; then
+ backup_begin=1;
Review Comment:
Same redirection issue here (`2>&1 > /dev/null`) means virsh backup-begin
stderr isn’t captured in the returned stdout and can leak to the agent stderr.
Use `> /dev/null 2>&1` so error details are captured consistently.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]