Copilot commented on code in PR #12666:
URL: https://github.com/apache/cloudstack/pull/12666#discussion_r2828005730


##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {
+                logger.error("Exception while rebasing incremental snapshot 
[{}] due to: [{}].", snapshotName, e.getMessage(), e);
+                throw new CloudRuntimeException(e);
+            }
+            retryRebase(snapshotName, wait, e, snapshotFile, 
parentSnapshotFile);
+        }
+    }
+
+    private void retryRebase(String snapshotName, int wait, Exception e, 
QemuImgFile snapshotFile, QemuImgFile parentSnapshotFile) {
+        logger.warn("Libvirt still has not released the lock, will wait 60 
seconds and try again later.");
+        try {
+            Thread.sleep(60*1000);
+            QemuImg qemuImg = new QemuImg(wait);
+            qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
+        } catch (LibvirtException | QemuImgException | InterruptedException 
ex) {
+            logger.error("Unable to rebase snapshot [{}].", snapshotName, ex);
             throw new CloudRuntimeException(e);

Review Comment:
   In `retryRebase`, the catch block logs `ex` but throws `new 
CloudRuntimeException(e)` (the *original* exception). This loses the actual 
retry failure cause/stacktrace for callers and can be misleading if the retry 
fails for a different reason. Throw/wrap `ex` instead (optionally preserving 
the original as suppressed) so the propagated exception reflects what actually 
failed.
   ```suggestion
               CloudRuntimeException cre = new CloudRuntimeException(ex);
               cre.addSuppressed(e);
               throw cre;
   ```



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {
+                logger.error("Exception while rebasing incremental snapshot 
[{}] due to: [{}].", snapshotName, e.getMessage(), e);
+                throw new CloudRuntimeException(e);
+            }
+            retryRebase(snapshotName, wait, e, snapshotFile, 
parentSnapshotFile);
+        }
+    }
+
+    private void retryRebase(String snapshotName, int wait, Exception e, 
QemuImgFile snapshotFile, QemuImgFile parentSnapshotFile) {
+        logger.warn("Libvirt still has not released the lock, will wait 60 
seconds and try again later.");
+        try {
+            Thread.sleep(60*1000);
+            QemuImg qemuImg = new QemuImg(wait);

Review Comment:
   The 60-second retry delay is a new hard-coded magic number. To make this 
easier to tune in different environments (and to keep it consistent with the 
existing `INCREMENTAL_SNAPSHOT_TIMEOUT` configuration), consider extracting it 
to a named constant and/or an agent property.



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {

Review Comment:
   `e.getMessage().contains(...)` can throw a NullPointerException if the 
exception message is null (e.g., some `LibvirtException`s). Consider using a 
null-safe check (this file already imports `StringUtils`, so 
`StringUtils.contains(e.getMessage(), ...)` would avoid the NPE) before 
deciding whether to retry.
   ```suggestion
               if (!StringUtils.contains(e.getMessage(), "Is another process 
using the image")) {
   ```



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {
+                logger.error("Exception while rebasing incremental snapshot 
[{}] due to: [{}].", snapshotName, e.getMessage(), e);
+                throw new CloudRuntimeException(e);
+            }
+            retryRebase(snapshotName, wait, e, snapshotFile, 
parentSnapshotFile);
+        }
+    }
+
+    private void retryRebase(String snapshotName, int wait, Exception e, 
QemuImgFile snapshotFile, QemuImgFile parentSnapshotFile) {
+        logger.warn("Libvirt still has not released the lock, will wait 60 
seconds and try again later.");
+        try {
+            Thread.sleep(60*1000);

Review Comment:
   `rebaseSnapshot` now has retry-on-lock behavior but there are no unit tests 
covering (a) lock error triggers a retry and (b) non-lock errors still fail 
fast. `KVMStorageProcessorTest` already uses 
`Mockito.mockConstruction(QemuImg.class)`; consider adding tests for this 
method as well, possibly by extracting the sleep into an overridable method so 
tests don’t wait 60 seconds.
   ```suggestion
       protected void sleepBeforeRetry(long millis) throws InterruptedException 
{
           Thread.sleep(millis);
       }
   
       private void retryRebase(String snapshotName, int wait, Exception e, 
QemuImgFile snapshotFile, QemuImgFile parentSnapshotFile) {
           logger.warn("Libvirt still has not released the lock, will wait 60 
seconds and try again later.");
           try {
               sleepBeforeRetry(60 * 1000L);
   ```



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {
+                logger.error("Exception while rebasing incremental snapshot 
[{}] due to: [{}].", snapshotName, e.getMessage(), e);
+                throw new CloudRuntimeException(e);
+            }
+            retryRebase(snapshotName, wait, e, snapshotFile, 
parentSnapshotFile);
+        }
+    }
+
+    private void retryRebase(String snapshotName, int wait, Exception e, 
QemuImgFile snapshotFile, QemuImgFile parentSnapshotFile) {
+        logger.warn("Libvirt still has not released the lock, will wait 60 
seconds and try again later.");
+        try {
+            Thread.sleep(60*1000);
+            QemuImg qemuImg = new QemuImg(wait);
+            qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
+        } catch (LibvirtException | QemuImgException | InterruptedException 
ex) {

Review Comment:
   `InterruptedException` is handled together with 
`LibvirtException`/`QemuImgException`, but the interrupt status is not 
restored. If the thread is interrupted during the 60s sleep, consider calling 
`Thread.currentThread().interrupt()` before propagating, so higher-level code 
can observe the interrupt.
   ```suggestion
           } catch (LibvirtException | QemuImgException | InterruptedException 
ex) {
               if (ex instanceof InterruptedException) {
                   Thread.currentThread().interrupt();
               }
   ```



##########
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStorageProcessor.java:
##########
@@ -2093,7 +2093,22 @@ protected void rebaseSnapshot(SnapshotObjectTO 
snapshotObjectTO, KVMStoragePool
             QemuImg qemuImg = new QemuImg(wait);
             qemuImg.rebase(snapshotFile, parentSnapshotFile, 
PhysicalDiskFormat.QCOW2.toString(), false);
         } catch (LibvirtException | QemuImgException e) {
-            logger.error("Exception while rebasing incremental snapshot [{}] 
due to: [{}].", snapshotName, e.getMessage(), e);
+            if (!e.getMessage().contains("Is another process using the 
image")) {
+                logger.error("Exception while rebasing incremental snapshot 
[{}] due to: [{}].", snapshotName, e.getMessage(), e);
+                throw new CloudRuntimeException(e);
+            }

Review Comment:
   This change does more than add a retry: it also changes behavior for 
non-lock failures from “log and continue” to throwing `CloudRuntimeException`, 
which will abort the incremental snapshot flow. If this is intended (it likely 
is), please reflect it in the PR description and ensure callers expect 
`rebaseSnapshot` to be fail-fast on any rebase error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to