Copilot commented on code in PR #13077:
URL: https://github.com/apache/cloudstack/pull/13077#discussion_r3152709171


##########
plugins/storage/volume/linstor/src/main/java/org/apache/cloudstack/storage/motion/LinstorDataMotionStrategy.java:
##########
@@ -314,6 +314,58 @@ private boolean needsExactSizeProp(VolumeInfo 
srcVolumeInfo) {
         return true;
     }
 
+    /**
+     * Verify that the destination KVM host is a registered LINSTOR satellite 
on the controller
+     * backing every destination pool involved in this migration. Throws 
CloudRuntimeException
+     * with a clear message when it isn't, instead of letting the resource 
creation later fail
+     * obscurely inside auto-placement.
+     *
+     * Best-effort: a transient controller error during this check does not 
block the migration
+     * — we log a warning and let the downstream resource-create surface the 
real issue. Only a
+     * confirmed "host not in node list" outcome aborts the migration up-front.
+     */
+    private void verifyDestinationIsLinstorSatellite(Map<VolumeInfo, 
DataStore> volumeDataStoreMap, Host destHost) {
+        if (destHost == null || destHost.getName() == null) {
+            // Without a destination host name to match, the only sensible 
thing is to let the
+            // existing flow run and report whatever it would have reported.
+            return;
+        }
+        for (Map.Entry<VolumeInfo, DataStore> entry : 
volumeDataStoreMap.entrySet()) {
+            DataStore destDataStore = entry.getValue();
+            StoragePoolVO destStoragePool = 
_storagePool.findById(destDataStore.getId());
+            if (destStoragePool == null
+                    || destStoragePool.getPoolType() != 
Storage.StoragePoolType.Linstor) {
+                continue;
+            }
+            DevelopersApi api = 
LinstorUtil.getLinstorAPI(destStoragePool.getHostAddress());
+            try {
+                List<String> nodes = LinstorUtil.getLinstorNodeNames(api);
+                if (nodes == null) {
+                    logger.warn("LINSTOR controller {} returned null node 
list; skipping pre-flight",
+                            destStoragePool.getHostAddress());
+                    return;
+                }
+                if (!nodes.contains(destHost.getName())) {
+                    throw new CloudRuntimeException(String.format(
+                            "Cannot migrate to host '%s': it is not a 
registered LINSTOR satellite on " +
+                                    "controller %s (pool '%s'). Known 
satellites: %s. Either register the " +
+                                    "host with `linstor node create` or pick a 
different destination.",
+                            destHost.getName(),
+                            destStoragePool.getHostAddress(),
+                            destStoragePool.getName(),
+                            nodes));
+                }
+            } catch (ApiException apiEx) {
+                // Don't block migration on a transient controller hiccup — 
log and let the
+                // downstream resource creation handle the real failure.
+                logger.warn("LINSTOR pre-flight check could not contact 
controller {}: {}; " +
+                                "letting downstream resource creation proceed",
+                        destStoragePool.getHostAddress(), 
apiEx.getBestMessage());
+                return;
+            }

Review Comment:
   The `catch (ApiException)` path currently logs and then `return`s, which 
stops checking any additional destination pools/controllers in 
`volumeDataStoreMap`. To match the method Javadoc/PR description (“best-effort” 
per pool), this should usually `continue` so that other pools can still be 
validated and can fail fast if the destination host is not a satellite there.



##########
plugins/storage/volume/linstor/src/main/java/org/apache/cloudstack/storage/motion/LinstorDataMotionStrategy.java:
##########
@@ -323,6 +375,15 @@ public void copyAsync(Map<VolumeInfo, DataStore> 
volumeDataStoreMap, VirtualMach
                     String.format("Invalid hypervisor type [%s]. Only KVM 
supported", srcHost.getHypervisorType()));
         }
 
+        // Pre-flight: verify the destination KVM host is registered as a 
satellite on the
+        // LINSTOR controller backing each destination pool. Without this 
check, resource
+        // creation falls through to the resource-group's auto-placement 
filters and may
+        // either silently place the resource on the wrong node or fail with 
an opaque
+        // auto-place error from the LINSTOR API. Failing fast here gives 
operators a clear
+        // actionable message instead of having to correlate the 
live-migration failure with
+        // an unrelated LINSTOR controller log entry.
+        verifyDestinationIsLinstorSatellite(volumeDataStoreMap, destHost);
+

Review Comment:
   `verifyDestinationIsLinstorSatellite(...)` is invoked before the method’s 
`try/catch/finally`. If it throws `CloudRuntimeException` (the intended 
failure-fast path), `copyAsync` will exit without reaching the `finally` block 
and `callback.complete(...)` will never be called, potentially leaving the 
migration task hanging. Move this pre-flight call inside the existing `try` so 
failures are handled consistently and the callback is always completed.



##########
plugins/storage/volume/linstor/src/main/java/org/apache/cloudstack/storage/motion/LinstorDataMotionStrategy.java:
##########
@@ -314,6 +314,58 @@ private boolean needsExactSizeProp(VolumeInfo 
srcVolumeInfo) {
         return true;
     }
 
+    /**
+     * Verify that the destination KVM host is a registered LINSTOR satellite 
on the controller
+     * backing every destination pool involved in this migration. Throws 
CloudRuntimeException
+     * with a clear message when it isn't, instead of letting the resource 
creation later fail
+     * obscurely inside auto-placement.
+     *
+     * Best-effort: a transient controller error during this check does not 
block the migration
+     * — we log a warning and let the downstream resource-create surface the 
real issue. Only a
+     * confirmed "host not in node list" outcome aborts the migration up-front.
+     */
+    private void verifyDestinationIsLinstorSatellite(Map<VolumeInfo, 
DataStore> volumeDataStoreMap, Host destHost) {
+        if (destHost == null || destHost.getName() == null) {
+            // Without a destination host name to match, the only sensible 
thing is to let the
+            // existing flow run and report whatever it would have reported.
+            return;
+        }
+        for (Map.Entry<VolumeInfo, DataStore> entry : 
volumeDataStoreMap.entrySet()) {
+            DataStore destDataStore = entry.getValue();
+            StoragePoolVO destStoragePool = 
_storagePool.findById(destDataStore.getId());
+            if (destStoragePool == null
+                    || destStoragePool.getPoolType() != 
Storage.StoragePoolType.Linstor) {
+                continue;
+            }
+            DevelopersApi api = 
LinstorUtil.getLinstorAPI(destStoragePool.getHostAddress());
+            try {
+                List<String> nodes = LinstorUtil.getLinstorNodeNames(api);
+                if (nodes == null) {
+                    logger.warn("LINSTOR controller {} returned null node 
list; skipping pre-flight",
+                            destStoragePool.getHostAddress());
+                    return;
+                }

Review Comment:
   Inside the loop, the method `return`s when `nodes == null`. That exits the 
entire pre-flight check and skips verifying any remaining destination 
pools/volumes. If the intent is best-effort per pool, this should `continue` to 
the next entry (or better: de-duplicate pools/controllers and keep checking the 
rest).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to