Copilot commented on code in PR #10440:
URL: https://github.com/apache/ozone/pull/10440#discussion_r3420608141


##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/replication/MisReplicationHandler.java:
##########
@@ -157,13 +157,21 @@ public int processAndSendCommands(
         .getTargetDatanodes(containerPlacement, requiredNodes,
             excludedAndUsedNodes.getUsedNodes(),
             excludedAndUsedNodes.getExcludedNodes(), currentContainerSize,
-            container);
+            container, replicationManager.getNodeManager());
     List<DatanodeDetails> availableSources = sources.stream()
         .map(ContainerReplica::getDatanodeDetails)
         .collect(Collectors.toList());
 
-    int count = sendReplicateCommands(container, replicasToBeReplicated,
-        availableSources, targetDatanodes);
+    int count;
+    try {
+      count = sendReplicateCommands(container, replicasToBeReplicated,
+          availableSources, targetDatanodes);
+    } catch (CommandTargetOverloadedException | NotLeaderException e) {
+      // No commands were dispatched; roll back all recorded 
PendingContainerTracker slots.
+      ReplicationManagerUtil.rollbackTargets(
+          targetDatanodes, container, replicationManager.getNodeManager());
+      throw e;
+    }

Review Comment:
   The rollback in this catch block assumes that no replication commands were 
dispatched, but the current sendReplicateCommands implementations can throw 
after sending some commands (eg RatisMisReplicationHandler loops over targets 
and can throw mid-loop; ECMisReplicationHandler may send some commands and then 
throw at the end if any target was overloaded). Rolling back *all* target 
reservations here can under-account pending space for commands that were 
successfully sent, causing the PendingContainerTracker to think there is more 
free space than there really is.
   
   A safer approach is to roll back only the targets for which a command was 
not sent (eg catch per-target inside sendReplicateCommands and rollback just 
the failing target(s), or change sendReplicateCommands to report which targets 
were successfully dispatched so the caller can rollback the remainder).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to