ashishkumar50 commented on code in PR #10490:
URL: https://github.com/apache/ozone/pull/10490#discussion_r3428276620
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerTask.java:
##########
@@ -1193,6 +1197,10 @@ public List<DatanodeUsageInfo> getUnderUtilizedNodes() {
return underUtilizedNodes;
}
+ ContainerBalancerSelectionCriteria getSelectionCriteria() {
Review Comment:
nit: `@VisibleForTesting`
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerSelectionCriteria.java:
##########
@@ -365,6 +365,10 @@ public void addToExcludeDueToFailContainers(ContainerID container) {
this.excludeContainersDueToFailure.add(container);
}
+ Set<ContainerID> getExcludeDueToFailContainers() {
Review Comment:
nit: `@VisibleForTesting`
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancerTask.java:
##########
@@ -999,11 +999,15 @@ private boolean moveContainer(DatanodeDetails source,
result == MoveManager.MoveResult.REPLICATION_FAIL_CONTAINER_NOT_CLOSED ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_DELETION ||
result == MoveManager.MoveResult.REPLICATION_FAIL_INFLIGHT_REPLICATION ||
- result == MoveManager.MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE) {
+ result == MoveManager.MoveResult.REPLICATION_NOT_HEALTHY_BEFORE_MOVE ||
+ result == MoveManager.MoveResult.FAIL_CONTAINER_ALREADY_BEING_MOVED) {
// add source back to queue as a different container can be selected in next run.
// the container which caused failure of move is not excluded
// as it is an intermittent failure or a replica related failure
findSourceStrategy.addBackSourceDataNode(source);
+ } else if (result == MoveManager.MoveResult.REPLICATION_NOT_HEALTHY_AFTER_MOVE) {
+ findSourceStrategy.addBackSourceDataNode(source);
+ selectionCriteria.addToExcludeDueToFailContainers(containerID);
Review Comment:
`REPLICATION_NOT_HEALTHY_AFTER_MOVE` means the replica set would not satisfy
the placement rule after the move. That is specific to the source→target pair,
not to the container globally: a different target on a different rack might
produce a healthy placement. Excluding the container prevents any other
source→target pair from being tried for this container for the rest of the
iteration.
However, if we don't exclude the container, the loop would become infinite,
because there is currently no support for excluding the target. We can keep
this change for now.
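For a possible follow-up, here is a minimal sketch of what pair-level exclusion could look like. Everything in it is hypothetical and not part of Ozone: the class `PairExclusionSketch`, its methods, and the use of plain `long` and `String` values in place of `ContainerID` and `DatanodeDetails`. The idea is only that recording the failed (container, target) pair still bounds the retries, since each target is tried at most once per container, while leaving other targets available.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Ozone code. Container and datanode identifiers are
// modelled as plain values to keep the example self-contained.
class PairExclusionSketch {
  // containerId -> targets that already failed the post-move placement check
  private final Map<Long, Set<String>> failedTargets = new HashMap<>();

  // Record that moving this container to this target produced an unhealthy placement.
  void excludePair(long containerId, String target) {
    failedTargets.computeIfAbsent(containerId, k -> new HashSet<>()).add(target);
  }

  boolean isExcluded(long containerId, String target) {
    return failedTargets
        .getOrDefault(containerId, Collections.emptySet())
        .contains(target);
  }

  // Returns the first candidate not yet excluded for this container,
  // or null once every candidate has been tried and excluded.
  String nextTarget(long containerId, List<String> candidates) {
    for (String candidate : candidates) {
      if (!isExcluded(containerId, candidate)) {
        return candidate;
      }
    }
    return null;
  }
}
```

With something along these lines, the container would drop out of selection only after all candidate targets had failed, which is the case the current change handles by excluding it immediately.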
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]