symious commented on code in PR #3272:
URL: https://github.com/apache/ozone/pull/3272#discussion_r846599820
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/balancer/ContainerBalancer.java:
##########
@@ -462,49 +462,80 @@ private void checkIterationResults(boolean
isMoveGenerated,
private void checkIterationMoveResults(Set<DatanodeDetails> selectedTargets)
{
this.countDatanodesInvolvedPerIteration = 0;
this.sizeMovedPerIteration = 0;
- for (Map.Entry<ContainerMoveSelection,
- CompletableFuture<ReplicationManager.MoveResult>>
- futureEntry : moveSelectionToFutureMap.entrySet()) {
- ContainerMoveSelection moveSelection = futureEntry.getKey();
- CompletableFuture<ReplicationManager.MoveResult> future =
- futureEntry.getValue();
+
+ CompletableFuture<Void> allFuturesResult = CompletableFuture.allOf(
+ moveSelectionToFutureMap.values()
+ .toArray(new CompletableFuture[moveSelectionToFutureMap.size()]));
+ try {
+ allFuturesResult.get(config.getMoveTimeout().toMillis(),
+ TimeUnit.MILLISECONDS);
+ } catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ } catch (Exception e) {
+ // For exceptions other than InterruptedException, we ignore them.
+ e.printStackTrace();
+ }
+
+ List<ContainerMoveSelection> completeKeyList =
+ moveSelectionToFutureMap.keySet().stream()
+ .filter(key -> moveSelectionToFutureMap.get(key).isDone())
+ .collect(Collectors.toList());
+
+ // Handle completed futures
+ for (ContainerMoveSelection moveSelection : completeKeyList) {
try {
- ReplicationManager.MoveResult result = future.get(
- config.getMoveTimeout().toMillis(), TimeUnit.MILLISECONDS);
- if (result == ReplicationManager.MoveResult.COMPLETED) {
+ CompletableFuture<ReplicationManager.MoveResult> future =
+ moveSelectionToFutureMap.get(moveSelection);
+ if (future.isCompletedExceptionally()) {
try {
- ContainerInfo container =
- containerManager.getContainer(moveSelection.getContainerID());
- this.sizeMovedPerIteration += container.getUsedBytes();
- metrics.incrementNumContainerMovesInLatestIteration(1);
- LOG.info("Container move completed for container {} to target {}",
- container.containerID(),
- moveSelection.getTargetNode().getUuidString());
- } catch (ContainerNotFoundException e) {
- LOG.warn("Could not find Container {} while " +
- "checking move results in ContainerBalancer",
- moveSelection.getContainerID(), e);
+ future.join();
+ } catch (CompletionException ex) {
+ LOG.info("Container move for container {} to target {} " +
+ "failed with exceptions",
+ moveSelection.getContainerID().toString(),
+ moveSelection.getTargetNode().getUuidString(),
+ ex.getCause());
}
} else {
- if (LOG.isDebugEnabled()) {
- LOG.debug("Container move for container {} to target {} failed:
{}",
- moveSelection.getContainerID(),
- moveSelection.getTargetNode().getUuidString(), result);
+ ReplicationManager.MoveResult result = future.join();
+ if (result == ReplicationManager.MoveResult.COMPLETED) {
+ try {
Review Comment:
@lokeshj1703 Thanks for the review.
There is a problem if added the post process of futures in whenComplete. The
moves after timeout can not be cancelled, so the move will keep running until
finished.
As the TestContainerBalancer.checkIterationResultTimeout shows in
https://github.com/symious/ozone/tree/HDDS-6553-
2, the metrics of numContainerMovesInLatestIteration will always be larger
than 1, which are incremented by the timeouted futures.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]