Gargi-jais11 commented on code in PR #10593:
URL: https://github.com/apache/ozone/pull/10593#discussion_r3471873569
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java:
##########
@@ -601,21 +601,28 @@ public BackgroundTaskResult call() {
pauseInjector();
// Mark old container as DELETED and persist state.
// markContainerForDelete require writeLock, so release readLock first
- moveSucceeded = true;
container.readUnlock();
readLockReleased = true;
try {
container.markContainerForDelete();
+ moveSucceeded = true;
+ balancedBytesInLastWindow.addAndGet(containerSize);
+ metrics.incrSuccessBytes(containerSize);
+ totalBalancedBytes.addAndGet(containerSize);
} catch (Throwable e) {
- // mark container for deleted failure will not fail the whole
process, it will leave both old and new replica
- // on disk, while new container in the ContainerSet.
- LOG.warn("Failed to mark the old container {} for delete. " +
- "It will be handled after DN restart.", containerId, e);
+ LOG.warn("Failed to mark the old container {} for delete. Rolling
back move.",
+ containerId, e);
+ ozoneContainer.getContainerSet().updateContainer(container);
+ destVolume.decrementUsedSpace(containerSize);
+ if (diskBalancerDestDir != null) {
+ try {
+ FileUtils.deleteDirectory(diskBalancerDestDir.toFile());
+ } catch (IOException ex) {
+ LOG.warn("Failed to delete destination replica during rollback
for container {}",
Review Comment:
By the time `markContainerForDelete()` runs, copy/import/ContainerSet update
are already done. Just deleting the new destination container is not a roll
back. As ContainerSet is already pointing to new container on destination disk.
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java:
##########
@@ -601,21 +601,28 @@ public BackgroundTaskResult call() {
pauseInjector();
// Mark old container as DELETED and persist state.
// markContainerForDelete require writeLock, so release readLock first
- moveSucceeded = true;
container.readUnlock();
readLockReleased = true;
try {
container.markContainerForDelete();
+ moveSucceeded = true;
Review Comment:
By the time `markContainerForDelete()` runs, the expensive part is already
done:
- container copied to destination
- import completed
- ContainerSet updated to the new replica
- destination used space incremented
If mark fails and we roll back, we would:
r- restore ContainerSet to the source replica
- delete the destination directory
- revert volume accounting
- report the move as failed
That means we throw away a valid destination copy and redo the whole move
later. For large containers, that is a lot of wasted I/O.
**Why the current behavior is acceptable?**
The move and cleanup are intentionally separate:
**Move phase —** copy + import + ContainerSet update
**Cleanup phase —** mark/delete source replica (with lazy deletion for
in-flight reads)
If phase 1 succeeds, the container is effectively moved. Source cleanup is a
follow-up step.
Also, even when mark fails:
- the old replica is still queued in `pendingDeletionContainers`
- deleteContainer() → removeContainer() does not require DELETED state
- for Ratis, HDDS-9322 cleans up duplicates on DN restart
- So this is mostly an operational/accounting issue, not a data-loss issue
for Ratis.
cc: @ChenSammi
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/DiskBalancerService.java:
##########
@@ -601,21 +601,28 @@ public BackgroundTaskResult call() {
pauseInjector();
// Mark old container as DELETED and persist state.
// markContainerForDelete require writeLock, so release readLock first
- moveSucceeded = true;
container.readUnlock();
readLockReleased = true;
try {
container.markContainerForDelete();
+ moveSucceeded = true;
Review Comment:
@arunsarin85
I don’t think we should fail and fully roll back a completed move just
because source cleanup failed. That adds heavy work and can make DiskBalancer
less effective. Existing lazy deletion + dn restart already cover most of the
cleanup path for Ratis. For EC we issue can be there as the Pr is not merged
yet.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]