arunsarin85 opened a new pull request, #10593:
URL: https://github.com/apache/ozone/pull/10593
## What changes were proposed in this pull request?
When markContainerForDelete() fails after a container has been copied to the
destination volume, treat the move as a failure instead of success.
Restore ContainerSet to the source replica, revert destination volume
accounting, delete the destination replica directory, and do not queue the
source replica for delayed deletion. Add a regression test.
Please describe your PR in detail:
Bug: DiskBalancer reported a successful move even when `markContainerForDelete()` failed on the source replica.

Fix: On mark failure, the move is rolled back and counted as a failure.
| Before (bug) | After (fix) |
| --- | --- |
| `moveSucceeded = true` set before calling `markContainerForDelete()` | `moveSucceeded = true` set only after the mark succeeds |
| Success metrics updated regardless of mark outcome | Success metrics updated only on full success |
| `ContainerSet` kept pointing at the destination replica | `ContainerSet` restored to the source replica |
| Destination volume used space left incremented | Destination volume used space decremented |
| Destination replica directory left on disk | Destination replica directory deleted |
| Source replica queued for delayed deletion | Source replica not queued |
| Log: "It will be handled after DN restart" | Log: "Rolling back move" |
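The ordering in the "After (fix)" column can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `MoveCommitSketch`, `SourceMarker`, `commitMove` and the map/set fields are hypothetical stand-ins for `ContainerSet`, volume accounting and on-disk replica directories, not Ozone classes or APIs. It shows `moveSucceeded` being set only after the mark call returns, and every side effect of the copy being reverted in the failure branch.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of the commit step of a DiskBalancer move.
 * None of these names are Ozone APIs; they only illustrate the ordering in the fix.
 */
final class MoveCommitSketch {

  /** Hypothetical stand-in for the source replica's markContainerForDelete(). */
  interface SourceMarker {
    void markContainerForDelete() throws IOException;
  }

  final Map<Long, String> containerSet = new HashMap<>();   // containerId -> serving volume
  final Map<String, Long> usedSpace = new HashMap<>();      // volume -> used bytes
  final Set<String> replicaDirs = new HashSet<>();          // "volume/containerId" on disk
  final Deque<String> delayedDeletion = new ArrayDeque<>(); // source dirs awaiting deletion
  int successCount;
  int failureCount;

  /** Called after the container has been copied to dst; returns whether the move succeeded. */
  boolean commitMove(long id, String src, String dst, long bytes, SourceMarker source) {
    String dstDir = dst + "/" + id;
    // The copy is complete: switch the container set and charge the destination volume.
    containerSet.put(id, dst);
    usedSpace.merge(dst, bytes, Long::sum);
    replicaDirs.add(dstDir);

    boolean moveSucceeded = false;
    try {
      source.markContainerForDelete();
      moveSucceeded = true; // set only after the source replica is marked
    } catch (IOException e) {
      System.err.println("Rolling back move of container " + id + ": " + e.getMessage());
      containerSet.put(id, src);               // restore the source replica
      usedSpace.merge(dst, -bytes, Long::sum); // revert destination accounting
      replicaDirs.remove(dstDir);              // delete the destination copy
    }

    if (moveSucceeded) {
      delayedDeletion.add(src + "/" + id); // queue the source only on full success
      successCount++;
    } else {
      failureCount++;
    }
    return moveSucceeded;
  }

  public static void main(String[] args) {
    MoveCommitSketch s = new MoveCommitSketch();
    s.containerSet.put(1L, "vol-src");
    s.replicaDirs.add("vol-src/1");
    boolean ok = s.commitMove(1L, "vol-src", "vol-dst", 100L,
        () -> { throw new IOException("simulated mark failure"); });
    // prints "false vol-src 0 1"
    System.out.println(ok + " " + s.containerSet.get(1L) + " "
        + s.usedSpace.get("vol-dst") + " " + s.failureCount);
  }
}
```

Keeping the success flag and the success-only side effects (delayed deletion, success metrics) after the `try` block means there is a single place where the outcome is decided, which is the property the table describes.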
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15651
## How was this patch tested?
Regression test `TestDiskBalancerTask.moveSucceedsDespiteMarkContainerForDeleteFailure` (HDDS-15651):

- Creates a CLOSED container on the source volume
- Sets `replicaDeletionDelay = 60_000` ms so that delayed deletion does not hide duplicate-replica bugs
- Spies on the source `KeyValueContainer` and makes `markContainerForDelete()` throw
- Runs `DiskBalancerTask.call()`
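One reading of why the test uses a long `replicaDeletionDelay` can be shown with a toy model. This is a hypothetical sketch, not Ozone code: `DelayedDeletionSketch`, `queue`, `purge` and `replicasAfterBuggyMove` are invented names, and it assumes a purge pass may run before the test inspects the disk. Under the pre-fix behaviour the destination copy is kept and the source is queued; with no delay the source disappears and a single replica remains, which looks healthy, while with a 60 s delay both replicas are still present and the duplicate is observable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/** Hypothetical model of delayed replica deletion; not Ozone code. */
final class DelayedDeletionSketch {

  /** A replica directory queued for deletion at a given time. */
  static final class Pending {
    final String dir;
    final long enqueuedAtMs;

    Pending(String dir, long enqueuedAtMs) {
      this.dir = dir;
      this.enqueuedAtMs = enqueuedAtMs;
    }
  }

  final Set<String> replicaDirs = new HashSet<>();
  final List<Pending> pending = new ArrayList<>();
  final long replicaDeletionDelayMs;

  DelayedDeletionSketch(long replicaDeletionDelayMs) {
    this.replicaDeletionDelayMs = replicaDeletionDelayMs;
  }

  void queue(String dir, long nowMs) {
    pending.add(new Pending(dir, nowMs));
  }

  /** Deletes every queued replica whose delay has elapsed by nowMs. */
  void purge(long nowMs) {
    Iterator<Pending> it = pending.iterator();
    while (it.hasNext()) {
      Pending p = it.next();
      if (nowMs - p.enqueuedAtMs >= replicaDeletionDelayMs) {
        replicaDirs.remove(p.dir);
        it.remove();
      }
    }
  }

  /**
   * Replays the pre-fix outcome: destination copy kept and source queued even
   * though marking failed, then counts replicas when the test inspects the disk.
   */
  static int replicasAfterBuggyMove(long delayMs) {
    DelayedDeletionSketch s = new DelayedDeletionSketch(delayMs);
    s.replicaDirs.add("vol-src/1");
    s.replicaDirs.add("vol-dst/1"); // copy to the destination already completed
    s.queue("vol-src/1", 0L);
    s.purge(0L); // assumption: a purge pass runs before the assertions
    return s.replicaDirs.size();
  }

  public static void main(String[] args) {
    System.out.println(replicasAfterBuggyMove(0L));      // 1: duplicate hidden
    System.out.println(replicasAfterBuggyMove(60_000L)); // 2: duplicate visible
  }
}
```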
[repro_HDDS_markContainerForDelete_BEFORE_fix.log](https://github.com/user-attachments/files/29266518/repro_HDDS_markContainerForDelete_BEFORE_fix.log)
[repro_HDDS_markContainerForDelete_AFTER_fix.log](https://github.com/user-attachments/files/29266552/repro_HDDS_markContainerForDelete_AFTER_fix.log)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.