adoroszlai opened a new pull request, #4810:
URL: https://github.com/apache/ozone/pull/4810
## What changes were proposed in this pull request?
EC offline reconstruction has two behaviors that can conflict:
* it tries to exclude from the targets any datanodes that are currently
overloaded with replication commands
* it attempts partial reconstruction if full reconstruction is not possible
because there are too few targets
Together these may result in several successive partial reconstructions,
wasting resources.
This change lets SCM defer reconstruction if:
* only partial reconstruction is currently possible AND
* full reconstruction would be possible using overloaded nodes AND
* at least one more replica may be lost before the container becomes
unrecoverable (i.e. recovery is not yet critical)
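
The three conditions above can be sketched as a single predicate. This is only an illustration, not the code added by this PR: the class `DeferralSketch`, the method `shouldDefer` and all of its parameters are hypothetical names, and it assumes that "critical" means the number of missing replicas equals the number of parity replicas.

```java
public class DeferralSketch {

  /**
   * Hypothetical helper, not part of Ozone.
   *
   * @param parity            parity count of the EC scheme (3 for EC(6,3))
   * @param missing           number of replicas currently missing
   * @param usableTargets     candidate targets that are not overloaded
   * @param overloadedTargets candidate targets excluded only due to load
   */
  public static boolean shouldDefer(int parity, int missing,
      int usableTargets, int overloadedTargets) {
    // Fewer non-overloaded targets than missing replicas:
    // at best a partial reconstruction is possible right now.
    boolean partialOnly = usableTargets < missing;
    // Full reconstruction would work if overloaded nodes were allowed.
    boolean fullWithOverloaded = usableTargets + overloadedTargets >= missing;
    // Assumption: losing one more replica must still leave the container
    // recoverable, i.e. missing + 1 <= parity.
    boolean notCritical = missing < parity;
    return partialOnly && fullWithOverloaded && notCritical;
  }

  public static void main(String[] args) {
    // EC(6,3), 2 missing, 1 usable target, 1 overloaded target
    System.out.println(shouldDefer(3, 2, 1, 1));
    // EC(6,3), 3 missing: recovery is critical, so do not defer
    System.out.println(shouldDefer(3, 3, 1, 2));
  }
}
```

Under these assumptions, deferral is chosen only when waiting is both safe (the container can survive another loss) and likely to pay off (overloaded nodes would make a full reconstruction possible once they free up).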
The improvement applies to EC(6,3) and EC(10,4) only. EC(3,2) can have only
1 or 2 replicas missing while still being recoverable: with 1 missing replica
recovery is never partial, and with 2 missing replicas recovery is "critical".
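
The arithmetic behind this can be checked with a short sketch. It is hypothetical (the class `EligibleMissingCounts` and the method `eligible` do not exist in Ozone) and assumes that partial reconstruction needs at least 2 missing replicas, and that recovery is non-critical only while fewer replicas are missing than the scheme has parity.

```java
import java.util.ArrayList;
import java.util.List;

public class EligibleMissingCounts {

  /**
   * Hypothetical helper: lists the missing-replica counts for which
   * deferral could apply, given the parity count of the EC scheme.
   */
  public static List<Integer> eligible(int parity) {
    List<Integer> result = new ArrayList<>();
    // missing >= 2: otherwise reconstruction cannot be partial.
    // missing < parity: otherwise recovery is already critical.
    for (int missing = 2; missing < parity; missing++) {
      result.add(missing);
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("EC(3,2):  " + eligible(2));
    System.out.println("EC(6,3):  " + eligible(3));
    System.out.println("EC(10,4): " + eligible(4));
  }
}
```

With these assumptions the range is empty for EC(3,2), contains only 2 for EC(6,3), and contains 2 and 3 for EC(10,4).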
https://issues.apache.org/jira/browse/HDDS-8727
## How was this patch tested?
Added unit test.
https://github.com/adoroszlai/hadoop-ozone/actions/runs/5134479384