[
https://issues.apache.org/jira/browse/HDDS-15135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HDDS-15135:
----------------------------------
Labels: pull-request-available (was: )
> DN replication/reconstruction should retry another volume when selected
> target volume already has same container directory
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: HDDS-15135
> URL: https://issues.apache.org/jira/browse/HDDS-15135
> Project: Apache Ozone
> Issue Type: Task
> Components: Ozone Datanode, SCM
> Reporter: Devesh Kumar Singh
> Assignee: Devesh Kumar Singh
> Priority: Major
> Labels: pull-request-available
>
> During DN-side replication or EC reconstruction, the target DN may have a
> stale on-disk directory for the same container ID on one volume while the
> container is absent from ContainerSet. Since the container is not in
> ContainerSet, SCM can still select the DN as a recovery target.
>
> Today, if the selected target volume already contains that container
> directory, the create/import path can fail the operation instead of trying
> another available volume. Rejecting the whole DN would be too aggressive,
> because duplicate or stale container directories can also result from volume
> failures, failed deletes, or startup edge cases. Blocking the DN globally can
> leave containers under-replicated if no other target nodes are available.
>
> The DN should remain eligible as a target but avoid writing into the
> conflicting volume. If the chosen volume already has a directory for the same
> container, the DN should skip that volume and retry on another suitable
> volume. If all candidate volumes are exhausted, the operation should fail
> with a clear error.
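
The skip-and-retry behaviour described above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
Ozone implementation: the class name VolumeRetrySketch, the methods
containerDir and chooseVolume, and the flat <volumeRoot>/<containerId>
directory layout are all hypothetical and do not correspond to Ozone's
actual volume-choosing code or on-disk layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and layout are assumptions, not Ozone APIs.
final class VolumeRetrySketch {

  private VolumeRetrySketch() { }

  // Assumed layout for illustration only: <volumeRoot>/<containerId>.
  static Path containerDir(Path volumeRoot, long containerId) {
    return volumeRoot.resolve(Long.toString(containerId));
  }

  // Returns the first candidate volume that has no directory for the
  // container. Volumes with a conflicting directory are skipped, not
  // treated as a failure of the whole node.
  static Path chooseVolume(List<Path> candidates, long containerId)
      throws IOException {
    List<Path> skipped = new ArrayList<>();
    for (Path volume : candidates) {
      if (Files.exists(containerDir(volume, containerId))) {
        skipped.add(volume);   // conflicting directory: try the next volume
        continue;
      }
      return volume;
    }
    // All candidates exhausted: fail with an error that names the cause.
    throw new IOException("No usable volume for container " + containerId
        + ": a directory for this container already exists on "
        + skipped.size() + " candidate volume(s): " + skipped);
  }
}
```

In this sketch the caller passes the candidate volumes in preference
order, so a conflict on the first choice only costs one extra directory
check rather than failing the replication or reconstruction command.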