Devesh Kumar Singh created HDDS-15135:
-----------------------------------------
Summary: DN replication/reconstruction should retry another volume
when selected target volume already has same container directory
Key: HDDS-15135
URL: https://issues.apache.org/jira/browse/HDDS-15135
Project: Apache Ozone
Issue Type: Task
Components: Ozone Datanode, SCM
Reporter: Devesh Kumar Singh
Assignee: Devesh Kumar Singh
During DN-side replication or EC reconstruction, the target DN may have a stale
on-disk directory for the same container ID on one volume while the container
is absent from ContainerSet. In this case, SCM can still select the DN as a
recovery target.
Today, if the selected target volume already contains that container directory,
the create/import path can fail the operation instead of trying another
available volume. Rejecting the whole DN would be too aggressive, because
duplicate or stale container directories can also result from volume failures,
failed deletes, or startup edge cases. Blocking the DN globally can leave
containers under-replicated if no other target nodes are available.
The DN should remain an eligible target but avoid writing to the conflicting
volume. If the chosen volume already has a directory for the same container,
the DN should skip that volume and retry on another suitable volume. If all
candidate volumes are exhausted, the operation should fail with a clear error.
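
The proposed behavior could be sketched roughly as follows. This is only an
illustration of the skip-and-retry idea, not the actual Ozone code: the class
VolumeRetrySketch, the method chooseVolume, and the hasContainerDir predicate
are hypothetical names, and volumes are modeled as plain path strings rather
than Ozone volume objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch; none of these names are Ozone APIs. */
final class VolumeRetrySketch {
  private VolumeRetrySketch() { }

  /**
   * Returns the first candidate volume that does not already hold a
   * directory for the given container ID.
   *
   * @param candidates      volumes in the order the chooser would try them
   * @param containerId     ID of the container being replicated or rebuilt
   * @param hasContainerDir assumed check: true if the volume already has an
   *                        on-disk directory for this container ID
   * @throws IllegalStateException if every candidate volume conflicts
   */
  static String chooseVolume(List<String> candidates, long containerId,
      BiPredicate<String, Long> hasContainerDir) {
    List<String> skipped = new ArrayList<>();
    for (String volume : candidates) {
      if (hasContainerDir.test(volume, containerId)) {
        // Conflicting directory on this volume: skip it, keep the DN eligible.
        skipped.add(volume);
        continue;
      }
      return volume;
    }
    // All candidates exhausted: fail with an error that names the conflicts.
    throw new IllegalStateException("No usable volume for container "
        + containerId + "; conflicting directory found on: " + skipped);
  }
}
```

With candidates /data1, /data2, /data3 and a stale directory only on /data1,
this sketch would pick /data2. If every candidate conflicts, it throws and
lists the skipped volumes in the message.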
--
This message was sent by Atlassian Jira
(v8.20.10#820010)