Devesh Kumar Singh created HDDS-15135:
-----------------------------------------

             Summary: DN replication/reconstruction should retry another volume when selected target volume already has same container directory
                 Key: HDDS-15135
                 URL: https://issues.apache.org/jira/browse/HDDS-15135
             Project: Apache Ozone
          Issue Type: Task
          Components: Ozone Datanode, SCM
            Reporter: Devesh Kumar Singh
            Assignee: Devesh Kumar Singh


During DN-side replication or EC reconstruction, the target DN may have a stale 
on-disk directory for the same container ID on one volume while the container 
is absent from ContainerSet. In this case, SCM can still select that DN as a 
recovery target.

Today, if the selected target volume already contains that container directory, 
the create/import path can fail the operation instead of trying another 
available volume. Rejecting the whole DN would be too aggressive, because 
duplicate/stale container directories can also happen due to volume failures, 
failed deletes, or startup edge cases. Blocking the DN globally can leave 
containers under-replicated if no other target nodes are available.

The DN should remain an eligible target but avoid writing into the conflicting 
volume. If the chosen volume already has a directory for the same container, 
the DN should skip that volume and retry another suitable volume. If all 
candidate volumes are exhausted, the operation should fail with a clear error.
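A minimal sketch of the proposed skip-and-retry behavior follows. It assumes a 
simplified layout where each volume stores a container under 
<volume root>/<containerId>. ConflictAwareVolumeChooser, Volume and 
NoConflictFreeVolumeException are hypothetical names for illustration, not 
Ozone's actual classes. The real datanode's volume-choosing policy and 
container layout are not modeled here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: pick a target volume for an incoming container and
 * skip any volume that already holds a directory for the same container ID.
 * Not Ozone's real API; names and layout are assumptions.
 */
public final class ConflictAwareVolumeChooser {

  /** Hypothetical stand-in for a datanode volume: a name plus a root directory. */
  public record Volume(String name, Path root) {
    /** Assumed layout <root>/<containerId>; the real Ozone layout differs. */
    Path containerDir(long containerId) {
      return root.resolve(Long.toString(containerId));
    }
  }

  /** Thrown when every candidate volume already has a directory for the container. */
  public static final class NoConflictFreeVolumeException extends IOException {
    NoConflictFreeVolumeException(String msg) {
      super(msg);
    }
  }

  private ConflictAwareVolumeChooser() {
  }

  /**
   * Returns the first candidate, in the caller's preferred order, that does
   * not already contain a directory for containerId.
   */
  public static Volume choose(List<Volume> candidates, long containerId)
      throws NoConflictFreeVolumeException {
    List<String> skipped = new ArrayList<>();
    for (Volume v : candidates) {
      if (Files.exists(v.containerDir(containerId))) {
        // Stale or duplicate directory: skip only this volume, the DN stays eligible.
        skipped.add(v.name());
        continue;
      }
      return v;
    }
    // All candidates exhausted: fail with an error naming the conflicting volumes.
    throw new NoConflictFreeVolumeException(
        "No volume available for container " + containerId
            + "; existing container directory found on: " + skipped);
  }
}
```

Iterating in the caller's order keeps whatever preference the volume policy 
already produced. Skipped volumes are collected so the final error says which 
volumes conflicted. The existence check is check-then-act, so a real 
implementation would probably also treat a failed directory creation (for 
example Files.createDirectory throwing FileAlreadyExistsException) as a 
conflict and move on to the next volume.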



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
