devmadhuu commented on PR #10161:
URL: https://github.com/apache/ozone/pull/10161#issuecomment-4386043605

   > Hi @devmadhuu, do we see container replication eventually fail in real 
cases due to this? I'm not sure we need this, as it will create more replicas 
of the same container on a single datanode. Is the current duplicate replica 
deletion logic ready to handle this? If we are sure the existing container 
directory is stale, can we just delete it and continue the import?
   
   @ChenSammi Thanks for your review. This came up while discussing a use 
case. We don't have good conflict resolution for Ratis or EC when a DN ends up 
with multiple copies, but a DN can also end up in this situation due to 
volume failures. We need to prioritize moving to a safer state, which means 
allowing replication to pass. We do not want to end up in a situation where 
containers are perpetually under-replicated because no target nodes are 
valid.
   
   Since replication failure is not the only way DNs can end up with duplicate 
replicas, we should allow replication and just pick a different volume. Better 
duplicate replica handling on startup may require internal reconciliation 
within the DN, but that can be done later.
   
   cc: @errose28 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

