slfan1989 commented on PR #7111:
URL: https://github.com/apache/ozone/pull/7111#issuecomment-2327694266

   > Hi @slfan1989 this is being developed as part of the container 
reconciliation feature in 
[HDDS-10239](https://issues.apache.org/jira/browse/HDDS-10239). This feature 
provides two high level functionalities for containers:
   > 
   > 1. The ability to report their contents to SCM via a container level hash 
which can be compared to other replicas.
   > 2. The ability to "reconcile" a container replica with its peers when that 
hash differs. This means making incremental updates to a container based on 
data a peer node has that the current node may be missing or have lost.
   > 
   > The current design document can be found 
[here](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md).
 In particular you can refer to the section on [phases of 
implementation](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md#phase-i-outlined-in-this-document).
 We are currently implementing phase 1, which only applies to Ratis containers. 
Support for EC containers is in phase 3, which we have not planned for yet. 
This is because EC already has a reconciliation algorithm as described in (2) 
above, which is reconstruction.
   > 
   > > For 3-replica blocks, if we find that a block write operation has an 
issue, we can repair it using the other replicas.
   > 
   > So in this case, the fix should be made in the reconstruction code path, 
since that is an existing way to repair EC containers after they have been 
closed.
   > 
   > > However, for EC blocks, it becomes more challenging to determine the 
true length of the block.
   > 
   > EC and Ratis differ here. In Ratis the longest block length wins, because 
we have a quorum on the server side to commit the last write. In EC, the 
shortest block wins because it is up to the client to make sure all datanode 
replicas have committed the last issued write before the client commits that 
length back to the OM. If only a few datanodes commit, that stripe is invalid 
and not committed back to OM.
   
   @errose28 Thank you very much for your response! The content is very 
thorough and complete.
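
   For reference, the block length rule quoted above could be sketched roughly as follows. This is only an illustration of the stated rule, not Ozone code: `ReplicationType` and `resolve_block_length` are hypothetical names, and the sketch assumes each replica reports a directly comparable committed length, which glosses over how EC stripes data across datanodes.

```python
from enum import Enum
from typing import Sequence


class ReplicationType(Enum):
    """Hypothetical stand-in for the two replication schemes discussed."""
    RATIS = "RATIS"
    EC = "EC"


def resolve_block_length(replication: ReplicationType,
                         replica_lengths: Sequence[int]) -> int:
    """Pick which reported block length to trust.

    Assumption: replica_lengths holds one comparable committed length
    per replica. Real EC replicas hold different parts of each stripe.
    """
    if not replica_lengths:
        raise ValueError("at least one replica length is required")
    if replication is ReplicationType.RATIS:
        # Ratis: a server-side quorum committed the last write,
        # so the longest length is the valid one.
        return max(replica_lengths)
    # EC: the client must see every datanode commit before it commits
    # the length to OM, so only the shortest length is safe.
    return min(replica_lengths)
```

   With reported lengths of 10, 30 and 20, this sketch picks 30 for Ratis and 10 for EC.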


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

