slfan1989 commented on PR #7111: URL: https://github.com/apache/ozone/pull/7111#issuecomment-2327694266
> Hi @slfan1989 this is being developed as part of the container reconciliation feature in [HDDS-10239](https://issues.apache.org/jira/browse/HDDS-10239). This feature provides two high-level functionalities for containers:
>
> 1. The ability to report their contents to SCM via a container-level hash which can be compared to other replicas.
> 2. The ability to "reconcile" a container replica with its peers when that hash differs. This means making incremental updates to a container based on data a peer node has that the current node may be missing or have lost.
>
> The current design document can be found [here](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md). In particular you can refer to the section on [phases of implementation](https://github.com/apache/ozone/blob/HDDS-10239-container-reconciliation/hadoop-hdds/docs/content/design/container-reconciliation.md#phase-i-outlined-in-this-document). We are currently implementing phase 1, which only applies to Ratis containers. Support for EC containers is in phase 3, which we have not planned for yet. This is because EC already has a reconciliation algorithm as described in (2) above, which is reconstruction.
>
> > For 3-replica blocks, if we find that a block write operation has an issue, we can repair it using the other replicas.
>
> So in this case, the fix should be made in the reconstruction code path, since that is an existing way to repair EC containers after they have been closed.
>
> > However, for EC blocks, it becomes more challenging to determine the true length of the block.
>
> EC and Ratis differ here. In Ratis the longest block length wins, because we have a quorum on the server side to commit the last write. In EC, the shortest block wins because it is up to the client to make sure all datanode replicas have committed the last issued write before the client commits that length back to the OM. If only a few datanodes commit, that stripe is invalid and not committed back to OM.

@errose28 Thank you very much for your response! The content is very thorough and complete.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
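For readers following the thread, the block-length rule described in the quoted reply (longest length wins for Ratis, shortest wins for EC) can be sketched as below. This is only an illustration under stated assumptions: `BlockLengthRule`, `ReplicationType`, and `resolveLength` are hypothetical names, not Ozone classes or APIs, and the sketch assumes each replica reports a single directly comparable length for the block.

```java
import java.util.List;

// Hypothetical illustration only; these are not Ozone's actual classes or APIs.
public class BlockLengthRule {

    enum ReplicationType { RATIS, EC }

    // Picks the authoritative block length from the lengths reported by replicas.
    static long resolveLength(ReplicationType type, List<Long> replicaLengths) {
        if (replicaLengths.isEmpty()) {
            throw new IllegalArgumentException("no replica lengths reported");
        }
        long result = replicaLengths.get(0);
        for (long len : replicaLengths) {
            result = (type == ReplicationType.RATIS)
                // Ratis: a server-side quorum committed the last write,
                // so the longest length is the committed one.
                ? Math.max(result, len)
                // EC: the client commits a length to OM only after all replicas
                // committed the write, so the shortest length is the safe one.
                : Math.min(result, len);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> lengths = List.of(4096L, 8192L, 8192L);
        System.out.println(resolveLength(ReplicationType.RATIS, lengths)); // prints 8192
        System.out.println(resolveLength(ReplicationType.EC, lengths));    // prints 4096
    }
}
```

With the same three reported lengths, the two replication types resolve to different values, which is why the reply treats EC repair as a separate code path (reconstruction) rather than part of phase 1.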
