[ 
https://issues.apache.org/jira/browse/HDDS-10239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDDS-10239:
-----------------------------------
    Fix Version/s: 2.1.0

> Storage Container Reconciliation
> --------------------------------
>
>                 Key: HDDS-10239
>                 URL: https://issues.apache.org/jira/browse/HDDS-10239
>             Project: Apache Ozone
>          Issue Type: New Feature
>          Components: Ozone Datanode, SCM
>            Reporter: Ethan Rose
>            Assignee: Ethan Rose
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.0
>
>
> Ideally, a healthy Ozone cluster would contain only open and closed 
> containers. However, container replicas commonly end up with a mix of states 
> including quasi-closed and unhealthy that the current system is not able to 
> resolve to cleanly closed replicas. The cause of these states is often bugs 
> or broad failure handling on the write path. While we should fix these 
> causes, they raise the problem that Ozone is not able to reconcile these 
> mismatched container states on its own, regardless of their cause. This has 
> led to significant complexity in the replication manager for how to handle 
> cases where only quasi-closed and unhealthy replicas are available, 
> especially in the case of decommissioning.
> Even when all replicas are closed, the system assumes that these closed 
> container replicas are equal with no way to verify this. Checksumming is done 
> for individual chunks within each container, but if two container replicas 
> somehow end up with chunks that differ in length or content despite being 
> marked closed with local checksums matching, the system has no way to detect 
> or resolve this anomaly.
> This Jira proposes a container reconciliation protocol to solve these 
> problems. After implementing the proposal:
> 1. It should be possible for a cluster to progress to a state where it has 
> only properly replicated closed and open containers.
> 2. We can verify the equality and integrity of all closed containers.
> The design doc is linked here as a markdown pull request for inline comments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

