[ https://issues.apache.org/jira/browse/HDDS-9600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Rose updated HDDS-9600:
-----------------------------
Attachment: Handling empty missing containers in ozone.pdf
> Clear out empty containers that are never created
> -------------------------------------------------
>
> Key: HDDS-9600
> URL: https://issues.apache.org/jira/browse/HDDS-9600
> Project: Apache Ozone
> Issue Type: Bug
> Reporter: Ethan Rose
> Priority: Major
> Attachments: Handling empty missing containers in ozone.pdf
>
>
> HDDS-9550 documented a case where containers can be created on SCM, but
> replicas are never created on datanodes, so the containers are then tracked
> as missing even though there is no data in them. Since it is tricky for SCM
> to determine whether these containers are actually empty, pull request 5523
> implemented a solution that keeps tracking the containers in SCM but reports
> them as empty instead of missing.
> In this Jira, I propose a solution that is a bit more involved, but should
> provide a path for these containers to be cleared from the system safely:
> - When SCM first creates the container, it knows the datanode replicas that
> are supposed to have the container. It should track this information until it
> gets reports that the container is created, even after the pipeline is closed.
> - When the pipeline is either closed gracefully by SCM or fails on the
> datanode, SCM should send close commands for all affected containers,
> including these empty ones.
> - When a datanode gets a close container command for a container it does not
> have, it can ack back to the SCM that the container is closed with BCSID=0,
> block count=0, empty, etc. If the container has data then the normal
> container flow still applies.
> - If the container was never created, SCM will now see it as empty and can
> then move it through the regular close and delete flow. A datanode that
> receives a delete command for a container it does not have should accept it
> without error.
> With this approach, we can reuse the normal delete flow and safely clear the
> containers out of the system, because SCM only deletes them after one round
> of back and forth with the datanodes confirms that they are empty.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)