[ https://issues.apache.org/jira/browse/HDDS-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119653#comment-17119653 ]
Istvan Fajth commented on HDDS-2939:
------------------------------------

Hi [~sdeka] and [~rakeshr],

I have gone through the doc, and one question arose while reading about deletion. It seems that deletion still won't be atomic for directories: the doc says deletes will be client driven, and a directory cannot be removed while there are still entries in it.

This gave me an idea: what if we introduce a reserved prefix for deleted entries, and implement a garbage collector background thread that removes them, just as with block deletion? That way we can defer the heavy work to a place where there is no need for locking, and we can change delete into a rename, which happens atomically. We can also ensure that there are no collisions or problems caused by removing a folder while, in parallel, a new file is created in it that would not be removed.

For example, if we reserve prefix id 0 for everything that was deleted, then we can implement delete as a rename to under prefix id 0. Everything moved there becomes unavailable immediately under any other prefix, provided we do not assign a path to this prefix, or its path is 0 bytes long. In these cases, if the path translation algorithm does not allow a path element to be 0 length, the prefixes and keys under this prefix will no longer be accessible. A background thread can then periodically clean up all the elements under this predefined prefix.

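To make the idea concrete, here is a minimal Python sketch of delete-as-rename plus background cleanup. It is a toy in-memory model under stated assumptions, not Ozone Manager code: the `Namespace` class, `ROOT_ID`, `gc_once`, and the dictionary layout are all hypothetical, and only the reserved prefix id 0 comes from the proposal above.

```python
import threading

ROOT_ID = 1   # hypothetical id of the namespace root
TRASH_ID = 0  # reserved prefix id for deleted entries, as proposed above


class Namespace:
    """Toy in-memory namespace: directories map child names to entry ids."""

    def __init__(self):
        self._lock = threading.Lock()
        self._parent = {ROOT_ID: None}   # entry id -> parent id
        self._children = {ROOT_ID: {}}   # directory id -> {name: child id}
        self._trash = set()              # ids renamed under TRASH_ID
        self._next_id = ROOT_ID + 1

    def _resolve(self, path):
        current = ROOT_ID
        for name in path.strip("/").split("/"):
            if not name:
                return None  # zero-length path elements never resolve
            current = self._children.get(current, {}).get(name)
            if current is None:
                return None
        return current

    def resolve(self, path):
        with self._lock:
            return self._resolve(path)

    def create(self, path, is_dir=False):
        parent_path, _, name = path.rstrip("/").rpartition("/")
        with self._lock:
            parent = self._resolve(parent_path) if parent_path else ROOT_ID
            if not name or parent not in self._children:
                raise ValueError(f"cannot create {path!r}")
            if name in self._children[parent]:
                raise FileExistsError(path)
            new_id = self._next_id
            self._next_id += 1
            self._parent[new_id] = parent
            self._children[parent][name] = new_id
            if is_dir:
                self._children[new_id] = {}
            return new_id

    def delete(self, path):
        """Atomic delete: one rename under TRASH_ID, whatever the subtree size."""
        with self._lock:
            entry = self._resolve(path)
            if entry is None:
                raise FileNotFoundError(path)
            name = path.rstrip("/").rpartition("/")[2]
            del self._children[self._parent[entry]][name]
            self._parent[entry] = TRASH_ID
            self._trash.add(entry)

    def gc_once(self):
        """One cleanup pass; returns how many entries were removed."""
        with self._lock:
            # Only the hand-over of the pending ids is synchronized.
            pending, self._trash = list(self._trash), set()
        removed = 0
        while pending:
            entry = pending.pop()
            # The subtree is unreachable by path, so it is walked by id.
            pending.extend(self._children.pop(entry, {}).values())
            if self._parent.pop(entry, None) is not None:
                removed += 1  # pop() with a default keeps removal idempotent
        return removed
```

In this sketch a deleted path can be recreated immediately, because name collisions under the reserved prefix are never checked. It does not model the orphan case, where a client that resolved a directory id before the rename creates a child under it afterwards.
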
Pros I see:
- atomic delete through rename
- we do not need extensive locking under the special prefix: we can allow any name collision there, because the contents of this prefix should not be accessible anyway, and the background deletion works on ids, which are not ambiguous
- we do not need locking during the actual key/prefix removals, as they do not affect other parts of the prefix/key space. Even with multiple background cleanup threads the operation can be seen as idempotent, and with just one background thread the operations are serial anyway

Cons I see:
- there might be a possibility of orphan prefixes/keys if we do not introduce locking here. So we will definitely need orphan cleanup in the background as well, probably on the same thread. Alternatively, as in the original proposal, we need to synchronize the renames to this special prefix with the deletions happening on the background thread, and fail prefix removal if there are new entries under the prefix.

I am not sure which one is easier or more beneficial: orphan detection, or synchronization?

Was this possibility considered? If so, why was the idea rejected in the end? If not, what do you think about this approach?

> Ozone FS namespace
> ------------------
>
> Key: HDDS-2939
> URL: https://issues.apache.org/jira/browse/HDDS-2939
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Manager
> Reporter: Supratim Deka
> Assignee: Rakesh Radhakrishnan
> Priority: Major
> Attachments: Ozone FS Namespace Proposal v1.0.docx
>
>
> Create the structures and metadata layout required to support efficient FS
> namespace operations in Ozone - operations involving folders/directories
> required to support the Hadoop compatible Filesystem interface.
> The details are described in the attached document. The work is divided up
> into sub-tasks as per the task list in the document.