[ 
https://issues.apache.org/jira/browse/HDDS-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119653#comment-17119653
 ] 

Istvan Fajth commented on HDDS-2939:
------------------------------------

Hi [~sdeka] and [~rakeshr],

I have gone through the doc, and one question arose while reading about 
deletion. It seems that deletion still won't be atomic for directories, as the 
doc says deletes will be client driven, and directories cannot be removed 
while there are entries in them.
This gave me an idea: what if we introduce a reserved prefix for deleted 
entries, and implement a background garbage collector thread that removes 
them, just as with block deletion? With this we can defer the heavy work to a 
place where there is no need for locking, and we can change delete into a 
rename, which happens atomically. It also ensures that there won't be 
collisions or problems from removing a folder while a new file, which the 
delete would miss, is created in it in parallel.

So for example, if we reserve prefix id 0 for everything that was deleted, 
then we can implement delete as a rename to under prefix id 0. With this, 
everything deleted becomes unavailable immediately under any other prefix, 
provided we do not assign a path to this prefix, or its path is 0 bytes long, 
for example. In these cases, if the path translation algorithm does not allow 
a path element to be 0 length, the prefixes and keys under this prefix are no 
longer reachable. A background thread can then periodically clean up all the 
elements under this reserved prefix.
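
To make the idea concrete, here is a minimal in-memory sketch of delete 
implemented as a rename under the reserved prefix. It is not Ozone Manager 
code: the Namespace class, ROOT_ID, and the id -> (parent id, name) table are 
hypothetical stand-ins for the prefix table described in the doc.

```python
TRASH_ID = 0  # reserved prefix id for deleted entries (assumption: 0)
ROOT_ID = 1   # hypothetical id of the bucket root


class Namespace:
    """Toy prefix table: entry id -> (parent id, name)."""

    def __init__(self):
        self.entries = {ROOT_ID: (None, "")}
        self.next_id = 2

    def child(self, parent_id, name):
        # Linear scan keeps the sketch short; a real table would be indexed.
        for entry_id, (pid, entry_name) in self.entries.items():
            if pid == parent_id and entry_name == name:
                return entry_id
        return None

    def create(self, parent_id, name):
        if not name:
            raise ValueError("path elements must not be empty")
        if self.child(parent_id, name) is not None:
            raise FileExistsError(name)
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = (parent_id, name)
        return entry_id

    def resolve(self, path):
        # Translation always starts at the root and rejects empty elements,
        # so nothing below TRASH_ID can ever be reached through a path.
        stripped = path.strip("/")
        current = ROOT_ID
        for element in stripped.split("/") if stripped else []:
            if not element:
                return None
            current = self.child(current, element)
            if current is None:
                return None
        return current

    def delete(self, path):
        entry_id = self.resolve(path)
        if entry_id is None or entry_id == ROOT_ID:
            return False
        _, name = self.entries[entry_id]
        # One record changes: the whole subtree becomes unreachable at once.
        self.entries[entry_id] = (TRASH_ID, name)
        return True
```

In this sketch, deleting /a/b only rewrites the record of b. Its children 
keep pointing at b's id, so they disappear from the visible namespace 
together with it, and a new /a/b can be created right away.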
Pros I see:
- atomic delete through rename
- we do not need extensive locking under the special prefix: we can allow any 
name collision there, because the contents of this prefix should not be 
accessible anyway, and the background deletion works on ids, which are 
unambiguous
- we do not need locking during the actual key/prefix removals, as they do not 
affect other parts of the prefix/key space. Even with multiple background 
cleanup threads the operation can be seen as idempotent, and with just one 
background thread the operations are serial anyway
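
The id-based cleanup could be sketched roughly as below. Again this is only 
an illustration under assumptions: the table is a plain dict of 
entry id -> (parent id, name), and children_of, purge, and gc_pass are 
hypothetical names, not existing Ozone functions.

```python
TRASH_ID = 0  # reserved prefix id for deleted entries (assumption: 0)


def children_of(entries, parent_id):
    """Ids of all entries whose parent is parent_id."""
    return [eid for eid, (pid, _name) in entries.items() if pid == parent_id]


def purge(entries, entry_id):
    """Remove entry_id and everything below it, leaves first, by id only."""
    for child_id in children_of(entries, entry_id):
        purge(entries, child_id)
    # pop with a default is a no-op if another cleaner already removed the id,
    # which is what makes repeating the operation harmless.
    entries.pop(entry_id, None)


def gc_pass(entries):
    """One background pass: purge every subtree parked under TRASH_ID."""
    deleted_roots = children_of(entries, TRASH_ID)
    for entry_id in deleted_roots:
        purge(entries, entry_id)
    return len(deleted_roots)
```

Names never take part in the cleanup, so two deleted directories with the 
same name under the reserved prefix do not conflict, and running a second 
pass over an already cleaned table changes nothing.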

Cons I see:
- we might end up with orphan prefixes/keys if we do not introduce locking 
here. So we either need orphan cleanup in the background as well, probably on 
the same thread, or, as in the original proposal, we need to synchronize the 
renames to this special prefix with the deletions happening on the background 
thread, and fail prefix removal if there are new entries under the prefix. I 
am not sure which one is easier/more beneficial... orphan detection, or 
synchronization?
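
For the orphan detection option, one possible check is to walk the parent 
ids of each entry and flag it if the chain hits an id that no longer exists. 
The sketch below assumes the same hypothetical entry id -> (parent id, name) 
table as a plain dict, and assumes the parent links contain no cycles.

```python
TRASH_ID = 0  # reserved prefix id for deleted entries (assumption: 0)


def is_orphan(entries, entry_id):
    """True if some ancestor id of entry_id is missing from the table."""
    parent_id = entries[entry_id][0]
    # Stop at the root (parent None) or at the reserved prefix: both are
    # valid ends of a chain. Assumes parent links never form a cycle.
    while parent_id is not None and parent_id != TRASH_ID:
        if parent_id not in entries:
            return True
        parent_id = entries[parent_id][0]
    return False


def find_orphans(entries):
    """Ids of all entries cut off from both the root and the trash prefix."""
    return {eid for eid in entries if is_orphan(entries, eid)}


def sweep_orphans(entries):
    """Remove all orphans; returns how many entries were dropped."""
    orphans = find_orphans(entries)
    for entry_id in orphans:
        entries.pop(entry_id, None)
    return len(orphans)
```

This covers the case where a file is created under a directory right after 
the background thread has listed its children: the directory's record is 
removed, and the late file is left pointing at a missing parent id until the 
sweep picks it up.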

Was this possibility considered? If so, why was the idea rejected in the end? 
If not, what do you think about this approach?

> Ozone FS namespace
> ------------------
>
>                 Key: HDDS-2939
>                 URL: https://issues.apache.org/jira/browse/HDDS-2939
>             Project: Hadoop Distributed Data Store
>          Issue Type: Improvement
>          Components: Ozone Manager
>            Reporter: Supratim Deka
>            Assignee: Rakesh Radhakrishnan
>            Priority: Major
>         Attachments: Ozone FS Namespace Proposal v1.0.docx
>
>
> Create the structures and metadata layout required to support efficient FS 
> namespace operations in Ozone - operations involving folders/directories 
> required to support the Hadoop compatible Filesystem interface.
> The details are described in the attached document. The work is divided up 
> into sub-tasks as per the task list in the document.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

