[ 
https://issues.apache.org/jira/browse/HADOOP-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715867#comment-14715867
 ] 

Andrew Wang commented on HADOOP-12358:
--------------------------------------

bq. If the client OOM because of deleting large directory, make it OOM upon 
getContentSummary can actually help avoiding an inconsistent (half completed) 
deletion states.

This leads into one of my favorite topics, which is how and why HDFS APIs 
differ from POSIX. POSIX gives you only unlink and rmdir, so a recursive 
{{rm}} has to crawl the directory tree, doing {{O(n)}} operations. HDFS 
instead implements recursive delete as a single RPC, so the client issues one 
operation. This is for performance: RPCs are expensive, and we want to avoid 
recursing on the client when doing a big delete. Most deletes are also 
intentional. So this patch greatly slows down the common case, even though we 
already have safety mechanisms like trash and snapshots in place, and it runs 
counter to the intent of the recursive delete RPC.
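
To make the operation counts concrete, here is a toy model in Python. It is 
only a sketch and not HDFS or POSIX code: the nested-dict tree and both 
function names are made up for illustration, and it simply counts how many 
client-to-server calls each delete strategy would issue.

```python
# Toy model: count client-to-server calls for two delete strategies.
# The tree is a nested dict; a dict is a directory, None is a file.
# All names here are hypothetical and exist only for this illustration.

def posix_style_delete_calls(tree):
    """Calls made by a client that crawls the tree itself: one readdir
    per directory, one unlink per file, one rmdir per directory."""
    calls = 1  # readdir on this directory
    for child in tree.values():
        if isinstance(child, dict):
            calls += posix_style_delete_calls(child)  # recurse into subdir
        else:
            calls += 1  # unlink the file
    calls += 1  # rmdir on this (now empty) directory
    return calls

def recursive_rpc_delete_calls(tree):
    """A single recursive-delete request: the server walks the tree,
    so the client issues one call regardless of the tree size."""
    return 1

tree = {
    "logs": {"a.log": None, "b.log": None},
    "data": {"part-0": None, "part-1": None, "tmp": {"x": None}},
    "README": None,
}

print(posix_style_delete_calls(tree))    # 4 dirs * 2 + 6 files = 14
print(recursive_rpc_delete_calls(tree))  # 1
```

The crawling client's call count grows with the number of entries, while the 
single-RPC count stays constant.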

The other API difference I like is how HDFS combines readdir and stat into 
listStatus, again to avoid extra RPCs.
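
The same counting argument applies to listing. The sketch below uses a 
made-up {{FakeServer}} class (not a real HDFS or POSIX interface) that counts 
calls, and it ignores any batching a real server might apply to very large 
listings. It compares readdir followed by one stat per entry against a single 
combined listing call.

```python
# Toy model: FakeServer and its methods are hypothetical stand-ins that
# only count how many calls a client makes.

class FakeServer:
    def __init__(self, entries):
        self.entries = dict(entries)  # name -> size in bytes
        self.calls = 0

    def readdir(self):
        """Return entry names only."""
        self.calls += 1
        return sorted(self.entries)

    def stat(self, name):
        """Return metadata for a single entry."""
        self.calls += 1
        return {"name": name, "size": self.entries[name]}

    def list_status(self):
        """Return names and metadata for all entries in one call."""
        self.calls += 1
        return [{"name": n, "size": s} for n, s in sorted(self.entries.items())]

entries = {"a": 10, "b": 20, "c": 30}

# readdir, then one stat per entry: 1 + n calls.
posix_like = FakeServer(entries)
posix_result = [posix_like.stat(name) for name in posix_like.readdir()]

# Combined listing: 1 call for the same information.
hdfs_like = FakeServer(entries)
hdfs_result = hdfs_like.list_status()

print(posix_like.calls, hdfs_like.calls)  # 4 1
```

Both approaches return the same metadata; only the number of round trips 
differs.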

Finally, to tie it back to your comment, right now there is no OOM (or partial 
delete) since the client just calls the single RPC and does not need to 
enumerate the directory. With this patch, it would have to. This would be a 
regression: a client with a small heap could no longer delete a large 
directory.

> FSShell should prompt before deleting directories bigger than a configured 
> size
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-12358
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12358
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HADOOP-12358.00.patch, HADOOP-12358.01.patch, 
> HADOOP-12358.02.patch, HADOOP-12358.03.patch
>
>
> We have seen many cases with customers deleting data inadvertently with 
> -skipTrash. The FSShell should prompt the user if the size of the data or the 
> number of files being deleted is bigger than a threshold even though 
> -skipTrash is being used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
