[ https://issues.apache.org/jira/browse/HADOOP-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761068#comment-16761068 ]

Steve Loughran commented on HADOOP-16090:
-----------------------------------------

{{copyFromLocalFile}} has its own issues (HADOOP-15932): we can massively 
optimise that and move to a proper upload-to-store operation.

I like that thought about doing HEAD only. getObjectMetadata(path) is the 
operation to call. For versioned mode you'd just iterate up the path.
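To make that concrete: the keys a HEAD-only probe would check are the "fake directory" keys above the object, deepest first. A sketch of that iteration in plain Java (the method name here is mine, not S3A's):

```java
import java.util.ArrayList;
import java.util.List;

public class AncestorKeys {
    /**
     * Enumerate the fake-directory keys above an object key, deepest
     * first -- these are the keys a HEAD-only probe (getObjectMetadata())
     * would check while iterating up the path.
     */
    public static List<String> ancestorDirKeys(String key) {
        List<String> keys = new ArrayList<>();
        int i = key.lastIndexOf('/');
        while (i > 0) {
            key = key.substring(0, i);
            keys.add(key + "/");   // fake dirs are stored with a trailing slash
            i = key.lastIndexOf('/');
        }
        return keys;
    }
}
```

So for "a/b/c/file.txt" the probes would hit "a/b/c/", "a/b/" and "a/" in that order.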

One other thought, and it relates to HADOOP-15209: there are a fair few workflows 
which consist of multiple files being created in the same directory, by the 
same process. It's very inefficient to repeatedly do a scan and delete there; 
that could be avoided with a brief caching of the fact that the parent 
directories have already been deleted. Complicates life in other ways though, 
obviously.

bq. amortize the cost of the expensive O(depth) search to the time when the 
first dirent is added to an empty dir. 

So do it when createFile() is kicked off? Possibly, though as the PUT doesn't 
manifest data until the write completes, that's not ideal. But I'm thinking 
about what we could maybe do there, especially given we do have a pool of 
threads to play with.

We could do the HEAD of the markers async, scanning from the grandparent up and 
building the list to DELETE. In output stream close() + multipart commit, 
there'd then be the problem of waiting for the scan to finish before kicking 
off the delete.

Those things get complicated fast, and so scare me. 

I think a more efficient HEAD + bulk DELETE sounds good. 
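Something like this, sketched in plain Java with the HEAD probe abstracted as a predicate (a stand-in for a getObjectMetadata() call; names and shape are mine, not S3A code): only the ancestor keys that actually exist go into the bulk delete, so nothing spurious is deleted and no new delete markers appear in a versioned bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MarkerCleanup {
    /**
     * Collect only those ancestor fake-directory keys that a HEAD probe
     * says exist, so the single bulk DELETE that follows touches nothing
     * else -- and so creates no delete markers for absent keys.
     */
    public static List<String> keysToDelete(String objectKey,
                                            Predicate<String> headExists) {
        List<String> doomed = new ArrayList<>();
        int i = objectKey.lastIndexOf('/');
        while (i > 0) {
            objectKey = objectKey.substring(0, i);
            String dirKey = objectKey + "/";
            if (headExists.test(dirKey)) {   // HEAD ~ getObjectMetadata()
                doomed.add(dirKey);
            }
            i = objectKey.lastIndexOf('/');
        }
        return doomed;   // feed into one bulk delete request
    }
}
```

The returned list would then be handed to a single DeleteObjects call, replacing the current unconditional batch.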

*Maybe also*: cache the name of the last directory which was cleaned up. Then 
in any process where >1 file is written to the same place, all the repeat 
cleanup is skipped. That would actually be useful for all stores, because it'd 
save the DELETE call and its RTT.



> deleteUnnecessaryFakeDirectories() creates unnecessary delete markers in a 
> versioned S3 bucket
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-16090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16090
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Dmitri Chmelev
>            Priority: Minor
>
> The fix to avoid calls to getFileStatus() for each path component in 
> deleteUnnecessaryFakeDirectories() (HADOOP-13164) results in accumulation of 
> delete markers in versioned S3 buckets. The above patch replaced 
> getFileStatus() checks with a single batch delete request formed by 
> generating all ancestor keys formed from a given path. Since the delete 
> request is not checking for existence of fake directories, it will create a 
> delete marker for every path component that did not exist (or was previously 
> deleted). Note that issuing a DELETE request without specifying a version ID 
> will always create a new delete marker, even if one already exists ([AWS S3 
> Developer 
> Guide|https://docs.aws.amazon.com/AmazonS3/latest/dev/RemDelMarker.html])
> Since deleteUnnecessaryFakeDirectories() is called as a callback on 
> successful writes and on renames, delete markers accumulate rather quickly 
> and their rate of accumulation is inversely proportional to the depth of the 
> path. In other words, directories closer to the root will have more delete 
> markers than the leaves.
> This behavior negatively impacts the performance of the getFileStatus() 
> operation when it has to issue a listObjects() request (especially v1), as 
> the delete markers have to be examined while the request searches for the 
> first current non-deleted version of an object following a given prefix.
> I did a quick comparison against 3.x and the issue is still present: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2947
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
