[ https://issues.apache.org/jira/browse/HADOOP-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068083#comment-17068083 ]

Dmitri Chmelev commented on HADOOP-16090:
-----------------------------------------

This issue is a function of the number of writes, but the topology of the 
underlying namespace acts as an amplification factor. Here is an example:

{code}
root -> dir-1 -> file-1
  |
  +--> dir-2 -> file-2
          |
          +--> file-3
{code}

If a single write is issued to each of file-1, file-2 and file-3, then dir-1 
will accumulate 1 delete marker, dir-2 will accumulate 2, and root will 
accumulate 3.
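The accumulation pattern above can be sketched with a toy simulation (this is a hypothetical helper, not the actual S3A code): each successful write blind-DELETEs every ancestor "fake directory" key, and on a versioned bucket each blind DELETE adds one delete marker.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MarkerSim {
    // delete markers accumulated per fake-directory key
    static Map<String, Integer> markers = new LinkedHashMap<>();

    // mimic the batch delete of all ancestor keys on a successful write:
    // every blind DELETE on a versioned bucket creates one delete marker
    static void write(String path) {
        String parent = path;
        int slash;
        while ((slash = parent.lastIndexOf('/')) > 0) {
            parent = parent.substring(0, slash);
            markers.merge(parent, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        write("root/dir-1/file-1");
        write("root/dir-2/file-2");
        write("root/dir-2/file-3");
        // root picks up a marker on every write; leaves pick up fewer
        System.out.println(markers); // {root/dir-1=1, root=3, root/dir-2=2}
    }
}
```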

Backporting HADOOP-13421 is a band-aid that will resolve the XML parsing 
errors without fixing the underlying issue: the list v2 API simply returns a 
pagination token that the client can use to continue listing (and skipping 
over) the delete markers. The latency of the LIST operation will still grow in 
proportion to the number of delete markers that must be skipped.
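To see why the pagination token only papers over the problem: the service still has to scan past every delete marker under the prefix before it can return a live key, so the number of LIST round trips grows with the marker count. A toy cost model (the counts here are illustrative, not AWS-published figures; the 1000-keys-per-page limit is the documented S3 default):

```java
public class ListCost {
    // LIST requests needed before the first live key is returned, when the
    // server pages through `deleteMarkers` tombstones at `pageSize` per request
    static int listRequests(int deleteMarkers, int pageSize) {
        return deleteMarkers / pageSize + 1; // full pages of markers, then the live key
    }

    public static void main(String[] args) {
        // 10k accumulated markers under a prefix at 1000 keys/page:
        // ~11 round trips just to locate a single live object
        System.out.println(listRequests(10_000, 1_000)); // 11
    }
}
```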

The diff included by Steve (https://github.com/apache/hadoop/pull/621) will 
avoid delete marker accumulation, but as a trade-off the number of HEAD 
operations will increase: the "fix" probes for the presence of a fake 
directory before deleting it. The amplification of HEAD requests could be 
significant, since one is issued for every path component up to the root. 
Using the example above, root will receive a HEAD request for every write 
anywhere below it. This, in turn, could result in throttling by AWS.
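The HEAD amplification is easy to quantify: with probe-before-delete, every write at depth d issues up to d HEAD requests, one per ancestor directory up to the root. A sketch (hypothetical counter, not the PR's actual code):

```java
public class HeadCost {
    // HEAD requests issued by one write under probe-before-delete:
    // one probe per ancestor, i.e. one per '/' separator in the key
    static int headsPerWrite(String path) {
        int depth = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // "root/dir-2/file-3" has two ancestors (root/dir-2 and root),
        // so one write costs two HEAD probes
        System.out.println(headsPerWrite("root/dir-2/file-3")); // 2
    }
}
```

Summed over a write-heavy workload, shallow directories like root absorb a probe for every write in the tree, which is where the throttling risk comes from.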


> S3A Client to add explicit support for versioned stores
> -------------------------------------------------------
>
>                 Key: HADOOP-16090
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16090
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Dmitri Chmelev
>            Assignee: Steve Loughran
>            Priority: Minor
>
> The fix to avoid calls to getFileStatus() for each path component in 
> deleteUnnecessaryFakeDirectories() (HADOOP-13164) results in accumulation of 
> delete markers in versioned S3 buckets. The above patch replaced 
> getFileStatus() checks with a single batch delete request formed by 
> generating all ancestor keys formed from a given path. Since the delete 
> request does not check for the existence of fake directories, it will create 
> a delete marker for every path component that did not exist (or was 
> previously deleted). Note that issuing a DELETE request without specifying a 
> version ID will always create a new delete marker, even if one already 
> exists ([AWS S3 Developer 
> Guide|https://docs.aws.amazon.com/AmazonS3/latest/dev/RemDelMarker.html]).
> Since deleteUnnecessaryFakeDirectories() is called as a callback on 
> successful writes and on renames, delete markers accumulate rather quickly 
> and their rate of accumulation is inversely proportional to the depth of the 
> path. In other words, directories closer to the root will have more delete 
> markers than the leaves.
> This behavior negatively impacts the performance of getFileStatus() when it 
> has to issue a listObjects() request (especially v1), as the delete markers 
> have to be examined while the request searches for the first current, 
> non-deleted version of an object following a given prefix.
> I did a quick comparison against 3.x and the issue is still present: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2947
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
