[ 
https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Mackrory updated HADOOP-13760:
-----------------------------------
    Attachment: HADOOP-13760-HADOOP-13345.004.patch

Attaching another patch. All tests are passing for me now with the Null, Local 
and DynamoDB implementations.

There's still an issue to resolve with innerRename's need to operate only on the 
blobs that actually exist in S3 (i.e. files and empty directories): the 
LeafNodeIterator can only return things that are empty as far as DynamoDB knows, 
and reconciling that with S3 is tricky (S3 might still list objects that have 
been deleted recently, and might know about current children that DynamoDB isn't 
aware of). A few general possibilities:

* Ignore exceptions that get thrown if we call copyFile on an intermediate blob 
that doesn't exist. This costs some unnecessary round trips to S3, and we might 
miss legitimate errors that shouldn't be ignored.
* Have LeafNodeIterator check every possible leaf node against S3 to confirm 
there are no other children. This adds more round trips to rename (and to any 
other use of listFilesAndEmptyDirectories), but to really be *correct* I think 
this logic is going to have to happen somewhere, and it may as well be in that 
new iterator. Better than having it directly in innerRename.
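
The second option could be sketched roughly as below. This is only an 
illustration of the reconciliation logic, not the actual hadoop-aws APIs: the 
class, method, and parameter names here are all hypothetical, and the real 
MetadataStore / S3 listing interfaces differ.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: confirm DynamoDB's candidate leaves against S3.
// dynamoLeaves: paths DynamoDB believes are files or empty directories.
// s3ListChildren: queries S3 for the children of a path; S3 may know
// about children that DynamoDB does not.
public class LeafReconciler {
    public static List<String> confirmedLeaves(
            Collection<String> dynamoLeaves,
            Function<String, List<String>> s3ListChildren) {
        List<String> confirmed = new ArrayList<>();
        for (String path : dynamoLeaves) {
            // The extra round trip: ask S3 whether this path really
            // has no children before treating it as a leaf.
            if (s3ListChildren.apply(path).isEmpty()) {
                confirmed.add(path);
            }
        }
        return confirmed;
    }
}
```

Each candidate costs one extra listing call, which is why pushing this into the 
iterator (where it can later be parallelized) seems preferable to scattering it 
through innerRename.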

It's worth noting that the performance hit of the extra round trips to S3 would 
probably be heavily alleviated if / when we can parallelize or make asynchronous 
the rename loop, since these requests use very little bandwidth and intermediate 
/ inferred directories *may* be vastly outnumbered by files. Parallelizing the 
LeafNodeIterator approach would be an interesting challenge, but moving that 
logic into innerRename would of course be even more interesting...

Tough problem that needs some thought.

> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch, 
> HADOOP-13760-HADOOP-13345.002.patch, HADOOP-13760-HADOOP-13345.003.patch, 
> HADOOP-13760-HADOOP-13345.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add 
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.  
> To make deletes consistent, we need to add a {{isDeleted}} flag to 
> {{PathMetadata}} and check it when returning results from functions like 
> {{getFileStatus()}} and {{listStatus()}}.  In HADOOP-13651, I added TODO 
> comments in most of the places these new conditions are needed.  The work 
> does not look too bad.
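
The tombstone approach in the description could look roughly like the sketch 
below. The names here are hypothetical and heavily simplified; the real 
{{PathMetadata}} and MetadataStore classes on the HADOOP-13345 branch carry 
full FileStatus information and differ in shape.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of delete tracking via an isDeleted flag:
// deletes leave a tombstone in the store instead of removing the entry,
// and read paths filter tombstones out.
public class TombstoneSketch {
    static class PathMetadata {
        final String path;
        final boolean isDeleted; // tombstone marker
        PathMetadata(String path, boolean isDeleted) {
            this.path = path;
            this.isDeleted = isDeleted;
        }
    }

    // On delete, mark the entry rather than removing it, so a stale
    // S3 listing cannot resurrect the path.
    static void delete(Map<String, PathMetadata> store, String path) {
        store.put(path, new PathMetadata(path, true));
    }

    // listStatus-style filtering: tombstoned entries are hidden.
    static List<String> listStatus(Map<String, PathMetadata> store) {
        return store.values().stream()
                .filter(m -> !m.isDeleted)
                .map(m -> m.path)
                .sorted()
                .collect(Collectors.toList());
    }

    // getFileStatus-style check: a tombstoned path counts as missing.
    static boolean exists(Map<String, PathMetadata> store, String path) {
        PathMetadata m = store.get(path);
        return m != null && !m.isDeleted;
    }
}
```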



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
