[ https://issues.apache.org/jira/browse/HADOOP-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Mackrory updated HADOOP-13760:
-----------------------------------
    Attachment: HADOOP-13760-HADOOP-13345.004.patch

Attaching another patch. All tests are now passing for me with the Null, Local and DynamoDB implementations.

There's still an issue to resolve: innerRename needs to operate only on the blobs that actually exist in S3 (e.g. files and empty directories), and that presents a real problem. The LeafNodeIterator can only return things that are empty as far as DynamoDB knows, and that view has to be reconciled with S3, which might still list objects that were deleted recently and might know about current children that DynamoDB isn't aware of. A few general possibilities:

* Ignore exceptions that get thrown if we call copyFile on an intermediate blob that doesn't exist. This costs some unnecessary round trips to S3, and we might miss legitimate errors that shouldn't be ignored.
* Have LeafNodeIterator check every possible leaf node against S3 to confirm there are no other children. This adds more round trips to rename (and to any other use of listFilesAndEmptyDirectories), but to really be *correct* I think this logic is going to have to happen somewhere, and it may as well be in that new Iterator. That's better than having it directly in innerRename.

We should note that the performance hit of the extra round trips to S3 is probably heavily alleviated if / when we can parallelize / make asynchronous the rename loop, since they would use very little bandwidth and intermediate / inferred directories *may* be vastly outnumbered by files. Parallelizing the LeafNodeIterator approach would be an interesting challenge, but moving that logic into innerRename would of course be even more interesting...

Tough problem that needs some thought.
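To make the second possibility above concrete, here is a minimal sketch of confirming candidate leaves against the object store. It is not code from the patch: ObjectLister, InMemoryLister, LeafConfirmationSketch and confirmLeaves are hypothetical names invented for illustration, and the in-memory lister only stands in for an S3 prefix listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an S3 prefix listing; not the real S3A client API. */
interface ObjectLister {
  /** Returns true if any object key exists under the given directory prefix. */
  boolean hasChildren(String dirPath);
}

/** In-memory lister that also counts how many "round trips" were made. */
class InMemoryLister implements ObjectLister {
  private final Set<String> keys;
  int calls = 0;

  InMemoryLister(Set<String> keys) {
    this.keys = keys;
  }

  @Override
  public boolean hasChildren(String dirPath) {
    calls++; // each call models one extra request to the object store
    String prefix = dirPath.endsWith("/") ? dirPath : dirPath + "/";
    for (String key : keys) {
      if (key.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}

/** Sketch: filter metadata-store leaf candidates against the object store. */
class LeafConfirmationSketch {
  /**
   * Files are kept as-is. A directory the metadata store believes is empty
   * is kept only if the object store also reports no children under it.
   */
  static List<String> confirmLeaves(List<String> candidates,
                                    Set<String> directories,
                                    ObjectLister s3) {
    List<String> confirmed = new ArrayList<>();
    for (String path : candidates) {
      boolean isDirectory = directories.contains(path);
      if (!isDirectory || !s3.hasChildren(path)) {
        confirmed.add(path);
      }
    }
    return confirmed;
  }
}
```

With candidates /a/file1, /b and /c, where /b and /c are directories and the store still holds /b/hidden, only /a/file1 and /c are confirmed, at the cost of one lookup per candidate directory. The sketch does not handle the opposite inconsistency, where S3 still lists a recently deleted object.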
> S3Guard: add delete tracking
> ----------------------------
>
>                 Key: HADOOP-13760
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13760
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Sean Mackrory
>         Attachments: HADOOP-13760-HADOOP-13345.001.patch,
> HADOOP-13760-HADOOP-13345.002.patch, HADOOP-13760-HADOOP-13345.003.patch,
> HADOOP-13760-HADOOP-13345.004.patch
>
>
> Following the S3AFileSystem integration patch in HADOOP-13651, we need to add
> delete tracking.
> Current behavior on delete is to remove the metadata from the MetadataStore.
> To make deletes consistent, we need to add a {{isDeleted}} flag to
> {{PathMetadata}} and check it when returning results from functions like
> {{getFileStatus()}} and {{listStatus()}}. In HADOOP-13651, I added TODO
> comments in most of the places these new conditions are needed. The work
> does not look too bad.
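
The delete-tracking idea in the quoted description can be sketched roughly as follows. This is an illustration only, not the patch: PathMetadataSketch and TombstoneStoreSketch are hypothetical, simplified stand-ins for PathMetadata and a MetadataStore, and the method bodies are assumptions about how an isDeleted check might look.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Simplified stand-in for PathMetadata carrying the proposed isDeleted flag. */
class PathMetadataSketch {
  final String path;
  final boolean isDeleted;

  PathMetadataSketch(String path, boolean isDeleted) {
    this.path = path;
    this.isDeleted = isDeleted;
  }
}

/** Hypothetical in-memory metadata store that keeps tombstones on delete. */
class TombstoneStoreSketch {
  private final Map<String, PathMetadataSketch> entries = new TreeMap<>();

  void put(String path) {
    entries.put(path, new PathMetadataSketch(path, false));
  }

  /** Instead of removing the entry, keep it and mark it as deleted. */
  void delete(String path) {
    entries.put(path, new PathMetadataSketch(path, true));
  }

  /** True if the store positively knows the path was deleted. */
  boolean hasTombstone(String path) {
    PathMetadataSketch meta = entries.get(path);
    return meta != null && meta.isDeleted;
  }

  /** getFileStatus()-style lookup: a tombstoned path is reported as absent. */
  Optional<PathMetadataSketch> getFileStatus(String path) {
    PathMetadataSketch meta = entries.get(path);
    if (meta == null || meta.isDeleted) {
      return Optional.empty();
    }
    return Optional.of(meta);
  }

  /** listStatus()-style listing: direct children with tombstones filtered out. */
  List<String> listStatus(String dir) {
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    List<String> result = new ArrayList<>();
    for (PathMetadataSketch meta : entries.values()) {
      boolean directChild = meta.path.startsWith(prefix)
          && meta.path.indexOf('/', prefix.length()) < 0;
      if (directChild && !meta.isDeleted) {
        result.add(meta.path);
      }
    }
    return result;
  }
}
```

Keeping the entry as a tombstone lets a caller tell "known to be deleted" (hasTombstone returns true) apart from "never seen", which plain removal from the store cannot express.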