[ https://issues.apache.org/jira/browse/HADOOP-13448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510928#comment-15510928 ]
Aaron Fabbri commented on HADOOP-13448:
---------------------------------------

Another interface thing to discuss is move(). I'm working on HADOOP-13631 and there are a couple of negatives to the current interface:

{code}
/**
 * Moves metadata from {@code src} to {@code dst}, including all descendants
 * recursively.
 *
 * @param src the source path
 * @param dst the new path of the file/dir after the move
 * @throws IOException if there is an error
 */
void move(Path src, Path dst) throws IOException;
{code}

1. This does not support sparsely-stored metadata well.
2. The client (FileSystem) is likely already traversing the source and destination tree paths, and that work will be repeated in the MetadataStore to affect all the paths that need to change. (Not too bad, depending on how a particular MetadataStore is implemented.)

On #1, consider the fact that we do not require the entire filesystem directory structure to be in the MetadataStore. For example, a user could have some bucket s3a://big-bucket with many files and (emulated) directories in it. They then fire up a Hadoop cluster for the first time, and the metadata store tracks changes they make from that point forward.

Imagine the big-bucket contains this subtree:

{code}
.
├── a1
│   └── b1
│       ├── file1
│       ├── file2
│       ├── file3
│       ├── file4
│       └── file5
└── a2
{code}

Since this is existing data, and the user has just started a new Hadoop cluster, the MetadataStore is currently empty.

Now, the user does a {{move(/a1/b1, /a2/b1)}}. The MetadataStore will still be empty: it received a move request, but the request did not match any stored paths.

Next, another node does a {{listStatus(/a2)}}. Assuming the underlying blob store is eventually consistent, the user may see an empty listing, because the underlying store may not yet show the moved entries and the MetadataStore has no entries in that directory.

Some possible solutions:

A. Clients must call both {{MetadataStore#move()}} on the src and dst paths, and also {{put()}} on all destination paths affected (recursively).
B. We could remove move() from MetadataStore, and tell clients to use put()s for each moved path, and deleteSubtree(src). This does not allow atomic move to be implemented, however.
C. We could expose a batch version of B, i.e. move(Collection<Path> pathsToDelete, Collection<Path> pathsToCreate).

Any thoughts? Option C does solve both issues I brought up. It would have good usability for s3a, but could perhaps be a little awkward for other filesystems.

> S3Guard: Define MetadataStore interface.
> ----------------------------------------
>
>                 Key: HADOOP-13448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13448
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>             Fix For: HADOOP-13345
>
>         Attachments: HADOOP-13448-HADOOP-13345.001.patch,
> HADOOP-13448-HADOOP-13345.002.patch, HADOOP-13448-HADOOP-13345.003.patch,
> HADOOP-13448-HADOOP-13345.004.patch, HADOOP-13448-HADOOP-13345.005.patch
>
>
> Define the common interface for metadata store operations. This is the
> interface that any metadata back-end must implement in order to integrate
> with S3Guard.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
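
To make option C from the comment above a bit more concrete, here is a minimal, self-contained sketch of how a batch move could behave against an initially empty store. Everything in it is an assumption for illustration only: {{InMemoryStore}}, {{listChildren()}} and the use of plain {{String}} paths (standing in for Hadoop's {{Path}}) are hypothetical and are not part of the patch or of any actual MetadataStore implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BatchMoveSketch {

    /** Hypothetical in-memory stand-in for a MetadataStore; paths are plain strings. */
    static class InMemoryStore {
        private final Set<String> paths = new TreeSet<>();

        synchronized void put(String path) {
            paths.add(path);
        }

        /**
         * Batch move in the spirit of option C: the caller, which has already
         * traversed the source tree, supplies every path to delete and every
         * path to create. All methods share one lock, so a reader of this
         * sketch never observes a half-applied move.
         */
        synchronized void move(Collection<String> pathsToDelete,
                               Collection<String> pathsToCreate) {
            paths.removeAll(pathsToDelete);
            paths.addAll(pathsToCreate);
        }

        /** Returns the direct children of dir that the store knows about, sorted. */
        synchronized List<String> listChildren(String dir) {
            String prefix = dir.endsWith("/") ? dir : dir + "/";
            List<String> children = new ArrayList<>();
            for (String p : paths) {
                boolean under = p.startsWith(prefix) && p.length() > prefix.length();
                // A direct child has no further '/' after the directory prefix.
                if (under && p.indexOf('/', prefix.length()) < 0) {
                    children.add(p);
                }
            }
            return children;
        }
    }

    public static void main(String[] args) {
        // The store starts empty, as in the big-bucket scenario.
        InMemoryStore store = new InMemoryStore();

        // The client lists the source subtree itself and passes both sides.
        List<String> toDelete = Arrays.asList(
            "/a1/b1", "/a1/b1/file1", "/a1/b1/file2");
        List<String> toCreate = Arrays.asList(
            "/a2/b1", "/a2/b1/file1", "/a2/b1/file2");
        store.move(toDelete, toCreate);

        // The destination is now visible from the store, even though the
        // store had no entries for the source before the move.
        System.out.println(store.listChildren("/a2"));
        System.out.println(store.listChildren("/a2/b1"));
    }
}
```

The point of the sketch is only that, because the destination paths are passed in explicitly, the store ends up with entries for them even when it held nothing for the source, which is the sparse-metadata case from #1. The single-path {{move(src, dst)}} would have left this store empty.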