[ https://issues.apache.org/jira/browse/HADOOP-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16771817#comment-16771817 ]
Steve Loughran commented on HADOOP-16077:
-----------------------------------------

If you call {{FileSystem.listFiles(path, recursive)}}, you get a RemoteIterator<LocatedFileStatus>; each LocatedFileStatus contains an array of BlockLocation entries, which are meant to contain the block locations and storage types.

This is the best API for a recursive file listing, as it gives you:
* on HDFS: bulk incremental updates, to reduce marshalling & the time the NN is locked
* on object stores: the option of switching to more efficient path enumeration over treewalks. S3A does this & delivers O(files/1000) listings irrespective of the directory tree depth.

Now, that's a bigger leap for ls -R than just listing the storage type, but it'd be great to expose that operation in general, because ls -R is so inefficient here.

Trouble is of course, both Ls and LsR extend Command, which implements its treewalk recursively. Moving to a new iterator would be traumatic. Except maybe, just maybe, we could have it support both forms of list & recurse, and make the new one an option to switch to; if you ask for storage levels, you must explicitly ask for the new recurse option. Maybe a separate "deepLs" command would be the strategy.

Have a look at {{S3AUtils.applyLocatedFiles()}} if you want to see some fun with closures and iterating over a list of LocatedFileStatus entries. That could all be promoted into {{org.apache.hadoop.util.LambdaUtils}} or the new {{org.apache.hadoop.fs.impl}} package.

BTW: I'm thinking that we could have the object stores expose the archive status of files in the storage type, so things like AWS Glacier storage would be visible. Being able to list that here would be ideal.
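To make the O(files/1000) point concrete, here is a rough, self-contained sketch that counts simulated LIST requests for a flat prefix enumeration versus a recursive treewalk. It is a toy in-memory model, not the S3A code: the class {{ListingSketch}}, its methods, and the sample key layout are all hypothetical, the page size of 1000 is assumed from the "files/1000" figure in the comment, and the treewalk is simplified to one request per directory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Toy in-memory object store used only to count LIST requests (hypothetical, not S3A). */
class ListingSketch {
    static final int PAGE_SIZE = 1000;           // assumed page size per LIST request
    final TreeSet<String> keys = new TreeSet<>(); // object keys, kept in sorted order
    int requests = 0;                             // number of simulated LIST calls

    /** One simulated LIST call: up to PAGE_SIZE keys under prefix, strictly after startAfter. */
    List<String> listPage(String prefix, String startAfter) {
        requests++;
        List<String> page = new ArrayList<>();
        String from = (startAfter == null) ? prefix : startAfter;
        for (String k : keys.tailSet(from, startAfter == null)) {
            if (!k.startsWith(prefix)) break;     // left the prefix range
            page.add(k);
            if (page.size() == PAGE_SIZE) break;  // page is full
        }
        return page;
    }

    /** Flat enumeration: page through every key under prefix, ignoring directory depth. */
    List<String> flatList(String prefix) {
        List<String> all = new ArrayList<>();
        String marker = null;
        while (true) {
            List<String> page = listPage(prefix, marker);
            all.addAll(page);
            if (page.size() < PAGE_SIZE) return all; // short page: nothing left
            marker = page.get(page.size() - 1);
        }
    }

    /** One simulated delimiter LIST: immediate children of dir; subdirectories end in "/". */
    SortedSet<String> listDir(String dir) {
        requests++;                               // simplification: one request per directory
        SortedSet<String> children = new TreeSet<>();
        for (String k : keys.tailSet(dir, true)) {
            if (!k.startsWith(dir)) break;
            int slash = k.indexOf('/', dir.length());
            children.add(slash < 0 ? k : k.substring(0, slash + 1));
        }
        return children;
    }

    /** Recursive treewalk: one listDir call for every directory in the tree. */
    List<String> treeWalk(String dir) {
        List<String> all = new ArrayList<>();
        for (String child : listDir(dir)) {
            if (child.endsWith("/")) all.addAll(treeWalk(child));
            else all.add(child);
        }
        return all;
    }

    /** Sample layout: 50 directories, each with one subdirectory holding 50 files (2500 files). */
    static ListingSketch sampleStore() {
        ListingSketch store = new ListingSketch();
        for (int i = 0; i < 50; i++)
            for (int j = 0; j < 50; j++)
                store.keys.add("data/d" + i + "/sub/f" + j);
        return store;
    }

    public static void main(String[] args) {
        ListingSketch store = sampleStore();
        int files = store.flatList("data/").size();
        int flatRequests = store.requests;
        store.requests = 0;
        store.treeWalk("data/");
        System.out.println(files + " files: flat=" + flatRequests
                + " requests, treewalk=" + store.requests + " requests");
    }
}
```

With this sample layout the flat enumeration needs 3 requests for 2500 files, while the treewalk needs 101 (one per directory), and the gap grows with every extra level of nesting. A real store also pages large directories during a treewalk, so the sketch if anything understates the treewalk cost.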
> Add an option in ls command to include storage policy
> -----------------------------------------------------
>
>                 Key: HADOOP-16077
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16077
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>    Affects Versions: 3.3.0
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HADOOP-16077-01.patch, HADOOP-16077-02.patch,
> HADOOP-16077-03.patch, HADOOP-16077-04.patch, HADOOP-16077-05.patch,
> HADOOP-16077-06.patch, HADOOP-16077-07.patch, HADOOP-16077-08.patch,
> HADOOP-16077-09.patch
>
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)