[ https://issues.apache.org/jira/browse/HDFS-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15329298#comment-15329298 ]
Steve Loughran commented on HDFS-10413:
---------------------------------------

Can I note that, from the perspective of S3A, using listFiles(recursive=true) is significantly faster than walking the tree with listStatus(). If code were encouraged to use that API rather than its own tree walk, anything that works with object stores would see a significant speedup.

Also, listFiles and similar calls return a RemoteIterator. That code can be async, to the extent that results can be arriving while the client is still processing the previous ones. The code I'm doing in HADOOP-13208 doesn't do that, but it does do windowed queries: only one window's worth of files is listed, filtered and made available at a time. This keeps memory consumption down.

> Implement asynchronous listStatus for DistributedFileSystem
> -----------------------------------------------------------
>
>                 Key: HDFS-10413
>                 URL: https://issues.apache.org/jira/browse/HDFS-10413
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>
> Per the [comment|https://issues.apache.org/jira/browse/HDFS-9924?focusedCommentId=15285597&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15285597] from [~mingma], this Jira tracks efforts of implementing async listStatus.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
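The windowed-listing idea can be sketched as a minimal, self-contained example, assuming a flat object store that returns at most N keys per LIST request. `RemoteIterator` here is a local stand-in with the same hasNext()/next() shape as Hadoop's `org.apache.hadoop.fs.RemoteIterator`. `FakeObjectStore`, `WindowedIterator` and the window size are hypothetical illustrations, not the HADOOP-13208 code. The point it shows is that the caller only ever holds one filtered window in memory, and the next LIST request is issued lazily when that window is drained.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

public class WindowedListing {

    /** Local stand-in mirroring the shape of Hadoop's RemoteIterator (methods may throw IOException). */
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical flat object store: each LIST call returns at most pageSize keys. */
    static final class FakeObjectStore {
        private final List<String> keys; // flat key space, no real directories
        int listCalls = 0;               // number of LIST requests issued so far

        FakeObjectStore(List<String> keys) { this.keys = keys; }

        /** Returns up to pageSize keys starting at offset 'from' (the continuation marker). */
        List<String> listPage(int from, int pageSize) {
            listCalls++;
            int to = Math.min(keys.size(), from + pageSize);
            return new ArrayList<>(keys.subList(from, to));
        }
    }

    /** Lazily fetches one window at a time, filters it, and hands entries out one by one. */
    static final class WindowedIterator implements RemoteIterator<String> {
        private final FakeObjectStore store;
        private final int windowSize;
        private final Predicate<String> filter;
        private List<String> window = new ArrayList<>(); // only the current filtered window is held
        private int posInWindow = 0;
        private int nextOffset = 0;                      // where the next LIST call starts
        private boolean exhausted = false;

        WindowedIterator(FakeObjectStore store, int windowSize, Predicate<String> filter) {
            if (windowSize <= 0) throw new IllegalArgumentException("windowSize must be > 0");
            this.store = store;
            this.windowSize = windowSize;
            this.filter = filter;
        }

        // Overrides narrow the throws clause: this in-memory fake never does real I/O.
        @Override
        public boolean hasNext() {
            // Fetch further windows until one yields a match or the listing runs out.
            while (posInWindow >= window.size() && !exhausted) {
                List<String> page = store.listPage(nextOffset, windowSize);
                nextOffset += page.size();
                if (page.size() < windowSize) exhausted = true; // a short page is the last one
                window = new ArrayList<>();
                for (String key : page) {
                    if (filter.test(key)) window.add(key);
                }
                posInWindow = 0;
            }
            return posInWindow < window.size();
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return window.get(posInWindow++);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add("data/part-" + i + (i % 2 == 0 ? ".csv" : ".tmp"));
        }
        FakeObjectStore store = new FakeObjectStore(keys);
        RemoteIterator<String> it = new WindowedIterator(store, 3, k -> k.endsWith(".csv"));
        int count = 0;
        while (it.hasNext()) {
            System.out.println(it.next());
            count++;
        }
        System.out.println(count + " matches, " + store.listCalls + " LIST calls");
    }
}
```

With ten keys and a window of three, the caller sees the five `.csv` keys while the store serves four LIST requests, and never more than three entries are buffered. Against a real cluster the consuming loop has the same shape as iterating the `RemoteIterator<LocatedFileStatus>` returned by `FileSystem.listFiles(path, true)`; whether a given filesystem fetches pages lazily, or asynchronously ahead of the consumer, is up to its implementation.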