[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558129#action_12558129 ]
Doug Cutting commented on HADOOP-2566:
--------------------------------------

Globbing is implemented on top of listPaths(), which is implemented on top of listStatus(). The primitive globbing API should not throw away that status information. It should keep it, so that glob clients that need it do not have to call getStatus() for each file that matches. Currently the cache of FileStatus hides the cost of these getStatus() calls, but that cache will break things once files and their status can change. So we need globStatus() before we can remove the cache.

FileInputFormat, for example, uses globPaths() to list the files matching the input specification, then calls getStatus() on each matching path when building splits. This must change to call globStatus() before the cache is removed.

Long-term, globPaths() and listPaths() may still be useful as utility methods implemented in terms of globStatus() and listStatus(). But since most current users of these methods will be broken performance-wise once the cache is removed, we should deprecate them now, giving fair warning and strongly encouraging folks to stop using them before the cache goes away.

> need FileSystem#globStatus method
> ---------------------------------
>
>                 Key: HADOOP-2566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2566
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting
> performance, we must use file enumeration APIs that return FileStatus[]
> rather than Path[]. Currently we have FileSystem#globPaths(), but that
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the
> cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
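The cost argument in Doug Cutting's comment can be illustrated with a minimal, self-contained Java sketch. This is not Hadoop code: `MemFs`, `FileStatus`, `globToRegex`, `totalSizeViaGlobPaths`, and `totalSizeViaGlobStatus` are hypothetical names, the namespace is flat, and the glob supports only `*` and `?`. It counts per-path status lookups to show why a caller like FileInputFormat benefits from a status-returning glob once no cache hides the cost of those lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Hypothetical sketch (not Hadoop code): path-only glob vs. status-returning glob. */
public class GlobStatusSketch {

    /** Hypothetical stand-in for a file status record: a path and a length. */
    static final class FileStatus {
        final String path;
        final long len;
        FileStatus(String path, long len) { this.path = path; this.len = len; }
    }

    /** Hypothetical flat, in-memory file system that counts per-path status lookups. */
    static final class MemFs {
        private final Map<String, Long> files = new TreeMap<>();
        int statusCalls = 0; // number of getFileStatus() invocations

        void create(String path, long len) { files.put(path, len); }

        /** Primitive listing: one call returns the status of every file. */
        List<FileStatus> listStatus() {
            List<FileStatus> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : files.entrySet()) {
                out.add(new FileStatus(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** Per-path lookup: the extra call a globStatus() caller can skip. */
        FileStatus getFileStatus(String path) {
            statusCalls++;
            return new FileStatus(path, files.get(path));
        }

        /** Glob primitive that keeps the status information from listStatus(). */
        List<FileStatus> globStatus(String glob) {
            Pattern p = globToRegex(glob);
            List<FileStatus> out = new ArrayList<>();
            for (FileStatus s : listStatus()) {
                if (p.matcher(s.path).matches()) out.add(s);
            }
            return out;
        }

        /** Path-only utility layered on globStatus(); it discards the status. */
        List<String> globPaths(String glob) {
            List<String> out = new ArrayList<>();
            for (FileStatus s : globStatus(glob)) out.add(s.path);
            return out;
        }
    }

    /** Maps '*' and '?' to regex parts that do not cross '/'; other characters match literally. */
    static Pattern globToRegex(String glob) {
        StringBuilder sb = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*') sb.append("[^/]*");
            else if (c == '?') sb.append("[^/]");
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    /** Old pattern: globPaths() followed by one getFileStatus() per match. */
    static long totalSizeViaGlobPaths(MemFs fs, String glob) {
        long total = 0;
        for (String p : fs.globPaths(glob)) total += fs.getFileStatus(p).len;
        return total;
    }

    /** New pattern: globStatus() already carries the lengths. */
    static long totalSizeViaGlobStatus(MemFs fs, String glob) {
        long total = 0;
        for (FileStatus s : fs.globStatus(glob)) total += s.len;
        return total;
    }

    public static void main(String[] args) {
        MemFs fs = new MemFs();
        fs.create("/in/part-0", 100);
        fs.create("/in/part-1", 250);
        fs.create("/in/_log", 7);
        System.out.println(totalSizeViaGlobPaths(fs, "/in/part-*") + " bytes, status calls: " + fs.statusCalls);
        fs.statusCalls = 0;
        System.out.println(totalSizeViaGlobStatus(fs, "/in/part-*") + " bytes, status calls: " + fs.statusCalls);
    }
}
```

Running `main` prints `350 bytes, status calls: 2` and then `350 bytes, status calls: 0`: the path-only route pays one lookup per matched file, while the status-returning glob reuses what the listing already produced. Layering the path-only `globPaths()` on top of `globStatus()`, as the comment suggests, keeps it available as a convenience without making it the primitive.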