[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559735#action_12559735 ]
Hairong Kuang commented on HADOOP-2566:
---------------------------------------

> Doesn't this patch essentially do ' arr; for (path : old_globPaths())
> arr[i++] = getFileStatus(path); return arr; '. Is this what we wanted? I
> thought we wanted other way around.

No, this patch does not do what you described. It calls listStatus on a parent directory only when the next path component contains a glob, and it calls getFileStatus on the last component only when that component contains no glob.

Example 1: globStatus("/user/hairong/file*") only calls listStatus("/user/hairong") and returns the statuses of the matched files/subdirectories. Previously, globPaths("/user/hairong/file*") would call listStatus("/user/hairong"), discard the statuses of the matched files/subdirectories, and return only their paths. The caller then had to call getFileStatus again for each returned path.

Example 2: globStatus("/user/*/file") calls listStatus("/user") and then calls getFileStatus on each path that matches /user/*/file. Only in this case does it behave as you described.

> need FileSystem#globStatus method
> ---------------------------------
>
>                 Key: HADOOP-2566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2566
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>         Attachments: globStatus.patch, globStatus1.patch
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting
> performance, we must use file enumeration APIs that return FileStatus[]
> rather than Path[]. Currently we have FileSystem#globPaths(), but that
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the
> cache in 0.17.
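
The two cases in the comment can be illustrated with a small self-contained Java sketch. It is a toy model, not the code in globStatus.patch, and it does not use the Hadoop API: the class GlobStatusSketch, its Status type, the call counters and the sample tree are all hypothetical, and only the '*' wildcard is handled. The point is only to show when a status comes from a directory listing and when a per-path lookup is still needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy model only: none of these classes are Hadoop's; all names are hypothetical.
class GlobStatusSketch {
    /** Stand-in for a file status: a path plus a directory flag. */
    static final class Status {
        final String path;
        final boolean isDir;
        Status(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
    }

    private final Map<String, Status> tree = new TreeMap<>();
    int listCalls = 0; // number of listStatus calls made so far
    int statCalls = 0; // number of getFileStatus calls made so far

    void add(String path, boolean isDir) { tree.put(path, new Status(path, isDir)); }

    /** Returns the statuses of the direct children of dir. */
    List<Status> listStatus(String dir) {
        listCalls++;
        String prefix = dir + "/";
        List<Status> out = new ArrayList<>();
        for (Status s : tree.values()) {
            boolean directChild = s.path.startsWith(prefix)
                    && s.path.indexOf('/', prefix.length()) < 0;
            if (directChild) out.add(s);
        }
        return out;
    }

    /** Returns the status of one path, or null if it does not exist. */
    Status getFileStatus(String path) { statCalls++; return tree.get(path); }

    /** Turns a component containing '*' into a regex; other characters are literal. */
    static Pattern toRegex(String component) {
        return Pattern.compile(Pattern.quote(component).replace("*", "\\E.*\\Q"));
    }

    /** Expands an absolute pattern component by component and returns statuses. */
    List<Status> globStatus(String pattern) {
        List<String> paths = new ArrayList<>();
        paths.add(""); // root; components are appended as "/name"
        Map<String, Status> known = new HashMap<>(); // statuses from the latest listing
        for (String comp : pattern.substring(1).split("/")) {
            List<String> next = new ArrayList<>();
            known = new HashMap<>();
            if (comp.contains("*")) {
                // Glob component: list each parent once and keep the matched statuses.
                Pattern p = toRegex(comp);
                for (String parent : paths) {
                    for (Status s : listStatus(parent)) {
                        String name = s.path.substring(s.path.lastIndexOf('/') + 1);
                        if (p.matcher(name).matches()) {
                            next.add(s.path);
                            known.put(s.path, s);
                        }
                    }
                }
            } else {
                // Plain component: extend the paths without touching the file system.
                for (String parent : paths) next.add(parent + "/" + comp);
            }
            paths = next;
        }
        List<Status> result = new ArrayList<>();
        for (String path : paths) {
            // Reuse a listed status if there is one; otherwise look the path up.
            Status s = known.containsKey(path) ? known.get(path) : getFileStatus(path);
            if (s != null) result.add(s);
        }
        return result;
    }

    /** Builds a small sample tree used for the two examples. */
    static GlobStatusSketch sample() {
        GlobStatusSketch fs = new GlobStatusSketch();
        fs.add("/user", true);
        fs.add("/user/hairong", true);
        fs.add("/user/hairong/file1", false);
        fs.add("/user/hairong/file2", false);
        fs.add("/user/hairong/other", false);
        fs.add("/user/doug", true);
        fs.add("/user/doug/file", false);
        return fs;
    }

    public static void main(String[] args) {
        GlobStatusSketch a = sample();
        int n1 = a.globStatus("/user/hairong/file*").size();
        System.out.println(n1 + " matches, list=" + a.listCalls + ", stat=" + a.statCalls);
        GlobStatusSketch b = sample();
        int n2 = b.globStatus("/user/*/file").size();
        System.out.println(n2 + " matches, list=" + b.listCalls + ", stat=" + b.statCalls);
    }
}
```

On this sample tree, the first pattern is answered from a single listing with no per-path lookups, while the second needs one listing of /user plus one lookup per candidate path, which is the only case that resembles the loop quoted in the question.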