[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559735#action_12559735 ]
Hairong Kuang commented on HADOOP-2566:
---------------------------------------
> Doesn't this patch essentially do ' arr; for (path : old_globPaths())
> arr[i++] = getFileStatus(path); return arr; '. Is this what we wanted? I
> thought we wanted other way around.
No, this patch does not do what you described. It calls listStatus on a parent
directory only when the component under it contains a glob, and it calls
getFileStatus on the last component only when that component contains no glob.
Example 1: globStatus("/user/hairong/file*") calls only
listStatus("/user/hairong") and returns the statuses of the matching
files/subdirectories. Previously, globPaths("/user/hairong/file*") would call
listStatus("/user/hairong"), discard the statuses of the matching
files/subdirectories, and return only their paths, so the caller had to call
getFileStatus again for each returned path.
Example 2: globStatus("/user/*/file") calls listStatus("/user") and then
calls getFileStatus on each path that matches /user/*/file. In this case it
does what you described.
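
The two examples can be illustrated with a small, self-contained sketch. This
is not the code in globStatus.patch: GlobStatusSketch, its Status class, the
sampleTree() helper and the call counters are all hypothetical stand-ins for an
in-memory tree, and only '*' is handled as a wildcard. It only shows the
per-component rule described above (listStatus on the parent when a component
has a glob, getFileStatus when the last component has none) and counts the
calls each pattern triggers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Toy in-memory tree that counts metadata calls. Hypothetical; not Hadoop's FileSystem. */
class GlobStatusSketch {
    /** Minimal stand-in for FileStatus: a path plus a directory flag. */
    static final class Status {
        final String path;
        final boolean isDir;
        Status(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
    }

    private final TreeMap<String, Status> entries = new TreeMap<>();
    int listStatusCalls = 0;
    int getFileStatusCalls = 0;

    void add(String path, boolean isDir) { entries.put(path, new Status(path, isDir)); }

    /** Returns the direct children of dir and counts the call. */
    List<Status> listStatus(String dir) {
        listStatusCalls++;
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<Status> out = new ArrayList<>();
        for (Status s : entries.values()) {
            if (s.path.startsWith(prefix) && s.path.indexOf('/', prefix.length()) < 0) {
                out.add(s);
            }
        }
        return out;
    }

    /** Returns the status of one path (null if absent) and counts the call. */
    Status getFileStatus(String path) {
        getFileStatusCalls++;
        return entries.get(path);
    }

    /** Assumption of this sketch: '*' is the only wildcard. */
    static boolean matches(String name, String glob) {
        String regex = Pattern.quote(glob).replace("*", "\\E[^/]*\\Q");
        return Pattern.matches(regex, name);
    }

    /** Expands an absolute pattern one component at a time. */
    List<Status> globStatus(String pattern) {
        String[] comps = pattern.substring(1).split("/");
        List<String> parents = new ArrayList<>();
        parents.add(""); // the empty string stands for the root directory
        List<Status> results = new ArrayList<>();
        for (int i = 0; i < comps.length; i++) {
            boolean last = (i == comps.length - 1);
            List<String> next = new ArrayList<>();
            for (String parent : parents) {
                if (comps[i].contains("*")) {
                    // Glob in this component: list the parent, keep the statuses that match.
                    for (Status s : listStatus(parent.isEmpty() ? "/" : parent)) {
                        String name = s.path.substring(s.path.lastIndexOf('/') + 1);
                        if (!matches(name, comps[i])) continue;
                        if (last) results.add(s);
                        else if (s.isDir) next.add(s.path); // sketch choice: descend into dirs only
                    }
                } else if (last) {
                    // No glob in the last component: one getFileStatus per candidate path.
                    Status s = getFileStatus(parent + "/" + comps[i]);
                    if (s != null) results.add(s);
                } else {
                    next.add(parent + "/" + comps[i]); // literal component, no call needed
                }
            }
            parents = next;
        }
        return results;
    }

    /** Hypothetical sample data used by main. */
    static GlobStatusSketch sampleTree() {
        GlobStatusSketch fs = new GlobStatusSketch();
        fs.add("/user", true);
        fs.add("/user/doug", true);
        fs.add("/user/hairong", true);
        fs.add("/user/hairong/file", false);
        fs.add("/user/hairong/file1", false);
        fs.add("/user/hairong/file2", false);
        fs.add("/user/hairong/notes", false);
        return fs;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/user/hairong/file*", "/user/*/file"}) {
            GlobStatusSketch fs = sampleTree();
            int n = fs.globStatus(p).size();
            System.out.println(p + ": " + n + " statuses, listStatus=" + fs.listStatusCalls
                    + ", getFileStatus=" + fs.getFileStatusCalls);
        }
    }
}
```

On this sample tree the first pattern is answered by a single listStatus with
no getFileStatus calls, while the second needs one listStatus("/user") plus one
getFileStatus per matched subdirectory.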
> need FileSystem#globStatus method
> ---------------------------------
>
> Key: HADOOP-2566
> URL: https://issues.apache.org/jira/browse/HADOOP-2566
> Project: Hadoop
> Issue Type: Improvement
> Components: fs
> Reporter: Doug Cutting
> Assignee: Hairong Kuang
> Fix For: 0.16.0
>
> Attachments: globStatus.patch, globStatus1.patch
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting
> performance, we must use file enumeration APIs that return FileStatus[]
> rather than Path[]. Currently we have FileSystem#globPaths(), but that
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the
> cache in 0.17.