[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559735#action_12559735
 ] 

Hairong Kuang commented on HADOOP-2566:
---------------------------------------

> Doesn't this patch essentially do ' arr; for (path : old_globPaths()) 
> arr[i++] = getFileStatus(path); return arr; '. Is this what we wanted? I 
> thought we wanted other way around.
No, this patch does not do what you described. It calls listStatus on a 
parent directory only when the next path component contains a glob, and it 
calls getFileStatus on the last component only when that component has no glob.

Example 1: globStatus("/user/hairong/file*") only calls 
listStatus("/user/hairong") and returns the statuses of the matching 
files/subdirectories. Previously, globPaths("/user/hairong/file*") would call 
listStatus("/user/hairong"), discard the statuses of the matching 
files/subdirectories, and return only their paths, so the caller had to call 
getFileStatus again for each returned path.

Example 2: globStatus("/user/*/file") calls listStatus("/user") and then 
calls getFileStatus on every path matching /user/*/file. In this case it does 
behave as you described.
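
To make the call pattern in the two examples concrete, here is a toy in-memory 
model that counts the listStatus and getFileStatus calls. It is only a sketch 
of the strategy described in this comment, not the code in the attached patch: 
the class GlobStatusSketch, the Status type, the sample tree and the 
'*'-only matching are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Toy model of the lookup strategy; every name here is hypothetical. */
public class GlobStatusSketch {
    /** Stand-in for FileStatus: just a path and a directory flag. */
    public static final class Status {
        public final String path;
        public final boolean isDir;
        Status(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
    }

    private final Map<String, Status> tree = new TreeMap<>();
    public int listStatusCalls = 0;
    public int getFileStatusCalls = 0;

    public GlobStatusSketch add(String path, boolean isDir) {
        tree.put(path, new Status(path, isDir));
        return this;
    }

    /** Models one listStatus call: statuses of the direct children of dir. */
    public List<Status> listStatus(String dir) {
        listStatusCalls++;
        String prefix = dir + "/";
        List<Status> children = new ArrayList<>();
        for (Status s : tree.values()) {
            if (s.path.startsWith(prefix) && s.path.indexOf('/', prefix.length()) < 0) {
                children.add(s);
            }
        }
        return children;
    }

    /** Models one getFileStatus call: a single status, or null if absent. */
    public Status getFileStatus(String path) {
        getFileStatusCalls++;
        return tree.get(path);
    }

    /** Matches a name against a pattern where only '*' is special. */
    static boolean matches(String name, String glob) {
        StringBuilder regex = new StringBuilder();
        String[] parts = glob.split("\\*", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append(".*");
            regex.append(Pattern.quote(parts[i]));
        }
        return name.matches(regex.toString());
    }

    public List<Status> globStatus(String pattern) {
        String[] comps = pattern.substring(1).split("/");
        List<String> paths = List.of("");   // "" stands for the root directory
        List<Status> known = null;          // statuses of paths, when a listing produced them
        for (int i = 0; i < comps.length; i++) {
            boolean last = (i == comps.length - 1);
            List<String> next = new ArrayList<>();
            if (comps[i].contains("*")) {
                // Glob component: list each parent and keep the matching children.
                known = new ArrayList<>();
                for (String parent : paths) {
                    for (Status child : listStatus(parent)) {
                        String name = child.path.substring(parent.length() + 1);
                        if (matches(name, comps[i]) && (last || child.isDir)) {
                            known.add(child);
                            next.add(child.path);
                        }
                    }
                }
            } else {
                // Literal component: extend the paths without contacting the file system.
                known = null;
                for (String parent : paths) next.add(parent + "/" + comps[i]);
            }
            paths = next;
        }
        if (known != null) return known;    // last component was a glob: statuses already in hand
        List<Status> result = new ArrayList<>();
        for (String p : paths) {            // last component was literal: one lookup per candidate
            Status s = getFileStatus(p);
            if (s != null) result.add(s);
        }
        return result;
    }

    /** Hypothetical sample tree used for both examples. */
    public static GlobStatusSketch sample() {
        return new GlobStatusSketch()
            .add("/user", true).add("/user/alice", true).add("/user/doug", true)
            .add("/user/hairong", true).add("/user/doug/file", false)
            .add("/user/hairong/file", false).add("/user/hairong/file1", false)
            .add("/user/hairong/file2", false).add("/user/hairong/notes", false);
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/user/hairong/file*", "/user/*/file"}) {
            GlobStatusSketch fs = sample();
            int n = fs.globStatus(p).size();
            System.out.println(p + ": " + n + " matches, " + fs.listStatusCalls
                + " listStatus, " + fs.getFileStatusCalls + " getFileStatus");
        }
    }
}
```

On this sample tree, the first pattern needs one listStatus call and no 
getFileStatus calls, while the second needs one listStatus call plus one 
getFileStatus call per candidate directory under /user (three here, of which 
two exist).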

> need FileSystem#globStatus method
> ---------------------------------
>
>                 Key: HADOOP-2566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2566
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>         Attachments: globStatus.patch, globStatus1.patch
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
> performance, we must use file enumeration APIs that return FileStatus[] 
> rather than Path[].  Currently we have FileSystem#globPaths(), but that 
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
> cache in 0.17.
