[ 
https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559735#action_12559735
 ] 

hairong edited comment on HADOOP-2566 at 1/16/08 3:53 PM:
----------------------------------------------------------------

> Doesn't this patch essentially do ' arr; for (path : old_globPaths()) 
> arr[i++] = getFileStatus(path); return arr; '. Is this what we wanted? I 
> thought we wanted other way around.
No, this patch does not do what you described. It calls listStatus on a 
parent directory only when the next path component contains a glob, and it 
calls getFileStatus on the last component only when that component contains 
no glob.

Example 1: globStatus("/user/hairong/file*") only calls 
listStatus("/user/hairong") and returns the statuses of the matched 
files/subdirectories. Previously, globPaths("/user/hairong/file*") would 
call listStatus("/user/hairong"), discard the statuses of the matched 
files/subdirectories, and return only the paths. The caller then had to call 
getFileStatus again for each returned path.

Example 2: globStatus("/user/*/file") calls listStatus("/user") and then 
calls getFileStatus on every path that matches /user/*/file. In this case it 
does what you described.
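
To make the two examples concrete, here is a rough sketch of the traversal 
described above. It is not the code from the patch: it runs against a toy 
in-memory tree instead of a real FileSystem, supports only '*', and every 
name in it (GlobStatusSketch, Status, sample, the counters) is made up for 
illustration. The counters stand in for calls to the file system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Toy model of the traversal; hypothetical names, not Hadoop's FileSystem API. */
class GlobStatusSketch {
    /** Stand-in for FileStatus: a path plus a directory flag. */
    static final class Status {
        final String path;
        final boolean isDir;
        Status(String path, boolean isDir) { this.path = path; this.isDir = isDir; }
    }

    final Map<String, Status> entries = new TreeMap<>();
    int listCalls = 0; // simulated listStatus calls
    int statCalls = 0; // simulated getFileStatus calls

    void add(String path, boolean isDir) { entries.put(path, new Status(path, isDir)); }

    /** Returns the statuses of the direct children of dir (empty if none). */
    List<Status> listStatus(String dir) {
        listCalls++;
        String prefix = dir + "/";
        List<Status> out = new ArrayList<>();
        for (Status s : entries.values()) {
            boolean direct = s.path.indexOf('/', prefix.length()) < 0;
            if (s.path.startsWith(prefix) && direct) out.add(s);
        }
        return out;
    }

    /** Returns the status of one path, or null if it does not exist. */
    Status getFileStatus(String path) {
        statCalls++;
        return entries.get(path);
    }

    /** Assumption: only '*' counts as a glob character in this sketch. */
    static boolean matches(String glob, String name) {
        String regex = Pattern.quote(glob).replace("*", "\\E.*\\Q");
        return name.matches(regex);
    }

    static String nameOf(Status s) { return s.path.substring(s.path.lastIndexOf('/') + 1); }

    /** Expands an absolute pattern such as "/user/hairong/file*" into statuses. */
    List<Status> globStatus(String pattern) {
        String[] comps = pattern.substring(1).split("/");
        List<String> parents = new ArrayList<>();
        parents.add(""); // the root; children are built as parent + "/" + name
        List<Status> result = new ArrayList<>();
        for (int i = 0; i < comps.length; i++) {
            boolean last = (i == comps.length - 1);
            boolean glob = comps[i].contains("*");
            List<String> next = new ArrayList<>();
            for (String parent : parents) {
                if (glob) {
                    // Glob component: one listing of the parent yields the statuses.
                    for (Status s : listStatus(parent)) {
                        if (!matches(comps[i], nameOf(s))) continue;
                        if (last) result.add(s);
                        else if (s.isDir) next.add(s.path);
                    }
                } else if (last) {
                    // Literal last component: a single status lookup per candidate.
                    Status s = getFileStatus(parent + "/" + comps[i]);
                    if (s != null) result.add(s);
                } else {
                    next.add(parent + "/" + comps[i]); // literal: no call needed yet
                }
            }
            parents = next;
        }
        return result;
    }

    /** Builds a small made-up tree for the two examples. */
    static GlobStatusSketch sample() {
        GlobStatusSketch fs = new GlobStatusSketch();
        fs.add("/user", true);
        fs.add("/user/hairong", true);
        fs.add("/user/hairong/file1", false);
        fs.add("/user/hairong/file2", false);
        fs.add("/user/hairong/other", false);
        fs.add("/user/doug", true);
        fs.add("/user/doug/file", false);
        return fs;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/user/hairong/file*", "/user/*/file"}) {
            GlobStatusSketch fs = sample();
            int n = fs.globStatus(p).size();
            System.out.println(p + ": matches=" + n + " listStatus=" + fs.listCalls
                    + " getFileStatus=" + fs.statCalls);
        }
    }
}
```

On the sample tree, example 1 costs one listStatus call and no getFileStatus 
calls, while example 2 costs one listStatus call plus one getFileStatus call 
per subdirectory of /user.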

      was (Author: hairong):
    > Doesn't this patch essentially do ' arr; for (path : old_globPaths()) 
arr[i++] = getFileStatus(path); return arr; '. Is this what we wanted? I 
thought we wanted other way around.
No, this patch does not do what you described. Basically it only listStatus on 
the parent directories when there is a glob in the component and it calls 
getFileStatus on the last component if there is no glob in there.

For example 1, globStatus("/user/hairong/file*") only does 
listStatus("/user/hairong") and returns status for the matched 
files/subdirectories; Previously globPath("/user/hairong/file*") would 
listStatus("/user/hairong") then discard all statuses of the matched 
files/subdirectories then return only paths. The caller has to call 
getFileStatus again for each returned path.

For example 2, globStatus("/user/*/file") calls listStaus("/user") and then 
calls getFileStatus on all the paths matched /user/*/file. Then it does as what 
you described.
  
> need FileSystem#globStatus method
> ---------------------------------
>
>                 Key: HADOOP-2566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2566
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>         Attachments: globStatus.patch, globStatus1.patch
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
> performance, we must use file enumeration APIs that return FileStatus[] 
> rather than Path[].  Currently we have FileSystem#globPaths(), but that 
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
> cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
