[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558742#action_12558742 ]

Doug Cutting commented on HADOOP-2566:
--------------------------------------



> For example, globPath("/user/*/data") needs only to listPath("/user").

But listPaths() is not a primitive; it is a utility method defined in terms of 
listStatus().  So this example calls listStatus("/user") and then strips the 
list of FileStatus objects down to a list of Path objects.  We should remove 
that stripping, or at least make it optional.  To make it optional, the 
primitive glob operation should be globStatus(), and globPaths() should become 
a utility method defined in terms of globStatus().
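
A minimal Java sketch of that layering follows.  It does not use Hadoop's real 
classes: FileStatusSketch, GlobbingFs, and InMemoryFs are hypothetical stand-ins, 
and the matcher assumes absolute patterns and supports only `*` within a single 
path component.  globStatus() is the primitive: it expands the pattern one 
component at a time with listStatus(), keeping full status objects throughout.  
listPaths() and globPaths() are thin utilities that just drop everything but 
the path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical stand-in for a FileStatus: a path plus attributes that
// callers would otherwise need a second lookup to obtain.
record FileStatusSketch(String path, boolean isDir, long length) {}

// Hypothetical FS interface: listStatus() and globStatus() are the
// primitives; listPaths() and globPaths() are utilities built on them.
interface GlobbingFs {
  FileStatusSketch[] listStatus(String dir);

  FileStatusSketch[] globStatus(String pattern);

  default String[] listPaths(String dir) {
    return toPaths(listStatus(dir));
  }

  /** Kept for compatibility; strips each status down to its path. */
  @Deprecated
  default String[] globPaths(String pattern) {
    return toPaths(globStatus(pattern));
  }

  static String[] toPaths(FileStatusSketch[] stats) {
    String[] paths = new String[stats.length];
    for (int i = 0; i < stats.length; i++) {
      paths[i] = stats[i].path();
    }
    return paths;
  }
}

// Toy in-memory implementation keyed by absolute path (root "/" is implicit).
class InMemoryFs implements GlobbingFs {
  private final Map<String, FileStatusSketch> files = new TreeMap<>();

  void add(String path, boolean isDir, long length) {
    files.put(path, new FileStatusSketch(path, isDir, length));
  }

  @Override
  public FileStatusSketch[] listStatus(String dir) {
    String prefix = dir.endsWith("/") ? dir : dir + "/";
    List<FileStatusSketch> out = new ArrayList<>();
    for (FileStatusSketch s : files.values()) {
      String p = s.path();
      // Direct children only: under the prefix, with no further '/'.
      if (p.startsWith(prefix) && p.length() > prefix.length()
          && p.indexOf('/', prefix.length()) < 0) {
        out.add(s);
      }
    }
    return out.toArray(new FileStatusSketch[0]);
  }

  @Override
  public FileStatusSketch[] globStatus(String pattern) {
    // Assumes an absolute pattern such as "/user/*/data". Expands it one
    // component at a time, descending only into entries marked as directories.
    List<FileStatusSketch> current = new ArrayList<>();
    current.add(new FileStatusSketch("/", true, 0));
    for (String part : pattern.substring(1).split("/")) {
      Pattern re = Pattern.compile(toRegex(part));
      List<FileStatusSketch> next = new ArrayList<>();
      for (FileStatusSketch dir : current) {
        if (!dir.isDir()) continue; // never list a plain file
        for (FileStatusSketch child : listStatus(dir.path())) {
          String name = child.path().substring(child.path().lastIndexOf('/') + 1);
          if (re.matcher(name).matches()) {
            next.add(child);
          }
        }
      }
      current = next;
    }
    return current.toArray(new FileStatusSketch[0]);
  }

  // Translate a one-component glob into a regex: '*' matches any run of
  // non-'/' characters; everything else is matched literally.
  private static String toRegex(String glob) {
    String[] literals = glob.split("\\*", -1);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < literals.length; i++) {
      if (i > 0) sb.append("[^/]*");
      sb.append(Pattern.quote(literals[i]));
    }
    return sb.toString();
  }
}
```

In this sketch, callers that only want names can keep using globPaths(), while 
callers that recurse or copy attributes call globStatus() and already hold each 
match's status, with no second lookup per path.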

> Some of shell commands like delete, copy, and rename use globPath but don't 
> need FileStatus.

These actually all do need the FileStatus.  They need to know whether each 
file is a directory in order to decide when to recurse.  Copy also needs other 
attributes so that they can be set on the copy.  So we'll end up needing to 
rework these.
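
A minimal sketch of why recursion wants status objects rather than bare paths.  
Stat, ToyFs, and RecursiveDelete are hypothetical names for a toy in-memory 
model, not Hadoop classes.  The recursive delete reads the directory bit from 
the status it already holds, so it issues one listing per directory and no 
separate per-entry lookup; with only paths, each entry would first need its own 
status call to learn whether to descend.  Copy would similarly read attributes 
from the same status object; that part is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a FileStatus: the isDir bit tells a recursive
// command whether to descend.
record Stat(String path, boolean isDir) {}

// Toy file system (hypothetical) that counts listing calls.
class ToyFs {
  private final TreeMap<String, Stat> entries = new TreeMap<>();
  int listCalls = 0;

  void add(String path, boolean isDir) {
    entries.put(path, new Stat(path, isDir));
  }

  boolean exists(String path) {
    return entries.containsKey(path);
  }

  // Returns the statuses of the direct children of dir, as a fresh list.
  List<Stat> listStatus(String dir) {
    listCalls++;
    String prefix = dir + "/";
    List<Stat> out = new ArrayList<>();
    for (Stat s : entries.values()) {
      String p = s.path();
      if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
        out.add(s);
      }
    }
    return out;
  }

  void removeEntry(String path) {
    entries.remove(path);
  }
}

class RecursiveDelete {
  // Deletes the entry described by stat and everything under it, returning
  // the number of entries removed. The status already says whether this is
  // a directory, so no extra lookup is needed before recursing.
  static int delete(ToyFs fs, Stat stat) {
    int removed = 0;
    if (stat.isDir()) {
      for (Stat child : fs.listStatus(stat.path())) {
        removed += delete(fs, child);
      }
    }
    fs.removeEntry(stat.path());
    return removed + 1;
  }
}
```

Deleting a tree with two directories here costs exactly two listings; the files 
in it are removed without ever being looked up individually.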

We will not remove globPaths() in this release, so these commands do not need 
to change right now.  But before we can remove the cache we need to examine 
every place that calls globPaths to check whether these must be converted to 
use globStatus.  That's why we're deprecating globPaths(), to force folks to do 
this.  Then, in 0.17, we can remove the cache from trunk, and start identifying 
all the problems.  But we want users who upgrade to 0.17 to be forewarned, and 
to have an API that supports cache-free use before we remove the cache, so that 
they can upgrade to 0.17 more smoothly.


> need FileSystem#globStatus method
> ---------------------------------
>
>                 Key: HADOOP-2566
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2566
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Doug Cutting
>            Assignee: Hairong Kuang
>             Fix For: 0.16.0
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting 
> performance, we must use file enumeration APIs that return FileStatus[] 
> rather than Path[].  Currently we have FileSystem#globPaths(), but that 
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the 
> cache in 0.17.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
