[ https://issues.apache.org/jira/browse/HADOOP-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12558129#action_12558129 ]
Doug Cutting commented on HADOOP-2566:
--------------------------------------
Globbing is implemented on top of listPaths(), which is in turn implemented on
top of listStatus(). The primitive globbing API should not throw away that
status information. It should keep it, so that glob clients that need it do not
have to call getStatus() for each matching file. Currently the cache of
FileStatus hides the cost of these getStatus() calls, but that cache will become
incorrect once files and their status can change. So we need globStatus() before
we can remove the cache.
FileInputFormat, for example, uses globPaths() to list files matching the input
specification, then it uses getStatus() on each matching path when building
splits. This must change to call globStatus() before the cache is removed.
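To make the cost difference concrete, here is a minimal, self-contained sketch.
It does not use the real Hadoop classes: ToyFileSystem, Status, statusCalls and
the two totalLength helpers are hypothetical stand-ins, and the glob matcher
only understands '*'. It only illustrates that a path-based glob forces one
extra status lookup per match, while a status-based glob needs none.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class GlobStatusSketch {

    /** Hypothetical stand-in for FileStatus: a path plus its length. */
    public static final class Status {
        public final String path;
        public final long length;

        public Status(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Hypothetical in-memory file system that counts per-file status lookups. */
    public static final class ToyFileSystem {
        private final Map<String, Long> files = new LinkedHashMap<>();
        public int statusCalls = 0;

        public void add(String path, long length) {
            files.put(path, length);
        }

        /** One listing call already yields the status of every entry. */
        public List<Status> listStatus() {
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : files.entrySet()) {
                out.add(new Status(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** Per-file lookup; each call stands in for a separate round trip. */
        public Status getStatus(String path) {
            statusCalls++;
            Long length = files.get(path);
            if (length == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            return new Status(path, length);
        }

        /** Keeps the status that listStatus() already returned. */
        public List<Status> globStatus(String glob) {
            Pattern p = toPattern(glob);
            List<Status> out = new ArrayList<>();
            for (Status s : listStatus()) {
                if (p.matcher(s.path).matches()) {
                    out.add(s);
                }
            }
            return out;
        }

        /** Same matching, but discards the status and returns only paths. */
        public List<String> globPaths(String glob) {
            List<String> out = new ArrayList<>();
            for (Status s : globStatus(glob)) {
                out.add(s.path);
            }
            return out;
        }

        /** Only '*' is supported; it matches within one path component. */
        private static Pattern toPattern(String glob) {
            StringBuilder regex = new StringBuilder();
            for (char c : glob.toCharArray()) {
                regex.append(c == '*' ? "[^/]*" : Pattern.quote(String.valueOf(c)));
            }
            return Pattern.compile(regex.toString());
        }
    }

    /** Path-based client: one extra getStatus() call per match. */
    public static long totalLengthViaPaths(ToyFileSystem fs, String glob) {
        long total = 0;
        for (String path : fs.globPaths(glob)) {
            total += fs.getStatus(path).length;
        }
        return total;
    }

    /** Status-based client: no extra lookups. */
    public static long totalLengthViaStatus(ToyFileSystem fs, String glob) {
        long total = 0;
        for (Status s : fs.globStatus(glob)) {
            total += s.length;
        }
        return total;
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem();
        fs.add("/in/part-0", 10);
        fs.add("/in/part-1", 20);
        fs.add("/in/_logs", 5);
        System.out.println(totalLengthViaStatus(fs, "/in/part-*") + " bytes, lookups=" + fs.statusCalls);
        System.out.println(totalLengthViaPaths(fs, "/in/part-*") + " bytes, lookups=" + fs.statusCalls);
    }
}
```

In this toy setup both clients compute the same total, but only the path-based
one increments statusCalls, once per matching file. Without a cache, each of
those lookups would be a real call.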
Long-term, globPaths() and listPaths() may still be useful as utility methods
implemented in terms of globStatus() and listStatus(). But since most current
users of these will see their performance degrade once the cache is removed, we
should deprecate them now, to give fair warning and to strongly encourage folks
to stop using them before the cache is removed.
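One possible shape for that, as a hedged sketch only: the path-returning method
stays as a thin, deprecated wrapper over the status-returning primitive. The
class below is a hypothetical stand-in, not the actual FileSystem code, and it
uses a simple prefix match in place of real glob matching.

```java
import java.util.ArrayList;
import java.util.List;

public class DeprecatedGlobSketch {

    /** Hypothetical stand-in for FileStatus. */
    public static final class Status {
        public final String path;
        public final long length;

        public Status(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    private final List<Status> entries = new ArrayList<>();

    public void add(String path, long length) {
        entries.add(new Status(path, length));
    }

    /** Primitive: a prefix match stands in for real glob matching. */
    public Status[] globStatus(String prefix) {
        List<Status> out = new ArrayList<>();
        for (Status s : entries) {
            if (s.path.startsWith(prefix)) {
                out.add(s);
            }
        }
        return out.toArray(new Status[0]);
    }

    /**
     * Utility wrapper that drops the status and keeps only the paths.
     *
     * @deprecated callers that need status should use globStatus() instead.
     */
    @Deprecated
    public String[] globPaths(String prefix) {
        Status[] matches = globStatus(prefix);
        String[] paths = new String[matches.length];
        for (int i = 0; i < matches.length; i++) {
            paths[i] = matches[i].path;
        }
        return paths;
    }

    public static void main(String[] args) {
        DeprecatedGlobSketch fs = new DeprecatedGlobSketch();
        fs.add("/in/part-0", 10);
        fs.add("/in/part-1", 20);
        fs.add("/out/part-0", 7);
        for (String path : fs.globPaths("/in/")) {
            System.out.println(path);
        }
    }
}
```

Marking the wrapper @Deprecated makes the compiler warn callers in other
classes, which is the "fair warning" mechanism available in Java.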
> need FileSystem#globStatus method
> ---------------------------------
>
> Key: HADOOP-2566
> URL: https://issues.apache.org/jira/browse/HADOOP-2566
> Project: Hadoop
> Issue Type: Improvement
> Components: fs
> Reporter: Doug Cutting
> Assignee: Hairong Kuang
> Fix For: 0.16.0
>
>
> To remove the cache of FileStatus in DFSPath (HADOOP-2565) without hurting
> performance, we must use file enumeration APIs that return FileStatus[]
> rather than Path[]. Currently we have FileSystem#globPaths(), but that
> method should be deprecated and replaced with a FileSystem#globStatus().
> We need to deprecate FileSystem#globPaths() in 0.16 in order to remove the
> cache in 0.17.