[ 
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754115#comment-13754115
 ] 

Andrew Wang commented on HADOOP-9912:
-------------------------------------

Let's be constructive and figure out the right fix. Jason, thanks for the 
attached test case, that helped me understand the issue.

bq. listStatus resolves symlinks. globStatus is supposed to be equivalent to 
listStatus with wildcard support...Symlinks should be transparent to users 
unless they specifically want to know if a path is a symlink.

In HDFS, {{listStatus}} only transparently resolves symlinks in the input path. 
It doesn't resolve the results of the listing, and this is the correct 
behavior. {{globStatus}} behaves the same way, in that it returns FileStatuses 
for Paths that match the glob, and it doesn't resolve these results. You can 
(and should) see symlinks returned by {{listStatus}} and {{globStatus}} in HDFS.
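
To make the distinction concrete, here is a rough local-filesystem analogy using {{java.nio.file}} rather than the Hadoop API. The class name {{LinkAwareListing}} and its {{describe}} helper are made up for illustration. The directory passed in may itself be a symlink and is followed, but each entry is classified without following it, so links in the results stay visible as links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: java.nio.file on the local filesystem, not Hadoop's FileSystem API.
final class LinkAwareListing {
    // Lists dir (a symlink in dir itself is followed when it is opened) and
    // classifies each entry WITHOUT following it, so symlinks stay symlinks.
    static Map<String, String> describe(Path dir) throws IOException {
        Map<String, String> kinds = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String kind;
                if (Files.isSymbolicLink(entry)) {
                    kind = "symlink";      // checked first: isDirectory would follow the link
                } else if (Files.isDirectory(entry)) {
                    kind = "directory";
                } else {
                    kind = "file";
                }
                kinds.put(entry.getFileName().toString(), kind);
            }
        }
        return kinds;
    }
}
```

Calling {{describe}} on a symlink to a directory lists the target's children, and any child that is a link is still reported as a link.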

I also wouldn't say {{globStatus}} is equivalent to {{listStatus}}, since it 
doesn't list directories. If you want {{listStatus}} with matching, you can use 
{{listStatus(Path, PathFilter)}}.

In RawLocalFileSystem (RLFS) there is automatic symlink resolution, so {{listStatus}} results are 
resolved, and it seems like Pig depends on this behavior. Because of 
HADOOP-9877, {{globStatus}} went from always calling {{listStatus}} to calling 
{{getFileLinkStatus}} for non-wildcard glob components. Thus, when passed a 
{{Path}} that's a symlink, {{globStatus}} says it's a symlink.
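
The difference between the two kinds of status can be sketched with {{java.nio.file}} as a stand-in. This is only an analogy: {{LinkStatusDemo}}, {{linkStatus}} and {{resolvedStatus}} are hypothetical names, and the real Hadoop calls return {{FileStatus}} objects rather than {{BasicFileAttributes}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Illustrative only: a local-filesystem analogy, not the Hadoop API.
final class LinkStatusDemo {
    // Roughly what getFileLinkStatus does: describe the link itself.
    static BasicFileAttributes linkStatus(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
    }

    // Roughly what getFileStatus does: follow the link and describe its target.
    static BasicFileAttributes resolvedStatus(Path p) throws IOException {
        return Files.readAttributes(p, BasicFileAttributes.class);
    }
}
```

For a symlink that points at a directory, {{linkStatus}} reports a symlink that is not a directory, while {{resolvedStatus}} reports a directory. That is the same change callers saw in {{globStatus}} results.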

bq. Why does .snapshot support require a getFileLinkStatus? Does getFileStatus 
not work for a .snapshot directory?

It does work, but it's incorrect. {{globStatus}} is not supposed to return resolved 
statuses. It's unfortunate that RLFS has been auto-resolving all this time, but 
since apps apparently depend on it, all we can do is embrace it.

How about this: we add a fixup step that resolves any symlink results on a 
LocalFileSystem while still keeping the link path. This means no more symlinks 
in RLFS {{globStatus}} results. It's a bit obnoxious to do (a glob could pass 
through an HDFS symlink and end up at a link on a local filesystem), but it 
seems like a reasonable solution.
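
A minimal sketch of what such a fixup step might look like, written against {{java.nio.file}} instead of the Hadoop classes so it runs standalone. Everything here is an assumption for illustration: {{SymlinkFixup}}, {{Entry}} (a stand-in for {{FileStatus}}), {{unresolved}} and {{fixup}} are hypothetical names, and leaving dangling links unresolved is just one possible choice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a local-filesystem sketch of the proposed fixup, not the actual patch.
final class SymlinkFixup {
    // Hypothetical stand-in for a FileStatus: a path plus two flags.
    static final class Entry {
        final Path path;
        final boolean directory;
        final boolean symlink;

        Entry(Path path, boolean directory, boolean symlink) {
            this.path = path;
            this.directory = directory;
            this.symlink = symlink;
        }
    }

    // Builds an unresolved entry, i.e. one that describes a link as a link.
    static Entry unresolved(Path path) throws IOException {
        BasicFileAttributes a =
            Files.readAttributes(path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
        return new Entry(path, a.isDirectory(), a.isSymbolicLink());
    }

    // The fixup: symlink entries take on their target's attributes but keep the link path.
    static List<Entry> fixup(List<Entry> results) {
        List<Entry> fixed = new ArrayList<>(results.size());
        for (Entry e : results) {
            if (!e.symlink) {
                fixed.add(e);
                continue;
            }
            try {
                BasicFileAttributes target =
                    Files.readAttributes(e.path, BasicFileAttributes.class); // follows the link
                fixed.add(new Entry(e.path, target.isDirectory(), false));
            } catch (IOException danglingOrUnreadable) {
                fixed.add(e); // assumption: leave links we cannot resolve untouched
            }
        }
        return fixed;
    }
}
```

With this applied, a link to a directory comes back under the link's own path but flagged as a directory, which is what callers relying on the old RLFS behavior expect.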
                
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the 
> resulting FileStatus as a directory but recently this has changed.
