[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754115#comment-13754115 ]
Andrew Wang commented on HADOOP-9912:
-------------------------------------

Let's be constructive and figure out the right fix. Jason, thanks for the attached test case; it helped me understand the issue.

bq. listStatus resolves symlinks. globStatus is supposed to be equivalent to listStatus with wildcard support...Symlinks should be transparent to users unless they specifically want to know if a path is a symlink.

In HDFS, {{listStatus}} only transparently resolves symlinks in the input path. It doesn't resolve the results of the listing, and this is the correct behavior. {{globStatus}} behaves the same way: it returns FileStatuses for the Paths that match the glob, and it doesn't resolve these results. You can (and should) see symlinks returned by {{listStatus}} and {{globStatus}} in HDFS.

I also wouldn't say {{globStatus}} is equivalent to {{listStatus}}, since it doesn't list directories. If you want {{listStatus}} with matching, you can use {{listStatus(Path, PathFilter)}}.

In RLFS there is automatic symlink resolution, so {{listStatus}} results are resolved, and it seems like Pig depends on this behavior. Because of HADOOP-9877, {{globStatus}} went from always calling {{listStatus}} to calling {{getFileLinkStatus}} for non-wildcard glob components. Thus, when passed a {{Path}} that's a symlink, {{globStatus}} now reports it as a symlink.

bq. Why does .snapshot support require a getFileLinkStatus? Does getFileStatus not work for a .snapshot directory?

It does work, but it's incorrect. {{globStatus}} is not supposed to return resolved statuses.

It's unfortunate that RLFS has been auto-resolving all this time, but since apps apparently depend on it, all we can do is embrace it. How about this: we add a fixup step that, for symlink results on a LocalFileSystem, resolves them (while still keeping the link path). This means no more symlinks in RLFS {{globStatus}} results.
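To make the proposed fixup concrete, here is a rough sketch of the idea using plain {{java.nio.file}} rather than the Hadoop classes. It is only an analogy, not the actual patch: {{GlobFixupSketch}}, {{Status}}, {{globUnresolved}} and {{resolveSymlinkResults}} are hypothetical names, {{Status}} stands in for {{FileStatus}}, and reading attributes with and without {{NOFOLLOW_LINKS}} stands in for {{getFileLinkStatus}} versus {{getFileStatus}}.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class GlobFixupSketch {

    /** Hypothetical stand-in for a FileStatus: the matched path and what it reports. */
    static final class Status {
        final Path path;
        final boolean isDirectory;
        final boolean isSymlink;

        Status(Path path, boolean isDirectory, boolean isSymlink) {
            this.path = path;
            this.isDirectory = isDirectory;
            this.isSymlink = isSymlink;
        }
    }

    /** Matches names in dir against a glob without resolving the results. */
    static List<Status> globUnresolved(Path dir, String glob) throws IOException {
        List<Status> out = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                // NOFOLLOW_LINKS: describe the link itself, not its target.
                BasicFileAttributes attrs = Files.readAttributes(
                        p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                out.add(new Status(p, attrs.isDirectory(), attrs.isSymbolicLink()));
            }
        }
        return out;
    }

    /** Fixup step: swap in the target's attributes for symlink results, keeping the link path. */
    static List<Status> resolveSymlinkResults(List<Status> results) throws IOException {
        List<Status> out = new ArrayList<>();
        for (Status s : results) {
            if (s.isSymlink) {
                // No LinkOption: attributes of the link target.
                // A dangling link makes this call throw; a real fix would have to decide what to do.
                BasicFileAttributes target =
                        Files.readAttributes(s.path, BasicFileAttributes.class);
                out.add(new Status(s.path, target.isDirectory(), false));
            } else {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("glob-fixup");
        Path realDir = Files.createDirectory(tmp.resolve("realdir"));
        Files.createSymbolicLink(tmp.resolve("link"), realDir);

        for (Status s : resolveSymlinkResults(globUnresolved(tmp, "link"))) {
            System.out.println(s.path + " dir=" + s.isDirectory + " symlink=" + s.isSymlink);
        }
    }
}
```

The point of keeping {{s.path}} unchanged is that callers still see the path they globbed for, while the directory flag comes from the target, which is what apps relying on the old RLFS behavior appear to expect.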
It's a bit obnoxious to do ({{globStatus}} could symlink through HDFS to a link on a local filesystem), but it seems like a reasonable solution.

> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch
>
>
> globStatus for a path that is a symlink to a directory used to report the
> resulting FileStatus as a directory but recently this has changed.