[
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788653#comment-13788653
]
Colin Patrick McCabe commented on HADOOP-9780:
----------------------------------------------
So, the problem we have right now with returning unresolved paths is
that each time the server encounters a symlink, it throws an exception, which
triggers the client to do another RPC (well, actually two RPCs, due to a
current implementation quirk; see HDFS-5293). Returning unresolved paths
would cause the client to keep redoing these path resolution RPCs over and
over. This doesn't scale: it multiplies the load on the NameNode by
at least 3x and possibly more, depending on the number of links.
To avoid this, I think we should resolve as much of the symlink as possible on
the NameNode. The NameNode already knows which inodes are symlinks, and it
knows what they point to. If what they point to is on the local NameNode
(which should be the common case), we should just resolve it then and there and
keep going, rather than doing the "please make another RPC to me" dance.
Obviously, this doesn't help in the case of cross-namespace symlinks. However,
it does help a lot in the extremely common case of links to things on the same
NameNode.
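To make the idea concrete, here is a rough sketch of what "resolve it then and
there" could look like. This is not the actual NameNode code: {{ToyNameNode}},
{{CrossNamespaceLink}}, and the in-memory symlink map are all hypothetical
stand-ins, and it assumes absolute paths and ignores {{..}} in link targets.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy namespace for illustration only; not Hadoop code. */
class ToyNameNode {
    /** Hypothetical signal that the client must continue on another namespace. */
    static class CrossNamespaceLink extends RuntimeException {
        final String target;
        CrossNamespaceLink(String target) { super(target); this.target = target; }
    }

    private static final int MAX_LINKS = 32; // guard against symlink loops
    private final Map<String, String> symlinks = new HashMap<>();

    void addSymlink(String link, String target) { symlinks.put(link, target); }

    /** Follows every local symlink within a single call. */
    String resolve(String path) {
        String current = path;
        for (int followed = 0; followed <= MAX_LINKS; followed++) {
            String next = followFirstLink(current);
            if (next == null) {
                return current; // no symlinks left in the path
            }
            current = next;
        }
        throw new IllegalStateException("too many levels of symbolic links: " + path);
    }

    /** Rewrites the first symlink found in the path, or returns null if there is none. */
    private String followFirstLink(String path) {
        String[] parts = path.substring(1).split("/");
        String prefix = "";
        for (String part : parts) {
            String parent = prefix.isEmpty() ? "/" : prefix + "/";
            prefix = prefix + "/" + part;
            String target = symlinks.get(prefix);
            if (target == null) {
                continue;
            }
            String rest = path.substring(prefix.length()); // "" or "/remaining/components"
            if (target.contains("://")) {
                // Target lives in another namespace: only now do we need the client's help.
                throw new CrossNamespaceLink(target + rest);
            }
            // Relative targets are taken relative to the link's parent directory.
            String base = target.startsWith("/") ? target : parent + target;
            return base + rest;
        }
        return null;
    }
}
```

With {{/a -> b}} registered, {{resolve("/a/c")}} comes back as {{/b/c}} in one
call, and only a link whose target carries a scheme (e.g. a made-up
{{hdfs://nn2/data}}) would bounce back to the caller.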
In a way, this is similar to how {{LocalFileSystem}} already operates. When
you try to read a local file, it resolves as many symlinks as it can without
throwing {{UnresolvedLinkException}}, unless a symlink is dangling. There's no
reason to ask the client for help if you don't need the help.
> Filesystem and FileContext methods that follow symlinks should return
> unresolved paths
> --------------------------------------------------------------------------------------
>
> Key: HADOOP-9780
> URL: https://issues.apache.org/jira/browse/HADOOP-9780
> Project: Hadoop Common
> Issue Type: Sub-task
> Reporter: Colin Patrick McCabe
> Priority: Minor
>
> Currently, when you follow a symlink, you get back the resolved path, with
> all symlinks removed. For compatibility reasons, we might want to have the
> returned path be an unresolved path.
> Example: if you have:
> {code}
> /a -> b
> /b
> /b/c
> {code}
> {{getFileStatus("/a/c")}} will return a {{FileStatus}} object with a {{Path}}
> of {{"/b/c"}}.
> If we returned the unresolved path, that would be {{"/a/c"}}.