[ 
https://issues.apache.org/jira/browse/HADOOP-9780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788653#comment-13788653
 ] 

Colin Patrick McCabe commented on HADOOP-9780:
----------------------------------------------

So, the problem with returning unresolved paths right now is that each time 
the server encounters a symlink, it throws an exception, which triggers the 
client to do another RPC (well, actually two RPCs, due to a current 
implementation quirk; see HDFS-5293).  Returning unresolved paths would cause 
the client to keep redoing these path resolution RPCs over and over.  This 
doesn't scale: it multiplies the load on the NameNode by at least 3x, and 
possibly more, depending on the number of links.

To avoid this, I think we should resolve as much of the symlink as possible on 
the NameNode.  The NameNode already knows which inodes are symlinks, and it 
knows what they point to.  If what they point to is on the local NameNode 
(which should be the common case), we should just resolve it then and there and 
keep going, rather than doing the "please make another RPC to me" dance.

Obviously, this doesn't help in the case of cross-namespace symlinks.  However, 
it does help a lot in the extremely common case of links to things on the same 
NameNode.
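
As a rough illustration of the idea (a toy sketch, not the actual NameNode 
code), the following resolves every local symlink in a single pass and only 
hands the path back to the caller when it hits a link into another namespace.  
Everything here is hypothetical: the {{LocalSymlinkResolver}} class, the 
{{MAX_LINKS}} cap, and the convention that a target containing "://" is a 
cross-namespace link.  It also assumes paths contain no "." or ".." components.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory namespace; a hypothetical stand-in for the NameNode's inode tree. */
class LocalSymlinkResolver {
    /** Assumed cap on followed links, so that cycles terminate. */
    static final int MAX_LINKS = 32;

    /** Symlink path -> target. Relative targets are relative to the link's parent. */
    private final Map<String, String> links = new HashMap<>();

    public void addLink(String path, String target) {
        links.put(path, target);
    }

    /** Splits a path into its non-empty components. */
    static List<String> components(String path) {
        List<String> out = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    /**
     * Returns the fully resolved local path. If a cross-namespace link is hit,
     * returns its target plus the unresolved remainder, for the client to follow.
     */
    public String resolve(String path) {
        Deque<String> pending = new ArrayDeque<>(components(path));
        String resolved = ""; // "" stands for the root directory
        int followed = 0;
        while (!pending.isEmpty()) {
            String candidate = resolved + "/" + pending.removeFirst();
            String target = links.get(candidate);
            if (target == null) {
                resolved = candidate; // ordinary file or directory
                continue;
            }
            if (++followed > MAX_LINKS) {
                throw new IllegalStateException("too many symlinks: " + path);
            }
            if (target.contains("://")) {
                // Cross-namespace link: this is the only case that needs the client.
                StringBuilder remote = new StringBuilder(target);
                for (String rest : pending) {
                    remote.append('/').append(rest);
                }
                return remote.toString();
            }
            // Local link: splice the target in front of the remaining components
            // and keep going, with no extra round trip.
            List<String> parts = components(target);
            for (int i = parts.size() - 1; i >= 0; i--) {
                pending.addFirst(parts.get(i));
            }
            if (target.startsWith("/")) {
                resolved = ""; // absolute target restarts from the root
            }
        }
        return resolved.isEmpty() ? "/" : resolved;
    }

    public static void main(String[] args) {
        LocalSymlinkResolver ns = new LocalSymlinkResolver();
        ns.addLink("/a", "b"); // the /a -> b link from the issue description
        System.out.println(ns.resolve("/a/c")); // prints /b/c
    }
}
```

In this sketch the caller makes one call for any number of local links, and 
gets a path back for further resolution only in the cross-namespace case.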

In a way, this is similar to how {{LocalFileSystem}} already operates.  When 
you try to read a local file, it resolves as many symlinks as it can without 
throwing {{UnresolvedLinkException}}, unless a symlink is dangling.  There's no 
reason to ask the client for help if you don't need the help.

> Filesystem and FileContext methods that follow symlinks should return 
> unresolved paths
> --------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9780
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9780
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Colin Patrick McCabe
>            Priority: Minor
>
> Currently, when you follow a symlink, you get back the resolved path, with 
> all symlinks removed.  For compatibility reasons, we might want to have the 
> returned path be an unresolved path.
> Example: if you have:
> {code}
> /a -> b
> /b
> /b/c
> {code}
> {{getFileStatus("/a/c")}} will return a {{FileStatus}} object with a {{Path}} 
> of {{"/b/c"}}.
> If we returned the unresolved path, that would be {{"/a/c"}}



--
This message was sent by Atlassian JIRA
(v6.1#6144)
