[ 
https://issues.apache.org/jira/browse/HADOOP-9984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801839#comment-13801839
 ] 

Daryn Sharp commented on HADOOP-9984:
-------------------------------------

bq.  Unix's readdir does return the file type - see my comment. So your 
statement is not true. It is mostly transparent.

Yes, but {{listStatus}} is not {{readdir}} - it's {{readdir}} + {{stat}}.  No 
pre-2.x {{FileSystem}} code is prepared to deal with symlinks so the approach 
must start with the premise that symlinks are invisible and never exposed to 
the user.

bq. So you prefer the second option (b) for readDir. Is your layered file 
system proposal for fixing symlinks an implementation choice for option (b), 
or something with fundamentally different semantics?

A superset of (b) for all methods that return a file status.

The problem isn't just what {{isDir}} returns, but what any of the file 
status calls return.  The main concerns voiced about returning the symlink 
target's file stat are the following failure scenarios:
# What if the link is dangling?
# What if the user lacks permission?
# What if it's on a remote fs that's down?
# For any of the above, what if the user was going to filter out the 
problematic path based on name?  Why completely fail?

Failing the entire list/glob status in those scenarios is undesirable.  
Returning a file status that lazily resolves to the target's file stat, as the 
local fs does, neatly avoids all those problems.  The user gets a list back 
instead of an exception, and a failure occurs only when an individual file 
status attribute other than the path is queried.
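
To make the lazy-resolution idea concrete, here is a minimal, self-contained 
sketch in plain Java.  It does not use the Hadoop API: {{LazyStatus}}, 
{{Resolver}}, {{TargetStat}} and this {{listStatus}} are hypothetical names 
invented for illustration, and the real {{FileStatus}} and {{FileSystem}} 
classes may need a rather different shape.  It only shows that a listing can 
succeed while a dangling entry fails later, and only if an attribute other 
than its path is read.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LazyStatusSketch {

    /** Hypothetical stand-in for the attributes of a resolved link target. */
    static final class TargetStat {
        final boolean isDir;
        final long length;

        TargetStat(boolean isDir, long length) {
            this.isDir = isDir;
            this.length = length;
        }
    }

    /** Hypothetical resolver: follows the link and stats the target; may fail. */
    interface Resolver {
        TargetStat resolve(String path) throws IOException;
    }

    /** The path is always available; every other attribute resolves on first use. */
    static final class LazyStatus {
        private final String path;
        private final Resolver resolver;
        private TargetStat cached; // null until the first attribute query

        LazyStatus(String path, Resolver resolver) {
            this.path = path;
            this.resolver = resolver;
        }

        String getPath() {
            return path; // never triggers resolution
        }

        boolean isDirectory() throws IOException {
            return stat().isDir;
        }

        long getLen() throws IOException {
            return stat().length;
        }

        private TargetStat stat() throws IOException {
            if (cached == null) {
                cached = resolver.resolve(path); // a failure surfaces here, per entry
            }
            return cached;
        }
    }

    /** Builds the listing without resolving anything, so it cannot fail on a bad link. */
    static List<LazyStatus> listStatus(List<String> paths, Resolver resolver) {
        List<LazyStatus> result = new ArrayList<>();
        for (String p : paths) {
            result.add(new LazyStatus(p, resolver));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, TargetStat> targets = new HashMap<>();
        targets.put("/data/a.txt", new TargetStat(false, 42L));
        // "/data/dangling" has no target on purpose.

        Resolver resolver = p -> {
            TargetStat s = targets.get(p);
            if (s == null) {
                throw new FileNotFoundException("dangling link: " + p);
            }
            return s;
        };

        List<LazyStatus> listing =
            listStatus(Arrays.asList("/data/a.txt", "/data/dangling"), resolver);

        for (LazyStatus st : listing) {
            if (st.getPath().endsWith("dangling")) {
                continue; // filtered by name, so the bad link is never resolved
            }
            System.out.println(st.getPath() + " len=" + st.getLen());
        }
    }
}
```

A caller that filters by name never touches the dangling entry, while a 
caller that asks for its length or type gets the {{IOException}} for that one 
entry rather than for the whole listing.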

bq.  Our tools may want to copy a symlink as-is rather than copy the file it 
refers to; all I am saying is that if there is a need to do that we need to fix 
such tools.

Agreed.  Such tools need modification to specifically become symlink aware, but 
the default must be invisible symlinks for existing code.

> FileSystem#globStatus and FileSystem#listStatus should resolve symlinks by 
> default
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-9984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9984
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.1.0-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HADOOP-9984.001.patch, HADOOP-9984.003.patch, 
> HADOOP-9984.005.patch, HADOOP-9984.007.patch, HADOOP-9984.009.patch, 
> HADOOP-9984.010.patch, HADOOP-9984.011.patch, HADOOP-9984.012.patch, 
> HADOOP-9984.013.patch, HADOOP-9984.014.patch, HADOOP-9984.015.patch
>
>
> During the process of adding symlink support to FileSystem, we realized that 
> many existing HDFS clients would be broken by listStatus and globStatus 
> returning symlinks.  One example is applications that assume that 
> !FileStatus#isFile implies that the inode is a directory.  As we discussed in 
> HADOOP-9972 and HADOOP-9912, we should default these APIs to returning 
> resolved paths.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
