[ 
https://issues.apache.org/jira/browse/HADOOP-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771563#comment-13771563
 ] 

Binglin Chang commented on HADOOP-9972:
---------------------------------------

bq. I'm afraid that what you're seeing is a bug. I introduced this bug and I 
have a patch available to fix it: 
https://issues.apache.org/jira/browse/HADOOP-9929
The experiment was done on Linux, and I was talking about Linux practice: glob
ignores all permission and dangling-link errors, and ls then handles the errors
properly. We should follow Linux practice; I don't see how this is related to
HADOOP-9929.
I think the correct fix for HADOOP-9929 should be:
{code}
hadoop fs -ls /user/abc/tests/data
  glob(/user/abc/tests/data)
    pattern matches nothing because of a permission issue, so just return [/user/abc/tests/data]
  ls [/user/abc/tests/data] returns a permission error
{code}
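
The flow above can be sketched with plain java.nio rather than the Hadoop classes. This is only an illustration of the idea, not the HADOOP-9929 patch: the class and method names (GlobFallbackSketch, glob, ls) are hypothetical, and the "return the literal path when nothing matches" rule is modelled on the default shell behaviour for an unmatched pattern.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: glob never reports errors, ls does.
final class GlobFallbackSketch {

    // Expand `pattern` inside `dir`. Any error during expansion is swallowed:
    // if nothing matches (or the directory cannot be read), the literal path
    // is returned unchanged so that a later step can report the real error.
    static List<Path> glob(Path dir, String pattern) {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, pattern)) {
            for (Path p : stream) {
                matches.add(p);
            }
        } catch (IOException | RuntimeException e) {
            matches.clear(); // permission denied, missing directory, bad pattern, ...
        }
        if (matches.isEmpty()) {
            matches.add(dir.resolve(pattern));
        }
        Collections.sort(matches);
        return matches;
    }

    // List each path; this is the step that surfaces errors to the user.
    static List<String> ls(List<Path> paths) {
        List<String> out = new ArrayList<>();
        for (Path p : paths) {
            try {
                // Fails for missing or unreadable paths.
                Files.readAttributes(p, BasicFileAttributes.class);
                out.add(p.toString());
            } catch (IOException e) {
                out.add("ls: " + p + ": " + e.getClass().getSimpleName());
            }
        }
        return out;
    }
}
```

Calling ls(glob(dir, "data")) on an unreadable or missing dir yields a single "ls: ..." error line from ls, while glob itself stays silent.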

bq. I want to avoid a combinatorial explosion of function overloads.
There is no combinatorial explosion. Every fs already has a listStatus
implementation. If the fs supports symlinks (to my knowledge only LocalFS and
HDFS do), we add listLinkStatus (for HDFS, just rename listStatus to
listLinkStatus). If the fs does not support symlinks, then by default
listStatus = listLinkStatus, so the change is minimal. All other non-core APIs
(listStatus(Path), listStatus(Path, PathFilter), listStatus(Path[]),
listStatus(Path[], PathFilter), listStatus(Path, PathOption)) should be
implemented only in FS/FC.
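
A minimal sketch of that layering, using simplified stand-in types rather than Hadoop's real FileSystem and FileStatus. Everything here (Entry, SketchFileSystem, NoLinkFs, LinkFs) is hypothetical, and symlink resolution is faked by clearing a flag; the point is only to show where each method would live.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for FileStatus (not Hadoop's class).
final class Entry {
    final String path;
    final boolean symlink;

    Entry(String path, boolean symlink) {
        this.path = path;
        this.symlink = symlink;
    }
}

// Hypothetical base class: listLinkStatus is the single core primitive.
abstract class SketchFileSystem {
    // Core API: list entries as they are, reporting symlinks as symlinks.
    abstract List<Entry> listLinkStatus(String dir);

    // Default for file systems without symlinks: both calls are identical.
    List<Entry> listStatus(String dir) {
        return listLinkStatus(dir);
    }

    // Non-core convenience overload, written once in the base class.
    final List<Entry> listStatus(String[] dirs) {
        List<Entry> all = new ArrayList<>();
        for (String d : dirs) {
            all.addAll(listStatus(d));
        }
        return all;
    }
}

// A file system without symlinks implements only the primitive.
final class NoLinkFs extends SketchFileSystem {
    @Override
    List<Entry> listLinkStatus(String dir) {
        return Arrays.asList(new Entry(dir + "/a", false), new Entry(dir + "/b", false));
    }
}

// A symlink-capable file system additionally overrides listStatus.
final class LinkFs extends SketchFileSystem {
    @Override
    List<Entry> listLinkStatus(String dir) {
        return Arrays.asList(new Entry(dir + "/a", false), new Entry(dir + "/link", true));
    }

    @Override
    List<Entry> listStatus(String dir) {
        List<Entry> resolved = new ArrayList<>();
        for (Entry e : listLinkStatus(dir)) {
            // Assumption: "resolving" is faked here by dropping the symlink flag.
            resolved.add(new Entry(e.path, false));
        }
        return resolved;
    }
}
```

With this shape, adding another overload touches only the base class, and a file system without symlinks needs no changes at all.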

listStatus(Path, PathOption) doesn't look like a core API. A core API should be
minimal, orthogonal, and complete. In the end, listStatus(Path, PathOption)
still needs a readdir/getLinkStatus equivalent to implement it.
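
To illustrate the point, here is a rough java.nio sketch of an option-driven listing built only from a readdir-like call and a getLinkStatus-like call. The names OptionListingSketch and LinkMode are hypothetical (LinkMode only loosely mirrors the PathOption under discussion), and skipping dangling links in RESOLVE_LINKS mode is an assumption taken from the issue description.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class OptionListingSketch {
    // Hypothetical option; not an existing Hadoop type.
    enum LinkMode { KEEP_LINKS, RESOLVE_LINKS }

    // Returns "name:type" strings, sorted by name.
    static List<String> list(Path dir, LinkMode mode) throws IOException {
        List<String> out = new ArrayList<>();
        // readdir equivalent: enumerate the raw directory entries.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // getLinkStatus equivalent: stat the entry without following links.
                BasicFileAttributes attrs = Files.readAttributes(
                        entry, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                if (attrs.isSymbolicLink() && mode == LinkMode.RESOLVE_LINKS) {
                    try {
                        // Follow the link to describe its target instead.
                        attrs = Files.readAttributes(entry, BasicFileAttributes.class);
                    } catch (IOException dangling) {
                        continue; // target missing or unreadable: skip the entry
                    }
                }
                out.add(entry.getFileName() + ":" + typeOf(attrs));
            }
        }
        Collections.sort(out);
        return out;
    }

    private static String typeOf(BasicFileAttributes a) {
        if (a.isSymbolicLink()) {
            return "link";
        }
        return a.isDirectory() ? "dir" : "file";
    }
}
```

Both modes reduce to the same two primitives; the option only decides what happens between them.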

bq. The Linux practice is based on the fact that readdir only returns path 
names (i.e. strings) in POSIX
On most Linux/BSD systems, readdir returns both the filename and the file type
(d_type).
http://man7.org/linux/man-pages/man3/readdir.3.html

> new APIs for listStatus and globStatus to deal with symlinks
> ------------------------------------------------------------
>
>                 Key: HADOOP-9972
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9972
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.1.1-beta
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>
> Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to 
> deal with symlinks.  The issue is that code has been written which is 
> incompatible with the existence of things which are not files or directories.
> For example, there is a lot of code out there that looks at
> FileStatus#isFile, and if it returns false, assumes that what it is looking
> at is a directory.  In the case of a symlink, this assumption is incorrect.
> It seems reasonable to make the default behavior of {{FileSystem#listStatus}} 
> and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring 
> dangling ones.  This will prevent incompatibility with existing MR jobs and 
> other HDFS users.  We should also add new versions of listStatus and 
> globStatus that allow new, symlink-aware code to deal with symlinks as 
> symlinks.

