[ 
https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762149#comment-13762149
 ] 

Eli Collins commented on HADOOP-9912:
-------------------------------------

Jason, Daryn, Kihwal, Colin, Andrew and I discussed this last Friday. Here are 
my notes:

There are two types of clients and three behaviors the APIs could reasonably 
support:
# Clients that are symlink-aware and want to see status objects for the links 
themselves rather than have them auto-resolved. Examples are the shell (it 
should list the link) and distcp (so it can optionally follow or not follow 
symlinks).
# Clients that are not symlink-aware (i.e. most existing programs). This case is 
further broken down into:
## Clients that want symlink resolution exceptions exposed. E.g. suppose user X 
moves a directory D and replaces it with a symlink S to that directory, but 
accidentally changes the permissions on D so that user Y can no longer access D 
via S. If user Y regularly recursively copies X's parent directory for backup, 
then the copy should now fail; otherwise Y has no indication that it is no 
longer backing up the data it needs to.
## Clients that want symlink resolution exceptions swallowed. E.g. suppose a job 
uses a /*/D glob path and there is a symlink /S that is either dangling or 
points somewhere the client doesn't have permission to access. Should the job 
start failing because a root-level symlink was introduced that the user can 
list but not resolve? It seems like some clients would want an option to 
swallow such resolution failures. This is arguably a weaker case than the 
previous example: if you asked for /*/D, you might reasonably have meant to 
access whatever /S/D referred to, in which case you'd want the job to fail.
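
To make the three behaviors concrete, here is a rough sketch against the plain 
java.nio.file API on a local file system. It is only an analogy, not the Hadoop 
FileSystem/FileContext API and not the proposal itself. The class name, the 
Mode enum and its constants (NO_RESOLVE, RESOLVE_STRICT, RESOLVE_LENIENT) and 
the "name:kind" output format are all made up for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SymlinkListingSketch {

    /** Hypothetical names for the three behaviors in the notes above. */
    enum Mode {
        NO_RESOLVE,      // symlink-aware client: report the link itself
        RESOLVE_STRICT,  // unaware client: resolve, expose resolution failures
        RESOLVE_LENIENT  // unaware client: resolve, skip links that cannot be resolved
    }

    /** Returns one "name:kind" entry per child of dir, sorted by name. */
    static List<String> list(Path dir, Mode mode) throws IOException {
        List<String> out = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (mode == Mode.NO_RESOLVE) {
                    // Status of the entry itself; links are never followed.
                    BasicFileAttributes attrs = Files.readAttributes(
                            child, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                    out.add(name + ":" + kind(attrs));
                    continue;
                }
                try {
                    // Default readAttributes follows links, so this is the target's status.
                    BasicFileAttributes attrs =
                            Files.readAttributes(child, BasicFileAttributes.class);
                    out.add(name + ":" + kind(attrs));
                } catch (IOException e) {
                    // Only failures on symlinks are swallowed, and only in lenient mode.
                    if (mode == Mode.RESOLVE_STRICT || !Files.isSymbolicLink(child)) {
                        throw e;
                    }
                }
            }
        }
        Collections.sort(out);
        return out;
    }

    /** Labels an entry as link, dir, or file (anything else is labelled file). */
    static String kind(BasicFileAttributes attrs) {
        if (attrs.isSymbolicLink()) {
            return "link";
        }
        return attrs.isDirectory() ? "dir" : "file";
    }

    public static void main(String[] args) throws IOException {
        // Lists the given directory (or the current one) without resolving links.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String entry : list(dir, Mode.NO_RESOLVE)) {
            System.out.println(entry);
        }
    }
}
```

With a directory holding a real directory, a symlink to it, and a dangling 
symlink, NO_RESOLVE reports both links as links, RESOLVE_STRICT fails on the 
dangling link, and RESOLVE_LENIENT reports the good link as a directory and 
silently drops the dangling one. That last case is exactly the trade-off in the 
/*/D example: the job keeps running, but nothing tells the user an entry was 
skipped.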

Also:

- FileSystem and FileContext should be consistent 
- We need to make a call as to whether symlinks for local FileSystem are:
-- Just for exposing symlinks in the underlying local file system
-- Supporting HDFS style symlinks (eg URIs that can span file systems)
-- I originally introduced them in HADOOP-6421 to create/expose symlinks in 
the local file system (and for testing purposes)
- The easiest way to fix the Pig breakage in the near term while we figure this 
out is to revert HADOOP-9987 

So the next steps are:
- Articulate an API that supports all three usage patterns. It should cover 
all APIs that return FileStatus objects, not just listStatus. I volunteered to 
write up a strawman proposal.
- Figure out which behavior should be the default. We need to finish figuring 
out the compatibility implications of the proposal. All options are 
incompatible at some level, but we should favor the one that breaks 
compatibility the least for most existing programs (which do not use symlinks).

                
> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-9912
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9912
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, 
> old-hdfs.txt, old-local.txt
>
>
> globStatus for a path that is a symlink to a directory used to report the 
> resulting FileStatus as a directory but recently this has changed.

