[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762149#comment-13762149 ]
Eli Collins commented on HADOOP-9912:
-------------------------------------

Jason, Daryn, Kihwal, Colin, Andrew and I discussed this last Friday. Here are my notes:

There are two types of clients and three behaviors the APIs could reasonably support:
# Clients that are symlink-aware and want to see status objects for links, not have them auto-resolved. Examples are the shell (it should list the link) and distcp (so it can optionally follow or not follow symlinks).
# Clients that are not symlink-aware (i.e. most existing programs). This case is further broken down into:
## Clients that want symlink resolution exceptions exposed. E.g. suppose user X moves a directory D and replaces it with a symlink S to that directory, but accidentally changes the permissions on D so that user Y can no longer access D via S. If user Y regularly copies X's parent directory recursively for backup, the copy should now fail; otherwise Y has no indication that it is no longer backing up the data it needs.
## Clients that want symlink resolution exceptions swallowed. E.g. suppose a job uses a /*/D glob path and there is a symlink /S that is either dangling or points somewhere the client doesn't have permission to access. Should the job start failing because a root-level symlink was introduced that the user can list but not resolve? It seems like some clients would want an option to swallow such resolution failures. This is arguably weaker than the previous example: if you want /*/D, you might also have reasonably meant to access whatever /S/D refers to, in which case you'd want the job to fail.

Also:
- FileSystem and FileContext should be consistent.
- We need to make a call as to whether symlinks for the local FileSystem are:
-- Just for exposing symlinks in the underlying local file system
-- Supporting HDFS-style symlinks (e.g. URIs that can span file systems)
-- I originally introduced them in HADOOP-6421 to create/expose symlinks in the local file system (and for testing purposes).
- The easiest way to fix the Pig breakage in the near term, while we figure this out, is to revert HADOOP-9987.

So the next steps are:
- Articulate an API that supports all three usage patterns. It should cover all APIs that return FileStatus objects, not just listStatus. I volunteered to write up a strawman proposal.
- Figure out which behavior should be the default. We need to finish figuring out the compatibility implications of the proposal. All options are incompatible at some level, but we should favor the one that breaks compatibility the least for most existing programs (which do not use symlinks).

> globStatus of a symlink to a directory does not report symlink as a directory
> -----------------------------------------------------------------------------
>
> Key: HADOOP-9912
> URL: https://issues.apache.org/jira/browse/HADOOP-9912
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Priority: Blocker
> Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, old-hdfs.txt, old-local.txt
>
>
> globStatus for a path that is a symlink to a directory used to report the
> resulting FileStatus as a directory but recently this has changed.
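To make the three client behaviors in the notes concrete, here is a minimal, hypothetical Java sketch of one possible shape: a single status call that takes an explicit policy argument. It is not the strawman proposal mentioned in the notes and not a Hadoop API; `SymlinkPolicy`, `Entry`, `Status`, `resolve`, `statusAll` and the sample paths are all illustrative assumptions, and a small in-memory namespace stands in for a real FileSystem. It also shows the symptom from the issue title: under `RETURN_LINK` a link to a directory is reported as a link (not a directory), while the resolving policies report it as a directory under the link's own path.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch (not a Hadoop API): one status call, three symlink policies.
class SymlinkPolicySketch {
    // The three client behaviors from the notes.
    enum SymlinkPolicy {
        RETURN_LINK,     // symlink-aware (shell, distcp): report the link itself, never resolve
        RESOLVE_OR_FAIL, // not symlink-aware: resolve links and surface resolution errors
        RESOLVE_OR_SKIP  // not symlink-aware: resolve links and drop those that cannot be resolved
    }

    // Toy namespace entry: a directory or a symlink to another path.
    static final class Entry {
        final boolean isDir;
        final String linkTarget; // null unless this entry is a symlink
        final boolean readable;  // false models "the caller lacks permission"
        Entry(boolean isDir, String linkTarget, boolean readable) {
            this.isDir = isDir; this.linkTarget = linkTarget; this.readable = readable;
        }
        static Entry dir() { return new Entry(true, null, true); }
        static Entry unreadableDir() { return new Entry(true, null, false); }
        static Entry link(String target) { return new Entry(false, target, true); }
    }

    // Stripped-down stand-in for FileStatus.
    static final class Status {
        final String path;
        final boolean isDir;
        final boolean isLink;
        Status(String path, boolean isDir, boolean isLink) {
            this.path = path; this.isDir = isDir; this.isLink = isLink;
        }
        @Override public String toString() {
            return path + (isLink ? " (link)" : isDir ? " (dir)" : " (file)");
        }
    }

    // Follow links from 'path'; report the final target's type under the original path.
    static Status resolve(Map<String, Entry> ns, String path) throws IOException {
        String current = path;
        for (int hops = 0; hops < 32; hops++) { // hop limit guards against link cycles
            Entry e = ns.get(current);
            if (e == null) throw new FileNotFoundException("dangling link: " + path + " -> " + current);
            if (!e.readable) throw new AccessDeniedException(current);
            if (e.linkTarget == null) return new Status(path, e.isDir, false);
            current = e.linkTarget;
        }
        throw new IOException("too many levels of symbolic links: " + path);
    }

    // Build statuses for already-expanded paths (e.g. glob matches) under one policy.
    static List<Status> statusAll(Map<String, Entry> ns, List<String> paths, SymlinkPolicy policy)
            throws IOException {
        List<Status> out = new ArrayList<>();
        for (String p : paths) {
            Entry e = ns.get(p);
            if (e == null) throw new FileNotFoundException(p);
            boolean isLink = e.linkTarget != null;
            if (isLink && policy == SymlinkPolicy.RETURN_LINK) {
                out.add(new Status(p, false, true)); // the link itself, unresolved
                continue;
            }
            try {
                out.add(resolve(ns, p));
            } catch (IOException ex) {
                if (isLink && policy == SymlinkPolicy.RESOLVE_OR_SKIP) continue; // swallow
                throw ex; // RESOLVE_OR_FAIL, or a failure on a non-link path: surface it
            }
        }
        return out;
    }

    // /a is a directory, /t links to it, /s is dangling, /x links to unreadable /d.
    static Map<String, Entry> sampleNamespace() {
        Map<String, Entry> ns = new TreeMap<>();
        ns.put("/a", Entry.dir());
        ns.put("/t", Entry.link("/a"));
        ns.put("/s", Entry.link("/gone"));
        ns.put("/d", Entry.unreadableDir());
        ns.put("/x", Entry.link("/d"));
        return ns;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Entry> ns = sampleNamespace();
        List<String> paths = Arrays.asList("/a", "/s", "/t", "/x");
        System.out.println(statusAll(ns, paths, SymlinkPolicy.RETURN_LINK));
        System.out.println(statusAll(ns, paths, SymlinkPolicy.RESOLVE_OR_SKIP));
        try {
            statusAll(ns, paths, SymlinkPolicy.RESOLVE_OR_FAIL);
        } catch (IOException ex) {
            System.out.println("RESOLVE_OR_FAIL surfaced: " + ex);
        }
    }
}
```

Running `main` lists all four paths with the links unresolved, then only `/a` and `/t` (both as directories) once the dangling `/s` and the permission-denied `/x` are swallowed, and finally reports the `FileNotFoundException` that the fail-fast policy raises for `/s`. Keeping the policy an explicit parameter leaves the choice of default as a separate decision, which is the open compatibility question in the notes.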