[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786814#comment-13786814 ] Colin Patrick McCabe commented on HADOOP-9912: -- I agree. We should dupe this to HADOOP-9984 and close. Any objections? > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786335#comment-13786335 ] Daryn Sharp commented on HADOOP-9912: - I'm not sure this patch makes sense anymore. It's what originally kickstarted the symlink discussion before we did a deeper dive. If globStatus is resolving symlinks along the way, it's never going to see a link so it doesn't need to know if a link is a link to a directory. Or am I missing something?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13785811#comment-13785811 ] Colin Patrick McCabe commented on HADOOP-9912: -- I believe this should be fixed by HADOOP-9984. Let's wait for that patch to land and then triage.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773484#comment-13773484 ] Colin Patrick McCabe commented on HADOOP-9912: -- By the way, as far as I know, HADOOP-9984 is the only one of the JIRAs in this area that needs to get into branch-2.1-beta. HADOOP-9981 is not in 2.1-beta > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.1.0-beta >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773302#comment-13773302 ] Arun C Murthy commented on HADOOP-9912: --- Thanks [~cmccabe]. How about breaking HADOOP-9972 into two independent pieces:
# Fix globStatus/listStatus to resolve and throw an exception when it can't.
# Add new apis being discussed in HADOOP-9972.
This way #1 can be expedited and we can unblock hadoop-2.2 (GA). #2 can come in hadoop-2.3. In hadoop-2.2 we can put appropriate notices that symlinks aren't yet ready for primetime. Thoughts?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773395#comment-13773395 ] Colin Patrick McCabe commented on HADOOP-9912: -- Sure. We can change the default behavior in a separate JIRA. I filed HADOOP-9984 to change the default behavior of FileSystem#globStatus and FileSystem#listStatus to be resolving symlinks.
bq. Plus multiple exceptions are now erroneously being reported as FNF, etc. Ex. I know of at least AccessControlException (I think Colin fixed?) and StandbyException but I think we encountered more.
This was already fixed in HADOOP-9929.
bq. The new implementation is causing increased RPC load by listing parent directories when no patterns are present in the component.
We're discussing this (plus a fix) on HADOOP-9981.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773377#comment-13773377 ] Daryn Sharp commented on HADOOP-9912: - We also need to revert most of HADOOP-9817, or at least change the new {{Globber}} class to behave identically (although resolving symlinks) to how it did before being split out of FileSystem. The new implementation is causing increased RPC load by listing parent directories when no patterns are present in the component. We've had apps go OOM, or run extremely slowly, because parent dirs had lots of items. Plus multiple exceptions are now erroneously being reported as FNF, etc. Ex. I know of at least AccessControlException (I think Colin fixed?) and StandbyException but I think we encountered more.
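The RPC concern above can be illustrated with a toy model. The sketch below is not Hadoop's {{Globber}}; {{ToyFs}}, {{ToyGlobber}} and their methods are hypothetical names over an in-memory namespace. It only shows the idea that a path component without a wildcard can be checked with a single lookup, so the (possibly huge) parent directory is listed only when the component actually contains a pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical in-memory namespace that counts how it is queried. */
final class ToyFs {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Set<String> paths = new HashSet<>();
    int listCalls = 0; // directory listings (expensive for large directories)
    int statCalls = 0; // single-path lookups (cheap)

    void add(String parent, String... names) {
        children.put(parent, Arrays.asList(names));
        paths.add(parent);
        for (String n : names) {
            paths.add(join(parent, n));
        }
    }

    static String join(String parent, String name) {
        return parent.endsWith("/") ? parent + name : parent + "/" + name;
    }

    List<String> list(String dir) {
        listCalls++;
        List<String> names = children.get(dir);
        return names == null ? Collections.<String>emptyList() : names;
    }

    boolean exists(String path) {
        statCalls++;
        return paths.contains(path);
    }
}

/** Hypothetical glob expander: lists a parent only for wildcard components. */
final class ToyGlobber {
    static boolean hasWildcard(String component) {
        return component.contains("*") || component.contains("?");
    }

    static Pattern toRegex(String component) {
        StringBuilder sb = new StringBuilder();
        for (char c : component.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    static List<String> glob(ToyFs fs, String pattern) {
        List<String> candidates = new ArrayList<>();
        candidates.add("/");
        for (String component : pattern.split("/")) {
            if (component.isEmpty()) {
                continue; // skip the empty piece before the leading slash
            }
            List<String> next = new ArrayList<>();
            for (String parent : candidates) {
                if (hasWildcard(component)) {
                    // Pattern present: listing the parent is unavoidable.
                    Pattern regex = toRegex(component);
                    for (String name : fs.list(parent)) {
                        if (regex.matcher(name).matches()) {
                            next.add(ToyFs.join(parent, name));
                        }
                    }
                } else {
                    // No pattern: one lookup instead of a full listing.
                    String child = ToyFs.join(parent, component);
                    if (fs.exists(child)) {
                        next.add(child);
                    }
                }
            }
            candidates = next;
        }
        return candidates;
    }
}
```

With a pattern such as /user/alice/part-*, only the last component triggers a listing in this model; the two literal components cost one lookup each.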
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773250#comment-13773250 ] Arun C Murthy commented on HADOOP-9912: --- IAC, any thoughts on how far we are from coming to a resolution? Everyone agrees with [~jlowe]'s proposal? How quickly can we resolve it? Or, should we revert HADOOP-9418 and move on?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773246#comment-13773246 ] Arun C Murthy commented on HADOOP-9912: --- Excellent points [~jlowe], I hereby withdraw my crazy motion. :)
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773310#comment-13773310 ] Suresh Srinivas commented on HADOOP-9912: - +1 for splitting this into two parts. As high priority, let's fix globStatus/listStatus as proposed. Newer and cleaner APIs should be done in a separate JIRA. That is lower priority in my opinion and is not a blocker for 2.X GA.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773271#comment-13773271 ] Colin Patrick McCabe commented on HADOOP-9912: -- +1 for Jason's proposal. I'd like to do this as part of HADOOP-9972... would appreciate feedback there.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773027#comment-13773027 ] Jason Lowe commented on HADOOP-9912:
bq. Another crazy thought I'd like to throw out - what if we just returned false for isDir if we cannot resolve the symlink rather than throw an exception?
This sounds equivalent to the earlier proposal where "bad" symlinks are returned as the raw symlink. isDir() and isFile() both return false for symlinks, and old clients are not aware of isFile() since it was added with symlink support. An old client of listStatus will interpret the link as a file since isDir() is false, but we don't know if that's the proper thing to do since we don't know the client's intent. If a directory walker is concerned about directories and not files at some point in the traversal, it could end up silently skipping a "bad" symlink when it should have failed. E.g.: symlink to directory in remote filesystem but filesystem is temporarily unavailable, symlink to directory in permission-protected tree, symlink intended to point to a directory but typo'd the target when link was created, etc. I'm not sure how common that case really is in practice. Our recent proposal is trying to err on the side of caution so we don't accidentally drop data when we should have failed. It does mean some scenarios for old clients will fail when they should have succeeded despite "bad" symlinks, but it seems better to report a failure that can be corrected (i.e.: fix the "bad" symlink and re-run the app) than to potentially skip desired inputs.
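The silent-skip risk described in the comment above can be sketched with a toy model. This is not Hadoop code: {{LegacyWalkDemo}}, its {{Status}} class and the sample paths are hypothetical. It assumes an old client that knows only an isDir-style flag and treats everything else as a file, and compares what that client collects when a link to a directory is reported as resolved versus as a raw link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LegacyWalkDemo {
    /** Hypothetical status object exposing only what an old client checks. */
    static final class Status {
        final String path;
        final boolean isDir; // false for plain files AND for raw symlinks

        Status(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }
    }

    /** Old-client logic: anything that is not a directory is assumed to be a file. */
    static List<String> collectFiles(Map<String, List<Status>> listing, String dir) {
        List<String> files = new ArrayList<>();
        List<Status> entries = listing.get(dir);
        if (entries == null) {
            return files;
        }
        for (Status s : entries) {
            if (s.isDir) {
                files.addAll(collectFiles(listing, s.path));
            } else {
                files.add(s.path);
            }
        }
        return files;
    }

    /**
     * Builds the same tree twice: /in/link points at a directory holding two
     * files. With linkResolved == false the link is reported raw, so isDir is false.
     */
    static Map<String, List<Status>> tree(boolean linkResolved) {
        Map<String, List<Status>> listing = new HashMap<>();
        listing.put("/in", Arrays.asList(
                new Status("/in/a.txt", false),
                new Status("/in/link", linkResolved)));
        listing.put("/in/link", Arrays.asList(
                new Status("/in/link/b.txt", false),
                new Status("/in/link/c.txt", false)));
        return listing;
    }
}
```

In this model the resolved case yields three input files, while the raw case yields /in/a.txt plus /in/link itself, and the two files behind the link are never visited. Nothing fails, which is the outcome the proposal tries to avoid.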
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772592#comment-13772592 ] Arun C Murthy commented on HADOOP-9912: --- Another option is to revert HADOOP-9418 in branch-2.1 so it doesn't make it to hadoop-2.2 too. We add everything back when we tie out details, so symlinks are *officially* supported in hadoop-2.3 (with all sorts of ugly caveats). I agree this isn't ideal, but it seems like HADOOP-9418 came in too late anyway to branch-2.1. Thoughts?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772589#comment-13772589 ] Arun C Murthy commented on HADOOP-9912: --- If we do agree on following symlinks by default, what does it imply for the beta series? Is it something we can get done quickly? I'm trying to suss out the effort involved and what impact it has on hadoop-2.x GA. Thanks.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772588#comment-13772588 ] Arun C Murthy commented on HADOOP-9912: ---
bq. At this point we think the best choice for the existing listStatus API is to follow symlinks when it can but throw an exception when it cannot.
Makes sense to me. I agree it's unfortunate that "bad" symlinks can blow up apps, but also that this is significantly better than the alternative of silently missing them since apps might follow the hadoop-1.x semantics of isDir. Another crazy thought I'd like to throw out - what if we just returned false for isDir if we cannot resolve the symlink rather than throw an exception? This has the benefit of treating the unresolved symlink as a 'file' which means apps can still move them etc. while not blowing them up. Is it too crazy? :)
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772002#comment-13772002 ] Jason Lowe commented on HADOOP-9912: Thanks [~eli] for the excellent writeup and thanks [~cmccabe] for proposing a design of new APIs. Regarding the issue of what to do with the existing APIs, I had an offline discussion with [~daryn], [~kihwal], and [~nroberts]. At this point we think the best choice for the existing listStatus API is to follow symlinks when it can but throw an exception when it cannot. It's a conservative approach where we're basically saying we have no idea how the caller will react if it sees the raw symlink since it never exposed one in previous versions. Therefore we aren't assuming it will end well if we start exposing symlinks in the results and change the previously-assumed semantics of isDir() on those results. Yes, this does mean that "bad" symlinks (dangling, pointing within a protected directory tree, referencing another filesystem that is currently unavailable) could cause programs to blow up when they encounter them in a directory. However we think that is preferable to the possible alternative of silently missing data because the app misinterpreted the raw symlink and skipped it when it should have caused an error. Ideally the existing globStatus API would use a symlink-aware listStatus API internally so it could avoid erroring-out on symlinks that would be filtered out by the specified pattern/filter. 
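The "follow when possible, throw otherwise" rule proposed above, and the note that globStatus should filter before resolving, can be sketched as follows. This is a toy in-memory model, not the FileSystem API: {{ResolvingLister}}, {{Node}}, the hop limit of 32 and the exception messages are all assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class ResolvingLister {
    enum Kind { FILE, DIR, SYMLINK }

    /** Hypothetical namespace entry; target is only used for SYMLINK. */
    static final class Node {
        final Kind kind;
        final String target;

        Node(Kind kind, String target) {
            this.kind = kind;
            this.target = target;
        }
    }

    /** Follows a link chain; throws if it dangles or exceeds the hop limit. */
    static Kind resolve(Map<String, Node> ns, String path) throws IOException {
        String current = path;
        for (int hops = 0; hops < 32; hops++) {
            Node n = ns.get(current);
            if (n == null) {
                throw new IOException("Cannot resolve " + path + ": " + current + " does not exist");
            }
            if (n.kind != Kind.SYMLINK) {
                return n.kind; // callers only ever see FILE or DIR
            }
            current = n.target;
        }
        throw new IOException("Too many levels of symbolic links: " + path);
    }

    /** listStatus-like call: every entry is resolved, one bad link fails the call. */
    static Map<String, Kind> listResolved(Map<String, Node> ns, List<String> entries)
            throws IOException {
        Map<String, Kind> out = new LinkedHashMap<>();
        for (String e : entries) {
            out.put(e, resolve(ns, e));
        }
        return out;
    }

    /** globStatus-like call: filter first, so rejected bad links are never resolved. */
    static List<String> globResolved(Map<String, Node> ns, List<String> entries,
                                     Predicate<String> filter) throws IOException {
        List<String> out = new ArrayList<>();
        for (String e : entries) {
            if (!filter.test(e)) {
                continue;
            }
            resolve(ns, e); // still throws if a matching entry is a bad link
            out.add(e);
        }
        return out;
    }
}
```

In this model a dangling link in a directory makes listResolved fail loudly, while globResolved only fails when the bad link actually matches the caller's filter.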
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769869#comment-13769869 ] Colin Patrick McCabe commented on HADOOP-9912: -- I posted a proposed API based on our WebEx discussion at https://issues.apache.org/jira/browse/HADOOP-9972 > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762149#comment-13762149 ] Eli Collins commented on HADOOP-9912: - Jason, Daryn, Kihwal, Colin, Andrew and I discussed this last Friday. Here are my notes:

There are two types of clients and three behaviors the APIs could reasonably support:
# Clients that are symlink-aware and want to see status objects for links, not have them auto-resolved. An example is the shell (it should list the link) or distcp (so it can optionally follow or not follow symlinks).
# Clients that are not symlink-aware (i.e. most existing programs). This case is further broken down into:
## Clients that want symlink resolution exceptions exposed. E.g. suppose user X moves a directory D and replaces it with a symlink S to that directory, but accidentally changes the permissions on D so that user Y can no longer access D via S. If user Y regularly recursively copies X's parent directory for backup then the copy should now fail, otherwise Y has no indication that it is no longer backing up the data it needs to.
## Clients that want symlink resolution exceptions swallowed. E.g. suppose a job uses a /*/D glob path and there is a symlink /S that is either dangling or points somewhere the client doesn't have permission. Should the job start failing because a root-level symlink was introduced that the user can list but not resolve? It seems like some clients would want an option to swallow such resolution failures. This is arguably weaker than the previous example, since if you want /*/D you might also have reasonably meant to access whatever /S/D referred to, in which case you'd want the job to fail.

Also:
- FileSystem and FileContext should be consistent
- We need to make a call as to whether symlinks for local FileSystem are:
-- Just for exposing symlinks in the underlying local file system
-- Supporting HDFS-style symlinks (e.g. URIs that can span file systems)
-- I originally introduced them in HADOOP-6421 to create/expose symlinks in the local file system (and for testing purposes)
- The easiest way to fix the Pig breakage in the near term while we figure this out is to revert HADOOP-9877

So the next steps are:
- Articulate an API that supports all three usage patterns. It should cover all APIs that return FileStatus objects, not just listStatus. I volunteered to write up a strawman proposal.
- Figure out which behavior should be the default. We need to finish figuring out the compatibility implications of the proposal. All options are incompatible at some level, but we should favor the one that breaks compatibility the least for most existing programs (which do not use symlinks).
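The three behaviors in the notes above can be written down as an explicit policy. The sketch below is only an illustration over a toy namespace; {{LinkPolicyDemo}}, the {{LinkPolicy}} names and the single-hop resolution are assumptions, not the API proposed in HADOOP-9972.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class LinkPolicyDemo {
    /** Hypothetical names for the three client behaviors. */
    enum LinkPolicy {
        EXPOSE_LINKS,     // symlink-aware client: report links as-is
        RESOLVE_OR_THROW, // unaware client that wants resolution errors exposed
        RESOLVE_OR_SKIP   // unaware client that wants resolution errors swallowed
    }

    /**
     * ns maps each existing path to its link target, or to null when the path
     * is a real file or directory. Resolution is single-hop for brevity.
     */
    static List<String> list(Map<String, String> ns, List<String> entries,
                             LinkPolicy policy) throws IOException {
        List<String> out = new ArrayList<>();
        for (String entry : entries) {
            String target = ns.get(entry);
            boolean resolvable = target == null || ns.containsKey(target);
            if (policy == LinkPolicy.EXPOSE_LINKS || resolvable) {
                out.add(entry);
            } else if (policy == LinkPolicy.RESOLVE_OR_THROW) {
                throw new IOException("Cannot resolve symlink " + entry + " -> " + target);
            }
            // RESOLVE_OR_SKIP: the unresolvable link is dropped without error.
        }
        return out;
    }
}
```

Making the policy an explicit argument keeps the choice of default separate from the mechanism, which matches the two next steps listed in the notes: define an API covering all three patterns, then decide which one existing callers get.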
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761048#comment-13761048 ] Hudson commented on HADOOP-9912: FAILURE: Integrated in Hadoop-Mapreduce-trunk #1541 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1541/]) Revert HADOOP-9877 because of breakage reported in HADOOP-9912 (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520713)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextMainOperationsBaseTest.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShellReturnCode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestHDFSFileContextMainOperations.java
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761033#comment-13761033 ] Hudson commented on HADOOP-9912: SUCCESS: Integrated in Hadoop-Hdfs-trunk #1515 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1515/]) Revert HADOOP-9877 because of breakage reported in HADOOP-9912 (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520713)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextMainOperationsBaseTest.java
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShellReturnCode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestHDFSFileContextMainOperations.java
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760992#comment-13760992 ] Hudson commented on HADOOP-9912: SUCCESS: Integrated in Hadoop-Yarn-trunk #325 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/325/]) Revert HADOOP-9877 because of breakage reported in HADOOP-9912 (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520713) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextMainOperationsBaseTest.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShellReturnCode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestHDFSFileContextMainOperations.java > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760735#comment-13760735 ] Andrew Wang commented on HADOOP-9912: - Above Hudson spam is because I reverted HADOOP-9877 temporarily while we sort out these globStatus/listStatus semantics issues. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760729#comment-13760729 ] Hudson commented on HADOOP-9912: SUCCESS: Integrated in Hadoop-trunk-Commit #4382 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4382/]) Revert HADOOP-9877 because of breakage reported in HADOOP-9912 (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1520713) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/Globber.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextMainOperationsBaseTest.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFsShellReturnCode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestHDFSFileContextMainOperations.java > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760425#comment-13760425 ] Andrew Wang commented on HADOOP-9912: - Hi all, I'm all in favor of a compatibility mode for listStatus and not breaking existing programs, but I'm not sure there actually is a compatible solution. Specifically, there are three cases I'm wondering about: - Symlink loops. If we're auto-resolving, does our directory walker infinite loop? - Dangling symlinks. What happens when we hit one of these? An exception? Prune it from the results? - Symlink to another FileSystem. An HDFS symlink could link to another HDFS, or the local filesystem, or theoretically any implementing filesystem (e.g. S3, Swift). Would you really want to walk across filesystems transparently? Please prove me wrong :) > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760476#comment-13760476 ] Colin Patrick McCabe commented on HADOOP-9912: -- HDFS symlinks support in FileContext has been in many official releases of Hadoop 2. "I doubt they're being used" is not really a good reason to break the existing behavior. That's why we have been trying to keep as close as possible to the FileContext behavior in our port of symlinks to FileSystem. As Andrew mentioned, it is not always possible to resolve symlinks. There can be infinite symlink loops, or dangling symlinks. "Just make them go away, I don't want to see them at all" is not a viable strategy. Symlinks exist and their semantics are different than directories or files. There may be an occasional program that needs a tiny change to be compatible with symlinks. I think this is likely to be extremely rare, since getFileStatus resolves symlinks fully, and symlinks are mostly transparent to the application. If system administrators don't want to change anything, they don't have to-- they can just continue not using symlinks on their clusters. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760496#comment-13760496 ] Jason Lowe commented on HADOOP-9912: bq. There may be an occasional program that needs a tiny change to be compatible with symlinks. I think this is likely to be extremely rare, since getFileStatus resolves symlinks fully, and symlinks are mostly transparent to the application. For symlinks to files, I agree most programs will "just work." However for symlinks to directories, getFileStatus isn't applicable since directory walkers are going to rely on the status returned from listStatus rather than doing another getFileStatus on each of the results from listStatus. That's why Pig/MapReduce break, and I suspect many other walkers would as well. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760486#comment-13760486 ] Jason Lowe commented on HADOOP-9912: bq. I sent the calendar invite out to everyone who filled in the Doodle poll. [...] Let me know if you didn't receive an invite. Thanks for organizing it, Andrew. For some reason I have yet to see the invitation. Could you please try sending it to me again? As for the three cases: bq. Symlink loops. If we're auto-resolving, does our directory walker infinite loop? Yes, the directory walker would infinite loop. This is similar to any simple directory walker on other filesystems. The tradeoff here is all walkers work, unmodified, for the common cases where there isn't a loop, or they break even in the common case and have to update for symlink detection with no guarantee they will bother to do the bookkeeping for loop detection. bq. Dangling symlinks. What happens when we hit one of these? An exception? Prune it from the results? That case is covered in the proposal above. If a symlink cannot be resolved then it would be returned as a symlink in the results. bq. Symlink to another FileSystem. An HDFS symlink could link to another HDFS, or the local filesystem, or theoretically any implementing filesystem (e.g. S3, Swift). Would you really want to walk across filesystems transparently? Yes, it would traverse to the other filesystem, just as it does on other filesystems (e.g.: local filesystems on Linux). Isn't that the whole point of the symlink, otherwise why is it there? I understand there will be classes of tools that will need to be symlink aware and not follow them in certain situations, but I think users would expect a symlink to be followed by most tools when they set it up that way. 
> globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
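The "bookkeeping for loop detection" mentioned above can be sketched roughly as follows. This is a toy in-memory model and does not use the Hadoop API: the class `LoopSafeWalker`, its `resolve` and `walk` methods, the 32-hop limit, and the example paths are all assumptions made for illustration. The walker follows links, but remembers every resolved directory it has already entered so that a link pointing back up the tree cannot make it spin forever.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only (hypothetical names); NOT the Hadoop FileSystem API. */
final class LoopSafeWalker {

    /** Follows link -> target hops; returns null for a link-only cycle or an over-long chain. */
    static String resolve(Map<String, String> links, String path) {
        String p = path;
        for (int hops = 0; hops < 32; hops++) {
            String target = links.get(p);
            if (target == null) {
                return p; // not a link, so resolution is finished
            }
            p = target;
        }
        return null;
    }

    /** Visits each directory reachable from start at most once, following links. */
    static List<String> walk(Map<String, List<String>> dirs,
                             Map<String, String> links, String start) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();      // resolved directories already entered
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String dir = resolve(links, pending.pop());
            // Skip unresolvable links, plain files, and directories seen before.
            if (dir == null || !dirs.containsKey(dir) || !seen.add(dir)) {
                continue;
            }
            visited.add(dir);
            for (String child : dirs.get(dir)) {
                pending.push(child);
            }
        }
        return visited;
    }

    /** Example tree in which /data/sub/back is a link pointing back to /data. */
    static List<String> demo() {
        Map<String, List<String>> dirs = new HashMap<>();
        dirs.put("/data", Arrays.asList("/data/a.txt", "/data/sub"));
        dirs.put("/data/sub", Arrays.asList("/data/sub/back"));
        Map<String, String> links = new HashMap<>();
        links.put("/data/sub/back", "/data");
        return walk(dirs, links, "/data");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [/data, /data/sub]
    }
}
```

Without the `seen` set, the same walk would re-enter `/data` through `/data/sub/back` indefinitely, which is the infinite loop discussed in the comment.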
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760241#comment-13760241 ] Jason Lowe commented on HADOOP-9912: Thanks for the behavior matrix, Colin. I think the issue of compatible/incompatible is about *expectations* of the FileSystem listStatus API. FileSystem hasn't supported symlinks until very recently, and as a result I doubt many, if any, symlinks were being used in HDFS. It required custom Java code to manipulate them and nothing written with FileSystem would work with them. I am under the impression that we want symlinks to "just work" for the majority of existing applications. If that's the case then we need to avoid exposing raw symlinks as results from the existing FileSystem APIs as callers aren't expecting to deal with them. A directory walker is the classic case of this, as it will expect isDir() to tell it when to traverse subdirectories and symlinks to directories break that assumption. A proposal to keep the existing FileSystem users working with symlinks in HDFS: - listStatus resolves symlinks when possible. If the symlink cannot be resolved (e.g.: dangling, permission-restricted target path, etc.) it will return the status of the symlink since it cannot stat the symlink target. - A separate API, either an overload of listStatus with an extra flag to control symlink resolution or a separate listLinkStatus, can be used for callers that always want the symlink status and not the status of the symlink target. I would not expect the majority of existing listStatus callers to want to see symlinks and have to resolve them. This is akin to the getFileStatus/getFileLinkStatus pairing. Existing callers of getFileStatus never expected symlinks so that's why it always follows them and a new API was added to examine the symlink itself rather than adding a new status API to always follow the symlink. 
For me it's all about what callers are expecting FileSystem's listStatus semantics to be. I believe that existing callers are *not* expecting symlinks to be returned since FileSystem never supported them in the past and I doubt they were being used in HDFS in general. Most callers are expecting listStatus to be a readdir and stat, and stat follows symlinks. If listStatus does not resolve symlinks then it breaks existing Pig and MapReduce code, and I believe that's an indication it will break a lot more code out there. The code that breaks can be updated to understand symlinks, but I believe in practice that means symlinks to directories will be fragile for a long time. Each tool that encounters them will have to be updated to check for them and behave accordingly. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
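To make the two-part proposal above concrete, here is a minimal sketch against a toy in-memory namespace. It is not Hadoop code: the class `ResolveWhenPossible`, its `Status` type, the 32-hop limit, and the example paths are assumptions for illustration, and `listLinkStatus` is only the name floated in the proposal, not an existing API. Only the dangling case is modelled; permission failures are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only (hypothetical names); NOT the Hadoop FileSystem API. */
final class ResolveWhenPossible {
    enum Kind { FILE, DIR, SYMLINK }

    /** Minimal stand-in for a FileStatus: the entry's own path plus a kind. */
    static final class Status {
        final String path;
        final Kind kind;
        Status(String path, Kind kind) { this.path = path; this.kind = kind; }
        @Override public String toString() { return path + ":" + kind; }
    }

    private final Map<String, Kind> entries = new TreeMap<>();
    private final Map<String, String> linkTargets = new HashMap<>();

    void add(String path, Kind kind) { entries.put(path, kind); }

    void addLink(String path, String target) {
        entries.put(path, Kind.SYMLINK);
        linkTargets.put(path, target);
    }

    /** readdir + lstat: direct children of dir, symlinks are never followed. */
    List<Status> listLinkStatus(String dir) {
        String prefix = dir + "/";
        List<Status> out = new ArrayList<>();
        for (Map.Entry<String, Kind> e : entries.entrySet()) {
            String p = e.getKey();
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                out.add(new Status(p, e.getValue()));
            }
        }
        return out;
    }

    /** Kind of the final target, or null if the chain dangles or is too long. */
    private Kind resolveKind(String path) {
        String p = path;
        for (int hops = 0; hops < 32 && p != null; hops++) {
            Kind k = entries.get(p);
            if (k != Kind.SYMLINK) {
                return k; // FILE, DIR, or null when the target does not exist
            }
            p = linkTargets.get(p);
        }
        return null;
    }

    /** readdir + stat: resolve each link if possible, else report the link itself. */
    List<Status> listStatus(String dir) {
        List<Status> out = new ArrayList<>();
        for (Status raw : listLinkStatus(dir)) {
            Kind target = raw.kind == Kind.SYMLINK ? resolveKind(raw.path) : raw.kind;
            out.add(target == null ? raw : new Status(raw.path, target));
        }
        return out;
    }

    /** Example namespace with one link to a directory and one dangling link. */
    static ResolveWhenPossible demo() {
        ResolveWhenPossible fs = new ResolveWhenPossible();
        fs.add("/data", Kind.DIR);
        fs.add("/data/a.txt", Kind.FILE);
        fs.add("/store/real", Kind.DIR);
        fs.addLink("/data/dirlink", "/store/real");
        fs.addLink("/data/broken", "/nowhere");
        return fs;
    }

    public static void main(String[] args) {
        System.out.println(demo().listLinkStatus("/data"));
        System.out.println(demo().listStatus("/data"));
    }
}
```

In this model an unmodified walker that only checks for directories sees `/data/dirlink` as a directory through `listStatus`, while a symlink-aware tool can still ask `listLinkStatus` for the raw entries.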
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760282#comment-13760282 ] Kihwal Lee commented on HADOOP-9912: bq. on HDFS, the behavior of listStatus has always been not resolving symlinks. Ever since Eli added the feature a few years ago. It has never dereferenced. Perhaps a bit of context might help. Yes, that was the design decision made in HDFS-245 and HADOOP-6421, for accessing HDFS through *FileContext*. It was an incompatible change and users were expected to update their code to be symlink-aware as they migrate from FileSystem to FileContext. The migration has been slow and we have finally decided to support symlinks in FileSystem. A number of people worked hard to implement something equivalent to the symlink feature in FileContext. The change was obviously semantically incompatible. Impact of incompatibility in FileSystem now is quite different from doing it back in HADOOP-6421/HDFS-245 to FileContext. So, we cannot simply say this is the right way since it is consistent with what we did years ago in a completely different context. Whatever decision we make about symlinks, we should not overlook the fact that the situation is different now. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch, new-hdfs.txt, new-local.txt, > old-hdfs.txt, old-local.txt > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13760020#comment-13760020 ] Colin Patrick McCabe commented on HADOOP-9912: -- Hi all. There's been some confusion about old and new behavior, so I created a test program so you can see for yourself. Then I ran it on a very old branch-2 derived Hadoop (CDH 4.0.0, in fact), as well as current trunk. The old branch is from June 2012. The summary: * On HDFS, {{listStatus}} *has never* resolved symlinks, ever since Eli added the feature a few years ago. It has never dereferenced. * On HDFS, {{globStatus}} was previously inconsistent about whether symlinks that were the last path component were dereferenced: sometimes they were, other times not. Now they are never dereferenced. * On LocalFileSystem, {{getFileLinkStatus}} previously did the exact same thing as {{getFileStatus}} (oops!) This *bug* was fixed, and now calling {{getFileLinkStatus}} on a symlink allows you to identify it as a symlink. * {{LocalFS}} had a bug then, which it apparently still has, where {{listStatus}} doesn't list dangling symlinks at all. Hopefully we'll be able to fix this bug without people asking for the old broken behavior. * There are some other irregularities in {{LocalFS}}. In general symlink support seems very poor in LocalFS. The test program is up on GitHub at https://github.com/cmccabe/HADOOP-9912_test > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. 
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759958#comment-13759958 ] Suresh Srinivas commented on HADOOP-9912: - As I had indicated (on Doodle), I cannot make the 2PM meeting. As I said, incompatible behavior in existing Java APIs is not acceptable. Consider adding newer APIs if you think the current behavior can be improved upon. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759949#comment-13759949 ] Andrew Wang commented on HADOOP-9912: - I settled on 2PM since it'll get Eli and Jason in the same call (both seem to have the strongest opinions on this), but I sent the calendar invite out to everyone who filled in the Doodle poll. Minutes will be posted to this JIRA afterwards, and we can always do another call if we really need to. Let me know if you didn't receive an invite. It also would be nice if Jason or Daryn could post a summary with their proposal (as Suresh is requesting), since it'll let us go into the call with a clear objective. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759835#comment-13759835 ] Suresh Srinivas commented on HADOOP-9912: - Given the whole bunch of discussion in this thread and other related threads, it requires significant time to catch up. It would be great if someone could summarize the old behavior and the new, changed behavior before tomorrow's meeting (or we can spend the time in the meeting tomorrow). If the current behavior is not desirable, let's introduce a new API with the right behavior and leave the old API compatible. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759464#comment-13759464 ] Andrew Wang commented on HADOOP-9912: - Hi everyone, sorry for the late notice, but I just created a Doodle poll to settle on a time. Either before 11 or after 2 seems to work best for myself, Eli, and Colin, so please add your availability as well. I'll email the call-in details when I get a good number of responses (e.g. Binglin, Daryn, Jason). http://doodle.com/xihfagzb9azfpc5r > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759057#comment-13759057 ] Jason Lowe commented on HADOOP-9912: +1 for a Friday webex. The main issue with listStatus not following symlinks is that all directory walkers written for FileSystem must make the change suggested for Pig above. It's not just a Pig issue, it's an issue for any code written for FileSystem that expects to walk a directory tree. listStatus is not just readdir, it's readdir *and* stat. It's the stat that's messing things up, because it's acting like an lstat when most code is expecting stat. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13758653#comment-13758653 ] Colin Patrick McCabe commented on HADOOP-9912: -- Agree with Binglin that we don't want to follow symlinks in listStatus. In general, I'd like to avoid changing HDFS or LocalFileSystem semantics at all. I also would like to avoid having LocalFileSystem semantics diverge from HDFS. Does a webex on Friday (Sep 6) sound good? If so, I will set up a dial-in number and post it here. My suggestion for Pig would be to simply follow symlinks in Pig whenever you find them. That is, if {{FileStatus#isSymlink}} is true, call {{FileSystem#getFileStatus}} on the path to get the target. That should work on both the new and old Hadoop, and require no other changes. (Unless there's something I'm missing here, which is possible...) > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
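The caller-side fix suggested above for Pig (check `isSymlink`, then ask for the resolved status) can be sketched like this. The sketch runs against a toy in-memory namespace rather than the Hadoop API: the class `CallerResolvesLinks`, its `Status` type, and the example paths are assumptions, and the method names merely mirror {{FileStatus#isSymlink}} and {{FileSystem#getFileStatus}} from the comment. In this toy model the resolved status carries the target's path, which may differ from what a real FileSystem returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Toy model only (hypothetical names); NOT the Hadoop FileSystem API. */
final class CallerResolvesLinks {
    enum Kind { FILE, DIR, SYMLINK }

    /** Minimal stand-in for a FileStatus: a path plus the kind of entry. */
    static final class Status {
        final String path;
        final Kind kind;
        Status(String path, Kind kind) { this.path = path; this.kind = kind; }
        boolean isSymlink() { return kind == Kind.SYMLINK; }
        boolean isDirectory() { return kind == Kind.DIR; }
    }

    private final Map<String, Kind> entries = new TreeMap<>();
    private final Map<String, String> linkTargets = new HashMap<>();

    void add(String path, Kind kind) { entries.put(path, kind); }

    void addLink(String path, String target) {
        entries.put(path, Kind.SYMLINK);
        linkTargets.put(path, target);
    }

    /** lstat-like listing: direct children of dir, symlinks reported as symlinks. */
    List<Status> listStatus(String dir) {
        String prefix = dir + "/";
        List<Status> out = new ArrayList<>();
        for (Map.Entry<String, Kind> e : entries.entrySet()) {
            String p = e.getKey();
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) < 0) {
                out.add(new Status(p, e.getValue()));
            }
        }
        return out;
    }

    /** stat-like lookup: follows symlinks until it reaches a file or directory. */
    Status getFileStatus(String path) {
        String p = path;
        for (int hops = 0; hops < 32; hops++) {
            Kind k = (p == null) ? null : entries.get(p);
            if (k == null) throw new NoSuchElementException("dangling: " + path);
            if (k != Kind.SYMLINK) return new Status(p, k);
            p = linkTargets.get(p);
        }
        throw new IllegalStateException("too many levels of symlinks: " + path);
    }

    /** Collects regular files under dir; the caller resolves each symlink entry. */
    List<String> walkFiles(String dir) {
        List<String> files = new ArrayList<>();
        for (Status st : listStatus(dir)) {
            Status resolved = st.isSymlink() ? getFileStatus(st.path) : st;
            if (resolved.isDirectory()) {
                files.addAll(walkFiles(resolved.path));
            } else {
                files.add(resolved.path);
            }
        }
        return files;
    }

    /** Example namespace: /data/link points at the directory /store/real. */
    static CallerResolvesLinks demo() {
        CallerResolvesLinks fs = new CallerResolvesLinks();
        fs.add("/data", Kind.DIR);
        fs.add("/data/a.txt", Kind.FILE);
        fs.add("/store", Kind.DIR);
        fs.add("/store/real", Kind.DIR);
        fs.add("/store/real/b.txt", Kind.FILE);
        fs.addLink("/data/link", "/store/real");
        return fs;
    }

    public static void main(String[] args) {
        System.out.println(demo().walkFiles("/data")); // prints [/data/a.txt, /store/real/b.txt]
    }
}
```

This walker has no handling for dangling links or loops: in this sketch a dangling link simply throws, so a real walker would need to decide what to do in those cases.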
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757455#comment-13757455 ] Binglin Chang commented on HADOOP-9912: --- I grepped all the usages in the Hadoop code base; here are some listStatus usages we need to consider: FileSystem.getContentSummary uses listStatus to traverse a directory; I think it should not follow symlinks. Distcp uses listStatus to traverse the source directory; it should not follow symlinks either. FileUtil.copy copies a directory recursively; not sure. These should also be taken into account when considering semantics and compatibility. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757361#comment-13757361 ] Eli Collins commented on HADOOP-9912: - Webex sounds good to me too. bq. Is it unreasonable to have listStatus resolve symlinks and provide a separate API or flag for symlink-aware clients? IMO listStatus is equivalent to readdir and should therefore not resolve paths (it lists each entry as either file/dir/link). If users need an API that lists the statuses in a directory and resolves each one, we (or they) can write a helper function that does the same thing but resolves links. This would be no less efficient, since links are resolved by the client, and it's not clear that good semantics exist (do you fail if a link fails to resolve? do dangling links stay links while everything else is resolved?), in which case it's good not to have this behavior as part of the core API. If we change FileSystem#listStatus to resolve links then we need to change FileContext#listStatus as well, and that has supported but not resolved links for several releases. And does the iterable version of listStatus resolve links by default now too? Clearly FileSystem has more compatibility concerns than FileContext, but I don't see an option where we preserve compatibility. We're balancing compatibility against friendly semantics (would a typical caller expect that they need to pass a flag to listStatus to prevent it from resolving links?) and while I agree we should help the transition by providing an API, it's not clear to me it should be the default, and if we do provide a helper that's not the default, would it be easier for frameworks like Pig to just update the relevant code to check the FileStatus? 
They'll need to do this anyway if they have assumptions like HADOOP-6585 and it seems like they might want to do something different for links to directories than links to files in which case one helper might not work for everyone. I agree with Andrew that we don't want to set the symlink bit for a non-symlink (resolved) FileStatus as that would definitely break/confuse some things. > globStatus of a symlink to a directory does not report symlink as a directory > - > > Key: HADOOP-9912 > URL: https://issues.apache.org/jira/browse/HADOOP-9912 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.3.0 >Reporter: Jason Lowe >Priority: Blocker > Attachments: HADOOP-9912-testcase.patch > > > globStatus for a path that is a symlink to a directory used to report the > resulting FileStatus as a directory but recently this has changed. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756964#comment-13756964 ] Daryn Sharp commented on HADOOP-9912: - bq. The shell needs to be able to "see" symlinks, not just the files they point to This is why I proposed we return the resolved stat with the symlink bit set. Symlink-aware code, like the shell's ls can call getFileLinkStatus if they need to. Like users, the rest of the shell's commands probably won't care about symlinks either. Adding a flag for {{globStatus}} will fix the specific case observed by pig, but the same issue still applies for any user code that calls {{listStatus}}. We probably need for both to have a flag.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757071#comment-13757071 ] Jason Lowe commented on HADOOP-9912: +1 for a WebEx if we can reach a consensus faster that way. Does anyone mind if we reopen and revert HADOOP-9877 while we determine what to do long term with globStatus/listStatus? Replicated joins for Pig are dead in the water right now while we are hashing this out.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756985#comment-13756985 ] Colin Patrick McCabe commented on HADOOP-9912: -- would it be faster to hash this out on a WebEx? I could organize one on Thursday afternoon or Friday.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757015#comment-13757015 ] Andrew Wang commented on HADOOP-9912: - I don't mind Daryn's proposal of a {{resolveLink}} flag defaulting to true for {{globStatus}} and {{listStatus}}. It's a little gross since (as Eli noted) Hadoop 2 GA APIs should all be symlink-aware by default (and we should be allowed to break compat here), but compatibility is compatibility. However, I don't want the {{isSymlink}} bit being set for a resolved {{FileStatus}}, since isSymlink/isDir/isFile should be exclusive properties. There isn't much you can do with the knowledge a FileStatus was reached through a symlink. Symlink-aware clients will just use {{resolveLink=false}} and walk it properly themselves.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756877#comment-13756877 ] Jason Lowe commented on HADOOP-9912: Thanks for chiming in, Eli. I understand that there are going to be times when we can't avoid exposing symlinks to older clients, but we should try to avoid that when reasonable to do so. I believe the common use-case for symlinks will be links within the same filesystem, and if listStatus proceeds to resolve symlinks that it can resolve then existing directory walkers should work as-is. Using a symlink in HDFS to another directory in the same filesystem should "just work," but that's not going to be the case if listStatus behaves as it does today. Is it unreasonable to have listStatus resolve symlinks and provide a separate API or flag for symlink-aware clients? Understandably listStatus will still have to expose symlinks that cannot be resolved (e.g.: dangling links or links to permission-restricted areas), but that seems preferable to breaking most of the directory walking code built for FileSystem.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756934#comment-13756934 ] Colin Patrick McCabe commented on HADOOP-9912: -- The shell needs to be able to "see" symlinks, not just the files they point to. However, we don't want to break Pig and other programs that are relying on the old behavior.
Jason wrote:
bq. Is it unreasonable to have listStatus resolve symlinks and provide a separate API or flag for symlink-aware clients?
I think you meant to say {{globStatus}}? If so, I agree... let's add a flag to {{globStatus}}, {{resolveLinks}}, which controls whether symlinks are fully resolved. We can default it to {{true}}, for maximum compatibility.
Daryn wrote:
bq. ... calls in Globber are trapping IOException and returning null. This unexpectedly causes file not found exceptions.
This is a separate topic. Check out HADOOP-9929. The solution to this is going to be complicated, since we probably want to throw an exception only when listing a single file rather than a glob. (Imagine if your ls /* threw an exception because you lacked permission for one directory in / out of 1000... not good.)
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756808#comment-13756808 ] Eli Collins commented on HADOOP-9912: - bq. While I agree that from a purity perspective that returning the link status is arguably correct, in practice it's likely to break a lot of code if they try to use symlinks which will impede the use of symlinks. It's not just a purity perspective. While we've attempted to make symlinks mostly transparent to users they are not going to be completely transparent. For example, in HADOOP-6585 we added isFile because some clients assume !isDir means the file status is a file, which was a valid assumption that's now broken. So while we tried to avoid this, using symlinks does introduce some incompatibilities that other frameworks and users need to be aware of that we are just not going to be able to hack around. The challenge here is that for some users auto-resolving is not the right behavior, and the clients can't easily undo it (you might have hopped file systems). In which case you want to not resolve and modify clients to be symlink aware. The challenge here of course is keeping new programs working on older systems, which is the idea behind backporting symlinks to FileSystem - all v2 GA APIs should support symlinks. listStatus is going to be inconsistent across HDFS and local file system because the local file system doesn't really implement symlinks (just passes through to the underlying file system). 
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756725#comment-13756725 ] Daryn Sharp commented on HADOOP-9912: - This {{Globber}} re-write is problematic, and the optimization we are talking about is very broken regardless of whether we call file or link status. Both of these calls in {{Globber}} are trapping {{IOException}} and returning null. This unexpectedly causes file not found exceptions. Let's say I have "/dir/noperms/file". If I do a ls on /dir/noperms, I get a permission denied. If I do ls on /dir/noperms/file I get no such file or directory. If I do a ls on a standby, the {{StandbyException}} is trapped, and I get no such file or directory.
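The failure mode described in this comment can be sketched in isolation. This is a minimal illustration and not the actual {{Globber}} code: {{StatusLookup}}, {{statusOrNull}} and {{statusOrNullIfMissing}} are hypothetical names, and the lookup returns a plain string rather than a FileStatus. It contrasts trapping every IOException (so a permission error looks like a missing path) with trapping only FileNotFoundException (so other errors reach the caller).

```java
import java.io.FileNotFoundException;
import java.io.IOException;

final class TrappedExceptionSketch {
    /** Hypothetical stand-in for a status call such as getFileStatus. */
    interface StatusLookup {
        String status(String path) throws IOException;
    }

    /** The criticized pattern: every IOException is turned into null. */
    static String statusOrNull(StatusLookup fs, String path) {
        try {
            return fs.status(path);
        } catch (IOException e) {
            // Permission denied, standby errors, etc. all end up here,
            // so the caller can only conclude "no such file".
            return null;
        }
    }

    /** Alternative: only a genuine "not found" becomes null. */
    static String statusOrNullIfMissing(StatusLookup fs, String path) throws IOException {
        try {
            return fs.status(path);
        } catch (FileNotFoundException e) {
            return null;
        }
        // Any other IOException propagates to the caller unchanged.
    }
}
```

With a lookup that always throws an access-denied error, the first method returns null while the second rethrows the original exception, which is the distinction between "no such file or directory" and "permission denied" in the /dir/noperms/file example.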
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754807#comment-13754807 ] Binglin Chang commented on HADOOP-9912: ---
bq. If the NN resolves links it allows the user to unexpectedly access stuff outside the mount point.
Your comment reminds me of another potential issue, about permission checking: how should listStatus handle permission denied when some (but not all) of its entries are permission denied? getFileStatus doesn't have this issue because it either succeeds or fails. I checked and found that it simply skips the symlink, which I don't think is right.
{code}
decster:~/hadoop> ll test/
total 8
drwxr-xr-x 2 decster staff 68 Aug 30 23:29 aa
drwx-- 3 root staff 102 Aug 30 23:30 bb
lrwxr-xr-x 1 decster staff 5 Aug 30 23:32 cc -> bb/cc
decster:~/hadoop> ll test/cc
lrwxr-xr-x 1 decster staff 5 Aug 30 23:32 test/cc -> bb/cc
decster:~/hadoop> ll test/cc/bb
ls: test/cc/bb: Permission denied
decster:~/hadoop> bin/hadoop fs -ls file:///Users/decster/hadoop/test/
13/08/30 23:38:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x - decster staff 68 2013-08-30 23:29 file:///Users/decster/hadoop/test/aa
drwx-- - root staff 102 2013-08-30 23:30 file:///Users/decster/hadoop/test/bb
{code}
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754889#comment-13754889 ] Binglin Chang commented on HADOOP-9912: ---
bq. Jason makes an excellent point about how the crux of the problem is listStatus combines both readdir + stat
In this case we really should follow Linux/BSD practice (which can prevent most problematic behavior): implement a readdir primitive in FS, and make listStatus non-primitive by implementing it on top of readdir. Currently there is no readdir primitive, and listStatus is treated and used as readdir because it is the only option.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754884#comment-13754884 ] Daryn Sharp commented on HADOOP-9912: - To further clarify, the resolved link status would still have the isLink bit set in addition to the bit for whatever it resolved to. This would allow FsShell's ls to check isLink, and issue getFileLinkStatus if necessary. It's a smaller evil than inconveniencing all user code.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754874#comment-13754874 ] Daryn Sharp commented on HADOOP-9912: - Yes, it's becoming clear this is a very complicated issue... While I agree that from a purity perspective that returning the link status is arguably correct, in practice it's likely to break a lot of code if they try to use symlinks which will impede the use of symlinks. The use case where user code cares if a path is a symlink is probably nearly non-existent. {{FsShell}} ls cares, but I question if it's reasonable to require additional overhead for the vast majority of use cases. I.e., everyone that wants to do file/dir tests on directory contents will have to check {{isLink}}, issue another stat, and then re-test. Jason and I spoke offline, and we have another proposal. Return the resolved status if possible, else return the link status. This makes sense for a dangling symlink because it's nothing but a symlink. The permission denied scenario is a bit more ambiguous, but returning the link status also seems reasonable when the link target has path components after a no-permission path component, because you again don't know what it is.
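The "resolved status if possible, else link status" proposal in this comment can be sketched against the local file system. This is only an analogy built on {{java.nio.file}}, not Hadoop's FileSystem API, and {{statPreferResolved}} is a hypothetical helper name. Following the link plays the role of the resolved status, and {{NOFOLLOW_LINKS}} plays the role of {{getFileLinkStatus}}.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

final class ResolveOrLinkSketch {
    /**
     * Returns the attributes of the link target when it can be resolved,
     * otherwise the attributes of the link itself.
     */
    static BasicFileAttributes statPreferResolved(Path path) throws IOException {
        try {
            // Default behavior follows symlinks (stat-like).
            return Files.readAttributes(path, BasicFileAttributes.class);
        } catch (IOException unresolved) {
            // Dangling link or unreachable target: fall back to the link's
            // own status (lstat-like). If the path itself does not exist,
            // this second call throws and the error reaches the caller.
            return Files.readAttributes(path, BasicFileAttributes.class,
                    LinkOption.NOFOLLOW_LINKS);
        }
    }
}
```

Under these assumptions a link to a directory is reported as a directory, while a dangling link is still reported as a symlink, so a caller that only tests isDirectory/isRegularFile keeps working for every link that resolves.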
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754716#comment-13754716 ] Daryn Sharp commented on HADOOP-9912: - There is of course the related problem that having the NN resolve any links is wrong. Viewfs is essentially a chrooted mount. If the NN resolves links it allows the user to unexpectedly access stuff outside the mount point.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754713#comment-13754713 ] Daryn Sharp commented on HADOOP-9912: - Jason makes an excellent point about how the crux of the problem is {{listStatus}} combines both {{readdir}} + {{stat}}. There may be another alternative to a new API call. I haven't thought it through, but perhaps a middle-ground to address compatibility and to better handle symlinks is to return the resolved link's file status but to set the {{isLink}} bit in the status. This allows the extremely few use cases that care about whether something is a symlink to call {{getFileLinkStatus}}. It's an extra RPC, but probably 99% of the time the user doesn't care that something is a symlink. The only "common" case is probably {{FSShell}} ls.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754695#comment-13754695 ] Jason Lowe commented on HADOOP-9912:
bq. If you look at readdir as an example, it does not automatically dereference by default. Neither does ls, unless you use the -L flag on Linux. I think that's the expected default behavior, showing the actual contents of the directory. It's possible to build a directory walking program via the current listStatus, it just requires dereferencing any links to see if the target is a directory. This appears to be what ls -R does.
Thanks for the rationale, Andrew. However I don't believe {{ls}} is a good example. {{ls -l}} is symlink-aware and therefore expecting to find them. If you strace it, you'll notice it's using {{getdents}}, {{lstat}}, and {{readlink}}. We can't really look to POSIX for an equivalent, since listStatus is a combination of readdir *and* stat. The equivalent directory walker for POSIX calls readdir and then stat on each dir entry (not lstat, since it's not symlink-aware or wants to follow symlinks) to determine if each entry is another directory (because for POSIX, the type of directory entry is not included with the dirent). If listStatus is a combination of readdir and lstat then it breaks existing code that is not symlink-aware and expects isDir/isDirectory to return true for directories and isFile() to return true for files. Lots of code has been written for FileSystem, and since FileSystem did not support symlinks until very recently, all of that code is not symlink-aware. To make listStatus expose symlinks to those callers is going to be problematic, just as it is for Pig here. That's why there are symlink-aware forms of stat calls so that code that desires to be aware of symlinks can detect them, and older code or code that just wants to follow them calls the original forms. The proposed fix handles the issue for Pig with a local filesystem, but someone who uses Pig against an input directory that happens to be a symlink in HDFS is going to have the same issue. My apologies if I'm missing something, but the more I think about it, the more I'm convinced that listStatus returning symlinks is not correct. It's going to break existing code since almost all of that code is not expecting symlinks.
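The difference between "readdir + stat" and "readdir + lstat" discussed in this comment can be made concrete with a small local-file-system sketch. It uses {{java.nio.file}} as a stand-in and is not Hadoop's listStatus; {{classify}} and the kind labels are hypothetical. Passing {{followLinks=true}} models the stat-style listing that symlink-unaware code expects, and {{followLinks=false}} models the lstat-style listing that exposes links.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Map;
import java.util.TreeMap;

final class ReaddirStatSketch {
    /** Maps each entry name in dir to "link", "dir", "file" or "other". */
    static Map<String, String> classify(Path dir, boolean followLinks) throws IOException {
        LinkOption[] opts = followLinks
                ? new LinkOption[0]
                : new LinkOption[] {LinkOption.NOFOLLOW_LINKS};
        Map<String, String> kinds = new TreeMap<>();
        // newDirectoryStream plays the role of readdir: names only.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // readAttributes plays the role of stat (follow) or lstat (no follow).
                // Note: with followLinks=true a dangling link makes this call throw.
                BasicFileAttributes attrs =
                        Files.readAttributes(entry, BasicFileAttributes.class, opts);
                kinds.put(entry.getFileName().toString(), kindOf(attrs));
            }
        }
        return kinds;
    }

    private static String kindOf(BasicFileAttributes attrs) {
        if (attrs.isSymbolicLink()) {
            return "link";
        }
        if (attrs.isDirectory()) {
            return "dir";
        }
        return attrs.isRegularFile() ? "file" : "other";
    }
}
```

For a directory that contains a symlink to another directory, the stat-style call labels that entry "dir" and the lstat-style call labels it "link"; a caller that only checks for "dir" would skip it in the second case.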
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754540#comment-13754540 ] Binglin Chang commented on HADOOP-9912: --- Another reason for adding a new API is that listLinkStatus (for both RLFS and HDFS) is really useful in many cases. As Andrew said, the readdir API (Linux/BSD/Mac) follows the same rule, and FSShell ls can list symlinks (HDFS-4019) like the system ls command.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754436#comment-13754436 ] Binglin Chang commented on HADOOP-9912: --- Thanks, Andrew, for the explanation and proposal. I can give some additional input; hope that helps:
bq. why did the behavior of globStatus and symlinks change with HADOOP-9877 which appears to be a snapshot-related JIRA.
The reason is that listStatus can't return hidden directories, so getFileStatus/getFileLinkStatus had to be added to get the status of a hidden directory. Before this change only listStatus was used in globStatus, so there was no consistency issue within the same FS; after adding getFileStatus/getFileLinkStatus, the problem appeared. Basically, the existence of hidden directories requires using getFileStatus/getFileLinkStatus, and wildcards require using listStatus, but:
- In RLFS, listStatus resolves symlinks.
- In HDFS, listStatus doesn't resolve symlinks.
If we only use one API, we have no inconsistency within one FS but do have inconsistency across FSes. So before HADOOP-9877 the problem existed but was not so serious; after HADOOP-9877, I broke consistency within RLFS in order to preserve consistency in HDFS... Sorry for not realizing this problem earlier.
@[~andrew.wang]: About the proposal, I think it is better to leave listStatus compatible (in both HDFS and RLFS, for cross-FS symlinks) and add a new listLinkStatus API. I guess symlink support (both RLFS and HDFS) is not widely adopted currently, so adding the new feature in a new API makes sense.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754180#comment-13754180 ] Andrew Wang commented on HADOOP-9912: - Daryn, Jason, thanks for the input: bq. I need to look at this further but if DFS.listStatus isn't resolving, then we've got to think hard about the semantics of symlinks. 99% of the time, the user expects a symlink to be transparent. bq. Aren't there separate calls if one wants to know the true details of a link rather than what the link references? If you look at {{readdir}} as an example, it does not automatically dereference by default. Neither does {{ls}}, unless you use the {{-L}} flag on Linux. I think that's the expected default behavior, showing the actual contents of the directory. It's possible to build a directory walking program via the current {{listStatus}}, it just requires dereferencing any links to see if the target is a directory. This appears to be what {{ls -R}} does. I think my proposal to fix RLFS still makes sense (let RLFS be inconsistent and compatible), and then we can think about adding a {{ls -L}} style convenience flag or a new call for auto-deref of listing and glob results.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754139#comment-13754139 ]

Jason Lowe commented on HADOOP-9912:

bq. In HDFS, listStatus only transparently resolves symlinks in the input path. It doesn't resolve the results of the listing, and this is the correct behavior.

Isn't that going to break clients that are not symlink-aware? That means we can't have a tree of files with a symlink to a directory in it. A symlink-unaware tree-walker client will not realize that the symlink actually points to a directory and should be traversed, since the file status will say it's not a directory. That's what's happening with Pig now.

Aren't there separate calls if one wants to know the true details of a link rather than what the link references?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754132#comment-13754132 ]

Daryn Sharp commented on HADOOP-9912:

I need to look at this further, but if {{DFS.listStatus}} isn't resolving, then we've got to think hard about the semantics of symlinks. 99% of the time, the user expects a symlink to be transparent. Lots of code uses {{listStatus}} or {{globStatus}} and expects to perform file/dir checks. Now that code will be required to check whether the path is a symlink and, if so, re-stat it. This will greatly inhibit the use of symlinks, which is why I think a new API is required.

Whichever way we go, we can't have the inconsistency I cited, where globbing now returns different results depending on whether the symlink was matched by a static or a globbed path component. It must always be a resolved status or always an unresolved status.
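The static-versus-globbed inconsistency described in the comment above can be modeled with a toy example. This is a sketch under assumptions, not Hadoop code: it uses `java.nio.file` on the local filesystem, the class `ToyGlob` and its methods are hypothetical, and the two branches only imitate the behavior reported in this thread (an unresolved status for a static last component, a resolved status for a wildcard match on a local filesystem).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical toy model of the reported behavior; not the Hadoop Globber.
final class ToyGlob {
    /** Type reported for the first match of parent/component: "symlink", "dir", "file" or "none". */
    static String typeOfMatch(Path parent, String component) {
        try {
            if (!component.contains("*")) {
                // Static component: examine the path itself, without following a link.
                Path p = parent.resolve(component);
                if (Files.isSymbolicLink(p)) {
                    return "symlink";
                }
                if (!Files.exists(p, LinkOption.NOFOLLOW_LINKS)) {
                    return "none";
                }
                return Files.isDirectory(p) ? "dir" : "file";
            }
            // Wildcard component: list the parent and report the resolved type.
            try (DirectoryStream<Path> matches = Files.newDirectoryStream(parent, component)) {
                for (Path match : matches) {
                    return Files.isDirectory(match) ? "dir" : "file"; // follows links
                }
            }
            return "none";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway tree: directory "target" plus a link "symlink" pointing at it. */
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("glob-demo");
            Path target = Files.createDirectory(root.resolve("target"));
            Files.createSymbolicLink(root.resolve("symlink"), target);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = demoTree();
        System.out.println(typeOfMatch(root, "symlink"));  // static component
        System.out.println(typeOfMatch(root, "sym*link")); // wildcard component
    }
}
```

In this model the same directory entry is reported as a symlink when named exactly and as a directory when matched through a wildcard, which is the kind of split a caller doing isFile/isDir checks cannot reasonably anticipate.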
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754115#comment-13754115 ]

Andrew Wang commented on HADOOP-9912:

Let's be constructive and figure out the right fix. Jason, thanks for the attached test case; it helped me understand the issue.

bq. listStatus resolves symlinks. globStatus is supposed to be equivalent to listStatus with wildcard support...Symlinks should be transparent to users unless they specifically want to know if a path is a symlink.

In HDFS, {{listStatus}} only transparently resolves symlinks in the input path. It doesn't resolve the results of the listing, and this is the correct behavior. {{globStatus}} behaves the same way, in that it returns FileStatuses for Paths that match the glob, and it doesn't resolve these results. You can (and should) see symlinks returned by listStatus and globStatus in HDFS.

I also wouldn't say {{globStatus}} is equivalent to {{listStatus}}, since it doesn't list directories. If you want listStatus with matching, you can use {{listStatus(Path, PathFilter)}}.

In RLFS there is automatic symlink resolution, so {{listStatus}} results are resolved, and it seems like Pig depends on this behavior. Because of HADOOP-9877, {{globStatus}} went from always calling {{listStatus}} to calling {{getFileLinkStatus}} for non-wildcard glob components. Thus, when passed a {{Path}} that's a symlink, {{globStatus}} says it's a symlink.

bq. Why does .snapshot support require a getFileLinkStatus? Does getFileStatus not work for a .snapshot directory?

It does work, but it's incorrect. globStatus is not supposed to return resolved statuses.

It's unfortunate that RLFS has been auto-resolving all this time, but since apps apparently depend on it, all we can do is embrace it. How about this: we add a fixup step that, for symlink results on a LocalFileSystem, resolves them (while still keeping the link path). This means no more symlinks in RLFS {{globStatus}} results. It's a bit obnoxious to do (globStatus could symlink through HDFS to a link on a local filesystem), but it seems like a reasonable solution.
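The fixup step proposed above (resolve symlink results but keep the link path) might look roughly like this. It is a sketch under assumptions, not the eventual patch: `java.nio.file` stands in for the Hadoop API, and `GlobFixup`, `Status`, `linkStatus`, and `fixup` are hypothetical names. Dangling links are not handled here and would surface as an exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical illustration of the proposed fixup; not Hadoop code.
final class GlobFixup {
    /** Minimal stand-in for a file status: a path plus two type flags. */
    static final class Status {
        final Path path;
        final boolean directory;
        final boolean symlink;

        Status(Path path, boolean directory, boolean symlink) {
            this.path = path;
            this.directory = directory;
            this.symlink = symlink;
        }
    }

    /** Unresolved status: describes the link itself, not its target. */
    static Status linkStatus(Path p) {
        try {
            BasicFileAttributes attrs =
                    Files.readAttributes(p, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
            return new Status(p, attrs.isDirectory(), attrs.isSymbolicLink());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fixup: swap in the target's attributes for a symlink result, keeping the link path. */
    static Status fixup(Status s) {
        if (!s.symlink) {
            return s;
        }
        try {
            // Without NOFOLLOW_LINKS, readAttributes follows the link to its target.
            BasicFileAttributes attrs = Files.readAttributes(s.path, BasicFileAttributes.class);
            return new Status(s.path, attrs.isDirectory(), false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a throwaway tree: directory "real" plus a symlink "link" pointing at it. */
    static Path demoTree() {
        try {
            Path root = Files.createTempDirectory("fixup-demo");
            Path real = Files.createDirectory(root.resolve("real"));
            Files.createSymbolicLink(root.resolve("link"), real);
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Status raw = linkStatus(demoTree().resolve("link"));
        Status fixed = fixup(raw);
        System.out.println(raw.symlink + " " + raw.directory);
        System.out.println(fixed.symlink + " " + fixed.directory + " " + fixed.path.getFileName());
    }
}
```

Keeping the original path in the fixed-up status means callers still see the name they globbed for, while type checks such as "is this a directory" reflect the target.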
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754053#comment-13754053 ]

Jason Lowe commented on HADOOP-9912:

bq. This issue is not related to .snapshot support, this issue is caused by add symlink support to HDFS and LocalFileSystem but not handle consistency well.

If this has nothing to do with snapshot support, then why did the behavior of globStatus and symlinks change with HADOOP-9877, which appears to be a snapshot-related JIRA?

listStatus needs to follow symlinks, even in the HDFS case; otherwise symlinks are not very useful. If symlinks never auto-resolve, then every client will have to be symlink-aware and manually resolve the link for the symlink feature to be useful in practice.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753707#comment-13753707 ]

Binglin Chang commented on HADOOP-9912:

Just checked again: in LocalFileSystem, listStatus resolves symlinks. In HDFS, listStatus does not resolve symlinks. I found this conflict while working on HADOOP-9877, followed the HDFS convention, and used getFileLinkStatus.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753679#comment-13753679 ]

Binglin Chang commented on HADOOP-9912:

This issue is not related to .snapshot support; it is caused by adding symlink support to HDFS and LocalFileSystem without handling consistency between them well.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753666#comment-13753666 ]

Binglin Chang commented on HADOOP-9912:

@Daryn I am confused. I originally used getFileStatus and later changed it; please see [this comment|https://issues.apache.org/jira/browse/HADOOP-9877?focusedCommentId=13741497&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13741497]
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753620#comment-13753620 ]

Daryn Sharp commented on HADOOP-9912:

bq. The intended behavior of Globber.glob (which calls listStatus) is to return symlink rather than symlink target I believe

bq. I guess for a long time, pig is using this behavior(listStatus return symlink target rather than symlink), I am afraid this behavior is wrong and is inconsistent with HDFS.

Wrong. Wrong. Wrong. {{listStatus}} resolves symlinks. {{globStatus}} is supposed to be equivalent to {{listStatus}} with wildcard support. All existing code depends on these semantics, and rightly so. Symlinks should be transparent to users unless they specifically want to know if a path is a symlink. That's why there is a counterpart to {{getFileStatus}} called {{getFileLinkStatus}}, which does not resolve symlinks.

HADOOP-9877 fundamentally broke the semantics of {{globStatus}}, which now depend on whether the last path component is a glob or static. The result is:
* /path/symlink - the static component "symlink" results in a file status of the symlink, breaking isFile/isDir/etc
* /path/sym*link - the glob component "sym*link" returns the file status of the resolved link, working as expected

{{globStatus}} _must_ consistently return resolved paths. The semantics altered by HADOOP-9877 will break lots of code. I'm pretty sure that includes {{FsShell}}. We cannot break long-standing semantics just for snapshots.

Why does .snapshot support require a {{getFileLinkStatus}}? Does {{getFileStatus}} not work for a .snapshot directory?
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753223#comment-13753223 ]

Binglin Chang commented on HADOOP-9912:

After analyzing the code, here is what happens:

1. I believe the intended behavior of Globber.glob (which calls listStatus) is to return the symlink rather than the symlink target. glob and listStatus for HDFS follow this rule, so glob and listStatus for RawLocalFileSystem should follow it as well. But because Java lacks symlink support, listStatus for RawLocalFileSystem lists symlink targets (o.a.h.fs.Stat only fixes getFileStatus and getFileLinkStatus, not listStatus).

2. I guess Pig has relied on this behavior (listStatus returning the symlink target rather than the symlink) for a long time. I am afraid this behavior is wrong and inconsistent with HDFS. I think our only choices are to keep the old behavior (in which case listStatus is inconsistent across FileSystem implementations), adopt the new behavior, or perhaps add a new interface: listLinkStatus vs listStatus...

3. The test succeeds on Mac because o.a.h.fs.Stat currently does not support Mac, and the old implementation does not support symlinks very well.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753083#comment-13753083 ]

Binglin Chang commented on HADOOP-9912:

Weird, the test passed on my laptop (on Mac OS X). I will look into it further.
[jira] [Commented] (HADOOP-9912) globStatus of a symlink to a directory does not report symlink as a directory
[ https://issues.apache.org/jira/browse/HADOOP-9912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752848#comment-13752848 ]

Rohini Palaniswamy commented on HADOOP-9912:

Replicated joins in Pig are broken by this. We make FileSystem.listStatus() calls in Pig and also use FileInputFormat, which calls listStatus as well.