[jira] [Created] (HADOOP-10013) FileSystem checkPath accepts invalid paths with an authority but no scheme
Daryn Sharp created HADOOP-10013:

Summary: FileSystem checkPath accepts invalid paths with an authority but no scheme
Key: HADOOP-10013
URL: https://issues.apache.org/jira/browse/HADOOP-10013
Project: Hadoop Common
Issue Type: Bug
Components: fs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp

{{FileSystem#checkPath}} will consider paths of the form //junk/path as being valid for the given fs. The problem is that {{checkPath}} short-circuits if the path contains no scheme, assuming it must be a relative or absolute path for the given fs, whereas the condition should be no scheme _and_ no authority. This causes {{DistributedFileSystem#getPathName}} to convert //junk/path into /path, which silently hides the use of invalid paths.

--
This message was sent by Atlassian JIRA (v6.1#6144)
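To make the reported condition concrete, here is a minimal sketch that uses only `java.net.URI`. It is not Hadoop's `checkPath` code: the class `CheckPathSketch` and both method names are hypothetical, and the sketch models only the early-return decision described in the report.

```java
import java.net.URI;

// Illustrative sketch only: models the early-return decision described in the
// report, not the actual FileSystem#checkPath implementation.
final class CheckPathSketch {

    // Condition as reported: validation is skipped whenever the scheme is missing.
    static boolean skipsValidationSchemeOnly(URI uri) {
        return uri.getScheme() == null;
    }

    // Condition the report asks for: skip only when scheme AND authority are missing.
    static boolean skipsValidationSchemeAndAuthority(URI uri) {
        return uri.getScheme() == null && uri.getAuthority() == null;
    }

    public static void main(String[] args) {
        URI suspicious = URI.create("//junk/path");
        // java.net.URI parses "//junk/path" as authority "junk" plus path "/path".
        System.out.println("scheme=" + suspicious.getScheme()
                + " authority=" + suspicious.getAuthority()
                + " path=" + suspicious.getPath());
        System.out.println("scheme-only check skips validation: "
                + skipsValidationSchemeOnly(suspicious));
        System.out.println("scheme+authority check skips validation: "
                + skipsValidationSchemeAndAuthority(suspicious));
    }
}
```

With the scheme-only condition, //junk/path is treated like an ordinary path for the current fs; with the combined condition it falls through to whatever authority validation follows.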
[jira] [Created] (HADOOP-10014) Symlink resolution is fundamentally broken for multi-layered filesystems
Daryn Sharp created HADOOP-10014:

Summary: Symlink resolution is fundamentally broken for multi-layered filesystems
Key: HADOOP-10014
URL: https://issues.apache.org/jira/browse/HADOOP-10014
Project: Hadoop Common
Issue Type: Bug
Components: fs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Critical

Symlink resolution is performed on a per-filesystem basis. In a multi-layered filesystem, the symlinks need to be resolved relative to the highest level filesystem in the stack. Otherwise, fs implementations like viewfs and chroot fs behave incorrectly. Absolute symlinks may violate the base of the chroot. Links that should have crossed viewfs mount points are again incorrectly resolved relative to the base filesystem.

Symlink resolution has to occur above the level of any individual fs to allow a multi-layered fs stack to work correctly, such as via a symlink-aware {{FilteredFileSystem}} that wraps any arbitrary fs to ensure links are resolved from the top down through the stack.
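The chroot case can be illustrated with a toy model. The sketch below is not Hadoop code: `LayeredSymlinkSketch`, the `/jail` root, and the single link are all made up for illustration, and only one level of link resolution is modelled.

```java
import java.util.Map;

// Toy model, not Hadoop code: a "base" namespace holding one absolute symlink,
// viewed through a chroot rooted at /jail.
final class LayeredSymlinkSketch {
    static final String CHROOT = "/jail";

    // Symlinks stored in the base filesystem: link path -> absolute target.
    static final Map<String, String> BASE_LINKS = Map.of("/jail/link", "/etc/secret");

    // Per-filesystem resolution: the chroot maps the path down, then the base
    // filesystem follows the link in its own namespace.
    static String resolveInBase(String userPath) {
        String basePath = CHROOT + userPath;
        return BASE_LINKS.getOrDefault(basePath, basePath);
    }

    // Top-down resolution: the link target is read as a path in the chroot's
    // namespace and only then mapped into the base filesystem.
    static String resolveAtTop(String userPath) {
        String target = BASE_LINKS.get(CHROOT + userPath);
        String resolvedUserPath = (target != null) ? target : userPath;
        return CHROOT + resolvedUserPath;
    }

    public static void main(String[] args) {
        System.out.println("per-filesystem: " + resolveInBase("/link"));
        System.out.println("top-down:       " + resolveAtTop("/link"));
    }
}
```

In this model the per-filesystem variant returns a path outside /jail for an absolute link, while the top-down variant stays under the chroot, which is the behavior the report argues for.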
[jira] [Created] (HADOOP-10015) UserGroupInformation prints out excessive ERROR warnings
Haohui Mai created HADOOP-10015:

Summary: UserGroupInformation prints out excessive ERROR warnings
Key: HADOOP-10015
URL: https://issues.apache.org/jira/browse/HADOOP-10015
Project: Hadoop Common
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai

UserGroupInformation::doAs() logs at ERROR level whenever it catches an exception. However, in some cases IOExceptions are expected -- for example, the upp (e.g., calling exists() in filesystem). As a result, these error logs are benign, but they are printed at the ERROR level, which confuses operators.
Re: symlink support in Hadoop 2 GA
There are a number of issues (some minor, some more than minor). GA is close and we are still in discussion on some of them; while I believe we will close on these very shortly, a code change like this so close to GA is dangerous.

I suggest we do the following:

1) Disable symlinks in 2.2 GA: throw an unsupported exception on createSymlink in both FileSystem and FileContext.
2) Deal with isDir() in 2.2 GA in preparation for item 3 coming after GA:
   a) Deprecate isDir().
   b) Add a new API that returns an enum (see FileContext).
3) Fix symlinks in a future release, hopefully the very next one after 2.2 GA:
   a) Change the stack to use the new API replacing isDir().
   b) Fix isDir() to do something smarter (we can detail this later, but there is a solution that has been discussed). This helps customer applications that call isDir().
   c) Remove isDir() in a future release when customers have had sufficient time to migrate.

sanjay

PS. J Rottinghuis expressed a similar sentiment in a previous email in this thread:

On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:

I like symlink functionality, but in our migration to Hadoop 2.x this is a total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
a) Not uprev until symlink support is figured out up and down the stack, and we've been able to migrate all our 1.x (equivalent) clusters to 2.x (equivalent). Or
b) rip out the API altogether. Or
c) change the implementation to throw an UnsupportedOperationException.
I'm not sure yet which of these I like least.

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
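Items 1 and 2 of the proposal in this message can be sketched roughly as follows. This is an illustration under stated assumptions, not the Hadoop API: `PathKind`, `SketchFileSystem`, and `getPathKind` are hypothetical names, and paths are plain strings to keep the example self-contained.

```java
// Hypothetical enum standing in for the "new API that returns an enum".
enum PathKind { FILE, DIRECTORY, SYMLINK }

// Hypothetical stand-in for a filesystem base class; not Hadoop's FileSystem.
abstract class SketchFileSystem {

    // Item 1: symlink creation is refused outright while the feature is disabled.
    public void createSymlink(String target, String link) {
        throw new UnsupportedOperationException("Symlinks are disabled in this release");
    }

    // Item 2b: a query that can tell symlinks apart from files and directories.
    public abstract PathKind getPathKind(String path);

    // Item 2a: the boolean query stays for existing callers but is deprecated.
    @Deprecated
    public boolean isDir(String path) {
        return getPathKind(path) == PathKind.DIRECTORY;
    }
}
```

A boolean isDir() has no way to report "this is a link", which is presumably why the proposal pairs the deprecation with an enum-returning replacement.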
Re: symlink support in Hadoop 2 GA
I reluctantly agree that we should disable symlinks in 2.2 until we can sort out the compatibility issues. I'm reluctant in the sense that it's a feature users have long wanted, and it's something we'd like to use from an administrative view. However, I don't see all the issues being sorted out in the very near future.

I filed some jiras today that have led me to believe that the current implementation of fs symlinks is irreparably flawed. Adding optional primitives to filesystems to make them symlink capable is ok. However, adding symlink resolution to individual filesystems is fundamentally broken. It doesn't work for stacked filesystems (viewfs, chroots, filters, etc) because the resolution must occur at the highest level, not within an individual filesystem itself. Otherwise the abstraction of the top-level filesystem is violated and all kinds of unexpected behavior, like walking out of chroots, becomes possible.

Daryn

On Oct 3, 2013, at 1:39 PM, sanjay Radia wrote:

There are a number of issues (some minor, some more than minor). GA is close and we are still in discussion on some of them; while I believe we will close on these very shortly, a code change like this so close to GA is dangerous.

I suggest we do the following:

1) Disable symlinks in 2.2 GA: throw an unsupported exception on createSymlink in both FileSystem and FileContext.
2) Deal with isDir() in 2.2 GA in preparation for item 3 coming after GA:
   a) Deprecate isDir().
   b) Add a new API that returns an enum (see FileContext).
3) Fix symlinks in a future release, hopefully the very next one after 2.2 GA:
   a) Change the stack to use the new API replacing isDir().
   b) Fix isDir() to do something smarter (we can detail this later, but there is a solution that has been discussed). This helps customer applications that call isDir().
   c) Remove isDir() in a future release when customers have had sufficient time to migrate.

sanjay

PS. J Rottinghuis expressed a similar sentiment in a previous email in this thread:

On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:

I like symlink functionality, but in our migration to Hadoop 2.x this is a total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
a) Not uprev until symlink support is figured out up and down the stack, and we've been able to migrate all our 1.x (equivalent) clusters to 2.x (equivalent). Or
b) rip out the API altogether. Or
c) change the implementation to throw an UnsupportedOperationException.
I'm not sure yet which of these I like least.
[jira] [Created] (HADOOP-10016) Distcp should support copy from secure to insecure cluster for migration
Haohui Mai created HADOOP-10016:

Summary: Distcp should support copy from secure to insecure cluster for migration
Key: HADOOP-10016
URL: https://issues.apache.org/jira/browse/HADOOP-10016
Project: Hadoop Common
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai

Distcp should be able to copy from a secure cluster to an insecure cluster. This functionality is important for operators to migrate data to a new Hadoop installation.
[jira] [Created] (HADOOP-10017) Fix NPE in DFSClient#getDelegationToken when doing Distcp from a secured cluster to an insecured cluster
Jing Zhao created HADOOP-10017:

Summary: Fix NPE in DFSClient#getDelegationToken when doing Distcp from a secured cluster to an insecured cluster
Key: HADOOP-10017
URL: https://issues.apache.org/jira/browse/HADOOP-10017
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 2.1.1-beta, 3.0.0
Reporter: Jing Zhao
Assignee: Haohui Mai

Currently, if we run Distcp from a secured cluster and copy data to an insecure cluster, DFSClient#getDelegationToken throws an NPE when processing the NULL token returned by the NN in the insecure cluster. We should be able to handle the NULL token here.
[jira] [Created] (HADOOP-10018) TestUserGroupInformation throws NPE when HADOOP_HOME is not set
Haohui Mai created HADOOP-10018:

Summary: TestUserGroupInformation throws NPE when HADOOP_HOME is not set
Key: HADOOP-10018
URL: https://issues.apache.org/jira/browse/HADOOP-10018
Project: Hadoop Common
Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor

TestUserGroupInformation throws NPE in System.setProperty() when HADOOP_HOME is not set.
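The failure mode follows from the documented behavior of `System.setProperty`, which throws a `NullPointerException` when the value is null, as it would be if it came from an unset environment variable. The sketch below is a guess at the general shape of the problem, not the test's actual code; `SetPropertySketch` and the property name `sketch.hadoop.home` are hypothetical.

```java
// Illustrative only: shows why passing an unset environment variable straight
// into System.setProperty fails, plus one possible guard.
final class SetPropertySketch {

    // Returns true if System.setProperty rejects a null value with an NPE.
    static boolean throwsOnNullValue(String key) {
        try {
            System.setProperty(key, null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Guarded variant: sets the property only when a value is available.
    static boolean setIfPresent(String key, String value) {
        if (value == null) {
            return false;
        }
        System.setProperty(key, value);
        return true;
    }

    public static void main(String[] args) {
        // System.getenv returns null when the variable is not set.
        String hadoopHome = System.getenv("HADOOP_HOME");
        System.out.println("property set: " + setIfPresent("sketch.hadoop.home", hadoopHome));
    }
}
```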
[jira] [Created] (HADOOP-10020) disable symlinks temporarily
Colin Patrick McCabe created HADOOP-10020:

Summary: disable symlinks temporarily
Key: HADOOP-10020
URL: https://issues.apache.org/jira/browse/HADOOP-10020
Project: Hadoop Common
Issue Type: Sub-task
Components: fs
Affects Versions: 2.1.2-beta
Reporter: Colin Patrick McCabe

Disable symlinks temporarily until we can make them production-ready in Hadoop 2.3.
[jira] [Created] (HADOOP-10021) distCp support for symlinks
Colin Patrick McCabe created HADOOP-10021:

Summary: distCp support for symlinks
Key: HADOOP-10021
URL: https://issues.apache.org/jira/browse/HADOOP-10021
Project: Hadoop Common
Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe

Add support for symlinks to distCp. We probably want something like rsync, where you can choose to copy symlinks as links, or copy what they refer to.
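The two behaviors requested here have a close local-filesystem analogue in `java.nio.file.Files.copy`, which follows symlinks by default and copies the link itself when given `LinkOption.NOFOLLOW_LINKS`. The sketch below illustrates only that choice and says nothing about how distCp would implement it; `SymlinkCopySketch` and `LinkMode` are hypothetical names, and it assumes a platform where creating symbolic links is permitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Local-filesystem illustration of "copy the link" versus "copy what it refers to".
final class SymlinkCopySketch {
    enum LinkMode { COPY_LINK, COPY_TARGET }

    static void copy(Path source, Path destination, LinkMode mode) throws IOException {
        if (mode == LinkMode.COPY_LINK) {
            // The symbolic link itself is copied, not the file it points to.
            Files.copy(source, destination, LinkOption.NOFOLLOW_LINKS);
        } else {
            // Default behavior: the link is followed and its target is copied.
            Files.copy(source, destination);
        }
    }

    // Copies one symlink both ways in a temp directory and reports whether each
    // result is itself a symlink: { copiedAsLink, copiedAsTarget }.
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("symlink-sketch");
            Path file = Files.write(dir.resolve("data.txt"), "hello".getBytes());
            Path link = Files.createSymbolicLink(dir.resolve("link"), file);
            Path asLink = dir.resolve("copied-as-link");
            Path asTarget = dir.resolve("copied-as-target");
            copy(link, asLink, LinkMode.COPY_LINK);
            copy(link, asTarget, LinkMode.COPY_TARGET);
            return new boolean[] {
                Files.isSymbolicLink(asLink), Files.isSymbolicLink(asTarget)
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("copied as link is a symlink:   " + result[0]);
        System.out.println("copied as target is a symlink: " + result[1]);
    }
}
```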