[jira] [Created] (HADOOP-10013) FileSystem checkPath accepts invalid paths with an authority but no scheme

2013-10-03 Thread Daryn Sharp (JIRA)
Daryn Sharp created HADOOP-10013:


 Summary: FileSystem checkPath accepts invalid paths with an 
authority but no scheme
 Key: HADOOP-10013
 URL: https://issues.apache.org/jira/browse/HADOOP-10013
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp


{{FileSystem#checkPath}} considers paths of the form //junk/path to be valid 
for the given fs.  The problem is that {{checkPath}} short-circuits if the path 
contains no scheme, assuming it must be a relative or absolute path for the 
given fs, whereas the condition should be no scheme _and_ no authority.

This causes {{DistributedFileSystem#getPathName}} to convert //junk/path into 
/path, which silently hides the use of invalid paths.
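
A minimal sketch of the condition in question, using plain java.net.URI rather than Hadoop's Path or FileSystem classes. The class and method names are hypothetical; the only point is that a string such as //junk/path parses with a null scheme but a non-null authority ("junk"), so a scheme-only test lets it through.

```java
import java.net.URI;

/**
 * Hypothetical sketch of the short-circuit test discussed in HADOOP-10013.
 * This is NOT Hadoop's FileSystem#checkPath; it only isolates the boolean condition.
 */
public class CheckPathSketch {

    // Scheme-only variant: any URI without a scheme is treated as belonging to
    // this fs, so "//junk/path" (authority "junk", no scheme) slips through.
    static boolean shortCircuitsBuggy(URI uri) {
        return uri.getScheme() == null;
    }

    // Proposed variant: only skip further checks when there is neither a scheme
    // nor an authority, i.e. a plain relative or absolute path.
    static boolean shortCircuitsFixed(URI uri) {
        return uri.getScheme() == null && uri.getAuthority() == null;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/path", "path", "//junk/path"}) {
            URI uri = URI.create(p);
            System.out.printf("%-12s authority=%-5s buggy=%b fixed=%b%n",
                p, uri.getAuthority(), shortCircuitsBuggy(uri), shortCircuitsFixed(uri));
        }
    }
}
```

Under the stricter condition, //junk/path is no longer accepted outright; it would fall through to whatever scheme and authority comparison follows, which this sketch does not model.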



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HADOOP-10014) Symlink resolution is fundamentally broken for multi-layered filesystems

2013-10-03 Thread Daryn Sharp (JIRA)
Daryn Sharp created HADOOP-10014:


 Summary: Symlink resolution is fundamentally broken for 
multi-layered filesystems
 Key: HADOOP-10014
 URL: https://issues.apache.org/jira/browse/HADOOP-10014
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.0.0-alpha, 3.0.0
Reporter: Daryn Sharp
Priority: Critical


Symlink resolution is performed on a per-filesystem basis.  In a multi-layered 
filesystem, the symlinks need to be resolved relative to the highest level 
filesystem in the stack.  Otherwise, fs implementations like viewfs and chroot 
fs behave incorrectly.  Absolute symlinks may violate the base of the chroot.  
Links that should have crossed viewfs mount points are again incorrectly 
resolved relative to the base filesystem.

Symlink resolution has to occur above the level of any individual fs to allow a 
multi-layered fs stack to work correctly, for example via a symlink-aware 
{{FilteredFileSystem}} that wraps any arbitrary fs to ensure links are resolved 
from the top of the stack down.
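
A toy model of the chroot case, assuming a single resolution hop and absolute link targets only. BaseFs, ChrootFs and their methods are hypothetical and are not Hadoop's chroot, viewfs or FileContext code; the sketch only contrasts resolving a link inside the base filesystem with resolving it at the top of the stack.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical toy model: per-filesystem symlink resolution versus resolution
 * at the top of a stack, using a chroot-like layer over a base filesystem.
 */
public class SymlinkLayers {

    /** Base filesystem: stores symlinks keyed by absolute base path. */
    static class BaseFs {
        final Map<String, String> links = new HashMap<>();

        // Per-fs resolution: an absolute target is interpreted against the BASE root.
        String resolve(String basePath) {
            String target = links.get(basePath);
            return target == null ? basePath : target;
        }
    }

    /** Chroot layer: maps chroot-relative paths to base paths under a fixed root. */
    static class ChrootFs {
        final BaseFs base;
        final String root;

        ChrootFs(BaseFs base, String root) { this.base = base; this.root = root; }

        String toBase(String chrootPath) { return root + chrootPath; }

        // Delegating resolution to the base fs: an absolute target escapes the chroot.
        String resolveDelegated(String chrootPath) {
            return base.resolve(toBase(chrootPath));
        }

        // Top-down resolution: read the raw link target, interpret it in the
        // chroot's own namespace, and only then translate it to a base path.
        String resolveTopDown(String chrootPath) {
            String target = base.links.get(toBase(chrootPath));
            return target == null ? toBase(chrootPath) : toBase(target);
        }
    }

    public static void main(String[] args) {
        BaseFs base = new BaseFs();
        // A link inside the chroot whose absolute target is "/etc".
        base.links.put("/user/alice/cfg", "/etc");
        ChrootFs chroot = new ChrootFs(base, "/user/alice");

        System.out.println(chroot.resolveDelegated("/cfg")); // /etc (walked out of the chroot)
        System.out.println(chroot.resolveTopDown("/cfg"));   // /user/alice/etc
    }
}
```

The same reasoning applies to viewfs: only the top layer knows the mount table, so only it can decide which underlying filesystem a link target belongs to.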





[jira] [Created] (HADOOP-10015) UserGroupInformation prints out excessive ERROR warnings

2013-10-03 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10015:

 Summary: UserGroupInformation prints out excessive ERROR warnings
 Key: HADOOP-10015
 URL: https://issues.apache.org/jira/browse/HADOOP-10015
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


UserGroupInformation::doAs() logs at ERROR level whenever it catches an 
exception.

However, in some cases IOExceptions are expected, for example, the upp 
(e.g., calling exists() on a filesystem). These log entries are therefore 
benign, yet they are printed at ERROR level, which confuses operators.
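
One possible approach, sketched with java.util.logging instead of Hadoop's logging and with a hypothetical doAs wrapper rather than UserGroupInformation itself: keep rethrowing every exception so the caller still decides what matters, but log exception types that callers commonly expect at a lower level. Treating FileNotFoundException as the "expected" case is an assumption for illustration.

```java
import java.io.FileNotFoundException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical sketch (not UserGroupInformation#doAs): a doAs-style wrapper that
 * rethrows every exception but picks the log level from the exception type.
 */
public class DoAsLogging {
    private static final Logger LOG = Logger.getLogger(DoAsLogging.class.getName());

    // Assumption: FileNotFoundException stands in for "expected" failures such as
    // an exists()-style probe; everything else is still logged as SEVERE.
    static Level levelFor(Exception e) {
        return (e instanceof FileNotFoundException) ? Level.FINE : Level.SEVERE;
    }

    static <T> T doAs(String user, Callable<T> action) throws Exception {
        try {
            return action.call();
        } catch (Exception e) {
            LOG.log(levelFor(e), "doAs failed for user " + user + ": " + e);
            throw e; // the caller still decides whether the failure matters
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            doAs("alice", () -> { throw new FileNotFoundException("/tmp/missing"); });
        } catch (FileNotFoundException expected) {
            // FINE is below the default console threshold, so nothing alarming is logged.
            System.out.println("caller handled: " + expected.getMessage());
        }
    }
}
```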





Re: symlink support in Hadoop 2 GA

2013-10-03 Thread sanjay Radia
There are a number of issues (some minor, some more than minor).
GA is close and we are still discussing some of them; while I believe we will 
close on these very shortly, a code change like this so close to GA is 
dangerous.

I suggest we do the following:
1) Disable symlinks in 2.2 GA: throw an unsupported-operation exception on 
createSymlink in both FileSystem and FileContext.
2) Deal with isDir() in 2.2 GA in preparation for item 3, which comes after GA:
   a) Deprecate isDir().
   b) Add a new API that returns an enum (see FileContext).
3) Fix symlinks in a future release, hopefully the very next one after 2.2 GA:
   a) Change the stack to use the new API in place of isDir().
   b) Fix isDir() to do something smarter (we can detail this later, but there 
      is a solution that has been discussed). This helps customer applications 
      that call isDir().
   c) Remove isDir() in a future release, once customers have had sufficient 
      time to migrate.
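
A hypothetical sketch of items 1 and 2 above. FileKind, Status, getKind() and createSymlink() are illustrative names, not Hadoop's FileStatus, FileSystem or FileContext API. It shows why an enum is needed: a boolean isDir() cannot distinguish a symlink from a regular file.

```java
/**
 * Hypothetical sketch of the proposal; names are illustrative and are not
 * Hadoop's actual FileStatus/FileSystem/FileContext API.
 */
public class SymlinkPlanSketch {

    // Item 2b: a new API that returns an enum instead of a boolean.
    enum FileKind { FILE, DIRECTORY, SYMLINK }

    static class Status {
        private final FileKind kind;

        Status(FileKind kind) { this.kind = kind; }

        FileKind getKind() { return kind; }

        // Item 2a: keep the old boolean for compatibility, but deprecate it.
        // A boolean has no way to say "symlink", which is why callers must migrate.
        @Deprecated
        boolean isDir() { return kind == FileKind.DIRECTORY; }
    }

    // Item 1: symlink creation is disabled for the GA release.
    static void createSymlink(String target, String link) {
        throw new UnsupportedOperationException("Symlinks are not supported in this release");
    }

    public static void main(String[] args) {
        Status s = new Status(FileKind.SYMLINK);
        System.out.println(s.getKind()); // SYMLINK
        try {
            createSymlink("/a", "/b");
        } catch (UnsupportedOperationException e) {
            System.out.println("disabled: " + e.getMessage());
        }
    }
}
```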

sanjay

PS. J Rottinghuis expressed a similar sentiment in a previous email in this 
thread:



On Sep 18, 2013, at 5:11 PM, J. Rottinghuis wrote:

 I like symlink functionality, but in our migration to Hadoop 2.x this is a
 total distraction. If the APIs stay in 2.2 GA we'll have to choose to:
 a) Not uprev until symlink support is figured out up and down the stack,
 and we've been able to migrate all our 1.x (equivalent) clusters to 2.x
 (equivalent). Or
 b) rip out the API altogether. Or
 c) change the implementation to throw an UnsupportedOperationException
 I'm not sure yet which of these I like least.


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: symlink support in Hadoop 2 GA

2013-10-03 Thread Daryn Sharp
I reluctantly agree that we should disable symlinks in 2.2 until we can sort 
out the compatibility issues.  I'm reluctant in the sense that it's a feature 
users have long wanted, and it's something we'd like to use from an 
administrative view.  However, I don't see all the issues being sorted out in 
the very near future.

I filed some JIRAs today that have led me to believe that the current 
implementation of fs symlinks is irreparably flawed.  Adding optional 
primitives to filesystems to make them symlink capable is ok.  However, adding 
symlink resolution to individual filesystems is fundamentally broken.  It 
doesn't work for stacked filesystems (viewfs, chroots, filters, etc) because 
the resolution must occur at the highest level, not within an individual 
filesystem itself.  Otherwise the abstraction of the top-level filesystem is 
violated and all kinds of unexpected behavior, like walking out of chroots, 
become possible.

Daryn

On Oct 3, 2013, at 1:39 PM, sanjay Radia wrote:

 [...]



[jira] [Created] (HADOOP-10016) Distcp should support copy from secure to insecure cluster for migration

2013-10-03 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10016:

 Summary: Distcp should support copy from secure to insecure 
cluster for migration
 Key: HADOOP-10016
 URL: https://issues.apache.org/jira/browse/HADOOP-10016
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


Distcp should be able to copy from a secure cluster to an insecure cluster. 
This functionality is important for operators to migrate data to a new Hadoop 
installation.





[jira] [Created] (HADOOP-10017) Fix NPE in DFSClient#getDelegationToken when doing Distcp from a secured cluster to an insecured cluster

2013-10-03 Thread Jing Zhao (JIRA)
Jing Zhao created HADOOP-10017:

 Summary: Fix NPE in DFSClient#getDelegationToken when doing Distcp 
from a secured cluster to an insecured cluster
 Key: HADOOP-10017
 URL: https://issues.apache.org/jira/browse/HADOOP-10017
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.1.1-beta, 3.0.0
Reporter: Jing Zhao
Assignee: Haohui Mai


Currently, if we run Distcp from a secure cluster and copy data to an insecure 
cluster, DFSClient#getDelegationToken throws an NPE when processing the null 
token returned by the NN of the insecure cluster. We should handle the null 
token here.
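
A hypothetical sketch of handling the null-token case: an insecure NameNode hands back no delegation token, so the client must skip it rather than dereference it. Token, fetchToken() and collectTokens() are stand-ins, not Hadoop's DFSClient or Token API, and the service strings are made up.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: collect delegation tokens from several clusters while
 * tolerating the null token an insecure cluster returns.
 */
public class NullTokenSketch {

    static final class Token {
        final String service;
        Token(String service) { this.service = service; }
    }

    // Stand-in for the NameNode RPC: a secure cluster issues a token,
    // an insecure one returns null.
    static Token fetchToken(boolean secure, String service) {
        return secure ? new Token(service) : null;
    }

    // Skip null tokens instead of calling methods on them, which is the kind
    // of dereference that produces the NPE described above.
    static List<Token> collectTokens(boolean[] secure, String[] services) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i < services.length; i++) {
            Token t = fetchToken(secure[i], services[i]);
            if (t == null) {
                continue; // insecure cluster: nothing to add
            }
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Token> tokens = collectTokens(new boolean[] {true, false},
                                           new String[] {"secure-nn:8020", "insecure-nn:8020"});
        System.out.println(tokens.size()); // 1
    }
}
```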





[jira] [Created] (HADOOP-10018) TestUserGroupInformation throws NPE when HADOOP_HOME is not set

2013-10-03 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10018:

 Summary: TestUserGroupInformation throws NPE when HADOOP_HOME is 
not set
 Key: HADOOP-10018
 URL: https://issues.apache.org/jira/browse/HADOOP-10018
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Minor


TestUserGroupInformation throws an NPE in System.setProperty() when HADOOP_HOME 
is not set.
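
For context, System.getenv returns null for an unset variable and System.setProperty throws NullPointerException when its value argument is null, so copying HADOOP_HOME straight into a property fails when the variable is absent. The sketch below guards the call; the property name hadoop.home.dir and the java.io.tmpdir fallback are assumptions for illustration, not necessarily what the test uses.

```java
/**
 * Minimal illustration of the failure mode and a guard. The property name and
 * fallback directory are assumptions, not taken from TestUserGroupInformation.
 */
public class HadoopHomeGuard {

    // Use the environment value when it is set, otherwise a caller-supplied default.
    static String resolveHome(String envValue, String fallback) {
        return (envValue != null) ? envValue : fallback;
    }

    public static void main(String[] args) {
        String env = System.getenv("HADOOP_HOME"); // null when the variable is not set
        try {
            System.setProperty("hadoop.home.dir", env); // throws NPE if env == null
        } catch (NullPointerException e) {
            System.out.println("unguarded call failed: HADOOP_HOME is not set");
        }
        // Guarded version: never passes null to setProperty.
        System.setProperty("hadoop.home.dir",
                           resolveHome(env, System.getProperty("java.io.tmpdir")));
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
```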





[jira] [Created] (HADOOP-10020) disable symlinks temporarily

2013-10-03 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HADOOP-10020:

 Summary: disable symlinks temporarily
 Key: HADOOP-10020
 URL: https://issues.apache.org/jira/browse/HADOOP-10020
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 2.1.2-beta
Reporter: Colin Patrick McCabe


Disable symlinks temporarily until we can make them production-ready in Hadoop 
2.3.





[jira] [Created] (HADOOP-10021) distCp support for symlinks

2013-10-03 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HADOOP-10021:

 Summary: distCp support for symlinks
 Key: HADOOP-10021
 URL: https://issues.apache.org/jira/browse/HADOOP-10021
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 2.3.0
Reporter: Colin Patrick McCabe


Add support for symlinks to distCp.  We probably want something like rsync, 
where you can choose to copy symlinks as links, or copy what they refer to.
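
The two rsync-like behaviours can be sketched on the local filesystem with java.nio.file; this is not distCp or the Hadoop FileSystem API, and the SymlinkPolicy enum and copyEntry method are hypothetical. It assumes a platform where creating symbolic links is permitted (e.g. a POSIX system).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Local-filesystem sketch (java.nio.file, not distCp) of copying a symlink
 * either as a link or as the file it refers to. Names are hypothetical.
 */
public class SymlinkCopySketch {

    enum SymlinkPolicy { PRESERVE_LINK, FOLLOW_LINK }

    static void copyEntry(Path src, Path dst, SymlinkPolicy policy) throws IOException {
        if (Files.isSymbolicLink(src) && policy == SymlinkPolicy.PRESERVE_LINK) {
            // Recreate the link with the same (possibly relative) target text.
            Files.createSymbolicLink(dst, Files.readSymbolicLink(src));
        } else {
            // Files.copy follows links by default, so this copies the referent's contents.
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-copy");
        Path file = Files.write(dir.resolve("data.txt"), "hello".getBytes());
        // Relative target, resolved against the link's own directory.
        Path link = Files.createSymbolicLink(dir.resolve("link"), file.getFileName());

        copyEntry(link, dir.resolve("as-link"), SymlinkPolicy.PRESERVE_LINK);
        copyEntry(link, dir.resolve("as-file"), SymlinkPolicy.FOLLOW_LINK);

        System.out.println(Files.isSymbolicLink(dir.resolve("as-link"))); // true
        System.out.println(Files.isSymbolicLink(dir.resolve("as-file"))); // false
    }
}
```

Preserving a relative link target verbatim, as above, only yields a working link if the destination tree has the same layout; an absolute target would still point back at the source namespace, which is one of the choices a distCp option would need to spell out.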


