[VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Arun C Murthy
Folks,

I've created a release candidate (rc0) for hadoop-2.1.1-beta that I would like 
to get released - this release fixes a number of bugs on top of 
hadoop-2.1.0-beta as a result of significant amounts of testing.

If things go well, this might be the last of the *beta* releases of hadoop-2.x.

The RC is available at: http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
The RC tag in svn is here: 
http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0

The maven artifacts are available via repository.apache.org.

Please try the release and vote; the vote will run for the usual 7 days.

thanks,
Arun


--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Alejandro Abdelnur
Thanks Arun.

+1

* Downloaded source tarball.
* Verified MD5
* Verified signature
* Ran apache-rat:check; OK after a minor tweak (see NIT1 below)
* Checked CHANGES.txt headers (see NIT2 below)
* Built DIST from source
* Verified the Hadoop version of the Hadoop JARs
* Configured a pseudo cluster
* Tested HttpFS
* Ran a few MR examples
* Ran a few unmanaged AM app examples
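
For anyone who wants to script the checksum step in the list above, here is a minimal sketch using only the JDK's MessageDigest. It is not the tooling used for this vote; the class name Md5Check and its methods are illustrative assumptions, and it reads the whole file into memory for brevity.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class Md5Check {
    /** Returns the lowercase hex MD5 digest of the given bytes. */
    public static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Compares the computed digest with a published one, ignoring case and whitespace. */
    public static boolean matches(byte[] data, String publishedDigest) {
        String cleaned = publishedDigest.replaceAll("\\s", "").toLowerCase(Locale.ROOT);
        return md5Hex(data).equals(cleaned);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java Md5Check <tarball> <published-md5>
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(bytes, args[1]) ? "MD5 OK" : "MD5 MISMATCH");
    }
}
```

A checksum only detects corruption; the signature check in the list is what ties the artifact to the release manager's key.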

The following NITs should be addressed if there is a new RC, or in the next
release:

--
NIT1: empty files make apache-rat:check fail; these files should be
removed:

* /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/FileContextSymlinkBaseTest.java

* /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestLocalFSFileContextSymlink.java

* /Users/tucu/Downloads/h/hadoop-2.1.1-beta-src/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestFcHdfsSymlink.java
--
NIT2: the common/hdfs/mapreduce/yarn CHANGES.txt files have a 2.2.0 header;
they should not.
--







-- 
Alejandro


[jira] [Created] (HADOOP-9971) Pack hadoop compress native libs and upload it to maven for other projects to depend on

2013-09-17 Thread Liu Shaohui (JIRA)
Liu Shaohui created HADOOP-9971:
---

         Summary: Pack hadoop compress native libs and upload it to maven for other projects to depend on
             Key: HADOOP-9971
             URL: https://issues.apache.org/jira/browse/HADOOP-9971
         Project: Hadoop Common
      Issue Type: Improvement
        Reporter: Liu Shaohui
        Priority: Minor
     Attachments: HADOOP-9971-trunk-v1.diff

Currently, if other projects such as HBase want to use the Hadoop common native
libs, they must copy the native libs into their own distribution, which is
cumbersome. Following the idea of hadoop-snappy
(http://code.google.com/p/hadoop-snappy), we can package the Hadoop common
native libs and upload them to a Maven repository for other projects to depend
on.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Andrew Wang
Hey all,

Sorry to hijack the vote thread, but it'd be good to get some input on my
email from yesterday re: symlink support in branch-2.1. I think it really
should be in GA one way or the other.

http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201309.mbox/%3CCAGB5D2ZDjqt69oFfv_HOsWEH18T9GanuuF1Y%3DaKG-JptvV3ViA%40mail.gmail.com%3E

Thanks,
Andrew





Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
I think it makes sense to finish symlinks support in the Hadoop 2 GA release.

Colin

On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang andrew.w...@cloudera.com wrote:
 Hi all,

 I wanted to broadcast plans for putting the FileSystem symlinks work
 (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
 it's pretty important we get it in since it's not a compatible change; if
 it misses the GA train, we're not going to have symlinks until the next
 major release.

 However, we're still dealing with ongoing issues revealed via testing.
 There's user-code out there that only handles files and directories and
 will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
 for a nice example where globStatus returning symlinks broke Pig; some of
 us had a conference call to talk it through, and one definite conclusion
 was that this wasn't solvable in a generally compatible manner.

 There are also still some gaps in symlink support right now. For example,
 the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
 resolution, and tooling like the FsShell and Distcp still need to be
 updated as well.

 So, there's definitely work to be done, but there are a lot of users
 interested in the feature, and symlinks really should be in GA. Would
 appreciate any thoughts/input on the matter.

 Thanks,
 Andrew


Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Suresh Srinivas
I agree that this is an important change. However, 2.2.0 GA is getting
ready to roll out in weeks. I am concerned that these changes will add not
only incompatible changes late in the game, but also possibly instability.
Java API incompatibility is something we have avoided for the most part,
and I am concerned that this is adding such incompatibility in the FileSystem
APIs. We should find workarounds, possibly by adding newer APIs and leaving
the existing APIs as they are. If this can be done, my vote is to enable this
feature in 2.3. Even if it cannot be done, I am concerned that this is coming
quite late, and we should see if we could allow some incompatible changes into
2.3 for this feature.






-- 
http://hortonworks.com/download/



Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
The issue is not modifying existing APIs.  The issue is that code has
been written that makes assumptions that are incompatible with the
existence of things that are not files or directories.  For example,
there is a lot of code out there that looks at FileStatus#isFile, and
if it returns false, assumes that what it is looking at is a
directory.  In the case of a symlink, this assumption is incorrect.
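
The assumption described above can be sketched in a few lines. This is only an illustration: the Status interface is a hypothetical stand-in that mirrors the isFile/isDirectory/isSymlink predicates of Hadoop's FileStatus so the example runs without Hadoop on the classpath, and the class and method names are made up.

```java
public class EntryKinds {
    /** Hypothetical stand-in for the three predicates on Hadoop's FileStatus. */
    public interface Status {
        boolean isFile();
        boolean isDirectory();
        boolean isSymlink();
    }

    /** The fragile pattern: anything that is not a file is treated as a directory. */
    public static String classifyTwoWay(Status s) {
        return s.isFile() ? "file" : "directory";
    }

    /** Symlink-aware version: every case is checked explicitly. */
    public static String classifyThreeWay(Status s) {
        if (s.isSymlink()) {
            return "symlink";
        }
        if (s.isFile()) {
            return "file";
        }
        if (s.isDirectory()) {
            return "directory";
        }
        return "unknown";
    }

    /** Builds a Status with fixed answers, for demonstration only. */
    public static Status of(final boolean file, final boolean dir, final boolean link) {
        return new Status() {
            public boolean isFile() { return file; }
            public boolean isDirectory() { return dir; }
            public boolean isSymlink() { return link; }
        };
    }

    public static void main(String[] args) {
        Status link = of(false, false, true);
        System.out.println(classifyTwoWay(link));   // prints "directory", which is wrong
        System.out.println(classifyThreeWay(link)); // prints "symlink"
    }
}
```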

Faced with this, we have considered making listStatus and globStatus
fully resolve symlinks by default, and simply not list dangling
symlinks. Code that is prepared to deal with symlinks can use newer
versions of the listStatus and globStatus functions, which do return
symlinks as symlinks.

We might consider having FileSystem#listStatus and
FileSystem#globStatus fully resolve symlinks by default, and having
FileContext#listStatus and FileContext#Util#globStatus do the
opposite.  This seems like the maximally compatible solution that
we're going to get.  I think this makes sense.

The alternative is kicking the can down the road to Hadoop 3, and
letting vendors of alternative (including some proprietary
alternative) systems continue to claim that Hadoop doesn't support
symlinks yet (with some justice).

P.S.  I would be fine with putting this in 2.2 or 2.3 if that seems
more appropriate.

sincerely,
Colin



[jira] [Created] (HADOOP-9972) new APIs for listStatus and globStatus to deal with symlinks

2013-09-17 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HADOOP-9972:


         Summary: new APIs for listStatus and globStatus to deal with symlinks
             Key: HADOOP-9972
             URL: https://issues.apache.org/jira/browse/HADOOP-9972
         Project: Hadoop Common
      Issue Type: Improvement
        Reporter: Colin Patrick McCabe
        Assignee: Colin Patrick McCabe


Based on the discussion in HADOOP-9912, we need new APIs for FileSystem to deal 
with symlinks.  The issue is that code has been written which is incompatible 
with the existence of things which are not files or directories.  For example,
there is a lot of code out there that looks at FileStatus#isFile, and
if it returns false, assumes that what it is looking at is a
directory.  In the case of a symlink, this assumption is incorrect.

It seems reasonable to make the default behavior of {{FileSystem#listStatus}} 
and {{FileSystem#globStatus}} be fully resolving symlinks, and ignoring 
dangling ones.  This will prevent incompatibility with existing MR jobs and 
other HDFS users.  We should also add new versions of listStatus and globStatus 
that allow new, symlink-aware code to deal with symlinks as symlinks.



[jira] [Created] (HADOOP-9973) wrong dependencies

2013-09-17 Thread Nicolas Liochon (JIRA)
Nicolas Liochon created HADOOP-9973:
---

         Summary: wrong dependencies
             Key: HADOOP-9973
             URL: https://issues.apache.org/jira/browse/HADOOP-9973
         Project: Hadoop Common
      Issue Type: Bug
Affects Versions: 2.1.0-beta, 2.1.1-beta
        Reporter: Nicolas Liochon
        Priority: Minor


See HBASE-9557 for the impact: some of these dependencies appear to be pushed
to client applications even if they are not used.

mvn dependency:analyze -pl hadoop-common
[WARNING] Used undeclared dependencies found:
[WARNING]    com.google.code.findbugs:jsr305:jar:1.3.9:compile
[WARNING]    commons-collections:commons-collections:jar:3.2.1:compile
[WARNING] Unused declared dependencies found:
[WARNING]    com.sun.jersey:jersey-json:jar:1.9:compile
[WARNING]    tomcat:jasper-compiler:jar:5.5.23:runtime
[WARNING]    tomcat:jasper-runtime:jar:5.5.23:runtime
[WARNING]    javax.servlet.jsp:jsp-api:jar:2.1:runtime
[WARNING]    commons-el:commons-el:jar:1.0:runtime
[WARNING]    org.slf4j:slf4j-log4j12:jar:1.7.5:runtime


mvn dependency:analyze -pl hadoop-yarn-client
[WARNING] Used undeclared dependencies found:
[WARNING]    org.mortbay.jetty:jetty-util:jar:6.1.26:provided
[WARNING]    log4j:log4j:jar:1.2.17:compile
[WARNING]    com.google.guava:guava:jar:11.0.2:provided
[WARNING]    commons-lang:commons-lang:jar:2.5:provided
[WARNING]    commons-logging:commons-logging:jar:1.1.1:provided
[WARNING]    commons-cli:commons-cli:jar:1.2:provided
[WARNING]    org.apache.hadoop:hadoop-yarn-server-common:jar:2.1.2-SNAPSHOT:test
[WARNING] Unused declared dependencies found:
[WARNING]    org.slf4j:slf4j-api:jar:1.7.5:compile
[WARNING]    org.slf4j:slf4j-log4j12:jar:1.7.5:compile
[WARNING]    com.google.inject.extensions:guice-servlet:jar:3.0:compile
[WARNING]    io.netty:netty:jar:3.6.2.Final:compile
[WARNING]    com.google.protobuf:protobuf-java:jar:2.5.0:compile
[WARNING]    commons-io:commons-io:jar:2.1:compile
[WARNING]    org.apache.hadoop:hadoop-hdfs:jar:2.1.2-SNAPSHOT:test
[WARNING]    com.google.inject:guice:jar:3.0:compile
[WARNING]    com.sun.jersey.jersey-test-framework:jersey-test-framework-core:jar:1.9:test
[WARNING]    com.sun.jersey.jersey-test-framework:jersey-test-framework-grizzly2:jar:1.9:compile
[WARNING]    com.sun.jersey:jersey-server:jar:1.9:compile
[WARNING]    com.sun.jersey:jersey-json:jar:1.9:compile
[WARNING]    com.sun.jersey.contribs:jersey-guice:jar:1.9:compile









Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Andrew Wang
I encourage interested parties to read through HADOOP-9912 to get a feel
for the issues. There really is no way to add symlink support without
changing the behavior of existing APIs. Ultimately, anything that returns a
FileStatus is going to be different. Even if we default to resolving
symlinks, resolving can lead to FileNotFound or permission errors. Thus, we
have to choose whether to prune the bad links, show the bad links as
dangling, or throw an exception. None of these options is compatible.
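
To make the three choices concrete, here is a small in-memory sketch. It does not use the Hadoop API; the directory is modeled as a map from entry name to link target, and every name in it (DanglingLinkPolicy, Policy, list) is a hypothetical chosen for illustration.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DanglingLinkPolicy {
    public enum Policy { PRUNE, SHOW_DANGLING, THROW }

    /**
     * Lists a modeled directory while resolving symlinks.
     *
     * entries maps each name to its link target, or to null for a plain file.
     * existing is the set of targets that can actually be resolved.
     */
    public static List<String> list(Map<String, String> entries,
                                    Set<String> existing,
                                    Policy policy) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            String target = e.getValue();
            if (target == null) {
                out.add(e.getKey());                 // plain file, listed as-is
            } else if (existing.contains(target)) {
                out.add(target);                     // symlink resolved to its target
            } else if (policy == Policy.SHOW_DANGLING) {
                out.add(e.getKey() + " -> " + target + " (dangling)");
            } else if (policy == Policy.THROW) {
                throw new UncheckedIOException(new FileNotFoundException(target));
            }
            // Policy.PRUNE: the dangling link is silently dropped.
        }
        return out;
    }
}
```

Each policy changes what a caller written before symlinks existed would see: a shorter listing, an entry that is neither file nor directory, or an exception from a call that used to succeed.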

I'm really concerned about putting this in a minor release like 2.3 since
it has the potential to break a lot of user code. HADOOP-9912 is an example
from within our own ecosystem, but think of all the custom user code out
there written against FileSystem. 2.2 GA is basically our last chance to
make this kind of change before Hadoop 3.

Thanks,
Andrew



[jira] [Created] (HADOOP-9975) Adding relogin() method to UGI

2013-09-17 Thread Kai Zheng (JIRA)
Kai Zheng created HADOOP-9975:
-

         Summary: Adding relogin() method to UGI
             Key: HADOOP-9975
             URL: https://issues.apache.org/jira/browse/HADOOP-9975
         Project: Hadoop Common
      Issue Type: Improvement
        Reporter: Kai Zheng
        Assignee: Kai Zheng


The current Hadoop UGI implementation has API methods such as
reloginFromKeytab() and reloginFromTicketCache().  However, these methods are
too Kerberos-specific and expose login implementation details.  It would be
better to add a generic relogin() method that works regardless of the
authentication mechanism.  This is possible because the relevant
authentication-specific parameters, such as the principal and keytab, are
already passed in and saved in the UGI object after the initial login.
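
One possible shape for such a method, as a sketch only: the class below is not UserGroupInformation and makes no claim about how the patch does it. SavedLogin, LoginSource and the returned strings are hypothetical; the point is that a generic relogin() can dispatch on what was recorded at initial login.

```java
public class GenericRelogin {
    /** How the initial login was performed (the two cases named in the issue). */
    public enum LoginSource { KEYTAB, TICKET_CACHE }

    /** Hypothetical holder for the parameters saved at initial login. */
    public static final class SavedLogin {
        private final LoginSource source;
        private final String principal;
        private final String keytab; // null when the ticket cache was used

        public SavedLogin(LoginSource source, String principal, String keytab) {
            this.source = source;
            this.principal = principal;
            this.keytab = keytab;
        }

        /** Mechanism-independent entry point: callers need not know the source. */
        public String relogin() {
            switch (source) {
                case KEYTAB:
                    return reloginFromKeytab();
                case TICKET_CACHE:
                    return reloginFromTicketCache();
                default:
                    throw new IllegalStateException("unknown login source: " + source);
            }
        }

        // The mechanism-specific steps stay private; here they only describe
        // what a real implementation would do.
        private String reloginFromKeytab() {
            return "keytab relogin for " + principal + " using " + keytab;
        }

        private String reloginFromTicketCache() {
            return "ticket-cache relogin for " + principal;
        }
    }
}
```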



Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Karthik Kambatla
Not sure if this should be a blocker for 2.1.1, but filed HADOOP-9976 to
have a single version of avro.





[jira] [Created] (HADOOP-9976) Different versions of avro and avro-maven-plugin

2013-09-17 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created HADOOP-9976:


         Summary: Different versions of avro and avro-maven-plugin
             Key: HADOOP-9976
             URL: https://issues.apache.org/jira/browse/HADOOP-9976
         Project: Hadoop Common
      Issue Type: Bug
Affects Versions: 2.1.1-beta
        Reporter: Karthik Kambatla
        Assignee: Karthik Kambatla


Post HADOOP-9672, the versions for avro and avro-maven-plugin are different - 
1.7.4 and 1.5.3 respectively. 



[jira] [Created] (HADOOP-9977) Hadoop services won't start with different keypass and keystorepass when https is enabled

2013-09-17 Thread Yesha Vora (JIRA)
Yesha Vora created HADOOP-9977:
--

         Summary: Hadoop services won't start with different keypass and keystorepass when https is enabled
             Key: HADOOP-9977
             URL: https://issues.apache.org/jira/browse/HADOOP-9977
         Project: Hadoop Common
      Issue Type: Bug
Affects Versions: 2.1.0-beta
        Reporter: Yesha Vora


Enable SSL in the configuration. When creating the keystore, give the key and
the keystore different passwords (here, keypass=hadoop and storepass=hadoopKey):

keytool -genkey -alias host1 -keyalg RSA -keysize 1024 \
  -dname "CN=host1,OU=hw,O=hw,L=palo alto,ST=ca,C=us" \
  -keypass hadoop -keystore keystore.jks -storepass hadoopKey

In ssl-server.xml, set the two properties below.
<property><name>ssl.server.keystore.keypassword</name><value>hadoop</value></property>
<property><name>ssl.server.keystore.password</name><value>hadoopKey</value></property>

Namenode, ResourceManager, Datanode, Nodemanager and SecondaryNamenode fail to
start with the error below.

2013-09-17 21:39:00,794 FATAL namenode.NameNode (NameNode.java:main(1325)) - Exception in namenode join
java.io.IOException: java.security.UnrecoverableKeyException: Cannot recover key
    at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:222)
    at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:174)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer$1.<init>(NameNodeHttpServer.java:76)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:74)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:626)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:488)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:684)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:669)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1254)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1320)
Caused by: java.security.UnrecoverableKeyException: Cannot recover key
    at sun.security.provider.KeyProtector.recover(KeyProtector.java:328)
    at sun.security.provider.JavaKeyStore.engineGetKey(JavaKeyStore.java:138)
    at sun.security.provider.JavaKeyStore$JKS.engineGetKey(JavaKeyStore.java:55)
    at java.security.KeyStore.getKey(KeyStore.java:792)
    at sun.security.ssl.SunX509KeyManagerImpl.<init>(SunX509KeyManagerImpl.java:131)
    at sun.security.ssl.KeyManagerFactoryImpl$SunX509.engineInit(KeyManagerFactoryImpl.java:68)
    at javax.net.ssl.KeyManagerFactory.init(KeyManagerFactory.java:259)
    at org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory.init(FileBasedKeyStoresFactory.java:170)
    at org.apache.hadoop.security.ssl.SSLFactory.init(SSLFactory.java:121)
    at org.apache.hadoop.http.HttpServer.<init>(HttpServer.java:220)
    ... 9 more
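
The trace above fails inside KeyStore.getKey, reached through 
KeyManagerFactory.init, which accepts a single password for every key in the 
store. The following is a minimal sketch of that failure mode only, not of 
Hadoop's code: a key entry protected by one password cannot be recovered with 
a different one. As assumptions made to keep it self-contained, it uses an 
in-memory JCEKS store holding an AES secret key instead of the JKS file and 
RSA key pair from the report; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.UnrecoverableKeyException;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical demo class; not part of Hadoop.
class KeyPasswordMismatchDemo {

    /** Builds an in-memory keystore with one entry, "host1", protected by keyPass. */
    static KeyStore buildStore(char[] keyPass) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, null); // start from an empty in-memory store
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            ks.setKeyEntry("host1", key, keyPass, null);
            return ks;
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the "host1" entry can be recovered with the given password. */
    static boolean canRecover(KeyStore ks, char[] password) {
        try {
            return ks.getKey("host1", password) != null;
        } catch (UnrecoverableKeyException e) {
            return false; // same exception type as in the report
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyStore ks = buildStore("hadoop".toCharArray());
        // The key's own password versus the (different) store password.
        System.out.println("with keypass:   " + canRecover(ks, "hadoop".toCharArray()));
        System.out.println("with storepass: " + canRecover(ks, "hadoopKey".toCharArray()));
    }
}
```

If the reported failure has the same cause, one workaround to try would be 
giving the key and the keystore the same password; that is a guess based on 
the trace, not a confirmed diagnosis.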



Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Eli Collins
(Looping in Arun since this impacts 2.x releases)

I updated the versions on HADOOP-8040 and sub-tasks to reflect where
the changes have landed. All of these changes (modulo HADOOP-9417)
were merged to branch-2.1 and are in the 2.1.0 release.

While symlinks are in 2.1.0 I don't think we can really claim they're
ready until issues like HADOOP-9912 are resolved, and they are
supported in the shell, distcp and WebHDFS/HttpFS/Hftp (these are not
esoteric!).  Someone can create a symlink with FileSystem causing
someone else's distcp job to fail. Unlikely given they're not exposed
outside the Java API but still not great.   Ideally this work would
have been done on a feature branch and then merged when complete, but
that's water under the bridge.

I see the following options:

1. Fix up the current symlink support so that symlinks are ready for
2.2 (GA), or at least the public APIs. This means the APIs will be in
GA from the get go, so while the functionality might not be fully baked we
don't have to worry about incompatible changes like FileStatus#isDir()
changing behavior in 2.3 or a later update.  The downside is this will
take at least a couple weeks (to resolve HADOOP-9912 and potentially
implement the remaining pieces) and so may impact the 2.2 release
timing. This option means 2.2 won't remove the new APIs introduced in
2.1.  We'd want to spin a 2.1.2 beta with the new API changes so we
don't introduce new APIs in the beta to GA transition.

2. Revert symlinks from branch-2.1-beta and branch-2. Finish up the
work in trunk (or a feature branch) and merge for a subsequent 2.x
update.  While this helps get us to GA faster, it would be preferable
to get an API change like this in for 2.2 GA since it may be
disruptive to introduce in an update (e.g. see the example in #1). And of
course our users would like symlink functionality in the GA release.
This option would mean 2.2 is incompatible with 2.1 because it's
dropping the new APIs, not ideal for a beta to GA transition.

3. Revert and punt symlinks to 3.x.  IMO this should be the last resort.

If we have sufficient time I think option #1 would be best.  What do
others think?

Thanks,
Eli


On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang <andrew.w...@cloudera.com> wrote:
> Hi all,
>
> I wanted to broadcast plans for putting the FileSystem symlinks work
> (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think
> it's pretty important we get it in since it's not a compatible change; if
> it misses the GA train, we're not going to have symlinks until the next
> major release.
>
> However, we're still dealing with ongoing issues revealed via testing.
> There's user-code out there that only handles files and directories and
> will barf when given a symlink (perhaps a dangling one!). See HADOOP-9912
> for a nice example where globStatus returning symlinks broke Pig; some of
> us had a conference call to talk it through, and one definite conclusion
> was that this wasn't solvable in a generally compatible manner.
>
> There are also still some gaps in symlink support right now. For example,
> the more esoteric FileSystems like WebHDFS, HttpFS, and HFTP need symlink
> resolution, and tooling like the FsShell and Distcp still need to be
> updated as well.
>
> So, there's definitely work to be done, but there are a lot of users
> interested in the feature, and symlinks really should be in GA. Would
> appreciate any thoughts/input on the matter.
>
> Thanks,
> Andrew
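
The compatibility concern in this thread, code that assumes every entry is 
either a file or a directory and then meets a symlink (possibly a dangling 
one), can be illustrated outside Hadoop. The sketch below uses the JDK's 
java.nio.file API on the local filesystem as a stand-in; it does not use 
Hadoop's FileSystem or FileStatus, and the class, enum, and method names are 
hypothetical. It assumes a platform where the current user may create 
symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

// Hypothetical demo class; a local-filesystem analogy, not Hadoop code.
class SymlinkAwareListing {

    enum Kind { FILE, DIRECTORY, SYMLINK, OTHER }

    /** Classifies a path without following links, so dangling links are still seen. */
    static Kind classify(Path p) {
        if (Files.isSymbolicLink(p)) {
            return Kind.SYMLINK;
        }
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
            return Kind.DIRECTORY;
        }
        if (Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS)) {
            return Kind.FILE;
        }
        return Kind.OTHER;
    }

    /** Creates a temp directory holding a regular file and a dangling symlink. */
    static Path createDemoTree() {
        try {
            Path dir = Files.createTempDirectory("symlink-demo");
            Files.createFile(dir.resolve("data.txt"));
            // The link target "missing" is never created, so the link dangles.
            Files.createSymbolicLink(dir.resolve("dangling"), dir.resolve("missing"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = createDemoTree();
        Path link = dir.resolve("dangling");
        System.out.println("data.txt -> " + classify(dir.resolve("data.txt")));
        System.out.println("dangling -> " + classify(link));
        // A two-way "file or directory" check that follows links matches neither case.
        System.out.println("isRegularFile: " + Files.isRegularFile(link));
        System.out.println("isDirectory:   " + Files.isDirectory(link));
    }
}
```

Code written as "if it is a directory recurse, otherwise read it as a file" 
would treat the dangling link as a file and fail on open, which is the kind 
of breakage described for callers that were written before symlinks existed.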


Re: [VOTE] Release Apache Hadoop 2.1.1-beta

2013-09-17 Thread Roman Shaposhnik
On Mon, Sep 16, 2013 at 11:38 PM, Arun C Murthy <a...@hortonworks.com> wrote:
> Folks,
>
> I've created a release candidate (rc0) for hadoop-2.1.1-beta that I would
> like to get released - this release fixes a number of bugs on top of
> hadoop-2.1.0-beta as a result of significant amounts of testing.
>
> If things go well, this might be the last of the *beta* releases of
> hadoop-2.x.
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.1-beta-rc0
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.1-beta-rc0
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.

As usual, here's a full Bigtop stack built on top of this RC:
http://bigtop01.cloudera.org:8080/view/Upstream-tests/job/Hadoop-2.1.1/

Those feeling more adventurous and willing to help test the entire Hadoop
ecosystem that Bigtop packages can simply yum/apt-get/zypper install
everything from the packages in those locations.

I'm also running the tests on fully distributed clusters in Bigtop and
will report the findings tomorrow.

Thanks,
Roman.


[jira] [Created] (HADOOP-9978) Support range reads in s3n interface to split objects for mappers to read

2013-09-17 Thread Amandeep Khurana (JIRA)
Amandeep Khurana created HADOOP-9978:


 Summary: Support range reads in s3n interface to split objects for 
mappers to read
 Key: HADOOP-9978
 URL: https://issues.apache.org/jira/browse/HADOOP-9978
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Amandeep Khurana
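
The issue carries no description beyond its summary. As a rough illustration 
of what splitting one object into per-mapper range reads could look like, the 
sketch below computes HTTP Range header values ("bytes=start-end", both ends 
inclusive) for fixed-size splits. It is not the s3n implementation and makes 
no S3 calls; the class name, method name, and split-size parameter are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; illustrates byte-range arithmetic only.
class RangeSplits {

    /**
     * Returns Range header values that together cover bytes [0, objectSize),
     * each spanning at most splitSize bytes.
     */
    static List<String> rangeHeaders(long objectSize, long splitSize) {
        if (objectSize < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("objectSize must be >= 0 and splitSize > 0");
        }
        List<String> ranges = new ArrayList<>();
        for (long start = 0; start < objectSize; start += splitSize) {
            // HTTP byte ranges are inclusive, hence the "- 1".
            long end = Math.min(start + splitSize, objectSize) - 1;
            ranges.add("bytes=" + start + "-" + end);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 250-byte object with 100-byte splits.
        // Prints [bytes=0-99, bytes=100-199, bytes=200-249]
        System.out.println(rangeHeaders(250, 100));
    }
}
```

Each mapper would then fetch only its own range. A real input format would 
also need record-boundary handling at split edges, which this sketch ignores.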



