[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Edward Nevill (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Nevill updated HADOOP-11660:
---
Status: Open  (was: Patch Available)

Patch to be replaced with a version which does pipelining

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
 Attachments: jira-11660.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware CRC for ARM's new 64-bit architecture.
 The patch is completely conditionalized on __aarch64__.
 I have only added support for the non-pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.
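 As a rough illustration of the benchmark described above (the real test_bulk_crc32 test is native C; the class below is a hypothetical Java analogue using java.util.zip.CRC32, which implements the Zlib polynomial), the timing loop looks roughly like this:
 {code}
 import java.util.zip.CRC32;

 public class BulkCrcBenchmarkSketch {
   public static void main(String[] args) {
     final int dataLen = 1048576;       // 1 MB dataset, as in the description
     final int bytesPerChecksum = 512;  // checksum chunk size
     final int iterations = 1000;
     byte[] data = new byte[dataLen];
     new java.util.Random(42).nextBytes(data);

     long start = System.nanoTime();
     for (int iter = 0; iter < iterations; iter++) {
       for (int off = 0; off < dataLen; off += bytesPerChecksum) {
         CRC32 crc = new CRC32();                  // Zlib-polynomial CRC
         crc.update(data, off, bytesPerChecksum);  // checksum one 512-byte chunk
         crc.getValue();
       }
     }
     double seconds = (System.nanoTime() - start) / 1e9;
     System.out.printf("CRC %d bytes @ %d bytes per checksum X %d iterations = %.2f%n",
         dataLen, bytesPerChecksum, iterations, seconds);
   }
 }
 {code}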



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347034#comment-14347034
 ] 

Hudson commented on HADOOP-11183:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: 
rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
* hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
* 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java


 Memory-based S3AOutputstream
 

 Key: HADOOP-11183
 URL: https://issues.apache.org/jira/browse/HADOOP-11183
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Thomas Demoor
Assignee: Thomas Demoor
 Fix For: 2.7.0

 Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, 
 HADOOP-11183-006.patch, HADOOP-11183-007.patch, HADOOP-11183-008.patch, 
 HADOOP-11183-009.patch, HADOOP-11183-010.patch, HADOOP-11183.001.patch, 
 HADOOP-11183.002.patch, HADOOP-11183.003.patch, design-comments.pdf


 Currently s3a buffers files on disk(s) before uploading. This JIRA 
 investigates adding a memory-based upload implementation.
 The motivation is evidently performance: this would be beneficial for users 
 with high network bandwidth to S3 (EC2?) or users who run Hadoop directly on 
 an S3-compatible object store (FYI: my contributions are made on behalf of 
 Amplidata). 
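 A minimal sketch of the memory-buffered idea (hypothetical class names, not the S3AFastOutputStream implementation from the patch): accumulate writes in memory and hand the buffer to an uploader when the stream is closed. A production version would bound the buffer and use multipart uploads.
 {code}
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.OutputStream;

 /** Illustrative only: buffers data in memory and uploads it on close(). */
 public class MemoryBufferedUploadStream extends OutputStream {
   /** Hypothetical callback standing in for the S3 upload client. */
   public interface Uploader {
     void upload(byte[] data, int length) throws IOException;
   }

   private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
   private final Uploader uploader;

   public MemoryBufferedUploadStream(Uploader uploader) {
     this.uploader = uploader;
   }

   @Override
   public void write(int b) {
     buffer.write(b);            // accumulate in memory instead of a local disk file
   }

   @Override
   public void write(byte[] b, int off, int len) {
     buffer.write(b, off, len);
   }

   @Override
   public void close() throws IOException {
     byte[] data = buffer.toByteArray();
     uploader.upload(data, data.length);  // single upload; real code would use multipart parts
   }
 }
 {code}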



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347032#comment-14347032
 ] 

Hudson commented on HADOOP-6857:


SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
Move HADOOP-6857 to 3.0.0. (aajisaka: rev 
29bb6898654199a809f1c3e8e536a63fb0d4f073)
* hadoop-common-project/hadoop-common/CHANGES.txt


 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Fix For: 3.0.0

 Attachments: HADOOP-6857-revert.patch, HADOOP-6857.patch, 
 HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, 
 show-space-consumed.txt


 Currently FsShell reports HDFS usage with the hadoop fs -dus path command.  
 Since the replication level is set per file, it would be nice to add raw disk 
 usage including the replication factor (maybe hadoop fs -dus -raw path?). 
  This will allow assessing resource usage more accurately.  -- Alex K
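 Per file, the raw usage is the file length multiplied by its replication factor. A hedged sketch of computing both numbers with the public FileSystem API (not the FsShell patch itself; the class name is illustrative):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.LocatedFileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.RemoteIterator;

 public class RawUsageSketch {
   public static void main(String[] args) throws Exception {
     FileSystem fs = FileSystem.get(new Configuration());
     long logical = 0, raw = 0;
     // Recursively walk the tree; sum length, and length * replication, per file.
     RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path(args[0]), true);
     while (it.hasNext()) {
       FileStatus status = it.next();
       logical += status.getLen();
       raw += status.getLen() * status.getReplication();
     }
     System.out.println("logical bytes: " + logical + ", raw bytes: " + raw);
   }
 }
 {code}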



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Edward Nevill (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Nevill updated HADOOP-11660:
---
Attachment: (was: jira-11660.patch)

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware CRC for ARM's new 64-bit architecture.
 The patch is completely conditionalized on __aarch64__.
 I have only added support for the non-pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing this in favor of HADOOP-11590, which rewrites these scripts.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HADOOP-11668-01.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited by 
 spaces, so the extra hostnames are treated as commands and the script fails.
 Delimiting with a comma (,) instead of a space before passing the hostnames 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347174#comment-14347174
 ] 

Sean Busbey commented on HADOOP-11656:
--

I don't see how we can do this compatibly. Even defaulting to use the 
application classloader will break some downstream projects. Certainly going 
a step further to make sure we also only expose our API to them, whether via 
an OSGi container or not, will break even more of them.

I can understand the desire to have a compatible version of this in the 2.x 
line. Probably the option to have it off would make the most sense for that. 
However, this kind of isolation is something we _should_ be doing. The reason 
to focus first on a breaking version is so we can have doing things correctly 
staked to some point in the future.

There are plenty of ways we can make the transition easier for downstream 
folks. I've already mentioned giving upgrade docs that include maven pom 
changes needed to get the same set of dependencies. As you mention, we could 
also include some option toggle that says "I want to see the framework 
libraries." I happen to think this is a bad idea because it leads straight back 
to where we are now. In any case, either of these mitigations requires 
downstream projects to change what they are doing, which sounds incompatible to 
me.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11669) Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347107#comment-14347107
 ] 

Hadoop QA commented on HADOOP-11669:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702457/001-HADOOP-11669.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//console

This message is automatically generated.

 Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class
 -

 Key: HADOOP-11669
 URL: https://issues.apache.org/jira/browse/HADOOP-11669
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor
 Attachments: 0001-HDFS-7883.patch, 001-HADOOP-11669.patch


 These 2 configuration keys in HttpServer2.java are hadoop configurations.
 {code}
   static final String FILTER_INITIALIZER_PROPERTY
   = "hadoop.http.filter.initializers";
   public static final String HTTP_MAX_THREADS = "hadoop.http.max.threads";
 {code}
 It is better to keep them inside CommonConfigurationKeys.
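 A sketch of what the move could look like (constant and class names below are illustrative, not the names from the attached patch; the key strings stay the same, only their home class changes):
 {code}
 import org.apache.hadoop.conf.Configuration;

 /** Sketch only: where the HttpServer2 keys could live after the move. */
 public final class CommonConfigurationKeysSketch {
   public static final String HADOOP_HTTP_FILTER_INITIALIZERS_KEY =
       "hadoop.http.filter.initializers";
   public static final String HADOOP_HTTP_MAX_THREADS_KEY = "hadoop.http.max.threads";

   public static void main(String[] args) {
     Configuration conf = new Configuration();
     // HttpServer2 would read the values through the shared constants.
     String filters = conf.get(HADOOP_HTTP_FILTER_INITIALIZERS_KEY);
     int maxThreads = conf.getInt(HADOOP_HTTP_MAX_THREADS_KEY, -1);
     System.out.println("filters=" + filters + ", maxThreads=" + maxThreads);
   }
 }
 {code}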



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347188#comment-14347188
 ] 

Sean Busbey commented on HADOOP-11656:
--

{quote}
One troublespot, even with that tactic, is shown by HADOOP-11064: 
UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to NativeCRC32 
method changes. Changes in the internal JNI bindings meant that no hadoop-2.4 
app (like HBase) would run in a Hadoop 2.6-alpha cluster. We were lucky that I 
got to find that before 2.6 shipped, otherwise we'd have a lot of complaints. 
The problem here is that even with HBase isolated on classpath, it was picking 
up the hadoop-native binaries from somewhere on PATH/LIB or whatever, and so 
failing to link.

Classloader isolation & shading isn't going to be sufficient here. HADOOP-11127 
proposes some versioning, which will help, but I don't think it will let us 
load more than one hadoop lib into a JVM. As a result, the only version of 
hadoop-common.jar which can be reliably loaded into a process is the one that 
is in sync with the version of the native library on the target machine.
{quote}

Yes, native library support is an entire additional can of worms. For this 
improvement I'd prefer to leave that to future work, if only because the JVM 
doesn't really offer options. Perhaps docs that cover the limitations of what 
isolation we offer would be a good start.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347217#comment-14347217
 ] 

Jason Lowe commented on HADOOP-11656:
-

bq. There are plenty of ways we can make the transition easier for downstream 
folks. I've already mentioned giving upgrade docs that include maven pom 
changes needed to get the same set of dependencies. As you mention, we could 
also include some option toggle that says "I want to see the framework 
libraries." I happen to think this is a bad idea because it leads straight back 
to where we are now. In any case, either of these mitigations requires 
downstream projects to change what they are doing, which sounds incompatible to 
me.

I think the idea here is to flip the defaults around.  The easiest transition 
for existing downstream folks is to opt in, rather than opt out, of classpath 
isolation.  We can debate whether that's custom classloaders, OSGi packaging, 
or what-not when it's turned on.  But if not turned on by default then it is 
backwards compatible, to the extent that we support backwards compatibility 
today.  Clients/jobs that ran before continue to run on the new version.  Those 
that want/need the isolation can ask for it, and we can iterate the isolation 
feature without necessarily breaking the existing users that aren't asking for 
it because it didn't exist back then and would break their old workflow if it 
suddenly does.  At some point in the future we can (and probably want to) 
switch the defaults so clients/apps get classpath isolation by default.  I 
totally agree that decision necessarily breaks backwards compatibility.

IMHO the smoothest transition for major features, this or otherwise, is to 
develop the feature if possible as opt in, rather than opt out, until it is 
mature, stable, and the community agrees it should be active by default.  Some 
features are such that they inherently cannot be turned off, but if possible 
it'd be great to develop and mature them as options that people can try out 
until they become stable to ease transitions and avoid unnecessary breakage at 
an early stage.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347234#comment-14347234
 ] 

stack commented on HADOOP-11656:


bq. To add, I think we can and should strive for doing this in a compatible 
manner, whatever the approach.

Sure. Sounds good if possible at all as well as being a load of work proving 
changes are indeed compatible.

bq. Marking and calling it incompatible before we see proposal/patch seems 
premature to me.

I'd suggest you open a new issue to do classpath isolation in a 'compatible 
manner' rather than add this imposition here. In this issue, the reporter 
thinks it a breaking change ("At a minimum we'll break dependency compatibility 
and operational compatibility."). The two issues can move along independently of 
each other.

And to be clear, when we talk 'compatible manner', the expectation is that a 
downstream app, for example HBase, should be able to move from hadoop-2.X to 
hadoop-2.Y without breakage, right? That is, in spite of shading, new locations 
for dependencies, cleaned up exposure of libs likely transitively included, 
etc., there will be no need for downstreamers to add in new compensatory code, 
no need of our having to release special versions to work with hadoop-2.Z, and 
no need of callouts in code or for us to educate our community that "if on 
hadoop-2.X do this...but if on hadoop-2.Y do that"? Or are we talking about 
something else? (And "downstreamers, you are doing it wrong" is not allowed.)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347298#comment-14347298
 ] 

Arun C Murthy commented on HADOOP-11656:


Agree 1000% with [~jlowe].

Starting with the thesis that we should break compat is less than ideal - we 
should certainly strive to add features in a compatible manner; this allows all 
existing users to consume the feature without the need to make a *should I use 
this or not* choice.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: HADOOP-11659.patch

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.
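 One possible shape of the single-lookup version (a sketch only, not the attached patch; the placeholder Key and FileSystem types stand in for the real cache types):
 {code}
 import java.util.HashMap;
 import java.util.Map;

 public class SingleLookupRemoveSketch {
   /** Placeholders for the real org.apache.hadoop.fs types in this sketch. */
   static class FileSystem {}
   static class Key {}

   private final Map<Key, FileSystem> map = new HashMap<Key, FileSystem>();

   synchronized void remove(Key key, FileSystem fs) {
     // One hash lookup instead of containsKey + get + remove.
     FileSystem cachedFs = map.remove(key);
     if (cachedFs != null && cachedFs != fs) {
       // A different instance was cached under this key; restore it untouched.
       map.put(key, cachedFs);
     }
   }
 }
 {code}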



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: HADOOP-11653.patch

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Trivial
 Attachments: HADOOP-11653.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Priority: Minor  (was: Trivial)

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11653.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347425#comment-14347425
 ] 

Hadoop QA commented on HADOOP-11618:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702572/HADOOP-11618-002.patch
  against trunk revision 03cc229.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//console

This message is automatically generated.

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Status: Patch Available  (was: Open)

Attached the patch. It does not add testcases, but I executed the affected testcases 
for regression and all are passing.

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347446#comment-14347446
 ] 

Steve Loughran commented on HADOOP-11656:
-

[~saint@gmail.com], as someone downstream, I know you know the situation we 
have now; everyone who goes downstream experiences this, with HBase and Oozie being 
core pain points. Not exposing the transitive dependencies means that you can 
stop worrying about what version of Guava or protobuf is used by Hadoop, 
leaving only our consistent semantics to maintain.

The native lib problem will mean no more than one version of the hadoop JARs 
can be reliably loaded.

Now, unless I'm confused about how classloaders bootstrap, it has to be done in 
an order; classloader above classloader, with OSGi doing some magic at startup 
so the first CL can pick up stuff from external CLs and make them visible to 
others.

Does this mean that adoption of the new CL is a whole new startup process? If 
so, it is going to be visible to everything downstream. Now, we could design 
YARN-679 to be ready for this, so if you adopt that as the launcher for your 
app then you can get the CL setup in there.

But what about every single client app that wants to talk HDFS? We may be able 
to go to HBase & Accumulo & say "new launcher", maybe go to Spark and say "your 
AM needs to do this", but it's harder to say "your general purpose code to read 
off HDFS must now use our CL chain to work". Especially for the use case of a 
webapp running in Tomcat with the classloader isolation of Java EE. 

Things like that aren't going to work if we start imposing a new CL; they will need to 
flip the switch that says "no dependency magic". 

So why is this being proposed as on-by-default? And, since there isn't a 
clear proposal yet, are we trying to define that we should be incompatible 
from the outset?

Please: give us a proposal, let's work towards an implementation, actually test 
this downstream including in an Oozie version (hence Tomcat tests), in-cluster 
apps, and remote client apps. Then we can consider whether or not it would be 
justifiable to say "you must do this to move to Hadoop 3".

Oh, and given the schedules, we should start planning for Java 9 & Jigsaw...



 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11627:
--
Status: Patch Available  (was: In Progress)

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11618:
--
Attachment: HADOOP-11618-002.patch

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: (was: HADOOP-11653.patch)

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor

 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11627:
--
Attachment: HADOOP-11627-004.patch

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347469#comment-14347469
 ] 

Brahma Reddy Battula commented on HADOOP-11627:
---

Thanks a lot for the review. Please check the updated patch.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347479#comment-14347479
 ] 

Brahma Reddy Battula commented on HADOOP-11627:
---

Ran all the testcases for regression locally; all are passing.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347500#comment-14347500
 ] 

Hadoop QA commented on HADOOP-11627:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702595/HADOOP-11627-004.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5847//console

This message is automatically generated.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HADOOP-11103:


Assignee: Sean Busbey

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10895) HTTP KerberosAuthenticator fallback should have a flag to disable it

2015-03-04 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347611#comment-14347611
 ] 

Yongjun Zhang commented on HADOOP-10895:


Hi [~tucu00], [~atm], [~zjshen], [~daryn],

This jira originates from the discussion in HADOOP-10771 that you guys participated in. 
I'd like to bring it to your attention, to see if we want to move this one 
forward. Please see my comment at 
https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Thanks for your time, and thanks [~vinodkv] for suggesting in the email 
thread that I collect feedback from you guys.


 HTTP KerberosAuthenticator fallback should have a flag to disable it
 

 Key: HADOOP-10895
 URL: https://issues.apache.org/jira/browse/HADOOP-10895
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Yongjun Zhang
Priority: Blocker
 Attachments: HADOOP-10895.001.patch, HADOOP-10895.002.patch, 
 HADOOP-10895.003.patch, HADOOP-10895.003v1.patch, HADOOP-10895.003v2.patch, 
 HADOOP-10895.003v2improved.patch, HADOOP-10895.004.patch, 
 HADOOP-10895.005.patch, HADOOP-10895.006.patch, HADOOP-10895.007.patch, 
 HADOOP-10895.008.patch, HADOOP-10895.009.patch


 Per review feedback in HADOOP-10771, {{KerberosAuthenticator}} and the 
 delegation token version coming in with HADOOP-10771 should have a flag to 
 disable fallback to pseudo, similarly to the one that was introduced in 
 Hadoop RPC client with HADOOP-9698.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347566#comment-14347566
 ] 

Allen Wittenauer commented on HADOOP-11656:
---

FYI, I'm adding the 'shell' label because regardless of the outcome, this will 
almost certainly have an impact on how the various classpath commands and 
shellprofile.d code works in the future.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11656:
--
Labels: classloading classpath dependencies shell  (was: classloading 
classpath dependencies)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347538#comment-14347538
 ] 

Hadoop QA commented on HADOOP-11659:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702588/HADOOP-11659.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//console

This message is automatically generated.

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11613) Remove httpclient dependency from hadoop-azure

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347560#comment-14347560
 ] 

Brahma Reddy Battula commented on HADOOP-11613:
---

 *Testcase failures* are because of {{encodedKey = URLEncoder.encode(key, 
"UTF-8");}}, which has limitations with special characters ("All other 
characters are unsafe and are first converted into one or more bytes using some 
encoding scheme" -- see the following java doc for the same):

https://docs.oracle.com/javase/6/docs/api/java/net/URLEncoder.html

When I replaced it with a bitset (like the following), all the testcases are 
passing. I am always happy to work with a bitset; hence I had given the initial 
patch with a bitset.

{code}
byte[] rawdata = URLCodec.encodeUrl(allowed_abs_path,
    EncodingUtils.getBytes(key, "UTF-8"));
String encodedKey = EncodingUtils.getAsciiString(rawdata);
{code}

[~ajisakaa] If you agree, please consider the initial patch, which uses the 
bitset (with it, all the testcases pass). Please correct me if I am wrong.
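
For context, a small standalone sketch of how the commons-codec bitset approach leaves the listed characters untouched and percent-encodes everything else; the class name and the characters placed in the BitSet are assumptions for the example, not the hadoop-azure patch itself:

{code}
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

import org.apache.commons.codec.net.URLCodec;

// Hedged illustration only: the BitSet marks characters that stay unescaped.
public class BitsetEncodeSketch {
  public static void main(String[] args) {
    BitSet allowed = new BitSet(256);
    for (char c = 'a'; c <= 'z'; c++) allowed.set(c);
    for (char c = 'A'; c <= 'Z'; c++) allowed.set(c);
    for (char c = '0'; c <= '9'; c++) allowed.set(c);
    allowed.set('/');  // keep path separators unescaped

    String key = "dir/file name+1";
    byte[] rawdata = URLCodec.encodeUrl(allowed,
        key.getBytes(StandardCharsets.UTF_8));
    String encodedKey = new String(rawdata, StandardCharsets.US_ASCII);
    System.out.println(encodedKey);  // prints dir/file%20name%2B1
  }
}
{code}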

 Remove httpclient dependency from hadoop-azure
 --

 Key: HADOOP-11613
 URL: https://issues.apache.org/jira/browse/HADOOP-11613
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11613-001.patch, HADOOP-11613-002.patch, 
 HADOOP-11613-003.patch, HADOOP-11613.patch


 Remove httpclient dependency from MockStorageInterface.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11656:
--
Labels: classloading classpath dependencies scripts shell  (was: 
classloading classpath dependencies shell)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, scripts, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened several times with Guava, for example, for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347327#comment-14347327
 ] 

Brahma Reddy Battula commented on HADOOP-11618:
---

Thanks a lot for the review.
{quote}
In both cases, we are going to assert that ftpFs.getUri() results in 
ftp://dummy-host:21
{quote}
It will not return the default port when the URI already has a port; the default 
port is returned only when the port is -1 (i.e., not configured). Please check the 
following code:
{code}
  private URI getUri(URI uri, String supportedScheme,
      boolean authorityNeeded, int defaultPort) throws URISyntaxException {
    checkScheme(uri, supportedScheme);
    // A file system implementation that requires authority must always
    // specify default port
    if (defaultPort < 0 && authorityNeeded) {
      throw new HadoopIllegalArgumentException(
          "FileSystem implementation error - default port " + defaultPort
              + " is not valid");
    }
    String authority = uri.getAuthority();
    if (authority == null) {
      if (authorityNeeded) {
        throw new HadoopIllegalArgumentException("Uri without authority: " + uri);
      } else {
        return new URI(supportedScheme + ":///");
      }
    }
    // authority is non-null - authorityNeeded may be true or false.
    int port = uri.getPort();
    port = (port == -1 ? defaultPort : port);
    if (port == -1) { // no port supplied and default port is not specified
      return new URI(supportedScheme, authority, "/", null);
    }
    return new URI(supportedScheme + "://" + uri.getHost() + ":" + port);
  }
{code}

The 001 patch also calls only this method. Anyway, I have updated the patch; kindly 
review it.
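
As a toy illustration of the port-fallback rule described above (standalone code, not the Hadoop source; the class name and the FTP default port value are assumptions for the example):

{code}
import java.net.URI;

// Sketch: the default port is applied only when the URI carries no explicit
// port (getPort() == -1), mirroring the quoted getUri() logic.
public class DefaultPortSketch {
  static int resolvePort(URI uri, int defaultPort) {
    int port = uri.getPort();
    return port == -1 ? defaultPort : port;
  }

  public static void main(String[] args) {
    int ftpDefaultPort = 21;  // assumed default for the example
    System.out.println(resolvePort(URI.create("ftp://dummy-host:2121"), ftpDefaultPort)); // 2121
    System.out.println(resolvePort(URI.create("ftp://dummy-host"), ftpDefaultPort));      // 21
  }
}
{code}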

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened HADOOP-11668:
---
  Assignee: Allen Wittenauer  (was: Vinayakumar B)

Re-opening.  The problem here isn't start/stop, it's *-daemons.sh, which are 
now broken.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Attachment: HADOOP-11668-02.patch

-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was twofold:
* We were not preserving quotes around parameters that contained $IFS, due to the 
lack of quoting around the array deletion.
* The deleted array elements were then retained and showed up as empty 
arguments.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347709#comment-14347709
 ] 

Allen Wittenauer edited comment on HADOOP-11668 at 3/4/15 10:40 PM:


-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was twofold:
* We were not preserving quotes around parameters that contained $IFS, due to 
the lack of quoting around the array deletion.
* The deleted array elements were then retained and showed up as empty 
arguments.


was (Author: aw):
-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was two fold:
* We were preserving quotes around parameters that contained $IFS due to lack 
of quoting around the array deletion
* The then deleted array elements were retained and show up as an empty 
argument.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Status: Patch Available  (was: Reopened)

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11103:
--
Status: Patch Available  (was: Open)

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347783#comment-14347783
 ] 

Hadoop QA commented on HADOOP-11668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702627/HADOOP-11668-02.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//console

This message is automatically generated.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11670:

Affects Version/s: (was: 2.6.0)
   2.7.0

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.
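
 One possible shape of a fix, sketched here only as an illustration (the class and method names below are assumptions, not the committed patch), is a provider that defers the missing-key failure from construction time to getCredentials(), so the AWSCredentialsProviderChain can fall through to InstanceProfileCredentialsProvider:
 {code}
 import com.amazonaws.AmazonClientException;
 import com.amazonaws.auth.AWSCredentials;
 import com.amazonaws.auth.AWSCredentialsProvider;
 import com.amazonaws.auth.BasicAWSCredentials;

 // Illustrative sketch only. Instead of throwing while the filesystem is being
 // initialized, this provider throws from getCredentials(); the chain catches
 // that and tries the next provider (e.g. InstanceProfileCredentialsProvider).
 class LenientBasicCredentialsProvider implements AWSCredentialsProvider {
   private final String accessKey;
   private final String secretKey;

   LenientBasicCredentialsProvider(String accessKey, String secretKey) {
     this.accessKey = accessKey;  // may be null or empty; that is acceptable here
     this.secretKey = secretKey;
   }

   private static boolean hasText(String s) {
     return s != null && !s.isEmpty();
   }

   @Override
   public AWSCredentials getCredentials() {
     if (hasText(accessKey) && hasText(secretKey)) {
       return new BasicAWSCredentials(accessKey, secretKey);
     }
     // Signals "no credentials here"; the chain moves on to the next provider.
     throw new AmazonClientException("Access key or secret key is unset");
   }

   @Override
   public void refresh() {
     // nothing to refresh for static keys
   }
 }
 {code}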



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347801#comment-14347801
 ] 

Steve Loughran commented on HADOOP-11670:
-

It looks more like HADOOP-10714 was the change that caused this.

 Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
 --

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348010#comment-14348010
 ] 

Colin Patrick McCabe commented on HADOOP-11638:
---

Can you add an {{#else}} clause that has an {{#error}}?  +1 after that is done.

thanks.

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (and maybe on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11460) Deprecate shell vars

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11460:
--
Release Note: 
The following shell environment variables have been deprecated:

| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |

  was:
The following shell environment variables have been deprecated:
| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |


 Deprecate shell vars
 

 Key: HADOOP-11460
 URL: https://issues.apache.org/jira/browse/HADOOP-11460
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: John Smith
  Labels: scripts, shell
 Fix For: 3.0.0

 Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, 
 HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch


 It is a very common shell pattern in 3.x to effectively replace sub-project 
 specific vars with generics.  We should have a function that does this 
 replacement and provides a warning to the end user that the old shell var is 
 deprecated.  Additionally, we should use this shell function to deprecate the 
 shell vars that are holdovers already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348013#comment-14348013
 ] 

Colin Patrick McCabe commented on HADOOP-11660:
---

OK.  Thanks, Edward.

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware crc for ARM's new 64 bit architecture
 The patch is completely conditionalized on __aarch64__
 I have only added support for the non pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348057#comment-14348057
 ] 

Hadoop QA commented on HADOOP-10027:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702397/HADOOP-10027.3.patch
  against trunk revision ded0200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common:

org.apache.hadoop.io.compress.TestCodec

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//console

This message is automatically generated.

 *Compressor_deflateBytesDirect passes instance instead of jclass to 
 GetStaticObjectField
 

 Key: HADOOP-10027
 URL: https://issues.apache.org/jira/browse/HADOOP-10027
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Reporter: Eric Abbott
Assignee: Hui Zheng
Priority: Minor
 Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, 
 HADOOP-10027.3.patch


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup
 This pattern appears in all the native compressors.
 // Get members of ZlibCompressor
 jobject clazz = (*env)->GetStaticObjectField(env, this,
  ZlibCompressor_clazz);
 The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a 
 jobject. Adding the JVM param -Xcheck:jni will cause a "FATAL ERROR in native 
 method: JNI received a class argument that is not a class" and a core dump 
 such as the following.
 (gdb) 
 #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6
 #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6
 #2 0x7f02e45bd727 in os::abort(bool) () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, 
 bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #5 0x7f02d38eaf79 in 
 Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () 
 from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
 In addition, that clazz object is only used for synchronization. In the case 
 of the native method _deflateBytesDirect, the result is a class wide lock 
 used to access the instance field uncompressed_direct_buf. Perhaps using the 
 instance as the sync point is more appropriate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348006#comment-14348006
 ] 

Colin Patrick McCabe commented on HADOOP-10027:
---

Not sure what the issue was here.  It looks kind of like a jenkins problem?  
Not sure.
{code}
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hadoop-auth ---
FATAL: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy57.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
{code}

I will retrigger.

 *Compressor_deflateBytesDirect passes instance instead of jclass to 
 GetStaticObjectField
 

 Key: HADOOP-10027
 URL: https://issues.apache.org/jira/browse/HADOOP-10027
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Reporter: Eric Abbott
Assignee: Hui Zheng
Priority: Minor
 Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, 
 HADOOP-10027.3.patch


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup
 This pattern appears in all the native compressors.
 // Get members of ZlibCompressor
 jobject clazz = (*env)->GetStaticObjectField(env, this,
  ZlibCompressor_clazz);
 The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a 
 jobject. Adding the JVM param -Xcheck:jni will cause a "FATAL ERROR in native 
 method: JNI received a class argument that is not a class" and a core dump 
 such as the following.
 (gdb) 
 #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6
 #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6
 #2 0x7f02e45bd727 in os::abort(bool) () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, 
 bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #5 0x7f02d38eaf79 in 
 Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () 
 from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
 In addition, that clazz object is only used for synchronization. In the case 
 of the native method _deflateBytesDirect, the result is a class wide lock 
 used to access the instance field uncompressed_direct_buf. Perhaps using the 
 instance as the sync point is more appropriate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11671) Asynchronous native RPC v9 client

2015-03-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai moved HDFS-7887 to HADOOP-11671:
---

Key: HADOOP-11671  (was: HDFS-7887)
Project: Hadoop Common  (was: Hadoop HDFS)

 Asynchronous native RPC v9 client
 -

 Key: HADOOP-11671
 URL: https://issues.apache.org/jira/browse/HADOOP-11671
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai

 There is more and more integration happening between Hadoop and applications 
 that are implemented in languages other than Java.
 To access Hadoop, applications either have to go through JNI (e.g. libhdfs) 
 or reverse engineer the Hadoop RPC protocol (e.g. snakebite). 
 Unfortunately, neither of them is satisfactory:
 * Integrating with JNI requires running a JVM inside the application. Some 
 applications (e.g., real-time processing, MPP databases) do not want the 
 footprint and GC behavior of the JVM.
 * The Hadoop RPC protocol has a rich feature set, including wire encryption, 
 SASL, and Kerberos authentication. Few 3rd-party implementations fully cover 
 this feature set, so they may only work in limited environments.
 This jira proposes implementing a Hadoop RPC library in C++ that provides a 
 common ground for implementing higher-level native clients for HDFS, YARN, 
 and MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Budde updated HADOOP-11670:

Description: 
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). The change in question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.

  was:
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.


 Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
 --

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),

[jira] [Updated] (HADOOP-9902) Shell script rewrite

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-9902:
-
Release Note: 
The Hadoop shell scripts have been rewritten to fix many long standing bugs and 
include some new features.  While an eye has been kept towards compatibility, 
some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the 
appropriate ${HADOOP_IDENT_STR}.  This should allow, with proper configurations 
in place, for multiple versions of the same secure daemon to run on a host. 
Additionally, pid files are now created when daemons are run in interactive 
mode.  This will also prevent the accidental starting of two daemons with the 
same configuration prior to launching java (i.e., fast fail without having to 
wait for socket opening).
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows 
for all of the environment variables to be in one location.  This was not the 
case previously.
* The default content of *-env.sh has been significantly altered, with the 
majority of defaults moved into more protected areas inside the code. 
Additionally, these files do not auto-append anymore; setting a variable on the 
command line prior to calling a shell command must contain the entire content, 
not just any extra settings.  This brings Hadoop more in-line with the vast 
majority of other software packages.
* All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to 
their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', 
and related commands are executed. Previously, these were separated out which 
meant a significant amount of duplication of common settings.  
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec 
and sbin.  The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been 
removed.  These settings are now configurable in the *-env.sh files via *_OPT. 
* Some formerly 'documented' entries in yarn-env.sh have been undocumented as a 
simple form of deprecation in order to greatly simplify configuration and 
reduce unnecessary duplication.  They still work, but those variables will 
likely be removed in a future release.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented and unused yarn.id.str Java property has been removed.
* The unused yarn.policy.file Java property has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take 
advantage of better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or 
${HADOOP_PREFIX} to find the necessary binaries.  (See other note regarding 
${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be 
ignored and stripped from their respective environment settings.
* cygwin support has been removed.

NEW FEATURES:

* Daemonization has been moved from *-daemon.sh to the bin commands via the 
--daemon option. Simply use --daemon start to start a daemon, --daemon stop to 
stop a daemon, and --daemon status to set $? to the daemon's status.  The 
return code for status is LSB-compatible.  For example, 'hdfs --daemon start 
namenode'.
* It is now possible to override some of the shell code capabilities to provide 
site specific functionality without replacing the shipped versions.  
Replacement functions should go into the new hadoop-user-functions.sh file.
* A new option called --buildpaths will attempt to add developer build 
directories to the classpath to allow for in source tree testing.
* Operations which trigger ssh connections can now use pdsh if installed.  
${HADOOP_SSH_OPTS} still gets applied. 
* Added distch and jnipath subcommands to the hadoop command.
* Shell scripts now support a --debug option which will report basic 
information on the construction of various environment variables, java options, 
classpath, etc. to help in configuration debugging.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring 
symlinking and other such tricks.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided 
hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path, 
without ${HADOOP_PREFIX} being defined, and as the target of bash -x for 
debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined 
based upon the current location of the shell library.  Note that other parts of 
the extended Hadoop ecosystem may still require this environment variable to be 
configured.
* Operations which trigger ssh will now limit the number of 

[jira] [Updated] (HADOOP-11460) Deprecate shell vars

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11460:
--
Release Note: 
The following shell environment variables have been deprecated:
| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |

  was:
The following shell environment variables have been deprecated:
|| Old || New ||
|  |  |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |


 Deprecate shell vars
 

 Key: HADOOP-11460
 URL: https://issues.apache.org/jira/browse/HADOOP-11460
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: John Smith
  Labels: scripts, shell
 Fix For: 3.0.0

 Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, 
 HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch


 It is a very common shell pattern in 3.x to effectively replace sub-project 
 specific vars with generics.  We should have a function that does this 
 replacement and provides a warning to the end user that the old shell var is 
 deprecated.  Additionally, we should use this shell function to deprecate the 
 shell vars that are holdovers already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11103:
--
Status: Open  (was: Patch Available)

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)
Adam Budde created HADOOP-11670:
---

 Summary: Fix IAM instance profile auth for s3a (broken in 
HADOOP-11446)
 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11671) Asynchronous native RPC v9 client

2015-03-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348018#comment-14348018
 ] 

Haohui Mai commented on HADOOP-11671:
-

bq. Is this really a good, long term strategy given our use of protobuf now 
that gRPC exists?

The Hadoop RPC library allows more native applications to be integrated with 
Hadoop, which benefits the ecosystem. Once Hadoop has switched to gRPC, we can 
turn this library into a shim over gRPC, or retire it. :-)

 Asynchronous native RPC v9 client
 -

 Key: HADOOP-11671
 URL: https://issues.apache.org/jira/browse/HADOOP-11671
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai

 There is more and more integration happening between Hadoop and applications 
 that are implemented in languages other than Java.
 To access Hadoop, applications either have to go through JNI (e.g. libhdfs) 
 or reverse engineer the Hadoop RPC protocol (e.g. snakebite). 
 Unfortunately, neither of them is satisfactory:
 * Integrating with JNI requires running a JVM inside the application. Some 
 applications (e.g., real-time processing, MPP databases) do not want the 
 footprint and GC behavior of the JVM.
 * The Hadoop RPC protocol has a rich feature set, including wire encryption, 
 SASL, and Kerberos authentication. Few 3rd-party implementations fully cover 
 this feature set, so they may only work in limited environments.
 This jira proposes implementing a Hadoop RPC library in C++ that provides a 
 common ground for implementing higher-level native clients for HDFS, YARN, 
 and MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11670:

Summary: Fix IAM instance profile auth for s3a  (was: Fix IAM instance 
profile auth for s3a (broken in HADOOP-11446))

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347901#comment-14347901
 ] 

Hadoop QA commented on HADOOP-11103:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669533/HADOOP-11103.1.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//console

This message is automatically generated.

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null
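 For illustration only (not part of the attached patch), a small sketch of two of
 the behaviors listed above: getClassName() returns the class name of the wrapped
 remote exception, and the two-argument constructor behaves like the three-argument
 one with a null error code.
 {code}
 // getClassName() is the class name of the wrapped remote exception.
 RemoteException re =
     new RemoteException(java.io.FileNotFoundException.class.getName(), "no such file");
 assert "java.io.FileNotFoundException".equals(re.getClassName());
 // Per the report above, this two-argument call behaves the same as
 // new RemoteException(FileNotFoundException.class.getName(), "no such file", null).
 {code}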



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-11673:
--

 Summary: Use org.junit.Assume to skip tests instead of return
 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Priority: Minor


We see the following code many times:
{code:title=TestCodec.java}
if (!ZlibFactory.isNativeZlibLoaded(conf)) {
  LOG.warn("skipped: native libs not loaded");
  return;
}
{code}
If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
with a warn log. I'd like to *skip* this test case by using 
{{org.junit.Assume}}.
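 A sketch of the proposed pattern, assuming a JUnit 4 test and a hypothetical test
 name: when the assumption fails, the test is reported as skipped instead of
 silently passing.
 {code}
 import static org.junit.Assume.assumeTrue;
 
 @Test
 public void testGzipCodecWithNativeZlib() throws Exception {
   // Skips (rather than passes) the test when native zlib is not loaded.
   assumeTrue(ZlibFactory.isNativeZlibLoaded(conf));
   // ... rest of the test body ...
 }
 {code}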



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Adam Budde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Budde updated HADOOP-11670:

Description: 
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
S3Credentials class to read the value of these two params. The change in 
question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.

  was:
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). The change in question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.


 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
 S3Credentials class to read the value of these two params. The change in 
 question is presented below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() 

[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Adam Budde (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347806#comment-14347806
 ] 

Adam Budde commented on HADOOP-11670:
-

My mistake-- looks like you're correct. I've updated the description.

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
 S3Credentials class to read the value of these two params. The change in 
 question is presented below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11672) test

2015-03-04 Thread xiangqian.xu (JIRA)
xiangqian.xu created HADOOP-11672:
-

 Summary: test
 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348165#comment-14348165
 ] 

Brahma Reddy Battula commented on HADOOP-11672:
---

FYI, please go through the following link on how to contribute:

http://wiki.apache.org/hadoop/HowToContribute

 test
 

 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
 Environment: PowerPC Big Endian & other Big Endian platforms
Target Version/s: 2.7.0

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
  Environment: PowerPC Big Endian & other Big Endian platforms
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

Summary: Set DomainSocketWatcher thread name explicitly  (was: set 
DomainSocketWatcher thread name explicitly)

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.
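 For illustration only, a minimal sketch of the idea (not the exact patch); the
 watcherRunnable and src names are hypothetical.
 {code}
 // Give the watcher thread a descriptive name so jstack output and logs
 // identify it immediately instead of showing a generic Thread-NNN.
 Thread watcherThread = new Thread(watcherRunnable);
 watcherThread.setName("DomainSocketWatcher-" + src);
 watcherThread.setDaemon(true);
 watcherThread.start();
 {code}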



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk and branch-2. Thanks Liang for your contribution and 
thanks Colin Patrick McCabe for your review.

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.7.0

 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Kiran Kumar M R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Kumar M R updated HADOOP-11638:
-
Attachment: HADOOP-11638-002.patch

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-9902) Shell script rewrite

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-9902:
-
Release Note: 
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and 
include some new features.  While an eye has been kept towards compatibility, 
some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the 
appropriate ${HADOOP_IDENT_STR}.  This should allow, with proper configurations 
in place, for multiple versions of the same secure daemon to run on a host. 
Additionally, pid files are now created when daemons are run in interactive 
mode.  This will also prevent the accidental starting of two daemons with the 
same configuration prior to launching java (i.e., fast fail without having to 
wait for socket opening).
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows 
for all of the environment variables to be in one location.  This was not the 
case previously.
* The default content of *-env.sh has been significantly altered, with the 
majority of defaults moved into more protected areas inside the code. 
Additionally, these files do not auto-append anymore; setting a variable on the 
command line prior to calling a shell command must contain the entire content, 
not just any extra settings.  This brings Hadoop more in-line with the vast 
majority of other software packages.
* All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to 
their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', 
and related commands are executed. Previously, these were separated out which 
meant a significant amount of duplication of common settings.  
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec 
and sbin.  The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been 
removed.  These settings are now configurable in the *-env.sh files via *_OPT. 
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented and unused yarn.id.str Java property has been removed.
* The unused yarn.policy.file Java property has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take 
advantage of better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or 
${HADOOP_PREFIX} to find the necessary binaries.  (See other note regarding 
${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be 
ignored and stripped from their respective environment settings.

NEW FEATURES:

* Daemonization has been moved from *-daemon.sh to the bin commands via the 
--daemon option. Simply use --daemon start to start a daemon, --daemon stop to 
stop a daemon, and --daemon status to set $? to the daemon's status.  The 
return code for status is LSB-compatible.  For example, 'hdfs --daemon start 
namenode'.
* It is now possible to override some of the shell code capabilities to provide 
site specific functionality without replacing the shipped versions.  
Replacement functions should go into the new hadoop-user-functions.sh file.
* A new option called --buildpaths will attempt to add developer build 
directories to the classpath to allow for in source tree testing.
* Operations which trigger ssh connections can now use pdsh if installed.  
${HADOOP_SSH_OPTS} still gets applied. 
* Added distch and jnipath subcommands to the hadoop command.
* Shell scripts now support a --debug option which will report basic 
information on the construction of various environment variables, java options, 
classpath, etc. to help in configuration debugging.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring 
symlinking and other such tricks.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided 
hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path, 
without ${HADOOP_PREFIX} being defined, and as the target of bash -x for 
debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined 
based upon the current location of the shell library.  Note that other parts of 
the extended Hadoop ecosystem may still require this environment variable to be 
configured.
* Operations which trigger ssh will now limit the number of connections to run 
in parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion. 
 By default, this is set to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Some subcommands were not listed in the usage.
* Various options on hadoop command lines were supported 

[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11675:
---
Status: Patch Available  (was: Open)

 tiny exception log with checking storedBlock is null or not
 ---

 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
 Attachments: HADOOP-11675-001.txt


 Found this log on our production cluster:
 {code}
 2015-03-05,10:33:31,778 ERROR 
 org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: 
 Compaction failed 
 regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
  storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 
 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
 java.io.IOException: 
 BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
 exist or is not under Constructionnull
 {code}
 Let's check whether storedBlock is null to make the log message cleaner.
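 For illustration only, a sketch of the kind of guard this issue proposes; the
 blockId and isUnderConstruction names are hypothetical stand-ins for the real code.
 {code}
 // Distinguish a missing block from a wrong-state block so the message no longer
 // ends with the confusing "...is not under Constructionnull".
 if (storedBlock == null) {
   throw new IOException(blockId + " does not exist");
 }
 if (!isUnderConstruction(storedBlock)) {
   throw new IOException(blockId + " is not under construction: " + storedBlock);
 }
 {code}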



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11674 started by Sean Busbey.

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical

 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.
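 For illustration only, a minimal sketch of the problem and the proposed change:
 {code}
 // Before (problematic): a single buffer shared by every stream instance in the JVM.
 //   private static byte[] oneByteBuf = new byte[1];
 // Sketch of the fix: each stream instance owns its single-byte buffer
 // for the read()/write() fast path.
 private byte[] oneByteBuf = new byte[1];
 {code}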



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11589) NetUtils.createSocketAddr should trim the input URI

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11589:

Component/s: net

 NetUtils.createSocketAddr should trim the input URI
 ---

 Key: HADOOP-11589
 URL: https://issues.apache.org/jira/browse/HADOOP-11589
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Rakesh R
Priority: Minor
  Labels: newbie
 Fix For: 2.7.0

 Attachments: HADOOP-11589-1.patch, HADOOP-11589-2.patch


 NetUtils.createSocketAddr does not trim the input URI; it should.
 HDFS-7684 and HADOOP-9869 trim some of the URIs passed to the 
 method, but not all of the inputs are trimmed yet.
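 A minimal sketch of the intended behavior, assuming the fix trims inside the method
 itself (illustrative, not the committed patch):
 {code}
 public static InetSocketAddress createSocketAddr(String target, int defaultPort) {
   if (target != null) {
     // " host:8020 " copied from an XML config becomes "host:8020"
     target = target.trim();
   }
   return createSocketAddr(target, defaultPort, null);
 }
 {code}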



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348197#comment-14348197
 ] 

Li Bo commented on HADOOP-11643:


hi, Kai
I think the code is ok in general.
One point:
When a {{NumberFormatException}} is caught, an {{IllegalArgumentException}} with the 
message {{No codec option is provided}} is thrown. In that case the codec option is 
actually provided, just not in a valid integer format, so how about changing the 
message to something like "Option XXX must be an integer, please provide it in the 
correct format".
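For illustration only, a sketch of the suggested wording; the optionName and
optionValue variables are hypothetical.
{code}
try {
  int value = Integer.parseInt(optionValue);
} catch (NumberFormatException e) {
  throw new IllegalArgumentException(
      "Option " + optionName + " must be an integer, but was: " + optionValue, e);
}
{code}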

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643_v1.patch, HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348217#comment-14348217
 ] 

Hadoop QA commented on HADOOP-11648:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702060/HADOOP-11648-003.txt
  against trunk revision ded0200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//console

This message is automatically generated.

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10846) DataChecksum#calculateChunkedSums not working for PPC when buffers not backed by array

2015-03-04 Thread Ayappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348236#comment-14348236
 ] 

Ayappan commented on HADOOP-10846:
--

A new JIRA (HADOOP-11665) has been opened to fix this issue in a more 
standard way.

 DataChecksum#calculateChunkedSums not working for PPC when buffers not backed 
 by array
 --

 Key: HADOOP-10846
 URL: https://issues.apache.org/jira/browse/HADOOP-10846
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.5.2
 Environment: PowerPC platform
Reporter: Jinghui Wang
Assignee: Ayappan
 Attachments: HADOOP-10846-v1.patch, HADOOP-10846-v2.patch, 
 HADOOP-10846-v3.patch, HADOOP-10846-v4.patch, HADOOP-10846.patch


 Got the following exception when running Hadoop on Power PC. The checksum 
 computation is broken when the data buffer and checksum 
 buffer are not backed by arrays.
 13/09/16 04:06:57 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:biadmin (auth:SIMPLE) 
 cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 org.apache.hadoop.fs.ChecksumException: Checksum error



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348260#comment-14348260
 ] 

Li Bo commented on HADOOP-11643:


Patch v3 reviewed.

+1

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-11643:
---
Fix Version/s: HDFS-7285

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng resolved HADOOP-11643.

  Resolution: Fixed
Target Version/s: HDFS-7285
Hadoop Flags: Reviewed

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348279#comment-14348279
 ] 

Kai Zheng commented on HADOOP-11643:


Thanks [~libo-intel]. I committed this to branch HDFS-7285.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

Target Version/s: 2.7.0
Hadoop Flags: Reviewed

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348152#comment-14348152
 ] 

Hadoop QA commented on HADOOP-11638:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702696/HADOOP-11638-002.patch
  against trunk revision 8d88691.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//console

This message is automatically generated.

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348229#comment-14348229
 ] 

Hadoop QA commented on HADOOP-11674:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702711/HADOOP-11674.1.patch
  against trunk revision 8d88691.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//console

This message is automatically generated.

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348250#comment-14348250
 ] 

Kai Zheng commented on HADOOP-11643:


Thanks [~libo-intel] for your review and the good catch. It's updated; would 
you review again? Thanks.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-11643:
---
Attachment: HADOOP-11643-v3.patch

Change summary:
1. Fixed the issue found by Bo.
2. Added a test.
3. Overrode toString() to support dumping.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

Summary: oneByteBuf in CryptoInputStream and CryptoOutputStream should be 
non static  (was: data corruption for parallel CryptoInputStream and 
CryptoOutputStream)

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0  (was: 3.0.0, 2.7.0, 2.6.1)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks [~busbey] for the contribution.

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348325#comment-14348325
 ] 

Hudson commented on HADOOP-11674:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7266 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7266/])
HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream should be 
non static. (Sean Busbey via yliu) (yliu: rev 
5e9b8144d54f586803212a0bdd8b1c25bdbb1e97)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoInputStream.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoOutputStream.java
* hadoop-common-project/hadoop-common/CHANGES.txt


 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11602) Fix toUpperCase/toLowerCase to use Locale.ENGLISH

2015-03-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347927#comment-14347927
 ] 

Akira AJISAKA commented on HADOOP-11602:


Thanks [~ozawa]! Looks good to me but the patch needs rebasing.

 Fix toUpperCase/toLowerCase to use Locale.ENGLISH
 -

 Key: HADOOP-11602
 URL: https://issues.apache.org/jira/browse/HADOOP-11602
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
 Attachments: HADOOP-11602-001.patch, HADOOP-11602-002.patch, 
 HADOOP-11602-003.patch, HADOOP-11602-004.patch, 
 HADOOP-11602-branch-2.001.patch, HADOOP-11602-branch-2.002.patch, 
 HADOOP-11602-branch-2.003.patch


 String#toLowerCase()/toUpperCase() without a locale argument can occur 
 unexpected behavior based on the locale. It's written in 
 [Javadoc|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#toLowerCase()]:
 {quote}
 For instance, TITLE.toLowerCase() in a Turkish locale returns t\u0131tle, 
 where '\u0131' is the LATIN SMALL LETTER DOTLESS I character
 {quote}
 This issue is derived from HADOOP-10101.
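 For illustration only, a small sketch of the pitfall and the locale-pinned call this
 issue proposes:
 {code}
 String title = "TITLE";
 // With a Turkish default locale this yields "t\u0131tle" (dotless i).
 String localeDependent = title.toLowerCase();
 // Pinning the locale gives "title" regardless of the JVM's default locale.
 String stable = title.toLowerCase(Locale.ENGLISH);
 {code}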



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348115#comment-14348115
 ] 

Sean Busbey commented on HADOOP-11103:
--

TestFileTruncate passes locally.

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)
Sean Busbey created HADOOP-11674:


 Summary: data corruption for parallel CryptoInputStream and 
CryptoOutputStream
 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical


A common optimization in the io classes for Input/Output Streams is to save a 
single length-1 byte array to use in single byte read/write calls.

CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
but mistakenly mark the array as static. That means that only a single instance 
of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348315#comment-14348315
 ] 

Yi Liu commented on HADOOP-11674:
-

+1, {{oneByteBuf}} should be non-static; otherwise there may be issues with 
{{read()}} when multiple threads are involved.

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned HADOOP-11673:
-

Assignee: Brahma Reddy Battula

 Use org.junit.Assume to skip tests instead of return
 

 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
Priority: Minor

 We see the following code many times:
 {code:title=TestCodec.java}
 if (!ZlibFactory.isNativeZlibLoaded(conf)) {
   LOG.warn("skipped: native libs not loaded");
   return;
 }
 {code}
 If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
 with a warn log. I'd like to *skip* this test case by using 
 {{org.junit.Assume}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HADOOP-11674:
-
Status: Patch Available  (was: In Progress)

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HADOOP-11672.
---
Resolution: Not a Problem

 test
 

 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HADOOP-11674:
-
Attachment: HADOOP-11674.1.patch

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-11675:
--

 Summary: tiny exception log with checking storedBlock is null or 
not
 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor


Found this log on our production cluster:
{code}
2015-03-05,10:33:31,778 ERROR 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction 
failed 
regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
 storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 
24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
java.io.IOException: 
BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
exist or is not under Constructionnull
{code}

Let's check whether storedBlock is null to make the log message cleaner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
Affects Version/s: 2.4.1
   2.6.0

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
Component/s: util

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Kiran Kumar M R (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348097#comment-14348097
 ] 

Kiran Kumar M R commented on HADOOP-11638:
--

Thanks for the review, Colin. Added a new patch as per the comments.
[~trtrmitya], could you compile on FreeBSD and confirm whether the patch works?

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11648:
---
Attachment: HADOOP-11648-003.txt

trying to re-trigger the QA

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11510) Expose truncate API via FileContext

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11510:

Component/s: fs

 Expose truncate API via FileContext
 ---

 Key: HADOOP-11510
 URL: https://issues.apache.org/jira/browse/HADOOP-11510
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: HADOOP-11510.001.patch, HADOOP-11510.002.patch, 
 HADOOP-11510.003.patch


 We also need to expose truncate API via {{org.apache.hadoop.fs.FileContext}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11648:
---
Attachment: (was: HADOOP-11648-003.txt)

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11675:
---
Attachment: HADOOP-11675-001.txt

A very simple fix, so no test is added.

 tiny exception log with checking storedBlock is null or not
 ---

 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
 Attachments: HADOOP-11675-001.txt


 Found this log on our production cluster:
 {code}
 2015-03-05,10:33:31,778 ERROR 
 org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: 
 Compaction failed 
 regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
  storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 
 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
 java.io.IOException: 
 BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
 exist or is not under Constructionnull
 {code}
 Let's check whether storedBlock is null so that the log message is clearer.
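
A minimal sketch of the idea, with hypothetical class and method names standing in for the real namenode code; the point is simply to branch on null before building the exception message so it never ends in the literal string "null":
{code}
import java.io.IOException;

public class StoredBlockCheckSketch {
  // Hypothetical stand-in for the block object looked up by the namenode.
  static class StoredBlock {
    private final boolean underConstruction;
    StoredBlock(boolean underConstruction) { this.underConstruction = underConstruction; }
    boolean isUnderConstruction() { return underConstruction; }
  }

  static void checkUnderConstruction(String blockId, StoredBlock storedBlock)
      throws IOException {
    // Report "does not exist" and "not under construction" as separate cases,
    // instead of appending a possibly-null object to a single message.
    if (storedBlock == null) {
      throw new IOException(blockId + " does not exist");
    }
    if (!storedBlock.isUnderConstruction()) {
      throw new IOException(blockId + " is not under construction: " + storedBlock);
    }
  }

  public static void main(String[] args) {
    try {
      checkUnderConstruction("blk_1211511211_1100144235504", null);
    } catch (IOException e) {
      System.out.println(e.getMessage());  // prints "... does not exist"
    }
  }
}
{code}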



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348265#comment-14348265
 ] 

Tsuyoshi Ozawa commented on HADOOP-11648:
-

+1, committing this shortly.

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348301#comment-14348301
 ] 

Hudson commented on HADOOP-11648:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7265/])
HADOOP-11648. Set DomainSocketWatcher thread name explicitly. Contributed by 
Liang Xie. (ozawa: rev 74a4754d1c790b8740a4221f276aa571bc5dbfd5)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShmManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-common-project/hadoop-common/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java


 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.7.0

 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

