[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Edward Nevill (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Nevill updated HADOOP-11660:
---
Status: Open  (was: Patch Available)

Patch to be replaced with a version which does pipelining

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
 Attachments: jira-11660.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware CRC for ARM's new 64-bit architecture.
 The patch is completely conditionalized on __aarch64__.
 I have only added support for the non-pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.
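 As a rough illustration of the benchmark described above (the real test_bulk_crc32 test is native C; the class below is a hypothetical Java analogue using java.util.zip.CRC32, which implements the Zlib polynomial), the timing loop looks roughly like this:
 {code}
 import java.util.zip.CRC32;

 public class BulkCrcBenchmarkSketch {
   public static void main(String[] args) {
     final int dataLen = 1048576;       // 1 MB dataset, as in the description
     final int bytesPerChecksum = 512;  // checksum chunk size
     final int iterations = 1000;
     byte[] data = new byte[dataLen];
     new java.util.Random(42).nextBytes(data);

     long start = System.nanoTime();
     for (int iter = 0; iter < iterations; iter++) {
       for (int off = 0; off < dataLen; off += bytesPerChecksum) {
         CRC32 crc = new CRC32();                  // Zlib-polynomial CRC
         crc.update(data, off, bytesPerChecksum);  // checksum one 512-byte chunk
         crc.getValue();
       }
     }
     double seconds = (System.nanoTime() - start) / 1e9;
     System.out.printf("CRC %d bytes @ %d bytes per checksum X %d iterations = %.2f%n",
         dataLen, bytesPerChecksum, iterations, seconds);
   }
 }
 {code}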



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347034#comment-14347034
 ] 

Hudson commented on HADOOP-11183:
-

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: 
rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87)
* hadoop-common-project/hadoop-common/src/main/resources/core-default.xml
* hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
* hadoop-common-project/hadoop-common/CHANGES.txt
* 
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java
* hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md
* 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java


 Memory-based S3AOutputstream
 

 Key: HADOOP-11183
 URL: https://issues.apache.org/jira/browse/HADOOP-11183
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Thomas Demoor
Assignee: Thomas Demoor
 Fix For: 2.7.0

 Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, 
 HADOOP-11183-006.patch, HADOOP-11183-007.patch, HADOOP-11183-008.patch, 
 HADOOP-11183-009.patch, HADOOP-11183-010.patch, HADOOP-11183.001.patch, 
 HADOOP-11183.002.patch, HADOOP-11183.003.patch, design-comments.pdf


 Currently s3a buffers files on disk(s) before uploading. This JIRA 
 investigates adding a memory-based upload implementation.
 The motivation is evidently performance: this would be beneficial for users 
 with high network bandwidth to S3 (EC2?) or users who run Hadoop directly on 
 an S3-compatible object store (FYI: my contributions are made on behalf of 
 Amplidata). 
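 A minimal sketch of the memory-buffered idea (hypothetical class names, not the S3AFastOutputStream implementation from the patch): accumulate writes in memory and hand the buffer to an uploader when the stream is closed. A production version would bound the buffer and use multipart uploads.
 {code}
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.OutputStream;

 /** Illustrative only: buffers data in memory and uploads it on close(). */
 public class MemoryBufferedUploadStream extends OutputStream {
   /** Hypothetical callback standing in for the S3 upload client. */
   public interface Uploader {
     void upload(byte[] data, int length) throws IOException;
   }

   private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
   private final Uploader uploader;

   public MemoryBufferedUploadStream(Uploader uploader) {
     this.uploader = uploader;
   }

   @Override
   public void write(int b) {
     buffer.write(b);            // accumulate in memory instead of a local disk file
   }

   @Override
   public void write(byte[] b, int off, int len) {
     buffer.write(b, off, len);
   }

   @Override
   public void close() throws IOException {
     byte[] data = buffer.toByteArray();
     uploader.upload(data, data.length);  // single upload; real code would use multipart parts
   }
 }
 {code}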



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347032#comment-14347032
 ] 

Hudson commented on HADOOP-6857:


SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/])
Move HADOOP-6857 to 3.0.0. (aajisaka: rev 
29bb6898654199a809f1c3e8e536a63fb0d4f073)
* hadoop-common-project/hadoop-common/CHANGES.txt


 FsShell should report raw disk usage including replication factor
 -

 Key: HADOOP-6857
 URL: https://issues.apache.org/jira/browse/HADOOP-6857
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Alex Kozlov
Assignee: Byron Wong
 Fix For: 3.0.0

 Attachments: HADOOP-6857-revert.patch, HADOOP-6857.patch, 
 HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, 
 show-space-consumed.txt


 Currently FsShell reports HDFS usage with the hadoop fs -dus path command.  
 Since the replication level is set per file, it would be nice to add raw disk 
 usage including the replication factor (maybe hadoop fs -dus -raw path?). 
  This will allow assessing resource usage more accurately.  -- Alex K
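 Per file, the raw usage is the file length multiplied by its replication factor. A hedged sketch of computing both numbers with the public FileSystem API (not the FsShell patch itself; the class name is illustrative):
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.LocatedFileStatus;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.RemoteIterator;

 public class RawUsageSketch {
   public static void main(String[] args) throws Exception {
     FileSystem fs = FileSystem.get(new Configuration());
     long logical = 0, raw = 0;
     // Recursively walk the tree; sum length, and length * replication, per file.
     RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path(args[0]), true);
     while (it.hasNext()) {
       FileStatus status = it.next();
       logical += status.getLen();
       raw += status.getLen() * status.getReplication();
     }
     System.out.println("logical bytes: " + logical + ", raw bytes: " + raw);
   }
 }
 {code}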



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Edward Nevill (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Nevill updated HADOOP-11660:
---
Attachment: (was: jira-11660.patch)

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware CRC for ARM's new 64-bit architecture.
 The patch is completely conditionalized on __aarch64__.
 I have only added support for the non-pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing this in favor of HADOOP-11590, which rewrites these scripts.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Vinayakumar B
 Attachments: HADOOP-11668-01.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited by 
 spaces, so the extra hostnames are treated as commands and the script fails.
 Delimiting with a comma (,) instead of a space before passing the hostnames 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347174#comment-14347174
 ] 

Sean Busbey commented on HADOOP-11656:
--

I don't see how we can do this compatibly. Even defaulting to use the 
application classloader will break some downstream projects. Certainly going 
a step further to make sure we also only expose our API to them, whether via 
an OSGi container or not, will break even more of them.

I can understand the desire to have a compatible version of this in the 2.x 
line. Probably the option to have it off would make the most sense for that. 
However, this kind of isolation is something we _should_ be doing. The reason 
to focus first on a breaking version is so we can have doing things correctly 
staked to some point in the future.

There are plenty of ways we can make the transition easier for downstream 
folks. I've already mentioned giving upgrade docs that include maven pom 
changes needed to get the same set of dependencies. As you mention, we could 
also include some option toggle that says "I want to see the framework 
libraries." I happen to think this is a bad idea because it leads straight back 
to where we are now. In any case, either of these mitigations requires 
downstream projects to change what they are doing, which sounds incompatible to 
me.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11669) Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347107#comment-14347107
 ] 

Hadoop QA commented on HADOOP-11669:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702457/001-HADOOP-11669.patch
  against trunk revision 3560180.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//console

This message is automatically generated.

 Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class
 -

 Key: HADOOP-11669
 URL: https://issues.apache.org/jira/browse/HADOOP-11669
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: nijel
Assignee: nijel
Priority: Minor
 Attachments: 0001-HDFS-7883.patch, 001-HADOOP-11669.patch


 These 2 configuration keys in HttpServer2.java are hadoop configurations.
 {code}
   static final String FILTER_INITIALIZER_PROPERTY
   = "hadoop.http.filter.initializers";
   public static final String HTTP_MAX_THREADS = "hadoop.http.max.threads";
 {code}
 It is better to keep them inside CommonConfigurationKeys.
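 A sketch of what the move could look like (constant and class names below are illustrative, not the names from the attached patch; the key strings stay the same, only their home class changes):
 {code}
 import org.apache.hadoop.conf.Configuration;

 /** Sketch only: where the HttpServer2 keys could live after the move. */
 public final class CommonConfigurationKeysSketch {
   public static final String HADOOP_HTTP_FILTER_INITIALIZERS_KEY =
       "hadoop.http.filter.initializers";
   public static final String HADOOP_HTTP_MAX_THREADS_KEY = "hadoop.http.max.threads";

   public static void main(String[] args) {
     Configuration conf = new Configuration();
     // HttpServer2 would read the values through the shared constants.
     String filters = conf.get(HADOOP_HTTP_FILTER_INITIALIZERS_KEY);
     int maxThreads = conf.getInt(HADOOP_HTTP_MAX_THREADS_KEY, -1);
     System.out.println("filters=" + filters + ", maxThreads=" + maxThreads);
   }
 }
 {code}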



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347188#comment-14347188
 ] 

Sean Busbey commented on HADOOP-11656:
--

{quote}
One troublespot, even with that tactic, is shown by HADOOP-11064: 
UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to NativeCRC32 
method changes. Changes in the internal JNI bindings meant that no hadoop-2.4 
app (like HBase) would run in a Hadoop 2.6-alpha cluster. We were lucky that I 
got to find that before 2.6 shipped, otherwise we'd have a lot of complaints. 
The problem here is that even with HBase isolated on classpath, it was picking 
up the hadoop-native binaries from somewhere on PATH/LIB or whatever, and so 
failing to link.

Classloader isolation & shading isn't going to be sufficient here. HADOOP-11127 
proposes some versioning, which will help, but I don't think it will let us 
load more than one hadoop lib into a JVM. As a result, the only version of 
hadoop-common.jar which can be reliably loaded into a process is the one that 
is in sync with the version of the native library on the target machine.
{quote}

Yes, native library support is an entire additional can of worms. For this 
improvement I'd prefer to leave that to future work, if only because the JVM 
doesn't really offer options. Perhaps docs that cover the limitations of what 
isolation we offer would be a good start.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347217#comment-14347217
 ] 

Jason Lowe commented on HADOOP-11656:
-

bq. There are plenty of ways we can make the transition easier for downstream 
folks. I've already mentioned giving upgrade docs that include maven pom 
changes needed to get the same set of dependencies. As you mention, we could 
also include some option toggle that says "I want to see the framework 
libraries." I happen to think this is a bad idea because it leads straight back 
to where we are now. In any case, either of these mitigations requires 
downstream projects to change what they are doing, which sounds incompatible to 
me.

I think the idea here is to flip the defaults around.  The easiest transition 
for existing downstream folks is to opt in, rather than opt out, of classpath 
isolation.  We can debate whether that's custom classloaders, OSGi packaging, 
or what-not when it's turned on.  But if not turned on by default then it is 
backwards compatible, to the extent that we support backwards compatibility 
today.  Clients/jobs that ran before continue to run on the new version.  Those 
that want/need the isolation can ask for it, and we can iterate the isolation 
feature without necessarily breaking the existing users that aren't asking for 
it because it didn't exist back then and would break their old workflow if it 
suddenly does.  At some point in the future we can (and probably want to) 
switch the defaults so clients/apps get classpath isolation by default.  I 
totally agree that decision necessarily breaks backwards compatibility.

IMHO the smoothest transition for major features, this or otherwise, is to 
develop the feature if possible as opt in, rather than opt out, until it is 
mature, stable, and the community agrees it should be active by default.  Some 
features are such that they inherently cannot be turned off, but if possible 
it'd be great to develop and mature them as options that people can try out 
until they become stable to ease transitions and avoid unnecessary breakage at 
an early stage.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347234#comment-14347234
 ] 

stack commented on HADOOP-11656:


bq. To add, I think we can and should strive for doing this in a compatible 
manner, whatever the approach.

Sure. Sounds good if possible at all as well as being a load of work proving 
changes are indeed compatible.

bq. Marking and calling it incompatible before we see proposal/patch seems 
premature to me.

I'd suggest you open a new issue to do classpath isolation in a 'compatible 
manner' rather than add this imposition here. In this issue, the reporter 
thinks it a breaking change ("At a minimum we'll break dependency compatibility 
and operational compatibility."). The two issues can move along independently of 
each other.

And to be clear, when we talk 'compatible manner', the expectation is that a 
downstream app, for example HBase, should be able to move from hadoop-2.X to 
hadoop-2.Y without breakage, right? That is, in spite of shading, new locations 
for dependencies, cleaned up exposure of libs likely transitively included, 
etc., there will be no need for downstreamers to add in new compensatory code, 
no need of our having to release special versions to work with hadoop-2.Z, and 
no need of callouts in code or for us to educate our community that "if on 
hadoop-2.X do this...but if on hadoop-2.Y do that"? Or are we talking about 
something else? (And "downstreamers, you are doing it wrong" is not allowed.)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347298#comment-14347298
 ] 

Arun C Murthy commented on HADOOP-11656:


Agree 1000% with [~jlowe].

Starting with the thesis that we should break compat is less than ideal - we 
should certainly strive to add features in a compatible manner; this allows all 
existing users to consume the feature without the need to make a *should I use 
this or not* choice.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: HADOOP-11659.patch

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.
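 One possible shape of the single-lookup version (a sketch only, not the attached patch; the placeholder Key and FileSystem types stand in for the real cache types):
 {code}
 import java.util.HashMap;
 import java.util.Map;

 public class SingleLookupRemoveSketch {
   /** Placeholders for the real org.apache.hadoop.fs types in this sketch. */
   static class FileSystem {}
   static class Key {}

   private final Map<Key, FileSystem> map = new HashMap<Key, FileSystem>();

   synchronized void remove(Key key, FileSystem fs) {
     // One hash lookup instead of containsKey + get + remove.
     FileSystem cachedFs = map.remove(key);
     if (cachedFs != null && cachedFs != fs) {
       // A different instance was cached under this key; restore it untouched.
       map.put(key, cachedFs);
     }
   }
 }
 {code}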



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: HADOOP-11653.patch

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Trivial
 Attachments: HADOOP-11653.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Priority: Minor  (was: Trivial)

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11653.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347425#comment-14347425
 ] 

Hadoop QA commented on HADOOP-11618:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702572/HADOOP-11618-002.patch
  against trunk revision 03cc229.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//console

This message is automatically generated.

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Status: Patch Available  (was: Open)

Attached the patch. It does not add testcases, but I executed the affected testcases 
for regression and all are passing.

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347446#comment-14347446
 ] 

Steve Loughran commented on HADOOP-11656:
-

[~saint@gmail.com], as someone downstream, I know you know the situation we 
have now; everyone who goes downstream experiences this, with HBase and Oozie being 
core pain points. Not exposing the transitive dependencies means that you can 
stop worrying about what version of Guava or protobuf is used by Hadoop, 
leaving only our consistent semantics to maintain.

The native lib problem will mean no more than one version of the hadoop JARs 
can be reliably loaded.

Now, unless I'm confused about how classloaders bootstrap, it has to be done in 
an order; classloader above classloader, with OSGi doing some magic at startup 
so the first CL can pick up stuff from external CLs and make them visible to 
others.

Does this mean that adoption of the new CL is a whole new startup process? If 
so, it is going to be visible to everything downstream. Now, we could design 
YARN-679 to be ready for this, so if you adopt that as the launcher for your 
app then you can get the CL setup in there.

But what about every single client app that wants to talk HDFS? We may be able 
to go to HBase & Accumulo & say "new launcher", maybe go to Spark and say "your 
AM needs to do this", but it's harder to say "your general purpose code to read 
off HDFS must now use our CL chain to work". Especially for the use case of a 
webapp running in Tomcat with the classloader isolation of Java EE. 

Things like that aren't going to work if we start imposing a new CL; they will need to 
flip the switch that says "no dependency magic". 

So why is this being proposed as on-by-default? And, since there isn't a 
clear proposal yet, are we trying to define that we should be incompatible 
from the outset?

Please: give us a proposal, let's work towards an implementation, actually test 
this downstream including in an Oozie version (hence Tomcat tests), in-cluster 
apps, and remote client apps. Then we can consider whether or not it would be 
justifiable to say "you must do this to move to Hadoop 3".

Oh, and given the schedules, we should start planning for Java 9 & Jigsaw...



 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11627:
--
Status: Patch Available  (was: In Progress)

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11618:
--
Attachment: HADOOP-11618-002.patch

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11659:
--
Attachment: (was: HADOOP-11653.patch)

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor

 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HADOOP-11627:
--
Attachment: HADOOP-11627-004.patch

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347469#comment-14347469
 ] 

Brahma Reddy Battula commented on HADOOP-11627:
---

Thanks a lot for the review. Please check the updated patch.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347479#comment-14347479
 ] 

Brahma Reddy Battula commented on HADOOP-11627:
---

Ran all the testcases for regression locally; all are passing.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347500#comment-14347500
 ] 

Hadoop QA commented on HADOOP-11627:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702595/HADOOP-11627-004.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5847//console

This message is automatically generated.

 Remove io.native.lib.available from trunk
 -

 Key: HADOOP-11627
 URL: https://issues.apache.org/jira/browse/HADOOP-11627
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, 
 HADOOP-11627-004.patch, HADOOP-11627.patch


 According to the discussion in HADOOP-8642, we should remove 
 {{io.native.lib.available}} from trunk, and always use native libraries if 
 they exist.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned HADOOP-11103:


Assignee: Sean Busbey

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10895) HTTP KerberosAuthenticator fallback should have a flag to disable it

2015-03-04 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347611#comment-14347611
 ] 

Yongjun Zhang commented on HADOOP-10895:


Hi [~tucu00], [~atm], [~zjshen], [~daryn],

This jira originates from the discussion in HADOOP-10771 that you guys participated in. 
I'd like to bring it to your attention, to see if we want to move this one 
forward. Please see my comment at 
https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823

Thanks for your time, and thanks [~vinodkv] for suggesting in the email 
thread that I collect feedback from you guys.


 HTTP KerberosAuthenticator fallback should have a flag to disable it
 

 Key: HADOOP-10895
 URL: https://issues.apache.org/jira/browse/HADOOP-10895
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.1
Reporter: Alejandro Abdelnur
Assignee: Yongjun Zhang
Priority: Blocker
 Attachments: HADOOP-10895.001.patch, HADOOP-10895.002.patch, 
 HADOOP-10895.003.patch, HADOOP-10895.003v1.patch, HADOOP-10895.003v2.patch, 
 HADOOP-10895.003v2improved.patch, HADOOP-10895.004.patch, 
 HADOOP-10895.005.patch, HADOOP-10895.006.patch, HADOOP-10895.007.patch, 
 HADOOP-10895.008.patch, HADOOP-10895.009.patch


 Per review feedback in HADOOP-10771, {{KerberosAuthenticator}} and the 
 delegation token version coming in with HADOOP-10771 should have a flag to 
 disable fallback to pseudo, similarly to the one that was introduced in 
 Hadoop RPC client with HADOOP-9698.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347566#comment-14347566
 ] 

Allen Wittenauer commented on HADOOP-11656:
---

FYI, I'm adding the 'shell' label because regardless of the outcome, this will 
almost certainly have an impact on how the various classpath commands and 
shellprofile.d code works in the future.

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11656:
--
Labels: classloading classpath dependencies shell  (was: classloading 
classpath dependencies)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened with e.g. Guava several times for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347538#comment-14347538
 ] 

Hadoop QA commented on HADOOP-11659:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702588/HADOOP-11659.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//console

This message is automatically generated.

 o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
 

 Key: HADOOP-11659
 URL: https://issues.apache.org/jira/browse/HADOOP-11659
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
Priority: Minor
 Attachments: HADOOP-11659.patch


 The method looks up the same key in the same hash map potentially 3 times
 {code}
 if (map.containsKey(key) && fs == map.get(key)) {
   map.remove(key)
 {code}
 Instead it could do a single lookup
 {code}
 FileSystem cachedFs = map.remove(key);
 {code}
 and then test cachedFs == fs or something else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11613) Remove httpclient dependency from hadoop-azure

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347560#comment-14347560
 ] 

Brahma Reddy Battula commented on HADOOP-11613:
---

 *Testcase failures* are because of {{encodedKey = URLEncoder.encode(key, 
"UTF-8");}}, which has limitations with special characters ("All other 
characters are unsafe and are first converted into one or more bytes using some 
encoding scheme" -- see the following java doc for the same):

https://docs.oracle.com/javase/6/docs/api/java/net/URLEncoder.html

When I replaced it with a bitset (like the following), all the testcases are 
passing. I am always happy to work with a bitset; hence I had given the initial 
patch with a bitset.

{code}
byte[] rawdata = URLCodec.encodeUrl(allowed_abs_path,
    EncodingUtils.getBytes(key, "UTF-8"));
String encodedKey = EncodingUtils.getAsciiString(rawdata);
{code}

[~ajisakaa] If you agree, please consider the initial patch, which uses the 
bitset (with it, all the testcases pass). Please correct me if I am wrong.
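
For context, a small standalone sketch of how the commons-codec bitset approach leaves the listed characters untouched and percent-encodes everything else; the class name and the characters placed in the BitSet are assumptions for the example, not the hadoop-azure patch itself:

{code}
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

import org.apache.commons.codec.net.URLCodec;

// Hedged illustration only: the BitSet marks characters that stay unescaped.
public class BitsetEncodeSketch {
  public static void main(String[] args) {
    BitSet allowed = new BitSet(256);
    for (char c = 'a'; c <= 'z'; c++) allowed.set(c);
    for (char c = 'A'; c <= 'Z'; c++) allowed.set(c);
    for (char c = '0'; c <= '9'; c++) allowed.set(c);
    allowed.set('/');  // keep path separators unescaped

    String key = "dir/file name+1";
    byte[] rawdata = URLCodec.encodeUrl(allowed,
        key.getBytes(StandardCharsets.UTF_8));
    String encodedKey = new String(rawdata, StandardCharsets.US_ASCII);
    System.out.println(encodedKey);  // prints dir/file%20name%2B1
  }
}
{code}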

 Remove httpclient dependency from hadoop-azure
 --

 Key: HADOOP-11613
 URL: https://issues.apache.org/jira/browse/HADOOP-11613
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11613-001.patch, HADOOP-11613-002.patch, 
 HADOOP-11613-003.patch, HADOOP-11613.patch


 Remove httpclient dependency from MockStorageInterface.java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11656:
--
Labels: classloading classpath dependencies scripts shell  (was: 
classloading classpath dependencies shell)

 Classpath isolation for downstream clients
 --

 Key: HADOOP-11656
 URL: https://issues.apache.org/jira/browse/HADOOP-11656
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Sean Busbey
Assignee: Sean Busbey
  Labels: classloading, classpath, dependencies, scripts, shell

 Currently, Hadoop exposes downstream clients to a variety of third party 
 libraries. As our code base grows and matures we increase the set of 
 libraries we rely on. At the same time, as our user base grows we increase 
 the likelihood that some downstream project will run into a conflict while 
 attempting to use a different version of some library we depend on. This has 
 already happened several times with Guava, for example, for HBase, Accumulo, and Spark 
 (and I'm sure others).
 While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to 
 off and they don't do anything to help dependency conflicts on the driver 
 side or for folks talking to HDFS directly. This should serve as an umbrella 
 for changes needed to do things thoroughly on the next major version.
 We should ensure that downstream clients
 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that 
 doesn't pull in any third party dependencies
 2) only see our public API classes (or as close to this as feasible) when 
 executing user provided code, whether client side in a launcher/driver or on 
 the cluster in a container or within MR.
 This provides us with a double benefit: users get less grief when they want 
 to run substantially ahead or behind the versions we need and the project is 
 freer to change our own dependency versions because they'll no longer be in 
 our compatibility promises.
 Project specific task jiras to follow after I get some justifying use cases 
 written in the comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347327#comment-14347327
 ] 

Brahma Reddy Battula commented on HADOOP-11618:
---

Thanks a lot for the review.
{quote}
In both cases, we are going to assert that ftpFs.getUri() results in 
ftp://dummy-host:21
{quote}
It will not return the default port when the URI already has a port; the default 
port is returned only when the port is -1 (i.e., not configured). Please check the 
following code:
{code}
  private URI getUri(URI uri, String supportedScheme,
      boolean authorityNeeded, int defaultPort) throws URISyntaxException {
    checkScheme(uri, supportedScheme);
    // A file system implementation that requires authority must always
    // specify default port
    if (defaultPort < 0 && authorityNeeded) {
      throw new HadoopIllegalArgumentException(
          "FileSystem implementation error - default port " + defaultPort
              + " is not valid");
    }
    String authority = uri.getAuthority();
    if (authority == null) {
      if (authorityNeeded) {
        throw new HadoopIllegalArgumentException("Uri without authority: " + uri);
      } else {
        return new URI(supportedScheme + ":///");
      }
    }
    // authority is non-null - authorityNeeded may be true or false.
    int port = uri.getPort();
    port = (port == -1 ? defaultPort : port);
    if (port == -1) { // no port supplied and default port is not specified
      return new URI(supportedScheme, authority, "/", null);
    }
    return new URI(supportedScheme + "://" + uri.getHost() + ":" + port);
  }
{code}

The 001 patch also calls only this method. Anyway, I have updated the patch; kindly 
review it.
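
As a toy illustration of the port-fallback rule described above (standalone code, not the Hadoop source; the class name and the FTP default port value are assumptions for the example):

{code}
import java.net.URI;

// Sketch: the default port is applied only when the URI carries no explicit
// port (getPort() == -1), mirroring the quoted getUri() logic.
public class DefaultPortSketch {
  static int resolvePort(URI uri, int defaultPort) {
    int port = uri.getPort();
    return port == -1 ? defaultPort : port;
  }

  public static void main(String[] args) {
    int ftpDefaultPort = 21;  // assumed default for the example
    System.out.println(resolvePort(URI.create("ftp://dummy-host:2121"), ftpDefaultPort)); // 2121
    System.out.println(resolvePort(URI.create("ftp://dummy-host"), ftpDefaultPort));      // 21
  }
}
{code}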

 DelegateToFileSystem always uses default FS's default port 
 ---

 Key: HADOOP-11618
 URL: https://issues.apache.org/jira/browse/HADOOP-11618
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.6.0
Reporter: Gera Shegalov
Assignee: Brahma Reddy Battula
 Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, 
 HADOOP-11618.patch


 DelegateToFileSystem constructor has the following code:
 {code}
 super(theUri, supportedScheme, authorityRequired,
 FileSystem.getDefaultUri(conf).getPort());
 {code}
 The default port should be taken from theFsImpl instead.
 {code}
 super(theUri, supportedScheme, authorityRequired,
 theFsImpl.getDefaultPort());
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer reopened HADOOP-11668:
---
  Assignee: Allen Wittenauer  (was: Vinayakumar B)

Re-opening.  The problem here isn't start/stop, it's *-daemons.sh, which are 
now broken.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Attachment: HADOOP-11668-02.patch

-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was twofold:
* We were not preserving quotes around parameters that contained $IFS, due to the 
lack of quoting around the array deletion.
* The deleted array elements were then retained and showed up as empty 
arguments.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347709#comment-14347709
 ] 

Allen Wittenauer edited comment on HADOOP-11668 at 3/4/15 10:40 PM:


-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was twofold:
* We were not preserving quotes around parameters that contained $IFS, due to 
the lack of quoting around the array deletion.
* The deleted array elements were then retained and showed up as empty 
arguments.


was (Author: aw):
-02:
* This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with 
multiple hosts.

The problem was two fold:
* We were preserving quotes around parameters that contained $IFS due to lack 
of quoting around the array deletion
* The then deleted array elements were retained and show up as an empty 
argument.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11668:
--
Status: Patch Available  (was: Reopened)

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11103:
--
Status: Patch Available  (was: Open)

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347783#comment-14347783
 ] 

Hadoop QA commented on HADOOP-11668:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702627/HADOOP-11668-02.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//console

This message is automatically generated.

 start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell 
 option
 ---

 Key: HADOOP-11668
 URL: https://issues.apache.org/jira/browse/HADOOP-11668
 Project: Hadoop Common
  Issue Type: Bug
  Components: scripts
Reporter: Vinayakumar B
Assignee: Allen Wittenauer
 Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch


 After the introduction of the --slaves option for the scripts, start-dfs.sh and 
 stop-dfs.sh no longer work in HA mode.
 This is because multiple hostnames are passed to '--hostnames' delimited with 
 spaces.
 These hostnames are treated as commands and the script fails.
 So, delimiting with a comma (,) instead of a space before passing 
 to hadoop-daemons.sh will solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11670:

Affects Version/s: (was: 2.6.0)
   2.7.0

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.
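
 One possible shape of a fix, sketched here only as an illustration (the class and method names below are assumptions, not the committed patch), is a provider that defers the missing-key failure from construction time to getCredentials(), so the AWSCredentialsProviderChain can fall through to InstanceProfileCredentialsProvider:
 {code}
 import com.amazonaws.AmazonClientException;
 import com.amazonaws.auth.AWSCredentials;
 import com.amazonaws.auth.AWSCredentialsProvider;
 import com.amazonaws.auth.BasicAWSCredentials;

 // Illustrative sketch only. Instead of throwing while the filesystem is being
 // initialized, this provider throws from getCredentials(); the chain catches
 // that and tries the next provider (e.g. InstanceProfileCredentialsProvider).
 class LenientBasicCredentialsProvider implements AWSCredentialsProvider {
   private final String accessKey;
   private final String secretKey;

   LenientBasicCredentialsProvider(String accessKey, String secretKey) {
     this.accessKey = accessKey;  // may be null or empty; that is acceptable here
     this.secretKey = secretKey;
   }

   private static boolean hasText(String s) {
     return s != null && !s.isEmpty();
   }

   @Override
   public AWSCredentials getCredentials() {
     if (hasText(accessKey) && hasText(secretKey)) {
       return new BasicAWSCredentials(accessKey, secretKey);
     }
     // Signals "no credentials here"; the chain moves on to the next provider.
     throw new AmazonClientException("Access key or secret key is unset");
   }

   @Override
   public void refresh() {
     // nothing to refresh for static keys
   }
 }
 {code}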



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347801#comment-14347801
 ] 

Steve Loughran commented on HADOOP-11670:
-

It looks more like HADOOP-10714 was the change that caused this.

 Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
 --

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348010#comment-14348010
 ] 

Colin Patrick McCabe commented on HADOOP-11638:
---

Can you add an {{#else}} clause that has an {{#error}}?  +1 after that is done.

thanks.

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (and maybe on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11460) Deprecate shell vars

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11460:
--
Release Note: 
The following shell environment variables have been deprecated:

| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |

  was:
The following shell environment variables have been deprecated:
| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |


 Deprecate shell vars
 

 Key: HADOOP-11460
 URL: https://issues.apache.org/jira/browse/HADOOP-11460
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: John Smith
  Labels: scripts, shell
 Fix For: 3.0.0

 Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, 
 HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch


 It is a very common shell pattern in 3.x to effectively replace sub-project 
 specific vars with generics.  We should have a function that does this 
 replacement and provides a warning to the end user that the old shell var is 
 deprecated.  Additionally, we should use this shell function to deprecate the 
 shell vars that are holdovers already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348013#comment-14348013
 ] 

Colin Patrick McCabe commented on HADOOP-11660:
---

OK.  Thanks, Edward.

 Add support for hardware crc on ARM aarch64 architecture
 

 Key: HADOOP-11660
 URL: https://issues.apache.org/jira/browse/HADOOP-11660
 Project: Hadoop Common
  Issue Type: Improvement
  Components: native
Affects Versions: 3.0.0
 Environment: ARM aarch64 development platform
Reporter: Edward Nevill
Assignee: Edward Nevill
Priority: Minor
  Labels: performance
   Original Estimate: 48h
  Remaining Estimate: 48h

 This patch adds support for hardware crc for ARM's new 64 bit architecture
 The patch is completely conditionalized on __aarch64__
 I have only added support for the non pipelined version as I benchmarked the 
 pipelined version on aarch64 and it showed no performance improvement.
 The aarch64 version supports both Castagnoli and Zlib CRCs as both of these 
 are supported on ARM aarch64 hardware.
 To benchmark this I modified the test_bulk_crc32 test to print out the time 
 taken to CRC a 1MB dataset 1000 times.
 Before:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55
 After:
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57
 So this represents a 5X performance improvement on raw CRC calculation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348057#comment-14348057
 ] 

Hadoop QA commented on HADOOP-10027:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702397/HADOOP-10027.3.patch
  against trunk revision ded0200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-common-project/hadoop-common:

org.apache.hadoop.io.compress.TestCodec

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//console

This message is automatically generated.

 *Compressor_deflateBytesDirect passes instance instead of jclass to 
 GetStaticObjectField
 

 Key: HADOOP-10027
 URL: https://issues.apache.org/jira/browse/HADOOP-10027
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Reporter: Eric Abbott
Assignee: Hui Zheng
Priority: Minor
 Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, 
 HADOOP-10027.3.patch


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup
 This pattern appears in all the native compressors.
 // Get members of ZlibCompressor
 jobject clazz = (*env)->GetStaticObjectField(env, this,
  ZlibCompressor_clazz);
 The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a 
 jobject. Adding the JVM param -Xcheck:jni will cause a "FATAL ERROR in native 
 method: JNI received a class argument that is not a class" and a core dump 
 such as the following.
 (gdb) 
 #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6
 #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6
 #2 0x7f02e45bd727 in os::abort(bool) () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, 
 bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #5 0x7f02d38eaf79 in 
 Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () 
 from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
 In addition, that clazz object is only used for synchronization. In the case 
 of the native method _deflateBytesDirect, the result is a class wide lock 
 used to access the instance field uncompressed_direct_buf. Perhaps using the 
 instance as the sync point is more appropriate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField

2015-03-04 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348006#comment-14348006
 ] 

Colin Patrick McCabe commented on HADOOP-10027:
---

Not sure what the issue was here.  It looks kind of like a jenkins problem?  
Not sure.
{code}
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ 
hadoop-auth ---
FATAL: hudson.remoting.RequestAbortedException: 
hudson.remoting.Channel$OrderlyShutdown
hudson.remoting.RequestAbortedException: 
hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41)
at 
hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34)
at hudson.remoting.Request.call(Request.java:174)
at hudson.remoting.Channel.call(Channel.java:742)
at 
hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168)
at com.sun.proxy.$Proxy57.join(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956)
{code}

I will retrigger.

 *Compressor_deflateBytesDirect passes instance instead of jclass to 
 GetStaticObjectField
 

 Key: HADOOP-10027
 URL: https://issues.apache.org/jira/browse/HADOOP-10027
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Reporter: Eric Abbott
Assignee: Hui Zheng
Priority: Minor
 Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, 
 HADOOP-10027.3.patch


 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup
 This pattern appears in all the native compressors.
 // Get members of ZlibCompressor
 jobject clazz = (*env)->GetStaticObjectField(env, this,
  ZlibCompressor_clazz);
 The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a 
 jobject. Adding the JVM param -Xcheck:jni will cause a "FATAL ERROR in native 
 method: JNI received a class argument that is not a class" and a core dump 
 such as the following.
 (gdb) 
 #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6
 #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6
 #2 0x7f02e45bd727 in os::abort(bool) () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, 
 bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from 
 /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so
 #5 0x7f02d38eaf79 in 
 Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () 
 from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
 In addition, that clazz object is only used for synchronization. In the case 
 of the native method _deflateBytesDirect, the result is a class wide lock 
 used to access the instance field uncompressed_direct_buf. Perhaps using the 
 instance as the sync point is more appropriate?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Moved] (HADOOP-11671) Asynchronous native RPC v9 client

2015-03-04 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai moved HDFS-7887 to HADOOP-11671:
---

Key: HADOOP-11671  (was: HDFS-7887)
Project: Hadoop Common  (was: Hadoop HDFS)

 Asynchronous native RPC v9 client
 -

 Key: HADOOP-11671
 URL: https://issues.apache.org/jira/browse/HADOOP-11671
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai

 There is more and more integration happening between Hadoop and applications 
 that are implemented in languages other than Java.
 To access Hadoop, applications either have to go through JNI (e.g. libhdfs) 
 or reverse engineer the Hadoop RPC protocol (e.g. snakebite). 
 Unfortunately, neither of them is satisfactory:
 * Integrating with JNI requires running a JVM inside the application. Some 
 applications (e.g., real-time processing, MPP databases) do not want the 
 footprint and GC behavior of the JVM.
 * The Hadoop RPC protocol has a rich feature set, including wire encryption, 
 SASL, and Kerberos authentication. Few 3rd-party implementations fully cover 
 this feature set, so they may only work in limited environments.
 This jira proposes implementing a Hadoop RPC library in C++ that provides a 
 common ground for implementing higher-level native clients for HDFS, YARN, 
 and MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Budde updated HADOOP-11670:

Description: 
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). The change in question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.

  was:
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.


 Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
 --

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),

[jira] [Updated] (HADOOP-9902) Shell script rewrite

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-9902:
-
Release Note: 
The Hadoop shell scripts have been rewritten to fix many long standing bugs and 
include some new features.  While an eye has been kept towards compatibility, 
some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the 
appropriate ${HADOOP_IDENT_STR}.  This should allow, with proper configurations 
in place, for multiple versions of the same secure daemon to run on a host. 
Additionally, pid files are now created when daemons are run in interactive 
mode.  This will also prevent the accidental starting of two daemons with the 
same configuration prior to launching java (i.e., fast fail without having to 
wait for socket opening).
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows 
for all of the environment variables to be in one location.  This was not the 
case previously.
* The default content of *-env.sh has been significantly altered, with the 
majority of defaults moved into more protected areas inside the code. 
Additionally, these files do not auto-append anymore; setting a variable on the 
command line prior to calling a shell command must contain the entire content, 
not just any extra settings.  This brings Hadoop more in-line with the vast 
majority of other software packages.
* All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to 
their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', 
and related commands are executed. Previously, these were separated out which 
meant a significant amount of duplication of common settings.  
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec 
and sbin.  The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been 
removed.  These settings are now configurable in the *-env.sh files via *_OPT. 
* Some formerly 'documented' entries in yarn-env.sh have been undocumented as a 
simple form of deprecation in order to greatly simplify configuration and 
reduce unnecessary duplication.  They still work, but those variables will 
likely be removed in a future release.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented and unused yarn.id.str Java property has been removed.
* The unused yarn.policy.file Java property has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take 
advantage of better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or 
${HADOOP_PREFIX} to find the necessary binaries.  (See other note regarding 
${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be 
ignored and stripped from their respective environment settings.
* cygwin support has been removed.

NEW FEATURES:

* Daemonization has been moved from *-daemon.sh to the bin commands via the 
--daemon option. Simply use --daemon start to start a daemon, --daemon stop to 
stop a daemon, and --daemon status to set $? to the daemon's status.  The 
return code for status is LSB-compatible.  For example, 'hdfs --daemon start 
namenode'.
* It is now possible to override some of the shell code capabilities to provide 
site specific functionality without replacing the shipped versions.  
Replacement functions should go into the new hadoop-user-functions.sh file.
* A new option called --buildpaths will attempt to add developer build 
directories to the classpath to allow for in source tree testing.
* Operations which trigger ssh connections can now use pdsh if installed.  
${HADOOP_SSH_OPTS} still gets applied. 
* Added distch and jnipath subcommands to the hadoop command.
* Shell scripts now support a --debug option which will report basic 
information on the construction of various environment variables, java options, 
classpath, etc. to help in configuration debugging.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring 
symlinking and other such tricks.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided 
hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path, 
without ${HADOOP_PREFIX} being defined, and as the target of bash -x for 
debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined 
based upon the current location of the shell library.  Note that other parts of 
the extended Hadoop ecosystem may still require this environment variable to be 
configured.
* Operations which trigger ssh will now limit the number of 

[jira] [Updated] (HADOOP-11460) Deprecate shell vars

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11460:
--
Release Note: 
The following shell environment variables have been deprecated:
| Old | New |
|: |: |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |

  was:
The following shell environment variables have been deprecated:
|| Old || New ||
|  |  |
| HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE|
| HADOOP_HDFS_NICENESS| HADOOP_NICENESS|
| HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT |
| HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR|
| HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING|
| HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR|
| HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE|
| HADOOP_MAPRED_NICENESS| HADOOP_NICENESS|
| HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR|
| HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_CONF_DIR| HADOOP_CONF_DIR|
| YARN_LOG_DIR| HADOOP_LOG_DIR|
| YARN_LOGFILE| HADOOP_LOGFILE|
| YARN_NICENESS| HADOOP_NICENESS|
| YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT|
| YARN_PID_DIR| HADOOP_PID_DIR|
| YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER|
| YARN_IDENT_STRING| HADOOP_IDENT_STRING|
| YARN_OPTS| HADOOP_OPTS|
| YARN_SLAVES| HADOOP_SLAVES|
| YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH|
| YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST|
| KMS_CONFIG |HADOOP_CONF_DIR|
| KMS_LOG |HADOOP_LOG_DIR |


 Deprecate shell vars
 

 Key: HADOOP-11460
 URL: https://issues.apache.org/jira/browse/HADOOP-11460
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 3.0.0
Reporter: Allen Wittenauer
Assignee: John Smith
  Labels: scripts, shell
 Fix For: 3.0.0

 Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, 
 HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch


 It is a very common shell pattern in 3.x to effectively replace sub-project 
 specific vars with generics.  We should have a function that does this 
 replacement and provides a warning to the end user that the old shell var is 
 deprecated.  Additionally, we should use this shell function to deprecate the 
 shell vars that are holdovers already.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-11103:
--
Status: Open  (was: Patch Available)

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)

2015-03-04 Thread Adam Budde (JIRA)
Adam Budde created HADOOP-11670:
---

 Summary: Fix IAM instance profile auth for s3a (broken in 
HADOOP-11446)
 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). 

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11671) Asynchronous native RPC v9 client

2015-03-04 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348018#comment-14348018
 ] 

Haohui Mai commented on HADOOP-11671:
-

bq. Is this really a good, long term strategy given our use of protobuf now 
that gRPC exists?

The Hadoop RPC library allows more native applications to be integrated with 
Hadoop, which benefits the ecosystem. Once Hadoop has switched to gRPC, we can 
turn this library into a shim over gRPC, or retire it. :-)

 Asynchronous native RPC v9 client
 -

 Key: HADOOP-11671
 URL: https://issues.apache.org/jira/browse/HADOOP-11671
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai

 There is more and more integration happening between Hadoop and applications 
 that are implemented in languages other than Java.
 To access Hadoop, applications either have to go through JNI (e.g. libhdfs) 
 or reverse engineer the Hadoop RPC protocol (e.g. snakebite). 
 Unfortunately, neither of them is satisfactory:
 * Integrating with JNI requires running a JVM inside the application. Some 
 applications (e.g., real-time processing, MPP databases) do not want the 
 footprint and GC behavior of the JVM.
 * The Hadoop RPC protocol has a rich feature set, including wire encryption, 
 SASL, and Kerberos authentication. Few 3rd-party implementations fully cover 
 this feature set, so they may only work in limited environments.
 This jira proposes implementing a Hadoop RPC library in C++ that provides a 
 common ground for implementing higher-level native clients for HDFS, YARN, 
 and MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11670:

Summary: Fix IAM instance profile auth for s3a  (was: Fix IAM instance 
profile auth for s3a (broken in HADOOP-11446))

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
 S3Credentials class to read the value of these two params (this change is 
 unrelated to resolving HADOOP-11446). The change in question is presented 
 below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347901#comment-14347901
 ] 

Hadoop QA commented on HADOOP-11103:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12669533/HADOOP-11103.1.patch
  against trunk revision ed70fa1.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//console

This message is automatically generated.

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null
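 For illustration only (not part of the attached patch), a small sketch of two of
 the behaviors listed above: getClassName() returns the class name of the wrapped
 remote exception, and the two-argument constructor behaves like the three-argument
 one with a null error code.
 {code}
 // getClassName() is the class name of the wrapped remote exception.
 RemoteException re =
     new RemoteException(java.io.FileNotFoundException.class.getName(), "no such file");
 assert "java.io.FileNotFoundException".equals(re.getClassName());
 // Per the report above, this two-argument call behaves the same as
 // new RemoteException(FileNotFoundException.class.getName(), "no such file", null).
 {code}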



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-11673:
--

 Summary: Use org.junit.Assume to skip tests instead of return
 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Priority: Minor


We see the following code many times:
{code:title=TestCodec.java}
if (!ZlibFactory.isNativeZlibLoaded(conf)) {
  LOG.warn("skipped: native libs not loaded");
  return;
}
{code}
If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
with a warn log. I'd like to *skip* this test case by using 
{{org.junit.Assume}}.
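 A sketch of the proposed pattern, assuming a JUnit 4 test and a hypothetical test
 name: when the assumption fails, the test is reported as skipped instead of
 silently passing.
 {code}
 import static org.junit.Assume.assumeTrue;
 
 @Test
 public void testGzipCodecWithNativeZlib() throws Exception {
   // Skips (rather than passes) the test when native zlib is not loaded.
   assumeTrue(ZlibFactory.isNativeZlibLoaded(conf));
   // ... rest of the test body ...
 }
 {code}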



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Adam Budde (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Budde updated HADOOP-11670:

Description: 
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
S3Credentials class to read the value of these two params. The change in 
question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.

  was:
One big advantage provided by the s3a filesystem is the ability to use an IAM 
instance profile in order to authenticate when attempting to access an S3 
bucket from an EC2 instance. This eliminates the need to deploy AWS account 
credentials to the instance or to provide them to Hadoop via the 
fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.

The patch submitted to resolve HADOOP-11446 breaks this behavior by using the 
S3Credentials class to read the value of these two params (this change is 
unrelated to resolving HADOOP-11446). The change in question is presented below:

S3AFileSystem.java, lines 161-170:
{code}
// Try to get our credentials or just connect anonymously
S3Credentials s3Credentials = new S3Credentials();
s3Credentials.initialize(name, conf);

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
s3Credentials.getSecretAccessKey()),
new InstanceProfileCredentialsProvider(),
new AnonymousAWSCredentialsProvider()
);
{code}

As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
S3Credentials class are now used to provide constructor arguments to 
BasicAWSCredentialsProvider. These methods will raise an exception if the 
fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
respectively. If a user is relying on an IAM instance profile to authenticate 
to an S3 bucket and therefore doesn't supply values for these params, they will 
receive an exception and won't be able to access the bucket.


 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
 S3Credentials class to read the value of these two params. The change in 
 question is presented below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() 

[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a

2015-03-04 Thread Adam Budde (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347806#comment-14347806
 ] 

Adam Budde commented on HADOOP-11670:
-

My mistake-- looks like you're correct. I've updated the description.

 Fix IAM instance profile auth for s3a
 -

 Key: HADOOP-11670
 URL: https://issues.apache.org/jira/browse/HADOOP-11670
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Adam Budde
 Fix For: 2.7.0


 One big advantage provided by the s3a filesystem is the ability to use an IAM 
 instance profile in order to authenticate when attempting to access an S3 
 bucket from an EC2 instance. This eliminates the need to deploy AWS account 
 credentials to the instance or to provide them to Hadoop via the 
 fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params.
 The patch submitted to resolve HADOOP-10714 breaks this behavior by using the 
 S3Credentials class to read the value of these two params. The change in 
 question is presented below:
 S3AFileSystem.java, lines 161-170:
 {code}
 // Try to get our credentials or just connect anonymously
 S3Credentials s3Credentials = new S3Credentials();
 s3Credentials.initialize(name, conf);
 AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
 new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(),
 s3Credentials.getSecretAccessKey()),
 new InstanceProfileCredentialsProvider(),
 new AnonymousAWSCredentialsProvider()
 );
 {code}
 As you can see, the getAccessKey() and getSecretAccessKey() methods from the 
 S3Credentials class are now used to provide constructor arguments to 
 BasicAWSCredentialsProvider. These methods will raise an exception if the 
 fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, 
 respectively. If a user is relying on an IAM instance profile to authenticate 
 to an S3 bucket and therefore doesn't supply values for these params, they 
 will receive an exception and won't be able to access the bucket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11672) test

2015-03-04 Thread xiangqian.xu (JIRA)
xiangqian.xu created HADOOP-11672:
-

 Summary: test
 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348165#comment-14348165
 ] 

Brahma Reddy Battula commented on HADOOP-11672:
---

FYI, please go through the following link on how to contribute:

http://wiki.apache.org/hadoop/HowToContribute

 test
 

 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
 Environment: PowerPC Big Endian & other Big Endian platforms
Target Version/s: 2.7.0

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
  Environment: PowerPC Big Endian & other Big Endian platforms
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

Summary: Set DomainSocketWatcher thread name explicitly  (was: set 
DomainSocketWatcher thread name explicitly)

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.
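 For illustration only, a minimal sketch of the idea (not the exact patch); the
 watcherRunnable and src names are hypothetical.
 {code}
 // Give the watcher thread a descriptive name so jstack output and logs
 // identify it immediately instead of showing a generic Thread-NNN.
 Thread watcherThread = new Thread(watcherRunnable);
 watcherThread.setName("DomainSocketWatcher-" + src);
 watcherThread.setDaemon(true);
 watcherThread.start();
 {code}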



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

   Resolution: Fixed
Fix Version/s: 2.7.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk and branch-2. Thanks Liang for your contribution and 
thanks Colin Patrick McCabe for your review.

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.7.0

 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Kiran Kumar M R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Kumar M R updated HADOOP-11638:
-
Attachment: HADOOP-11638-002.patch

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-9902) Shell script rewrite

2015-03-04 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HADOOP-9902:
-
Release Note: 
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and 
include some new features.  While an eye has been kept towards compatibility, 
some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the 
appropriate ${HADOOP_IDENT_STR}.  This should allow, with proper configurations 
in place, for multiple versions of the same secure daemon to run on a host. 
Additionally, pid files are now created when daemons are run in interactive 
mode.  This will also prevent the accidental starting of two daemons with the 
same configuration prior to launching java (i.e., fast fail without having to 
wait for socket opening).
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows 
for all of the environment variables to be in one location.  This was not the 
case previously.
* The default content of *-env.sh has been significantly altered, with the 
majority of defaults moved into more protected areas inside the code. 
Additionally, these files do not auto-append anymore; setting a variable on the 
command line prior to calling a shell command must contain the entire content, 
not just any extra settings.  This brings Hadoop more in-line with the vast 
majority of other software packages.
* All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to 
their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', 
and related commands are executed. Previously, these were separated out which 
meant a significant amount of duplication of common settings.  
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec 
and sbin.  The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been 
removed.  These settings are now configurable in the *-env.sh files via *_OPT. 
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented and unused yarn.id.str Java property has been removed.
* The unused yarn.policy.file Java property has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take 
advantage of better regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or 
${HADOOP_PREFIX} to find the necessary binaries.  (See other note regarding 
${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be 
ignored and stripped from their respective environment settings.

NEW FEATURES:

* Daemonization has been moved from *-daemon.sh to the bin commands via the 
--daemon option. Simply use --daemon start to start a daemon, --daemon stop to 
stop a daemon, and --daemon status to set $? to the daemon's status.  The 
return code for status is LSB-compatible.  For example, 'hdfs --daemon start 
namenode'.
* It is now possible to override some of the shell code capabilities to provide 
site specific functionality without replacing the shipped versions.  
Replacement functions should go into the new hadoop-user-functions.sh file.
* A new option called --buildpaths will attempt to add developer build 
directories to the classpath to allow for in source tree testing.
* Operations which trigger ssh connections can now use pdsh if installed.  
${HADOOP_SSH_OPTS} still gets applied. 
* Added distch and jnipath subcommands to the hadoop command.
* Shell scripts now support a --debug option which will report basic 
information on the construction of various environment variables, java options, 
classpath, etc. to help in configuration debugging.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring 
symlinking and other such tricks.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided 
hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path, 
without ${HADOOP_PREFIX} being defined, and as the target of bash -x for 
debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined 
based upon the current location of the shell library.  Note that other parts of 
the extended Hadoop ecosystem may still require this environment variable to be 
configured.
* Operations which trigger ssh will now limit the number of connections to run 
in parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion. 
 By default, this is set to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Some subcommands were not listed in the usage.
* Various options on hadoop command lines were supported 

[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11675:
---
Status: Patch Available  (was: Open)

 tiny exception log with checking storedBlock is null or not
 ---

 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
 Attachments: HADOOP-11675-001.txt


 Found this log on our production cluster:
 {code}
 2015-03-05,10:33:31,778 ERROR 
 org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: 
 Compaction failed 
 regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
  storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 
 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
 java.io.IOException: 
 BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
 exist or is not under Constructionnull
 {code}
 Let's check whether storedBlock is null to make the log message cleaner.
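 For illustration only, a sketch of the kind of guard this issue proposes; the
 blockId and isUnderConstruction names are hypothetical stand-ins for the real code.
 {code}
 // Distinguish a missing block from a wrong-state block so the message no longer
 // ends with the confusing "...is not under Constructionnull".
 if (storedBlock == null) {
   throw new IOException(blockId + " does not exist");
 }
 if (!isUnderConstruction(storedBlock)) {
   throw new IOException(blockId + " is not under construction: " + storedBlock);
 }
 {code}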



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-11674 started by Sean Busbey.

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical

 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.
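 For illustration only, a minimal sketch of the problem and the proposed change:
 {code}
 // Before (problematic): a single buffer shared by every stream instance in the JVM.
 //   private static byte[] oneByteBuf = new byte[1];
 // Sketch of the fix: each stream instance owns its single-byte buffer
 // for the read()/write() fast path.
 private byte[] oneByteBuf = new byte[1];
 {code}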



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11589) NetUtils.createSocketAddr should trim the input URI

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11589:

Component/s: net

 NetUtils.createSocketAddr should trim the input URI
 ---

 Key: HADOOP-11589
 URL: https://issues.apache.org/jira/browse/HADOOP-11589
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Akira AJISAKA
Assignee: Rakesh R
Priority: Minor
  Labels: newbie
 Fix For: 2.7.0

 Attachments: HADOOP-11589-1.patch, HADOOP-11589-2.patch


 NetUtils.createSocketAddr does not trim the input URI; it should.
 HDFS-7684 and HADOOP-9869 trim some of the URIs passed to the 
 method, but not all of the inputs are trimmed yet.
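 A minimal sketch of the intended behavior, assuming the fix trims inside the method
 itself (illustrative, not the committed patch):
 {code}
 public static InetSocketAddress createSocketAddr(String target, int defaultPort) {
   if (target != null) {
     // " host:8020 " copied from an XML config becomes "host:8020"
     target = target.trim();
   }
   return createSocketAddr(target, defaultPort, null);
 }
 {code}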



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348197#comment-14348197
 ] 

Li Bo commented on HADOOP-11643:


hi, Kai
I think the code is ok in general.
One point:
When a {{NumberFormatException}} is caught, an {{IllegalArgumentException}} with the 
message {{No codec option is provided}} is thrown. In that case the codec option is 
actually provided, just not in a valid integer format, so how about changing the 
message to something like "Option XXX must be an integer, please provide it in the 
correct format".
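For illustration only, a sketch of the suggested wording; the optionName and
optionValue variables are hypothetical.
{code}
try {
  int value = Integer.parseInt(optionValue);
} catch (NumberFormatException e) {
  throw new IllegalArgumentException(
      "Option " + optionName + " must be an integer, but was: " + optionValue, e);
}
{code}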

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643_v1.patch, HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348217#comment-14348217
 ] 

Hadoop QA commented on HADOOP-11648:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702060/HADOOP-11648-003.txt
  against trunk revision ded0200.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestFileTruncate

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//console

This message is automatically generated.

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10846) DataChecksum#calculateChunkedSums not working for PPC when buffers not backed by array

2015-03-04 Thread Ayappan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348236#comment-14348236
 ] 

Ayappan commented on HADOOP-10846:
--

A new JIRA (HADOOP-11665) has been opened to fix this issue in a more 
standard way.

 DataChecksum#calculateChunkedSums not working for PPC when buffers not backed 
 by array
 --

 Key: HADOOP-10846
 URL: https://issues.apache.org/jira/browse/HADOOP-10846
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.5.2
 Environment: PowerPC platform
Reporter: Jinghui Wang
Assignee: Ayappan
 Attachments: HADOOP-10846-v1.patch, HADOOP-10846-v2.patch, 
 HADOOP-10846-v3.patch, HADOOP-10846-v4.patch, HADOOP-10846.patch


 Got the following exception when running Hadoop on Power PC. The checksum 
 computation is broken when the data buffer and checksum 
 buffer are not backed by arrays.
 13/09/16 04:06:57 ERROR security.UserGroupInformation: 
 PriviledgedActionException as:biadmin (auth:SIMPLE) 
 cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): 
 org.apache.hadoop.fs.ChecksumException: Checksum error



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348260#comment-14348260
 ] 

Li Bo commented on HADOOP-11643:


Patch v3 reviewed.

+1

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-11643:
---
Fix Version/s: HDFS-7285

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng resolved HADOOP-11643.

  Resolution: Fixed
Target Version/s: HDFS-7285
Hadoop Flags: Reviewed

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Fix For: HDFS-7285

 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348279#comment-14348279
 ] 

Kai Zheng commented on HADOOP-11643:


Thanks [~libo-intel]. I committed this to branch HDFS-7285.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi Ozawa updated HADOOP-11648:

Target Version/s: 2.7.0
Hadoop Flags: Reviewed

 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348152#comment-14348152
 ] 

Hadoop QA commented on HADOOP-11638:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12702696/HADOOP-11638-002.patch
  against trunk revision 8d88691.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//console

This message is automatically generated.

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348229#comment-14348229
 ] 

Hadoop QA commented on HADOOP-11674:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12702711/HADOOP-11674.1.patch
  against trunk revision 8d88691.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//console

This message is automatically generated.

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348250#comment-14348250
 ] 

Kai Zheng commented on HADOOP-11643:


Thanks [~libo-intel] for your review and the good catch. It's updated; would 
you review again? Thanks.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec

2015-03-04 Thread Kai Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Zheng updated HADOOP-11643:
---
Attachment: HADOOP-11643-v3.patch

Change summary:
1. Fixed the issue found by Bo.
2. Added a test.
3. Overrode toString() to support dumping.

 Define EC schema API for ErasureCodec
 -

 Key: HADOOP-11643
 URL: https://issues.apache.org/jira/browse/HADOOP-11643
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: io
Reporter: Kai Zheng
Assignee: Kai Zheng
 Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, 
 HADOOP-11643_v2.patch


 As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API 
 will be first defined here for better sync among related issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

Summary: oneByteBuf in CryptoInputStream and CryptoOutputStream should be 
non static  (was: data corruption for parallel CryptoInputStream and 
CryptoOutputStream)

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yi Liu updated HADOOP-11674:

  Resolution: Fixed
   Fix Version/s: 2.7.0
Target Version/s: 2.7.0  (was: 3.0.0, 2.7.0, 2.6.1)
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk and branch-2. Thanks [~busbey] for the contribution.

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348325#comment-14348325
 ] 

Hudson commented on HADOOP-11674:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7266 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7266/])
HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream should be 
non static. (Sean Busbey via yliu) (yliu: rev 
5e9b8144d54f586803212a0bdd8b1c25bdbb1e97)
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoInputStream.java
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoOutputStream.java
* hadoop-common-project/hadoop-common/CHANGES.txt


 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Fix For: 2.7.0

 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11602) Fix toUpperCase/toLowerCase to use Locale.ENGLISH

2015-03-04 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14347927#comment-14347927
 ] 

Akira AJISAKA commented on HADOOP-11602:


Thanks [~ozawa]! Looks good to me but the patch needs rebasing.

 Fix toUpperCase/toLowerCase to use Locale.ENGLISH
 -

 Key: HADOOP-11602
 URL: https://issues.apache.org/jira/browse/HADOOP-11602
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
 Attachments: HADOOP-11602-001.patch, HADOOP-11602-002.patch, 
 HADOOP-11602-003.patch, HADOOP-11602-004.patch, 
 HADOOP-11602-branch-2.001.patch, HADOOP-11602-branch-2.002.patch, 
 HADOOP-11602-branch-2.003.patch


 String#toLowerCase()/toUpperCase() without a locale argument can occur 
 unexpected behavior based on the locale. It's written in 
 [Javadoc|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#toLowerCase()]:
 {quote}
 For instance, TITLE.toLowerCase() in a Turkish locale returns t\u0131tle, 
 where '\u0131' is the LATIN SMALL LETTER DOTLESS I character
 {quote}
 This issue is derived from HADOOP-10101.
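 For illustration only, a small sketch of the pitfall and the locale-pinned call this
 issue proposes:
 {code}
 String title = "TITLE";
 // With a Turkish default locale this yields "t\u0131tle" (dotless i).
 String localeDependent = title.toLowerCase();
 // Pinning the locale gives "title" regardless of the JVM's default locale.
 String stable = title.toLowerCase(Locale.ENGLISH);
 {code}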



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11103) Clean up RemoteException

2015-03-04 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348115#comment-14348115
 ] 

Sean Busbey commented on HADOOP-11103:
--

TestFileTruncate passes locally.

 Clean up RemoteException
 

 Key: HADOOP-11103
 URL: https://issues.apache.org/jira/browse/HADOOP-11103
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Trivial
 Attachments: HADOOP-11103.1.patch


 RemoteException has a number of undocumented behaviors
 * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the 
 source, the String returned is the classname of the wrapped remote exception.
 * RemoteException(String, String) is equivalent to calling 
 RemoteException(String, String, null)
 * Constructors allow null for all arguments
 * Some of the test code doesn't check for correct error codes to correspond 
 with the wrapped exception type
 * methods don't document when they might return null



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)
Sean Busbey created HADOOP-11674:


 Summary: data corruption for parallel CryptoInputStream and 
CryptoOutputStream
 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical


A common optimization in the io classes for Input/Output Streams is to save a 
single length-1 byte array to use in single byte read/write calls.

CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
but mistakenly mark the array as static. That means that only a single instance 
of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static

2015-03-04 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348315#comment-14348315
 ] 

Yi Liu commented on HADOOP-11674:
-

+1, {{oneByteBuf}} should be non-static; otherwise there may be issues with 
{{read()}} when multiple threads are involved.

 oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
 ---

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reassigned HADOOP-11673:
-

Assignee: Brahma Reddy Battula

 Use org.junit.Assume to skip tests instead of return
 

 Key: HADOOP-11673
 URL: https://issues.apache.org/jira/browse/HADOOP-11673
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Akira AJISAKA
Assignee: Brahma Reddy Battula
Priority: Minor

 We see the following code many times:
 {code:title=TestCodec.java}
 if (!ZlibFactory.isNativeZlibLoaded(conf)) {
   LOG.warn("skipped: native libs not loaded");
   return;
 }
 {code}
 If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, 
 with a warn log. I'd like to *skip* this test case by using 
 {{org.junit.Assume}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HADOOP-11674:
-
Status: Patch Available  (was: In Progress)

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11672) test

2015-03-04 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula resolved HADOOP-11672.
---
Resolution: Not a Problem

 test
 

 Key: HADOOP-11672
 URL: https://issues.apache.org/jira/browse/HADOOP-11672
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: xiangqian.xu





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream

2015-03-04 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HADOOP-11674:
-
Attachment: HADOOP-11674.1.patch

 data corruption for parallel CryptoInputStream and CryptoOutputStream
 -

 Key: HADOOP-11674
 URL: https://issues.apache.org/jira/browse/HADOOP-11674
 Project: Hadoop Common
  Issue Type: Bug
  Components: io
Affects Versions: 2.6.0
Reporter: Sean Busbey
Assignee: Sean Busbey
Priority: Critical
 Attachments: HADOOP-11674.1.patch


 A common optimization in the io classes for Input/Output Streams is to save a 
 single length-1 byte array to use in single byte read/write calls.
 CryptoInputStream and CryptoOutputStream both attempt to follow this practice 
 but mistakenly mark the array as static. That means that only a single 
 instance of each can be present in a JVM safely.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)
Liang Xie created HADOOP-11675:
--

 Summary: tiny exception log with checking storedBlock is null or 
not
 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor


Found this log on our production cluster:
{code}
2015-03-05,10:33:31,778 ERROR 
org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction 
failed 
regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
 storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 
24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
java.io.IOException: 
BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
exist or is not under Constructionnull
{code}

Let's check whether storedBlock is null to make the log message cleaner.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
Affects Version/s: 2.4.1
   2.6.0

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code

2015-03-04 Thread Ayappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayappan updated HADOOP-11665:
-
Component/s: util

 Provide and unify cross platform byteorder support in native code
 -

 Key: HADOOP-11665
 URL: https://issues.apache.org/jira/browse/HADOOP-11665
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.4.1, 2.6.0
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: HADOOP-11665.001.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c

2015-03-04 Thread Kiran Kumar M R (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348097#comment-14348097
 ] 

Kiran Kumar M R commented on HADOOP-11638:
--

Thanks for the review, Colin. Added a new patch as per the comments.
[~trtrmitya], could you compile on FreeBSD and confirm whether the patch works?

 Linux-specific gettid() used in OpensslSecureRandom.c
 -

 Key: HADOOP-11638
 URL: https://issues.apache.org/jira/browse/HADOOP-11638
 Project: Hadoop Common
  Issue Type: Bug
  Components: native
Affects Versions: 2.6.0
Reporter: Dmitry Sivachenko
Assignee: Kiran Kumar M R
  Labels: freebsd
 Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch


 In OpensslSecureRandom.c you use Linux-specific syscall gettid():
 static unsigned long pthreads_thread_id(void)
 {
 return (unsigned long)syscall(SYS_gettid);
 }
 Man page says:
 gettid()  is Linux-specific and should not be used in programs that are
 intended to be portable.
 This breaks hadoop-2.6.0 compilation on FreeBSD (may be on other OSes too).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11648:
---
Attachment: HADOOP-11648-003.txt

trying to re-trigger the QA

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604 I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; e.g. in our cluster the names look like 
 "Thread-25", "Thread-303670", or something else. Here "Thread-25" seems to come from 
 Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be 
 found. I think it would be better to set the thread name, so we can debug such 
 issues more easily in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11510) Expose truncate API via FileContext

2015-03-04 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-11510:

Component/s: fs

 Expose truncate API via FileContext
 ---

 Key: HADOOP-11510
 URL: https://issues.apache.org/jira/browse/HADOOP-11510
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: 2.7.0

 Attachments: HADOOP-11510.001.patch, HADOOP-11510.002.patch, 
 HADOOP-11510.003.patch


 We also need to expose truncate API via {{org.apache.hadoop.fs.FileContext}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11648:
---
Attachment: (was: HADOOP-11648-003.txt)

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not

2015-03-04 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HADOOP-11675:
---
Attachment: HADOOP-11675-001.txt

A very simple fix, so no test is added.

 tiny exception log with checking storedBlock is null or not
 ---

 Key: HADOOP-11675
 URL: https://issues.apache.org/jira/browse/HADOOP-11675
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
 Attachments: HADOOP-11675-001.txt


 Found this log on our production cluster:
 {code}
 2015-03-05,10:33:31,778 ERROR 
 org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: 
 Compaction failed 
 regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f.,
  storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 
 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479
 java.io.IOException: 
 BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not 
 exist or is not under Constructionnull
 {code}
 Let's check whether storedBlock is null so that the log message is clearer.
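
A minimal sketch of the idea, with hypothetical class and method names standing in for the real namenode code; the point is simply to branch on null before building the exception message so it never ends in the literal string "null":
{code}
import java.io.IOException;

public class StoredBlockCheckSketch {
  // Hypothetical stand-in for the block object looked up by the namenode.
  static class StoredBlock {
    private final boolean underConstruction;
    StoredBlock(boolean underConstruction) { this.underConstruction = underConstruction; }
    boolean isUnderConstruction() { return underConstruction; }
  }

  static void checkUnderConstruction(String blockId, StoredBlock storedBlock)
      throws IOException {
    // Report "does not exist" and "not under construction" as separate cases,
    // instead of appending a possibly-null object to a single message.
    if (storedBlock == null) {
      throw new IOException(blockId + " does not exist");
    }
    if (!storedBlock.isUnderConstruction()) {
      throw new IOException(blockId + " is not under construction: " + storedBlock);
    }
  }

  public static void main(String[] args) {
    try {
      checkUnderConstruction("blk_1211511211_1100144235504", null);
    } catch (IOException e) {
      System.out.println(e.getMessage());  // prints "... does not exist"
    }
  }
}
{code}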



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348265#comment-14348265
 ] 

Tsuyoshi Ozawa commented on HADOOP-11648:
-

+1, committing this shortly.

 set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly

2015-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348301#comment-14348301
 ] 

Hudson commented on HADOOP-11648:
-

FAILURE: Integrated in Hadoop-trunk-Commit #7265 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/7265/])
HADOOP-11648. Set DomainSocketWatcher thread name explicitly. Contributed by 
Liang Xie. (ozawa: rev 74a4754d1c790b8740a4221f276aa571bc5dbfd5)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java
* 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShmManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* hadoop-common-project/hadoop-common/CHANGES.txt
* 
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java


 Set DomainSocketWatcher thread name explicitly
 --

 Key: HADOOP-11648
 URL: https://issues.apache.org/jira/browse/HADOOP-11648
 Project: Hadoop Common
  Issue Type: Improvement
  Components: net
Affects Versions: 2.6.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 2.7.0

 Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, 
 HADOOP-11648-003.txt


 While working on HADOOP-11604, I noticed that the current DomainSocketWatcher thread 
 name is not set explicitly; in our cluster the names look like Thread-25, 
 Thread-303670 or something similar. Thread-25 appears to come from 
 Datanode.initDataXceiver, and once that thread dies, the Xceiver leak shows up. 
 It would be better to set the thread name explicitly so that such issues are 
 easier to debug in future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

