[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348325#comment-14348325 ] Hudson commented on HADOOP-11674: - FAILURE: Integrated in Hadoop-trunk-Commit #7266 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7266/]) HADOOP-11674. oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static. (Sean Busbey via yliu) (yliu: rev 5e9b8144d54f586803212a0bdd8b1c25bdbb1e97) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoInputStream.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoOutputStream.java * hadoop-common-project/hadoop-common/CHANGES.txt > oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static > --- > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 2.7.0 > > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
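For context, the single-byte-buffer optimization described above looks roughly like the sketch below. This is an illustrative reconstruction, not the actual Hadoop source; only the field name {{oneByteBuf}} is taken from the issue.

{code:java}
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch of the optimization the issue describes: a reusable
// one-byte buffer backing the single-byte read() call. The buffer must be a
// per-instance field; marking it static would share one buffer across every
// stream in the JVM and corrupt data under concurrent use.
public abstract class OneByteBufferedStream extends InputStream {
  // Correct: one buffer per stream instance (non-static).
  private final byte[] oneByteBuf = new byte[1];

  @Override
  public int read() throws IOException {
    int n = read(oneByteBuf, 0, 1);                  // delegate to bulk read
    return (n == -1) ? -1 : (oneByteBuf[0] & 0xff);  // mask to 0..255
  }

  @Override
  public abstract int read(byte[] b, int off, int len) throws IOException;
}
{code}

The same reasoning applies on the write side: a static buffer would let concurrent write(int) calls on different streams overwrite each other's byte before encryption.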
[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HADOOP-11674: Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 (was: 3.0.0, 2.7.0, 2.6.1) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2. Thanks [~busbey] for the contribution. > oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static > --- > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Fix For: 2.7.0 > > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348315#comment-14348315 ] Yi Liu commented on HADOOP-11674: - +1, {{oneByteBuf}} should be non-static; otherwise there may be issues with {{read()}} from multiple threads. > oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static > --- > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11674) oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HADOOP-11674: Summary: oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static (was: data corruption for parallel CryptoInputStream and CryptoOutputStream) > oneByteBuf in CryptoInputStream and CryptoOutputStream should be non static > --- > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348301#comment-14348301 ] Hudson commented on HADOOP-11648: - FAILURE: Integrated in Hadoop-trunk-Commit #7265 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7265/]) HADOOP-11648. Set DomainSocketWatcher thread name explicitly. Contributed by Liang Xie. (ozawa: rev 74a4754d1c790b8740a4221f276aa571bc5dbfd5) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/ShortCircuitRegistry.java * hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/net/unix/TestDomainSocketWatcher.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShmManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/net/unix/DomainSocketWatcher.java > Set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Fix For: 2.7.0 > > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
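For readers unfamiliar with the change, naming a thread explicitly is a one-liner in Java; the sketch below shows the general idiom (the name format is an assumption, not the actual patch).

{code:java}
public class WatcherThreadNaming {
  public static void main(String[] args) {
    Thread watcher = new Thread(() -> {
      // ... the watch loop would go here ...
    });
    // An explicit name makes the thread identifiable in jstack output,
    // instead of the default auto-generated "Thread-N".
    watcher.setName("DomainSocketWatcher-" + watcher.getId());
    watcher.setDaemon(true);
    watcher.start();
  }
}
{code}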
[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated HADOOP-11648: Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Committed this to trunk and branch-2. Thanks Liang for your contribution and thanks Colin Patrick McCabe for your review. > Set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Fix For: 2.7.0 > > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated HADOOP-11648: Target Version/s: 2.7.0 Hadoop Flags: Reviewed > Set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HADOOP-11643: --- Fix Version/s: HDFS-7285 > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: HDFS-7285 > > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng resolved HADOOP-11643. Resolution: Fixed Target Version/s: HDFS-7285 Hadoop Flags: Reviewed > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: HDFS-7285 > > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348279#comment-14348279 ] Kai Zheng commented on HADOOP-11643: Thanks [~libo-intel]. I committed this to branch HDFS-7285. > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11648) Set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi Ozawa updated HADOOP-11648: Summary: Set DomainSocketWatcher thread name explicitly (was: set DomainSocketWatcher thread name explicitly) > Set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348265#comment-14348265 ] Tsuyoshi Ozawa commented on HADOOP-11648: - +1, committing this shortly. > set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348260#comment-14348260 ] Li Bo commented on HADOOP-11643: Patch v3 reviewed. +1 > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348250#comment-14348250 ] Kai Zheng commented on HADOOP-11643: Thanks [~libo-intel] for your review and the good catch. It's updated; would you review again? Thanks. > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HADOOP-11643: --- Attachment: HADOOP-11643-v3.patch Change summary: 1. Fixed the issue found by Bo. 2. Added a test. 3. Overrode the toString() method to support dumping. > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HADOOP-11643-v3.patch, HADOOP-11643_v1.patch, > HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10846) DataChecksum#calculateChunkedSums not working for PPC when buffers not backed by array
[ https://issues.apache.org/jira/browse/HADOOP-10846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348236#comment-14348236 ] Ayappan commented on HADOOP-10846: -- A new JIRA (HADOOP-11665) has been opened to fix this issue in a more standard way. > DataChecksum#calculateChunkedSums not working for PPC when buffers not backed > by array > -- > > Key: HADOOP-10846 > URL: https://issues.apache.org/jira/browse/HADOOP-10846 > Project: Hadoop Common > Issue Type: Bug > Components: util >Affects Versions: 2.4.1, 2.5.2 > Environment: PowerPC platform >Reporter: Jinghui Wang >Assignee: Ayappan > Attachments: HADOOP-10846-v1.patch, HADOOP-10846-v2.patch, > HADOOP-10846-v3.patch, HADOOP-10846-v4.patch, HADOOP-10846.patch > > > Got the following exception when running Hadoop on PowerPC. The > implementation for computing checksums fails when the data buffer and checksum > buffer are not backed by arrays. > 13/09/16 04:06:57 ERROR security.UserGroupInformation: > PriviledgedActionException as:biadmin (auth:SIMPLE) > cause:org.apache.hadoop.ipc.RemoteException(java.io.IOException): > org.apache.hadoop.fs.ChecksumException: Checksum error -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayappan updated HADOOP-11665: - Environment: PowerPC Big Endian & other Big Endian platforms Target Version/s: 2.7.0 > Provide and unify cross platform byteorder support in native code > - > > Key: HADOOP-11665 > URL: https://issues.apache.org/jira/browse/HADOOP-11665 > Project: Hadoop Common > Issue Type: Bug > Components: util >Affects Versions: 2.4.1, 2.6.0 > Environment: PowerPC Big Endian & other Big Endian platforms >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: HADOOP-11665.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayappan updated HADOOP-11665: - Component/s: util > Provide and unify cross platform byteorder support in native code > - > > Key: HADOOP-11665 > URL: https://issues.apache.org/jira/browse/HADOOP-11665 > Project: Hadoop Common > Issue Type: Bug > Components: util >Affects Versions: 2.4.1, 2.6.0 >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: HADOOP-11665.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11665) Provide and unify cross platform byteorder support in native code
[ https://issues.apache.org/jira/browse/HADOOP-11665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayappan updated HADOOP-11665: - Affects Version/s: 2.4.1 2.6.0 > Provide and unify cross platform byteorder support in native code > - > > Key: HADOOP-11665 > URL: https://issues.apache.org/jira/browse/HADOOP-11665 > Project: Hadoop Common > Issue Type: Bug > Components: util >Affects Versions: 2.4.1, 2.6.0 >Reporter: Binglin Chang >Assignee: Binglin Chang > Attachments: HADOOP-11665.001.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348229#comment-14348229 ] Hadoop QA commented on HADOOP-11674: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702711/HADOOP-11674.1.patch against trunk revision 8d88691. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5853//console This message is automatically generated. > data corruption for parallel CryptoInputStream and CryptoOutputStream > - > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11648) set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348217#comment-14348217 ] Hadoop QA commented on HADOOP-11648: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702060/HADOOP-11648-003.txt against trunk revision ded0200. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5850//console This message is automatically generated. > set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not
[ https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-11675: --- Attachment: HADOOP-11675-001.txt A very simple fix, so no test is added. > tiny exception log with checking storedBlock is null or not > --- > > Key: HADOOP-11675 > URL: https://issues.apache.org/jira/browse/HADOOP-11675 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie >Priority: Minor > Attachments: HADOOP-11675-001.txt > > > Found this log on our production cluster: > {code} > 2015-03-05,10:33:31,778 ERROR > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: > Compaction failed > regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f., > storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 > M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479 > java.io.IOException: > BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not > exist or is not under Constructionnull > {code} > Let's check whether storedBlock is null to make the log prettier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
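The {{Constructionnull}} artifact in the quoted message comes from concatenating a null reference into the exception string. A minimal sketch of the kind of guard the patch describes, with hypothetical names standing in for the real HDFS fields:

{code:java}
import java.io.IOException;

final class BlockChecks {
  // Illustrative sketch: check storedBlock for null before building the
  // message, so the log never ends in "...under Constructionnull".
  static void checkUnderConstruction(String blockId, Object storedBlock,
      boolean underConstruction) throws IOException {
    if (storedBlock == null) {
      throw new IOException(blockId + " does not exist");
    }
    if (!underConstruction) {
      throw new IOException(storedBlock + " is not under construction");
    }
  }
}
{code}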
[jira] [Updated] (HADOOP-11675) tiny exception log with checking storedBlock is null or not
[ https://issues.apache.org/jira/browse/HADOOP-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-11675: --- Status: Patch Available (was: Open) > tiny exception log with checking storedBlock is null or not > --- > > Key: HADOOP-11675 > URL: https://issues.apache.org/jira/browse/HADOOP-11675 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie >Priority: Minor > Attachments: HADOOP-11675-001.txt > > > Found this log on our production cluster: > {code} > 2015-03-05,10:33:31,778 ERROR > org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: > Compaction failed > regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f., > storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 > M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479 > java.io.IOException: > BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not > exist or is not under Constructionnull > {code} > Let's check whether storedBlock is null to make the log prettier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11675) tiny exception log with checking storedBlock is null or not
Liang Xie created HADOOP-11675: -- Summary: tiny exception log with checking storedBlock is null or not Key: HADOOP-11675 URL: https://issues.apache.org/jira/browse/HADOOP-11675 Project: Hadoop Common Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Found this log on our production cluster: {code} 2015-03-05,10:33:31,778 ERROR org.apache.hadoop.hbase.regionserver.compactions.CompactionRequest: Compaction failed regionName=xiaomi_device_info_test,ff,1425377429116.41437dc231fe370f1304104a75aad78f., storeName=A, fileCount=7, fileSize=899.7 M (470.7 M, 259.7 M, 75.9 M, 24.4 M, 24.8 M, 25.7 M, 18.6 M), priority=23, time=44765894600479 java.io.IOException: BP-1356983882-10.2.201.14-1359086191297:blk_1211511211_1100144235504 does not exist or is not under Constructionnull {code} Let's check whether storedBlock is null to make the log prettier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11643) Define EC schema API for ErasureCodec
[ https://issues.apache.org/jira/browse/HADOOP-11643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348197#comment-14348197 ] Li Bo commented on HADOOP-11643: Hi Kai, I think the code is OK in general. One point: when catching a {{NumberFormatException}}, an {{IllegalArgumentException}} with the message {{"No codec option is provided"}} is thrown. In that case the codec option is provided but is not in the correct integer format, so how about changing the message to something like "Option XXX is an integer; please provide the correct format." > Define EC schema API for ErasureCodec > - > > Key: HADOOP-11643 > URL: https://issues.apache.org/jira/browse/HADOOP-11643 > Project: Hadoop Common > Issue Type: Sub-task > Components: io >Reporter: Kai Zheng >Assignee: Kai Zheng > Attachments: HADOOP-11643_v1.patch, HADOOP-11643_v2.patch > > > As part of {{ErasureCodec}} API to be defined in HDFS-7699, {{ECSchema}} API > will be first defined here for better sync among related issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
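A small sketch of the message change suggested above: catch the parse failure and report which option was malformed, rather than claiming no option was provided. The helper and its names are hypothetical, not the actual ECSchema code:

{code:java}
final class SchemaOptions {
  // Illustrative sketch: distinguish a missing option from a malformed one.
  static int parseIntOption(String name, String value) {
    if (value == null) {
      throw new IllegalArgumentException("No value provided for option " + name);
    }
    try {
      return Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("Option " + name
          + " is an integer; please provide the correct format: " + value, e);
    }
  }
}
{code}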
[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-11648: --- Attachment: HADOOP-11648-003.txt Trying to re-trigger the QA run. > set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt, > HADOOP-11648-003.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11648) set DomainSocketWatcher thread name explicitly
[ https://issues.apache.org/jira/browse/HADOOP-11648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang Xie updated HADOOP-11648: --- Attachment: (was: HADOOP-11648-003.txt) > set DomainSocketWatcher thread name explicitly > -- > > Key: HADOOP-11648 > URL: https://issues.apache.org/jira/browse/HADOOP-11648 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Liang Xie >Assignee: Liang Xie > Attachments: HADOOP-11648-001.txt, HADOOP-11648-002.txt > > > While working on HADOOP-11604, I noticed the current DomainSocketWatcher thread > name is not set explicitly; e.g., in our cluster the names look like > Thread-25, Thread-303670, or something else. Here Thread-25 seems to come from > Datanode.initDataXceiver, and once this thread dies, the Xceiver leak will be > found. It would be better to set the thread name explicitly, so we can debug > issues more easily in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11510) Expose truncate API via FileContext
[ https://issues.apache.org/jira/browse/HADOOP-11510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-11510: Component/s: fs > Expose truncate API via FileContext > --- > > Key: HADOOP-11510 > URL: https://issues.apache.org/jira/browse/HADOOP-11510 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Reporter: Yi Liu >Assignee: Yi Liu > Fix For: 2.7.0 > > Attachments: HADOOP-11510.001.patch, HADOOP-11510.002.patch, > HADOOP-11510.003.patch > > > We also need to expose truncate API via {{org.apache.hadoop.fs.FileContext}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11589) NetUtils.createSocketAddr should trim the input URI
[ https://issues.apache.org/jira/browse/HADOOP-11589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-11589: Component/s: net > NetUtils.createSocketAddr should trim the input URI > --- > > Key: HADOOP-11589 > URL: https://issues.apache.org/jira/browse/HADOOP-11589 > Project: Hadoop Common > Issue Type: Improvement > Components: net >Affects Versions: 2.6.0 >Reporter: Akira AJISAKA >Assignee: Rakesh R >Priority: Minor > Labels: newbie > Fix For: 2.7.0 > > Attachments: HADOOP-11589-1.patch, HADOOP-11589-2.patch > > > NetUtils.createSocketAddr does not trim the input URI; it should be trimmed. > HDFS-7684 and HADOOP-9869 are trying to trim some URIs to be passed to the > method; however, not all of the inputs have been trimmed yet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
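A hedged sketch of the trimming behavior the issue asks for; this is a simplified stand-in for the real {{NetUtils.createSocketAddr}}, not the actual patch:

{code:java}
import java.net.InetSocketAddress;

final class AddrUtil {
  // Illustrative sketch: trim whitespace from a "host:port" target before
  // parsing, so values like " host:8020\n" copied from XML configs resolve.
  static InetSocketAddress createSocketAddr(String target, int defaultPort) {
    String trimmed = target.trim();  // the fix: trim the raw input first
    int idx = trimmed.lastIndexOf(':');
    if (idx < 0) {
      return new InetSocketAddress(trimmed, defaultPort);
    }
    String host = trimmed.substring(0, idx);
    int port = Integer.parseInt(trimmed.substring(idx + 1));
    return new InetSocketAddress(host, port);
  }
}
{code}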
[jira] [Commented] (HADOOP-11672) test
[ https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348165#comment-14348165 ] Brahma Reddy Battula commented on HADOOP-11672: --- FYI, please go through the following link on how to contribute: http://wiki.apache.org/hadoop/HowToContribute > test > > > Key: HADOOP-11672 > URL: https://issues.apache.org/jira/browse/HADOOP-11672 > Project: Hadoop Common > Issue Type: New Feature >Reporter: xiangqian.xu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HADOOP-11674: - Status: Patch Available (was: In Progress) > data corruption for parallel CryptoInputStream and CryptoOutputStream > - > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HADOOP-11672) test
[ https://issues.apache.org/jira/browse/HADOOP-11672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula resolved HADOOP-11672. --- Resolution: Not a Problem > test > > > Key: HADOOP-11672 > URL: https://issues.apache.org/jira/browse/HADOOP-11672 > Project: Hadoop Common > Issue Type: New Feature >Reporter: xiangqian.xu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HADOOP-11674: - Attachment: HADOOP-11674.1.patch > data corruption for parallel CryptoInputStream and CryptoOutputStream > - > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > Attachments: HADOOP-11674.1.patch > > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
Sean Busbey created HADOOP-11674: Summary: data corruption for parallel CryptoInputStream and CryptoOutputStream Key: HADOOP-11674 URL: https://issues.apache.org/jira/browse/HADOOP-11674 Project: Hadoop Common Issue Type: Bug Components: io Affects Versions: 2.6.0 Reporter: Sean Busbey Assignee: Sean Busbey Priority: Critical A common optimization in the io classes for Input/Output Streams is to save a single length-1 byte array to use in single byte read/write calls. CryptoInputStream and CryptoOutputStream both attempt to follow this practice but mistakenly mark the array as static. That means that only a single instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HADOOP-11674) data corruption for parallel CryptoInputStream and CryptoOutputStream
[ https://issues.apache.org/jira/browse/HADOOP-11674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HADOOP-11674 started by Sean Busbey. > data corruption for parallel CryptoInputStream and CryptoOutputStream > - > > Key: HADOOP-11674 > URL: https://issues.apache.org/jira/browse/HADOOP-11674 > Project: Hadoop Common > Issue Type: Bug > Components: io >Affects Versions: 2.6.0 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Critical > > A common optimization in the io classes for Input/Output Streams is to save a > single length-1 byte array to use in single byte read/write calls. > CryptoInputStream and CryptoOutputStream both attempt to follow this practice > but mistakenly mark the array as static. That means that only a single > instance of each can be present in a JVM safely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c
[ https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348152#comment-14348152 ] Hadoop QA commented on HADOOP-11638: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702696/HADOOP-11638-002.patch against trunk revision 8d88691. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5852//console This message is automatically generated. > Linux-specific gettid() used in OpensslSecureRandom.c > - > > Key: HADOOP-11638 > URL: https://issues.apache.org/jira/browse/HADOOP-11638 > Project: Hadoop Common > Issue Type: Bug > Components: native >Affects Versions: 2.6.0 >Reporter: Dmitry Sivachenko >Assignee: Kiran Kumar M R > Labels: freebsd > Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch > > > In OpensslSecureRandom.c you use the Linux-specific syscall gettid(): > static unsigned long pthreads_thread_id(void) > { > return (unsigned long)syscall(SYS_gettid); > } > The man page says: > gettid() is Linux-specific and should not be used in programs that are > intended to be portable. > This breaks hadoop-2.6.0 compilation on FreeBSD (maybe on other OSes too). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11103) Clean up RemoteException
[ https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348115#comment-14348115 ] Sean Busbey commented on HADOOP-11103: -- TestFileTruncate passes locally. > Clean up RemoteException > > > Key: HADOOP-11103 > URL: https://issues.apache.org/jira/browse/HADOOP-11103 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Trivial > Attachments: HADOOP-11103.1.patch > > > RemoteException has a number of undocumented behaviors > * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the > source, the String returned is the classname of the wrapped remote exception. > * RemoteException(String, String) is equivalent to calling > RemoteException(String, String, null) > * Constructors allow null for all arguments > * Some of the test code doesn't check for correct error codes to correspond > with the wrapped exception type > * methods don't document when they might return null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid and out files for secure daemons have been renamed to include the appropriate ${HADOOP_IDENT_STR}. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. Additionally, pid files are now created when daemons are run in interactive mode. This will also prevent the accidental starting of two daemons with the same configuration prior to launching java (i.e., "fast fail" without having to wait for socket opening). * All Hadoop shell script subsystems now execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously. * The default content of *-env.sh has been significantly altered, with the majority of defaults moved into more protected areas inside the code. Additionally, these files do not auto-append anymore; setting a variable on the command line prior to calling a shell command must contain the entire content, not just any extra settings. This brings Hadoop more in-line with the vast majority of other software packages. * All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', and related commands are executed. Previously, these were separated out which meant a significant amount of duplication of common settings. * hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin. The sbin versions have been removed. * The log4j settings forcibly set by some *-daemon.sh commands have been removed. These settings are now configurable in the *-env.sh files via *_OPT. * Support for various undocumented YARN log4j.properties files has been removed. * Support for ${HADOOP_MASTER} and the related rsync code have been removed. * The undocumented and unused yarn.id.str Java property has been removed. * The unused yarn.policy.file Java property has been removed. * We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling and ${BASH_SOURCE}. POSIX sh will not work. * Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to find the necessary binaries. (See other note regarding ${HADOOP_PREFIX} auto discovery.) * Non-existent classpaths, ld.so library paths, JNI library paths, etc, will be ignored and stripped from their respective environment settings. NEW FEATURES: * Daemonization has been moved from *-daemon.sh to the bin commands via the --daemon option. Simply use --daemon start to start a daemon, --daemon stop to stop a daemon, and --daemon status to set $? to the daemon's status. The return code for status is LSB-compatible. For example, 'hdfs --daemon start namenode'. * It is now possible to override some of the shell code capabilities to provide site specific functionality without replacing the shipped versions. Replacement functions should go into the new hadoop-user-functions.sh file. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. 
* Operations which trigger ssh connections can now use pdsh if installed. ${HADOOP_SSH_OPTS} still gets applied. * Added distch and jnipath subcommands to the hadoop command. * Shell scripts now support a --debug option which will report basic information on the construction of various environment variables, java options, classpath, etc. to help in configuration debugging. BUG FIXES: * ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring symlinking and other such tricks. * ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided hadoop-layout.sh.example file. * Shell commands should now work properly when called as a relative path, without ${HADOOP_PREFIX} being defined, and as the target of bash -x for debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined based upon the current location of the shell library. Note that other parts of the extended Hadoop ecosystem may still require this environment variable to be configured. * Operations which trigger ssh will now limit the number of connections to run in parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion. By default, this is set to 10. * ${HADOOP_CLIENT_OPTS} support has been added to a few more commands. * Some subcommands were not listed in the usage. * Various options on hadoop command lines were supported i
[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c
[ https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348097#comment-14348097 ] Kiran Kumar M R commented on HADOOP-11638: -- Thanks for the review, Colin. Added a new patch as per the comments. [~trtrmitya], you can compile on FreeBSD and confirm whether the patch works. > Linux-specific gettid() used in OpensslSecureRandom.c > - > > Key: HADOOP-11638 > URL: https://issues.apache.org/jira/browse/HADOOP-11638 > Project: Hadoop Common > Issue Type: Bug > Components: native >Affects Versions: 2.6.0 >Reporter: Dmitry Sivachenko >Assignee: Kiran Kumar M R > Labels: freebsd > Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch > > > In OpensslSecureRandom.c you use the Linux-specific syscall gettid(): > static unsigned long pthreads_thread_id(void) > { > return (unsigned long)syscall(SYS_gettid); > } > The man page says: > gettid() is Linux-specific and should not be used in programs that are > intended to be portable. > This breaks hadoop-2.6.0 compilation on FreeBSD (maybe on other OSes too). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c
[ https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kiran Kumar M R updated HADOOP-11638: - Attachment: HADOOP-11638-002.patch > Linux-specific gettid() used in OpensslSecureRandom.c > - > > Key: HADOOP-11638 > URL: https://issues.apache.org/jira/browse/HADOOP-11638 > Project: Hadoop Common > Issue Type: Bug > Components: native >Affects Versions: 2.6.0 >Reporter: Dmitry Sivachenko >Assignee: Kiran Kumar M R > Labels: freebsd > Attachments: HADOOP-11638-001.patch, HADOOP-11638-002.patch > > > In OpensslSecureRandom.c you use the Linux-specific syscall gettid(): > static unsigned long pthreads_thread_id(void) > { > return (unsigned long)syscall(SYS_gettid); > } > The man page says: > gettid() is Linux-specific and should not be used in programs that are > intended to be portable. > This breaks hadoop-2.6.0 compilation on FreeBSD (maybe on other OSes too). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return
[ https://issues.apache.org/jira/browse/HADOOP-11673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula reassigned HADOOP-11673: - Assignee: Brahma Reddy Battula > Use org.junit.Assume to skip tests instead of return > > > Key: HADOOP-11673 > URL: https://issues.apache.org/jira/browse/HADOOP-11673 > Project: Hadoop Common > Issue Type: Improvement > Components: test >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula >Priority: Minor > > We see the following code many times: > {code:title=TestCodec.java} > if (!ZlibFactory.isNativeZlibLoaded(conf)) { > LOG.warn("skipped: native libs not loaded"); > return; > } > {code} > If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, > with a warn log. I'd like to *skip* this test case by using > {{org.junit.Assume}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11673) Use org.junit.Assume to skip tests instead of return
Akira AJISAKA created HADOOP-11673: -- Summary: Use org.junit.Assume to skip tests instead of return Key: HADOOP-11673 URL: https://issues.apache.org/jira/browse/HADOOP-11673 Project: Hadoop Common Issue Type: Improvement Components: test Reporter: Akira AJISAKA Priority: Minor We see the following code many times: {code:title=TestCodec.java} if (!ZlibFactory.isNativeZlibLoaded(conf)) { LOG.warn("skipped: native libs not loaded"); return; } {code} If {{ZlibFactory.isNativeZlibLoaded(conf)}} is false, the test will *pass*, with a warn log. I'd like to *skip* this test case by using {{org.junit.Assume}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
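The JUnit 4 idiom the issue proposes looks like the following; a minimal sketch, with a stub standing in for {{ZlibFactory.isNativeZlibLoaded(conf)}}:

{code:java}
import static org.junit.Assume.assumeTrue;
import org.junit.Test;

public class NativeZlibAssumptionTest {
  @Test
  public void testWithNativeZlib() {
    // Assume.assumeTrue reports the test as skipped (not passed) when the
    // assumption fails, unlike an early return, which counts as a pass.
    assumeTrue("native libs not loaded", isNativeZlibLoaded());
    // ... the actual codec assertions would follow ...
  }

  private static boolean isNativeZlibLoaded() {
    return false; // stub for ZlibFactory.isNativeZlibLoaded(conf)
  }
}
{code}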
[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField
[ https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348057#comment-14348057 ] Hadoop QA commented on HADOOP-10027: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702397/HADOOP-10027.3.patch against trunk revision ded0200. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-common-project/hadoop-common: org.apache.hadoop.io.compress.TestCodec Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5851//console This message is automatically generated. > *Compressor_deflateBytesDirect passes instance instead of jclass to > GetStaticObjectField > > > Key: HADOOP-10027 > URL: https://issues.apache.org/jira/browse/HADOOP-10027 > Project: Hadoop Common > Issue Type: Bug > Components: native >Reporter: Eric Abbott >Assignee: Hui Zheng >Priority: Minor > Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, > HADOOP-10027.3.patch > > > http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup > This pattern appears in all the native compressors. > // Get members of ZlibCompressor > jobject clazz = (*env)->GetStaticObjectField(env, this, > ZlibCompressor_clazz); > The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a > jobject. Adding the JVM param -Xcheck:jni will cause "FATAL ERROR in native > method: JNI received a class argument that is not a class" and a core dump > such as the following. > (gdb) > #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6 > #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6 > #2 0x7f02e45bd727 in os::abort(bool) () from > /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, > bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from > /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #5 0x7f02d38eaf79 in > Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () > from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 > In addition, that clazz object is only used for synchronization. In the case > of the native method _deflateBytesDirect, the result is a class wide lock > used to access the instance field uncompressed_direct_buf. Perhaps using the > instance as the sync point is more appropriate? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11672) test
xiangqian.xu created HADOOP-11672: - Summary: test Key: HADOOP-11672 URL: https://issues.apache.org/jira/browse/HADOOP-11672 Project: Hadoop Common Issue Type: New Feature Reporter: xiangqian.xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11671) Asynchronous native RPC v9 client
[ https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348018#comment-14348018 ] Haohui Mai commented on HADOOP-11671: - bq. Is this really a good, long term strategy given our use of protobuf now that gRPC exists? The Hadoop RPC library allows more native applications to be integrated with Hadoop and to benefit the ecosystem. Once Hadoop has switched to gRPC, we can turn this library into a shim over gRPC, or retire it. :-) > Asynchronous native RPC v9 client > - > > Key: HADOOP-11671 > URL: https://issues.apache.org/jira/browse/HADOOP-11671 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Haohui Mai >Assignee: Haohui Mai > > There is more and more integration happening between Hadoop and applications > implemented in languages other than Java. > To access Hadoop, applications either have to go through JNI (e.g., libhdfs) > or reverse engineer the Hadoop RPC protocol (e.g., snakebite). > Unfortunately, neither of them is satisfactory: > * Integrating with JNI requires running a JVM inside the application. Some > applications (e.g., real-time processing, MPP databases) do not want the > footprint and GC behavior of the JVM. > * The Hadoop RPC protocol has a rich feature set including wire encryption, > SASL, and Kerberos authentication. Many 3rd-party implementations cannot fully > cover the feature set, so they may only work in limited environments. > This jira proposes implementing a Hadoop RPC library in C++ that > provides a common ground to implement higher-level native clients for HDFS, > YARN, and MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture
[ https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348013#comment-14348013 ] Colin Patrick McCabe commented on HADOOP-11660: --- OK. Thanks, Edward. > Add support for hardware crc on ARM aarch64 architecture > > > Key: HADOOP-11660 > URL: https://issues.apache.org/jira/browse/HADOOP-11660 > Project: Hadoop Common > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 > Environment: ARM aarch64 development platform >Reporter: Edward Nevill >Assignee: Edward Nevill >Priority: Minor > Labels: performance > Original Estimate: 48h > Remaining Estimate: 48h > > This patch adds support for hardware CRC for ARM's new 64-bit architecture. > The patch is completely conditionalized on __aarch64__. > I have only added support for the non-pipelined version, as I benchmarked the > pipelined version on aarch64 and it showed no performance improvement. > The aarch64 version supports both Castagnoli and Zlib CRCs, as both of these > are supported on ARM aarch64 hardware. > To benchmark this I modified the test_bulk_crc32 test to print out the time > taken to CRC a 1MB dataset 1000 times. > Before: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > After: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > So this represents a 5X performance improvement on raw CRC calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11638) Linux-specific gettid() used in OpensslSecureRandom.c
[ https://issues.apache.org/jira/browse/HADOOP-11638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348010#comment-14348010 ] Colin Patrick McCabe commented on HADOOP-11638: --- Can you add an {{#else}} clause that has an {{#error}}? +1 after that is done. Thanks. > Linux-specific gettid() used in OpensslSecureRandom.c > - > > Key: HADOOP-11638 > URL: https://issues.apache.org/jira/browse/HADOOP-11638 > Project: Hadoop Common > Issue Type: Bug > Components: native >Affects Versions: 2.6.0 >Reporter: Dmitry Sivachenko >Assignee: Kiran Kumar M R > Labels: freebsd > Attachments: HADOOP-11638-001.patch > > > In OpensslSecureRandom.c you use the Linux-specific syscall gettid(): > static unsigned long pthreads_thread_id(void) > { > return (unsigned long)syscall(SYS_gettid); > } > The man page says: > gettid() is Linux-specific and should not be used in programs that are > intended to be portable. > This breaks hadoop-2.6.0 compilation on FreeBSD (maybe on other OSes too). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11460) Deprecate shell vars
[ https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11460: -- Release Note: The following shell environment variables have been deprecated: | Old | New | |: |: | | HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE| | HADOOP_HDFS_NICENESS| HADOOP_NICENESS| | HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT | | HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR| | HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING| | HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE| | HADOOP_MAPRED_NICENESS| HADOOP_NICENESS| | HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR| | HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_CONF_DIR| HADOOP_CONF_DIR| | YARN_LOG_DIR| HADOOP_LOG_DIR| | YARN_LOGFILE| HADOOP_LOGFILE| | YARN_NICENESS| HADOOP_NICENESS| | YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | YARN_PID_DIR| HADOOP_PID_DIR| | YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | YARN_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_OPTS| HADOOP_OPTS| | YARN_SLAVES| HADOOP_SLAVES| | YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH| | YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST| | KMS_CONFIG |HADOOP_CONF_DIR| | KMS_LOG |HADOOP_LOG_DIR | was: The following shell environment variables have been deprecated: | Old | New | |: |: | | HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE| | HADOOP_HDFS_NICENESS| HADOOP_NICENESS| | HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT | | HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR| | HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING| | HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE| | HADOOP_MAPRED_NICENESS| HADOOP_NICENESS| | HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR| | HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_CONF_DIR| HADOOP_CONF_DIR| | YARN_LOG_DIR| HADOOP_LOG_DIR| | YARN_LOGFILE| HADOOP_LOGFILE| | YARN_NICENESS| HADOOP_NICENESS| | YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | YARN_PID_DIR| HADOOP_PID_DIR| | YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | YARN_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_OPTS| HADOOP_OPTS| | YARN_SLAVES| HADOOP_SLAVES| | YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH| | YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST| | KMS_CONFIG |HADOOP_CONF_DIR| | KMS_LOG |HADOOP_LOG_DIR | > Deprecate shell vars > > > Key: HADOOP-11460 > URL: https://issues.apache.org/jira/browse/HADOOP-11460 > Project: Hadoop Common > Issue Type: Improvement > Components: scripts >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: John Smith > Labels: scripts, shell > Fix For: 3.0.0 > > Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, > HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch > > > It is a very common shell pattern in 3.x to effectively replace sub-project > specific vars with generics. We should have a function that does this > replacement and provides a warning to the end user that the old shell var is > deprecated. Additionally, we should use this shell function to deprecate the > shell vars that are holdovers already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (HADOOP-11671) Asynchronous native RPC v9 client
[ https://issues.apache.org/jira/browse/HADOOP-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai moved HDFS-7887 to HADOOP-11671: --- Key: HADOOP-11671 (was: HDFS-7887) Project: Hadoop Common (was: Hadoop HDFS) > Asynchronous native RPC v9 client > - > > Key: HADOOP-11671 > URL: https://issues.apache.org/jira/browse/HADOOP-11671 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Haohui Mai >Assignee: Haohui Mai > > There is more and more integration happening between Hadoop and applications > that are implemented in languages other than Java. > To access Hadoop, applications either have to go through JNI (e.g. libhdfs), > or reverse engineer the Hadoop RPC protocol (e.g. snakebite). > Unfortunately, neither of them is satisfactory: > * Integrating with JNI requires running a JVM inside the application. Some > applications (e.g., real-time processing, MPP databases) do not want the > footprint and GC behavior of the JVM. > * The Hadoop RPC protocol has a rich feature set including wire encryption, > SASL, and Kerberos authentication. Few 3rd-party implementations fully cover > the feature set, so they might only work in limited environments. > This jira is to propose implementing a Hadoop RPC library in C++ that > provides a common ground for implementing higher-level native clients for HDFS, > YARN, and MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10027) *Compressor_deflateBytesDirect passes instance instead of jclass to GetStaticObjectField
[ https://issues.apache.org/jira/browse/HADOOP-10027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348006#comment-14348006 ] Colin Patrick McCabe commented on HADOOP-10027: --- Not sure what the issue was here; it looks like a Jenkins problem. {code} [INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ hadoop-auth --- FATAL: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:41) at hudson.remoting.RequestAbortedException.wrapForRethrow(RequestAbortedException.java:34) at hudson.remoting.Request.call(Request.java:174) at hudson.remoting.Channel.call(Channel.java:742) at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:168) at com.sun.proxy.$Proxy57.join(Unknown Source) at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:956) {code} I will retrigger. > *Compressor_deflateBytesDirect passes instance instead of jclass to > GetStaticObjectField > > > Key: HADOOP-10027 > URL: https://issues.apache.org/jira/browse/HADOOP-10027 > Project: Hadoop Common > Issue Type: Bug > Components: native >Reporter: Eric Abbott >Assignee: Hui Zheng >Priority: Minor > Attachments: HADOOP-10027.1.patch, HADOOP-10027.2.patch, > HADOOP-10027.3.patch > > > http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c?view=markup > This pattern appears in all the native compressors. > // Get members of ZlibCompressor > jobject clazz = (*env)->GetStaticObjectField(env, this, > ZlibCompressor_clazz); > The 2nd argument to GetStaticObjectField is supposed to be a jclass, not a > jobject. Adding the JVM param -Xcheck:jni will cause "FATAL ERROR in native > method: JNI received a class argument that is not a class" and a core dump > such as the following. > (gdb) > #0 0x7f02e4aef8a5 in raise () from /lib64/libc.so.6 > #1 0x7f02e4af1085 in abort () from /lib64/libc.so.6 > #2 0x7f02e45bd727 in os::abort(bool) () from > /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #3 0x7f02e43cec63 in jniCheck::validate_class(JavaThread*, _jclass*, > bool) () from /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #4 0x7f02e43ea669 in checked_jni_GetStaticObjectField () from > /opt/jdk1.6.0_31/jre/lib/amd64/server/libjvm.so > #5 0x7f02d38eaf79 in > Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_deflateBytesDirect () > from /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 > In addition, that clazz object is only used for synchronization. In the case > of the native method _deflateBytesDirect, the result is a class-wide lock > used to access the instance field uncompressed_direct_buf. Perhaps using the > instance as the sync point is more appropriate? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-9902) Shell script rewrite
[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-9902: - Release Note: The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations. INCOMPATIBLE CHANGES: * The pid and out files for secure daemons have been renamed to include the appropriate ${HADOOP_IDENT_STR}. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host. Additionally, pid files are now created when daemons are run in interactive mode. This will also prevent the accidental starting of two daemons with the same configuration prior to launching java (i.e., "fast fail" without having to wait for socket opening). * All Hadoop shell script subsystems now execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously. * The default content of *-env.sh has been significantly altered, with the majority of defaults moved into more protected areas inside the code. Additionally, these files do not auto-append anymore; setting a variable on the command line prior to calling a shell command must contain the entire content, not just any extra settings. This brings Hadoop more in-line with the vast majority of other software packages. * All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred', and related commands are executed. Previously, these were separated out which meant a significant amount of duplication of common settings. * hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin. The sbin versions have been removed. * The log4j settings forcibly set by some *-daemon.sh commands have been removed. These settings are now configurable in the *-env.sh files via *_OPT. * Some formerly 'documented' entries in yarn-env.sh have been undocumented as a simple form of deprecation in order to greatly simplify configuration and reduce unnecessary duplication. They still work, but those variables will likely be removed in a future release. * Support for various undocumented YARN log4j.properties files has been removed. * Support for ${HADOOP_MASTER} and the related rsync code have been removed. * The undocumented and unused yarn.id.str Java property has been removed. * The unused yarn.policy.file Java property has been removed. * We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling and ${BASH_SOURCE}. POSIX sh will not work. * Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to find the necessary binaries. (See other note regarding ${HADOOP_PREFIX} auto discovery.) * Non-existent classpaths, ld.so library paths, JNI library paths, etc., will be ignored and stripped from their respective environment settings. * cygwin support has been removed. NEW FEATURES: * Daemonization has been moved from *-daemon.sh to the bin commands via the --daemon option. Simply use --daemon start to start a daemon, --daemon stop to stop a daemon, and --daemon status to set $? to the daemon's status. The return code for status is LSB-compatible. For example, 'hdfs --daemon start namenode'. 
* It is now possible to override some of the shell code capabilities to provide site specific functionality without replacing the shipped versions. Replacement functions should go into the new hadoop-user-functions.sh file. * A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing. * Operations which trigger ssh connections can now use pdsh if installed. ${HADOOP_SSH_OPTS} still gets applied. * Added distch and jnipath subcommands to the hadoop command. * Shell scripts now support a --debug option which will report basic information on the construction of various environment variables, java options, classpath, etc. to help in configuration debugging. BUG FIXES: * ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring symlinking and other such tricks. * ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided hadoop-layout.sh.example file. * Shell commands should now work properly when called as a relative path, without ${HADOOP_PREFIX} being defined, and as the target of bash -x for debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined based upon the current location of the shell library. Note that other parts of the extended Hadoop ecosystem may still require this environment variable to be configured. * Operations which trigger ssh will now limit the number of conn
[jira] [Updated] (HADOOP-11460) Deprecate shell vars
[ https://issues.apache.org/jira/browse/HADOOP-11460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11460: -- Release Note: The following shell environment variables have been deprecated: | Old | New | |: |: | | HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE| | HADOOP_HDFS_NICENESS| HADOOP_NICENESS| | HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT | | HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR| | HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING| | HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE| | HADOOP_MAPRED_NICENESS| HADOOP_NICENESS| | HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR| | HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_CONF_DIR| HADOOP_CONF_DIR| | YARN_LOG_DIR| HADOOP_LOG_DIR| | YARN_LOGFILE| HADOOP_LOGFILE| | YARN_NICENESS| HADOOP_NICENESS| | YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | YARN_PID_DIR| HADOOP_PID_DIR| | YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | YARN_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_OPTS| HADOOP_OPTS| | YARN_SLAVES| HADOOP_SLAVES| | YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH| | YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST| | KMS_CONFIG |HADOOP_CONF_DIR| | KMS_LOG |HADOOP_LOG_DIR | was: The following shell environment variables have been deprecated: || Old || New || | | | | HADOOP_HDFS_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_HDFS_LOGFILE| HADOOP_LOGFILE| | HADOOP_HDFS_NICENESS| HADOOP_NICENESS| | HADOOP_HDFS_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT | | HADOOP_HDFS_PID_DIR| HADOOP_PID_DIR| | HADOOP_HDFS_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_HDFS_IDENT_STRING| HADOOP_IDENT_STRING| | HADOOP_MAPRED_LOG_DIR| HADOOP_LOG_DIR| | HADOOP_MAPRED_LOGFILE| HADOOP_LOGFILE| | HADOOP_MAPRED_NICENESS| HADOOP_NICENESS| | HADOOP_MAPRED_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | HADOOP_MAPRED_PID_DIR| HADOOP_PID_DIR| | HADOOP_MAPRED_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | HADOOP_MAPRED_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_CONF_DIR| HADOOP_CONF_DIR| | YARN_LOG_DIR| HADOOP_LOG_DIR| | YARN_LOGFILE| HADOOP_LOGFILE| | YARN_NICENESS| HADOOP_NICENESS| | YARN_STOP_TIMEOUT| HADOOP_STOP_TIMEOUT| | YARN_PID_DIR| HADOOP_PID_DIR| | YARN_ROOT_LOGGER| HADOOP_ROOT_LOGGER| | YARN_IDENT_STRING| HADOOP_IDENT_STRING| | YARN_OPTS| HADOOP_OPTS| | YARN_SLAVES| HADOOP_SLAVES| | YARN_USER_CLASSPATH| HADOOP_USER_CLASSPATH| | YARN_USER_CLASSPATH_FIRST| HADOOP_USER_CLASSPATH_FIRST| | KMS_CONFIG |HADOOP_CONF_DIR| | KMS_LOG |HADOOP_LOG_DIR | > Deprecate shell vars > > > Key: HADOOP-11460 > URL: https://issues.apache.org/jira/browse/HADOOP-11460 > Project: Hadoop Common > Issue Type: Improvement > Components: scripts >Affects Versions: 3.0.0 >Reporter: Allen Wittenauer >Assignee: John Smith > Labels: scripts, shell > Fix For: 3.0.0 > > Attachments: HADOOP-11460-00.patch, HADOOP-11460-01.patch, > HADOOP-11460-02.patch, HADOOP-11460-03.patch, HADOOP-11460-04.patch > > > It is a very common shell pattern in 3.x to effectively replace sub-project > specific vars with generics. We should have a function that does this > replacement and provides a warning to the end user that the old shell var is > deprecated. Additionally, we should use this shell function to deprecate the > shell vars that are holdovers already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11602) Fix toUpperCase/toLowerCase to use Locale.ENGLISH
[ https://issues.apache.org/jira/browse/HADOOP-11602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347927#comment-14347927 ] Akira AJISAKA commented on HADOOP-11602: Thanks [~ozawa]! Looks good to me but the patch needs rebasing. > Fix toUpperCase/toLowerCase to use Locale.ENGLISH > - > > Key: HADOOP-11602 > URL: https://issues.apache.org/jira/browse/HADOOP-11602 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa > Attachments: HADOOP-11602-001.patch, HADOOP-11602-002.patch, > HADOOP-11602-003.patch, HADOOP-11602-004.patch, > HADOOP-11602-branch-2.001.patch, HADOOP-11602-branch-2.002.patch, > HADOOP-11602-branch-2.003.patch > > > String#toLowerCase()/toUpperCase() without a locale argument can cause > unexpected behavior depending on the locale. It's documented in the > [Javadoc|http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#toLowerCase()]: > {quote} > For instance, "TITLE".toLowerCase() in a Turkish locale returns "t\u0131tle", > where '\u0131' is the LATIN SMALL LETTER DOTLESS I character > {quote} > This issue is derived from HADOOP-10101. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
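The Turkish-locale behavior quoted above is easy to reproduce; a minimal, self-contained sketch (the class name is invented for illustration) of the failure mode and the Locale.ENGLISH fix being applied here:

{code}
import java.util.Locale;

public class LocaleCaseDemo {
  public static void main(String[] args) {
    // In a Turkish locale, 'I' lowercases to LATIN SMALL LETTER DOTLESS I:
    System.out.println("TITLE".toLowerCase(new Locale("tr", "TR"))); // t\u0131tle
    // Pinning the locale keeps case conversion stable for ASCII identifiers:
    System.out.println("TITLE".toLowerCase(Locale.ENGLISH));         // title
  }
}
{code}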
[jira] [Commented] (HADOOP-11103) Clean up RemoteException
[ https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347901#comment-14347901 ] Hadoop QA commented on HADOOP-11103: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12669533/HADOOP-11103.1.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5848//console This message is automatically generated. > Clean up RemoteException > > > Key: HADOOP-11103 > URL: https://issues.apache.org/jira/browse/HADOOP-11103 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Trivial > Attachments: HADOOP-11103.1.patch > > > RemoteException has a number of undocumented behaviors > * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the > source, the String returned is the classname of the wrapped remote exception. > * RemoteException(String, String) is equivalent to calling > RemoteException(String, String, null) > * Constructors allow null for all arguments > * Some of the test code doesn't check for correct error codes to correspond > with the wrapped exception type > * methods don't document when they might return null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
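The behaviors listed in the description can be made concrete with a short sketch. This assumes the org.apache.hadoop.ipc.RemoteException API as it stands on trunk (two-argument constructor delegating a null error code, getClassName() naming the wrapped exception, unwrapRemoteException() re-instantiating it when it is among the lookup types); the class name is invented for illustration:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.ipc.RemoteException;

public class RemoteExceptionDemo {
  public static void main(String[] args) {
    // Equivalent to new RemoteException(className, message, null):
    RemoteException re = new RemoteException(
        FileNotFoundException.class.getName(), "no such file");
    // getClassName() returns the class name of the wrapped remote exception:
    System.out.println(re.getClassName());   // java.io.FileNotFoundException
    // When the wrapped class is among the lookup types, unwrapRemoteException()
    // re-instantiates it; otherwise the RemoteException itself comes back:
    IOException unwrapped = re.unwrapRemoteException(FileNotFoundException.class);
    System.out.println(unwrapped.getClass().getName());
  }
}
{code}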
[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347806#comment-14347806 ] Adam Budde commented on HADOOP-11670: - My mistake-- looks like you're correct. I've updated the description. > Fix IAM instance profile auth for s3a > - > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-10714 breaks this behavior by using the > S3Credentials class to read the value of these two params. The change in > question is presented below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfileCredentialsProvider(), > new AnonymousAWSCredentialsProvider() > ); > {code} > As you can see, the getAccessKey() and getSecretAccessKey() methods from the > S3Credentials class are now used to provide constructor arguments to > BasicAWSCredentialsProvider. These methods will raise an exception if the > fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, > respectively. If a user is relying on an IAM instance profile to authenticate > to an S3 bucket and therefore doesn't supply values for these params, they > will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Budde updated HADOOP-11670: Description: One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-10714 breaks this behavior by using the S3Credentials class to read the value of these two params. The change in question is presented below: S3AFileSystem.java, lines 161-170: {code} // Try to get our credentials or just connect anonymously S3Credentials s3Credentials = new S3Credentials(); s3Credentials.initialize(name, conf); AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), s3Credentials.getSecretAccessKey()), new InstanceProfileCredentialsProvider(), new AnonymousAWSCredentialsProvider() ); {code} As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. was: One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-11446 breaks this behavior by using the S3Credentials class to read the value of these two params (this change is unrelated to resolving HADOOP-11446). The change in question is presented below: S3AFileSystem.java, lines 161-170: {code} // Try to get our credentials or just connect anonymously S3Credentials s3Credentials = new S3Credentials(); s3Credentials.initialize(name, conf); AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), s3Credentials.getSecretAccessKey()), new InstanceProfileCredentialsProvider(), new AnonymousAWSCredentialsProvider() ); {code} As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. 
> Fix IAM instance profile auth for s3a > - > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-10714 breaks this behavior by using the > S3Credentials class to read the value of these two params. The change in > question is presented below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfileCredentialsProvider(), > new AnonymousAWSCredentialsProvider() > ); > {code} > As you can see, the getAcc
[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-11670: Affects Version/s: (was: 2.6.0) 2.7.0 > Fix IAM instance profile auth for s3a > - > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.7.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-11446 breaks this behavior by using the > S3Credentials class to read the value of these two params (this change is > unrelated to resolving HADOOP-11446). The change in question is presented > below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfileCredentialsProvider(), > new AnonymousAWSCredentialsProvider() > ); > {code} > As you can see, the getAccessKey() and getSecretAccessKey() methods from the > S3Credentials class are now used to provide constructor arguments to > BasicAWSCredentialsProvider. These methods will raise an exception if the > fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, > respectively. If a user is relying on an IAM instance profile to authenticate > to an S3 bucket and therefore doesn't supply values for these params, they > will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347801#comment-14347801 ] Steve Loughran commented on HADOOP-11670: - looks more like HADOOP-10714 was the change that did this > Fix IAM instance profile auth for s3a (broken in HADOOP-11446) > -- > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-11446 breaks this behavior by using the > S3Credentials class to read the value of these two params (this change is > unrelated to resolving HADOOP-11446). The change in question is presented > below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfileCredentialsProvider(), > new AnonymousAWSCredentialsProvider() > ); > {code} > As you can see, the getAccessKey() and getSecretAccessKey() methods from the > S3Credentials class are now used to provide constructor arguments to > BasicAWSCredentialsProvider. These methods will raise an exception if the > fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, > respectively. If a user is relying on an IAM instance profile to authenticate > to an S3 bucket and therefore doesn't supply values for these params, they > will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated HADOOP-11670: Summary: Fix IAM instance profile auth for s3a (was: Fix IAM instance profile auth for s3a (broken in HADOOP-11446)) > Fix IAM instance profile auth for s3a > - > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-11446 breaks this behavior by using the > S3Credentials class to read the value of these two params (this change is > unrelated to resolving HADOOP-11446). The change in question is presented > below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfileCredentialsProvider(), > new AnonymousAWSCredentialsProvider() > ); > {code} > As you can see, the getAccessKey() and getSecretAccessKey() methods from the > S3Credentials class are now used to provide constructor arguments to > BasicAWSCredentialsProvider. These methods will raise an exception if the > fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, > respectively. If a user is relying on an IAM instance profile to authenticate > to an S3 bucket and therefore doesn't supply values for these params, they > will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347783#comment-14347783 ] Hadoop QA commented on HADOOP-11668: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702627/HADOOP-11668-02.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5849//console This message is automatically generated. > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Allen Wittenauer > Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
[ https://issues.apache.org/jira/browse/HADOOP-11670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Budde updated HADOOP-11670: Description: One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-11446 breaks this behavior by using the S3Credentials class to read the value of these two params (this change is unrelated to resolving HADOOP-11446). The change in question is presented below: S3AFileSystem.java, lines 161-170: {code} // Try to get our credentials or just connect anonymously S3Credentials s3Credentials = new S3Credentials(); s3Credentials.initialize(name, conf); AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), s3Credentials.getSecretAccessKey()), new InstanceProfileCredentialsProvider(), new AnonymousAWSCredentialsProvider() ); {code} As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. was: One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-11446 breaks this behavior by using the S3Credentials class to read the value of these two params (this change is unrelated to resolving HADOOP-11446). S3AFileSystem.java, lines 161-170: {code} // Try to get our credentials or just connect anonymously S3Credentials s3Credentials = new S3Credentials(); s3Credentials.initialize(name, conf); AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), s3Credentials.getSecretAccessKey()), new InstanceProfileCredentialsProvider(), new AnonymousAWSCredentialsProvider() ); {code} As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. 
> Fix IAM instance profile auth for s3a (broken in HADOOP-11446) > -- > > Key: HADOOP-11670 > URL: https://issues.apache.org/jira/browse/HADOOP-11670 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Adam Budde > Fix For: 2.7.0 > > > One big advantage provided by the s3a filesystem is the ability to use an IAM > instance profile in order to authenticate when attempting to access an S3 > bucket from an EC2 instance. This eliminates the need to deploy AWS account > credentials to the instance or to provide them to Hadoop via the > fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. > The patch submitted to resolve HADOOP-11446 breaks this behavior by using the > S3Credentials class to read the value of these two params (this change is > unrelated to resolving HADOOP-11446). The change in question is presented > below: > S3AFileSystem.java, lines 161-170: > {code} > // Try to get our credentials or just connect anonymously > S3Credentials s3Credentials = new S3Credentials(); > s3Credentials.initialize(name, conf); > AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( > new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), > s3Credentials.getSecretAccessKey()), > new InstanceProfile
[jira] [Updated] (HADOOP-11103) Clean up RemoteException
[ https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11103: -- Status: Open (was: Patch Available) > Clean up RemoteException > > > Key: HADOOP-11103 > URL: https://issues.apache.org/jira/browse/HADOOP-11103 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Trivial > Attachments: HADOOP-11103.1.patch > > > RemoteException has a number of undocumented behaviors > * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the > source, the String returned is the classname of the wrapped remote exception. > * RemoteException(String, String) is equivalent to calling > RemoteException(String, String, null) > * Constructors allow null for all arguments > * Some of the test code doesn't check for correct error codes to correspond > with the wrapped exception type > * methods don't document when they might return null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HADOOP-11670) Fix IAM instance profile auth for s3a (broken in HADOOP-11446)
Adam Budde created HADOOP-11670: --- Summary: Fix IAM instance profile auth for s3a (broken in HADOOP-11446) Key: HADOOP-11670 URL: https://issues.apache.org/jira/browse/HADOOP-11670 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 2.6.0 Reporter: Adam Budde Fix For: 2.7.0 One big advantage provided by the s3a filesystem is the ability to use an IAM instance profile in order to authenticate when attempting to access an S3 bucket from an EC2 instance. This eliminates the need to deploy AWS account credentials to the instance or to provide them to Hadoop via the fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey params. The patch submitted to resolve HADOOP-11446 breaks this behavior by using the S3Credentials class to read the value of these two params (this change is unrelated to resolving HADOOP-11446). S3AFileSystem.java, lines 161-170: {code} // Try to get our credentials or just connect anonymously S3Credentials s3Credentials = new S3Credentials(); s3Credentials.initialize(name, conf); AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain( new BasicAWSCredentialsProvider(s3Credentials.getAccessKey(), s3Credentials.getSecretAccessKey()), new InstanceProfileCredentialsProvider(), new AnonymousAWSCredentialsProvider() ); {code} As you can see, the getAccessKey() and getSecretAccessKey() methods from the S3Credentials class are now used to provide constructor arguments to BasicAWSCredentialsProvider. These methods will raise an exception if the fs.s3a.awsAccessKeyId or fs.s3a.awsSecretAccessKey params are missing, respectively. If a user is relying on an IAM instance profile to authenticate to an S3 bucket and therefore doesn't supply values for these params, they will receive an exception and won't be able to access the bucket. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
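A minimal sketch of one possible remedy, not the committed patch: read the two params leniently via Configuration.get(), which returns null rather than throwing when a key is unset, so the provider chain can still fall through to the instance profile. This assumes Hadoop's BasicAWSCredentialsProvider defers its null-key check to getCredentials(), where AWSCredentialsProviderChain catches the failure and moves on to the next provider, rather than failing at construction time:

{code}
// Hypothetical replacement for the S3AFileSystem.java lines quoted above.
String accessKey = conf.get("fs.s3a.awsAccessKeyId");    // null when unset, no exception
String secretKey = conf.get("fs.s3a.awsSecretAccessKey");

AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
    // Assumed to fail softly inside getCredentials() when a key is null,
    // letting the chain try the instance profile next:
    new BasicAWSCredentialsProvider(accessKey, secretKey),
    new InstanceProfileCredentialsProvider(),
    new AnonymousAWSCredentialsProvider()
);
{code}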
[jira] [Updated] (HADOOP-11103) Clean up RemoteException
[ https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11103: -- Status: Patch Available (was: Open) > Clean up RemoteException > > > Key: HADOOP-11103 > URL: https://issues.apache.org/jira/browse/HADOOP-11103 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Trivial > Attachments: HADOOP-11103.1.patch > > > RemoteException has a number of undocumented behaviors > * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the > source, the String returned is the classname of the wrapped remote exception. > * RemoteException(String, String) is equivalent to calling > RemoteException(String, String, null) > * Constructors allow null for all arguments > * Some of the test code doesn't check for correct error codes to correspond > with the wrapped exception type > * methods don't document when they might return null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11668: -- Status: Patch Available (was: Reopened) > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Allen Wittenauer > Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347709#comment-14347709 ] Allen Wittenauer edited comment on HADOOP-11668 at 3/4/15 10:40 PM: -02: * This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with multiple hosts. The problem was twofold: * We were not preserving quotes around parameters that contained $IFS, due to lack of quoting around the array deletion. * The then-deleted array elements were retained and showed up as empty arguments. was (Author: aw): -02: * This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with multiple hosts. The problem was two fold: * We were preserving quotes around parameters that contained $IFS due to lack of quoting around the array deletion * The then deleted array elements were retained and show up as an empty argument. > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Allen Wittenauer > Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11668: -- Attachment: HADOOP-11668-02.patch -02: * This fixes hadoop-daemons.sh and yarn-daemons.sh so that they work with multiple hosts. The problem was two fold: * We were preserving quotes around parameters that contained $IFS due to lack of quoting around the array deletion * The then deleted array elements were retained and show up as an empty argument. > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Allen Wittenauer > Attachments: HADOOP-11668-01.patch, HADOOP-11668-02.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer reopened HADOOP-11668: --- Assignee: Allen Wittenauer (was: Vinayakumar B) Re-opening. The problem here isn't start/stop, it's *-daemons.sh, which are now broken. > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Allen Wittenauer > Attachments: HADOOP-11668-01.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-10895) HTTP KerberosAuthenticator fallback should have a flag to disable it
[ https://issues.apache.org/jira/browse/HADOOP-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347611#comment-14347611 ] Yongjun Zhang commented on HADOOP-10895: Hi [~tucu00], [~atm], [~zjshen], [~daryn], This jira originates from the discussion in HADOOP-10771 that you participated in. I'd like to bring it to your attention, to see if we want to move this one forward. Please see my comment at https://issues.apache.org/jira/browse/HADOOP-10895?focusedCommentId=14321823&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14321823 Thanks for your time, and thanks [~vinodkv] for suggesting in the email thread that I collect feedback from you. > HTTP KerberosAuthenticator fallback should have a flag to disable it > > > Key: HADOOP-10895 > URL: https://issues.apache.org/jira/browse/HADOOP-10895 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 2.4.1 >Reporter: Alejandro Abdelnur >Assignee: Yongjun Zhang >Priority: Blocker > Attachments: HADOOP-10895.001.patch, HADOOP-10895.002.patch, > HADOOP-10895.003.patch, HADOOP-10895.003v1.patch, HADOOP-10895.003v2.patch, > HADOOP-10895.003v2improved.patch, HADOOP-10895.004.patch, > HADOOP-10895.005.patch, HADOOP-10895.006.patch, HADOOP-10895.007.patch, > HADOOP-10895.008.patch, HADOOP-10895.009.patch > > > Per review feedback in HADOOP-10771, {{KerberosAuthenticator}} and the > delegation token version coming in with HADOOP-10771 should have a flag to > disable fallback to pseudo, similar to the one that was introduced in the > Hadoop RPC client with HADOOP-9698. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HADOOP-11103) Clean up RemoteException
[ https://issues.apache.org/jira/browse/HADOOP-11103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey reassigned HADOOP-11103: Assignee: Sean Busbey > Clean up RemoteException > > > Key: HADOOP-11103 > URL: https://issues.apache.org/jira/browse/HADOOP-11103 > Project: Hadoop Common > Issue Type: Improvement > Components: ipc >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Trivial > Attachments: HADOOP-11103.1.patch > > > RemoteException has a number of undocumented behaviors > * o.a.h.ipc.RemoteException has no javadocs on getClassName. Reading the > source, the String returned is the classname of the wrapped remote exception. > * RemoteException(String, String) is equivalent to calling > RemoteException(String, String, null) > * Constructors allow null for all arguments > * Some of the test code doesn't check for correct error codes to correspond > with the wrapped exception type > * methods don't document when they might return null -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11656: -- Labels: classloading classpath dependencies scripts shell (was: classloading classpath dependencies shell) > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies, scripts, shell > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with e.g. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11656: -- Labels: classloading classpath dependencies shell (was: classloading classpath dependencies) > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies, shell > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347566#comment-14347566 ] Allen Wittenauer commented on HADOOP-11656: --- FYI, I'm adding the 'shell' label because regardless of the outcome, this will almost certainly have an impact on how the various classpath commands and shellprofile.d code works in the future. > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies, shell > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11613) Remove httpclient dependency from hadoop-azure
[ https://issues.apache.org/jira/browse/HADOOP-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347560#comment-14347560 ] Brahma Reddy Battula commented on HADOOP-11613: --- *Testcase failures* are because of {{encodedKey = URLEncoder.encode(key, "UTF-8");}}, which has limitations around special characters (all characters outside its safe set are first converted into one or more bytes using the given encoding scheme; see the Java doc: https://docs.oracle.com/javase/6/docs/api/java/net/URLEncoder.html). Once I replaced it with a BitSet-based encoding (like the following), all the test cases pass; that is why my initial patch used a BitSet. {code} byte[] rawdata = URLCodec.encodeUrl(allowed_abs_path, EncodingUtils.getBytes(key, "UTF-8")); String encodedKey = EncodingUtils.getAsciiString(rawdata); {code} [~ajisakaa], if you agree, please consider the initial patch, which uses the BitSet (with it, all the test cases pass). Please correct me if I am wrong. > Remove httpclient dependency from hadoop-azure > -- > > Key: HADOOP-11613 > URL: https://issues.apache.org/jira/browse/HADOOP-11613 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11613-001.patch, HADOOP-11613-002.patch, > HADOOP-11613-003.patch, HADOOP-11613.patch > > > Remove httpclient dependency from MockStorageInterface.java. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
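For context, a minimal self-contained sketch of the difference described in the comment above. The sample key and the exact bits placed in {{allowed_abs_path}} are assumptions for illustration; only the {{URLCodec}}/{{EncodingUtils}} calls come from the comment itself.

{code}
import java.net.URLEncoder;
import java.util.BitSet;
import org.apache.commons.codec.net.URLCodec;
import org.apache.commons.httpclient.util.EncodingUtils;

public class BlobKeyEncodingSketch {
  public static void main(String[] args) throws Exception {
    // URLEncoder percent-encodes everything outside its fixed safe set,
    // so '/' in a blob key becomes %2F and ' ' becomes '+':
    System.out.println(URLEncoder.encode("dir/file name", "UTF-8"));
    // prints: dir%2Ffile+name

    // URLCodec.encodeUrl takes a caller-supplied BitSet of safe bytes,
    // so the path separator can be left alone. These bits are chosen
    // for illustration, not the patch's exact allowed_abs_path set.
    BitSet allowed_abs_path = new BitSet(256);
    for (char c = 'a'; c <= 'z'; c++) allowed_abs_path.set(c);
    for (char c = 'A'; c <= 'Z'; c++) allowed_abs_path.set(c);
    for (char c = '0'; c <= '9'; c++) allowed_abs_path.set(c);
    allowed_abs_path.set('/');

    byte[] rawdata = URLCodec.encodeUrl(allowed_abs_path,
        EncodingUtils.getBytes("dir/file name", "UTF-8"));
    System.out.println(EncodingUtils.getAsciiString(rawdata));
    // prints: dir/file%20name
  }
}
{code}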
[jira] [Commented] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347538#comment-14347538 ] Hadoop QA commented on HADOOP-11659: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702588/HADOOP-11659.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5846//console This message is automatically generated. > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HADOOP-11659.patch > > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
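To make the change concrete, a minimal sketch of the single-lookup variant from the description above. The {{Key}} type, the {{map}} field, and the restore-on-mismatch branch are stand-ins; the real {{FileSystem.Cache#remove}} also maintains additional bookkeeping.

{code}
// Sketch only, under the assumptions stated above.
synchronized void remove(Key key, FileSystem fs) {
  FileSystem cachedFs = map.remove(key);  // single hash lookup
  if (cachedFs == null || cachedFs == fs) {
    return;               // removed the expected entry, or there was none
  }
  map.put(key, cachedFs); // a different instance was cached: put it back
}
{code}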
[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk
[ https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347500#comment-14347500 ] Hadoop QA commented on HADOOP-11627: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702595/HADOOP-11627-004.patch against trunk revision ed70fa1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5847//console This message is automatically generated. > Remove io.native.lib.available from trunk > - > > Key: HADOOP-11627 > URL: https://issues.apache.org/jira/browse/HADOOP-11627 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, > HADOOP-11627-004.patch, HADOOP-11627.patch > > > According to the discussion in HADOOP-8642, we should remove > {{io.native.lib.available}} from trunk, and always use native libraries if > they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
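For context on the proposal at the end of the message above ("always use native libraries if they exist"), a minimal sketch of the check callers could rely on once the key is gone. {{NativeCodeLoader.isNativeCodeLoaded()}} is the existing hadoop-common API; the branch bodies are placeholders.

{code}
import org.apache.hadoop.util.NativeCodeLoader;

public class NativeCheckSketch {
  public static void main(String[] args) {
    // With io.native.lib.available removed, the loader alone decides:
    if (NativeCodeLoader.isNativeCodeLoaded()) {
      System.out.println("libhadoop loaded: take the native code path");
    } else {
      System.out.println("no native library: fall back to pure Java");
    }
  }
}
{code}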
[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk
[ https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347479#comment-14347479 ] Brahma Reddy Battula commented on HADOOP-11627: --- Ran all the test cases for regression locally; all are passing. > Remove io.native.lib.available from trunk > - > > Key: HADOOP-11627 > URL: https://issues.apache.org/jira/browse/HADOOP-11627 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, > HADOOP-11627-004.patch, HADOOP-11627.patch > > > According to the discussion in HADOOP-8642, we should remove > {{io.native.lib.available}} from trunk, and always use native libraries if > they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11627) Remove io.native.lib.available from trunk
[ https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347469#comment-14347469 ] Brahma Reddy Battula commented on HADOOP-11627: --- Thanks a lot for the review. Please check the updated patch. > Remove io.native.lib.available from trunk > - > > Key: HADOOP-11627 > URL: https://issues.apache.org/jira/browse/HADOOP-11627 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, > HADOOP-11627-004.patch, HADOOP-11627.patch > > > According to the discussion in HADOOP-8642, we should remove > {{io.native.lib.available}} from trunk, and always use native libraries if > they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk
[ https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11627: -- Attachment: HADOOP-11627-004.patch > Remove io.native.lib.available from trunk > - > > Key: HADOOP-11627 > URL: https://issues.apache.org/jira/browse/HADOOP-11627 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, > HADOOP-11627-004.patch, HADOOP-11627.patch > > > According to the discussion in HADOOP-8642, we should remove > {{io.native.lib.available}} from trunk, and always use native libraries if > they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11627) Remove io.native.lib.available from trunk
[ https://issues.apache.org/jira/browse/HADOOP-11627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11627: -- Status: Patch Available (was: In Progress) > Remove io.native.lib.available from trunk > - > > Key: HADOOP-11627 > URL: https://issues.apache.org/jira/browse/HADOOP-11627 > Project: Hadoop Common > Issue Type: Improvement >Affects Versions: 3.0.0 >Reporter: Akira AJISAKA >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11627-002.patch, HADOOP-11627-003.patch, > HADOOP-11627-004.patch, HADOOP-11627.patch > > > According to the discussion in HADOOP-8642, we should remove > {{io.native.lib.available}} from trunk, and always use native libraries if > they exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347446#comment-14347446 ] Steve Loughran commented on HADOOP-11656: - [~saint@gmail.com], as someone downstream, I know you know the situation we have now; everyone who goes downstream experiences this, with HBase and Oozie being core pain points. Not exposing the transitive dependencies means that you can stop worrying about what version of Guava or protobuf is used by Hadoop, leaving only our consistent semantics to maintain. The native lib problem will mean no more than one version of the hadoop JARs can be reliably loaded. Now, unless I'm confused about how classloaders bootstrap, it has to be done in an order; classloader above classloader, with OSGi doing some magic at startup so the first CL can pick up stuff from external CLs and make them visible to others. Does this mean that adoption of the new CL is a whole new startup process? If so, it is going to be visible to everything downstream. Now, we could design YARN-679 to be ready for this, so if you adopt that as the launcher for your app then you can get the CL setup in there. But what about every single client app that wants to talk HDFS? We may be able to go to HBase & Accumulo & say "new launcher", maybe go to Spark and say "your AM needs to do this", but it's harder to say "your general purpose code to read off HDFS must now use our CL chain to work". Especially for the use case "webapp running in Tomcat with the classloader isolation of Java EE". Things like that aren't going to work if we start imposing a new CL; they will need to flip the switch to say no dependency magic. So why is this being proposed as "on-by-default"? And, since there isn't a clear proposal yet, are we trying to define that we should be incompatible from the outset? Please: give us a proposal, let's work towards an implementation, and actually test this downstream including in an Oozie version (hence Tomcat tests), in-cluster apps, and remote client apps. Then we can consider whether or not it would be justifiable to say "you must do this to move to Hadoop 3". Oh, and given the schedules, we should start planning for Java 9 & Jigsaw... > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version.
> We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
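As a concrete reference for the "classloader above classloader" ordering discussed above, a minimal child-first classloader sketch. This is illustrative only: it is not the classloader Hadoop ships, and it omits the system-class filtering (java.*, the public Hadoop API, and so on) that any real isolation scheme would need.

{code}
import java.net.URL;
import java.net.URLClassLoader;

// Child-first delegation: consult our own URLs before asking the parent.
public class ChildFirstClassLoader extends URLClassLoader {
  public ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve)
      throws ClassNotFoundException {
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          c = findClass(name);              // our URLs first
        } catch (ClassNotFoundException e) {
          c = super.loadClass(name, false); // then normal parent delegation
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
{code}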
[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11659: -- Status: Patch Available (was: Open) Attached the patch. It does not add new test cases, but I executed the affected test cases for regression and all are passing. > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HADOOP-11659.patch > > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port
[ https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347425#comment-14347425 ] Hadoop QA commented on HADOOP-11618: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702572/HADOOP-11618-002.patch against trunk revision 03cc229. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5845//console This message is automatically generated. > DelegateToFileSystem always uses default FS's default port > --- > > Key: HADOOP-11618 > URL: https://issues.apache.org/jira/browse/HADOOP-11618 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, > HADOOP-11618.patch > > > DelegateToFileSystem constructor has the following code: > {code} > super(theUri, supportedScheme, authorityRequired, > FileSystem.getDefaultUri(conf).getPort()); > {code} > The default port should be taken from theFsImpl instead. > {code} > super(theUri, supportedScheme, authorityRequired, > theFsImpl.getDefaultPort()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11659: -- Attachment: HADOOP-11659.patch > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HADOOP-11659.patch > > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11659: -- Attachment: (was: HADOOP-11653.patch) > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Minor > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11659: -- Priority: Minor (was: Trivial) > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Minor > Attachments: HADOOP-11653.patch > > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11659) o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup
[ https://issues.apache.org/jira/browse/HADOOP-11659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11659: -- Attachment: HADOOP-11653.patch > o.a.h.fs.FileSystem.Cache#remove should use a single hash map lookup > > > Key: HADOOP-11659 > URL: https://issues.apache.org/jira/browse/HADOOP-11659 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula >Priority: Trivial > Attachments: HADOOP-11653.patch > > > The method looks up the same key in the same hash map potentially 3 times > {code} > if (map.containsKey(key) && fs == map.get(key)) { > map.remove(key) > {code} > Instead it could do a single lookup > {code} > FileSystem cachedFs = map.remove(key); > {code} > and then test cachedFs == fs or something else. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port
[ https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347327#comment-14347327 ] Brahma Reddy Battula commented on HADOOP-11618: --- Thanks a lot for the review. {quote} In both cases, we are going to assert that ftpFs.getUri() results in ftp://dummy-host:21 {quote} The default port will not be returned when the URI already has a port; only when port == -1 (i.e., no port is configured) will the default port be used. Please check the following code: {code} private URI getUri(URI uri, String supportedScheme, boolean authorityNeeded, int defaultPort) throws URISyntaxException { checkScheme(uri, supportedScheme); // A file system implementation that requires authority must always // specify default port if (defaultPort < 0 && authorityNeeded) { throw new HadoopIllegalArgumentException( "FileSystem implementation error - default port " + defaultPort + " is not valid"); } String authority = uri.getAuthority(); if (authority == null) { if (authorityNeeded) { throw new HadoopIllegalArgumentException("Uri without authority: " + uri); } else { return new URI(supportedScheme + ":///"); } } // authority is non null - AuthorityNeeded may be true or false. int port = uri.getPort(); port = (port == -1 ? defaultPort : port); if (port == -1) { // no port supplied and default port is not specified return new URI(supportedScheme, authority, "/", null); } return new URI(supportedScheme + "://" + uri.getHost() + ":" + port); } {code} The 001 patch also calls only this method. Anyway, I have updated the patch; kindly review it. > DelegateToFileSystem always uses default FS's default port > --- > > Key: HADOOP-11618 > URL: https://issues.apache.org/jira/browse/HADOOP-11618 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, > HADOOP-11618.patch > > > DelegateToFileSystem constructor has the following code: > {code} > super(theUri, supportedScheme, authorityRequired, > FileSystem.getDefaultUri(conf).getPort()); > {code} > The default port should be taken from theFsImpl instead. > {code} > super(theUri, supportedScheme, authorityRequired, > theFsImpl.getDefaultPort()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
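A small sketch of the port-resolution rule the quoted {{getUri}} code implements; the host and ports here are hypothetical examples.

{code}
import java.net.URI;

public class PortRuleSketch {
  public static void main(String[] args) throws Exception {
    // An explicit port always wins: getUri() keeps 2121 and never
    // consults the file system's default port.
    URI withPort = new URI("ftp://dummy-host:2121");
    System.out.println(withPort.getPort()); // 2121

    // Only when no port is supplied (getPort() == -1) does getUri()
    // substitute the default -- 21 for the ftp file system.
    URI noPort = new URI("ftp://dummy-host");
    System.out.println(noPort.getPort());   // -1
  }
}
{code}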
[jira] [Updated] (HADOOP-11618) DelegateToFileSystem always uses default FS's default port
[ https://issues.apache.org/jira/browse/HADOOP-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HADOOP-11618: -- Attachment: HADOOP-11618-002.patch > DelegateToFileSystem always uses default FS's default port > --- > > Key: HADOOP-11618 > URL: https://issues.apache.org/jira/browse/HADOOP-11618 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 2.6.0 >Reporter: Gera Shegalov >Assignee: Brahma Reddy Battula > Attachments: HADOOP-11618-001.patch, HADOOP-11618-002.patch, > HADOOP-11618.patch > > > DelegateToFileSystem constructor has the following code: > {code} > super(theUri, supportedScheme, authorityRequired, > FileSystem.getDefaultUri(conf).getPort()); > {code} > The default port should be taken from theFsImpl instead. > {code} > super(theUri, supportedScheme, authorityRequired, > theFsImpl.getDefaultPort()); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347298#comment-14347298 ] Arun C Murthy commented on HADOOP-11656: Agree 1000% with [~jlowe]. Starting with the thesis that we should break compat is less than ideal - we should certainly strive to add features in a compatible manner; this allows all existing users to consume the feature without the need to make a *should I use this or not* choice. > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347234#comment-14347234 ] stack commented on HADOOP-11656: bq. To add, I think we can and should strive for doing this in a compatible manner, whatever the approach. Sure. Sounds good, if possible at all, though it will also be a load of work proving the changes are indeed compatible. bq. Marking and calling it incompatible before we see proposal/patch seems premature to me. I'd suggest you open a new issue to do classpath isolation in a 'compatible manner' rather than add this imposition here. In this issue, the reporter thinks it a breaking change ("At a minimum we'll break dependency compatibility and operational compatibility."). The two issues can move along independently of each other. And to be clear, when we talk 'compatible manner', the expectation is that downstream apps, for example HBase, should be able to move from hadoop-2.X to hadoop-2.Y without breakage, right? That is, in spite of shading, new locations for dependencies, cleaned up exposure of libs likely transitively included, etc., there will be no need for downstreamers to add in new compensatory code, no need for us to release special versions to work with hadoop-2.Z, and no need for callouts in code or for us to educate our communities that "if on hadoop-2.X do this... but if on hadoop-2.Y do that"? Or are we talking about something else? (And "downstreamers, you are doing it wrong" is not allowed.) > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347217#comment-14347217 ] Jason Lowe commented on HADOOP-11656: - bq. There are plenty of ways we can make the transition easier for downstream folks. I've already mentioned giving upgrade docs that include maven pom changes needed to get the same set of dependencies. As you mention, we could also include some option toggle that says "I want to see the framework libraries." I happen to think this is a bad idea because it leads straight back to where we are now. In any case, either of these mitigations require downstream projects to change what they are doing, which sounds incompatible to me. I think the idea here is to flip the defaults around. The easiest transition for existing downstream folks is to opt in, rather than opt out, of classpath isolation. We can debate whether that's custom classloaders, OSGi packaging, or what-not when it's turned on. But if not turned on by default then it is backwards compatible, to the extent that we support backwards compatibility today. Clients/jobs that ran before continue to run on the new version. Those that want/need the isolation can ask for it, and we can iterate the isolation feature without necessarily breaking the existing users that aren't asking for it because it didn't exist back then and would break their old workflow if it suddenly does. At some point in the future we can (and probably want to) switch the defaults so clients/apps get classpath isolation by default. I totally agree that decision necessarily breaks backwards compatibility. IMHO the smoothest transition for major features, this or otherwise, is to develop the feature if possible as opt in, rather than opt out, until it is mature, stable, and the community agrees it should be active by default. Some features are such that they inherently cannot be turned off, but if possible it'd be great to develop and mature them as options that people can try out until they become stable to ease transitions and avoid unnecessary breakage at an early stage. > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. 
> We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11668) start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell option
[ https://issues.apache.org/jira/browse/HADOOP-11668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HADOOP-11668: -- Resolution: Duplicate Status: Resolved (was: Patch Available) Closing this in favor of HADOOP-11590, which rewrites these scripts. > start-dfs.sh and stop-dfs.sh no longer works in HA mode after --slaves shell > option > --- > > Key: HADOOP-11668 > URL: https://issues.apache.org/jira/browse/HADOOP-11668 > Project: Hadoop Common > Issue Type: Bug > Components: scripts >Reporter: Vinayakumar B >Assignee: Vinayakumar B > Attachments: HADOOP-11668-01.patch > > > After introduction of "--slaves" option for the scripts, start-dfs.sh and > stop-dfs.sh will no longer work in HA mode. > This is due to multiple hostnames passed for '--hostnames' delimited with > space. > These hostnames are treated as commands and script fails. > So, instead of delimiting with space, delimiting with comma(,) before passing > to hadoop-daemons.sh will solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347188#comment-14347188 ] Sean Busbey commented on HADOOP-11656: -- {quote} One troublespot, even with that tactic, is shown by HADOOP-11064: "UnsatisifedLinkError with hadoop 2.4 JARs on hadoop-2.6 due to NativeCRC32 method changes". Changes in the internal JNI bindings meant that no hadoop-2.4 app (like HBase) would run in a Hadoop 2.6-alpha cluster. We were lucky that I got to find that before 2.6 shipped, otherwise we'd have a lot of complaints. The problem here is that even with HBase isolated on classpath, it was picking up the hadoop-native binaries from somewhere on PATH/LIB or whatever, and so failing to link. Classloader isolation & shading isn't going to be sufficient here. HADOOP-11127 proposes some versioning, which will help —but I don't think it will let us load >1 hadoop.lib into a JVM. As a result, the only version of hadoop-common.jar which can be reliably loaded into a process is the one that is in sync with the version of the native library on the target machine. {quote} Yes, native library support is an entire additional can of worms. For this improvement I'd prefer to leave that to future work, if only because the JVM doesn't really offer options. Perhaps docs that cover the limitations of what isolation we offer would be a good start. > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11656) Classpath isolation for downstream clients
[ https://issues.apache.org/jira/browse/HADOOP-11656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347174#comment-14347174 ] Sean Busbey commented on HADOOP-11656: -- I don't see how we can do this compatibly. Even defaulting to use the application classloader will break some downstream projects. Certainly going a step further to make sure we also only expose our API to them, whether via an OSGi container or not, will break even more of them. I can understand the desire to have a compatible version of this in the 2.x line. Probably the option to have it off would make the most sense for that. However, this kind of isolation is something we _should_ be doing. The reason to focus first on a breaking version is so that doing things correctly is staked to some point in the future. There are plenty of ways we can make the transition easier for downstream folks. I've already mentioned giving upgrade docs that include maven pom changes needed to get the same set of dependencies. As you mention, we could also include some option toggle that says "I want to see the framework libraries." I happen to think this is a bad idea because it leads straight back to where we are now. In any case, either of these mitigations requires downstream projects to change what they are doing, which sounds incompatible to me. > Classpath isolation for downstream clients > -- > > Key: HADOOP-11656 > URL: https://issues.apache.org/jira/browse/HADOOP-11656 > Project: Hadoop Common > Issue Type: New Feature >Reporter: Sean Busbey >Assignee: Sean Busbey > Labels: classloading, classpath, dependencies > > Currently, Hadoop exposes downstream clients to a variety of third party > libraries. As our code base grows and matures we increase the set of > libraries we rely on. At the same time, as our user base grows we increase > the likelihood that some downstream project will run into a conflict while > attempting to use a different version of some library we depend on. This has > already happened with i.e. Guava several times for HBase, Accumulo, and Spark > (and I'm sure others). > While YARN-286 and MAPREDUCE-1700 provided an initial effort, they default to > off and they don't do anything to help dependency conflicts on the driver > side or for folks talking to HDFS directly. This should serve as an umbrella > for changes needed to do things thoroughly on the next major version. > We should ensure that downstream clients > 1) can depend on a client artifact for each of HDFS, YARN, and MapReduce that > doesn't pull in any third party dependencies > 2) only see our public API classes (or as close to this as feasible) when > executing user provided code, whether client side in a launcher/driver or on > the cluster in a container or within MR. > This provides us with a double benefit: users get less grief when they want > to run substantially ahead or behind the versions we need and the project is > freer to change our own dependency versions because they'll no longer be in > our compatibility promises. > Project specific task jiras to follow after I get some justifying use cases > written in the comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11669) Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class
[ https://issues.apache.org/jira/browse/HADOOP-11669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347107#comment-14347107 ] Hadoop QA commented on HADOOP-11669: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12702457/001-HADOOP-11669.patch against trunk revision 3560180. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/5844//console This message is automatically generated. > Move the Hadoop constants in HTTPServer.java to CommonConfigurationKeys class > - > > Key: HADOOP-11669 > URL: https://issues.apache.org/jira/browse/HADOOP-11669 > Project: Hadoop Common > Issue Type: Improvement >Reporter: nijel >Assignee: nijel >Priority: Minor > Attachments: 0001-HDFS-7883.patch, 001-HADOOP-11669.patch > > > These two configurations in HttpServer2.java are Hadoop configurations. > {code} > static final String FILTER_INITIALIZER_PROPERTY > = "hadoop.http.filter.initializers"; > public static final String HTTP_MAX_THREADS = "hadoop.http.max.threads"; > {code} > It is better to keep them inside CommonConfigurationKeys -- This message was sent by Atlassian JIRA (v6.3.4#6332)
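A minimal sketch of the proposed move. Only the two key strings come from the issue; the constant names and javadoc below are assumptions for illustration.

{code}
// Hypothetical placement inside CommonConfigurationKeys.
public class CommonConfigurationKeys extends CommonConfigurationKeysPublic {
  /** Filter initializers applied to HttpServer2 instances. */
  public static final String HADOOP_HTTP_FILTER_INITIALIZERS_KEY =
      "hadoop.http.filter.initializers";
  /** Maximum number of request threads for HttpServer2. */
  public static final String HADOOP_HTTP_MAX_THREADS_KEY =
      "hadoop.http.max.threads";
}
{code}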
[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture
[ https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Nevill updated HADOOP-11660: --- Attachment: (was: jira-11660.patch) > Add support for hardware crc on ARM aarch64 architecture > > > Key: HADOOP-11660 > URL: https://issues.apache.org/jira/browse/HADOOP-11660 > Project: Hadoop Common > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 > Environment: ARM aarch64 development platform >Reporter: Edward Nevill >Assignee: Edward Nevill >Priority: Minor > Labels: performance > Original Estimate: 48h > Remaining Estimate: 48h > > This patch adds support for hardware crc for ARM's new 64 bit architecture > The patch is completely conditionalized on __aarch64__ > I have only added support for the non pipelined version as I benchmarked the > pipelined version on aarch64 and it showed no performance improvement. > The aarch64 version supports both Castagnoli and Zlib CRCs as both of these > are supported on ARM aarch64 hardware. > To benchmark this I modified the test_bulk_crc32 test to print out the time > taken to CRC a 1MB dataset 1000 times. > Before: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > After: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > So this represents a 5X performance improvement on raw CRC calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HADOOP-11660) Add support for hardware crc on ARM aarch64 architecture
[ https://issues.apache.org/jira/browse/HADOOP-11660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Nevill updated HADOOP-11660: --- Status: Open (was: Patch Available) Patch to be replaced with a version which does pipelining > Add support for hardware crc on ARM aarch64 architecture > > > Key: HADOOP-11660 > URL: https://issues.apache.org/jira/browse/HADOOP-11660 > Project: Hadoop Common > Issue Type: Improvement > Components: native >Affects Versions: 3.0.0 > Environment: ARM aarch64 development platform >Reporter: Edward Nevill >Assignee: Edward Nevill >Priority: Minor > Labels: performance > Attachments: jira-11660.patch > > Original Estimate: 48h > Remaining Estimate: 48h > > This patch adds support for hardware crc for ARM's new 64 bit architecture > The patch is completely conditionalized on __aarch64__ > I have only added support for the non pipelined version as I benchmarked the > pipelined version on aarch64 and it showed no performance improvement. > The aarch64 version supports both Castagnoli and Zlib CRCs as both of these > are supported on ARM aarch64 hardware. > To benchmark this I modified the test_bulk_crc32 test to print out the time > taken to CRC a 1MB dataset 1000 times. > Before: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 2.55 > After: > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > CRC 1048576 bytes @ 512 bytes per checksum X 1000 iterations = 0.57 > So this represents a 5X performance improvement on raw CRC calculation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HADOOP-11183) Memory-based S3AOutputstream
[ https://issues.apache.org/jira/browse/HADOOP-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347034#comment-14347034 ] Hudson commented on HADOOP-11183: - SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/]) HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) (stevel: rev 15b7076ad5f2ae92d231140b2f8cebc392a92c87) * hadoop-common-project/hadoop-common/src/main/resources/core-default.xml * hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java * hadoop-common-project/hadoop-common/CHANGES.txt * hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/TestS3AFastOutputStream.java * hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md * hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java * hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFastOutputStream.java > Memory-based S3AOutputstream > > > Key: HADOOP-11183 > URL: https://issues.apache.org/jira/browse/HADOOP-11183 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 2.6.0 >Reporter: Thomas Demoor >Assignee: Thomas Demoor > Fix For: 2.7.0 > > Attachments: HADOOP-11183-004.patch, HADOOP-11183-005.patch, > HADOOP-11183-006.patch, HADOOP-11183-007.patch, HADOOP-11183-008.patch, > HADOOP-11183-009.patch, HADOOP-11183-010.patch, HADOOP-11183.001.patch, > HADOOP-11183.002.patch, HADOOP-11183.003.patch, design-comments.pdf > > > Currently s3a buffers files on disk(s) before uploading. This JIRA > investigates adding a memory-based upload implementation. > The motivation is evidently performance: this would be beneficial for users > with high network bandwidth to S3 (EC2?) or users that run Hadoop directly on > an S3-compatible object store (FYI: my contributions are made in name of > Amplidata). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
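For reference, a minimal sketch of how a client might opt in to the memory-based stream committed above. Treat the {{fs.s3a.fast.upload}} key as an assumption here; the committed index.md and Constants.java are the authoritative source for the name.

{code}
import org.apache.hadoop.conf.Configuration;

public class FastUploadSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed opt-in switch for S3AFastOutputStream: buffer multipart
    // parts in memory instead of on local disk before uploading.
    conf.setBoolean("fs.s3a.fast.upload", true);
  }
}
{code}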
[jira] [Commented] (HADOOP-6857) FsShell should report raw disk usage including replication factor
[ https://issues.apache.org/jira/browse/HADOOP-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347032#comment-14347032 ] Hudson commented on HADOOP-6857: SUCCESS: Integrated in Hadoop-Mapreduce-trunk #2072 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2072/]) Move HADOOP-6857 to 3.0.0. (aajisaka: rev 29bb6898654199a809f1c3e8e536a63fb0d4f073) * hadoop-common-project/hadoop-common/CHANGES.txt > FsShell should report raw disk usage including replication factor > - > > Key: HADOOP-6857 > URL: https://issues.apache.org/jira/browse/HADOOP-6857 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Reporter: Alex Kozlov >Assignee: Byron Wong > Fix For: 3.0.0 > > Attachments: HADOOP-6857-revert.patch, HADOOP-6857.patch, > HADOOP-6857.patch, HADOOP-6857.patch, revert-HADOOP-6857-from-branch-2.patch, > show-space-consumed.txt > > > Currently FsShell reports HDFS usage with the "hadoop fs -dus " command. > Since replication level is per file level, it would be nice to add raw disk > usage including the replication factor (maybe "hadoop fs -dus -raw "?). > This will allow assessing resource usage more accurately. -- Alex K -- This message was sent by Atlassian JIRA (v6.3.4#6332)