[jira] [Updated] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-01 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HDFS-7866:
-
Attachment: HDFS-7866.11.patch

Thanks Zhe for the explanations!
Updated the patch accordingly.

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, 
> HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, 
> HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled approach. The provided facilities will be 
> used to implement DFSAdmin commands so admins can list the available EC 
> schemas and choose some of them for target EC zones.
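To make the description concrete, here is a minimal, hypothetical sketch of a NameNode-side registry of predefined policies (the class, names and IDs are all assumptions for illustration, not the actual patch):
{code}
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a registry of predefined EC policies. */
public class EcPolicyRegistry {
  // Keyed by policy ID so the NameNode can resolve the ID stored in a
  // file header back to the full policy definition.
  private final Map<Byte, String> policiesById = new HashMap<>();

  public EcPolicyRegistry() {
    // Predefined system policies; the IDs and names here are made up.
    policiesById.put((byte) 1, "RS-6-3-64k");
    policiesById.put((byte) 2, "RS-3-2-64k");
  }

  /** What a "list available EC schemas" DFSAdmin command could surface. */
  public Collection<String> listPolicies() {
    return policiesById.values();
  }

  public String getPolicy(byte id) {
    return policiesById.get(id);
  }
}
{code}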



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175128#comment-15175128
 ] 

Zhe Zhang commented on HDFS-7866:
-

And yes, {{RS_6_3_POLICY_ID}} sounds good. If we later support multiple cell 
sizes we can reflect that in naming as well.
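For illustration, such constants might look like the following (a hedged sketch only; the actual names and values chosen in {{HdfsConstants}} may differ):
{code}
// Hypothetical sketch: IDs for built-in policies, kept as byte values so a
// policy ID fits in the redundancy section of the INodeFile header.
public static final byte RS_6_3_POLICY_ID = 0;
public static final byte RS_3_2_POLICY_ID = 1;
{code}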

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, 
> HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled approach. The provided facilities will be 
> used to implement DFSAdmin commands so admins can list the available EC 
> schemas and choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175124#comment-15175124
 ] 

Hudson commented on HDFS-9851:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9410 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9410/])
HDFS-9851. NameNode throws NPE when setPermission is called on a path 
(aajisaka: rev 27e0681f28ee896ada163bbbc08fd44d113e7d15)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/security/TestPermission.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> Name node throws NPE when setPermission is called on a path that does not 
> exist
> ---
>
> Key: HDFS-9851
> URL: https://issues.apache.org/jira/browse/HDFS-9851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: David Yan
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, 
> HDFS-9851.patch
>
>
> Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when 
> setPermission is called on a path that does not exist:
> {code}
> 16/02/23 16:37:03.888 DEBUG 
> security.UserGroupInformation:FSPermissionChecker.ja
> va:164 - ACCESS CHECK: 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, 
> doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, 
> subAccess=null, ignoreEmptyDir=false
> 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: 
> setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException
> 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 
> on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission 
> from 127.0.0.1:36190 Call#21 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
> I don't see this problem with Hadoop 2.6.x.
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 
> libraries.
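For readers following the trace: {{checkOwner}} dereferences an inode that was never resolved. A hedged sketch of the kind of guard that avoids the NPE (names are illustrative, not necessarily the committed patch):
{code}
// Resolve the path first and fail with a clear error instead of an NPE
// when the target does not exist.
INode inode = iip.getLastINode();
if (inode == null) {
  throw new FileNotFoundException(
      "Directory/file does not exist: " + iip.getPath());
}
checkOwner(pc, iip);  // safe: the inode is known to exist here
{code}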



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175122#comment-15175122
 ] 

Zhe Zhang commented on HDFS-7866:
-

Sorry for the confusion, Rui. I meant a text-based illustration like the one 
below. I think it will be useful to illustrate the header format for both 
contiguous and striped blocks. If it's still confusing, feel free to skip it 
and I'll be happy to do it as a follow-on.
{code}
// StripedBlockUtil
 *  | <----  Block Group ----> |   <- Block Group: logical unit composing
 *  |                          |        striped HDFS files.
 *  blk_0      blk_1       blk_2   <- Internal Blocks: each internal block
 *    |          |           |          represents a physically stored local
 *    v          v           v          block file
 * +------+   +------+   +------+
 * |cell_0|   |cell_1|   |cell_2|  <- {@link StripingCell} represents the
 * +------+   +------+   +------+       logical order that a Block Group should
 * |cell_3|   |cell_4|   |cell_5|       be accessed: cell_0, cell_1, ...
 * +------+   +------+   +------+
 * |cell_6|   |cell_7|   |cell_8|
 * +------+   +------+   +------+
 * |cell_9|
 * +------+  <- A cell contains cellSize bytes of data
{code}
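As a small companion to the diagram, the round-robin cell order it shows can be expressed directly (3 data blocks assumed, matching the picture):
{code}
// cell_i lands in internal block (i % numDataBlocks): blk_0, blk_1, blk_2, ...
int numDataBlocks = 3;
for (int cell = 0; cell < 10; cell++) {
  System.out.println("cell_" + cell + " -> blk_" + (cell % numDataBlocks));
}
{code}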

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, 
> HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled approach. The provided facilities will be 
> used to implement DFSAdmin commands so admins can list the available EC 
> schemas and choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently

2016-03-01 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175110#comment-15175110
 ] 

Xiao Chen commented on HDFS-9766:
-

Thanks [~ajisakaa] and [~liuml07].

> TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
> --
>
> Key: HDFS-9766
> URL: https://issues.apache.org/jira/browse/HDFS-9766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9766.01.patch
>
>
> *Stacktrace*
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289)
> {code}
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt
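A hedged sketch of one common way to de-flake such a timing-metric assertion: poll the counter until it moves instead of asserting immediately after the I/O (an assumed approach with an assumed metric name, not necessarily the committed patch):
{code}
// Check every 30 ms, give up after 60 s, instead of asserting right away.
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    // MetricsAsserts helpers; "TotalWriteTime" is an assumed metric name.
    MetricsRecordBuilder rb = getMetrics(datanode.getMetrics().name());
    return getLongCounter("TotalWriteTime", rb) > 0;
  }
}, 30, 60000);
{code}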



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist

2016-03-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175108#comment-15175108
 ] 

Brahma Reddy Battula commented on HDFS-9851:


Thanks for the review and commit.

> Name node throws NPE when setPermission is called on a path that does not 
> exist
> ---
>
> Key: HDFS-9851
> URL: https://issues.apache.org/jira/browse/HDFS-9851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: David Yan
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, 
> HDFS-9851.patch
>
>
> Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when 
> setPermission is called on a path that does not exist:
> {code}
> 16/02/23 16:37:03.888 DEBUG 
> security.UserGroupInformation:FSPermissionChecker.ja
> va:164 - ACCESS CHECK: 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, 
> doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, 
> subAccess=null, ignoreEmptyDir=false
> 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: 
> setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException
> 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 
> on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission 
> from 127.0.0.1:36190 Call#21 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
> I don't see this problem with Hadoop 2.6.x.
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 
> libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9883) Replace the hard-code value to variable

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175092#comment-15175092
 ] 

Hadoop QA commented on HDFS-9883:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
40s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 
57s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
45s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 25s 
{color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 
509 unchanged - 0 fixed = 510 total (was 509) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 2s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 5s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
21s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 133m 22s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790849/HDFS-9883.001.patch |
| JIRA Issue | HDFS-9883 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  

[jira] [Updated] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist

2016-03-01 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9851:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to branch-2.7 and above. Thanks [~brahmareddy] for the 
contribution and thanks [~liuml07] for the review!

> Name node throws NPE when setPermission is called on a path that does not 
> exist
> ---
>
> Key: HDFS-9851
> URL: https://issues.apache.org/jira/browse/HDFS-9851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: David Yan
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, 
> HDFS-9851.patch
>
>
> Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when 
> setPermission is called on a path that does not exist:
> {code}
> 16/02/23 16:37:03.888 DEBUG 
> security.UserGroupInformation:FSPermissionChecker.ja
> va:164 - ACCESS CHECK: 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, 
> doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, 
> subAccess=null, ignoreEmptyDir=false
> 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: 
> setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException
> 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 
> on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission 
> from 127.0.0.1:36190 Call#21 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
> I don't see this problem with Hadoop 2.6.x.
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 
> libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175087#comment-15175087
 ] 

Hudson commented on HDFS-9766:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9409 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9409/])
HDFS-9766. TestDataNodeMetrics#testDataNodeTimeSpend fails (aajisaka: rev 
e2ddf824694eb4605f3bb04a9c26e4b98529f5bc)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java


> TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
> --
>
> Key: HDFS-9766
> URL: https://issues.apache.org/jira/browse/HDFS-9766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9766.01.patch
>
>
> *Stacktrace*
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289)
> {code}
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist

2016-03-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175073#comment-15175073
 ] 

Akira AJISAKA commented on HDFS-9851:
-

+1, checking this in.

> Name node throws NPE when setPermission is called on a path that does not 
> exist
> ---
>
> Key: HDFS-9851
> URL: https://issues.apache.org/jira/browse/HDFS-9851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: David Yan
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, 
> HDFS-9851.patch
>
>
> Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when 
> setPermission is called on a path that does not exist:
> {code}
> 16/02/23 16:37:03.888 DEBUG 
> security.UserGroupInformation:FSPermissionChecker.ja
> va:164 - ACCESS CHECK: 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, 
> doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, 
> subAccess=null, ignoreEmptyDir=false
> 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: 
> setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException
> 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 
> on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission 
> from 127.0.0.1:36190 Call#21 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
> I don't see this problem with Hadoop 2.6.x.
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 
> libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8475) Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available

2016-03-01 Thread Harshal Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175054#comment-15175054
 ] 

Harshal Joshi commented on HDFS-8475:
-

Hi Team,
Has anyone got a clue about this issue? I am currently facing the same issue 
while inserting data into a Hive table stored with the ORC file format.
Looks like this is a generic issue for the case below:

1. Create external table T1 (cols A, B, C) partitioned on col A, stored as 
ORC. Load the table with substantial data, around 85 GB in my case.

2. Create external table T2 (cols A, B, C) partitioned on col B, stored as 
ORC. Load table T2 from T1 with dynamic partitioning.

Output: Premature EOF exception

Thanks

> Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no 
> length prefix available
> 
>
> Key: HDFS-8475
> URL: https://issues.apache.org/jira/browse/HDFS-8475
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Vinod Valecha
>Priority: Blocker
>
> Scenario:
> =
> Write a file.
> Corrupt a block manually.
> Exception stack trace- 
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Exception in 
> createBlockOutputStream
> java.io.EOFException: Premature EOF: no length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream 
> Exception in createBlockOutputStream
>  java.io.EOFException: Premature EOF: no 
> length prefix available
> at 
> org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088)
> at 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514)
> 2015-05-24 02:31:55.291 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Abandoning 
> BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579
> 2015-05-24 02:31:55.299 INFO [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Excluding datanode 
> 10.108.106.59:50010
> [5/24/15 2:31:55:299 UTC] 02027a3b DFSClient I 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream 
> Excluding datanode 10.108.106.59:50010
> 2015-05-24 02:31:55.300 WARNING [T-33716795] 
> [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
> [5/24/15 2:31:55:300 UTC] 02027a3b DFSClient W 
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run DataStreamer Exception
>  
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag
>  could only be replicated to 0 nodes instead of minReplication (=1).  There 
> are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpc

[jira] [Updated] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently

2016-03-01 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated HDFS-9766:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to branch-2.7 and above. Thanks [~xiaochen] for the contribution 
and thanks [~liuml07] for the review!

> TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
> --
>
> Key: HDFS-9766
> URL: https://issues.apache.org/jira/browse/HDFS-9766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Xiao Chen
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9766.01.patch
>
>
> *Stacktrace*
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289)
> {code}
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist

2016-03-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175050#comment-15175050
 ] 

Brahma Reddy Battula commented on HDFS-9851:


Test failures are unrelated... [~ajisakaa], can you please take a look?

> Name node throws NPE when setPermission is called on a path that does not 
> exist
> ---
>
> Key: HDFS-9851
> URL: https://issues.apache.org/jira/browse/HDFS-9851
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.7.1, 2.7.2
>Reporter: David Yan
>Assignee: Brahma Reddy Battula
>Priority: Critical
> Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, 
> HDFS-9851.patch
>
>
> Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when 
> setPermission is called on a path that does not exist:
> {code}
> 16/02/23 16:37:03.888 DEBUG 
> security.UserGroupInformation:FSPermissionChecker.ja
> va:164 - ACCESS CHECK: 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, 
> doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, 
> subAccess=null, ignoreEmptyDir=false
> 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: 
> setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException
> 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 
> on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission 
> from 127.0.0.1:36190 Call#21 Retry#0
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
> {code}
> I don't see this problem with Hadoop 2.6.x.
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 
> libraries.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9884) Use doxia macro to generate in-page TOC of HDFS site documentation

2016-03-01 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated HDFS-9884:
---
Description: Since maven-site-plugin 3.5 was released, we can use toc macro 
in Markdown.  (was: Since maven-site-plugin 3.5 was releaced, we can use toc 
macro in Markdown.)

> Use doxia macro to generate in-page TOC of HDFS site documentation
> --
>
> Key: HDFS-9884
> URL: https://issues.apache.org/jira/browse/HDFS-9884
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Masatake Iwasaki
>Assignee: Masatake Iwasaki
>
> Since maven-site-plugin 3.5 was released, we can use the toc macro in Markdown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently

2016-03-01 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175041#comment-15175041
 ] 

Akira AJISAKA commented on HDFS-9766:
-

LGTM, +1.

> TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
> --
>
> Key: HDFS-9766
> URL: https://issues.apache.org/jira/browse/HDFS-9766
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Mingliang Liu
>Assignee: Xiao Chen
> Attachments: HDFS-9766.01.patch
>
>
> *Stacktrace*
> {code}
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289)
> {code}
> See recent builds:
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/
> * 
> https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9884) Use doxia macro to generate in-page TOC of HDFS site documentation

2016-03-01 Thread Masatake Iwasaki (JIRA)
Masatake Iwasaki created HDFS-9884:
--

 Summary: Use doxia macro to generate in-page TOC of HDFS site 
documentation
 Key: HDFS-9884
 URL: https://issues.apache.org/jira/browse/HDFS-9884
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.7.0
Reporter: Masatake Iwasaki
Assignee: Masatake Iwasaki


Since maven-site-plugin 3.5 was released, we can use the toc macro in Markdown.
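For reference, the in-page TOC is generated by a macro comment in the Markdown source, roughly like this (syntax per maven-site-plugin 3.5's doxia macro support; the depth attributes are an assumption):
{code}
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
{code}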



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9883) Replace the hard-code value to variable

2016-03-01 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9883:

Status: Patch Available  (was: Open)

Attached a simple patch.

> Replace the hard-code value to variable
> ---
>
> Key: HDFS-9883
> URL: https://issues.apache.org/jira/browse/HDFS-9883
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: HDFS-9883.001.patch
>
>
> In some HDFS classes there are many hard-coded values, like this: 
> {code}
>   /** Constructor 
>* @param bandwidthPerSec bandwidth allowed in bytes per second. 
>*/
>   public DataTransferThrottler(long bandwidthPerSec) {
> this(500, bandwidthPerSec);  // by default throttling period is 500ms
>   }
> {code}
> It would be better to replace these values with named variables so that they 
> are not easily overlooked.
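The refactoring the description asks for would look roughly like this (a sketch; the constant name is an assumption, not taken from the attached patch):
{code}
/** Default throttling period, in milliseconds. */
private static final long DEFAULT_PERIOD_MS = 500;

/** Constructor
 * @param bandwidthPerSec bandwidth allowed in bytes per second.
 */
public DataTransferThrottler(long bandwidthPerSec) {
  this(DEFAULT_PERIOD_MS, bandwidthPerSec);  // named, so it is not overlooked
}
{code}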



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174951#comment-15174951
 ] 

Hudson commented on HDFS-9876:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9408 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9408/])
HDFS-9876. shouldProcessOverReplicated should not count number of (jing9: rev 
f2ba7da4f0df6cf0fc245093aeb4500158e6ee0b)
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Fix For: 3.0.0
>
> Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, 
> HDFS-9876.001.patch
>
>
> Currently, when checking whether we should process an over-replicated block 
> in {{addStoredBlock}}, we count both the reported replicas and the pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among the reported storages of the block. So in a situation where the 
> over-replicated replicas/internal blocks reside only in the pending queue, 
> we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause an endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
>     List<DatanodeStorageInfo> replicasToDelete = placementPolicy
>         .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
>             excessTypes, null, null);
> {code}
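A hedged illustration of the fix direction implied above (method and parameter names are assumed, not the actual patch): the trigger should count only replicas the NameNode has actually received reports for.
{code}
// Pending replicas are invisible to processOverReplicatedBlock(), which
// chooses excess replicas only among reported storages. Counting them in
// the trigger can flag a block that has nothing deletable, which for
// striped blocks leads to the endless loop shown above.
boolean shouldProcessOverReplicated(int reported, int pending, int expected) {
  return reported > expected;  // was: reported + pending > expected
}
{code}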



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9883) Replace the hard-code value to variable

2016-03-01 Thread Lin Yiqun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lin Yiqun updated HDFS-9883:

Attachment: HDFS-9883.001.patch

> Replace the hard-code value to variable
> ---
>
> Key: HDFS-9883
> URL: https://issues.apache.org/jira/browse/HDFS-9883
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Lin Yiqun
>Assignee: Lin Yiqun
>Priority: Minor
> Attachments: HDFS-9883.001.patch
>
>
> In some HDFS classes there are many hard-coded values, like this: 
> {code}
>   /** Constructor 
>* @param bandwidthPerSec bandwidth allowed in bytes per second. 
>*/
>   public DataTransferThrottler(long bandwidthPerSec) {
> this(500, bandwidthPerSec);  // by default throttling period is 500ms
>   }
> {code}
> It would be better to replace these values with named variables so that they 
> are not easily overlooked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9883) Replace the hard-code value to variable

2016-03-01 Thread Lin Yiqun (JIRA)
Lin Yiqun created HDFS-9883:
---

 Summary: Replace the hard-code value to variable
 Key: HDFS-9883
 URL: https://issues.apache.org/jira/browse/HDFS-9883
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.7.1
Reporter: Lin Yiqun
Assignee: Lin Yiqun
Priority: Minor


In some HDFS classes there are many hard-coded values, like this: 
{code}
  /** Constructor 
   * @param bandwidthPerSec bandwidth allowed in bytes per second. 
   */
  public DataTransferThrottler(long bandwidthPerSec) {
this(500, bandwidthPerSec);  // by default throttling period is 500ms
  }
{code}
It would be better to replace these values with named variables so that they 
are not easily overlooked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9876:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the review, Nicholas! The failed tests all passed in my local run. 
I've committed the patch into trunk.

> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Fix For: 3.0.0
>
> Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, 
> HDFS-9876.001.patch
>
>
> Currently, when checking whether we should process an over-replicated block 
> in {{addStoredBlock}}, we count both the reported replicas and the pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among the reported storages of the block. So in a situation where the 
> over-replicated replicas/internal blocks reside only in the pending queue, 
> we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause an endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
>     List<DatanodeStorageInfo> replicasToDelete = placementPolicy
>         .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
>             excessTypes, null, null);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync

2016-03-01 Thread GAO Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174860#comment-15174860
 ] 

GAO Rui commented on HDFS-7661:
---

Hi [~liuml07].

For the data consistency issue, I figured out a scenario under the RS-6-3 EC 
policy:

0. The write client calls flush; we have V1 parity on all of the parity DNs: 
DN0, DN1, DN2.
1. Two IDB (internal data block) DNs fail.
2. The read client reads 4 IDBs, and the V1 parity on DN0.
3. The write client calls flush twice.
4. The read client reads the V3 parity on DN1.
5. The write client calls flush twice again.
6. The read client reads the V5 parity on DN2.
7. The read client has only five internal blocks across all of V1, V3 and V5.
8. The read fails.

This is quite an extreme scenario, but it could still happen. In the current 
design we keep only the overwritten parity data for the last version, and we 
have no lock, which can cause data consistency problems.

I think if a lock in the NN is too heavy, maybe we could maintain the lock in 
the write client. The read client would then get the file info and the write 
client info from the NN, and use the write client's lock to control data 
consistency. Pinging [~szetszwo], [~jingzhao], [~zhz], [~drankye], 
[~walter.k.su] and [~ikki407] for discussion :D
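To make step 7 concrete, a reader combining internal blocks from mixed flush versions could at least detect the inconsistency like this (a hedged sketch; {{InternalBlock}} and {{getFlushVersion}} are hypothetical names, not an existing API):
{code}
// Decoding needs 6 internal blocks of the SAME version under RS-6-3;
// reject a mix of V1/V3/V5 pieces instead of producing corrupt data.
long version = -1;
for (InternalBlock b : fetchedBlocks) {   // hypothetical type
  if (version == -1) {
    version = b.getFlushVersion();        // hypothetical accessor
  } else if (version != b.getFlushVersion()) {
    throw new IOException("Mixed parity versions: " + version
        + " vs " + b.getFlushVersion());
  }
}
{code}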

> Erasure coding: support hflush and hsync
> 
>
> Key: HDFS-7661
> URL: https://issues.apache.org/jira/browse/HDFS-7661
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: GAO Rui
> Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, 
> HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, 
> HDFS-EC-file-flush-sync-design-version1.1.pdf, 
> HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-01 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174861#comment-15174861
 ] 

Rui Li commented on HDFS-7866:
--

Thanks Zhe for the review and comments!

Forgive my ignorance, but what do you mean by ASCII art?

As to the policy IDs, I think we can put them in {{HdfsConstants}} and name 
them like {{RS_6_3_POLICY_ID}}. Sounds good?

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, 
> HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend the NameNode to load, list and sync predefined EC schemas 
> in an authorized and controlled approach. The provided facilities will be 
> used to implement DFSAdmin commands so admins can list the available EC 
> schemas and choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174843#comment-15174843
 ] 

Hadoop QA commented on HDFS-9876:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
3s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
13s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 7s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
31s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 181m 35s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.TestEncryptedTransfer |
|   | hadoop.hdfs.server.datanode.TestBlockReplacement |
|   | hadoop.tracing.TestTracing |
| JDK v1.7.0_95 Failed junit tests | 
hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot |
|   | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure |
|   | hadoop.hdfs.server.namenode.TestEditLog |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patc

[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-9876:
--
Component/s: namenode
 erasure-coding

+1 patch looks good.

> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: erasure-coding, namenode
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, 
> HDFS-9876.001.patch
>
>
> Currently when checking if we should process over-replicated block in 
> {{addStoredBlock}}, we count both the number of reported replicas and pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among all the reported storages of the block. So in a situation where we 
> have over-replicated replica/internal blocks which only reside in the pending 
> queue, we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause an endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
> List<DatanodeStorageInfo> replicasToDelete = placementPolicy
> .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
> excessTypes, null, null);
> {code}
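
(Editor's illustration) A minimal sketch of the check change described above; the 
method and field names are assumptions, not the actual patch:

{code}
// Decide over-replication from reported replicas only. Replicas still in
// the pending queue are excluded, because processOverReplicatedBlock can
// only choose excess replicas among the reported storages of the block.
private boolean shouldProcessOverReplicated(NumberReplicas num,
    int expectedRedundancy) {
  return num.liveReplicas() > expectedRedundancy; // no pending count here
}
{code}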



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies

2016-03-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174741#comment-15174741
 ] 

Zhe Zhang commented on HDFS-7866:
-

Thanks Rui, very nice work here!

I only finished reviewing the {{INodeFile}} header encoding logic. The main 
logic LGTM overall. Just some recommendations on code structure:
# The comment below is not quite accurate. For striped blocks, the tail 11 bits do 
not represent only redundancy: there could be different coders such as RS and HH, 
and 3-2 coding and 6-4 coding have the same level of redundancy but different 
trade-offs. I see the intention here is to unify the terminology for contiguous 
and striped blocks, but I think that is pretty hard.
{code}
  /**
   * Number of bits used to encode block layout type.
   * Different types can be replica or EC
   */
  public static final int LAYOUT_BIT_WIDTH = 1;
  /**
   * Number of bits used to encode block redundancy.
   * For replicated block, the redundancy is the replication factor;
   * for erasure coded block, the redundancy is the EC policy's ID.
   */
  public static final int REDUNDANCY_BIT_WIDTH = 11;
{code}
# So instead of the above, I think we can keep the {{LAYOUT_BIT_WIDTH}}, and 
then explicitly parse the {{BLOCK_LAYOUT_AND_REDUNDANCY}} section for both 
striped and contiguous blocks. To avoid repeating code we can add a util method 
{{maskLayoutBit}}. Not sure if it's worth it, since the code is very simple.
# In the "Bit format:" Javadoc we should also explain the 
{{BLOCK_LAYOUT_AND_REDUNDANCY}} section more clearly. Some ASCII art here will 
be really helpful.

Also, {{SYS_POLICY1_ID}} and {{SYS_POLICY2_ID}} look a little hacky. Can we do 
something similar to {{BlockStoragePolicySuite#createDefaultSuite}} and create 
some constants? We can also make it a byte to save space.
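
(Editor's illustration) A minimal sketch of the explicit parsing suggested in 
points 2 and 3 above; the two width constants come from the snippet being 
reviewed, everything else is an assumption:

{code}
// BLOCK_LAYOUT_AND_REDUNDANCY (12 bits):
// +--------+----------------------------------------------+
// | layout | redundancy: replication factor (contiguous)  |
// | 1 bit  | or EC policy ID (striped), 11 bits           |
// +--------+----------------------------------------------+
static final int LAYOUT_BIT_WIDTH = 1;
static final int REDUNDANCY_BIT_WIDTH = 11;

// maskLayoutBit: true if the layout bit marks a striped (EC) block.
static boolean isStriped(int layoutAndRedundancy) {
  return (layoutAndRedundancy >>> REDUNDANCY_BIT_WIDTH) != 0;
}

// Tail 11 bits, interpreted per layout as discussed above.
static int getRedundancy(int layoutAndRedundancy) {
  return layoutAndRedundancy & ((1 << REDUNDANCY_BIT_WIDTH) - 1);
}
{code}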

> Erasure coding: NameNode manages multiple erasure coding policies
> -
>
> Key: HDFS-7866
> URL: https://issues.apache.org/jira/browse/HDFS-7866
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: Rui Li
> Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, 
> HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, 
> HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch
>
>
> This is to extend NameNode to load, list and sync predefine EC schemas in 
> authorized and controlled approach. The provided facilities will be used to 
> implement DFSAdmin commands so admin can list available EC schemas, then 
> could choose some of them for target EC zones.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-9882:
--
Affects Version/s: 2.7.2

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-9882:
--
Issue Type: New Feature  (was: Task)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode
>Affects Versions: 2.7.2
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-9882:
--
Status: In Progress  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174731#comment-15174731
 ] 

Hudson commented on HDFS-9881:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9407 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9407/])
HDFS-9881. DistributedFileSystem#getTrashRoot returns incorrect path for (wang: 
rev 4abb2fa687a80d2b76f2751dd31513822601b235)
* 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java


> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-9882:
--
Status: Patch Available  (was: Open)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Attachment: 0001-Add-heartbeatsTotal-metric.patch

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
> Attachments: 0001-Add-heartbeatsTotal-metric.patch
>
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Description: Heartbeat latency only reflects the time spent on generating 
reports and sending reports to NN. When heartbeats are delayed due to 
processing commands, this latency does not help investigation. I would like to 
propose to add another metric counter to show the total time.   (was: Heartbeat 
latency only reflects the time spent on generating reports and sending reports 
to NN. When heartbeats are delayed due to processing commands, this latency 
does not help investigation. I would like to propose either (1) changing the 
heartbeat latency to reflect the total time spent on sending reports and 
processing commands or (2) adding another metric counter to show the total 
time. )

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> to add another metric counter to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Open  (was: Patch Available)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Status: Patch Available  (was: In Progress)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9835) OIV: add ReverseXML processor which reconstructs an fsimage from an XML file

2016-03-01 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174717#comment-15174717
 ] 

Lei (Eddy) Xu commented on HDFS-9835:
-

Hi, [~cmccabe]

The patch looks good to me overall.

+1 after addressing final nitpicks:

* There are some checkstyle warnings for {{switch...case}} indentation. 
* The findbugs warning is a false positive, but can we mitigate it somehow?
* Could you also apply the XATTR_..._MASKs to {{PBImageXmlWriter#dumpXAttrs}}?


Thanks a lot for the work.

> OIV: add ReverseXML processor which reconstructs an fsimage from an XML file
> 
>
> Key: HDFS-9835
> URL: https://issues.apache.org/jira/browse/HDFS-9835
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: tools
>Affects Versions: 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-9835.001.patch, HDFS-9835.002.patch, 
> HDFS-9835.003.patch, HDFS-9835.004.patch, HDFS-9835.005.patch
>
>
> OIV: add ReverseXML processor which reconstructs an fsimage from an XML file. 
>  This will make it easy to create fsimages for testing, and manually edit 
> fsimages when there is corruption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9881:
--
   Resolution: Fixed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Pushed to trunk, branch-2, branch-2.8. Thanks again Xiaoyu and Zhe for taking a 
look!

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174710#comment-15174710
 ] 

Andrew Wang commented on HDFS-9881:
---

Jenkins failed with a port in use exception, so I'm going to go ahead and 
commit based on xyao's +1. Thanks all!

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174704#comment-15174704
 ] 

Hadoop QA commented on HDFS-7597:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
29s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 12s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 24s 
{color} | {color:red} hadoop-hdfs-project: patch generated 1 new + 3 unchanged 
- 0 fixed = 4 total (was 3) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 8s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 49s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 45s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 28s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
30s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 183m 3s {color} 
| {color:black} {color} |

[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes

2016-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6440:
--
Release Note: This feature adds support for running additional standby 
NameNodes, which provides additional fault-tolerance. It is designed for a 
total of 3-5 NameNodes.

> Support more than 2 NameNodes
> -
>
> Key: HDFS-6440
> URL: https://issues.apache.org/jira/browse/HDFS-6440
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: auto-failover, ha, namenode
>Affects Versions: 2.4.0
>Reporter: Jesse Yates
>Assignee: Jesse Yates
> Fix For: 3.0.0
>
> Attachments: Multiple-Standby-NameNodes_V1.pdf, 
> hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, 
> hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, 
> hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, 
> hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch
>
>
> Most of the work is already done to support more than 2 NameNodes (one 
> active, one standby). This would be the last bit to support running multiple 
> _standby_ NameNodes; one of the standbys should be available for fail-over.
> Mostly, this is a matter of updating how we parse configurations, some 
> complexity around managing the checkpointing, and updating a whole lot of 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174687#comment-15174687
 ] 

Chris Nauroth commented on HDFS-7597:
-

Oh right... in the meantime, there was HDFS-8855.  Darn, I knew that issue 
sounded familiar, but I had forgotten it was this one.

How shall we proceed?

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different, which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174676#comment-15174676
 ] 

Hadoop QA commented on HDFS-9881:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 
43s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 12s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
25s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
27s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 11s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
17s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
24s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 22s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 3s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 20s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 7s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 54m 49s 
{color} | {color:green} hadoop-hdfs in the patch passed with JDK v1.8.0_72. 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 52m 20s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
26s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 144m 27s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.7.0_95 Faile

[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174674#comment-15174674
 ] 

Hadoop QA commented on HDFS-7597:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 13s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 12s 
{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 39s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
26s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
24s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 
33s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 35s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s 
{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 
18s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 32s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 25s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 22s 
{color} | {color:red} hadoop-hdfs-project: patch generated 1 new + 4 unchanged 
- 0 fixed = 5 total (was 4) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 24s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 5s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 29s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 59s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 7s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 58s 
{color} | {color:green} hadoop-hdfs-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 59m 14s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 165m 47s {color} 
| {color:black} {color}

[jira] [Commented] (HDFS-9848) Ozone: Add Ozone Client lib for volume handling

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174647#comment-15174647
 ] 

Chris Nauroth commented on HDFS-9848:
-

Hi [~anu].  This looks good overall.  Just a few minor notes on the tests:

{code}
try {
  client = new OzoneClient(String.format("http://localhost:%d", port));
} catch (OzoneException | URISyntaxException e) {
  Assert.assertTrue("Failed to create an Ozone Client." + e.getMessage()
  , false);
}
{code}

Usually, I'd just let these exceptions be thrown out of the method.  If the 
exception is thrown, JUnit will report it with a full stack trace to aid 
troubleshooting.  If you prefer to handle the exception so that you can emit 
the custom "Failed to create" message, then I'd recommend using {{Assert.fail}} 
instead of an {{Assert.assertTrue}} that fails on purpose.
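
A minimal sketch of that first option (editor's illustration; the {{@Before}} 
setup method is assumed):

{code}
// Let construction failures surface as test errors with full stack traces.
@Before
public void setUp() throws OzoneException, URISyntaxException {
  client = new OzoneClient(String.format("http://localhost:%d", port));
}
{code}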

{code}
  @Test
  public void testCreateDuplicateVolume() throws OzoneException {
try {
  client.setUserAuth(OzoneConsts.OZONE_SIMPLE_HDFS_USER);
  client.createVolume("testVol", "bilbo", "100TB");
  client.createVolume("testVol", "bilbo", "100TB");
} catch (Exception ex) {
  // OZone will throw saying volume already exists
  assertNotNull(ex);
}
  }
{code}

As currently written, this test always passes, even when it shouldn't.  If any 
of the client calls fails for any reason, then it will go to the exception 
handler.  {{ex}} is guaranteed to be non-null always, so the {{assertNotNull}} 
is effectively a no-op and the test passes.  If all 3 client calls succeed, 
then the overall test is a success, even though we wanted it to cover duplicate 
volume error handling. From debugging, I can see that what is actually 
happening when the test runs is that it throws "java.lang.IllegalArgumentException: 
Bucket or Volume name does not support uppercase characters", so the test also 
needs to be changed to use a different volume name.
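
Putting both points together, one possible rewrite (editor's sketch; it reuses 
only the calls shown above, with a lowercase volume name to avoid the unrelated 
IllegalArgumentException):

{code}
@Test
public void testCreateDuplicateVolume() throws OzoneException {
  client.setUserAuth(OzoneConsts.OZONE_SIMPLE_HDFS_USER);
  client.createVolume("testvol", "bilbo", "100TB");
  try {
    client.createVolume("testvol", "bilbo", "100TB");
    Assert.fail("Expected OzoneException for duplicate volume");
  } catch (OzoneException ex) {
    // expected: the volume already exists
  }
}
{code}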

> Ozone: Add Ozone Client lib for volume handling
> ---
>
> Key: HDFS-9848
> URL: https://issues.apache.org/jira/browse/HDFS-9848
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone
>Affects Versions: HDFS-7240
>Reporter: Anu Engineer
>Assignee: Anu Engineer
> Fix For: HDFS-7240
>
> Attachments: HDFS-9848-HDFS-7240.001.patch, 
> HDFS-9848-HDFS-7240.002.patch, HDFS-9848-HDFS-7240.003.patch, 
> HDFS-9848-HDFS-7240.004.patch
>
>
> Add a simple client lib for volume handling. This is used primarily to make 
> writing tests simpler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Chris Trezzo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Trezzo updated HDFS-8791:
---
Release Note: HDFS-8791 introduces a new datanode layout format. This 
layout is identical to the previous block ID-based layout except that it has a 
smaller 32x32 sub-directory structure in each data storage. On startup, the 
datanode will automatically upgrade its storages to this new layout.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout basically causes the 
> datanode's disks to seek for tens of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> Inodes and dentries will be cached by Linux, and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks, and I'm not aware of 
> any way to tell Linux to favor buffer cache pages (even if it did, I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
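
(Editor's note) A back-of-envelope check of the cold-scan cost described above, 
assuming ~10 ms per cold random seek on a spinning disk:

{code}
int dirBlocks = 256 * 256;                      // ~64K leaf directory blocks
double seekMs = 10.0;                           // assumed cold-seek cost
double totalSec = dirBlocks * seekMs / 1000.0;  // ~655 s, i.e. ~11 minutes
// Same order of magnitude as the ~20 minutes observed above.
{code}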



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hua Liu updated HDFS-9882:
--
Summary: Add heartbeatsTotal in Datanode metrics  (was: Change the meaning 
of heartbeat latency in Datanode metrics)

> Add heartbeatsTotal in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174605#comment-15174605
 ] 

Hudson commented on HDFS-9880:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9406 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9406/])
HDFS-9880. TestDatanodeRegistration fails occasionally. Contributed by (kihwal: 
rev e76b13c415459e4062c4c9660a16759a11ffb34a)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java


> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.7.3
>
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9882) Change the meaning of heartbeat latency in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174597#comment-15174597
 ] 

Inigo Goiri commented on HDFS-9882:
---

I think the second approach makes more sense, as other people may have already 
built dependencies on the semantics of the current {{heartbeats}}.
The current approach in {{DatanodeMetrics}} has:
{code}
@Metric MutableRate heartbeats;
{code}

I would add:
{code}
@Metric MutableRate heartbeatsTotal;
{code}
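
A minimal sketch of how the two metrics could sit side by side in the datanode 
metrics class (editor's illustration; only {{heartbeats}} is taken from the 
existing code, the rest is assumed):

{code}
@Metric MutableRate heartbeats;       // existing: generate + send reports
@Metric MutableRate heartbeatsTotal;  // proposed: also covers command processing

public void addHeartbeat(long latencyMillis) {
  heartbeats.add(latencyMillis);
}

public void addHeartbeatTotal(long latencyMillis) {
  heartbeatsTotal.add(latencyMillis);
}
{code}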

> Change the meaning of heartbeat latency in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9876:

Attachment: HDFS-9876.001.patch

Remove unused internalBlock.

> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, 
> HDFS-9876.001.patch
>
>
> Currently when checking if we should process over-replicated block in 
> {{addStoredBlock}}, we count both the number of reported replicas and pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among all the reported storages of the block. So in a situation where we 
> have over-replicated replica/internal blocks which only reside in the pending 
> queue, we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause an endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
> List<DatanodeStorageInfo> replicasToDelete = placementPolicy
> .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
> excessTypes, null, null);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9870) Remove unused imports from DFSUtil

2016-03-01 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174595#comment-15174595
 ] 

Brahma Reddy Battula commented on HDFS-9870:


Thanks a lot [~cnauroth] for the commit.

> Remove unused imports from DFSUtil
> --
>
> Key: HDFS-9870
> URL: https://issues.apache.org/jira/browse/HDFS-9870
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch
>
>
> Remove the following unused imports {{DFSUtil.java}}
> {code}
> import static 
> org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY;
> import java.io.InterruptedIOException;
> import com.google.common.collect.Sets;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174590#comment-15174590
 ] 

Kihwal Lee commented on HDFS-8791:
--

Yes, we should do that. Thanks, Chris.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout basically causes the 
> datanode's disks to seek for tens of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> Inodes and dentries will be cached by Linux, and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks, and I'm not aware of 
> any way to tell Linux to favor buffer cache pages (even if it did, I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9880:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Thanks for the review, Daryn. I've committed this to trunk through branch-2.7.

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 2.7.3
>
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).
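
(Editor's illustration) A minimal sketch of the fix direction, polling with a 
longer deadline instead of a single 100 ms window; {{GenericTestUtils.waitFor}} 
is the usual Hadoop test helper, but {{isBlockReportReceived()}} is a 
hypothetical stand-in for the actual condition:

{code}
GenericTestUtils.waitFor(new Supplier<Boolean>() {
  @Override
  public Boolean get() {
    return isBlockReportReceived();  // hypothetical condition
  }
}, 100 /* check every ms */, 10000 /* timeout ms */);
{code}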



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (HDFS-9882) Change the meaning of heartbeat latency

2016-03-01 Thread Hua Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-9882 started by Hua Liu.
-
> Change the meaning of heartbeat latency
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9882) Change the meaning of heartbeat latency in Datanode metrics

2016-03-01 Thread Inigo Goiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Inigo Goiri updated HDFS-9882:
--
Summary: Change the meaning of heartbeat latency in Datanode metrics  (was: 
Change the meaning of heartbeat latency)

> Change the meaning of heartbeat latency in Datanode metrics
> ---
>
> Key: HDFS-9882
> URL: https://issues.apache.org/jira/browse/HDFS-9882
> Project: Hadoop HDFS
>  Issue Type: Task
>  Components: datanode
>Reporter: Hua Liu
>Assignee: Hua Liu
>Priority: Minor
>
> Heartbeat latency only reflects the time spent on generating reports and 
> sending reports to NN. When heartbeats are delayed due to processing 
> commands, this latency does not help investigation. I would like to propose 
> either (1) changing the heartbeat latency to reflect the total time spent on 
> sending reports and processing commands or (2) adding another metric counter 
> to show the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9882) Change the meaning of heartbeat latency

2016-03-01 Thread Hua Liu (JIRA)
Hua Liu created HDFS-9882:
-

 Summary: Change the meaning of heartbeat latency
 Key: HDFS-9882
 URL: https://issues.apache.org/jira/browse/HDFS-9882
 Project: Hadoop HDFS
  Issue Type: Task
  Components: datanode
Reporter: Hua Liu
Assignee: Hua Liu
Priority: Minor


Heartbeat latency only reflects the time spent on generating reports and 
sending reports to NN. When heartbeats are delayed due to processing commands, 
this latency does not help investigation. I would like to propose either (1) 
changing the heartbeat latency to reflect the total time spent on sending 
reports and processing commands or (2) adding another metric counter to show 
the total time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174510#comment-15174510
 ] 

Daryn Sharp commented on HDFS-9880:
---

+1

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174509#comment-15174509
 ] 

Chris Trezzo commented on HDFS-8791:


[~kihwal] Should I add a blurb to the JIRA release notes since this is a 
datanode layout change?

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a checkDirs(). Both 
> of these operations currently scan all directories in the block pool, and 
> that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.
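
For reference, the committed change shrinks the two-level fan-out from 256x256 
to 32x32 leaf directories. A hedged sketch of the ID-to-directory mapping (cf. 
{{DatanodeUtil.idToBlockDir}}; the masks follow the patch, the surrounding code 
is illustrative):

{code}
// 32x32 layout: 1K leaf dirs instead of 64K, so far fewer directory blocks
// compete for the buffer cache, and cold du/checkDirs scans seek far less.
static String idToBlockDirName(long blockId) {
  int d1 = (int) ((blockId >> 16) & 0x1F);  // was (blockId >> 16) & 0xFF
  int d2 = (int) ((blockId >> 8) & 0x1F);   // was (blockId >> 8) & 0xFF
  return "subdir" + d1 + "/subdir" + d2;
}
{code}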



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-9876:

Attachment: HDFS-9876.001.patch

Thanks for the review, Nicholas! I've updated the patch to address your 
comments.

> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch
>
>
> Currently, when checking whether we should process an over-replicated block 
> in {{addStoredBlock}}, we count both the number of reported replicas and pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among all the reported storages of the block. So in a situation where we 
> have over-replicated replica/internal blocks which only reside in the pending 
> queue, we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
> List<DatanodeStorageInfo> replicasToDelete = placementPolicy 
> .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
> excessTypes, null, null);
> {code}
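
Per the summary, the fix direction looks roughly like the following (simplified 
sketch, not the exact patch): trigger over-replication processing from reported 
replicas only, since {{processOverReplicatedBlock}} can only pick excess 
replicas among the block's reported storages.

{code}
// Sketch: pending replicas are invisible to processOverReplicatedBlock(),
// so counting them here can schedule work that finds nothing to delete
// (and, for striped blocks, spin in chooseExcessReplicasStriped()).
private boolean shouldProcessOverReplicated(NumberReplicas num,
    int expectedReplicaNum) {
  return num.liveReplicas() > expectedReplicaNum;  // pending count excluded
}
{code}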



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Chris Trezzo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174499#comment-15174499
 ] 

Chris Trezzo commented on HDFS-8791:


Thanks [~kihwal] and everyone else!

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a checkDirs(). Both 
> of these operations currently scan all directories in the block pool, and 
> that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174493#comment-15174493
 ] 

Chris Nauroth commented on HDFS-7597:
-

I killed mine (14677).  Thanks, [~kihwal].

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas

2016-03-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174488#comment-15174488
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9876:
---

excessTypes should be computed only if it is needed.  Let's move it inside 
{code}
if (candidates.size() > 1) {
  .. 
} {code}
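
In other words, a rough sketch of the restructured loop (with a hypothetical 
helper standing in for the elided excessTypes computation):

{code}
while (candidates.size() > 1) {
  // compute excessTypes only when there is actually something to delete
  List<StorageType> excessTypes = computeExcessTypes(nonExcess); // hypothetical
  List<DatanodeStorageInfo> replicasToDelete = placementPolicy
      .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
          excessTypes, null, null);
  for (DatanodeStorageInfo chosen : replicasToDelete) {
    candidates.remove(chosen);  // then process the chosen excess replica
  }
}
{code}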


> shouldProcessOverReplicated should not count number of pending replicas
> ---
>
> Key: HDFS-9876
> URL: https://issues.apache.org/jira/browse/HDFS-9876
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Takuya Fukudome
>Assignee: Jing Zhao
> Attachments: HDFS-9876.000.patch
>
>
> Currently, when checking whether we should process an over-replicated block 
> in {{addStoredBlock}}, we count both the number of reported replicas and pending 
> replicas. However, {{processOverReplicatedBlock}} chooses excess replicas 
> only among all the reported storages of the block. So in a situation where we 
> have over-replicated replica/internal blocks which only reside in the pending 
> queue, we will not be able to choose any extra replica to delete.
> For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do 
> nothing. But for striped blocks, this may cause endless loop in 
> {{chooseExcessReplicasStriped}} in the following while loop:
> {code}
>   while (candidates.size() > 1) {
> List<DatanodeStorageInfo> replicasToDelete = placementPolicy 
> .chooseReplicasToDelete(nonExcess, candidates, (short) 1,
> excessTypes, null, null);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174484#comment-15174484
 ] 

Kihwal Lee commented on HDFS-7597:
--

I kicked off the build already: 
https://builds.apache.org/job/PreCommit-HDFS-Build/14674
I think this is the one you started: 
https://builds.apache.org/job/PreCommit-HDFS-Build/14677
We'd better kill one to avoid wasting resources.

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Xiaoyu Yao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174487#comment-15174487
 ] 

Xiaoyu Yao commented on HDFS-9881:
--

Thanks [~andrew.wang]. Patch LGTM, +1.  

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174477#comment-15174477
 ] 

Zhe Zhang commented on HDFS-9881:
-

Thanks Andrew. +1 on the patch pending Jenkins.

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174470#comment-15174470
 ] 

Hadoop QA commented on HDFS-9880:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 
0s {color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 
9s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
21s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s 
{color} | {color:green} trunk passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s 
{color} | {color:green} trunk passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
49s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 
20s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s 
{color} | {color:green} the patch passed with JDK v1.8.0_72 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 24s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 18s {color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
23s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 148m 41s {color} 
| {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog |
| JDK v1.8.0_72 Timed out junit tests | 
org.apache.hadoop.hdfs.server.namenode.TestNameNodeRpcServerMethods |
| JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestDFSClientExcludedNodes |
|   | hadoop.hdfs.server.datanode.TestFsDatasetCache |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790763/HDFS-9880.patch |
| JIRA Issue | HDFS-9880 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 1ea98065a99a 3.1

[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9881:
--
Attachment: HDFS-9881.002.patch

Rebased

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2016-03-01 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174467#comment-15174467
 ] 

Jing Zhao commented on HDFS-8786:
-

To avoid the changes inside BlockInfoStriped, I think we can do the following: 
since we always use two arrays (one for the datanodes and the other for their 
internal block indices) when returning a striped block to the client, we can 
adjust the order inside the LocatedStripedBlock instead of BlockInfoStriped. 
I.e., the order adjustment (based on decommissioning DN information) should be 
done against LocatedStripedBlock in the same step as {{sortLocatedBlocks}}.
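
A minimal sketch of such a reordering over the two parallel arrays 
(illustrative helper, not the actual patch; it assumes 
{{DatanodeInfo.isDecommissionInProgress()}} as the test):

{code}
// Stable partition: keep live replicas first, push decommissioning nodes to
// the tail, moving each blockIndices entry together with its datanode so the
// two arrays stay aligned.
static void moveDecommissioningToTail(DatanodeInfo[] locs, byte[] indices) {
  DatanodeInfo[] newLocs = new DatanodeInfo[locs.length];
  byte[] newIndices = new byte[indices.length];
  int pos = 0;
  for (int i = 0; i < locs.length; i++) {
    if (!locs[i].isDecommissionInProgress()) {
      newLocs[pos] = locs[i];
      newIndices[pos++] = indices[i];
    }
  }
  for (int i = 0; i < locs.length; i++) {
    if (locs[i].isDecommissionInProgress()) {
      newLocs[pos] = locs[i];
      newIndices[pos++] = indices[i];
    }
  }
  System.arraycopy(newLocs, 0, locs, 0, locs.length);
  System.arraycopy(newIndices, 0, indices, 0, indices.length);
}
{code}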

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> ---
>
> Key: HDFS-8786
> URL: https://issues.apache.org/jira/browse/HDFS-8786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174468#comment-15174468
 ] 

Andrew Wang commented on HDFS-9881:
---

We still missed adding a "/" after TRASH_PREFIX; HDFS-9844 fixed the issue 
with "/" as the EZ, though.

In general we should avoid using string concatenation for paths, since it 
leads to issues like this.
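
For illustration (hedged sketch, not the committed patch; {{ezRoot}} and 
{{userName}} are stand-in variables), the buggy shape versus the safer one:

{code}
// Buggy shape: string concatenation silently drops the separator.
//   ezRoot + FileSystem.TRASH_PREFIX + userName   ->  "/ez/.Trashandrew"
// Safer shape: build Paths segment by segment, so separators are implicit.
Path trashRoot =
    new Path(new Path(ezRoot, FileSystem.TRASH_PREFIX), userName);
// -> "/ez/.Trash/andrew"
{code}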

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174459#comment-15174459
 ] 

Chris Nauroth commented on HDFS-7597:
-

[~daryn], thank you for the reminder.  It was a surprise, because I honestly 
thought we had already closed this one out.

I remain +1 for the patch, same as several months ago.  I manually triggered a 
Jenkins run since it has been a while.  I'll plan on committing this tomorrow 
barring any objections or surprises from pre-commit.

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174455#comment-15174455
 ] 

Zhe Zhang commented on HDFS-9881:
-

Hi Andrew, I thought the issue was already fixed by HDFS-9844?

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned

2016-03-01 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174447#comment-15174447
 ] 

Jing Zhao commented on HDFS-8786:
-

Thanks for updating the patch, [~rakeshr]. Comments on the current patch:
1. For ErasureCodingWork, we have to handle the following scenarios when 
{{hasAllInternalBlocks}} returns true:
#* we have a decommissioning DN
#* we have enough DNs but not enough racks
#* the above two situations happen at the same time
Things may get a little complicated when the decommissioning situation and the 
not-enough-racks situation get mixed. For example, it is possible that there 
are 9 live internal blocks on 5 racks, and 1 more internal block on a 
decommissioning datanode. In this situation, we will only choose 1 target and 
the decommissioning DN should be ignored. In another example, if we have 8 
live replicas and 1 decommissioning replica, we should replicate the 
decommissioning replica. It looks to me like the current patch cannot handle 
all of these scenarios.

Currently I think we should explicitly let ErasureCodingWork know whether the 
reconstruction work is triggered by not-enough-racks. We can add this check in 
{{validateReconstructionWork}} and pass the result into the ErasureCodingWork 
instance. Later, when adding the task to the DN, we should first check this 
result; if it is true, run the current code added by HDFS-9818. If it is 
false, we check whether the source nodes cover all the internal blocks but 
include a decommissioning datanode, and schedule replication work for it if 
necessary (see the sketch below).
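
A minimal sketch of that decision order, with hypothetical method names 
throughout (the real flags and helpers depend on how the patch wires 
{{validateReconstructionWork}} into ErasureCodingWork):

{code}
// Hypothetical names -- this only illustrates the decision order.
if (work.isTriggeredByNotEnoughRacks()) {
  // reconstruct onto a new rack, i.e. the existing HDFS-9818 path
  scheduleReconstruction(work);
} else if (work.sourcesCoverAllInternalBlocks()
    && work.hasDecommissioningSource()) {
  // every internal block already exists somewhere: a plain copy off the
  // decommissioning node is enough, no erasure decoding needed
  scheduleSimpleReplication(work);
}
{code}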

> Erasure coding: DataNode should transfer striped blocks before being 
> decommissioned
> ---
>
> Key: HDFS-8786
> URL: https://issues.apache.org/jira/browse/HDFS-8786
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Zhe Zhang
>Assignee: Rakesh R
> Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, 
> HDFS-8786-003.patch, HDFS-8786-draft.patch
>
>
> Per [discussion | 
> https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004]
>  under HDFS-8697, it's too expensive to reconstruct block groups for decomm 
> purpose.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174437#comment-15174437
 ] 

Hadoop QA commented on HDFS-9881:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} HDFS-9881 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790786/HDFS-9881.001.patch |
| JIRA Issue | HDFS-9881 |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14675/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9881:
--
Attachment: HDFS-9881.001.patch

Thanks to [~qwertymaniac] for the spot. [~zhz] / [~xyao] / [~cmccabe] mind 
reviewing?

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-9881:
--
Status: Patch Available  (was: Open)

> DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
> --
>
> Key: HDFS-9881
> URL: https://issues.apache.org/jira/browse/HDFS-9881
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Critical
> Attachments: HDFS-9881.001.patch
>
>
> getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
> files into a directory named "/ez/.Trashandrew" rather than 
> "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones

2016-03-01 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-9881:
-

 Summary: DistributedFileSystem#getTrashRoot returns incorrect path 
for encryption zones
 Key: HDFS-9881
 URL: https://issues.apache.org/jira/browse/HDFS-9881
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.8.0
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Critical


getTrashRoots is missing a "/" in the path concatenation, so ends up putting 
files into a directory named "/ez/.Trashandrew" rather than "/ez/.Trash/andrew"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek

2016-03-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174430#comment-15174430
 ] 

Daryn Sharp commented on HDFS-7597:
---

Patch still applies and is still relevant.  It actually provides a performance 
boost to the NN due to fewer UGI instances.  Instead of n-many UGIs per task, 
it's a couple, depending on how quickly the LRU cache recycles.
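
For context, a small illustration of why identical credentials can still miss 
the RPC connection cache (sketch; it relies on UGI equality being Subject 
identity, which is how UserGroupInformation.equals() behaves):

{code}
// Two UGIs created for the same user (or from the same token) wrap distinct
// Subjects, and UserGroupInformation.equals() compares Subjects by identity,
// so each webhdfs re-open used to produce a fresh cached NN connection.
UserGroupInformation a = UserGroupInformation.createRemoteUser("alice");
UserGroupInformation b = UserGroupInformation.createRemoteUser("alice");
System.out.println(a.equals(b));  // false -> different connection cache keys
{code}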

> DNs should not open new NN connections when webhdfs clients seek
> 
>
> Key: HDFS-7597
> URL: https://issues.apache.org/jira/browse/HDFS-7597
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: webhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch
>
>
> Webhdfs seeks involve closing the current connection, and reissuing a new 
> open request with the new offset.  The RPC layer caches connections so the DN 
> keeps a lingering connection open to the NN.  Connection caching is in part 
> based on UGI.  Although the client used the same token for the new offset 
> request, the UGI is different which forces the DN to open another unnecessary 
> connection to the NN.
> A job that performs many seeks will easily crash the NN due to fd exhaustion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9763) Add merge api

2016-03-01 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174414#comment-15174414
 ] 

Arpit Agarwal commented on HDFS-9763:
-

TOCTOU is a red herring. The real problem, as mentioned by others, is the 
number of RPCs. The proposal to cap the number of operations in one call is 
not unusual: e.g. 
[S3|https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html] and [Azure 
Storage|https://msdn.microsoft.com/en-us/library/azure/dd135734.aspx] do so 
for list calls, as does HDFS.

> Add merge api
> -
>
> Key: HDFS-9763
> URL: https://issues.apache.org/jira/browse/HDFS-9763
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Ashutosh Chauhan
>Assignee: Xiaobing Zhou
> Attachments: HDFS_Merge_API_Proposal.pdf
>
>
> It would be good to add a merge(Path dir1, Path dir2, ...) API to HDFS. The 
> semantics would be to move all files under dir1 to dir2, renaming files in 
> case of collisions.
> In the absence of this API, Hive[1] has to check for a collision for each 
> file, come up with a unique name, try again, and so on. This is inefficient 
> in multiple ways:
> 1) It generates a huge number of calls to the NN (at least 2 * the number of 
> source files in dir1)
> 2) It suffers from a TOCTOU[2] bug for the client-picked name in case of 
> collision.
> 3) The whole operation is not atomic.
> A merge API as outlined above would be immensely useful for Hive and 
> potentially for other HDFS users.
> [1] 
> https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
> [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174413#comment-15174413
 ] 

Hudson commented on HDFS-8791:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9403 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9403/])
HDFS-8791. block ID-based DN storage layout can be very slow for (kihwal: rev 
2c8496ebf3b7b31c2e18fdf8d4cb2a0115f43112)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-to-57-dn-layout-dir.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-56-layout-datanode-dir.tgz
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java


> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a checkDirs(). Both 
> of these operations currently scan all directories in the block pool, and 
> that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-8791:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.7.3
   Status: Resolved  (was: Patch Available)

Thanks for working on the fix, [~ctrezzo], and thank you all for the reviews 
and discussions. I've committed it from trunk through branch-2.7.  I didn't 
put it in branch-2.6 because that branch does not have the parallel upgrade 
fix.  I will leave that up to the 2.6 release manager and the interested 
parties.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a checkDirs(). Both 
> of these operations currently scan all directories in the block pool, and 
> that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174373#comment-15174373
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8791:
---

It seems that the 32x32 layout is already well tested.  Let's use it unless 
someone wants to test the 64x64 or other layouts.

+1 for the patch.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> keep the disks seeking for tens of minutes. This can happen when the 
> datanode is running du, and also when it is performing a checkDirs(). Both 
> of these operations currently scan all directories in the block pool, and 
> that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by linux and one can configure how likely 
> the system is to prune those entries (vfs_cache_pressure). However, ext4 
> relies on the buffer cache to cache the directory blocks and I'm not aware of 
> any way to tell linux to favor buffer cache pages (even if it did I'm not 
> sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire volume, 
> this basically means the 64K directory blocks are probably randomly spread 
> across the entire disk. A du type scan will look at directories one at a 
> time, so the ioscheduler can't optimize the corresponding seeks, meaning the 
> seeks will be random and far. 
> In a system I was using to diagnose this, I had 60K blocks. A DU when things 
> are hot is less than 1 second. When things are cold, about 20 minutes.
> How do things get cold?
> - A large set of tasks run on the node. This pushes almost all of the buffer 
> cache out, causing the next DU to hit this situation. We are seeing cases 
> where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have but it wasn't nearly as pronounced. The previous layout would 
> be a few hundred directory blocks. Even when completely cold, these would 
> only take a few hundred seeks, which would mean single-digit seconds.  
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified is quite high, this keeps those blocks hot and much less 
> likely to be evicted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174354#comment-15174354
 ] 

Kuhu Shukla commented on HDFS-9880:
---

+1 (non-binding). LGTM.

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans

2016-03-01 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174323#comment-15174323
 ] 

Daryn Sharp commented on HDFS-8674:
---

I'll try to rebase again.

> Improve performance of postponed block scans
> 
>
> Key: HDFS-8674
> URL: https://issues.apache.org/jira/browse/HDFS-8674
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.6.0
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
>Priority: Critical
> Attachments: HDFS-8674.patch, HDFS-8674.patch
>
>
> When a standby goes active, it marks all nodes as "stale" which will cause 
> block invalidations for over-replicated blocks to be queued until full block 
> reports are received from the nodes with the block.  The replication monitor 
> scans the queue with O(N) runtime.  It picks a random offset and iterates 
> through the set to randomize blocks scanned.
> The result is devastating when a cluster loses multiple nodes during a 
> rolling upgrade. Re-replication occurs, the nodes come back, the excess block 
> invalidations are postponed. Rescanning just 2k blocks out of millions of 
> postponed blocks may take multiple seconds. During the scan, the write lock 
> is held which stalls all other processing.
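
To see where the O(N) comes from, a sketch of the random-offset scan (the set 
name matches the BlockManager field; the loop shape and {{startOffset}} are 
illustrative):

{code}
// Each rescan picks a random start offset into the set, but reaching that
// offset means iterating from the head -- O(N) -- while the write lock is
// held, even though only a couple thousand entries are actually examined.
Iterator<Block> it = postponedMisreplicatedBlocks.iterator();
for (long i = 0; i < startOffset && it.hasNext(); i++) {
  it.next();  // skipped work, repeated on every scan
}
{code}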



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9534) Add CLI command to clear storage policy from a path.

2016-03-01 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174306#comment-15174306
 ] 

Chris Nauroth commented on HDFS-9534:
-

+1 from me for patch v003.  Thank you, [~xiaobingo]!

> Add CLI command to clear storage policy from a path.
> 
>
> Key: HDFS-9534
> URL: https://issues.apache.org/jira/browse/HDFS-9534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Chris Nauroth
>Assignee: Xiaobing Zhou
> Attachments: HDFS-9534.001.patch, HDFS-9534.002.patch, 
> HDFS-9534.003.patch
>
>
> The {{hdfs storagepolicies}} command has sub-commands for 
> {{-setStoragePolicy}} and {{-getStoragePolicy}} on a path.  However, there is 
> no {{-removeStoragePolicy}} to remove a previously set storage policy on a 
> path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9870) Remove unused imports from DFSUtil

2016-03-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174274#comment-15174274
 ] 

Hudson commented on HDFS-9870:
--

FAILURE: Integrated in Hadoop-trunk-Commit #9402 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/9402/])
HDFS-9870. Remove unused imports from DFSUtil. Contributed by Brahma (cnauroth: 
rev 2137e8feeb5c5c88d3a80db3a334fd472f299ee4)
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java


> Remove unused imports from DFSUtil
> --
>
> Key: HDFS-9870
> URL: https://issues.apache.org/jira/browse/HDFS-9870
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch
>
>
> Remove the following unused imports from {{DFSUtil.java}}:
> {code}
> import static 
> org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY;
> import java.io.InterruptedIOException;
> import com.google.common.collect.Sets;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9870) Remove unused imports from DFSUtil

2016-03-01 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-9870:

   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

+1.  I have committed this to trunk, branch-2 and branch-2.8.  [~brahmareddy], 
thank you for cleaning up this code.

> Remove unused imports from DFSUtil
> --
>
> Key: HDFS-9870
> URL: https://issues.apache.org/jira/browse/HDFS-9870
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
> Fix For: 2.8.0
>
> Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch
>
>
> Remove the following unused imports from {{DFSUtil.java}}:
> {code}
> import static 
> org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY;
> import java.io.InterruptedIOException;
> import com.google.common.collect.Sets;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9880:
-
Status: Patch Available  (was: Open)

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee reassigned HDFS-9880:


Assignee: Kihwal Lee

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9763) Add merge api

2016-03-01 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174219#comment-15174219
 ] 

Jitendra Nath Pandey commented on HDFS-9763:


 I don't think this API will address TOCTOU, because Hive will need multiple 
merge calls anyway for a given query to reach the final stage. In the absence 
of transactional support in the file system, as suggested by Haohui, it is 
difficult to support such semantics at the HDFS level.

  However, I do see the need for Hive to avoid hundreds or thousands of 
individual rename operations. This also seems to be a pretty general case for 
many map-reduce based data ingest jobs that write data all over the place and 
finally need to consolidate it in a final directory, after all the related 
jobs are successful.
  IMO, the requirement here is for a 'rename' operation that merges the 
directories instead of overwriting or throwing an already-exists exception. 
However, it can be argued that it's better to add a new 'merge' API instead 
of overloading 'rename'.
 
  I see two main design points here that need to be carefully thought 
through and agreed upon:
1) How do we resolve conflicts in file names?
   I think the proposal handles it elegantly by specifying a policy. In fact 
I would love to change our 'rename' to support different policies that also 
provide merge capability, but I can accept a separate API for compatibility 
and simplicity.
2) The O(N) problem.
   It is not so bad because it's not recursive; there is no scope for 
recursion here.
   Still, if a directory has a lot of files, an iterative approach is 
feasible because the source directory gets smaller after every iteration. We 
do have precedent for iteration, for example 'listStatus'. This avoids the 
complexity of having the NN release the lock in between; a sketch of the 
idea follows below.
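
To make the iterative idea concrete, here is a minimal client-side sketch 
built only on existing FileSystem calls (the class and method names are 
hypothetical, and a real merge API would apply the conflict policy 
server-side rather than skipping collisions):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IterativeMerge {
  /** Drain src into dst one listing at a time; src shrinks every pass. */
  public static void merge(FileSystem fs, Path src, Path dst)
      throws IOException {
    FileStatus[] batch;
    while ((batch = fs.listStatus(src)).length > 0) {
      int moved = 0;
      for (FileStatus stat : batch) {
        Path target = new Path(dst, stat.getPath().getName());
        // rename() returns false on a name collision; a merge API would
        // resolve it with the specified policy instead of skipping.
        if (fs.rename(stat.getPath(), target)) {
          moved++;
        }
      }
      if (moved == 0) {
        break; // only conflicting names remain; defer to a conflict policy
      }
    }
  }
}
{code}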

> Add merge api
> -
>
> Key: HDFS-9763
> URL: https://issues.apache.org/jira/browse/HDFS-9763
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Ashutosh Chauhan
>Assignee: Xiaobing Zhou
> Attachments: HDFS_Merge_API_Proposal.pdf
>
>
> It will be good to add a merge(Path dir1, Path dir2, ...) API to HDFS. The 
> semantics would be to move all files under dir1 to dir2, renaming files in 
> case of collisions.
> In the absence of this API, Hive [1] has to check for a collision for each 
> file, then come up with a unique name and try again, and so on. This is 
> inefficient in multiple ways:
> 1) It generates a huge number of calls to the NN (at least 2 * the number 
> of source files in dir1).
> 2) It suffers from a TOCTOU [2] bug for the client-picked name in case of a 
> collision.
> 3) The whole operation is not atomic.
> A merge API as outlined above would be immensely useful for Hive and 
> potentially for other HDFS users.
> [1] 
> https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576
> [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9880:
-
Attachment: HDFS-9880.patch

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
> Attachments: HDFS-9880.patch
>
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9880:
-
Target Version/s: 2.7.3

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174214#comment-15174214
 ] 

Kihwal Lee commented on HDFS-9198:
--

Relevant tests are all passing with this patch.
{noformat}
---
 T E S T S
---
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.server.datanode.TestIncrementalBrVariations
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.301 sec - in 
org.apache.hadoop.hdfs.server.datanode.TestIncrementalBrVariations
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.83 sec - in 
org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager
Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.238 sec - 
in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.094 sec - in 
org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was 
removed in 8.0
Running org.apache.hadoop.hdfs.TestDatanodeRegistration
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.187 sec - in 
org.apache.hadoop.hdfs.TestDatanodeRegistration

Results :

Tests run: 32, Failures: 0, Errors: 0, Skipped: 0
{noformat}

Just committed to branch-2.7. Thanks, Vinay!

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9198-Branch-2-withamend.diff, 
> HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, 
> HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.
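
To illustrate the coalescing pattern described above (a minimal sketch only; 
the queue, lock, and class names are hypothetical and not the actual 
BlockManager code):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class IbrCoalescer {
  private final BlockingQueue<Object> pendingIbrs =
      new LinkedBlockingQueue<>();
  private final ReentrantReadWriteLock nsLock = new ReentrantReadWriteLock();

  /** IPC handlers enqueue the IBR and return, instead of taking the lock. */
  void enqueue(Object ibr) {
    pendingIbrs.add(ibr);
  }

  /** One processing thread applies the whole batch under a single lock. */
  void processBatch() {
    List<Object> batch = new ArrayList<>();
    pendingIbrs.drainTo(batch);
    if (batch.isEmpty()) {
      return;
    }
    nsLock.writeLock().lock(); // one write-lock round-trip per batch
    try {
      for (Object ibr : batch) {
        // apply the incremental report to the blocks map (elided)
      }
    } finally {
      nsLock.writeLock().unlock();
    }
  }
}
{code}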



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9198) Coalesce IBR processing in the NN

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9198:
-
Fix Version/s: 2.7.3

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0, 2.7.3
>
> Attachments: HDFS-9198-Branch-2-withamend.diff, 
> HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, 
> HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174190#comment-15174190
 ] 

Kihwal Lee commented on HDFS-9198:
--

+1 for the branch-2.7 patch. It is code-wise identical to what we are running 
internally.

> Coalesce IBR processing in the NN
> -
>
> Key: HDFS-9198
> URL: https://issues.apache.org/jira/browse/HDFS-9198
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Daryn Sharp
>Assignee: Daryn Sharp
> Fix For: 2.8.0
>
> Attachments: HDFS-9198-Branch-2-withamend.diff, 
> HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, 
> HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, 
> HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch
>
>
> IBRs from thousands of DNs under load will degrade NN performance due to 
> excessive write-lock contention from multiple IPC handler threads.  The IBR 
> processing is quick, so the lock contention may be reduced by coalescing 
> multiple IBRs into a single write-lock transaction.  The handlers will also 
> be freed up faster for other operations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174163#comment-15174163
 ] 

Kihwal Lee commented on HDFS-9880:
--

If I run this test case in a loop, the failure is reproduced in fewer than 10 
runs. With the timeout changed to 2000 (2 seconds), it passes every time. The 
test run-time is very close to 10 seconds, so the test-level timeout of 10 
seconds should also be removed.
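
For reference, one way to express such a wait with a longer bound (a hedged 
sketch using the common GenericTestUtils.waitFor pattern; the checked 
condition here is a stand-in, not the real test state):
{code}
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.hadoop.test.GenericTestUtils;
import com.google.common.base.Supplier;

class WaitForBlockReportSketch {
  private final AtomicBoolean blockReportReceived = new AtomicBoolean();

  void waitForBlockReport() throws TimeoutException, InterruptedException {
    // Poll every 100 ms, but allow up to 2000 ms overall instead of 100 ms.
    GenericTestUtils.waitFor(new Supplier<Boolean>() {
      @Override
      public Boolean get() {
        return blockReportReceived.get();
      }
    }, 100, 2000);
  }
}
{code}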

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-9880:
-
Description: When {{testForcedRegistration}} calls 
{{waitForBlockReport()}}, it sometimes returns false because the timeout is too 
short (100ms).

> TestDatanodeRegistration fails occasionally
> ---
>
> Key: HDFS-9880
> URL: https://issues.apache.org/jira/browse/HDFS-9880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Kihwal Lee
>
> When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes 
> returns false because the timeout is too short (100ms).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-9880) TestDatanodeRegistration fails occasionally

2016-03-01 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-9880:


 Summary: TestDatanodeRegistration fails occasionally
 Key: HDFS-9880
 URL: https://issues.apache.org/jira/browse/HDFS-9880
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Reporter: Kihwal Lee






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw

2016-03-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174003#comment-15174003
 ] 

Hadoop QA commented on HDFS-9699:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 12s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
29s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s 
{color} | {color:green} HDFS-8707 passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s 
{color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s 
{color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
12s {color} | {color:green} HDFS-8707 passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
10s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed with JDK v1.8.0_74 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 33s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed with JDK v1.7.0_95 {color} |
| {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
9s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 7s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.8.0_74. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 8s 
{color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK 
v1.7.0_95. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 
19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 57m 4s {color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0cf5e66 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12790733/HDFS-9699.HDFS-8707.002.patch
 |
| JIRA Issue | HDFS-9699 |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux db7841d3ae98 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed 
Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | HDFS-8707 / eca19c1 |
| Default Java | 1.7.0_95 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_74 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 |
| JDK v1.7.0_95  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14672/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: 
hadoop-hdfs-project/hadoop-hdfs-native-client |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/14672/console |
| Powered by | Apache Yetus 0.3.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> libhdfs++: Add appropriate catch blocks for ASIO operations that throw
> --
>
> Key: HDFS-9699
> URL: https://issues.apache.org/jira/browse/HDFS-969

[jira] [Updated] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw

2016-03-01 Thread Bob Hansen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bob Hansen updated HDFS-9699:
-
Attachment: HDFS-9699.HDFS-8707.002.patch

New patch: moved the IOService exception handling into the IOServiceImpl 
implementation so we don't lose the work object.

Adopted [~James Clampffer]'s suggested wording changes.

> libhdfs++: Add appropriate catch blocks for ASIO operations that throw
> --
>
> Key: HDFS-9699
> URL: https://issues.apache.org/jira/browse/HDFS-9699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-6966.HDFS-8707.000.patch, 
> HDFS-9699.HDFS-8707.001.patch, HDFS-9699.HDFS-8707.002.patch, 
> cancel_backtrace.txt
>
>
> libhdfs++ doesn't create exceptions of its own but it should be able to 
> gracefully handle exceptions thrown by libraries it uses, particularly asio.
> libhdfs++ should be able to catch most exceptions within reason either at the 
> call site or in the code that spins up asio worker threads.  Certain system 
> exceptions like std::bad_alloc don't need to be caught because by that point 
> the process is likely in an unrecoverable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw

2016-03-01 Thread James Clampffer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173914#comment-15173914
 ] 

James Clampffer commented on HDFS-9699:
---

bq. Perhaps we should put it into a loop and respond to exceptions by 
re-entering the stack so we won't lose the worker thread. I'll put that 
together in another patch.

Are you talking about doing something like:
{code}
while (!fs_shutdown) {
  try {
    my_io_service.run();
  } catch (const std::exception &e) {
    // log it, then loop back into run() so the worker thread survives
  }
}
{code}

If so, I think that's a solid approach as long as it logs a lot.

> libhdfs++: Add appropriate catch blocks for ASIO operations that throw
> --
>
> Key: HDFS-9699
> URL: https://issues.apache.org/jira/browse/HDFS-9699
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Reporter: James Clampffer
>Assignee: James Clampffer
> Attachments: HDFS-6966.HDFS-8707.000.patch, 
> HDFS-9699.HDFS-8707.001.patch, cancel_backtrace.txt
>
>
> libhdfs++ doesn't create exceptions of its own but it should be able to 
> gracefully handle exceptions thrown by libraries it uses, particularly asio.
> libhdfs++ should be able to catch most exceptions within reason either at the 
> call site or in the code that spins up asio worker threads.  Certain system 
> exceptions like std::bad_alloc don't need to be caught because by that point 
> the process is likely in an unrecoverable state.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4

2016-03-01 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173902#comment-15173902
 ] 

Kihwal Lee commented on HDFS-8791:
--

[~szetszwo], do you have any further concerns?  If not, +1 for the patch.  

We have been running with this patch in production with good results. Almost 
all clusters are now on the new layout. The parallel layout upgrade typically 
took 3-5 minutes per node for us; the number of blocks on each storage was 
roughly 100k to 200k. Once you are over the layout upgrade hurdle, it is all 
green pasture: du runs faster, and scanning a block pool slice finishes in a 
couple of seconds. As mentioned before, our parallel upgrade was done offline 
using a custom tool, which took advantage of the replica cache files 
(HDFS-7928) created during shutdown. This avoids the discovery phase of the 
upgrade.

> block ID-based DN storage layout can be very slow for datanode on ext4
> --
>
> Key: HDFS-8791
> URL: https://issues.apache.org/jira/browse/HDFS-8791
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.6.0, 2.8.0, 2.7.1
>Reporter: Nathan Roberts
>Assignee: Chris Trezzo
>Priority: Blocker
> Attachments: 32x32DatanodeLayoutTesting-v1.pdf, 
> 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, 
> HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, 
> HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, 
> hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt
>
>
> We are seeing cases where the new directory layout causes the datanode to 
> basically cause the disks to seek for 10s of minutes. This can be when the 
> datanode is running du, and it can also be when it is performing a 
> checkDirs(). Both of these operations currently scan all directories in the 
> block pool and that's very expensive in the new layout.
> The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K 
> leaf directories where block files are placed.
> So, what we have on disk is:
> - 256 inodes for the first level directories
> - 256 directory blocks for the first level directories
> - 256*256 inodes for the second level directories
> - 256*256 directory blocks for the second level directories
> - Then the inodes and blocks to store the HDFS blocks themselves.
> The main problem is the 256*256 directory blocks.
> inodes and dentries will be cached by Linux, and one can configure how 
> likely the system is to prune those entries (vfs_cache_pressure). However, 
> ext4 relies on the buffer cache to cache the directory blocks, and I'm not 
> aware of any way to tell Linux to favor buffer cache pages (even if there 
> were, I'm not sure I would want it to in general).
> Also, ext4 tries hard to spread directories evenly across the entire 
> volume; this basically means the 64K directory blocks are probably randomly 
> spread across the entire disk. A du-type scan will look at directories one 
> at a time, so the I/O scheduler can't optimize the corresponding seeks, 
> meaning the seeks will be random and far apart.
> In a system I was using to diagnose this, I had 60K blocks. A du when 
> things are hot takes less than 1 second; when things are cold, about 20 
> minutes.
> How do things get cold?
> - A large set of tasks runs on the node. This pushes almost all of the 
> buffer cache out, causing the next du to hit this situation. We are seeing 
> cases where a large job can cause a seek storm across the entire cluster.
> Why didn't the previous layout see this?
> - It might have, but it wasn't nearly as pronounced. The previous layout 
> would be a few hundred directory blocks. Even when completely cold, these 
> would only take a few hundred seeks, which would mean single-digit seconds.
> - With only a few hundred directories, the odds of the directory blocks 
> getting modified are quite high; this keeps those blocks hot and much less 
> likely to be evicted.
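
For context, the layout in question derives the two subdirectory levels from 
bits of the block ID, roughly as sketched below (the masks shown correspond 
to the 256x256 layout; the exact code lives in DatanodeUtil, and this 
stand-alone class is illustrative only):
{code}
import java.io.File;

class BlockDirLayoutSketch {
  /** Map a block ID to its two-level subdir under the block pool root. */
  static File idToBlockDir(File bpRoot, long blockId) {
    int d1 = (int) ((blockId >> 16) & 0xFF); // first level: 256 choices
    int d2 = (int) ((blockId >> 8) & 0xFF);  // second level: 256 choices
    return new File(bpRoot,
        "subdir" + d1 + File.separator + "subdir" + d2);
  }
}
{code}
A 32x32 variant would mask with 0x1F instead, cutting the leaf-directory 
count from 64K to 1K.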



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

