[jira] [Updated] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HDFS-7866: - Attachment: HDFS-7866.11.patch Thanks Zhe for the explanations! Updated the patch accordingly. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.11.patch, > HDFS-7866.4.patch, HDFS-7866.5.patch, HDFS-7866.6.patch, HDFS-7866.7.patch, > HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend the NameNode to load, list, and sync predefined EC schemas in > an authorized and controlled way. The provided facilities will be used to > implement DFSAdmin commands so admins can list available EC schemas and then > choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175128#comment-15175128 ] Zhe Zhang commented on HDFS-7866: - And yes, {{RS_6_3_POLICY_ID}} sounds good. If we later support multiple cell sizes we can reflect that in the naming as well. > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend the NameNode to load, list, and sync predefined EC schemas in > an authorized and controlled way. The provided facilities will be used to > implement DFSAdmin commands so admins can list available EC schemas and then > choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
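For readers following the thread, the idea under discussion — the NameNode keeping a small table of erasure coding policies keyed by an ID such as {{RS_6_3_POLICY_ID}}, with names that could later also encode cell size — can be sketched in miniature. Everything below (class names, the illustrative ID values, the lookup method) is a hypothetical simplification, not the actual HDFS API:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of an ID-keyed erasure coding policy table,
 * in the spirit of the RS_6_3_POLICY_ID naming discussed above.
 * Not the real HDFS implementation.
 */
public class EcPolicySketch {
    static class EcPolicy {
        final byte id;
        final String name;        // schema name; could later encode cell size too
        final int dataUnits;
        final int parityUnits;
        EcPolicy(byte id, String name, int dataUnits, int parityUnits) {
            this.id = id;
            this.name = name;
            this.dataUnits = dataUnits;
            this.parityUnits = parityUnits;
        }
    }

    // IDs chosen arbitrarily for illustration.
    static final Map<Byte, EcPolicy> POLICIES = new HashMap<>();
    static {
        register(new EcPolicy((byte) 1, "RS-6-3", 6, 3));
        register(new EcPolicy((byte) 2, "RS-3-2", 3, 2));
    }

    static void register(EcPolicy p) {
        POLICIES.put(p.id, p);
    }

    // Admin tooling (e.g. a DFSAdmin-style "list schemas" command) would
    // iterate POLICIES; per-zone assignment would store only the byte ID.
    static EcPolicy lookup(byte id) {
        EcPolicy p = POLICIES.get(id);
        if (p == null) {
            throw new IllegalArgumentException("Unknown EC policy id: " + id);
        }
        return p;
    }

    public static void main(String[] args) {
        EcPolicy p = lookup((byte) 1);
        System.out.println(p.name + ": " + p.dataUnits + "+" + p.parityUnits);
    }
}
```

Keeping only a one-byte ID per zone, as sketched here, keeps the per-file metadata small while the policy details live in a single shared table.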
[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175124#comment-15175124 ] Hudson commented on HDFS-9851: -- FAILURE: Integrated in Hadoop-trunk-Commit #9410 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9410/]) HDFS-9851. NameNode throws NPE when setPermission is called on a path (aajisaka: rev 27e0681f28ee896ada163bbbc08fd44d113e7d15) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirXAttrOp.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/security/TestPermission.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSShell.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > 
on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. 
> The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
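The stack trace above shows the failure mode the committed fix addresses: the ownership check dereferenced the resolved inode without first verifying that the path exists. The defensive pattern — resolve first, and raise a descriptive FileNotFoundException for a missing path instead of letting a later dereference surface as an NPE — can be sketched in a stand-alone form. The class, method, and in-memory namespace below are hypothetical stand-ins, not the actual HDFS-9851 patch:

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

/**
 * Simplified stand-in for the defensive check pattern: verify the path
 * resolves before any ownership/permission logic touches its inode.
 * Not the actual HDFS-9851 patch.
 */
public class SetPermissionSketch {
    // Pretend namespace: the set of paths that exist.
    static final Set<String> NAMESPACE = new HashSet<>();
    static {
        NAMESPACE.add("/existing/file");
    }

    static void setPermission(String path, int mode) throws FileNotFoundException {
        // The key check: fail fast with a clear, client-visible error
        // rather than dereferencing a null inode later and throwing NPE.
        if (!NAMESPACE.contains(path)) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        // ... ownership check and permission update would follow here ...
    }

    public static void main(String[] args) throws Exception {
        setPermission("/existing/file", 0644); // succeeds silently
        try {
            setPermission("/no/such/path", 0644);
        } catch (FileNotFoundException e) {
            System.out.println("Rejected cleanly: " + e.getMessage());
        }
    }
}
```

A FileNotFoundException also round-trips over RPC as a meaningful remote exception, which is why it is a better server-side failure than an NPE for a client calling with a bad path.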
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175122#comment-15175122 ] Zhe Zhang commented on HDFS-7866: - Sorry for the confusion Rui. I meant some text-based illustration like the one below. I think it will be useful to illustrate the header format for both contiguous and striped blocks. If it's still confusing, feel free to skip it and I'll be happy to do it as a follow-on.
{code}
// StripedBlockUtil
 * | <----  Block Group ----> |    <- Block Group: logical unit composing
 * |                          |       striped HDFS files.
 *  blk_0      blk_1      blk_2    <- Internal Blocks: each internal block
 *    |          |          |         represents a physically stored local
 *    v          v          v         block file
 * +------+   +------+   +------+
 * |cell_0|   |cell_1|   |cell_2|  <- {@link StripingCell} represents the
 * +------+   +------+   +------+     logical order in which a Block Group
 * |cell_3|   |cell_4|   |cell_5|     should be accessed: cell_0, cell_1, ...
 * +------+   +------+   +------+
 * |cell_6|   |cell_7|   |cell_8|
 * +------+   +------+   +------+
 * |cell_9|
 * +------+   <- A cell contains cellSize bytes of data
{code}
> Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend the NameNode to load, list, and sync predefined EC schemas in > an authorized and controlled way. The provided facilities will be used to > implement DFSAdmin commands so admins can list available EC schemas and then > choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
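The round-robin cell layout in the diagram implies a simple mapping from a file's logical byte offset to its cell index and internal block. A minimal sketch of that arithmetic, using an arbitrary illustrative cell size and three data blocks to match the diagram (these constants are not HDFS defaults):

```java
/**
 * Sketch of the offset-to-cell mapping implied by the diagram above:
 * cells are laid out round-robin across the data blocks of a block group
 * (cell_0 -> blk_0, cell_1 -> blk_1, cell_2 -> blk_2, cell_3 -> blk_0, ...).
 * CELL_SIZE and DATA_BLOCKS are illustrative values only.
 */
public class StripingSketch {
    static final long CELL_SIZE = 64 * 1024; // illustrative cell size in bytes
    static final int DATA_BLOCKS = 3;        // blk_0 .. blk_2 in the diagram

    // Global cell index in logical access order (cell_0, cell_1, ...).
    static long cellIndex(long logicalOffset) {
        return logicalOffset / CELL_SIZE;
    }

    // Which internal block the containing cell lives on.
    static int blockIndex(long logicalOffset) {
        return (int) (cellIndex(logicalOffset) % DATA_BLOCKS);
    }

    // Byte offset of the cell within its internal block file.
    static long offsetInBlock(long logicalOffset) {
        long stripe = cellIndex(logicalOffset) / DATA_BLOCKS; // row in the diagram
        return stripe * CELL_SIZE + (logicalOffset % CELL_SIZE);
    }

    public static void main(String[] args) {
        // The byte starting cell_3 wraps back to blk_0, one stripe down.
        long off = 3 * CELL_SIZE;
        System.out.println("cell=" + cellIndex(off)
            + " block=" + blockIndex(off)
            + " offsetInBlock=" + offsetInBlock(off));
    }
}
```

This is the arithmetic a striped reader needs to translate a client's logical read range into per-datanode block reads.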
[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175110#comment-15175110 ] Xiao Chen commented on HDFS-9766: - Thanks [~ajisakaa] and [~liuml07]. > TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently > -- > > Key: HDFS-9766 > URL: https://issues.apache.org/jira/browse/HDFS-9766 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Xiao Chen > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9766.01.patch > > > *Stacktrace* > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289) > {code} > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175108#comment-15175108 ] Brahma Reddy Battula commented on HDFS-9851: Thanks for review and commit. > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. > The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9883) Replace the hard-code value to variable
[ https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175092#comment-15175092 ] Hadoop QA commented on HDFS-9883: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 40s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 36s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 25s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs: patch generated 1 new + 509 unchanged - 0 fixed = 510 total (was 509) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 2s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m 5s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 133m 22s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.web.TestWebHdfsWithRestCsrfPreventionFilter | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790849/HDFS-9883.001.patch | | JIRA Issue | HDFS-9883 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite
[jira] [Updated] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-9851: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 2.8.0 Status: Resolved (was: Patch Available) Committed this to branch-2.7 and above. Thanks [~brahmareddy] for the contribution and thanks [~liuml07] for the review! > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. > The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175087#comment-15175087 ] Hudson commented on HDFS-9766: -- FAILURE: Integrated in Hadoop-trunk-Commit #9409 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9409/]) HDFS-9766. TestDataNodeMetrics#testDataNodeTimeSpend fails (aajisaka: rev e2ddf824694eb4605f3bb04a9c26e4b98529f5bc) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeMetrics.java > TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently > -- > > Key: HDFS-9766 > URL: https://issues.apache.org/jira/browse/HDFS-9766 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Xiao Chen > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9766.01.patch > > > *Stacktrace* > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289) > {code} > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175073#comment-15175073 ] Akira AJISAKA commented on HDFS-9851: - +1, checking this in. > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. > The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8475) Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available
[ https://issues.apache.org/jira/browse/HDFS-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175054#comment-15175054 ] Harshal Joshi commented on HDFS-8475: - Hi team, has anyone got a clue about this issue? I am currently facing the same problem while inserting data into a Hive table stored in the ORC file format. It looks like this is a generic issue for the following case: 1. Create external table T1 (cols A, B, C) partitioned on col A, stored as ORC. Load the table with substantial data (around 85 GB in my case). 2. Create external table T2 (cols A, B, C) partitioned on col B, stored as ORC. Load T2 from T1 with dynamic partitioning. Output: Premature EOF exception. Thanks > Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no > length prefix available > > > Key: HDFS-8475 > URL: https://issues.apache.org/jira/browse/HDFS-8475 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Vinod Valecha >Priority: Blocker > > Scenario: > = > Write a file > Corrupt a block manually > Exception stack trace- > 2015-05-24 02:31:55.291 INFO [T-33716795] > [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Exception in > createBlockOutputStream > java.io.EOFException: Premature EOF: no length prefix available > at > org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514) > [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer createBlockOutputStream > Exception in createBlockOutputStream > java.io.EOFException: Premature EOF: no > length prefix available > at > 
org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1492) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1155) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1088) > at > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:514) > 2015-05-24 02:31:55.291 INFO [T-33716795] > [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Abandoning > BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579 > [5/24/15 2:31:55:291 UTC] 02027a3b DFSClient I > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream > Abandoning BP-176676314-10.108.106.59-1402620296713:blk_1404621403_330880579 > 2015-05-24 02:31:55.299 INFO [T-33716795] > [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] Excluding datanode > 10.108.106.59:50010 > [5/24/15 2:31:55:299 UTC] 02027a3b DFSClient I > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer nextBlockOutputStream > Excluding datanode 10.108.106.59:50010 > 2015-05-24 02:31:55.300 WARNING [T-33716795] > [org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer] DataStreamer Exception > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag > could only be replicated to 0 nodes instead of minReplication (=1). There > are 1 datanode(s) running and 1 node(s) are excluded in this operation. 
> at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555) > [5/24/15 2:31:55:300 UTC] 02027a3b DFSClient W > org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer run DataStreamer Exception > > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File > /var/db/opera/files/B4889CCDA75F9751DDBB488E5AAB433E/BE4DAEF290B7136ED6EF3D4B157441A2/BE4DAEF290B7136ED6EF3D4B157441A2-4.pag > could only be replicated to 0 nodes instead of minReplication (=1). There > are 1 datanode(s) running and 1 node(s) are excluded in this operation. > at > org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpc
[jira] [Updated] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-9766: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 2.8.0 Status: Resolved (was: Patch Available) Committed this to branch-2.7 and above. Thanks [~xiaochen] for the contribution and thanks [~liuml07] for the review! > TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently > -- > > Key: HDFS-9766 > URL: https://issues.apache.org/jira/browse/HDFS-9766 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Xiao Chen > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9766.01.patch > > > *Stacktrace* > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289) > {code} > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9851) Name node throws NPE when setPermission is called on a path that does not exist
[ https://issues.apache.org/jira/browse/HDFS-9851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175050#comment-15175050 ] Brahma Reddy Battula commented on HDFS-9851: testfailures are unrelated...[~ajisakaa] can you please take a look...? > Name node throws NPE when setPermission is called on a path that does not > exist > --- > > Key: HDFS-9851 > URL: https://issues.apache.org/jira/browse/HDFS-9851 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1, 2.7.2 >Reporter: David Yan >Assignee: Brahma Reddy Battula >Priority: Critical > Attachments: HDFS-9851-002.patch, HDFS-9851-branch-2.7.patch, > HDFS-9851.patch > > > Tried it on both Hadoop 2.7.1 and 2.7.2, and I'm getting the same error when > setPermission is called on a path that does not exist: > {code} > 16/02/23 16:37:03.888 DEBUG > security.UserGroupInformation:FSPermissionChecker.ja > va:164 - ACCESS CHECK: > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker@299b19af, > doCheckOwner=true, ancestorAccess=null, parentAccess=null, access=null, > subAccess=null, ignoreEmptyDir=false > 16/02/23 16:37:03.889 DEBUG ipc.Server:ProtobufRpcEngine.java:631 - Served: > setPermission queueTime= 3 procesingTime= 3 exception= NullPointerException > 16/02/23 16:37:03.890 WARN ipc.Server:Server.java:2068 - IPC Server handler 2 > on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.setPermission > from 127.0.0.1:36190 Call#21 Retry#0 > java.lang.NullPointerException > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkOwner(FSPermissionChecker.java:247) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:227) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1720) > at > 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1704) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkOwner(FSDirectory.java:1673) > at > org.apache.hadoop.hdfs.server.namenode.FSDirAttrOp.setPermission(FSDirAttrOp.java:61) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setPermission(FSNamesystem.java:1653) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setPermission(NameNodeRpcServer.java:695) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setPermission(ClientNamenodeProtocolServerSideTranslatorPB.java:453) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043) > {code} > I don't see this problem with Hadoop 2.6.x. > The client that issues the setPermission call was compiled with Hadoop 2.2.0 > libraries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
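The stack trace above shows FSPermissionChecker.checkOwner dereferencing an inode that resolves to null for a nonexistent path. A minimal sketch (plain Java, not the actual Hadoop classes; all names are hypothetical) of the kind of guard that turns the NPE into the expected FileNotFoundException:

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the fix idea: resolve the path first and fail with
 * FileNotFoundException instead of dereferencing a null inode.
 */
public class SetPermissionGuardDemo {
    private final Map<String, String> inodeOwners = new HashMap<>();

    public SetPermissionGuardDemo() {
        inodeOwners.put("/existing", "alice");
    }

    /** Throws FileNotFoundException for missing paths rather than NPE. */
    public void setPermission(String path, String caller) throws FileNotFoundException {
        String owner = inodeOwners.get(path);   // null when the path does not exist
        if (owner == null) {
            throw new FileNotFoundException("Path does not exist: " + path);
        }
        if (!owner.equals(caller)) {
            throw new SecurityException("Permission denied for " + caller);
        }
        // ... apply the new permission here ...
    }

    public static void main(String[] args) throws Exception {
        SetPermissionGuardDemo fs = new SetPermissionGuardDemo();
        try {
            fs.setPermission("/missing", "alice");
            System.out.println("unexpected success");
        } catch (FileNotFoundException e) {
            System.out.println("FileNotFoundException as expected");
        }
    }
}
```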
[jira] [Updated] (HDFS-9884) Use doxia macro to generate in-page TOC of HDFS site documentation
[ https://issues.apache.org/jira/browse/HDFS-9884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Masatake Iwasaki updated HDFS-9884: --- Description: Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown. (was: Since maven-site-plugin 3.5 was releaced, we can use toc macro in Markdown.) > Use doxia macro to generate in-page TOC of HDFS site documentation > -- > > Key: HDFS-9884 > URL: https://issues.apache.org/jira/browse/HDFS-9884 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Affects Versions: 2.7.0 >Reporter: Masatake Iwasaki >Assignee: Masatake Iwasaki > > Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
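For reference, once maven-site-plugin 3.5 is in use, an in-page TOC can be requested from a Markdown source page with the Doxia macro syntax (the depth parameters here are illustrative):

```markdown
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
```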
[jira] [Commented] (HDFS-9766) TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently
[ https://issues.apache.org/jira/browse/HDFS-9766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175041#comment-15175041 ] Akira AJISAKA commented on HDFS-9766: - LGTM, +1. > TestDataNodeMetrics#testDataNodeTimeSpend fails intermittently > -- > > Key: HDFS-9766 > URL: https://issues.apache.org/jira/browse/HDFS-9766 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0 >Reporter: Mingliang Liu >Assignee: Xiao Chen > Attachments: HDFS-9766.01.patch > > > *Stacktrace* > {code} > java.lang.AssertionError: null > at org.junit.Assert.fail(Assert.java:86) > at org.junit.Assert.assertTrue(Assert.java:41) > at org.junit.Assert.assertTrue(Assert.java:52) > at > org.apache.hadoop.hdfs.server.datanode.TestDataNodeMetrics.testDataNodeTimeSpend(TestDataNodeMetrics.java:289) > {code} > See recent builds: > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14393/testReport/org.apache.hadoop.hdfs.server.datanode/TestDataNodeMetrics/testDataNodeTimeSpend/ > * > https://builds.apache.org/job/PreCommit-HDFS-Build/14317/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.8.0_66.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9884) Use doxia macro to generate in-page TOC of HDFS site documentation
Masatake Iwasaki created HDFS-9884: -- Summary: Use doxia macro to generate in-page TOC of HDFS site documentation Key: HDFS-9884 URL: https://issues.apache.org/jira/browse/HDFS-9884 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Masatake Iwasaki Assignee: Masatake Iwasaki Since maven-site-plugin 3.5 was released, we can use toc macro in Markdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9883) Replace the hard-code value to variable
[ https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-9883: Status: Patch Available (was: Open) Attach a simple patch. > Replace the hard-code value to variable > --- > > Key: HDFS-9883 > URL: https://issues.apache.org/jira/browse/HDFS-9883 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Attachments: HDFS-9883.001.patch > > > In some class of HDFS, there are many hard-code value places. Like this: > {code} > /** Constructor >* @param bandwidthPerSec bandwidth allowed in bytes per second. >*/ > public DataTransferThrottler(long bandwidthPerSec) { > this(500, bandwidthPerSec); // by default throttling period is 500ms > } > {code} > It will be better replace these value to variables so that it will not be > easily ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174951#comment-15174951 ] Hudson commented on HDFS-9876: -- FAILURE: Integrated in Hadoop-trunk-Commit #9408 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9408/]) HDFS-9876. shouldProcessOverReplicated should not count number of (jing9: rev f2ba7da4f0df6cf0fc245093aeb4500158e6ee0b) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, > HDFS-9876.001.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
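A toy model (plain Java, not the BlockManager code; all names are illustrative) of why counting pending replicas can make the striped excess-removal loop spin: if the over-replication exists only in the pending queue, the chooser finds nothing deletable among the reported storages, so the candidate set never shrinks and `while (candidates.size() > 1)` never exits.

```java
import java.util.ArrayList;
import java.util.List;

public class ExcessReplicaLoopDemo {
    /** The chooser can only delete replicas that live on reported storages. */
    static String chooseReplicaToDelete(List<String> reported, List<String> candidates) {
        for (String c : candidates) {
            if (reported.contains(c)) {
                return c;   // deletable: it is on a reported storage
            }
        }
        return null;        // nothing deletable: the excess is pending-only
    }

    public static void main(String[] args) {
        List<String> reported = List.of("dn1", "dn2");   // reported storages
        List<String> candidates = new ArrayList<>(List.of("pending-a", "pending-b"));

        int iterations = 0;
        // Mirrors `while (candidates.size() > 1)`; an iteration cap is added
        // here so the demo halts where the real loop would spin forever.
        while (candidates.size() > 1 && iterations < 5) {
            String victim = chooseReplicaToDelete(reported, candidates);
            if (victim != null) {
                candidates.remove(victim);
            }
            iterations++;
        }
        System.out.println("candidates left: " + candidates.size()); // still 2
    }
}
```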
[jira] [Updated] (HDFS-9883) Replace the hard-code value to variable
[ https://issues.apache.org/jira/browse/HDFS-9883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lin Yiqun updated HDFS-9883: Attachment: HDFS-9883.001.patch > Replace the hard-code value to variable > --- > > Key: HDFS-9883 > URL: https://issues.apache.org/jira/browse/HDFS-9883 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.7.1 >Reporter: Lin Yiqun >Assignee: Lin Yiqun >Priority: Minor > Attachments: HDFS-9883.001.patch > > > In some class of HDFS, there are many hard-code value places. Like this: > {code} > /** Constructor >* @param bandwidthPerSec bandwidth allowed in bytes per second. >*/ > public DataTransferThrottler(long bandwidthPerSec) { > this(500, bandwidthPerSec); // by default throttling period is 500ms > } > {code} > It will be better replace these value to variables so that it will not be > easily ignored. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9883) Replace the hard-code value to variable
Lin Yiqun created HDFS-9883: --- Summary: Replace the hard-code value to variable Key: HDFS-9883 URL: https://issues.apache.org/jira/browse/HDFS-9883 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.7.1 Reporter: Lin Yiqun Assignee: Lin Yiqun Priority: Minor In some classes of HDFS there are many hard-coded values. For example: {code} /** Constructor * @param bandwidthPerSec bandwidth allowed in bytes per second. */ public DataTransferThrottler(long bandwidthPerSec) { this(500, bandwidthPerSec); // by default throttling period is 500ms } {code} It would be better to replace these values with named variables so that they are not easily overlooked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
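The refactoring the description proposes can be sketched as follows (plain Java; the class and constant names here are illustrative, not the actual patch):

```java
public class DataTransferThrottlerSketch {
    /** Default throttling period, replacing the magic number 500. */
    public static final long DEFAULT_PERIOD_MS = 500;

    private final long periodMs;
    private final long bandwidthPerSec;

    /** @param bandwidthPerSec bandwidth allowed in bytes per second. */
    public DataTransferThrottlerSketch(long bandwidthPerSec) {
        this(DEFAULT_PERIOD_MS, bandwidthPerSec);   // named constant, not 500
    }

    public DataTransferThrottlerSketch(long periodMs, long bandwidthPerSec) {
        this.periodMs = periodMs;
        this.bandwidthPerSec = bandwidthPerSec;
    }

    public long getPeriodMs() { return periodMs; }

    public long getBandwidthPerSec() { return bandwidthPerSec; }

    public static void main(String[] args) {
        DataTransferThrottlerSketch t = new DataTransferThrottlerSketch(1024 * 1024);
        System.out.println(t.getPeriodMs()); // 500
    }
}
```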
[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9876: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! The failed tests all passed in my local run. I've committed the patch into trunk. > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Fix For: 3.0.0 > > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, > HDFS-9876.001.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
[ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174860 ] GAO Rui commented on HDFS-7661: --- Hi [~liuml07]. For the data consistency issue, I figured out a scenario under the RS-6-3 EC policy: 0. The write client calls flush; we have V1 parity on all parity DNs: DN0, DN1, DN2. 1. Two IDB (internal data block) DNs fail. 2. The read client reads 4 IDBs, plus the V1 parity on DN0. 3. The write client calls flush twice. 4. The read client reads the V3 parity on DN1. 5. The write client calls flush twice again. 6. The read client reads the V5 parity on DN2. 7. The read client now holds only five internal blocks across all of V1, V3 and V5. 8. The read fails. This is quite an extreme scenario, but it could still happen. Under the current design we only keep the overwritten parity data for the latest version, and we have no lock, which could cause data consistency problems. If a lock in the NN is too heavy, maybe we could consider maintaining the lock in the write client: the read client gets the file info and write-client info from the NN, then uses the lock in the write client to control data consistency. Ping [~szetszwo], [~jingzhao],[~zhz], [~drankye], [~walter.k.su] and [~ikki407] for discussion :D > Erasure coding: support hflush and hsync > > > Key: HDFS-7661 > URL: https://issues.apache.org/jira/browse/HDFS-7661 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Tsz Wo Nicholas Sze >Assignee: GAO Rui > Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, > HDFS-7661-unitTest-wip-trunk.patch, HDFS-7661-wip.01.patch, > HDFS-EC-file-flush-sync-design-version1.1.pdf, > HDFS-EC-file-flush-sync-design-version2.0.pdf > > > We also need to support hflush/hsync and visible length. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
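The failure in steps 7 and 8 is a counting argument: RS-6-3 decoding needs any six internal blocks that agree on one version, and the reader ends up with at most five (four surviving data blocks plus a single parity per version). A toy counting model (plain Java, ignoring data-block versioning for simplicity; all names are hypothetical):

```java
import java.util.Map;

public class EcVersionConsistencyDemo {
    static final int DATA_BLOCKS = 6;   // RS-6-3: 6 data + 3 parity

    /** Decoding needs any 6 internal blocks that agree on a version. */
    static boolean canDecode(int liveDataBlocks, Map<Integer, Integer> parityVersions) {
        int bestParitySameVersion = 0;
        // Find how many parities share the most common version.
        for (int v : parityVersions.values()) {
            int same = 0;
            for (int w : parityVersions.values()) {
                if (v == w) same++;
            }
            bestParitySameVersion = Math.max(bestParitySameVersion, same);
        }
        return liveDataBlocks + bestParitySameVersion >= DATA_BLOCKS;
    }

    public static void main(String[] args) {
        // Scenario from the comment: 2 data DNs failed, parities read at V1, V3, V5.
        int liveData = 4;
        Map<Integer, Integer> parityVersions = Map.of(0, 1, 1, 3, 2, 5);
        System.out.println(canDecode(liveData, parityVersions)); // false: 4 + 1 < 6
    }
}
```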
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174861#comment-15174861 ] Rui Li commented on HDFS-7866: -- Thanks Zhe for the review and comments! Forgive my ignorance but what do you mean by ASCII art? As to the policy IDs, I think we can put them in {{HdfsConstants}} and name as {{RS_6_3_POLICY_ID}}. Sounds good? > Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174843#comment-15174843 ] Hadoop QA commented on HDFS-9876: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 3s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 8s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | 
{color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 6s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 49s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 7s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 75m 18s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 181m 35s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.datanode.TestBlockReplacement | | | hadoop.tracing.TestTracing | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.namenode.TestEditLog | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patc
[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-9876: -- Component/s: namenode erasure-coding +1 patch looks good. > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: erasure-coding, namenode >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, > HDFS-9876.001.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7866) Erasure coding: NameNode manages multiple erasure coding policies
[ https://issues.apache.org/jira/browse/HDFS-7866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174741#comment-15174741 ] Zhe Zhang commented on HDFS-7866: - Thanks Rui, very nice work here! I only finished reviewing the {{INodeFile}} header encoding logic. The main logic LGTM overall. Just some recommendations on code structure: # The below is not very accurate. For striped blocks, the tail 11 bits do not only represent redundancy. E.g. there could be different coders like RS, HH. 3-2 coding and 6-4 have same level of redundancy but they have different trade-offs. I see the intention here is to unify the terminology for contiguous and striped blocks. But I think it is pretty hard. {code} /** * Number of bits used to encode block layout type. * Different types can be replica or EC */ public static final int LAYOUT_BIT_WIDTH = 1; /** * Number of bits used to encode block redundancy. * For replicated block, the redundancy is the replication factor; * for erasure coded block, the redundancy is the EC policy's ID. */ public static final int REDUNDANCY_BIT_WIDTH = 11; {code} # So instead of the above, I think we can keep the {{LAYOUT_BIT_WIDTH}}, and then explicitly parse the {{BLOCK_LAYOUT_AND_REDUNDANCY}} section for both striped and contiguous blocks. To avoid repeating code we can add a util method {{maskLayoutBit}}. Not sure if it's worth it, since the code is very simple. # In the "Bit format:" Javadoc we should also explain the {{BLOCK_LAYOUT_AND_REDUNDANCY}} section more clearly. Some ASCII art here will be really helpful. Also, {{SYS_POLICY1_ID}} and {{SYS_POLICY2_ID}} look a little hacky. Can we do something similar to {{BlockStoragePolicySuite#createDefaultSuite}} and create some constants? We can also make it a byte to save space. 
> Erasure coding: NameNode manages multiple erasure coding policies > - > > Key: HDFS-7866 > URL: https://issues.apache.org/jira/browse/HDFS-7866 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Rui Li > Attachments: HDFS-7866-v1.patch, HDFS-7866-v2.patch, > HDFS-7866-v3.patch, HDFS-7866.10.patch, HDFS-7866.4.patch, HDFS-7866.5.patch, > HDFS-7866.6.patch, HDFS-7866.7.patch, HDFS-7866.8.patch, HDFS-7866.9.patch > > > This is to extend NameNode to load, list and sync predefine EC schemas in > authorized and controlled approach. The provided facilities will be used to > implement DFSAdmin commands so admin can list available EC schemas, then > could choose some of them for target EC zones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
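The header encoding under discussion can be sketched as follows (plain Java; the bit widths come from the quoted snippet, while the accessor names in the spirit of the suggested maskLayoutBit util are illustrative, not the actual patch):

```java
public class BlockHeaderLayoutDemo {
    static final int LAYOUT_BIT_WIDTH = 1;       // replica vs. EC
    static final int REDUNDANCY_BIT_WIDTH = 11;  // replication factor or EC policy ID

    /** Pack [layout bit][11-bit redundancy] into the low 12 bits of the header. */
    static long encode(boolean striped, int redundancy) {
        long layout = striped ? 1L : 0L;
        return (layout << REDUNDANCY_BIT_WIDTH)
            | (redundancy & ((1 << REDUNDANCY_BIT_WIDTH) - 1));
    }

    /** Mask out the single layout bit. */
    static boolean isStriped(long header) {
        return ((header >> REDUNDANCY_BIT_WIDTH) & 1L) != 0;
    }

    /** Replication factor for contiguous blocks, EC policy ID for striped ones. */
    static int getRedundancy(long header) {
        return (int) (header & ((1 << REDUNDANCY_BIT_WIDTH) - 1));
    }

    public static void main(String[] args) {
        long replicated = encode(false, 3);  // contiguous, replication factor 3
        long striped = encode(true, 1);      // striped, EC policy ID 1
        System.out.println(isStriped(replicated));  // false
        System.out.println(getRedundancy(striped)); // 1
    }
}
```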
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated HDFS-9882: -- Affects Version/s: 2.7.2 > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated HDFS-9882: -- Issue Type: New Feature (was: Task) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: New Feature > Components: datanode >Affects Versions: 2.7.2 >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated HDFS-9882: -- Status: In Progress (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174731#comment-15174731 ] Hudson commented on HDFS-9881: -- FAILURE: Integrated in Hadoop-trunk-Commit #9407 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9407/]) HDFS-9881. DistributedFileSystem#getTrashRoot returns incorrect path for (wang: rev 4abb2fa687a80d2b76f2751dd31513822601b235) * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Fix For: 2.8.0 > > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
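The off-by-a-separator bug described above is easy to see in isolation (plain Java; names are illustrative, not the DistributedFileSystem code):

```java
public class TrashRootPathDemo {
    static final String TRASH_PREFIX = ".Trash";

    /** Buggy form: no separator between the trash dir and the user name. */
    static String buggyTrashRoot(String ezRoot, String user) {
        return ezRoot + "/" + TRASH_PREFIX + user;
    }

    /** Fixed form: insert the path separator before the user name. */
    static String fixedTrashRoot(String ezRoot, String user) {
        return ezRoot + "/" + TRASH_PREFIX + "/" + user;
    }

    public static void main(String[] args) {
        System.out.println(buggyTrashRoot("/ez", "andrew")); // /ez/.Trashandrew
        System.out.println(fixedTrashRoot("/ez", "andrew")); // /ez/.Trash/andrew
    }
}
```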
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated HDFS-9882: -- Status: Patch Available (was: Open) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Attachment: 0001-Add-heartbeatsTotal-metric.patch > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > Attachments: 0001-Add-heartbeatsTotal-metric.patch > > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Description: Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose to add another metric counter to show the total time. (was: Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose either (1) changing the heartbeat latency to reflect the total time spent on sending reports and processing commands or (2) adding another metric counter to show the total time. ) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > to add another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Open (was: Patch Available) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Status: Patch Available (was: In Progress) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9835) OIV: add ReverseXML processor which reconstructs an fsimage from an XML file
[ https://issues.apache.org/jira/browse/HDFS-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174717#comment-15174717 ] Lei (Eddy) Xu commented on HDFS-9835: - Hi, [~cmccabe] The patch looks good to me overall. +1 after addressing these final nitpicks: * There are some checkstyle warnings for {{switch...case}} indentation. * The findbugs warning is a false positive, but can we mitigate it somehow? * Could you also apply the XATTR_..._MASKs to {{PBImageXmlWriter#dumpXAttrs}}? Thanks a lot for the work. > OIV: add ReverseXML processor which reconstructs an fsimage from an XML file > > > Key: HDFS-9835 > URL: https://issues.apache.org/jira/browse/HDFS-9835 > Project: Hadoop HDFS > Issue Type: New Feature > Components: tools >Affects Versions: 2.0.0-alpha >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > Attachments: HDFS-9835.001.patch, HDFS-9835.002.patch, > HDFS-9835.003.patch, HDFS-9835.004.patch, HDFS-9835.005.patch > > > OIV: add ReverseXML processor which reconstructs an fsimage from an XML file. > This will make it easy to create fsimages for testing, and manually edit > fsimages when there is corruption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-9881: -- Resolution: Fixed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Pushed to trunk, branch-2, branch-2.8. Thanks again Xiaoyu and Zhe for taking a look! > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Fix For: 2.8.0 > > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174710#comment-15174710 ] Andrew Wang commented on HDFS-9881: --- Jenkins failed with a port in use exception, so I'm going to go ahead and commit based on xyao's +1. Thanks all! > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174704#comment-15174704 ] Hadoop QA commented on HDFS-7597: - -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 17s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 8s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 29s | trunk passed |
| +1 | compile | 1m 13s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 1m 20s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 1m 24s | trunk passed |
| +1 | mvneclipse | 0m 26s | trunk passed |
| +1 | findbugs | 3m 28s | trunk passed |
| +1 | javadoc | 1m 23s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 12s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 10s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 1m 9s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 1m 9s | the patch passed |
| +1 | compile | 1m 19s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 1m 19s | the patch passed |
| -1 | checkstyle | 0m 24s | hadoop-hdfs-project: patch generated 1 new + 3 unchanged - 0 fixed = 4 total (was 3) |
| +1 | mvnsite | 1m 21s | the patch passed |
| +1 | mvneclipse | 0m 22s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 3m 54s | the patch passed |
| +1 | javadoc | 1m 22s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 8s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 49s | hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. |
| -1 | unit | 66m 45s | hadoop-hdfs in the patch failed with JDK v1.8.0_72. |
| +1 | unit | 0m 58s | hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 79m 28s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 30s | Patch does not generate ASF License warnings. |
| | | 183m 3s | |
[jira] [Updated] (HDFS-6440) Support more than 2 NameNodes
[ https://issues.apache.org/jira/browse/HDFS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6440: -- Release Note: This feature adds support for running additional standby NameNodes, which provides additional fault-tolerance. It is designed for a total of 3-5 NameNodes. > Support more than 2 NameNodes > - > > Key: HDFS-6440 > URL: https://issues.apache.org/jira/browse/HDFS-6440 > Project: Hadoop HDFS > Issue Type: New Feature > Components: auto-failover, ha, namenode >Affects Versions: 2.4.0 >Reporter: Jesse Yates >Assignee: Jesse Yates > Fix For: 3.0.0 > > Attachments: Multiple-Standby-NameNodes_V1.pdf, > hdfs-6440-cdh-4.5-full.patch, hdfs-6440-trunk-v1.patch, > hdfs-6440-trunk-v1.patch, hdfs-6440-trunk-v3.patch, hdfs-6440-trunk-v4.patch, > hdfs-6440-trunk-v5.patch, hdfs-6440-trunk-v6.patch, hdfs-6440-trunk-v7.patch, > hdfs-6440-trunk-v8.patch, hdfs-multiple-snn-trunk-v0.patch > > > Most of the work is already done to support more than 2 NameNodes (one > active, one standby). This would be the last bit to support running multiple > _standby_ NameNodes; one of the standbys should be available for fail-over. > Mostly, this is a matter of updating how we parse configurations, some > complexity around managing the checkpointing, and updating a whole lot of > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
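The release note above says the feature extends configuration parsing to accept additional standbys. As a hedged illustration only (the nameservice {{ns1}} and NameNode IDs {{nn1}}..{{nn3}} are placeholder names, and this sketch uses the standard HDFS HA keys rather than anything specific to the patch), a three-NameNode setup might look like:

```xml
<!-- Hypothetical hdfs-site.xml excerpt: ns1/nn1..nn3 are placeholder names.
     With HDFS-6440, dfs.ha.namenodes.<nameservice> may list more than two IDs. -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2,nn3</value>
</property>
<!-- One rpc-address (and http-address) entry is then required per NameNode ID: -->
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
```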
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174687#comment-15174687 ] Chris Nauroth commented on HDFS-7597: - Oh right... in the meantime, there was HDFS-8855. Darn, I knew that issue sounded familiar, but I had forgotten it was this one. How shall we proceed? > DNs should not open new NN connections when webhdfs clients seek > > > Key: HDFS-7597 > URL: https://issues.apache.org/jira/browse/HDFS-7597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Labels: BB2015-05-TBR > Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch > > > Webhdfs seeks involve closing the current connection, and reissuing a new > open request with the new offset. The RPC layer caches connections so the DN > keeps a lingering connection open to the NN. Connection caching is in part > based on UGI. Although the client used the same token for the new offset > request, the UGI is different which forces the DN to open another unnecessary > connection to the NN. > A job that performs many seeks will easily crash the NN due to fd exhaustion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174676#comment-15174676 ] Hadoop QA commented on HDFS-9881: - -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 11s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| 0 | mvndep | 0m 10s | Maven dependency ordering for branch |
| +1 | mvninstall | 6m 43s | trunk passed |
| +1 | compile | 1m 12s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 1m 20s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 25s | trunk passed |
| +1 | mvnsite | 1m 24s | trunk passed |
| +1 | mvneclipse | 0m 27s | trunk passed |
| +1 | findbugs | 3m 41s | trunk passed |
| +1 | javadoc | 1m 29s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 11s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 17s | the patch passed |
| +1 | compile | 1m 13s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 1m 13s | the patch passed |
| +1 | compile | 1m 19s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 1m 19s | the patch passed |
| +1 | checkstyle | 0m 24s | the patch passed |
| +1 | mvnsite | 1m 22s | the patch passed |
| +1 | mvneclipse | 0m 22s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 4m 3s | the patch passed |
| +1 | javadoc | 1m 20s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 7s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 48s | hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 54m 49s | hadoop-hdfs in the patch passed with JDK v1.8.0_72. |
| +1 | unit | 0m 58s | hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 52m 20s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 26s | Patch does not generate ASF License warnings. |
| | | 144m 27s | |
|| Reason || Tests ||
| JDK v1.7.0_95 Faile
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174674#comment-15174674 ] Hadoop QA commented on HDFS-7597: - -1 overall
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 13s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| 0 | mvndep | 0m 12s | Maven dependency ordering for branch |
| +1 | mvninstall | 7m 11s | trunk passed |
| +1 | compile | 1m 39s | trunk passed with JDK v1.8.0_72 |
| +1 | compile | 1m 27s | trunk passed with JDK v1.7.0_95 |
| +1 | checkstyle | 0m 26s | trunk passed |
| +1 | mvnsite | 1m 29s | trunk passed |
| +1 | mvneclipse | 0m 24s | trunk passed |
| +1 | findbugs | 3m 33s | trunk passed |
| +1 | javadoc | 1m 35s | trunk passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 16s | trunk passed with JDK v1.7.0_95 |
| 0 | mvndep | 0m 9s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 18s | the patch passed |
| +1 | compile | 1m 32s | the patch passed with JDK v1.8.0_72 |
| +1 | javac | 1m 32s | the patch passed |
| +1 | compile | 1m 25s | the patch passed with JDK v1.7.0_95 |
| +1 | javac | 1m 25s | the patch passed |
| -1 | checkstyle | 0m 22s | hadoop-hdfs-project: patch generated 1 new + 4 unchanged - 0 fixed = 5 total (was 4) |
| +1 | mvnsite | 1m 24s | the patch passed |
| +1 | mvneclipse | 0m 21s | the patch passed |
| +1 | whitespace | 0m 0s | Patch has no whitespace issues. |
| +1 | findbugs | 4m 5s | the patch passed |
| +1 | javadoc | 1m 29s | the patch passed with JDK v1.8.0_72 |
| +1 | javadoc | 2m 16s | the patch passed with JDK v1.7.0_95 |
| +1 | unit | 0m 59s | hadoop-hdfs-client in the patch passed with JDK v1.8.0_72. |
| -1 | unit | 67m 7s | hadoop-hdfs in the patch failed with JDK v1.8.0_72. |
| +1 | unit | 0m 58s | hadoop-hdfs-client in the patch passed with JDK v1.7.0_95. |
| -1 | unit | 59m 14s | hadoop-hdfs in the patch failed with JDK v1.7.0_95. |
| +1 | asflicense | 0m 23s | Patch does not generate ASF License warnings. |
| | | 165m 47s | |
[jira] [Commented] (HDFS-9848) Ozone: Add Ozone Client lib for volume handling
[ https://issues.apache.org/jira/browse/HDFS-9848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174647#comment-15174647 ] Chris Nauroth commented on HDFS-9848: - Hi [~anu]. This looks good overall. Just a few minor notes on the tests: {code} try { client = new OzoneClient(String.format("http://localhost:%d";, port)); } catch (OzoneException | URISyntaxException e) { Assert.assertTrue("Failed to create an Ozone Client." + e.getMessage() , false); } {code} Usually, I'd just let these exceptions be thrown out of the method. If the exception is thrown, JUnit will report it with a full stack trace to aid troubleshooting. If you prefer to handle the exception so that you can emit the custom "Failed to create" message, then I'd recommend using {{Assert.fail}} instead of an {{Assert.assertTrue}} that fails on purpose. {code} @Test public void testCreateDuplicateVolume() throws OzoneException { try { client.setUserAuth(OzoneConsts.OZONE_SIMPLE_HDFS_USER); client.createVolume("testVol", "bilbo", "100TB"); client.createVolume("testVol", "bilbo", "100TB"); } catch (Exception ex) { // OZone will throw saying volume already exists assertNotNull(ex); } } {code} As currently written, this test always passes, even when it shouldn't. If any of the client calls fails for any reason, then it will go to the exception handler. {{ex}} is guaranteed to be non-null always, so the {{assertNotNull}} is effectively a no-op and the test passes. If all 3 client calls succeed, then the overall test is a success, even though we wanted it to cover duplicate volume error handling. From debugging, I can see that what is actually happening when the test runs is it throws "java.lang.IllegalArgumentException: Bucket or Volume name does not support uppercase characters", so the test also needs to be changed to use a different volume name. 
> Ozone: Add Ozone Client lib for volume handling > --- > > Key: HDFS-9848 > URL: https://issues.apache.org/jira/browse/HDFS-9848 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-7240 > > Attachments: HDFS-9848-HDFS-7240.001.patch, > HDFS-9848-HDFS-7240.002.patch, HDFS-9848-HDFS-7240.003.patch, > HDFS-9848-HDFS-7240.004.patch > > > Add a simple client lib for volume handling. This is used primarily to make > writing tests simpler. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
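The review point above — a catch block that only asserts the caught exception is non-null passes even when nothing goes wrong — can be shown with a minimal standalone example (all names here are hypothetical stand-ins, not the Ozone client API):

```java
// Standalone illustration of the anti-pattern flagged in the review. The
// correct pattern puts only the call that must fail inside the try block,
// followed by an explicit failure (Assert.fail in JUnit) when no exception
// is thrown. Class and method names are hypothetical.
public class DuplicateCheck {
    private static final java.util.Set<String> volumes = new java.util.HashSet<>();

    static void createVolume(String name) {
        // Reject duplicates, as the Ozone server does for existing volumes.
        if (!volumes.add(name)) {
            throw new IllegalStateException("volume already exists: " + name);
        }
    }

    // Returns true only if the duplicate create actually failed.
    static boolean duplicateRejected(String name) {
        createVolume(name); // first create must succeed
        try {
            createVolume(name); // this call is expected to throw
            return false;       // would be Assert.fail(...) in a JUnit test
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(duplicateRejected("testvol")); // true
    }
}
```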
[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Trezzo updated HDFS-8791: --- Release Note: HDFS-8791 introduces a new datanode layout format. This layout is identical to the previous block id based layout except it has a smaller 32x32 sub-directory structure in each data storage. On startup, the datanode will automatically upgrade its storages to this new layout. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. 
> inodes and dentries will be cached by Linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell Linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
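The arithmetic behind the description above is worth making explicit: the old layout's 256x256 structure creates 65,536 leaf directories versus 1,024 for the new 32x32 layout, a 64x reduction in directory blocks that must stay cached. A quick back-of-envelope check:

```java
// Back-of-envelope numbers from the HDFS-8791 description: leaf-directory
// counts for the old 256x256 layout vs the new 32x32 layout. The seek-cost
// framing (one cold read per directory block) is a simplification.
public class LayoutMath {
    public static void main(String[] args) {
        int oldLeaf = 256 * 256; // second-level dirs in the old layout
        int newLeaf = 32 * 32;   // second-level dirs in the new layout
        System.out.println("256x256 leaf dirs: " + oldLeaf);           // 65536
        System.out.println("32x32 leaf dirs:   " + newLeaf);           // 1024
        System.out.println("reduction factor:  " + (oldLeaf / newLeaf)); // 64
    }
}
```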
[jira] [Updated] (HDFS-9882) Add heartbeatsTotal in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hua Liu updated HDFS-9882: -- Summary: Add heartbeatsTotal in Datanode metrics (was: Change the meaning of heartbeat latency in Datanode metrics) > Add heartbeatsTotal in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174605#comment-15174605 ] Hudson commented on HDFS-9880: -- FAILURE: Integrated in Hadoop-trunk-Commit #9406 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9406/]) HDFS-9880. TestDatanodeRegistration fails occasionally. Contributed by (kihwal: rev e76b13c415459e4062c4c9660a16759a11ffb34a) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeRegistration.java > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Fix For: 2.7.3 > > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
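The flakiness pattern here — a fixed 100ms window that sometimes loses to scheduling jitter — is usually addressed by polling with a generous overall deadline. The sketch below is a generic illustration in the style of Hadoop's {{GenericTestUtils.waitFor}}, not the actual test code from the patch:

```java
import java.util.function.BooleanSupplier;

public class WaitForSketch {
    // Poll `check` every intervalMs until it holds or timeoutMs elapses.
    static boolean waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (check.getAsBoolean()) {
                return true;               // condition met before the deadline
            }
            try {
                Thread.sleep(intervalMs);  // back off before the next poll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;                      // timed out
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // A condition that becomes true after ~50ms: a 2-second deadline
        // absorbs jitter where a single flat 100ms window would not.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 10, 2000);
        System.out.println(ok);
    }
}
```

The point is that a longer deadline costs nothing in the common case — the wait returns as soon as the condition holds — while making the slow-machine case pass.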
[jira] [Commented] (HDFS-9882) Change the meaning of heartbeat latency in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174597#comment-15174597 ] Inigo Goiri commented on HDFS-9882: --- I think the second approach makes more sense as other people may have already built some dependencies on the semantic of the current {{heartbeats}}. The current approach in {{DatanodeMetrics}} has: {code} @Metric MutableRate heartbeats; {code} I would add: {code} @Metric MutableRate heartbeatsTotal; {code} > Change the meaning of heartbeat latency in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
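A minimal sketch of how the two rates would diverge per heartbeat cycle. {{MutableRate}} below is a simplified stand-in for Hadoop's {{org.apache.hadoop.metrics2.lib.MutableRate}}, and the {{record}} method is illustrative — it is not the actual datanode heartbeat code:

```java
public class HeartbeatMetricsSketch {
    // Simplified stand-in for org.apache.hadoop.metrics2.lib.MutableRate:
    // accumulates samples and exposes the mean.
    static class MutableRate {
        long samples, total;
        void add(long millis) { samples++; total += millis; }
        double mean() { return samples == 0 ? 0 : (double) total / samples; }
    }

    final MutableRate heartbeats = new MutableRate();       // existing metric, unchanged
    final MutableRate heartbeatsTotal = new MutableRate();  // proposed addition

    // Record one heartbeat cycle given its two component durations.
    void record(long reportMillis, long commandMillis) {
        heartbeats.add(reportMillis);                       // generate + send reports only
        heartbeatsTotal.add(reportMillis + commandMillis);  // whole cycle incl. command processing
    }

    public static void main(String[] args) {
        HeartbeatMetricsSketch dn = new HeartbeatMetricsSketch();
        dn.record(5, 120);  // a heartbeat delayed by slow command processing
        // `heartbeats` alone (5ms) hides the 120ms spent processing commands;
        // `heartbeatsTotal` (125ms) surfaces it for investigation.
        System.out.println(dn.heartbeats.mean() + " " + dn.heartbeatsTotal.mean());
    }
}
```

Keeping the old rate untouched preserves the existing semantics that downstream dashboards may depend on, which is the argument for option (2).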
[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9876: Attachment: HDFS-9876.001.patch Remove unused internalBlock. > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch, > HDFS-9876.001.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
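The failure mode above can be reduced to a counting question. The sketch below models the direction of the fix — deciding "over-replicated" from reported replicas only, since excess replicas are chosen among reported storages. The method names echo the JIRA discussion but the bodies are illustrative, not the actual {{BlockManager}} code:

```java
public class OverReplicationSketch {
    // Before: pending replicas inflate the count, so processOverReplicatedBlock
    // can be triggered even when no reported replica is actually deletable.
    static boolean buggyCheck(int reported, int pending, int expected) {
        return reported + pending > expected;
    }

    // After: only reported replicas count toward the over-replication decision,
    // matching the candidate set that excess-replica selection works over.
    static boolean shouldProcessOverReplicated(int reported, int expected) {
        return reported > expected;
    }

    public static void main(String[] args) {
        // Striped example: 9 reported internal blocks, 1 pending, RS(6,3)
        // expects 9. The buggy check fires with nothing to delete, which is
        // what can spin chooseExcessReplicasStriped.
        System.out.println(buggyCheck(9, 1, 9));               // fires spuriously
        System.out.println(shouldProcessOverReplicated(9, 9)); // correctly a no-op
    }
}
```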
[jira] [Commented] (HDFS-9870) Remove unused imports from DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174595#comment-15174595 ] Brahma Reddy Battula commented on HDFS-9870: thanks a lot [~cnauroth] for commit. > Remove unused imports from DFSUtil > -- > > Key: HDFS-9870 > URL: https://issues.apache.org/jira/browse/HDFS-9870 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch > > > Remove the following unused imports {{DFSUtil.java}} > {code} > import static > org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY; > import java.io.InterruptedIOException; > import com.google.common.collect.Sets; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174590#comment-15174590 ] Kihwal Lee commented on HDFS-8791: -- Yes, we should do that. Thanks, Chris. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). 
However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
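The numbers in the description check out on the back of an envelope. The assumed per-seek cost below is mine, not from the JIRA: the reported ~20-minute cold {{du}} over 256x256 leaf directories implies roughly 15-20ms per cold directory-block read on a spinning disk:

```java
public class SeekEstimateSketch {
    // Cold-scan time if every directory block costs one random seek.
    static long coldScanSeconds(long dirBlocks, double seekMillis) {
        return Math.round(dirBlocks * seekMillis / 1000.0);
    }

    public static void main(String[] args) {
        long leafDirBlocks = 256L * 256L;  // 65,536 second-level directory blocks
        // ~18ms/seek (assumption) reproduces the observed ~20 minutes:
        System.out.println(coldScanSeconds(leafDirBlocks, 18.0));
        // Previous layout: a few hundred directory blocks -> single-digit seconds:
        System.out.println(coldScanSeconds(300, 18.0));
    }
}
```

This is why the fix (a 32x32 layout, per the attached testing docs) helps: cutting leaf directories from 64K to 1K drops the worst-case cold scan by the same factor.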
[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9880: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Thanks for the review, Daryn. I've committed this to trunk through branch-2.7. > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Fix For: 2.7.3 > > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HDFS-9882) Change the meaning of heartbeat latency
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-9882 started by Hua Liu. - > Change the meaning of heartbeat latency > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9882) Change the meaning of heartbeat latency in Datanode metrics
[ https://issues.apache.org/jira/browse/HDFS-9882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Inigo Goiri updated HDFS-9882: -- Summary: Change the meaning of heartbeat latency in Datanode metrics (was: Change the meaning of heartbeat latency) > Change the meaning of heartbeat latency in Datanode metrics > --- > > Key: HDFS-9882 > URL: https://issues.apache.org/jira/browse/HDFS-9882 > Project: Hadoop HDFS > Issue Type: Task > Components: datanode >Reporter: Hua Liu >Assignee: Hua Liu >Priority: Minor > > Heartbeat latency only reflects the time spent on generating reports and > sending reports to NN. When heartbeats are delayed due to processing > commands, this latency does not help investigation. I would like to propose > either (1) changing the heartbeat latency to reflect the total time spent on > sending reports and processing commands or (2) adding another metric counter > to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9882) Change the meaning of heartbeat latency
Hua Liu created HDFS-9882: - Summary: Change the meaning of heartbeat latency Key: HDFS-9882 URL: https://issues.apache.org/jira/browse/HDFS-9882 Project: Hadoop HDFS Issue Type: Task Components: datanode Reporter: Hua Liu Assignee: Hua Liu Priority: Minor Heartbeat latency only reflects the time spent on generating reports and sending reports to NN. When heartbeats are delayed due to processing commands, this latency does not help investigation. I would like to propose either (1) changing the heartbeat latency to reflect the total time spent on sending reports and processing commands or (2) adding another metric counter to show the total time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174510#comment-15174510 ] Daryn Sharp commented on HDFS-9880: --- +1 > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174509#comment-15174509 ] Chris Trezzo commented on HDFS-8791: [~kihwal] Should I add a blurb in the jira release notes since this is a data node layout change? > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). 
However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-9876: Attachment: HDFS-9876.001.patch Thanks for the review, Nicholas! Update the patch to address your comments. > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Attachments: HDFS-9876.000.patch, HDFS-9876.001.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174499#comment-15174499 ] Chris Trezzo commented on HDFS-8791: Thanks [~kihwal] and everyone else! > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). 
However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174493#comment-15174493 ] Chris Nauroth commented on HDFS-7597: - I killed mine (14677). Thanks, [~kihwal]. > DNs should not open new NN connections when webhdfs clients seek > > > Key: HDFS-7597 > URL: https://issues.apache.org/jira/browse/HDFS-7597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Labels: BB2015-05-TBR > Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch > > > Webhdfs seeks involve closing the current connection, and reissuing a new > open request with the new offset. The RPC layer caches connections so the DN > keeps a lingering connection open to the NN. Connection caching is in part > based on UGI. Although the client used the same token for the new offset > request, the UGI is different which forces the DN to open another unnecessary > connection to the NN. > A job that performs many seeks will easily crash the NN due to fd exhaustion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
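The fd-exhaustion mechanism is easy to model. The sketch below is an illustrative model, not Hadoop's actual {{ipc.Client}} code: if the connection-cache key includes a per-request UGI object compared by identity (no {{equals}}/{{hashCode}}), every webhdfs seek yields a distinct key and therefore a lingering new NN connection, even though the user and token never changed:

```java
import java.util.HashMap;
import java.util.Map;

public class ConnCacheSketch {
    // Stand-in for a UGI: same user string, but no equals/hashCode, so
    // HashMap falls back to object identity.
    static final class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    // Identity-keyed cache: fresh Ugi objects never hit an existing entry.
    static int connectionsByIdentity(Ugi[] requests) {
        Map<Ugi, String> cache = new HashMap<>();
        for (Ugi u : requests) cache.putIfAbsent(u, "conn");
        return cache.size();  // one entry (connection) per request
    }

    // Value-keyed cache: keying on the stable user/token reuses one connection.
    static int connectionsByUser(Ugi[] requests) {
        Map<String, String> cache = new HashMap<>();
        for (Ugi u : requests) cache.putIfAbsent(u.user, "conn");
        return cache.size();
    }

    public static void main(String[] args) {
        // Three seeks by the same webhdfs client, each with a fresh UGI object:
        Ugi[] seeks = { new Ugi("daryn"), new Ugi("daryn"), new Ugi("daryn") };
        System.out.println(connectionsByIdentity(seeks)); // one lingering conn per seek
        System.out.println(connectionsByUser(seeks));     // a single cached conn
    }
}
```

Multiply the first number by thousands of seeks per task across a large job and the NN's fd table fills, which is the crash mode the description warns about.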
[jira] [Commented] (HDFS-9876) shouldProcessOverReplicated should not count number of pending replicas
[ https://issues.apache.org/jira/browse/HDFS-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174488#comment-15174488 ] Tsz Wo Nicholas Sze commented on HDFS-9876: --- excessTypes should be computed if it is needed. Let's move it inside {code} if (candidates.size() > 1) { .. } {code} > shouldProcessOverReplicated should not count number of pending replicas > --- > > Key: HDFS-9876 > URL: https://issues.apache.org/jira/browse/HDFS-9876 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Takuya Fukudome >Assignee: Jing Zhao > Attachments: HDFS-9876.000.patch > > > Currently when checking if we should process over-replicated block in > {{addStoredBlock}}, we count both the number of reported replicas and pending > replicas. However, {{processOverReplicatedBlock}} chooses excess replicas > only among all the reported storages of the block. So in a situation where we > have over-replicated replica/internal blocks which only reside in the pending > queue, we will not be able to choose any extra replica to delete. > For contiguous blocks, this causes {{chooseExcessReplicasContiguous}} to do > nothing. But for striped blocks, this may cause endless loop in > {{chooseExcessReplicasStriped}} in the following while loop: > {code} > while (candidates.size() > 1) { > List replicasToDelete = placementPolicy > .chooseReplicasToDelete(nonExcess, candidates, (short) 1, > excessTypes, null, null); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174484#comment-15174484 ] Kihwal Lee commented on HDFS-7597: -- I kicked the build already : https://builds.apache.org/job/PreCommit-HDFS-Build/14674 I think this is what you started: https://builds.apache.org/job/PreCommit-HDFS-Build/14677 Better kill one to avoid wasting resources. > DNs should not open new NN connections when webhdfs clients seek > > > Key: HDFS-7597 > URL: https://issues.apache.org/jira/browse/HDFS-7597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Labels: BB2015-05-TBR > Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch > > > Webhdfs seeks involve closing the current connection, and reissuing a new > open request with the new offset. The RPC layer caches connections so the DN > keeps a lingering connection open to the NN. Connection caching is in part > based on UGI. Although the client used the same token for the new offset > request, the UGI is different which forces the DN to open another unnecessary > connection to the NN. > A job that performs many seeks will easily crash the NN due to fd exhaustion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174487#comment-15174487 ] Xiaoyu Yao commented on HDFS-9881: -- Thanks [~andrew.wang]. Patch LGTM, +1. > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174477#comment-15174477 ] Zhe Zhang commented on HDFS-9881: - Thanks Andrew. +1 on the patch pending Jenkins. > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
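The bug class is plain string concatenation missing a path separator. A minimal sketch of the before/after behavior — the real fix lives in {{DistributedFileSystem#getTrashRoot}}, and these helper names are hypothetical:

```java
public class TrashPathSketch {
    // Missing "/" between the EZ trash root and the user name:
    static String buggy(String ezTrash, String user) {
        return ezTrash + user;
    }

    // Corrected concatenation with an explicit separator:
    static String fixed(String ezTrash, String user) {
        return ezTrash + "/" + user;
    }

    public static void main(String[] args) {
        System.out.println(buggy("/ez/.Trash", "andrew"));  // /ez/.Trashandrew
        System.out.println(fixed("/ez/.Trash", "andrew"));  // /ez/.Trash/andrew
    }
}
```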
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174470#comment-15174470 ] Hadoop QA commented on HDFS-9880: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} trunk passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} trunk passed with JDK v1.7.0_95 {color} | | 
{color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 51s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 64m 24s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 57m 18s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. 
{color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 148m 41s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_72 Failed junit tests | hadoop.hdfs.server.namenode.TestEditLog | | JDK v1.8.0_72 Timed out junit tests | org.apache.hadoop.hdfs.server.namenode.TestNameNodeRpcServerMethods | | JDK v1.7.0_95 Failed junit tests | hadoop.hdfs.TestDFSClientExcludedNodes | | | hadoop.hdfs.server.datanode.TestFsDatasetCache | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0ca8df7 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790763/HDFS-9880.patch | | JIRA Issue | HDFS-9880 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 1ea98065a99a 3.1
[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-9881: -- Attachment: HDFS-9881.002.patch Rebased > DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones > -- > > Key: HDFS-9881 > URL: https://issues.apache.org/jira/browse/HDFS-9881 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Critical > Attachments: HDFS-9881.001.patch, HDFS-9881.002.patch > > > getTrashRoots is missing a "/" in the path concatenation, so ends up putting > files into a directory named "/ez/.Trashandrew" rather than > "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned
[ https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174467#comment-15174467 ] Jing Zhao commented on HDFS-8786: - To avoid the changes inside of BlockInfoStriped, I think we can do the following: since we always use two arrays (one for datanodes and the other for their internal block indices) when returning a striped block to client, we can adjust the order inside of the LocatedStripedBlock instead of BlockInfoStriped. I.e., the order adjustment (based on decommissioning DN information) should be done against LocatedStripedBlock in the same step as {{sortLocatedBlocks}}. > Erasure coding: DataNode should transfer striped blocks before being > decommissioned > --- > > Key: HDFS-8786 > URL: https://issues.apache.org/jira/browse/HDFS-8786 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Rakesh R > Attachments: HDFS-8786-001.patch, HDFS-8786-002.patch, > HDFS-8786-003.patch, HDFS-8786-draft.patch > > > Per [discussion | > https://issues.apache.org/jira/browse/HDFS-8697?focusedCommentId=14609004&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14609004] > under HDFS-8697, it's too expensive to reconstruct block groups for decomm > purpose. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174468#comment-15174468 ] Andrew Wang commented on HDFS-9881: --- We still missed adding a "/" after TRASH_PREFIX; HDFS-9844 fixed the case where "/" itself is the encryption zone, though. We should avoid using string concatenation for paths in general; it leads to issues like this.
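A minimal sketch of the bug and the fix discussed above. The helper names below are hypothetical stand-ins, not the actual DistributedFileSystem code; the point is that joining path components through one helper makes the separator impossible to forget, and also handles "/" itself as the encryption zone root:

```java
// Illustrative sketch of the getTrashRoot path bug; buggyTrashRoot/
// fixedTrashRoot/join are hypothetical helpers, not Hadoop code.
public class TrashRootSketch {
    static final String TRASH_PREFIX = ".Trash";

    // Buggy: no "/" between the prefix and the user name.
    static String buggyTrashRoot(String ezRoot, String user) {
        return ezRoot + "/" + TRASH_PREFIX + user;
    }

    // Joining through one helper instead of ad-hoc string
    // concatenation; also avoids "//" when the parent is "/".
    static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    static String fixedTrashRoot(String ezRoot, String user) {
        return join(join(ezRoot, TRASH_PREFIX), user);
    }

    public static void main(String[] args) {
        System.out.println(buggyTrashRoot("/ez", "andrew")); // /ez/.Trashandrew
        System.out.println(fixedTrashRoot("/ez", "andrew")); // /ez/.Trash/andrew
        System.out.println(fixedTrashRoot("/", "andrew"));   // /.Trash/andrew
    }
}
```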
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174459#comment-15174459 ] Chris Nauroth commented on HDFS-7597: - [~daryn], thank you for the reminder. It was a surprise, because I honestly thought we had already closed down on this one. I remain +1 for the patch, same as several months ago. I manually triggered a Jenkins run since it has been a while. I'll plan on committing this tomorrow barring any objections or surprises from pre-commit. > DNs should not open new NN connections when webhdfs clients seek > > > Key: HDFS-7597 > URL: https://issues.apache.org/jira/browse/HDFS-7597 > Project: Hadoop HDFS > Issue Type: Improvement > Components: webhdfs >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Labels: BB2015-05-TBR > Attachments: HDFS-7597.patch, HDFS-7597.patch, HDFS-7597.patch > > > Webhdfs seeks involve closing the current connection, and reissuing a new > open request with the new offset. The RPC layer caches connections so the DN > keeps a lingering connection open to the NN. Connection caching is in part > based on UGI. Although the client used the same token for the new offset > request, the UGI is different which forces the DN to open another unnecessary > connection to the NN. > A job that performs many seeks will easily crash the NN due to fd exhaustion. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174455#comment-15174455 ] Zhe Zhang commented on HDFS-9881: - Hi Andrew, I thought the issue was already fixed by HDFS-9844?
[jira] [Commented] (HDFS-8786) Erasure coding: DataNode should transfer striped blocks before being decommissioned
[ https://issues.apache.org/jira/browse/HDFS-8786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174447#comment-15174447 ] Jing Zhao commented on HDFS-8786: - Thanks for updating the patch, [~rakeshr]. Comments on the current patch: 1. For ErasureCodingWork, we have to handle the following scenarios when {{hasAllInternalBlocks}} returns true: #* we have a decommissioning DN #* we have enough DNs but not enough racks #* the above two situations happen at the same time Things may get a little complicated when the decommissioning and not-enough-racks situations get mixed. For example, it is possible that there are 9 live internal blocks on 5 racks, and 1 more internal block in a decommissioning datanode. In this situation, we will only choose 1 target and the decommissioning dn should be ignored. In another example, if we have 8 live replicas and 1 decommissioning replica, we should replicate the decommissioning replica. It looks to me like the current patch cannot handle all of these scenarios. Currently I think we should explicitly let ErasureCodingWork know if the reconstruction work is triggered by not-enough-racks. We can have this check in {{validateReconstructionWork}}, and pass the result into the ErasureCodingWork instance. Later, when adding the task to a DN, we should first check this result, and if it is true, run the current code added by HDFS-9818. If it is false, we check if the source nodes cover all the internal blocks but contain a decommissioning datanode, and schedule replication work for it if necessary. 
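The scheduling decision described in the comment above can be sketched as a small decision function. The enum and flags here are hypothetical stand-ins, not the actual BlockManager/ErasureCodingWork code; the sketch only captures the proposed ordering of the checks:

```java
// Illustrative decision logic for the EC decommission scheduling
// discussed above; names are hypothetical, not Hadoop code.
public class EcWorkSketch {
    enum Action { RECONSTRUCT_FOR_RACKS, REPLICATE_DECOMMISSIONING, NOTHING }

    static Action chooseAction(boolean triggeredByNotEnoughRacks,
                               boolean sourcesCoverAllInternalBlocks,
                               boolean sourcesContainDecommissioning) {
        // The not-enough-racks check done in validateReconstructionWork
        // is passed into the work instance and consulted first.
        if (triggeredByNotEnoughRacks) {
            return Action.RECONSTRUCT_FOR_RACKS; // HDFS-9818 code path
        }
        // All internal blocks are live but one sits on a decommissioning
        // node: a plain block transfer is far cheaper than reconstruction.
        if (sourcesCoverAllInternalBlocks && sourcesContainDecommissioning) {
            return Action.REPLICATE_DECOMMISSIONING;
        }
        return Action.NOTHING;
    }

    public static void main(String[] args) {
        // 9 live internal blocks on 5 racks, 1 more on a decommissioning
        // DN: rack placement wins and the decommissioning DN is ignored.
        System.out.println(chooseAction(true, true, true));  // RECONSTRUCT_FOR_RACKS
        // 8 live + 1 decommissioning, racks fine: transfer the block.
        System.out.println(chooseAction(false, true, true)); // REPLICATE_DECOMMISSIONING
    }
}
```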
[jira] [Commented] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174437#comment-15174437 ] Hadoop QA commented on HDFS-9881: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-9881 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790786/HDFS-9881.001.patch | | JIRA Issue | HDFS-9881 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14675/console | | Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated.
[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-9881: -- Attachment: HDFS-9881.001.patch Thanks to [~qwertymaniac] for the spot. [~zhz] / [~xyao] / [~cmccabe] mind reviewing?
[jira] [Updated] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
[ https://issues.apache.org/jira/browse/HDFS-9881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-9881: -- Status: Patch Available (was: Open)
[jira] [Created] (HDFS-9881) DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones
Andrew Wang created HDFS-9881: - Summary: DistributedFileSystem#getTrashRoot returns incorrect path for encryption zones Key: HDFS-9881 URL: https://issues.apache.org/jira/browse/HDFS-9881 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.8.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Critical getTrashRoots is missing a "/" in the path concatenation, so ends up putting files into a directory named "/ez/.Trashandrew" rather than "/ez/.Trash/andrew" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7597) DNs should not open new NN connections when webhdfs clients seek
[ https://issues.apache.org/jira/browse/HDFS-7597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174430#comment-15174430 ] Daryn Sharp commented on HDFS-7597: --- Patch still applies and is still relevant. It actually provides a performance boost to the NN due to reduced ugi instances. Instead of n-many ugis per task, it's a couple based on how quickly the lru recycles.
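The connection-cache behavior described in this issue can be illustrated with a toy sketch. The classes below are stand-ins (not Hadoop's RPC layer or UserGroupInformation): they only show why a cache keyed partly on a fresh UGI per webhdfs offset request never gets a hit, while keying on the stable user identity reuses one connection:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of RPC connection caching keyed on UGI; hypothetical
// stand-in classes, not the Hadoop implementation.
public class ConnCacheSketch {
    static final class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
        // Default identity-based equals(): two UGIs for the same
        // user/token still look different to the cache.
    }

    static final Map<String, Object> cache = new HashMap<>();
    static int opened = 0; // count of "NN connections" created

    static Object getConnection(Ugi ugi, String addr, boolean keyOnObject) {
        String key = (keyOnObject
                ? String.valueOf(System.identityHashCode(ugi)) // per-object key
                : ugi.user)                                    // stable key
                + "@" + addr;
        return cache.computeIfAbsent(key, k -> { opened++; return new Object(); });
    }

    public static void main(String[] args) {
        // Ten seeks, each constructing a new UGI for the same user/token.
        for (int i = 0; i < 10; i++) {
            getConnection(new Ugi("alice"), "nn:8020", true);
        }
        System.out.println(opened); // 10: one lingering NN connection per seek
        cache.clear(); opened = 0;
        for (int i = 0; i < 10; i++) {
            getConnection(new Ugi("alice"), "nn:8020", false);
        }
        System.out.println(opened); // 1: the connection is reused
    }
}
```

A job performing many seeks multiplies the first case across tasks, which is how the NN runs out of file descriptors.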
[jira] [Commented] (HDFS-9763) Add merge api
[ https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174414#comment-15174414 ] Arpit Agarwal commented on HDFS-9763: - TOCTOU is a red herring. The real problem, as mentioned by others, is the number of RPCs. The proposal to cap the number of operations in one call is not unusual, e.g. [S3|https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html] and [Azure Storage|https://msdn.microsoft.com/en-us/library/azure/dd135734.aspx] do so for list calls, as does HDFS. > Add merge api > - > > Key: HDFS-9763 > URL: https://issues.apache.org/jira/browse/HDFS-9763 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Ashutosh Chauhan >Assignee: Xiaobing Zhou > Attachments: HDFS_Merge_API_Proposal.pdf > > > It will be good to add merge(Path dir1, Path dir2, ... ) api to HDFS. > Semantics will be to move all files under dir1 to dir2 and doing a rename of > files in case of collisions. > In absence of this api, Hive[1] has to check for collision for each file and > then come up with a unique name and try again and so on. This is inefficient in > multiple ways: > 1) It generates a huge number of calls on NN (at least 2*number of source files > in dir1) > 2) It suffers from TOCTOU[2] bug for client picked up name in case of > collision. > 3) Whole operation is not atomic. > A merge api outlined as above will be immensely useful for Hive and > potentially to other HDFS users. > [1] > https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576 > [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174413#comment-15174413 ] Hudson commented on HDFS-8791: -- FAILURE: Integrated in Hadoop-trunk-Commit #9403 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9403/]) HDFS-8791. block ID-based DN storage layout can be very slow for (kihwal: rev 2c8496ebf3b7b31c2e18fdf8d4cb2a0115f43112) * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-to-57-dn-layout-dir.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DatanodeUtil.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeLayoutVersion.java * hadoop-hdfs-project/hadoop-hdfs/src/test/resources/hadoop-56-layout-datanode-dir.tgz * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDatanodeLayoutUpgrade.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataStorage.java > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Fix For: 2.7.3 > > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). 
Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. > So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume, > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du type scan will look at directories one at a > time, so the ioscheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A DU when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next DU to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have but it wasn't nearly as pronounced. The previous layout would > be a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks which would mean single digit seconds. 
> - With only a few hundred directories, the odds of the directory blocks > getting modified is quite high, this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
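The description's numbers can be sanity-checked with back-of-envelope arithmetic. Assuming roughly 10 ms per cold random seek on a spinning disk (an assumed figure, not a measurement from this issue), the 256x256 layout lands in the same order of magnitude as the reported 20-minute cold du, while the old layout stays in single-digit seconds:

```java
// Back-of-envelope cold-scan cost: one random seek per directory
// block, at an assumed ~10 ms per seek (illustrative, not measured).
public class SeekCostSketch {
    static double coldScanSeconds(long directoryBlocks, double seekMs) {
        return directoryBlocks * seekMs / 1000.0;
    }

    public static void main(String[] args) {
        long newLayout = 256L * 256L; // 65,536 leaf directory blocks
        long oldLayout = 300L;        // "a few hundred" directory blocks
        System.out.printf("new: %.0f s%n", coldScanSeconds(newLayout, 10)); // ~655 s
        System.out.printf("old: %.0f s%n", coldScanSeconds(oldLayout, 10)); // ~3 s
    }
}
```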
[jira] [Updated] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-8791: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.7.3 Status: Resolved (was: Patch Available) Thanks for working on the fix, [~ctrezzo], and thank you all for the reviews and discussions. I've committed it from trunk through branch-2.7. I didn't put it in branch-2.6 because it does not have the parallel upgrade fix. I will leave it up to the 2.6 release manager and the interested party.
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174373#comment-15174373 ] Tsz Wo Nicholas Sze commented on HDFS-8791: --- It seems that the 32x32 layout is already well tested. Let us use it unless someone wants to test the 64x64 or other layouts. +1 for the patch.
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174354#comment-15174354 ] Kuhu Shukla commented on HDFS-9880: --- +1 (non-binding). Lgtm. > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8674) Improve performance of postponed block scans
[ https://issues.apache.org/jira/browse/HDFS-8674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174323#comment-15174323 ] Daryn Sharp commented on HDFS-8674: --- I'll try to rebase again. > Improve performance of postponed block scans > > > Key: HDFS-8674 > URL: https://issues.apache.org/jira/browse/HDFS-8674 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.6.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp >Priority: Critical > Attachments: HDFS-8674.patch, HDFS-8674.patch > > > When a standby goes active, it marks all nodes as "stale" which will cause > block invalidations for over-replicated blocks to be queued until full block > reports are received from the nodes with the block. The replication monitor > scans the queue with O(N) runtime. It picks a random offset and iterates > through the set to randomize blocks scanned. > The result is devastating when a cluster loses multiple nodes during a > rolling upgrade. Re-replication occurs, the nodes come back, the excess block > invalidations are postponed. Rescanning just 2k blocks out of millions of > postponed blocks may take multiple seconds. During the scan, the write lock > is held which stalls all other processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
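A rough cost model of the rescan behavior described in this issue (a hypothetical sketch, not the BlockManager code): reaching a random offset in a set costs O(offset) element hops under the write lock before the bounded scan even begins, while a persistent cursor would pay only for the blocks actually scanned:

```java
// Hypothetical cost model for rescanning postponed blocks; counts
// element hops, the dominant cost of iterating a large linked set.
public class PostponedScanSketch {
    // Random-offset approach: skip to the offset, then scan a bounded slice.
    static long hopsForRescan(long setSize, long offset, long scanned) {
        return offset + Math.min(scanned, setSize);
    }

    // Persistent-cursor approach: resume where the last scan stopped.
    static long hopsWithCursor(long setSize, long scanned) {
        return Math.min(scanned, setSize);
    }

    public static void main(String[] args) {
        long postponed = 5_000_000L, scan = 2_000L;
        long avgOffset = postponed / 2; // expected skip for a random offset
        System.out.println(hopsForRescan(postponed, avgOffset, scan)); // 2502000
        System.out.println(hopsWithCursor(postponed, scan));           // 2000
    }
}
```

With millions of postponed blocks, the offset term dominates by three orders of magnitude, which matches the multi-second stalls described above.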
[jira] [Commented] (HDFS-9534) Add CLI command to clear storage policy from a path.
[ https://issues.apache.org/jira/browse/HDFS-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174306#comment-15174306 ] Chris Nauroth commented on HDFS-9534: - +1 from me for patch v003. Thank you, [~xiaobingo]! > Add CLI command to clear storage policy from a path. > > > Key: HDFS-9534 > URL: https://issues.apache.org/jira/browse/HDFS-9534 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Reporter: Chris Nauroth >Assignee: Xiaobing Zhou > Attachments: HDFS-9534.001.patch, HDFS-9534.002.patch, > HDFS-9534.003.patch > > > The {{hdfs storagepolicies}} command has sub-commands for > {{-setStoragePolicy}} and {{-getStoragePolicy}} on a path. However, there is > no {{-removeStoragePolicy}} to remove a previously set storage policy on a > path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9870) Remove unused imports from DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174274#comment-15174274 ] Hudson commented on HDFS-9870: -- FAILURE: Integrated in Hadoop-trunk-Commit #9402 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9402/]) HDFS-9870. Remove unused imports from DFSUtil. Contributed by Brahma (cnauroth: rev 2137e8feeb5c5c88d3a80db3a334fd472f299ee4) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSUtil.java > Remove unused imports from DFSUtil > -- > > Key: HDFS-9870 > URL: https://issues.apache.org/jira/browse/HDFS-9870 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Brahma Reddy Battula >Assignee: Brahma Reddy Battula > Fix For: 2.8.0 > > Attachments: HDFS-9870-branch-2.patch, HDFS-9870.patch > > > Remove the following unused imports {{DFSUtil.java}} > {code} > import static > org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_LIFELINE_RPC_ADDRESS_KEY; > import java.io.InterruptedIOException; > import com.google.common.collect.Sets; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9870) Remove unused imports from DFSUtil
[ https://issues.apache.org/jira/browse/HDFS-9870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-9870: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) +1. I have committed this to trunk, branch-2 and branch-2.8. [~brahmareddy], thank you for cleaning up this code.
[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9880: - Status: Patch Available (was: Open) > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee reassigned HDFS-9880: Assignee: Kihwal Lee > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee >Assignee: Kihwal Lee > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9763) Add merge api
[ https://issues.apache.org/jira/browse/HDFS-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174219#comment-15174219 ] Jitendra Nath Pandey commented on HDFS-9763: I don't think this API will address TOCTOU, because Hive will need multiple merge calls anyway for a given query to reach the final stage. In the absence of transactional support in the file system, as suggested by Haohui, it is difficult to support such semantics at the HDFS level. However, I do see Hive's need to avoid hundreds or thousands of individual rename operations. This also seems to be a pretty general case for many map-reduce based data ingest jobs that write data all over the place and finally need to consolidate it into a final directory after all the related jobs are successful. IMO, the requirement here is for a 'rename' operation that merges directories instead of overwriting or throwing an already-exists exception. However, it can be argued that it's better to add a new 'merge' API instead of overloading 'rename'. I see two main design points here that need to be carefully thought through and agreed upon: 1) How do we resolve conflicts in file names? I think the proposal handles this elegantly by specifying a policy. In fact, I would love to change our 'rename' to support different policies that also provide merge capability, but I can accept a separate API for compatibility and simplicity. 2) The O(N) problem. It is not so bad because it's not recursive; there is no scope for recursion here. Still, if a directory has a lot of files, an iterative approach is feasible, because the source directory will get smaller after every iteration. We do have precedent for iteration, for example 'listStatus'. This will avoid the complexity in the NN of releasing the lock in between. 
> Add merge api > - > > Key: HDFS-9763 > URL: https://issues.apache.org/jira/browse/HDFS-9763 > Project: Hadoop HDFS > Issue Type: New Feature > Components: fs >Reporter: Ashutosh Chauhan >Assignee: Xiaobing Zhou > Attachments: HDFS_Merge_API_Proposal.pdf > > > It will be good to add a merge(Path dir1, Path dir2, ... ) API to HDFS. > Semantics will be to move all files under dir1 to dir2, renaming files in > case of collisions. > In the absence of this API, Hive[1] has to check for a collision for each > file, then come up with a unique name and try again, and so on. This is > inefficient in multiple ways: > 1) It generates a huge number of calls on the NN (at least 2*number of source > files in dir1) > 2) It suffers from a TOCTOU[2] bug for the client-picked name in case of > collision. > 3) The whole operation is not atomic. > A merge API outlined as above will be immensely useful for Hive and > potentially other HDFS users. > [1] > https://github.com/apache/hive/blob/release-2.0.0-rc1/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L2576 > [2] https://en.wikipedia.org/wiki/Time_of_check_to_time_of_use -- This message was sent by Atlassian JIRA (v6.3.4#6332)
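The iterative approach discussed above (a bounded batch of moves per lock acquisition, with a conflict policy, shrinking the source directory each pass) can be sketched as follows. This is an illustrative model only: the {{Policy}} enum, {{resolve}} helper, and batch-size parameter are hypothetical names, not part of the HDFS-9763 proposal, and directories are modeled as plain maps.

```java
import java.util.*;

/** Hypothetical sketch of an iterative directory merge with a conflict
 *  policy. Directories are modeled as maps from file name to content;
 *  none of these names come from the actual HDFS-9763 proposal. */
public class MergeSketch {
    public enum Policy { OVERWRITE, RENAME, FAIL }

    /** Pick a non-colliding name by appending a counter, e.g. "f" -> "f_1". */
    static String resolve(String name, Map<String, String> dst) {
        int i = 1;
        String candidate = name;
        while (dst.containsKey(candidate)) {
            candidate = name + "_" + i++;
        }
        return candidate;
    }

    /** Merge src into dst one bounded batch at a time, so a real NameNode
     *  could drop and re-acquire its write lock between batches. The source
     *  shrinks after every batch, so the loop terminates. */
    public static void merge(Map<String, String> src, Map<String, String> dst,
                             Policy policy, int batchSize) {
        while (!src.isEmpty()) {
            Iterator<Map.Entry<String, String>> it = src.entrySet().iterator();
            int moved = 0;
            while (it.hasNext() && moved < batchSize) {
                Map.Entry<String, String> e = it.next();
                String name = e.getKey();
                if (dst.containsKey(name)) {
                    if (policy == Policy.FAIL) {
                        throw new IllegalStateException("already exists: " + name);
                    } else if (policy == Policy.RENAME) {
                        name = resolve(name, dst);
                    }
                }
                dst.put(name, e.getValue());
                it.remove();            // source directory gets smaller
                moved++;
            }
        }
    }
}
```

With the RENAME policy a collision on "a" lands as "a_1" in the destination, which mirrors the "rename in case of collisions" semantics in the issue description.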
[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9880: - Attachment: HDFS-9880.patch > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee > Attachments: HDFS-9880.patch > > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9880: - Target Version/s: 2.7.3 > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN
[ https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174214#comment-15174214 ] Kihwal Lee commented on HDFS-9198: -- Relevant tests are all passing with this patch. {noformat} --- T E S T S --- OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.hdfs.server.datanode.TestIncrementalBrVariations Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.301 sec - in org.apache.hadoop.hdfs.server.datanode.TestIncrementalBrVariations OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.83 sec - in org.apache.hadoop.hdfs.server.namenode.TestDeadDatanode OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.238 sec - in org.apache.hadoop.hdfs.server.blockmanagement.TestBlockManager OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.094 sec - in org.apache.hadoop.hdfs.server.blockmanagement.TestPendingReplication OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=768m; support was removed in 8.0 Running org.apache.hadoop.hdfs.TestDatanodeRegistration Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.187 sec - in org.apache.hadoop.hdfs.TestDatanodeRegistration Results : Tests run: 32, Failures: 0, Errors: 0, Skipped: 0 {noformat} Just committed to branch-2.7. Thanks, Vinay! 
> Coalesce IBR processing in the NN > - > > Key: HDFS-9198 > URL: https://issues.apache.org/jira/browse/HDFS-9198 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9198-Branch-2-withamend.diff, > HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, > HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, > HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch > > > IBRs from thousands of DNs under load will degrade NN performance due to > excessive write-lock contention from multiple IPC handler threads. The IBR > processing is quick, so the lock contention may be reduced by coalescing > multiple IBRs into a single write-lock transaction. The handlers will also > be freed up faster for other operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
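The idea in the issue description (many handler threads each taking the write lock per IBR versus draining all queued IBRs under one lock acquisition) can be modeled with a minimal sketch. This is not the HDFS-9198 code; the class and method names are invented for illustration, and a {{synchronized}} block stands in for the namesystem write lock.

```java
import java.util.*;
import java.util.concurrent.*;

/** Minimal model of coalescing incremental block reports (IBRs): handler
 *  threads enqueue reports and return immediately, and a processor drains
 *  the whole queue under a single lock acquisition instead of locking once
 *  per report. Illustrative sketch only, not the HDFS-9198 implementation. */
public class IbrCoalescer {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Object writeLock = new Object(); // stands in for the NN lock
    private int lockAcquisitions = 0;
    private int processed = 0;

    /** Called by IPC handler threads; no lock taken, so handlers free up fast. */
    public void enqueue(String report) {
        queue.add(report);
    }

    /** Apply every queued report in one write-lock transaction. */
    public void processQueued() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (batch.isEmpty()) {
            return;
        }
        synchronized (writeLock) {
            lockAcquisitions++;
            for (String r : batch) {
                processed++;        // apply the report while holding the lock
            }
        }
    }

    public int getLockAcquisitions() { return lockAcquisitions; }
    public int getProcessed() { return processed; }
}
```

The point of the design is visible in the counters: a hundred queued reports cost one lock acquisition instead of a hundred, which is where the reduced write-lock contention comes from.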
[jira] [Updated] (HDFS-9198) Coalesce IBR processing in the NN
[ https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9198: - Fix Version/s: 2.7.3 > Coalesce IBR processing in the NN > - > > Key: HDFS-9198 > URL: https://issues.apache.org/jira/browse/HDFS-9198 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0, 2.7.3 > > Attachments: HDFS-9198-Branch-2-withamend.diff, > HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, > HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, > HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch > > > IBRs from thousands of DNs under load will degrade NN performance due to > excessive write-lock contention from multiple IPC handler threads. The IBR > processing is quick, so the lock contention may be reduced by coalescing > multiple IBRs into a single write-lock transaction. The handlers will also > be freed up faster for other operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9198) Coalesce IBR processing in the NN
[ https://issues.apache.org/jira/browse/HDFS-9198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174190#comment-15174190 ] Kihwal Lee commented on HDFS-9198: -- +1 for the branch-2.7 patch. It is code-wise identical to what we are running internally. > Coalesce IBR processing in the NN > - > > Key: HDFS-9198 > URL: https://issues.apache.org/jira/browse/HDFS-9198 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.0.0-alpha >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.8.0 > > Attachments: HDFS-9198-Branch-2-withamend.diff, > HDFS-9198-Branch-2.8-withamend.diff, HDFS-9198-branch-2.7.patch, > HDFS-9198-branch2.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, > HDFS-9198-trunk.patch, HDFS-9198-trunk.patch, HDFS-9198-trunk.patch > > > IBRs from thousands of DNs under load will degrade NN performance due to > excessive write-lock contention from multiple IPC handler threads. The IBR > processing is quick, so the lock contention may be reduced by coalescing > multiple IBRs into a single write-lock transaction. The handlers will also > be freed up faster for other operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174163#comment-15174163 ] Kihwal Lee commented on HDFS-9880: -- If I run this test case in a loop, the failure is reproduced in less than 10 runs. With the timeout changed to 2000 (2 seconds), it passes all the time. The test-level timeout of 10 seconds should be removed. The test run-time is very close to 10 seconds. > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
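The fix Kihwal describes (raising a 100ms polling budget to 2000ms) follows the common poll-until-true pattern used throughout Hadoop tests (compare {{GenericTestUtils.waitFor}}). The sketch below is a generic illustration with invented names, not the actual {{waitForBlockReport()}} code.

```java
import java.util.function.BooleanSupplier;

/** Sketch of the poll-until-true pattern behind a waitForBlockReport()-style
 *  helper: re-check a condition every checkMillis until timeoutMillis
 *  elapses. A 100ms budget can flakily return false on a loaded machine;
 *  2000ms gives headroom without slowing down the passing case, since the
 *  method returns as soon as the condition holds. */
public class WaitFor {
    public static boolean waitFor(BooleanSupplier condition,
                                  long checkMillis, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(checkMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

A condition that becomes true early costs only a few polls; only the failure path pays the full timeout, which is why a generous timeout is cheap for a test that normally passes.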
[jira] [Updated] (HDFS-9880) TestDatanodeRegistration fails occasionally
[ https://issues.apache.org/jira/browse/HDFS-9880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-9880: - Description: When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes returns false because the timeout is too short (100ms). > TestDatanodeRegistration fails occasionally > --- > > Key: HDFS-9880 > URL: https://issues.apache.org/jira/browse/HDFS-9880 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Reporter: Kihwal Lee > > When {{testForcedRegistration}} calls {{waitForBlockReport()}}, it sometimes > returns false because the timeout is too short (100ms). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-9880) TestDatanodeRegistration fails occasionally
Kihwal Lee created HDFS-9880: Summary: TestDatanodeRegistration fails occasionally Key: HDFS-9880 URL: https://issues.apache.org/jira/browse/HDFS-9880 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Kihwal Lee -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw
[ https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174003#comment-15174003 ] Hadoop QA commented on HDFS-9699: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 12s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 29s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s {color} | {color:green} HDFS-8707 passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s {color} | {color:green} HDFS-8707 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 15s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} HDFS-8707 passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 10s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 33s {color} | {color:green} the patch passed with JDK v1.8.0_74 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 33s {color} | {color:green} the patch passed {color} | | 
{color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 33s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 4m 25s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 4m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 4m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 7s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.8.0_74. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 8s {color} | {color:green} hadoop-hdfs-native-client in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 57m 4s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:0cf5e66 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12790733/HDFS-9699.HDFS-8707.002.patch | | JIRA Issue | HDFS-9699 | | Optional Tests | asflicense compile cc mvnsite javac unit | | uname | Linux db7841d3ae98 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-8707 / eca19c1 | | Default Java | 1.7.0_95 | | Multi-JDK versions | /usr/lib/jvm/java-8-oracle:1.8.0_74 /usr/lib/jvm/java-7-openjdk-amd64:1.7.0_95 | | JDK v1.7.0_95 Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/14672/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs-native-client U: hadoop-hdfs-project/hadoop-hdfs-native-client | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/14672/console | | Powered by | Apache Yetus 0.3.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > libhdfs++: Add appropriate catch blocks for ASIO operations that throw > -- > > Key: HDFS-9699 > URL: https://issues.apache.org/jira/browse/HDFS-969
[jira] [Updated] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw
[ https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Hansen updated HDFS-9699: - Attachment: HDFS-9699.HDFS-8707.002.patch New patch: moved IOService exception handling into the IOServiceImpl implementation so we don't lose the work object; adopted [~James Clampffer]'s suggested wording changes. > libhdfs++: Add appropriate catch blocks for ASIO operations that throw > -- > > Key: HDFS-9699 > URL: https://issues.apache.org/jira/browse/HDFS-9699 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-6966.HDFS-8707.000.patch, > HDFS-9699.HDFS-8707.001.patch, HDFS-9699.HDFS-8707.002.patch, > cancel_backtrace.txt > > > libhdfs++ doesn't create exceptions of its own but it should be able to > gracefully handle exceptions thrown by libraries it uses, particularly asio. > libhdfs++ should be able to catch most exceptions within reason either at the > call site or in the code that spins up asio worker threads. Certain system > exceptions like std::bad_alloc don't need to be caught because by that point > the process is likely in an unrecoverable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9699) libhdfs++: Add appropriate catch blocks for ASIO operations that throw
[ https://issues.apache.org/jira/browse/HDFS-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173914#comment-15173914 ] James Clampffer commented on HDFS-9699: --- bq. Perhaps we should put it into a loop and respond to exceptions by re-entering the stack so we won't lose the worker thread. I'll put that together in another patch. Are you talking about doing something like: {code} while(!fs_shutdown) { try { my_io_service.run(); } catch (stuff) { //handle stuff } } {code} If so, I think that's a solid approach as long as it's logging a lot. > libhdfs++: Add appropriate catch blocks for ASIO operations that throw > -- > > Key: HDFS-9699 > URL: https://issues.apache.org/jira/browse/HDFS-9699 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: James Clampffer >Assignee: James Clampffer > Attachments: HDFS-6966.HDFS-8707.000.patch, > HDFS-9699.HDFS-8707.001.patch, cancel_backtrace.txt > > > libhdfs++ doesn't create exceptions of its own but it should be able to > gracefully handle exceptions thrown by libraries it uses, particularly asio. > libhdfs++ should be able to catch most exceptions within reason either at the > call site or in the code that spins up asio worker threads. Certain system > exceptions like std::bad_alloc don't need to be caught because by that point > the process is likely in an unrecoverable state. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
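The loop James sketches (catching the exception and re-entering the run call so the worker thread survives) is the familiar resilient-worker pattern. Since the real code wraps asio's {{io_service::run()}} in C++, the following is only a Java analog with invented names, shown to make the control flow concrete.

```java
import java.util.function.BooleanSupplier;

/** Java analog of the resilient worker loop discussed above: after an
 *  exception, log and re-enter the work loop instead of letting the worker
 *  thread die. Illustrative only; the real libhdfs++ code wraps
 *  asio::io_service::run() in C++. */
public class ResilientWorker {
    /** Runs work repeatedly until shutdown reports true; returns the number
     *  of exceptions survived (where real code would log loudly). */
    public static int runUntilShutdown(Runnable work, BooleanSupplier shutdown) {
        int caught = 0;
        while (!shutdown.getAsBoolean()) {
            try {
                work.run();       // stands in for io_service.run()
            } catch (RuntimeException e) {
                caught++;         // log and re-enter, keeping the thread alive
            }
        }
        return caught;
    }
}
```

The key property, matching the comment above, is that a throwing work body does not cost the thread: the loop re-enters and keeps serving until shutdown is requested.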
[jira] [Commented] (HDFS-8791) block ID-based DN storage layout can be very slow for datanode on ext4
[ https://issues.apache.org/jira/browse/HDFS-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15173902#comment-15173902 ] Kihwal Lee commented on HDFS-8791: -- [~szetszwo], do you have any further concerns? If not, +1 for the patch. We have been running with this patch in production with good results. Almost all clusters are now on the new layout. The parallel layout upgrade typically took 3-5 minutes per node for us. # of blocks on each storage was roughly 100k to 200k. Once you are over the layout upgrade hurdle, it is all green pasture. du runs faster and scanning a block pool slice finished in a couple of seconds. As mentioned before, our parallel upgrade was done offline using a custom tool, which took advantage of replica cache files (HDFS-7928) created during shutdown. This avoids the discovery phase of the upgrade. > block ID-based DN storage layout can be very slow for datanode on ext4 > -- > > Key: HDFS-8791 > URL: https://issues.apache.org/jira/browse/HDFS-8791 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0, 2.8.0, 2.7.1 >Reporter: Nathan Roberts >Assignee: Chris Trezzo >Priority: Blocker > Attachments: 32x32DatanodeLayoutTesting-v1.pdf, > 32x32DatanodeLayoutTesting-v2.pdf, HDFS-8791-trunk-v1.patch, > HDFS-8791-trunk-v2-bin.patch, HDFS-8791-trunk-v2.patch, > HDFS-8791-trunk-v2.patch, HDFS-8791-trunk-v3-bin.patch, > hadoop-56-layout-datanode-dir.tgz, test-node-upgrade.txt > > > We are seeing cases where the new directory layout causes the datanode to > basically cause the disks to seek for 10s of minutes. This can be when the > datanode is running du, and it can also be when it is performing a > checkDirs(). Both of these operations currently scan all directories in the > block pool and that's very expensive in the new layout. > The new layout creates 256 subdirs, each with 256 subdirs. Essentially 64K > leaf directories where block files are placed. 
> So, what we have on disk is: > - 256 inodes for the first level directories > - 256 directory blocks for the first level directories > - 256*256 inodes for the second level directories > - 256*256 directory blocks for the second level directories > - Then the inodes and blocks to store the HDFS blocks themselves. > The main problem is the 256*256 directory blocks. > inodes and dentries will be cached by linux and one can configure how likely > the system is to prune those entries (vfs_cache_pressure). However, ext4 > relies on the buffer cache to cache the directory blocks and I'm not aware of > any way to tell linux to favor buffer cache pages (even if it did I'm not > sure I would want it to in general). > Also, ext4 tries hard to spread directories evenly across the entire volume; > this basically means the 64K directory blocks are probably randomly spread > across the entire disk. A du-type scan will look at directories one at a > time, so the io scheduler can't optimize the corresponding seeks, meaning the > seeks will be random and far. > In a system I was using to diagnose this, I had 60K blocks. A du when things > are hot is less than 1 second. When things are cold, about 20 minutes. > How do things get cold? > - A large set of tasks run on the node. This pushes almost all of the buffer > cache out, causing the next du to hit this situation. We are seeing cases > where a large job can cause a seek storm across the entire cluster. > Why didn't the previous layout see this? > - It might have, but it wasn't nearly as pronounced. The previous layout would > have a few hundred directory blocks. Even when completely cold, these would > only take a few hundred seeks, which would mean single-digit seconds. > - With only a few hundred directories, the odds of the directory blocks > getting modified are quite high; this keeps those blocks hot and much less > likely to be evicted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
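The back-of-envelope numbers behind the analysis above can be made explicit. The 4KB directory-block size and ~10ms cold random-seek cost below are illustrative assumptions, not measurements from the issue; they just show why a cold scan of 64K leaf directories lands in the tens-of-minutes range.

```java
/** Arithmetic behind the 256x256 layout discussion: 64K leaf directories,
 *  one directory block each, one random seek per block when cold.
 *  Block size (4KB) and seek cost are illustrative assumptions. */
public class LayoutMath {
    public static long leafDirs() {
        return 256L * 256L;                 // 65,536 leaf directories
    }
    public static long dirBlockBytes() {
        return leafDirs() * 4096L;          // ~256 MB of directory blocks
    }
    /** Cold-scan estimate: one random seek per leaf directory block. */
    public static double coldScanSeconds(double seekMillis) {
        return leafDirs() * seekMillis / 1000.0;
    }
}
```

At an assumed 10ms per cold seek this gives roughly 655 seconds (about 11 minutes), the same order of magnitude as the "about 20 minutes" observed; a few hundred directory blocks in the old layout give single-digit seconds by the same arithmetic.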