[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Labels: hdfs-ec-3.0-must-do  (was: )

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>              Labels: hdfs-ec-3.0-must-do
>         Attachments: HDFS-12840.00.patch, HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
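[Editor's note] The failure mode in the report above can be sketched in a few lines. This is a minimal illustrative model, not the actual HDFS classes: it only mirrors the shape of the bug, where the replaying loader infers "striped" from the enclosing EC zone because the edit log carries no per-file replication flag, and a replicated block then trips the precondition in the striped code path. All class and method names below are hypothetical.

```java
// Stand-in for a block record: only the striped/contiguous distinction matters here.
class Block {
    final boolean striped;
    Block(boolean striped) { this.striped = striped; }
}

// Mirrors the checkArgument in BlockInfoStriped.addStorage: a replicated
// (contiguous) block must never reach the striped code path.
class StripedBlockStore {
    void addStorage(Block reported) {
        if (!reported.striped) {
            throw new IllegalArgumentException("reportedBlock is not striped");
        }
    }
}

public class EditReplaySketch {
    // Buggy inference: the loader decides the code path from the zone
    // instead of a per-file flag recorded in the edit log.
    static String replay(boolean inEcZone, Block reported) {
        if (inEcZone) {
            new StripedBlockStore().addStorage(reported);
            return "striped path";
        }
        return "contiguous path";
    }

    public static void main(String[] args) {
        // A replicated file inside an EC zone crashes the replay:
        try {
            replay(true, new Block(false));
            throw new AssertionError("expected IllegalArgumentException");
        } catch (IllegalArgumentException expected) {
            System.out.println("replay failed as described: " + expected.getMessage());
        }
    }
}
```

The fix direction implied by the report is to persist the replication/EC distinction per file in the edit log so replay does not have to infer it from the zone.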
[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Status: Patch Available  (was: Open)

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>         Attachments: HDFS-12840.00.patch, HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}
[jira] [Updated] (HDFS-12681) Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HDFS-12681:
---------------------------------
    Summary: Make HdfsLocatedFileStatus a subtype of LocatedFileStatus  (was: Fold HdfsLocatedFileStatus into HdfsFileStatus)

> Make HdfsLocatedFileStatus a subtype of LocatedFileStatus
> ---------------------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, HDFS-12681.02.patch,
>                      HDFS-12681.03.patch, HDFS-12681.04.patch, HDFS-12681.05.patch,
>                      HDFS-12681.06.patch, HDFS-12681.07.patch, HDFS-12681.08.patch,
>                      HDFS-12681.09.patch, HDFS-12681.10.patch, HDFS-12681.11.patch,
>                      HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
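[Editor's note] The class relationship the issue proposes can be sketched as follows. This is a deliberately simplified, hypothetical mirror of the hierarchy (the real classes live in org.apache.hadoop.fs and org.apache.hadoop.hdfs.protocol and carry many more fields); it only shows why the subtype arrangement removes the copy-fields conversion step described above.

```java
// Base status: path plus other metadata (elided).
class FileStatus {
    final String path;
    FileStatus(String p) { path = p; }
}

// Status with block locations attached.
class LocatedFileStatus extends FileStatus {
    final long[] blockLocations;          // stand-in for BlockLocation[]
    LocatedFileStatus(String p, long[] locs) { super(p); blockLocations = locs; }
}

// Before the change, the HDFS located variant extended only HdfsFileStatus,
// so obtaining a LocatedFileStatus meant copying common fields and dropping
// HDFS-specific data. As a subtype, it simply *is* a LocatedFileStatus.
class HdfsLocatedFileStatus extends LocatedFileStatus {
    HdfsLocatedFileStatus(String p, long[] locs) { super(p, locs); }
}

public class SubtypeSketch {
    public static void main(String[] args) {
        // Usable directly wherever a LocatedFileStatus is expected; no conversion copy.
        LocatedFileStatus s = new HdfsLocatedFileStatus("/a", new long[]{0L});
        System.out.println(s.path);
    }
}
```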
[jira] [Updated] (HDFS-12681) Fold HdfsLocatedFileStatus into HdfsFileStatus
[ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HDFS-12681:
---------------------------------
    Attachment: HDFS-12681.13.patch

Failing tests are due to resource exhaustion. Updated the patch to fix some checkstyle issues and put the findbugs suppression in the correct file.

> Fold HdfsLocatedFileStatus into HdfsFileStatus
> ----------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, HDFS-12681.02.patch,
>                      HDFS-12681.03.patch, HDFS-12681.04.patch, HDFS-12681.05.patch,
>                      HDFS-12681.06.patch, HDFS-12681.07.patch, HDFS-12681.08.patch,
>                      HDFS-12681.09.patch, HDFS-12681.10.patch, HDFS-12681.11.patch,
>                      HDFS-12681.12.patch, HDFS-12681.13.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of
> {{LocatedFileStatus}}. Conversion requires copying common fields and shedding
> unknown data. It would be cleaner and sufficient for {{HdfsFileStatus}} to
> extend {{LocatedFileStatus}}.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment: HDFS-12740-HDFS-7240.005.patch

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch,
>                      HDFS-12740-HDFS-7240.005.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment:     (was: HDFS-12740-HDFS-7240.005.patch)

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-12740:
---------------------------------------
    Attachment: HDFS-12740-HDFS-7240.005.patch

Thanks [~nandakumar131] for the review comments. Patch v5 addresses the review comments. Please have a look.

> SCM should support a RPC to share the cluster Id with KSM and DataNodes
> -----------------------------------------------------------------------
>
>                 Key: HDFS-12740
>                 URL: https://issues.apache.org/jira/browse/HDFS-12740
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: HDFS-7240
>            Reporter: Shashikant Banerjee
>            Assignee: Shashikant Banerjee
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12740-HDFS-7240.001.patch, HDFS-12740-HDFS-7240.002.patch,
>                      HDFS-12740-HDFS-7240.003.patch, HDFS-12740-HDFS-7240.004.patch,
>                      HDFS-12740-HDFS-7240.005.patch
>
>
> When the ozone cluster is first created, the SCM --init command will generate
> a cluster id as well as an SCM id and persist them locally. The same cluster id
> and SCM id will be shared with KSM during KSM initialization and with
> datanodes during datanode registration.
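[Editor's note] The RPC surface described in HDFS-12740 can be sketched as below. All names here are hypothetical stand-ins, not the real Ozone API: SCM generates and persists a cluster id and an SCM id at --init time, then serves both over one call that KSM reads at initialization and datanodes read at registration.

```java
import java.util.UUID;

// Immutable pair of identifiers generated once at SCM --init.
class ScmInfo {
    final String clusterId;
    final String scmId;
    ScmInfo(String clusterId, String scmId) {
        this.clusterId = clusterId;
        this.scmId = scmId;
    }
}

// The shared-id RPC: one call, both ids.
interface ScmInfoProtocol {
    ScmInfo getScmInfo();
}

public class ScmInfoSketch implements ScmInfoProtocol {
    // Generated once and held; a real SCM would persist this locally.
    private final ScmInfo info =
        new ScmInfo(UUID.randomUUID().toString(), UUID.randomUUID().toString());

    @Override
    public ScmInfo getScmInfo() { return info; }

    public static void main(String[] args) {
        ScmInfoProtocol scm = new ScmInfoSketch();
        // KSM init and datanode registration both observe the same ids:
        System.out.println(
            scm.getScmInfo().clusterId.equals(scm.getScmInfo().clusterId));
    }
}
```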
[jira] [Updated] (HDFS-12838) Ozone: Optimize number of allocated block rpc by aggregating multiple block allocation requests
[ https://issues.apache.org/jira/browse/HDFS-12838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mukul Kumar Singh updated HDFS-12838:
-------------------------------------
    Status: Patch Available  (was: Open)

> Ozone: Optimize number of allocated block rpc by aggregating multiple block
> allocation requests
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12838
>                 URL: https://issues.apache.org/jira/browse/HDFS-12838
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>    Affects Versions: HDFS-7240
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: HDFS-7240
>
>         Attachments: HDFS-12838-HDFS-7240.001.patch
>
>
> Currently KeySpaceManager allocates multiple blocks by sending multiple block
> allocation requests over the RPC. This can be optimized to aggregate multiple
> block allocation requests over one rpc.
> {code}
> while (requestedSize > 0) {
>   long allocateSize = Math.min(scmBlockSize, requestedSize);
>   AllocatedBlock allocatedBlock =
>       scmBlockClient.allocateBlock(allocateSize, type, factor);
>   KsmKeyLocationInfo subKeyInfo = new KsmKeyLocationInfo.Builder()
>       .setContainerName(allocatedBlock.getPipeline().getContainerName())
>       .setBlockID(allocatedBlock.getKey())
>       .setShouldCreateContainer(allocatedBlock.getCreateContainer())
>       .setIndex(idx++)
>       .setLength(allocateSize)
>       .setOffset(0)
>       .build();
>   locations.add(subKeyInfo);
>   requestedSize -= allocateSize;
> }
> {code}
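[Editor's note] The optimization the issue describes can be sketched as follows. This is a hedged, self-contained model, not the actual SCM block client: the quoted while-loop issues one allocateBlock RPC per block, whereas a batched call asks for all blocks at once, so a key needing N blocks costs one round trip instead of N.

```java
import java.util.ArrayList;
import java.util.List;

public class AggregatedAllocSketch {
    int rpcCount = 0;   // counts round trips to the (hypothetical) block server

    // One round trip that allocates 'count' block ids at once.
    List<Long> allocateBlocks(int count) {
        rpcCount++;
        List<Long> ids = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            ids.add(i);
        }
        return ids;
    }

    // ceil(requestedSize / scmBlockSize): how many blocks the key needs.
    static int blocksNeeded(long requestedSize, long scmBlockSize) {
        return (int) ((requestedSize + scmBlockSize - 1) / scmBlockSize);
    }

    public static void main(String[] args) {
        long scmBlockSize = 256L * 1024 * 1024;   // 256 MB
        long requested = 1000L * 1024 * 1024;     // ~1 GB key -> 4 blocks

        AggregatedAllocSketch client = new AggregatedAllocSketch();
        List<Long> blocks =
            client.allocateBlocks(blocksNeeded(requested, scmBlockSize));

        // 4 blocks obtained with a single RPC instead of 4:
        System.out.println(blocks.size() + " blocks, " + client.rpcCount + " rpc");
    }
}
```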
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260198#comment-16260198 ]

DENG FEI commented on HDFS-12832:
---------------------------------

[~xkrogen] The stack trace has been uploaded; it did indeed happen in
{{ReplicationWork#chooseTargets()}}. And you are right: {{INode#getPathComponents()}}
has the same problem under a concurrent {{move}}, not only {{rename}}.

> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12832
>                 URL: https://issues.apache.org/jira/browse/HDFS-12832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.4, 3.0.0-beta1
>            Reporter: DENG FEI
>            Priority: Critical
>         Attachments: HDFS-12832-trunk-001.patch, exception.log
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while the ReplicaMonitor was working, and the NameNode quit.
> It seems the two loops are not synchronized; the path's length changed
> between them.
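[Editor's note] The hazard in the quoted code is that the parent chain is walked twice: once to size the byte[] and once to fill it. If a concurrent rename or move changes a component's length between the two passes, the fill pass overruns or underruns the array. One defensive shape (a hypothetical illustration, not the committed fix) is to snapshot each component exactly once in a single pass and build the path from the snapshot:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal inode stand-in: name and parent are volatile because a concurrent
// rename/move may change them while the path is being assembled.
class Inode {
    volatile String name;
    volatile Inode parent;
    Inode(String name, Inode parent) { this.name = name; this.parent = parent; }

    // Single-pass variant: each component is read exactly once, so a rename
    // between "measure" and "copy" passes can no longer cause an overrun.
    String fullPathSinglePass() {
        Deque<String> parts = new ArrayDeque<>();
        for (Inode i = this; i != null; i = i.parent) {
            parts.push(i.name);   // snapshot the component at the head
        }
        return String.join("/", parts);
    }
}

public class PathRaceSketch {
    public static void main(String[] args) {
        Inode root = new Inode("", null);           // root has empty name
        Inode dir  = new Inode("user", root);
        Inode file = new Inode("data.txt", dir);
        System.out.println(file.fullPathSinglePass());   // /user/data.txt
    }
}
```

A snapshot read under concurrent renames may still be an inconsistent mix of old and new names, but it cannot throw ArrayIndexOutOfBoundsException and crash the thread, which is the failure mode reported here.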
[jira] [Updated] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DENG FEI updated HDFS-12832:
----------------------------
    Attachment: exception.log

> INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to
> NameNode exit
> ----------------------------------------------------------------------
>
>                 Key: HDFS-12832
>                 URL: https://issues.apache.org/jira/browse/HDFS-12832
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.4, 3.0.0-beta1
>            Reporter: DENG FEI
>            Priority: Critical
>         Attachments: HDFS-12832-trunk-001.patch, exception.log
>
>
> {code:title=INode.java|borderStyle=solid}
> public String getFullPathName() {
>   // Get the full path name of this inode.
>   if (isRoot()) {
>     return Path.SEPARATOR;
>   }
>   // compute size of needed bytes for the path
>   int idx = 0;
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     // add component + delimiter (if not tail component)
>     idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0);
>   }
>   byte[] path = new byte[idx];
>   for (INode inode = this; inode != null; inode = inode.getParent()) {
>     if (inode != this) {
>       path[--idx] = Path.SEPARATOR_CHAR;
>     }
>     byte[] name = inode.getLocalNameBytes();
>     idx -= name.length;
>     System.arraycopy(name, 0, path, idx, name.length);
>   }
>   return DFSUtil.bytes2String(path);
> }
> {code}
> We found an ArrayIndexOutOfBoundsException at
> _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_
> while the ReplicaMonitor was working, and the NameNode quit.
> It seems the two loops are not synchronized; the path's length changed
> between them.
[jira] [Comment Edited] (HDFS-12638) NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
[ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256293#comment-16256293 ]

Konstantin Shvachko edited comment on HDFS-12638 at 11/21/17 2:09 AM:
----------------------------------------------------------------------

I think it's a blocker for all branches 2.8 and up. Even just removing that line
{{toDelete.delete();}} would prevent crashing the NameNode. Or reverting
HDFS-9754 should also help.

was (Author: shv):
I think it's a blocker for all branches 2.8 and up. Even just removing that line
{{toDelete.delete();}} would prevent crashing the NameNode.

> NameNode exits due to ReplicationMonitor thread received Runtime exception in
> ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.8.2
>            Reporter: Jiandan Yang
>            Priority: Blocker
>         Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch,
>                      OphanBlocksAfterTruncateDelete.jpg
>
>
> The active NameNode exits due to an NPE. I can confirm that the BlockCollection
> passed in when creating ReplicationWork is null, but I do not know why the
> BlockCollection is null. Looking at the history, I found
> [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check
> of whether the BlockCollection is null.
> NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
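[Editor's note] The defensive shape discussed in the comment above can be sketched like this. It is a hypothetical illustration, not the committed patch: HDFS-9754 removed a null check on the BlockCollection, so a block orphaned by a truncate-then-delete sequence reaches chooseTargets with a null owner and the NPE kills the ReplicationMonitor thread. With the guard back, the orphaned block is simply skipped.

```java
// Stand-in for the file (INode) that owns a block.
class BlockCollection {
    final String name;
    BlockCollection(String n) { name = n; }
}

// Stand-in for a block queued for replication; owner is null for a block
// orphaned after truncate + delete.
class ReplBlock {
    final BlockCollection owner;
    ReplBlock(BlockCollection owner) { this.owner = owner; }
}

public class ChooseTargetsSketch {
    // With the null guard, an orphaned block is skipped instead of throwing
    // an NPE that exits the ReplicationMonitor thread (and the NameNode).
    static boolean scheduleReplication(ReplBlock b) {
        if (b.owner == null) {
            return false;   // orphaned: nothing to replicate for
        }
        return true;        // would proceed to chooseTargets(...)
    }

    public static void main(String[] args) {
        System.out.println(scheduleReplication(new ReplBlock(null)));
        System.out.println(scheduleReplication(new ReplBlock(new BlockCollection("f"))));
    }
}
```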
[jira] [Commented] (HDFS-12813) RequestHedgingProxyProvider can hide Exception thrown from the Namenode for proxy size of 1
[ https://issues.apache.org/jira/browse/HDFS-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260134#comment-16260134 ]

Hudson commented on HDFS-12813:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13262 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/13262/])
HDFS-12813. RequestHedgingProxyProvider can hide Exception thrown from
(szetszwo: rev 659e85e304d070f9908a96cf6a0e1cbafde6a434)
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestRequestHedgingProxyProvider.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/RequestHedgingProxyProvider.java

> RequestHedgingProxyProvider can hide Exception thrown from the Namenode for
> proxy size of 1
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12813
>                 URL: https://issues.apache.org/jira/browse/HDFS-12813
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: 3.0.0, 2.10.0
>
>         Attachments: HDFS-12813.001.patch, HDFS-12813.002.patch,
>                      HDFS-12813.003.patch, HDFS-12813.004.patch
>
>
> HDFS-11395 fixed the problem where the MultiException thrown by
> RequestHedgingProxyProvider was hidden. However, when the target proxy size is
> 1, unwrapping is not done for the InvocationTargetException. For a target
> proxy size of 1, the unwrapping should be done at the first level, whereas for
> multiple proxies it should be done at 2 levels.
[jira] [Updated] (HDFS-12813) RequestHedgingProxyProvider can hide Exception thrown from the Namenode for proxy size of 1
[ https://issues.apache.org/jira/browse/HDFS-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo Nicholas Sze updated HDFS-12813:
---------------------------------------
        Resolution: Fixed
    Fix Version/s: 2.10.0
                   3.0.0
            Status: Resolved  (was: Patch Available)

I have committed this. Thanks, Mukul!

> RequestHedgingProxyProvider can hide Exception thrown from the Namenode for
> proxy size of 1
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-12813
>                 URL: https://issues.apache.org/jira/browse/HDFS-12813
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>            Reporter: Mukul Kumar Singh
>            Assignee: Mukul Kumar Singh
>             Fix For: 3.0.0, 2.10.0
>
>         Attachments: HDFS-12813.001.patch, HDFS-12813.002.patch,
>                      HDFS-12813.003.patch, HDFS-12813.004.patch
>
>
> HDFS-11395 fixed the problem where the MultiException thrown by
> RequestHedgingProxyProvider was hidden. However, when the target proxy size is
> 1, unwrapping is not done for the InvocationTargetException. For a target
> proxy size of 1, the unwrapping should be done at the first level, whereas for
> multiple proxies it should be done at 2 levels.
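[Editor's note] The unwrapping rule described in the issue can be illustrated with a small standalone helper. This is not the committed RequestHedgingProxyProvider code: it only models the depth difference, where a single target proxy adds one InvocationTargetException wrapper around the NameNode's exception, while multiple proxies put the real cause one level deeper (inside a MultiException-style wrapper, modeled here as a plain Exception).

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;

public class UnwrapSketch {
    // Follow getCause() at most 'levels' times, stopping early at the root.
    static Throwable unwrap(Throwable t, int levels) {
        Throwable cur = t;
        for (int i = 0; i < levels && cur.getCause() != null; i++) {
            cur = cur.getCause();
        }
        return cur;
    }

    public static void main(String[] args) {
        IOException real = new IOException("from NameNode");

        // Single proxy: one reflection wrapper -> unwrap 1 level.
        Throwable oneProxy = new InvocationTargetException(real);
        // Multiple proxies: reflection wrapper around an aggregate -> 2 levels.
        Throwable manyProxy = new InvocationTargetException(new Exception("multi", real));

        System.out.println(unwrap(oneProxy, 1).getMessage());
        System.out.println(unwrap(manyProxy, 2).getMessage());
    }
}
```

The bug fixed here was effectively applying the two-level rule to the one-proxy case, which surfaced the wrapper instead of the NameNode's exception.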
[jira] [Commented] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260111#comment-16260111 ]

Hadoop QA commented on HDFS-10183:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 9m 16s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 19s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 49s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 28s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}142m 44s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.fs.TestUnbuffer |
|                    | hadoop.hdfs.server.namenode.ha.TestHASafeMode |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-10183 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12794395/HDFS-10183.2.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux ace64cb573b6 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 60fc2a1 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22146/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22146/testReport/ |
| Max. process+thread count | 4159 (vs. ulimit of 5000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console
[jira] [Updated] (HDFS-12840) Creating a replicated file in an EC zone is not correctly serialized in EditLogs
[ https://issues.apache.org/jira/browse/HDFS-12840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-12840:
---------------------------------
    Attachment: HDFS-12840.reprod.patch

Attaching a patch to reproduce this bug. Will post the fix soon.

> Creating a replicated file in an EC zone is not correctly serialized in EditLogs
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-12840
>                 URL: https://issues.apache.org/jira/browse/HDFS-12840
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: erasure-coding
>    Affects Versions: 3.0.0-beta1
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>            Priority: Blocker
>         Attachments: HDFS-12840.reprod.patch
>
>
> When creating a replicated file in an existing EC zone, the edit log does not
> differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits,
> the file is treated as an EC file; as a result, it crashes the NN because the
> blocks of this file are replicated, which does not match the {{INode}}.
> {noformat}
> ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered
> exception on operation AddBlockOp [path=/system/balancer.id,
> penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=,
> RpcCallId=-2]
> java.lang.IllegalArgumentException: reportedBlock is not striped
>         at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118)
>         at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> {noformat}
[jira] [Created] (HDFS-12840) Creating a replicated file in a EC zone does not correctly serialized in EditLogs
Lei (Eddy) Xu created HDFS-12840: Summary: Creating a replicated file in a EC zone does not correctly serialized in EditLogs Key: HDFS-12840 URL: https://issues.apache.org/jira/browse/HDFS-12840 Project: Hadoop HDFS Issue Type: Bug Components: erasure-coding Affects Versions: 3.0.0-beta1 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Priority: Blocker When creating a replicated file in an existing EC zone, the edit logs do not differentiate it from an EC file. When {{FSEditLogLoader}} replays the edits, this file is treated as an EC file; as a result, it crashes the NN because the blocks of this file are replicated, which does not match the {{INode}}. {noformat} ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation AddBlockOp [path=/system/balancer.id, penultimateBlock=NULL, lastBlock=blk_1073743259_2455, RpcClientId=, RpcCallId=-2] java.lang.IllegalArgumentException: reportedBlock is not striped at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.hadoop.hdfs.server.blockmanagement.BlockInfoStriped.addStorage(BlockInfoStriped.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.addBlock(DatanodeStorageInfo.java:256) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:3141) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlockUnderConstruction(BlockManager.java:3068) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:3864) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessages(BlockManager.java:2916) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processQueuedMessagesForBlock(BlockManager.java:2903) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.addNewBlock(FSEditLogLoader.java:1069) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:532) at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249) at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) {noformat}
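The crash above can be illustrated with a minimal, self-contained sketch (class and method names here are simplified stand-ins, not the actual HDFS implementation): the striped (EC) block metadata rejects a reported contiguous block, which is exactly what happens when the replayed edit does not record that the file was replicated.

```java
// Hypothetical, simplified sketch -- not the actual HDFS classes.
// Shows why replaying an edit for a replicated (contiguous) file against
// striped (EC) block metadata trips the precondition seen in the trace.

class Block {
    final boolean striped;
    Block(boolean striped) { this.striped = striped; }
}

class StripedBlockMeta {
    // Mirrors the behavior of BlockInfoStriped.addStorage, which rejects
    // non-striped blocks with an IllegalArgumentException.
    void addStorage(Block reported) {
        if (!reported.striped) {
            throw new IllegalArgumentException("reportedBlock is not striped");
        }
    }
}

public class EditReplaySketch {
    // The fix direction described in the issue: the edit-log entry must carry
    // whether the file is replicated, so the loader can pick the matching
    // block representation instead of assuming EC inside an EC zone.
    static String pickBlockType(boolean fileIsErasureCoded) {
        return fileIsErasureCoded ? "striped" : "contiguous";
    }

    public static void main(String[] args) {
        StripedBlockMeta stripedMeta = new StripedBlockMeta();
        boolean mismatch = false;
        try {
            // A replicated file's block handed to EC metadata, as in the bug.
            stripedMeta.addStorage(new Block(false));
        } catch (IllegalArgumentException e) {
            mismatch = true;
        }
        System.out.println(mismatch ? "mismatch detected" : "ok");
        System.out.println(pickBlockType(false));
    }
}
```

With the replication flag serialized per file, the loader would call `pickBlockType(false)` for such files and never reach the striped code path.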
[jira] [Updated] (HDFS-12347) TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently
[ https://issues.apache.org/jira/browse/HDFS-12347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Chen updated HDFS-12347: - Summary: TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently (was: TestBalancerRPCDelay#testBalancerRPCDelay fails consistently) > TestBalancerRPCDelay#testBalancerRPCDelay fails very frequently > --- > > Key: HDFS-12347 > URL: https://issues.apache.org/jira/browse/HDFS-12347 > Project: Hadoop HDFS > Issue Type: Bug > Components: test >Affects Versions: 3.0.0-beta1, 2.7.5, 3.0.1 >Reporter: Xiao Chen >Assignee: Bharat Viswanadham >Priority: Critical > Attachments: trunk.failed.xml > > > Seems to be failing consistently on trunk from yesterday-ish. > A sample failure is > https://builds.apache.org/job/PreCommit-HDFS-Build/20824/testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerRPCDelay/testBalancerRPCDelay/ > Running locally failed with: > {noformat} > type="java.lang.AssertionError"> > {noformat}
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260017#comment-16260017 ] Anu Engineer commented on HDFS-7240: h1. Ozone - First community meeting {{Time: Thursday, November 16, 2017, at 1:00:00 am PST}} _Participants: Anu Engineer, Mukul Kumar Singh, Nandakumar Vadivelu, Weiwei Yang, Steve Loughran, Thomas Demoor, Shashikant Banerjee, Lokesh Jain_ We discussed quite a large number of technical issues at this meeting. We went over how Ozone works, the namespace architecture of KSM, and how it interacts with SCM. We traced both a write I/O path and a read I/O path. There was some discussion over the REST protocol and making sure that the REST protocol is good enough to support Hadoop-based workloads. We looked at various REST APIs of Ozone and also discussed O3 FS working over RPC instead of the REST protocol. This is a work in progress. Steve Loughran suggested that we add Storm to the applications that are tested against Ozone. Currently, we use Hive, Spark, and YARN as the applications to test against Ozone. We will add Storm to this testing mix. We discussed performance and scale of testing; Ozone has been tested with millions of keys. We have also tested with cluster sizes up to 300 nodes. Steve suggested that we upgrade the Ratis version and lock that down before the merge. Thomas Demoor pointed out the difference between the commit ordering of S3 and Ozone. Ozone uses the actual commit time to decide the key ordering, while S3 uses the key creation time to decide the ordering of the keys. He also mentioned that this should not matter in the real world, as he is not aware of any hard-coded dependency on commit ordering. 
> Object store in HDFS > > > Key: HDFS-7240 > URL: https://issues.apache.org/jira/browse/HDFS-7240 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Jitendra Nath Pandey >Assignee: Jitendra Nath Pandey > Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, > HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, > HDFS-7240.004.patch, HDFS-7240.005.patch, HDFS-7240.006.patch, > MeetingMinutes.pdf, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, > ozone_user_v0.pdf > > > This jira proposes to add object store capabilities into HDFS. > As part of the federation work (HDFS-1052) we separated block storage as a > generic storage layer. Using the Block Pool abstraction, new kinds of > namespaces can be built on top of the storage layer i.e. datanodes. > In this jira I will explore building an object store using the datanode > storage, but independent of namespace metadata. > I will soon update with a detailed design document.
[jira] [Created] (HDFS-12839) Refactor ratis-server tests to reduce the use DEFAULT_CALLID
Tsz Wo Nicholas Sze created HDFS-12839: -- Summary: Refactor ratis-server tests to reduce the use DEFAULT_CALLID Key: HDFS-12839 URL: https://issues.apache.org/jira/browse/HDFS-12839 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor This JIRA is to help reduce the patch size in RATIS-141. We refactor the tests so that DEFAULT_CALLID is only used in MiniRaftCluster.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260010#comment-16260010 ] Wei-Chiu Chuang commented on HDFS-12641: Failed tests are due to a Jenkins OOM and don't appear related. Whitespace warnings are due to a file unrelated to the patch. The checkstyle warning needs to be addressed though. > Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445 > - > > Key: HDFS-12641 > URL: https://issues.apache.org/jira/browse/HDFS-12641 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.7.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-12641.branch-2.7.001.patch > > > Our internal testing caught a regression in HDFS-11445 when we cherry-picked > the commit into CDH. Basically, it produces bogus missing file warnings. > Further analysis revealed that the regression is actually fixed by HDFS-11755. > Because of the order commits are merged in branch-2.8 ~ trunk (HDFS-11755 was > committed before HDFS-11445), the regression never actually surfaced for > Hadoop 2.8/3.0.0-(alpha/beta) users. Since branch-2.7 has HDFS-11445 but no > HDFS-11755, I suspect the regression is more visible for Hadoop 2.7.4. > I am filing this jira to raise more awareness, rather than simply backporting > HDFS-11755 into branch-2.7.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259984#comment-16259984 ] Hadoop QA commented on HDFS-12641: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 17m 5s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 11s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 16s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | 
{color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 38s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 653 unchanged - 1 fixed = 655 total (was 654) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 1s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 47s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 10s{color} | {color:red} The patch generated 284 ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}122m 37s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:17 | | Failed junit tests | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestWriteConfigurationToDFS | | | hadoop.hdfs.TestSetTimes | | | hadoop.hdfs.TestDFSRollback | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureToleration | | | hadoop.hdfs.TestMiniDFSCluster | | | hadoop.hdfs.TestBalancerBandwidth | | | hadoop.hdfs.TestDFSClientRetries | | Timed out junit tests | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:67e87c9 | | JIRA Issue | HDFS-12641 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12892157/HDFS-12641.branch-2.7.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit
[jira] [Commented] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259970#comment-16259970 ] Hadoop QA commented on HDFS-12794: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 15s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 45s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 49s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 59s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 43s{color} | {color:orange} hadoop-hdfs-project: The patch generated 3 new + 1 unchanged - 0 fixed = 4 total (was 1) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 39s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}104m 2s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 53s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}172m 50s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:6 | | Failed junit tests | hadoop.ozone.web.client.TestKeysRatis | | | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.hdfs.server.namenode.TestFSEditLogLoader | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.ozone.container.ozoneimpl.TestOzoneContainer | | | hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion | | | hadoop.hdfs.server.namenode.TestFavoredNodesEndToEnd | | | hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy | | | hadoop.ozone.container.common.impl.TestContainerPersistence | |
[jira] [Updated] (HDFS-10183) Prevent race condition during class initialization
[ https://issues.apache.org/jira/browse/HDFS-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Subru Krishnan updated HDFS-10183: -- Fix Version/s: (was: 2.9.0) 2.9.1 > Prevent race condition during class initialization > -- > > Key: HDFS-10183 > URL: https://issues.apache.org/jira/browse/HDFS-10183 > Project: Hadoop HDFS > Issue Type: Bug > Components: fs >Affects Versions: 2.9.0 >Reporter: Pavel Avgustinov >Assignee: Pavel Avgustinov >Priority: Minor > Fix For: 2.9.1 > > Attachments: HADOOP-12944.1.patch, HDFS-10183.2.patch > > > In HADOOP-11969, [~busbey] tracked down a non-deterministic > {{NullPointerException}} to an oddity in the Java memory model: When multiple > threads trigger the loading of a class at the same time, one of them wins and > creates the {{java.lang.Class}} instance; the others block during this > initialization, but once it is complete they may obtain a reference to the > {{Class}} which has non-{{final}} fields still containing their default (i.e. > {{null}}) values. This leads to runtime failures that are hard to debug or > diagnose. > HADOOP-11969 observed that {{ThreadLocal}} fields, by their very nature, are > very likely to be accessed from multiple threads, and thus the problem is > particularly severe there. Consequently, the patch removed all occurrences of > the issue in the code base. > Unfortunately, since then HDFS-7964 has [reverted one of the fixes during a > refactoring|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-0c2e9f7f9e685f38d1a11373b627cfa6R151], > and introduced a [new instance of the > problem|https://github.com/apache/hadoop/commit/2151716832ad14932dd65b1a4e47e64d8d6cd767#diff-6334d0df7d9aefbccd12b21bb7603169R43]. > The attached patch addresses the issue by adding the missing {{final}} > modifier in these two cases. 
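The pattern HDFS-10183 restores can be shown with a minimal sketch (class and field names here are hypothetical, not the HDFS code): under the Java memory model, a thread that obtains a class reference during an initialization race is only guaranteed to see correctly published values for `final` static fields, so holders such as `ThreadLocal` instances should be declared `final`.

```java
// Minimal sketch of the "missing final" pattern (names hypothetical).
// During a class-initialization race, another thread can observe a class
// whose non-final static fields still hold their defaults (null);
// declaring the holder final avoids handing out such a reference.
public class ThreadLocalHolder {
    // Safe: final publication guarantees the assigned value is visible
    // to any thread that can see the field at all.
    static final ThreadLocal<StringBuilder> BUILDER =
        ThreadLocal.withInitial(StringBuilder::new);

    // Risky: a thread racing with class initialization may observe null here.
    static ThreadLocal<StringBuilder> unsafeBuilder =
        ThreadLocal.withInitial(StringBuilder::new);

    public static void main(String[] args) {
        BUILDER.get().append("ok");
        System.out.println(BUILDER.get().toString());
    }
}
```

The attached patch's fix is exactly the one-word change from the second form to the first.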
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259924#comment-16259924 ] Wei-Chiu Chuang commented on HDFS-12836: Thanks. From a supportability point of view this is definitely worth improving, because the error message is obscure. > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at >
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259891#comment-16259891 ] Chao Sun commented on HDFS-12836: - This issue happens when not all NNs in the cluster enable the in-progress tailing. In this case, the NNs without in-progress tailing enabled will not update the commit ID and cause it to be one less than the start/end txn ID. The commit update is handled by [this code | https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumOutputStream.java#L118]. A simple fix should just update {{endTxId}} to be the maximum between {{endTxId}} and {{remoteLog.getStartTxId()}}. However, I'm not sure if this is a valid issue since I assume people should always use the same configuration for all NNs. > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) >
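The clamping step and the invalid range it can produce can be sketched in isolation (method names here are hypothetical; in HDFS the logic lives in the edit-log selection code, and the comment above suggests adjusting {{endTxId}} rather than skipping):

```java
// Hedged sketch of the txid range problem discussed above.
// clampEndTxId mirrors the quoted snippet; rangeIsValid is the kind of
// guard a fix might add before constructing an EditLogInputStream.
public class TxIdRangeCheck {
    // When only durable transactions may be served, the end of the range is
    // clamped to the committed txid, which can lag the segment's start.
    static long clampEndTxId(long endTxId, long committedTxnId,
                             boolean onlyDurableTxns, boolean inProgressOk) {
        if (onlyDurableTxns && inProgressOk) {
            return Math.min(endTxId, committedTxnId);
        }
        return endTxId;
    }

    static boolean rangeIsValid(long startTxId, long endTxId) {
        return startTxId <= endTxId;
    }

    public static void main(String[] args) {
        // A NN without in-progress tailing leaves committedTxnId behind:
        // segment starts at 87 but endTxId is clamped to 85.
        long end = clampEndTxId(90, 85, true, true);
        System.out.println(end);
        System.out.println(rangeIsValid(87, end));
    }
}
```

When `rangeIsValid` is false, the stream would otherwise be built with `startTxId > endTxId`, producing the obscure "premature end-of-file" failure in the trace.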
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259877#comment-16259877 ] Konstantin Shvachko commented on HDFS-12832: [~Deng FEI] could you please post the actual exception here if you have it? It would be good to see a stack trace. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. > if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor works, and the NameNode will quit. > It seems the two loops are not synchronized; the path's length changed > between them.
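One way to see why the two-pass approach is fragile, and a single-pass alternative, can be sketched with a simplified stand-in for {{INode}} (this is an illustration of the race-avoidance idea, not the patch attached to the issue): collecting the components in one traversal removes the separate "measure" pass, so a concurrent rename between the two loops can no longer overflow a pre-sized array.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch with a simplified INode stand-in (names hypothetical).
// A deque sized by a single traversal cannot disagree with itself the way
// the original measure-then-copy pair of loops can under concurrent renames.
public class PathBuilder {
    static class Node {
        final String name;
        final Node parent;
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    static String fullPath(Node node) {
        Deque<String> parts = new ArrayDeque<>();
        // One traversal: collect components root-ward, no second sizing pass.
        for (Node n = node; n != null && n.parent != null; n = n.parent) {
            parts.addFirst(n.name);
        }
        if (parts.isEmpty()) {
            return "/";  // root
        }
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append('/').append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node a = new Node("a", root);
        Node b = new Node("b", a);
        System.out.println(fullPath(b)); // prints /a/b
    }
}
```

The snapshot of components can still be stale under a concurrent rename, but the failure mode becomes a stale path rather than an `ArrayIndexOutOfBoundsException` that exits the NameNode.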
[jira] [Comment Edited] (HDFS-12786) Ozone: add port/service names to the ksm/scm web ui
[ https://issues.apache.org/jira/browse/HDFS-12786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257815#comment-16257815 ] Xiaoyu Yao edited comment on HDFS-12786 at 11/20/17 9:20 PM: - [~elek], thanks for reporting the issue and posting the fix. The change looks good to me. Can we remove the ":" before the port number? Otherwise, +1. was (Author: xyao): [~elek], thanks for reporting the issue and posting the fix. It looks good to me and I only have one question: do we need to export serverName from the ksm/scm UI (e.g., ksm.js or scm.js) so that the tag.serverName referred to in ozone.js is not null or empty? > Ozone: add port/service names to the ksm/scm web ui > --- > > Key: HDFS-12786 > URL: https://issues.apache.org/jira/browse/HDFS-12786 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton >Priority: Minor > Attachments: HDFS-12786-HDFS-7240.001.patch > > > Since HDFS-12655 an additional serviceNames field is available for all rpc > services via the metrics interface. > This super small patch modifies the scm/ksm web ui to display this name. > Instead of > :9863 > we will display: > ScmBlockLocationProtocolService (:9863) > TESTING: > Start the dozone cluster and check the header of the rpc metrics section on the > web ui: http://localhost:9876/
[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259856#comment-16259856 ] Hudson commented on HDFS-12804: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13261 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13261/]) HDFS-12804. Use slf4j instead of log4j in FSEditLog. Contributed by (cliang: rev 60fc2a138827c2c29fa7e9d6844e3b8d43809726) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLog.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogRace.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestEditLogAutoroll.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestEditLogTailer.java > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
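The main practical win of the slf4j migration is parameterized logging: the message is only formatted when the level is enabled, so callers can drop {{isDebugEnabled()}} guards. The sketch below illustrates that mechanism with a self-contained, hypothetical {{MiniLogger}} (not slf4j itself, and not the actual FSEditLog change):

```java
// Hypothetical MiniLogger demonstrating slf4j-style "{}" placeholders:
// when the level is disabled, the method returns before any string
// concatenation happens, which is why guards like isDebugEnabled()
// become unnecessary at call sites.
public class MiniLogger {
    private final boolean debugEnabled;

    public MiniLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    // Returns the formatted message, or null when debug is disabled
    // (a real logger would write to an appender instead).
    public String debug(String template, Object... args) {
        if (!debugEnabled) {
            return null;                 // no formatting cost when disabled
        }
        StringBuilder sb = new StringBuilder();
        int argIdx = 0, from = 0, at;
        while ((at = template.indexOf("{}", from)) >= 0 && argIdx < args.length) {
            sb.append(template, from, at).append(args[argIdx++]);
            from = at + 2;
        }
        sb.append(template.substring(from));
        return sb.toString();
    }
}
```

With slf4j proper, the equivalent call site is `LOG.debug("Synced txid {} with {} ops", txId, ops)` instead of the log4j-era guarded string concatenation.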
[jira] [Updated] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Liang updated HDFS-12804: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
[jira] [Commented] (HDFS-12804) Use slf4j instead of log4j in FSEditLog
[ https://issues.apache.org/jira/browse/HDFS-12804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259818#comment-16259818 ] Chen Liang commented on HDFS-12804: --- Thanks [~msingh] for the update. I also tested locally; {{TestUnbuffer}} and {{TestBalancerRPCDelay}} did fail even without the patch. I've committed the v003 patch to trunk. Thanks Mukul for the contribution! > Use slf4j instead of log4j in FSEditLog > --- > > Key: HDFS-12804 > URL: https://issues.apache.org/jira/browse/HDFS-12804 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Reporter: Mukul Kumar Singh >Assignee: Mukul Kumar Singh > Attachments: HDFS-12804.001.patch, HDFS-12804.002.patch, > HDFS-12804.003.patch > > > FSEditLog uses log4j; this jira updates the logging to use slf4j.
[jira] [Commented] (HDFS-12641) Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445
[ https://issues.apache.org/jira/browse/HDFS-12641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259820#comment-16259820 ] Wei-Chiu Chuang commented on HDFS-12641: The test failures and warnings don't seem related. Triggering the precommit job again. > Backport HDFS-11755 into branch-2.7 to fix a regression in HDFS-11445 > - > > Key: HDFS-12641 > URL: https://issues.apache.org/jira/browse/HDFS-12641 > Project: Hadoop HDFS > Issue Type: Task >Affects Versions: 2.7.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Blocker > Labels: release-blocker > Attachments: HDFS-12641.branch-2.7.001.patch > > > Our internal testing caught a regression in HDFS-11445 when we cherry-picked > the commit into CDH. Basically, it produces bogus missing-file warnings. > Further analysis revealed that the regression is actually fixed by HDFS-11755. > Because of the order commits were merged in branch-2.8 ~ trunk (HDFS-11755 was > committed before HDFS-11445), the regression never actually surfaced for > Hadoop 2.8/3.0.0-(alpha/beta) users. Since branch-2.7 has HDFS-11445 but no > HDFS-11755, I suspect the regression is more visible for Hadoop 2.7.4. > I am filing this jira to raise more awareness, rather than simply backporting > HDFS-11755 into branch-2.7.
[jira] [Commented] (HDFS-12171) Reduce IIP object allocations for inode lookup
[ https://issues.apache.org/jira/browse/HDFS-12171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259815#comment-16259815 ] Wei-Chiu Chuang commented on HDFS-12171: For future reference: the object allocation came from the code refactoring in HDFS-7498. > Reduce IIP object allocations for inode lookup > -- > > Key: HDFS-12171 > URL: https://issues.apache.org/jira/browse/HDFS-12171 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Daryn Sharp >Assignee: Daryn Sharp > Fix For: 2.9.0, 3.0.0-beta1, 2.8.3 > > Attachments: HDFS-12171.branch-2.patch, HDFS-12171.patch > > > {{IIP#getReadOnlyINodes}} is invoked frequently for EZ and EC lookups. It > allocates unnecessary objects to make the primitive array an immutable array > list. IIP already has a method for indexed inode retrieval that can be > tweaked to further improve performance.
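A small sketch of the allocation pattern the issue describes, using a toy holder (not the real {{INodesInPath}}): wrapping the internal array in an immutable list allocates objects on every hot-path call, whereas an indexed getter touches nothing on the heap.

```java
// Toy INodesPath holder (String components stand in for INode objects).
// Contrasts an allocating read-only accessor with allocation-free indexed
// retrieval; the negative-index convention (count from the end) mirrors
// the kind of indexed lookup the JIRA says IIP already supports.
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class INodesPath {
    private final String[] inodes;

    public INodesPath(String... inodes) { this.inodes = inodes; }

    // Allocates a List wrapper plus an unmodifiable wrapper on every call,
    // even when the caller only wants a single element.
    public List<String> getReadOnlyINodes() {
        return Collections.unmodifiableList(Arrays.asList(inodes));
    }

    // No per-call allocation; a negative index counts from the end.
    public String getINode(int i) {
        return i >= 0 ? inodes[i] : inodes[inodes.length + i];
    }
}
```

For lookups done on every EZ/EC path resolution, replacing `getReadOnlyINodes().get(k)`-style access with the indexed getter removes two short-lived objects per call.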
[jira] [Commented] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259781#comment-16259781 ] Nanda kumar commented on HDFS-12740: Thanks [~shashikant] for updating the patch; it looks good to me. Some minor comments: {{org.apache.hadoop.scm.container.common.helpers}} is not the right place for {{ScmInfo}}. We can move {{ScmInfo}} under {{org.apache.hadoop.scm}}. MiniOzoneClassicCluster.java: Lines 529-533 can be replaced with {{scmStore.setClusterId(clusterId.orElse(runID.toString()))}}, and Lines 534-538 can be replaced with {{scmStore.setScmId(scmId.orElse(UUID.randomUUID().toString()))}}. NITs: ScmBlockLocationProtocol.proto Lines 135, 136, 141 & 142 have incorrect indentation. StorageContainerManager.java Lines 1057 & 1058: the {{this}} keyword can be removed. > SCM should support a RPC to share the cluster Id with KSM and DataNodes > --- > > Key: HDFS-12740 > URL: https://issues.apache.org/jira/browse/HDFS-12740 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12740-HDFS-7240.001.patch, > HDFS-12740-HDFS-7240.002.patch, HDFS-12740-HDFS-7240.003.patch, > HDFS-12740-HDFS-7240.004.patch > > > When the ozone cluster is first created, the SCM --init command will generate a > cluster Id as well as an SCM Id and persist them locally. The same cluster Id and > SCM Id will be shared with KSM during KSM initialization and with DataNodes > during datanode registration.
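The {{Optional.orElse}} simplification suggested in the review can be sketched as follows; {{ScmStoreSketch}} and its field names are hypothetical stand-ins for the store being initialized, not the actual Ozone classes:

```java
// Hypothetical stand-in for the SCM store initialization discussed in the
// review: Optional.orElse collapses the 4-5 line if/present/else blocks
// into one expression per field.
import java.util.Optional;
import java.util.UUID;

public class ScmStoreSketch {
    private String clusterId;
    private String scmId;

    // clusterId defaults to the run ID; scmId defaults to a fresh UUID.
    public void init(Optional<String> clusterId, Optional<String> scmId, UUID runId) {
        this.clusterId = clusterId.orElse(runId.toString());
        this.scmId = scmId.orElse(UUID.randomUUID().toString());
    }

    public String getClusterId() { return clusterId; }
    public String getScmId() { return scmId; }
}
```

One caveat of plain `orElse`: its argument is always evaluated, so `orElse(UUID.randomUUID().toString())` generates a UUID even when the Optional is present; `orElseGet(() -> UUID.randomUUID().toString())` avoids that if it matters.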
[jira] [Commented] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259768#comment-16259768 ] Hudson commented on HDFS-12730: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13259 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13259/]) HDFS-12730. Verify open files captured in the snapshots across config (manojpec: rev 9fb4effd2c4de2d83b667a43e8798315e85ff79b) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOpenFilesWithSnapshot.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future.
[jira] [Commented] (HDFS-12826) Document Saying the RPC port, But it's required IPC port in Balancer Document.
[ https://issues.apache.org/jira/browse/HDFS-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259763#comment-16259763 ] Chen Liang commented on HDFS-12826: --- Thanks [~peruguusha] for the patch! Would it be more precise to go the other way, changing {{ipc}} to {{rpc}} instead? > Document Saying the RPC port, But it's required IPC port in Balancer Document. > -- > > Key: HDFS-12826 > URL: https://issues.apache.org/jira/browse/HDFS-12826 > Project: Hadoop HDFS > Issue Type: Bug > Components: balancer & mover, documentation >Affects Versions: 3.0.0-beta1 >Reporter: Harshakiran Reddy >Assignee: usharani >Priority: Minor > Attachments: HDFS-12826.patch > > > In {{Adding a new Namenode to an existing HDFS cluster}}, the refreshNamenodes > command requires the IPC port, but the documentation says the RPC port. > http://hadoop.apache.org/docs/r3.0.0-beta1/hadoop-project-dist/hadoop-hdfs/Federation.html#Balancer > {noformat} > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes host-name:65110 > refreshNamenodes: Unknown protocol: > org.apache.hadoop.hdfs.protocol.ClientDatanodeProtocol > bin.:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes > Usage: hdfs dfsadmin [-refreshNamenodes datanode-host:ipc_port] > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> ./hdfs dfsadmin > -refreshNamenodes host-name:50077 > bin>:~/hdfsdata/HA/install/hadoop/datanode/bin> > {noformat}
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:58 PM: -- Thanks [~anu] for the review comments. As per discussion with [~anu], here are a few conclusions:
1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
}
{code}
While doing close on the groupOutputStream, we do chunkOutputStream.close, where we do future.get() on the response obtained after the write completes from the xceiver server, which makes sure the response is received from the xceiver server. While closing the groupStream, the semaphore permit count should equal the number of available permits, which equals the max number of outstanding chunks at any given point of time.
2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
}
{code}
Hardcoding the exception when the writeChunkToContainer call completes in the xceiver server shows that the exception is caught in the chunkOutputGroupStream.close path, which is expected:
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e)
{code}
{code}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{code}
This is as expected. The idea was to write a mock test for the validateContainerResponse call, which is a static method of a final class; this requires PowerMockRunner, which leads to issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.
> Ozone: Parallelize
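The pattern discussed above can be sketched in a self-contained form; the class and method names below ({{ChunkWriteSketch}}, {{writeChunk}}) are toy stand-ins for the Ozone classes, not the actual code. Validation inside {{thenApply}} wraps failures in a {{CompletionException}}, {{close()}} surfaces them via {{future.get()}}, and a semaphore that bounds outstanding chunk writes ends up fully released once every response has completed:

```java
// Toy sketch of the async chunk-write pattern: a bounded semaphore is
// acquired per write and released in whenComplete (success or failure),
// and a validation failure thrown inside thenApply propagates to the
// caller when close() blocks on the response future.
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Semaphore;

public class ChunkWriteSketch {
    static final int MAX_OUTSTANDING_CHUNKS = 4;
    private final Semaphore semaphore = new Semaphore(MAX_OUTSTANDING_CHUNKS);
    private CompletableFuture<String> response;

    public void writeChunk(boolean failValidation) {
        semaphore.acquireUninterruptibly();        // bound outstanding writes
        response = CompletableFuture.supplyAsync(() -> "reply")
            .thenApply(reply -> {
                try {
                    if (failValidation) {          // stand-in for response validation
                        throw new IOException("Exception while validating response");
                    }
                    return reply;
                } catch (IOException e) {
                    throw new CompletionException(
                        "Unexpected Storage Container Exception: " + e, e);
                }
            })
            .whenComplete((r, t) -> semaphore.release());
    }

    // Blocks until the write completes; a validation failure surfaces here
    // wrapped in an ExecutionException.
    public String close() throws InterruptedException, ExecutionException {
        return response.get();
    }

    public int availablePermits() {
        return semaphore.availablePermits();
    }
}
```

After {{close()}} returns or throws, all permits are back, which is the invariant the {{Preconditions.checkArgument}} in the quoted snippet asserts.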
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:57 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) {code} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } {code} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. {code} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } {code} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
{code}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) {code} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) {code} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. code{}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:50 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code {} //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } code {} While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. code {} throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } code {} Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
code{}   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) code{} java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) code{} This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. Code {   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream Writes to container >
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:45 PM: -- Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. 
Code {   response = response.thenApply(reply -> { try{   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; }catch (IOException e){   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e) java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. was (Author: shashikant): Thanks [~anu] , for the review comments. As per discussion with [~anu], here are few conclusions: 1) code { //make sure all the data in the ChunkoutputStreams is written to the // container Preconditions.checkArgument( semaphore.availablePermits() == getMaxOutstandingChunks()); } While doing close on the groupOutputStream, we do chunkOutputstream.close, where we do future.get() on response obtained after the write completes from the xceiver server which makes sure the response is received from the xceiver server. While closing the groupStream, semaphorePermiCount should be equal to no of available permits which is equal to max no of outstanding chunks at any given point of time. 2. 
code { throw new CompletionException( "Unexpected Storage Container Exception: " + e.toString(), e); } Hardcoding the exception when the writeChunkToConatiner calls completes in the xceiverServer shows that , the exception is caught in the chunkoutputGroupStream.close path which is expected. Code {   response = response.thenApply(reply -> { try {   throw new IOException("Exception while validating response"); // ContainerProtocolCalls.validateContainerResponse(reply); // return reply; } catch (IOException e) {   throw new CompletionException(   "Unexpected Storage Container Exception: " + e.toString(), e); } java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response   at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)   at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291) } This is as expected. Idea was to write a mocktest while validatingContainerResposne calls which is static method of a final class, and this requires powerMockrunner which leads to issues while bringing up the miniOzoneCluster.Will address the unit test to vertify the same later in a different jira. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look. > Ozone: Parallelize ChunkOutputSream Writes to container >
[jira] [Comment Edited] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259728#comment-16259728 ] Shashikant Banerjee edited comment on HDFS-12794 at 11/20/17 7:43 PM: --

Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
response = response.thenApply(reply -> {
  try {
    throw new IOException("Exception while validating response");
    // ContainerProtocolCalls.validateContainerResponse(reply);
    // return reply;
  } catch (IOException e) {
    throw new CompletionException(
        "Unexpected Storage Container Exception: " + e.toString(), e);
  }
});
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse call, which is a static method of a final class; that requires PowerMockRunner, which causes issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same in a separate JIRA. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.

was (Author: shashikant): Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
try {
  String requestID =
      traceID + chunkIndex + ContainerProtos.Type.WriteChunk.name();
  // add the chunk write traceId to the queue
  semaphore.acquire();
  LOG.warn("calling async");
  response =
      writeChunkAsync(xceiverClient, chunk, key, data, requestID);
  response = response.thenApply(reply -> {
    try {
      throw new IOException("Exception while validating response");
      // ContainerProtocolCalls.validateContainerResponse(reply);
      // return reply;
    } catch (IOException e) {
      LOG.info("coming here to throw exception");
      throw new CompletionException(
          "Unexpected Storage Container Exception: " + e.toString(), e);
    }
  });
}
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse
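The exception path described above can be reproduced in isolation. The following is a hypothetical standalone sketch (not the Ozone code itself; the class and method names are made up for illustration) showing how an exception thrown inside {{thenApply}} is wrapped in a {{CompletionException}} and only surfaces when the future is joined, which is what the {{close()}} path observes:

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Hypothetical sketch: an exception thrown inside thenApply() completes the
// future exceptionally and resurfaces at join()/get(), mirroring the
// stack trace seen in ChunkOutputStream.close().
public class AsyncValidationSketch {
  public static String failingValidation() {
    CompletableFuture<String> response =
        CompletableFuture.completedFuture("reply");
    CompletableFuture<String> validated = response.thenApply(reply -> {
      try {
        // Simulate validateContainerResponse() rejecting the reply.
        throw new IOException("Exception while validating response");
      } catch (IOException e) {
        throw new CompletionException(
            "Unexpected Storage Container Exception: " + e, e);
      }
    });
    try {
      validated.join();  // what close() effectively does via future.get()
      return "no exception";
    } catch (CompletionException e) {
      // The original IOException is preserved as the cause.
      return e.getCause().getMessage();
    }
  }
}
```

Because the thrown exception is already a {{CompletionException}}, {{join()}} rethrows it as-is rather than double-wrapping it, so the original {{IOException}} remains the direct cause.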
[jira] [Commented] (HDFS-12799) Ozone: SCM: Close containers: extend SCMCommandResponseProto with SCMCloseContainerCmdResponseProto
[ https://issues.apache.org/jira/browse/HDFS-12799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259729#comment-16259729 ] Chen Liang commented on HDFS-12799: --- Thanks for working on this [~elek]. Returning the current state does seem to be a bug and was causing a test failure, so I fixed it in HDFS-12793. As for creating a container, have you tried calling {{cluster.getStorageContainerManager().allocateContainer}}, followed by two {{mapping.updateContainerState}} calls, just like what {{TestContainerMapping#createContainer}} is doing? > Ozone: SCM: Close containers: extend SCMCommandResponseProto with > SCMCloseContainerCmdResponseProto > --- > > Key: HDFS-12799 > URL: https://issues.apache.org/jira/browse/HDFS-12799 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Elek, Marton >Assignee: Elek, Marton > Attachments: HDFS-12799-HDFS-7240.001.patch > > > This issue is about extending the HB response protocol between SCM and DN > with a command to ask the datanode to close a container. (This is just about > extending the protocol, not about fixing the implementation of SCM to handle > the state transitions). -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12794) Ozone: Parallelize ChunkOutputSream Writes to container
[ https://issues.apache.org/jira/browse/HDFS-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-12794: --- Attachment: HDFS-12794-HDFS-7240.003.patch

Thanks [~anu], for the review comments. As per discussion with [~anu], here are a few conclusions:

1)
{code}
// make sure all the data in the ChunkOutputStreams is written to the
// container
Preconditions.checkArgument(
    semaphore.availablePermits() == getMaxOutstandingChunks());
{code}
While closing the ChunkGroupOutputStream we call chunkOutputStream.close(), which does a future.get() on the response obtained once the write completes on the xceiver server; this makes sure the response has been received. At that point the number of available semaphore permits should equal the maximum number of outstanding chunks allowed at any given point in time.

2)
{code}
throw new CompletionException(
    "Unexpected Storage Container Exception: " + e.toString(), e);
{code}
Hardcoding the exception at the point where the writeChunkToContainer call completes on the xceiver server shows that the exception is caught in the ChunkGroupOutputStream.close() path, which is expected.
{code}
try {
  String requestID =
      traceID + chunkIndex + ContainerProtos.Type.WriteChunk.name();
  // add the chunk write traceId to the queue
  semaphore.acquire();
  LOG.warn("calling async");
  response =
      writeChunkAsync(xceiverClient, chunk, key, data, requestID);
  response = response.thenApply(reply -> {
    try {
      throw new IOException("Exception while validating response");
      // ContainerProtocolCalls.validateContainerResponse(reply);
      // return reply;
    } catch (IOException e) {
      LOG.info("coming here to throw exception");
      throw new CompletionException(
          "Unexpected Storage Container Exception: " + e.toString(), e);
    }
  });
}
{code}
{noformat}
java.io.IOException: Unexpected Storage Container Exception: java.util.concurrent.ExecutionException: java.io.IOException: Exception while validating response
  at org.apache.hadoop.scm.storage.ChunkOutputStream.close(ChunkOutputStream.java:174)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream$ChunkOutputStreamEntry.close(ChunkGroupOutputStream.java:468)
  at org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.close(ChunkGroupOutputStream.java:291)
{noformat}
This is as expected. The idea was to write a mock test around the validateContainerResponse call, which is a static method of a final class; that requires PowerMockRunner, which causes issues while bringing up the MiniOzoneCluster. Will add a unit test to verify the same in a separate JIRA. Patch v3 addresses the remaining review comments. [~anu]/others, please have a look.
> Ozone: Parallelize ChunkOutputSream Writes to container > --- > > Key: HDFS-12794 > URL: https://issues.apache.org/jira/browse/HDFS-12794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ozone >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12794-HDFS-7240.001.patch, > HDFS-12794-HDFS-7240.002.patch, HDFS-12794-HDFS-7240.003.patch > > > The ChunkOutputStream writes are sync in nature. Once one chunk of data gets > written, the next chunk write is blocked until the previous chunk is written > to the container. > The ChunkOutputStream writes should be made async, and close on the > OutputStream should ensure flushing of all dirty buffers to the container. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
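The semaphore invariant discussed in the comment above (all permits available again once close() has drained every in-flight write) can be modeled in isolation. The following is a hypothetical standalone sketch, not the Ozone code; the class name, permit count, and method names are made up for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical model of the throttling invariant: one permit is held per
// in-flight chunk write and released when its response completes, so after
// draining all writes the available permits return to the configured maximum.
public class ChunkThrottleSketch {
  static final int MAX_OUTSTANDING_CHUNKS = 4;
  private final Semaphore semaphore = new Semaphore(MAX_OUTSTANDING_CHUNKS);
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  CompletableFuture<Void> writeChunkAsync(int chunkIndex) {
    semaphore.acquireUninterruptibly();  // block when too many are in flight
    return CompletableFuture
        .runAsync(() -> { /* send chunk #chunkIndex to the container */ }, pool)
        .whenComplete((v, t) -> semaphore.release());
  }

  public int drainAndCheck(int chunks) {
    CompletableFuture<?>[] futures = new CompletableFuture<?>[chunks];
    for (int i = 0; i < chunks; i++) {
      futures[i] = writeChunkAsync(i);
    }
    CompletableFuture.allOf(futures).join();  // what close() effectively waits on
    pool.shutdown();
    return semaphore.availablePermits();      // back to MAX_OUTSTANDING_CHUNKS
  }
}
```

Because each release runs before its {{whenComplete}}-returned future completes, waiting on {{allOf}} guarantees every permit is back, which is exactly the precondition {{Preconditions.checkArgument}} asserts at close time.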
[jira] [Commented] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259709#comment-16259709 ] Hadoop QA commented on HDFS-12807: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 18m 36s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 4s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 47s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 53s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 15m 54s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 5m 43s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 16s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 43s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 42s{color} | {color:orange} hadoop-hdfs-project: The patch generated 11 new + 0 unchanged - 0 fixed = 11 total (was 0) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} xml {color} | {color:green} 0m 3s{color} | {color:green} The patch has no ill-formed XML file. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 14s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 56s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 31s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}131m 2s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}220m 30s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:1 | | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.hdfs.TestLeaseRecovery2 | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.hdfs.server.namenode.ha.TestPipelinesFailover | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.fs.TestUnbuffer | | | hadoop.ozone.web.client.TestKeys | | Timed out junit tests | org.apache.hadoop.cblock.TestLocalBlockCache | \\ \\ || Subsystem ||
[jira] [Updated] (HDFS-12787) Ozone: SCM: Aggregate the metrics from all the container reports
[ https://issues.apache.org/jira/browse/HDFS-12787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-12787: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: HDFS-7240 Status: Resolved (was: Patch Available) Thanks [~linyiqun] for the contribution. +1 for the latest patch. I've committed to the feature branch. > Ozone: SCM: Aggregate the metrics from all the container reports > > > Key: HDFS-12787 > URL: https://issues.apache.org/jira/browse/HDFS-12787 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: metrics, ozone >Affects Versions: HDFS-7240 >Reporter: Yiqun Lin >Assignee: Yiqun Lin > Fix For: HDFS-7240 > > Attachments: HDFS-12787-HDFS-7240.001.patch, > HDFS-12787-HDFS-7240.002.patch, HDFS-12787-HDFS-7240.003.patch, > HDFS-12787-HDFS-7240.004.patch, HDFS-12787-HDFS-7240.005.patch, > HDFS-12787-HDFS-7240.006.patch > > > We should aggregate the metrics from all the reports of different datanodes > in addition to the last report. This way, we can get a global view of the > container I/Os over the ozone cluster. This is a follow up work of HDFS-11468. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-12730) Verify open files captured in the snapshots across config disable and enable
[ https://issues.apache.org/jira/browse/HDFS-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manoj Govindassamy updated HDFS-12730: -- Resolution: Fixed Fix Version/s: 3.1.0 Status: Resolved (was: Patch Available) Thanks for the review [~yzhangal] and [~hanishakoneru]. Committed it to trunk. > Verify open files captured in the snapshots across config disable and enable > > > Key: HDFS-12730 > URL: https://issues.apache.org/jira/browse/HDFS-12730 > Project: Hadoop HDFS > Issue Type: Test > Components: hdfs >Affects Versions: 3.0.0 >Reporter: Manoj Govindassamy >Assignee: Manoj Govindassamy > Fix For: 3.1.0 > > Attachments: HDFS-12730.01.patch, HDFS-12730.02.patch > > > Open files captured in the snapshots have their meta data preserved based on > the config > _dfs.namenode.snapshot.capture.openfiles_ (refer HDFS-11402). During the > upgrade scenario or when the NameNode gets restarted with config turned on or > off, the attributes of the open files captured in the snapshots are > influenced accordingly. Better to have a test case to verify open file > attributes across config turn on and off, and the current expected behavior > with HDFS-11402 so as to catch any regressions in the future. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259559#comment-16259559 ] Wei-Chiu Chuang commented on HDFS-12836: Updated Affects Version/s based on the fix version of HDFS-10519 > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at >
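The failure mode quoted above follows directly from the capping logic in the snippet: once {{endTxId}} is capped to {{committedTxnId}}, an in-progress segment whose start transaction lies beyond the committed point yields an inverted range. A hypothetical sketch of a guard for this case (the method name and parameters are illustrative, not the actual HDFS fix):

```java
// Hypothetical sketch: reproduce the Math.min() capping from the quoted code
// and detect the inverted range (startTxId > endTxId) that should be skipped
// instead of being handed to EditLogFileInputStream.fromUrl().
public class TxIdGuardSketch {
  public static boolean shouldSkipSegment(long startTxId, long endTxId,
      long committedTxnId, boolean inProgressOk, boolean onlyDurableTxns) {
    if (onlyDurableTxns && inProgressOk) {
      endTxId = Math.min(endTxId, committedTxnId);  // cap, as in the snippet
    }
    // An empty/inverted range means no durable transactions to read yet.
    return startTxId > endTxId;
  }
}
```

With the values from the quoted log (segment starting at txid 87 while only txid 85 is committed), the range inverts and the segment would be skipped rather than producing the PrematureEOFException.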
[jira] [Updated] (HDFS-12836) startTxId could be greater than endTxId when tailing in-progress edit log
[ https://issues.apache.org/jira/browse/HDFS-12836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-12836: --- Affects Version/s: 3.0.0-alpha1 > startTxId could be greater than endTxId when tailing in-progress edit log > - > > Key: HDFS-12836 > URL: https://issues.apache.org/jira/browse/HDFS-12836 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.0.0-alpha1 >Reporter: Chao Sun >Assignee: Chao Sun > > When {{dfs.ha.tail-edits.in-progress}} is true, edit log tailer will also > tail those in progress edit log segments. However, in the following code: > {code} > if (onlyDurableTxns && inProgressOk) { > endTxId = Math.min(endTxId, committedTxnId); > } > EditLogInputStream elis = EditLogFileInputStream.fromUrl( > connectionFactory, url, remoteLog.getStartTxId(), > endTxId, remoteLog.isInProgress()); > {code} > it is possible that {{remoteLog.getStartTxId()}} could be greater than > {{endTxId}}, and therefore will cause the following error: > {code} > 2017-11-17 19:55:41,165 ERROR org.apache.hadoop.hdfs.server.namenode.FSImage: > Error replaying edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:189) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:205) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > 2017-11-17 19:55:41,165 WARN > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Error while reading > edits from disk. Will try again. > org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying > edit log at offset 1048576. 
Expected transaction ID was 87 > Recent opcode offsets: 1048576 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:218) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:882) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:863) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:481) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393) > Caused by: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: > got premature end-of-file at txid 86; expected file to go up to 85 > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:197) > at >
[jira] [Commented] (HDFS-12711) deadly hdfs test
[ https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259536#comment-16259536 ] Allen Wittenauer commented on HDFS-12711: - FYI HADOOP-13514. > deadly hdfs test > > > Key: HDFS-12711 > URL: https://issues.apache.org/jira/browse/HDFS-12711 > Project: Hadoop HDFS > Issue Type: Test >Affects Versions: 2.9.0, 2.8.2 >Reporter: Allen Wittenauer >Priority: Critical > Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259458#comment-16259458 ] Erik Krogen edited comment on HDFS-12832 at 11/20/17 4:42 PM: -- Thanks for reporting this and for working on a patch, [~Deng FEI]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. was (Author: xkrogen): Thanks for reporting this and for working on a patch, [~Deng FEI]]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. 
rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. > if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor work ,and the NameNode will quit. > It seems the two loop is not synchronized, the path's length is changed. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
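The two-pass structure quoted in the issue (first loop sizes the array from the ancestor chain, second loop copies into it) is exactly what makes the race possible. The following hypothetical standalone model, with made-up names and a deliberately simplified separator rule, shows deterministically that a chain mutated between the two passes overruns the array:

```java
import java.util.List;

// Hypothetical model of getFullPathName()'s two passes: pass 1 sizes the
// byte[] from one snapshot of the ancestor chain, pass 2 copies from a
// (mutated) longer chain, reproducing the ArrayIndexOutOfBoundsException.
public class PathRaceSketch {
  static byte[] buildPath(List<byte[]> sizingChain, List<byte[]> copyChain) {
    int idx = 0;
    for (byte[] name : sizingChain) {   // pass 1: compute needed size
      idx += name.length + 1;           // component + separator
    }
    byte[] path = new byte[idx];
    for (int i = copyChain.size() - 1; i >= 0; i--) {  // pass 2: copy
      byte[] name = copyChain.get(i);
      path[--idx] = '/';
      idx -= name.length;
      System.arraycopy(name, 0, path, idx, name.length);
    }
    return path;
  }

  public static boolean raceCausesOverrun() {
    List<byte[]> before = List.of("a".getBytes(), "b".getBytes());
    List<byte[]> after =
        List.of("a".getBytes(), "b".getBytes(), "c".getBytes());
    try {
      buildPath(before, after);  // chain grew between the two passes
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;               // same failure as in ReplicationMonitor
    }
  }
}
```

This supports Erik Krogen's point: the fix is not a different traversal method (any two-pass read races the same way) but ensuring the chain cannot change underneath the call, i.e. holding the namesystem lock around {{chooseTargets()}}.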
[jira] [Commented] (HDFS-12832) INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to NameNode exit
[ https://issues.apache.org/jira/browse/HDFS-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259458#comment-16259458 ] Erik Krogen commented on HDFS-12832: Thanks for reporting this and for working on a patch, [~Deng FEI]]! Actually, the new method you are using, {{INode#getPathComponents()}} is subject to the same race condition. Generally {{INode}} is not meant to be a concurrent data structure as far as I can tell. I believe the issue is actually that {{ReplicationWork#chooseTargets()}} is being called without a lock: {code:title=BlockManager.ReplicationWork} // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. rw.chooseTargets(blockplacement, storagePolicySuite, excludedNodes); {code} Within {{chooseTargets()}} various methods on {{INode}}/{{BlockCollection}}, {{DatanodeDescriptor}}, {{DatanodeStorageInfo}}, and {{Block}} are called which it seems should not be allowed outside of the lock. [~Deng FEI], do you have a stack trace available to confirm that this is the same code path which caused your exception? This is the code path that was taken to trigger the issue for us. > INode.getFullPathName may throw ArrayIndexOutOfBoundsException lead to > NameNode exit > > > Key: HDFS-12832 > URL: https://issues.apache.org/jira/browse/HDFS-12832 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.4, 3.0.0-beta1 >Reporter: DENG FEI >Priority: Critical > Attachments: HDFS-12832-trunk-001.patch > > > {code:title=INode.java|borderStyle=solid} > public String getFullPathName() { > // Get the full path name of this inode. 
> if (isRoot()) { > return Path.SEPARATOR; > } > // compute size of needed bytes for the path > int idx = 0; > for (INode inode = this; inode != null; inode = inode.getParent()) { > // add component + delimiter (if not tail component) > idx += inode.getLocalNameBytes().length + (inode != this ? 1 : 0); > } > byte[] path = new byte[idx]; > for (INode inode = this; inode != null; inode = inode.getParent()) { > if (inode != this) { > path[--idx] = Path.SEPARATOR_CHAR; > } > byte[] name = inode.getLocalNameBytes(); > idx -= name.length; > System.arraycopy(name, 0, path, idx, name.length); > } > return DFSUtil.bytes2String(path); > } > {code} > We found an ArrayIndexOutOfBoundsException at > _{color:#707070}System.arraycopy(name, 0, path, idx, name.length){color}_ > when ReplicaMonitor runs, and the NameNode will quit. > It seems the two loops are not synchronized: the path's length changes between them.
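To make the failure mode concrete, here is a minimal, self-contained model of the two-pass pattern in {{getFullPathName()}} — not the HDFS code itself; class and method names are illustrative. Pass 1 sizes the buffer and pass 2 copies into it, so any mutation of the component chain in the window between the passes (standing in for a concurrent rename) can make the copy overrun the buffer:

```java
// Minimal model of the two-pass pattern: pass 1 sizes the buffer,
// pass 2 copies; a mutation between the passes makes the copy run
// out of bounds, as in the HDFS-12832 stack trace.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TwoPassRaceSketch {
    /** Joins path components with '/', sizing first and copying second;
     *  {@code betweenPasses} runs in the unsynchronized window. */
    static String join(List<byte[]> parts, Runnable betweenPasses) {
        int idx = 0;
        for (byte[] p : parts) {          // pass 1: compute needed bytes
            idx += p.length;
        }
        idx += parts.size() - 1;          // separators between components
        byte[] path = new byte[idx];

        betweenPasses.run();              // the window a lock should close

        for (int i = parts.size() - 1; i >= 0; i--) {  // pass 2: copy right-to-left
            byte[] name = parts.get(i);
            idx -= name.length;
            System.arraycopy(name, 0, path, idx, name.length);
            if (i > 0) path[--idx] = '/';
        }
        return new String(path);
    }

    public static void main(String[] args) {
        List<byte[]> parts = new ArrayList<>(
            Arrays.asList("dir".getBytes(), "file".getBytes()));
        System.out.println(join(parts, () -> {}));  // quiescent tree: "dir/file"
        try {
            // A "rename" that grows a component between the two passes:
            join(parts, () -> parts.set(1, "renamed-file".getBytes()));
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy overran the buffer, as in HDFS-12832");
        }
    }
}
```

Holding the global lock across both passes (or snapshotting the components once) removes the window entirely, which is why the fix belongs in the caller rather than in {{INode}} itself.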
[jira] [Commented] (HDFS-12823) Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to branch-2.7
[ https://issues.apache.org/jira/browse/HDFS-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259393#comment-16259393 ] Erik Krogen commented on HDFS-12823: Thank you [~manojg] and [~zhz]! > Backport HDFS-9259 "Make SO_SNDBUF size configurable at DFSClient" to > branch-2.7 > > > Key: HDFS-12823 > URL: https://issues.apache.org/jira/browse/HDFS-12823 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, hdfs-client >Reporter: Erik Krogen >Assignee: Erik Krogen > Fix For: 2.7.5 > > Attachments: HDFS-12823-branch-2.7.000.patch, > HDFS-12823-branch-2.7.001.patch, HDFS-12823-branch-2.7.002.patch > > > Given the pretty significant performance implications of HDFS-9259 (see > discussion in HDFS-10326) when doing transfers across high latency links, it > would be helpful to have this configurability exist in the 2.7 series. > Opening a new JIRA since the original HDFS-9259 has been closed for a while > and there are conflicts due to a few classes moving.
[jira] [Updated] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-12807: Status: Patch Available (was: In Progress) > Ozone: Expose RockDB stats via JMX for Ozone metadata stores > > > Key: HDFS-12807 > URL: https://issues.apache.org/jira/browse/HDFS-12807 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Elek, Marton > Attachments: HDFS-12807-HDFS-7240.001.patch, > HDFS-12807-HDFS-7240.002.patch > > > RocksDB JNI has an option to expose stats, which can be further exposed to > graphs and monitoring applications. We should expose them through our RocksDB > metadata store implementation for troubleshooting metadata-related > performance issues.
[jira] [Updated] (HDFS-12807) Ozone: Expose RockDB stats via JMX for Ozone metadata stores
[ https://issues.apache.org/jira/browse/HDFS-12807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elek, Marton updated HDFS-12807: Attachment: HDFS-12807-HDFS-7240.002.patch Second version. Fixed the case when the conf is null (unit tests). > Ozone: Expose RockDB stats via JMX for Ozone metadata stores > > > Key: HDFS-12807 > URL: https://issues.apache.org/jira/browse/HDFS-12807 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Xiaoyu Yao >Assignee: Elek, Marton > Attachments: HDFS-12807-HDFS-7240.001.patch, > HDFS-12807-HDFS-7240.002.patch > > > RocksDB JNI has an option to expose stats, which can be further exposed to > graphs and monitoring applications. We should expose them through our RocksDB > metadata store implementation for troubleshooting metadata-related > performance issues.
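For reviewers wanting the rough shape of "stats via JMX": a stdlib-only sketch of publishing a counter snapshot as an MXBean. The RocksDB {{Statistics}} object is stubbed with fixed numbers here (rocksdbjni is not assumed on the classpath), and the {{ObjectName}} is illustrative, not the one the patch registers:

```java
// Stdlib-only sketch: expose a map of ticker counters over JMX.
// The values are stand-ins for org.rocksdb.Statistics ticker counts.
import java.lang.management.ManagementFactory;
import java.util.Map;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RocksDbStatsJmxSketch {
    /** The "MXBean" suffix lets the platform server convert the Map
     *  into open (TabularData) types readable by any JMX client. */
    public interface DbStatsMXBean {
        Map<String, Long> getTickers();
    }

    /** Stand-in for a snapshot of the DB's ticker counters. */
    public static class DbStats implements DbStatsMXBean {
        @Override
        public Map<String, Long> getTickers() {
            return Map.of("BYTES_WRITTEN", 1024L, "BYTES_READ", 2048L);
        }
    }

    /** Register the bean so jconsole/monitoring tools can read the counters. */
    public static ObjectName register() throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name =
            new ObjectName("Hadoop:service=OzoneMetadataStore,name=RocksDbStats");
        server.registerMBean(new DbStats(), name);
        return name;
    }
}
```

In the real patch the getter would read live counters from the store's statistics object rather than a constant map; the JMX plumbing is the same.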
[jira] [Commented] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259251#comment-16259251 ] Hadoop QA commented on HDFS-12754: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 15m 57s{color} | {color:red} Docker failed to build yetus/hadoop:5b98639. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HDFS-12754 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898486/HDFS-12754.008.patch | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/22142/console | | Powered by | Apache Yetus 0.7.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch, > HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch, > HDFS-12754.006.patch, HDFS-12754.007.patch, HDFS-12754.008.patch > > > The client and the renewer can hit a deadlock during the close operation since > closeFile() reaches back into DFSClient#removeFileBeingWritten. This is > possible if the client calls close while the renewer is renewing a lease.
[jira] [Updated] (HDFS-12754) Lease renewal can hit a deadlock
[ https://issues.apache.org/jira/browse/HDFS-12754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated HDFS-12754: --- Attachment: HDFS-12754.008.patch Attaching a patch that improves the test by coordinating the leaseRenewer and the dfsClient with a latch. Also changed the visibility of the grace default to private. > Lease renewal can hit a deadlock > - > > Key: HDFS-12754 > URL: https://issues.apache.org/jira/browse/HDFS-12754 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.8.1 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: HDFS-12754.001.patch, HDFS-12754.002.patch, > HDFS-12754.003.patch, HDFS-12754.004.patch, HDFS-12754.005.patch, > HDFS-12754.006.patch, HDFS-12754.007.patch, HDFS-12754.008.patch > > > The client and the renewer can hit a deadlock during the close operation since > closeFile() reaches back into DFSClient#removeFileBeingWritten. This is > possible if the client calls close while the renewer is renewing a lease.
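As a minimal model of the inversion described in this issue (not the actual DFSClient/LeaseRenewer code; class and lock names are illustrative): the closing thread holds the client's lock and then needs the renewer's, while the renewer holds its own lock and reaches back into the client. Latches force the bad interleaving deterministically, and {{tryLock}} stands in for the blocking acquire so the demo reports the wait-for cycle instead of actually hanging:

```java
// Two-lock model of the close()/renew() inversion. Latches guarantee both
// locks are held while each side probes the other, so the cycle is
// observed deterministically without a real hang.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LeaseDeadlockSketch {
    /** Returns true when each thread, holding its own lock, found the
     *  other lock taken: the wait-for cycle of the reported deadlock. */
    public static boolean demonstrate() throws InterruptedException {
        ReentrantLock clientLock = new ReentrantLock();   // models DFSClient state
        ReentrantLock renewerLock = new ReentrantLock();  // models LeaseRenewer state
        CountDownLatch bothHeld = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] blocked = new boolean[2];

        Thread closer = new Thread(() -> {
            clientLock.lock();                  // client begins close()
            try {
                bothHeld.countDown(); awaitQuietly(bothHeld);
                blocked[0] = !tryOnce(renewerLock);  // closeFile() needs the renewer
                bothTried.countDown(); awaitQuietly(bothTried);
            } finally { clientLock.unlock(); }
        });
        Thread renewer = new Thread(() -> {
            renewerLock.lock();                 // renewer begins a renewal pass
            try {
                bothHeld.countDown(); awaitQuietly(bothHeld);
                blocked[1] = !tryOnce(clientLock);   // removeFileBeingWritten()
                bothTried.countDown(); awaitQuietly(bothTried);
            } finally { renewerLock.unlock(); }
        });
        closer.start(); renewer.start();
        closer.join(); renewer.join();
        return blocked[0] && blocked[1];
    }

    private static boolean tryOnce(ReentrantLock lock) {
        boolean got = lock.tryLock();
        if (got) lock.unlock();
        return got;
    }
    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```

The fix direction the patch takes — coordinating the two sides so the re-entrant call happens outside the held lock — breaks exactly this cycle.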
[jira] [Commented] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259142#comment-16259142 ] Hadoop QA commented on HDFS-12740: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 17s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 3 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 30s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 49s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 42s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 44s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 46s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 13s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 34s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 59s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 9s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 36s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 34s{color} | {color:green} patch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 23s{color} | {color:green} hadoop-hdfs-client in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red}156m 29s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}219m 43s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.fs.TestUnbuffer | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.ozone.web.client.TestKeys | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.cblock.TestBufferManager | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.hdfs.server.balancer.TestBalancerRPCDelay | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes | | | hadoop.ozone.TestOzoneConfigurationFields | | Timed out junit tests |
[jira] [Commented] (HDFS-12787) Ozone: SCM: Aggregate the metrics from all the container reports
[ https://issues.apache.org/jira/browse/HDFS-12787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259058#comment-16259058 ] Hadoop QA commented on HDFS-12787: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 40s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 4 new or modified test files. {color} | || || || || {color:brown} HDFS-7240 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 19m 0s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 40s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 19s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 32s{color} | {color:green} branch has no errors when building and testing our client artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 11s{color} | {color:green} HDFS-7240 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s{color} | {color:green} HDFS-7240 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 31s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}141m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 24s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}214m 36s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:2 | | Failed junit tests | hadoop.ozone.ozShell.TestOzoneShell | | | hadoop.ozone.TestOzoneConfigurationFields | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.ozone.container.common.impl.TestContainerPersistence | | | hadoop.ozone.client.rpc.TestOzoneRpcClient | | | hadoop.cblock.TestCBlockReadWrite | | | hadoop.fs.TestUnbuffer | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting | | | hadoop.ozone.web.client.TestKeys | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.cblock.TestLocalBlockCache | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:d11161b | | JIRA Issue | HDFS-12787 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898422/HDFS-12787-HDFS-7240.006.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 5877da42727e 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | HDFS-7240 / 5dc4dfa | | maven | version:
[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258981#comment-16258981 ] Gang Xie commented on HDFS-12820: - Why don't we subtract nodesInService when the datanode completes decommission and becomes dead? Any considerations here? > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Comment Edited] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258974#comment-16258974 ] Gang Xie edited comment on HDFS-12820 at 11/20/17 9:11 AM: --- After carefully checking the issue reported in HDFS-9279, I found this issue is not its dup. This case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead, nodesInService will not be subtracted. Then during allocation the dead node is counted in the maxload calculation, which makes the maxload very low and, in turn, causes every allocation to fail. if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } was (Author: xiegang112): After carefully check the issue reported in HDFS-9279, and found this issue is not a its dup. this case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead. nodesInService will not be subtracted. Then when allocation, the dead node will be counted in the maxload, which makes the maxload very low, in turns, causes any allocation failing. 
if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Reopened] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gang Xie reopened HDFS-12820: - After carefully checking the issue reported in HDFS-9279, I found this issue is not its dup. This case is mainly about the param {color:#d04437}nodesInService{color}. When a datanode is decommissioned and then dead, nodesInService will not be subtracted. Then during allocation the dead node is counted in the maxload calculation, which makes the maxload very low and, in turn, causes every allocation to fail. if (considerLoad) { {color:#d04437} final double maxLoad = maxLoadRatio * stats.getInServiceXceiverAverage();{color} final int nodeLoad = node.getXceiverCount(); if (nodeLoad > maxLoad) { logNodeIsNotChosen(storage, "the node is too busy (load:"+nodeLoad+" > "+maxLoad+") "); stats.incrOverLoaded(); return false; } } > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. 
> private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
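The arithmetic of the failure can be sketched in a standalone model (hypothetical names, not the BlockPlacementPolicy code): when dead-but-still-counted decommissioned nodes inflate {{nodesInService}}, the computed average — and hence maxLoad — drops below the true per-node load, so every candidate node looks too busy:

```java
// Hypothetical model of the considerLoad check described in this issue:
// maxLoad = maxLoadRatio * (inServiceXceiverTotal / nodesInService).
public class MaxLoadSketch {
    static boolean tooBusy(int nodeXceivers, long inServiceXceiverTotal,
                           int nodesInService, double maxLoadRatio) {
        double avg = nodesInService == 0
            ? 0.0 : (double) inServiceXceiverTotal / nodesInService;
        return nodeXceivers > maxLoadRatio * avg;
    }

    public static void main(String[] args) {
        // 80 live nodes handling 2400 xceivers: true average is 30,
        // so a node at 30 is well under maxLoad (60) and is accepted.
        System.out.println(tooBusy(30, 2400, 80, 2.0));   // false
        // 100 dead-but-counted decommissioned nodes inflate the denominator
        // to 180: average ~13.3, maxLoad ~26.7, and the same node is rejected.
        System.out.println(tooBusy(30, 2400, 180, 2.0));  // true
    }
}
```

With numbers like the reporter's (180 counted nodes, 100 decommissioned), every normally loaded node exceeds the deflated maxLoad, matching the "all allocations fail" symptom.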
[jira] [Commented] (HDFS-12820) Decommissioned datanode is counted in service cause datanode allcating failure
[ https://issues.apache.org/jira/browse/HDFS-12820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258975#comment-16258975 ] Gang Xie commented on HDFS-12820: - And I believe this issue still exists in the latest version. > Decommissioned datanode is counted in service cause datanode allcating failure > -- > > Key: HDFS-12820 > URL: https://issues.apache.org/jira/browse/HDFS-12820 > Project: Hadoop HDFS > Issue Type: Bug > Components: block placement >Affects Versions: 2.4.0 >Reporter: Gang Xie > > When allocating a datanode for a dfsclient write with the load considered, the > placement policy checks whether a datanode is overloaded by calculating the > average xceiver count of all in-service datanodes. But if a datanode is > decommissioned and becomes dead, it is still treated as in service, which makes > the average load much lower than the real one, especially when the number of > decommissioned datanodes is large. In our cluster of 180 datanodes, 100 are > decommissioned, and the average load is 17. This fails all datanode allocations. > private void subtract(final DatanodeDescriptor node) { > capacityUsed -= node.getDfsUsed(); > blockPoolUsed -= node.getBlockPoolUsed(); > xceiverCount -= node.getXceiverCount(); > {color:red} if (!(node.isDecommissionInProgress() || > node.isDecommissioned())) {{color} > nodesInService--; > nodesInServiceXceiverCount -= node.getXceiverCount(); > capacityTotal -= node.getCapacity(); > capacityRemaining -= node.getRemaining(); > } else { > capacityTotal -= node.getDfsUsed(); > } > cacheCapacity -= node.getCacheCapacity(); > cacheUsed -= node.getCacheUsed(); > }
[jira] [Updated] (HDFS-12740) SCM should support a RPC to share the cluster Id with KSM and DataNodes
[ https://issues.apache.org/jira/browse/HDFS-12740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDFS-12740: --- Attachment: HDFS-12740-HDFS-7240.004.patch Rebased patch v3. > SCM should support a RPC to share the cluster Id with KSM and DataNodes > --- > > Key: HDFS-12740 > URL: https://issues.apache.org/jira/browse/HDFS-12740 > Project: Hadoop HDFS > Issue Type: Sub-task >Affects Versions: HDFS-7240 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee > Fix For: HDFS-7240 > > Attachments: HDFS-12740-HDFS-7240.001.patch, > HDFS-12740-HDFS-7240.002.patch, HDFS-12740-HDFS-7240.003.patch, > HDFS-12740-HDFS-7240.004.patch > > > When the ozone cluster is first created, the SCM --init command will generate > a cluster Id as well as an SCM Id and persist them locally. The same cluster Id > and SCM Id will be shared with KSM during KSM initialization and with > Datanodes during datanode registration.
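A rough sketch of the handshake shape described above (all names are illustrative, not the actual Ozone protocol classes): SCM persists its identifiers at {{--init}} time and serves them over an RPC that KSM and datanodes consult before registering, refusing to join a mismatched cluster:

```java
// Illustrative model of the cluster-id sharing described in this issue;
// not the actual Ozone RPC, and every name here is hypothetical.
import java.util.Objects;
import java.util.UUID;

public class ClusterIdHandshakeSketch {
    /** Immutable identifiers generated once by "scm --init" and persisted. */
    public static final class ScmInfo {
        private final String clusterId;
        private final String scmId;
        public ScmInfo(String clusterId, String scmId) {
            this.clusterId = Objects.requireNonNull(clusterId);
            this.scmId = Objects.requireNonNull(scmId);
        }
        public String getClusterId() { return clusterId; }
        public String getScmId() { return scmId; }
    }

    /** The RPC surface KSM and datanodes would call before registering. */
    public interface ScmInfoRpc {
        ScmInfo getScmInfo();
    }

    /** SCM side: generate at init, then hand out the same values forever. */
    public static final class Scm implements ScmInfoRpc {
        private final ScmInfo info = new ScmInfo(
            "CID-" + UUID.randomUUID(), "SCM-" + UUID.randomUUID());
        @Override public ScmInfo getScmInfo() { return info; }
    }

    /** Client side: adopt the id on first contact, otherwise require a match. */
    public static boolean matchesPersisted(ScmInfo fromScm, String persistedClusterId) {
        return persistedClusterId == null
            || persistedClusterId.equals(fromScm.getClusterId());
    }
}
```

The key property the real RPC provides is the same: the identifiers are generated exactly once, and every later registration validates against them instead of regenerating.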