[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845193#comment-13845193 ]

Hudson commented on HDFS-5580:
------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #4863 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4863/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java

Infinite loop in Balancer.waitForMoveCompletion
-----------------------------------------------

Key: HDFS-5580
URL: https://issues.apache.org/jira/browse/HDFS-5580
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log

In a recent [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/] in HDFS-5574, TestBalancerWithNodeGroup timed out; this is also mentioned in HDFS-4376 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402]. It looks like the bug was introduced by HDFS-3495.

--
This message was sent by Atlassian JIRA (v6.1.4#6159)
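The failure mode named in the title can be sketched in miniature. The following is a hedged toy model, not the actual Balancer code: the class and member names (MoveWaitSketch, Target, hasPendingMoves) are invented for illustration. It shows how a poll-until-drained loop spins forever when per-target pending state is never cleared; the sketch bounds the loop so it terminates, where the real bug does not.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveWaitSketch {
    // Hypothetical stand-in for a balancer target datanode.
    static class Target {
        boolean hasPendingMoves;
        Target(boolean pending) { hasPendingMoves = pending; }
    }

    /**
     * Bounded variant of a waitForMoveCompletion-style loop: returns true
     * once no target has pending moves. The real loop has no poll bound;
     * if nothing ever clears hasPendingMoves, it spins forever.
     */
    static boolean waitForMoveCompletion(List<Target> targets, int maxPolls) {
        for (int poll = 0; poll < maxPolls; poll++) {
            boolean allDone = true;
            for (Target t : targets) {
                if (t.hasPendingMoves) { allDone = false; break; }
            }
            if (allDone) return true;
            // Real code sleeps and re-polls here.
        }
        return false; // the sketch gives up; the reported bug loops on
    }

    public static void main(String[] args) {
        List<Target> stuck = new ArrayList<>();
        stuck.add(new Target(true));  // pending flag never cleared
        System.out.println(waitForMoveCompletion(stuck, 100)); // false: would spin
        System.out.println(waitForMoveCompletion(new ArrayList<Target>(), 100)); // true
    }
}
```

The essential property is that loop exit depends entirely on external state changing; any code path that schedules no moves yet leaves the flag set makes the wait unreachable.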
[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845204#comment-13845204 ]

Junping Du commented on HDFS-5580:
----------------------------------

+1. I have committed this to trunk and branch-2. Thanks Binglin!

Infinite loop in Balancer.waitForMoveCompletion
-----------------------------------------------

Key: HDFS-5580
URL: https://issues.apache.org/jira/browse/HDFS-5580
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Fix For: 2.4.0
Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log
[jira] [Updated] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-5580:
-----------------------------

Resolution: Fixed
Fix Version/s: 2.4.0
Target Version/s: 2.4.0
Status: Resolved (was: Patch Available)

Infinite loop in Balancer.waitForMoveCompletion
-----------------------------------------------

Key: HDFS-5580
URL: https://issues.apache.org/jira/browse/HDFS-5580
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Fix For: 2.4.0
Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log
[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845218#comment-13845218 ]

Liang Xie commented on HDFS-4273:
---------------------------------

{code}
-> seekToNewSource() add currentNode to deadNodes, wishing to get a different datanode
  -> blockSeekTo()
    -> chooseDataNode()
      -> block missing, clear deadNodes and pick the currentNode again
seekToNewSource() return false
{code}

I checked the codebase; it shows:
{code}
private synchronized boolean seekToBlockSource(long targetPos)
    throws IOException {
  currentNode = blockSeekTo(targetPos);
  return true;
}
{code}
It cannot return false; it seems the original description is stale?

Problem in DFSInputStream read retry logic may cause early failure
------------------------------------------------------------------

Key: HDFS-4273
URL: https://issues.apache.org/jira/browse/HDFS-4273
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java

Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
    -> reader.doRead()
    -> seekToNewSource() add currentNode to deadNodes, wishing to get a different datanode
      -> blockSeekTo()
        -> chooseDataNode()
          -> block missing, clear deadNodes and pick the currentNode again
    seekToNewSource() return false
  readBuffer() re-throw the exception
  quit loop
readWithStrategy() got the exception, and may fail the read call before having tried MaxBlockAcquireFailures times.
{noformat}
Some issues with this logic:
1. The seekToNewSource() logic is broken because it may clear deadNodes in the middle.
2. The variable int retries=2 in readWithStrategy seems to conflict with MaxBlockAcquireFailures; should it be removed?
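The call flow above can be modeled in a few lines. This is a hypothetical simplification, not the real DFSInputStream: the method names mirror the trace, but the bodies are invented for illustration. It shows why clearing the dead-node set in the middle of node selection makes "seek to a new source" re-pick the node that just failed.

```java
import java.util.HashSet;
import java.util.Set;

public class RetrySketch {
    /** Pick any replica not known to be dead; on exhaustion, forget all failures. */
    static String chooseDataNode(Set<String> replicas, Set<String> deadNodes) {
        for (String r : replicas) {
            if (!deadNodes.contains(r)) return r;
        }
        deadNodes.clear();                // the problematic step: failures forgotten
        return replicas.iterator().next();
    }

    /** Mimics seekToNewSource: true only if a genuinely different node was found. */
    static boolean seekToNewSource(String currentNode, Set<String> replicas,
                                   Set<String> deadNodes) {
        deadNodes.add(currentNode);       // we just failed reading from currentNode
        String next = chooseDataNode(replicas, deadNodes);
        return !next.equals(currentNode);
    }

    public static void main(String[] args) {
        Set<String> replicas = new HashSet<>();
        replicas.add("dn1");              // single-replica block
        Set<String> dead = new HashSet<>();
        // dn1 failed; we ask for a new source but get dn1 back again:
        System.out.println(seekToNewSource("dn1", replicas, dead)); // false
    }
}
```

In the single-replica case the chooser clears deadNodes and hands back the failed node, so the caller sees "no new source" and the accumulated failure history is gone, which is the first issue listed in the description.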
[jira] [Commented] (HDFS-4874) create with OVERWRITE deletes existing file without checking the lease: feature or a bug.
[ https://issues.apache.org/jira/browse/HDFS-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845244#comment-13845244 ]

amol khatri commented on HDFS-4874:
-----------------------------------

What will happen in the case of 2 clients trying to create a file on the same path?

create with OVERWRITE deletes existing file without checking the lease: feature or a bug.
-----------------------------------------------------------------------------------------

Key: HDFS-4874
URL: https://issues.apache.org/jira/browse/HDFS-4874
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 2.0.4-alpha
Reporter: Konstantin Shvachko

create with the OVERWRITE flag will remove a file under construction even if the issuing client does not hold a lease on the file. It could be a bug, or a feature that applications rely upon.
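The behavior under discussion can be illustrated with a toy namespace. This is an assumption-laden sketch, not NameNode code: the class, map, and method names are invented. It shows the consequence for the two-client question above: when create with OVERWRITE skips the lease check, the second client silently evicts the first writer's in-progress file.

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteSketch {
    // path -> client currently holding the write lease (toy model)
    static Map<String, String> leases = new HashMap<>();

    static void create(String path, String client, boolean overwrite) {
        String holder = leases.get(path);
        if (holder != null && !overwrite) {
            throw new IllegalStateException(path + " already exists");
        }
        // No lease check on overwrite: the holder's under-construction
        // file is dropped regardless of who owns it.
        leases.put(path, client);
    }

    public static void main(String[] args) {
        create("/data/part-0", "clientA", false); // clientA starts writing
        create("/data/part-0", "clientB", true);  // clientB overwrites; clientA loses the file
        System.out.println(leases.get("/data/part-0"));
    }
}
```

Without OVERWRITE the second create fails with "already exists"; with it, last writer wins and the first client only discovers the loss when its next write or close is rejected.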
[jira] [Resolved] (HDFS-5646) Exceptions during HDFS failover
[ https://issues.apache.org/jira/browse/HDFS-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HDFS-5646.
----------------------------------

Resolution: Fixed

Hi, I'm afraid you are going to have to take this up with Cloudera; if there is a problem in the Hadoop codebase then they can escalate it over here. Closing as invalid per policy [http://wiki.apache.org/hadoop/InvalidJiraIssues]

Exceptions during HDFS failover
-------------------------------

Key: HDFS-5646
URL: https://issues.apache.org/jira/browse/HDFS-5646
Project: Hadoop HDFS
Issue Type: Bug
Components: ha
Reporter: Nikhil Mulley

Hi, in our HDFS HA setup I see the following exceptions when I try to fail back. I have an automatic failover mechanism enabled. Although the failback operation succeeds, the exceptions and the return status of 255 worry me (because I cannot script around this if I needed to). Please let me know if this is anything that is known and easily resolvable. I am using Cloudera Hadoop 4.4.0, if that helps. Please let me know if I need to open this ticket in the CDH JIRA instead. Thanks.

sudo -u hdfs hdfs haadmin -failover nn2 nn1
Operation failed: Unable to become active. Service became unhealthy while trying to failover.
	at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:652)
	at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:58)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:591)
	at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:588)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:588)
	at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
	at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
	at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1351)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment
[ https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845300#comment-13845300 ]

Hudson commented on HDFS-5074:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move entry for HDFS-5074 to correct section. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550027)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a segment. Contributed by Todd Lipcon. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550016)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java

Allow starting up from an fsimage checkpoint in the middle of a segment
-----------------------------------------------------------------------

Key: HDFS-5074
URL:
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845289#comment-13845289 ]

Hudson commented on HDFS-5283:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshold
-------------------------------------------------------------------------------------------------------------------------------

Key: HDFS-5283
URL: https://issues.apache.org/jira/browse/HDFS-5283
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Priority: Critical
Fix For: 2.3.0
Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch

This was observed in one of our environments:
1. An MR job was running, which had created some temporary files and was writing to them.
2. A snapshot was taken.
3. The job was killed and the temporary files were deleted.
4. The Namenode was restarted.
5. After the restart, the Namenode was in safemode waiting for blocks.

Analysis:
1. The snapshot taken also includes the temporary files that were open, and the original files were deleted later.
2. The under-construction block count was taken from leases, which does not account for UC blocks that exist only inside snapshots.
3. So the safemode threshold count was too high and the NN did not come out of safemode.
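The safemode arithmetic in the analysis can be sketched with assumed, simplified numbers (this is not the real FSNamesystem code; the method name and figures are invented for illustration). The exit check compares reported blocks against an expected total; if that total wrongly includes under-construction blocks reachable only through a snapshot, the blocks it waits for never arrive and the threshold is never met.

```java
public class SafemodeSketch {
    /** Toy safemode exit check: enough of the expected blocks have been reported. */
    static boolean canLeaveSafemode(long reported, long expectedTotal,
                                    double threshold) {
        return reported >= (long) Math.ceil(expectedTotal * threshold);
    }

    public static void main(String[] args) {
        long completeBlocks = 1000;     // what datanodes will actually report
        long snapshotOnlyUcBlocks = 5;  // should have been excluded from the total
        double threshold = 0.999;

        // Buggy total: the NN waits for 5 blocks that will never be reported.
        System.out.println(canLeaveSafemode(completeBlocks,
                completeBlocks + snapshotOnlyUcBlocks, threshold)); // false
        // Corrected total: safemode can be exited.
        System.out.println(canLeaveSafemode(completeBlocks,
                completeBlocks, threshold)); // true
    }
}
```

Even a handful of phantom blocks is enough, because the default threshold is close to 1.0; the same arithmetic explains why HDFS-5504 and HDFS-5428 below also end in a stuck safemode.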
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845293#comment-13845293 ]

Hudson commented on HDFS-5504:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
------------------------------------------------------------------------------------------------

Key: HDFS-5504
URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch

1. HA installation; the standby NN is down.
2. Delete snapshot is called; it deletes the blocks from the blocksMap and all datanodes, and a log sync also happens.
3. Before the next log roll, the NN crashes.
4. When the namenode restarts, it loads the fsimage and finalized edits from shared storage and sets the safemode threshold, which includes blocks from the deleted snapshot as well (because those edits have not yet been read: the namenode restarted before the last edits segment was finalized).
5. When it becomes active, it finalizes the edits and reads the delete snapshot edit op, but at this point it does not reduce the safemode count, so it stays in safemode.
6. On the next restart, as the edits are already finalized, it reads them at startup and sets the safemode threshold correctly. So one more restart brings the NN out of safemode.
[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart
[ https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845296#comment-13845296 ]

Hudson commented on HDFS-5425:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Renaming underconstruction file with snapshots can make NN failure on restart
-----------------------------------------------------------------------------

Key: HDFS-5425
URL: https://issues.apache.org/jira/browse/HDFS-5425
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode, snapshots
Affects Versions: 2.2.0
Reporter: sathish
Assignee: Jing Zhao
Fix For: 2.3.0
Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, HDFS-5425.patch

I faced this while doing some snapshot operations like createSnapshot and renameSnapshot: when I restarted my NN, it shut down with this exception:

2013-10-24 21:07:03,040 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.IllegalStateException
	at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:670)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:655)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845295#comment-13845295 ]

Hudson commented on HDFS-5476:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
------------------------------------------------------------------------------------------

Key: HDFS-5476
URL: https://issues.apache.org/jira/browse/HDFS-5476
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Fix For: 2.3.0
Attachments: HDFS-5476.001.patch

Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree under the DstReference node for file/directory/snapshot deletion.

Use case 1:
# rename an under-construction file with 0-sized blocks after taking a snapshot.
# delete the renamed directory.
We need to make sure we delete the 0-sized block.

Use case 2:
# create snapshot s0 for /
# create a new file under /foo/bar/
# rename foo -> foo2
# create snapshot s1
# delete bar and foo2
# delete snapshot s1
We need to make sure we delete the file under /foo/bar since it is not included in snapshot s0.
[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845291#comment-13845291 ]

Hudson commented on HDFS-5580:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java

Infinite loop in Balancer.waitForMoveCompletion
-----------------------------------------------

Key: HDFS-5580
URL: https://issues.apache.org/jira/browse/HDFS-5580
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Fix For: 2.4.0
Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845292#comment-13845292 ]

Hudson commented on HDFS-5428:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
----------------------------------------------------------------------------------------

Key: HDFS-5428
URL: https://issues.apache.org/jira/browse/HDFS-5428
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 2.2.0
Reporter: Vinay
Assignee: Jing Zhao
Fix For: 2.3.0
Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.004.patch, HDFS-5428.patch

1. Allow snapshots under dir /foo
2. Create a file /foo/test/bar and start writing to it
3. Create a snapshot s1 under /foo after a block has been allocated and some data has been written to it
4. Delete the directory /foo/test
5. Wait for a checkpoint, or do saveNamespace
6. Restart the NN. The NN enters safemode.

Analysis:
Snapshot nodes loaded from the fsimage are always complete, and all their blocks will be in COMPLETE state. So when the Datanodes report RBW blocks, those will not be updated in the blocksMap, and some of the FINALIZED blocks will be marked as corrupt due to length mismatch.
[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE
[ https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845298#comment-13845298 ]

Hudson commented on HDFS-5257:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

addBlock() retry should return LocatedBlock with locations else client will get AIOBE
-------------------------------------------------------------------------------------

Key: HDFS-5257
URL: https://issues.apache.org/jira/browse/HDFS-5257
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs-client, namenode
Affects Versions: 2.1.1-beta
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch

An {{addBlock()}} retry should return the LocatedBlock with locations if the block was created by a previous call and a failover/restart of the namenode happened; otherwise the client will get an {{ArrayIndexOutOfBoundsException}} while creating the block and the write will fail.
{noformat}
java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511)
{noformat}
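The exception in the description follows mechanically from an empty locations array. This is a hedged sketch, not the real hdfs-client: LocatedBlock here is a hypothetical stand-in class, and firstTarget is an invented name for "pick the first datanode of the pipeline". It shows that indexing locations[0] on a retried addBlock() response with no locations throws exactly the AIOBE: 0 seen in the stack trace.

```java
public class AddBlockRetrySketch {
    // Hypothetical stand-in for the block-with-locations a client receives.
    static class LocatedBlock {
        final String[] locations;
        LocatedBlock(String... locations) { this.locations = locations; }
    }

    /** Mimics the client picking the first datanode to open a pipeline to. */
    static String firstTarget(LocatedBlock lb) {
        return lb.locations[0]; // AIOBE: 0 when the retry returned no locations
    }

    public static void main(String[] args) {
        try {
            firstTarget(new LocatedBlock()); // retried addBlock lost the locations
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOBE, matching the reported stack trace");
        }
        System.out.println(firstTarget(new LocatedBlock("dn1"))); // normal case
    }
}
```

The fix direction the title suggests is on the server side: the retried call should return the previously created block together with its locations, so the client never sees the empty array.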
[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845297#comment-13845297 ]

Hudson commented on HDFS-5443:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Delete 0-sized block when deleting an under-construction file that is included in snapshot
------------------------------------------------------------------------------------------

Key: HDFS-5443
URL: https://issues.apache.org/jira/browse/HDFS-5443
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Uma Maheswara Rao G
Assignee: Jing Zhao
Fix For: 2.3.0
Attachments: 5443-test.patch, HDFS-5443.000.patch

The Namenode can get stuck in safemode on restart if it crashes just after the addBlock logsync, when a snapshot was taken of such a file. This issue was reported by Prakash and Sathish. On looking into the issue, the following things are happening:
1. The client added a block at the NN, which just did a logsync, so the NN has the block ID persisted.
2. Before the addBlock response is returned to the client, a snapshot is taken of the root or a parent directory of that file.
3. The parent directory of that file is deleted.
4. The NN crashes without responding success to the client for that addBlock call.
Now on restart, the Namenode will be stuck in safemode.
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845290#comment-13845290 ]

Hudson commented on HDFS-5427:
------------------------------

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257, HDFS-5427, HDFS-5443, HDFS-5476, HDFS-5425, HDFS-5474, HDFS-5504, HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
-------------------------------------------------------------------------------------------------------------

Key: HDFS-5427
URL: https://issues.apache.org/jira/browse/HDFS-5427
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Affects Versions: 3.0.0, 2.2.0
Reporter: Vinay
Assignee: Vinay
Fix For: 2.3.0
Attachments: HDFS-5427-v2.patch, HDFS-5427.patch, HDFS-5427.patch

1. Allow snapshots under dir /foo
2. Create a file /foo/bar
3. Create a snapshot s1 under /foo
4. Delete the file /foo/bar
5. Wait for a checkpoint, or do saveNamespace
6. Restart the NN.
7. Now try to read the file from the snapshot /foo/.snapshot/s1/bar; the client will get a BlockMissingException.

The reason is that while loading the deleted-file list for a snapshottable dir from the fsimage, the blocks were not updated in the blocksMap.
[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.
[ https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845294#comment-13845294 ]

Hudson commented on HDFS-5474:

FAILURE: Integrated in Hadoop-Yarn-trunk #418 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/418/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Deletesnapshot can make Namenode in safemode on NN restarts.

Key: HDFS-5474  URL: https://issues.apache.org/jira/browse/HDFS-5474
Project: Hadoop HDFS  Issue Type: Bug  Components: snapshots
Reporter: Uma Maheswara Rao G  Assignee: sathish  Fix For: 2.3.0
Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch

When we deleteSnapshot, we delete the blocks associated with that snapshot and only afterwards do the log sync of the deleteSnapshot edit. There is a window where the blocks have been removed from the blocksMap but the log sync has not yet happened: if a block report arrives in that window, the NN may find that a block does not exist in the blocksMap and invalidate it, and the invalidation info can also go out as part of a heartbeat. If the Namenode then shuts down before actually doing the log sync, on restart it will still have those snapshot inodes and will expect the blocks to be reported by the DNs. The simple solution is to move the block removal to after the log sync, similar to the delete op.
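The "simple solution" proposed above is a classic write-ahead ordering constraint: the durable edit-log record must be synced before the irreversible side effect. A minimal sketch of the two orderings, with hypothetical method names rather than the real FSNamesystem code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ordering sketch for HDFS-5474 (hypothetical names):
// the OP_DELETE_SNAPSHOT edit must be synced *before* the snapshot's
// blocks are removed, so a crash between the two steps cannot leave
// the edit log unaware of a deletion whose blocks are already gone.
public class DeleteSnapshotOrdering {
    final List<String> journal = new ArrayList<>(); // records side-effect order

    void logSync()      { journal.add("logSync(OP_DELETE_SNAPSHOT)"); }
    void removeBlocks() { journal.add("removeBlocks"); }

    // Buggy order: blocks vanish before the operation is durable.
    void deleteSnapshotBuggy() { removeBlocks(); logSync(); }

    // Fixed order, matching the suggestion above (same as the delete op).
    void deleteSnapshotFixed() { logSync(); removeBlocks(); }
}
```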
[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845308#comment-13845308 ]

Binglin Chang commented on HDFS-4273:

seekToNewSource, not seekToBlockSource

Problem in DFSInputStream read retry logic may cause early failure

Key: HDFS-4273  URL: https://issues.apache.org/jira/browse/HDFS-4273
Project: Hadoop HDFS  Issue Type: Bug  Affects Versions: 2.0.2-alpha
Reporter: Binglin Chang  Assignee: Binglin Chang  Priority: Minor
Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java

Assume the following call logic:
{noformat}
readWithStrategy()
  -> blockSeekTo()
  -> readBuffer()
       -> reader.doRead()
       -> seekToNewSource()        // adds currentNode to deadNodes, hoping to get a different datanode
            -> blockSeekTo()
                 -> chooseDataNode()   // block missing: clears deadNodes and picks the currentNode again
       seekToNewSource() returns false
       readBuffer() re-throws the exception
  quit loop
readWithStrategy() gets the exception, and may fail the read call before MaxBlockAcquireFailures retries have been tried.
{noformat}
Some issues with this logic:
1. The seekToNewSource() logic is broken, because chooseDataNode() may clear deadNodes in the middle of it.
2. The variable int retries=2 in readWithStrategy() seems to conflict with MaxBlockAcquireFailures; should it be removed?
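The flaw in the call logic above can be reproduced with a toy model: marking the current node dead and then asking for a replacement can hand back the very same node once the dead-node set is cleared. This is a hedged sketch with hypothetical names, not the DFSInputStream internals:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy reproduction of the HDFS-4273 retry flaw: seekToNewSource adds
// the current node to deadNodes and asks for a different replica, but
// chooseDataNode clears deadNodes when every replica looks dead, so it
// can return the node that was just marked dead -- making the "new
// source" search fail immediately instead of using the retry budget.
public class RetrySketch {
    final Set<String> deadNodes = new HashSet<>();
    final List<String> replicas;
    String currentNode;

    RetrySketch(List<String> replicas) { this.replicas = replicas; }

    String chooseDataNode() {
        for (String n : replicas)
            if (!deadNodes.contains(n)) return n;
        deadNodes.clear();       // "block missing": reset and retry...
        return replicas.get(0);  // ...possibly the node we just gave up on
    }

    // Returns true only if a genuinely different source was found.
    boolean seekToNewSource() {
        deadNodes.add(currentNode);
        String next = chooseDataNode();
        boolean different = !next.equals(currentNode);
        currentNode = next;
        return different;
    }
}
```

With a single replica, seekToNewSource() returns false on the first attempt and the read fails early, mirroring the report above.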
[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE
[ https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845394#comment-13845394 ]

Hudson commented on HDFS-5257:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

addBlock() retry should return LocatedBlock with locations else client will get AIOBE

Key: HDFS-5257  URL: https://issues.apache.org/jira/browse/HDFS-5257
Project: Hadoop HDFS  Issue Type: Bug  Components: hdfs-client, namenode  Affects Versions: 2.1.1-beta
Reporter: Vinay  Assignee: Vinay  Fix For: 2.3.0
Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch

A retried {{addBlock()}} call should return the LocatedBlock with locations if the block was already created by a previous call and a failover/restart of the namenode happened in between; otherwise the client gets an {{ArrayIndexOutOfBoundsException}} while creating the block, and the write fails.
{noformat}
java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511)
{noformat}
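The AIOBE quoted above comes from indexing the first target of a LocatedBlock whose locations array is empty. A minimal illustration of that failure mode and a defensive check — hypothetical names, not the actual DFSOutputStream internals:

```java
// Sketch of the HDFS-5257 client-side failure: if a retried
// addBlock() hands back a LocatedBlock with zero locations, indexing
// locations[0] throws ArrayIndexOutOfBoundsException and the write
// fails, instead of surfacing a meaningful condition.
public class AddBlockRetrySketch {
    // What the streamer effectively does: grab the first target.
    static String firstTargetUnchecked(String[] locations) {
        return locations[0]; // AIOBE when the retry returned no locations
    }

    // Defensive variant: fail with a descriptive state error instead.
    static String firstTargetChecked(String[] locations) {
        if (locations.length == 0) {
            throw new IllegalStateException(
                "addBlock retry returned no locations; the namenode "
                + "should return the previously allocated targets");
        }
        return locations[0];
    }
}
```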
[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart
[ https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845392#comment-13845392 ]

Hudson commented on HDFS-5425:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Renaming underconstruction file with snapshots can make NN failure on restart

Key: HDFS-5425  URL: https://issues.apache.org/jira/browse/HDFS-5425
Project: Hadoop HDFS  Issue Type: Bug  Components: namenode, snapshots  Affects Versions: 2.2.0
Reporter: sathish  Assignee: Jing Zhao  Fix For: 2.3.0
Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, HDFS-5425.patch

I hit this while doing some snapshot operations, like createSnapshot and renameSnapshot. When I restarted my NN, it shut down with this exception:

2013-10-24 21:07:03,040 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.IllegalStateException
	at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376)
	at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855)
	at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:670)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:655)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311)
2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845386#comment-13845386 ]

Hudson commented on HDFS-5427:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845387#comment-13845387 ]

Hudson commented on HDFS-5580:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550074)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
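The failure pattern behind the commit above — a wait loop that only exits when every pending move reports completion — hangs forever if some source can never finish. A hedged sketch of that pattern and a bounded variant; the names are hypothetical and this is not the actual Balancer code:

```java
// Sketch of the HDFS-5580 pattern: waiting on per-source completion
// flags without any other exit condition spins indefinitely when a
// move can never complete. Bounding the wait turns the hang into a
// reportable timeout.
public class WaitForMoveSketch {
    interface Source { boolean isMoveDone(); }

    // Returns true if all moves completed before the deadline,
    // false if we gave up -- instead of looping forever.
    static boolean waitForMoveCompletion(Source[] sources, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            boolean allDone = true;
            for (Source s : sources) {
                if (!s.isMoveDone()) { allDone = false; break; }
            }
            if (allDone) return true;
            if (System.currentTimeMillis() >= deadline) return false;
            try {
                Thread.sleep(10); // poll interval between completion checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```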
[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845391#comment-13845391 ]

Hudson commented on HDFS-5476:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion

Key: HDFS-5476  URL: https://issues.apache.org/jira/browse/HDFS-5476
Project: Hadoop HDFS  Issue Type: Bug  Affects Versions: 2.2.0
Reporter: Jing Zhao  Assignee: Jing Zhao  Fix For: 2.3.0
Attachments: HDFS-5476.001.patch

Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree under the DstReference node for file/directory/snapshot deletion.
Use case 1:
# Rename an under-construction file with 0-sized blocks after a snapshot.
# Delete the renamed directory.
We need to make sure we delete the 0-sized block.
Use case 2:
# Create snapshot s0 for /.
# Create a new file under /foo/bar/.
# Rename foo -> foo2.
# Create snapshot s1.
# Delete bar and foo2.
# Delete snapshot s1.
We need to make sure we delete the file under /foo/bar, since it is not included in snapshot s0.
[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845393#comment-13845393 ]

Hudson commented on HDFS-5443:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment
[ https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845396#comment-13845396 ]

Hudson commented on HDFS-5074:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move entry for HDFS-5074 to correct section. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550027)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a segment. Contributed by Todd Lipcon. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550016)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java

Allow starting up from an fsimage checkpoint in the middle of a segment

Key: HDFS-5074  URL:
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845389#comment-13845389 ]

Hudson commented on HDFS-5504:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.

Key: HDFS-5504  URL: https://issues.apache.org/jira/browse/HDFS-5504
Project: Hadoop HDFS  Issue Type: Bug  Components: snapshots  Affects Versions: 2.2.0
Reporter: Vinay  Assignee: Vinay  Fix For: 2.3.0
Attachments: HDFS-5504.patch, HDFS-5504.patch

1. HA installation; the standby NN is down.
2. deleteSnapshot is called: it deletes the blocks from the blocksMap and from all datanodes, and the log sync also happens.
3. Before the next log roll, the NN crashes.
4. When the namenode restarts, it loads the fsimage and the finalized edits from shared storage and sets the safemode threshold, which still includes the blocks from the deleted snapshot (those edits have not been read yet, because the namenode was restarted before the last edits segment was finalized).
5. When it becomes active, it finalizes the edits and reads the deleteSnapshot edit op, but at this point it does not reduce the safemode count, so it stays in safemode.
6. On the next restart, since the edits are already finalized, it reads them at startup and sets the safemode threshold correctly; so one more restart brings the NN out of safemode.
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845385#comment-13845385 ]

Hudson commented on HDFS-5283:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold

Key: HDFS-5283  URL: https://issues.apache.org/jira/browse/HDFS-5283
Project: Hadoop HDFS  Issue Type: Bug  Components: snapshots  Affects Versions: 3.0.0, 2.1.1-beta
Reporter: Vinay  Assignee: Vinay  Priority: Critical  Fix For: 2.3.0
Attachments: HDFS-5283.000.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch, HDFS-5283.patch

This was observed in one of our environments:
1. An MR job was running which had created some temporary files and was writing to them.
2. A snapshot was taken.
3. The job was killed and the temporary files were deleted.
4. The Namenode was restarted.
5. After restart, the Namenode stayed in safemode waiting for blocks.
Analysis:
1. The snapshot includes the temporary files, which were open when it was taken; the original files were deleted later.
2. The under-construction block count was taken from leases, which does not cover UC blocks that exist only inside snapshots.
3. So the safemode threshold count was too high, and the NN did not come out of safemode.
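The accounting problem in the analysis above can be modeled in a few lines: the safemode threshold should count only blocks that datanodes can actually report, but under-construction blocks whose only remaining reference is a snapshot will never be reported complete. This is a hypothetical sketch of the bookkeeping, not the real FSNamesystem logic:

```java
// Toy model of the HDFS-5283 accounting bug. "snapshotOnly" marks a
// UC block whose live file was deleted and which survives only inside
// a snapshot; no lease covers it, so the buggy count still expects it.
public class SafemodeThresholdSketch {
    static class Block {
        final boolean underConstruction;
        final boolean snapshotOnly;
        Block(boolean uc, boolean snapOnly) {
            underConstruction = uc;
            snapshotOnly = snapOnly;
        }
    }

    // Buggy accounting: only lease-held UC blocks are excluded, so
    // snapshot-only UC blocks are still awaited and the NN never
    // reaches its threshold.
    static int thresholdBuggy(Block[] blocks) {
        int expected = 0;
        for (Block b : blocks)
            if (!b.underConstruction || b.snapshotOnly) expected++;
        return expected;
    }

    // Fixed accounting: every UC block is excluded, including those
    // reachable only through snapshots.
    static int thresholdFixed(Block[] blocks) {
        int expected = 0;
        for (Block b : blocks)
            if (!b.underConstruction) expected++;
        return expected;
    }
}
```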
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845388#comment-13845388 ]

Hudson commented on HDFS-5428:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode

Key: HDFS-5428  URL: https://issues.apache.org/jira/browse/HDFS-5428
Project: Hadoop HDFS  Issue Type: Bug  Components: snapshots  Affects Versions: 2.2.0
Reporter: Vinay  Assignee: Jing Zhao  Fix For: 2.3.0
Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.004.patch, HDFS-5428.patch

1. Allow snapshots under dir /foo.
2. Create a file /foo/test/bar and start writing to it.
3. Create a snapshot s1 under /foo after a block is allocated and some data has been written to it.
4. Delete the directory /foo/test.
5. Wait till a checkpoint, or do saveNamespace.
6. Restart the NN. The NN enters safemode.
Analysis: snapshot inodes loaded from the fsimage are always complete, and all their blocks are in COMPLETE state. So when the Datanode reports RBW blocks, those are not updated in the blocksMap, and some of the FINALIZED blocks get marked as corrupt due to length mismatch.
[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.
[ https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845390#comment-13845390 ]

Hudson commented on HDFS-5474:

SUCCESS: Integrated in Hadoop-Hdfs-trunk #1609 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1609/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5504) In HA mode, OP_DELETE_SNAPSHOT is not decrementing the safemode threshold, leads to NN safemode.
[ https://issues.apache.org/jira/browse/HDFS-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845444#comment-13845444 ]

Hudson commented on HDFS-5504:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5427) not able to read deleted files from snapshot directly under snapshottable dir after checkpoint and NN restart
[ https://issues.apache.org/jira/browse/HDFS-5427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845441#comment-13845441 ]

Hudson commented on HDFS-5427:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5283) NN not coming out of startup safemode due to under construction blocks only inside snapshots also counted in safemode threshhold
[ https://issues.apache.org/jira/browse/HDFS-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845440#comment-13845440 ]

Hudson commented on HDFS-5283:

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/])
Move HDFS-5283 to section branch-2.3.0 (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550032)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-5257) addBlock() retry should return LocatedBlock with locations else client will get AIOBE
[ https://issues.apache.org/jira/browse/HDFS-5257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845449#comment-13845449 ] Hudson commented on HDFS-5257: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt addBlock() retry should return LocatedBlock with locations else client will get AIOBE - Key: HDFS-5257 URL: https://issues.apache.org/jira/browse/HDFS-5257 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client, namenode Affects Versions: 2.1.1-beta Reporter: Vinay Assignee: Vinay Fix For: 2.3.0 Attachments: HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch, HDFS-5257.patch {{addBlock()}} call retry should return the LocatedBlock with locations if the block was created in previous call and failover/restart of namenode happened. otherwise client will get {{ArrayIndexOutOfBoundsException}} while creating the block and write will fail. {noformat}java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1118) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1078) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:511){noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
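The failure mode above can be sketched as follows (a simplified stand-in model with hypothetical names, not the actual NameNode code): on a retried addBlock() after failover/restart, the server should detect that the block was already allocated and return it with its DataNode locations, because a LocatedBlock with an empty location array makes the client index into nodes[0] and throw ArrayIndexOutOfBoundsException.

```java
import java.util.List;

class AddBlockRetrySketch {
    record LocatedBlock(long blockId, List<String> locations) {}

    // lastBlock: the last block already recorded for the file on the NN.
    // prevBlockId: the block the client reports as its previous block.
    static LocatedBlock addBlock(LocatedBlock lastBlock, long prevBlockId,
                                 List<String> chosenTargets) {
        if (lastBlock != null && lastBlock.blockId() > prevBlockId) {
            // Retry after failover/restart: the block from the earlier
            // call already exists -- return it WITH its locations
            // instead of allocating a new, location-less block.
            return lastBlock;
        }
        // Normal path: allocate a fresh block with the chosen targets.
        return new LocatedBlock(prevBlockId + 1, chosenTargets);
    }
}
```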
[jira] [Commented] (HDFS-5476) Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion
[ https://issues.apache.org/jira/browse/HDFS-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845446#comment-13845446 ] Hudson commented on HDFS-5476: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Snapshot: clean the blocks/files/directories under a renamed file/directory while deletion -- Key: HDFS-5476 URL: https://issues.apache.org/jira/browse/HDFS-5476 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 2.3.0 Attachments: HDFS-5476.001.patch Currently DstReference#destroyAndCollectBlocks may fail to clean the subtree under the DstReference node for file/directory/snapshot deletion. Use case 1: # rename under-construction file with 0-sized blocks after snapshot. # delete the renamed directory. We need to make sure we delete the 0-sized block. Use case 2: # create snapshot s0 for / # create a new file under /foo/bar/ # rename foo -- foo2 # create snapshot s1 # delete bar and foo2 # delete snapshot s1 We need to make sure we delete the file under /foo/bar since it is not included in snapshot s0. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5474) Deletesnapshot can make Namenode in safemode on NN restarts.
[ https://issues.apache.org/jira/browse/HDFS-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845445#comment-13845445 ] Hudson commented on HDFS-5474: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Deletesnapshot can make Namenode in safemode on NN restarts. Key: HDFS-5474 URL: https://issues.apache.org/jira/browse/HDFS-5474 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Reporter: Uma Maheswara Rao G Assignee: sathish Fix For: 2.3.0 Attachments: HDFS-5474-001.patch, HDFS-5474-002.patch When we deleteSnapshot, we delete the blocks associated with that snapshot and only afterwards do a logsync of the deleteSnapshot record to the editlog. There is a window where blocks have been removed from the blocks map but the log sync has not happened yet; if a block report arrives in that window, the NN may find that a block does not exist in the blocks map and invalidate it, and the invalidation info can go out as part of a heartbeat. If the Namenode then shuts down before actually doing the logsync, on restart it will still have the snapshot inodes and expect those blocks to be reported from DNs. The simple solution is to move the block removal to after the logsync, similar to the delete op. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
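The "move block removal after logsync" fix suggested above can be sketched like this (hypothetical, simplified types, not the actual FSNamesystem code): the blocks map is only mutated after the deleteSnapshot edit is durably synced, mirroring what the regular delete op already does.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteSnapshotOrderingSketch {
    final List<String> editLog = new ArrayList<>();
    final List<Long> blocksMap = new ArrayList<>(List.of(1L, 2L, 3L));

    void deleteSnapshot(String snapshot, List<Long> collectedBlocks) {
        editLog.add("OP_DELETE_SNAPSHOT " + snapshot);
        logSync();                            // durable BEFORE mutating state
        blocksMap.removeAll(collectedBlocks); // safe: a crash before this
                                              // point replays the delete
    }

    void logSync() { /* flush edit log to stable storage (elided) */ }
}
```

With this ordering, a block report arriving mid-delete can no longer observe blocks that a crashed-and-restarted NN would still expect.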
[jira] [Commented] (HDFS-5074) Allow starting up from an fsimage checkpoint in the middle of a segment
[ https://issues.apache.org/jira/browse/HDFS-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845451#comment-13845451 ] Hudson commented on HDFS-5074: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move entry for HDFS-5074 to correct section. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550027) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt HDFS-5074. Allow starting up from an fsimage checkpoint in the middle of a segment. Contributed by Todd Lipcon. (atm: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550016) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal/src/main/java/org/apache/hadoop/contrib/bkjournal/BookKeeperJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLogger.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/AsyncLoggerSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/IPCLoggerChannel.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/client/QuorumJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocol/QJournalProtocol.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolServerSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/protocolPB/QJournalProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/Journal.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeRpcServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/BackupJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/JournalSet.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LogsPurgeable.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NNStorageRetentionManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SecondaryNameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/BootstrapStandby.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ha/EditLogTailer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/QJournalProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/MiniQJMHACluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/TestNNWithQJM.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/qjournal/client/TestQuorumJournalManager.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NameNodeAdapter.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFileJournalManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestGenericJournalConf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNNStorageRetentionManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestBootstrapStandbyWithQJM.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestFailureToReadEdits.java Allow starting up from an fsimage checkpoint in the middle of a segment --- Key: HDFS-5074 URL:
[jira] [Commented] (HDFS-5425) Renaming underconstruction file with snapshots can make NN failure on restart
[ https://issues.apache.org/jira/browse/HDFS-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845447#comment-13845447 ] Hudson commented on HDFS-5425: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Renaming underconstruction file with snapshots can make NN failure on restart - Key: HDFS-5425 URL: https://issues.apache.org/jira/browse/HDFS-5425 Project: Hadoop HDFS Issue Type: Bug Components: namenode, snapshots Affects Versions: 2.2.0 Reporter: sathish Assignee: Jing Zhao Fix For: 2.3.0 Attachments: HDFS-5425.001.patch, HDFS-5425.patch, HDFS-5425.patch, HDFS-5425.patch I faced this while doing snapshot operations such as createSnapshot and renameSnapshot: after I restarted my NN, it shut down with the following exception: 2013-10-24 21:07:03,040 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:133) at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.replace(INodeDirectoryWithSnapshot.java:82) at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$ChildrenDiff.access$700(INodeDirectoryWithSnapshot.java:62) at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.replaceChild(INodeDirectoryWithSnapshot.java:397) at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot$DirectoryDiffList.access$900(INodeDirectoryWithSnapshot.java:376) at org.apache.hadoop.hdfs.server.namenode.snapshot.INodeDirectoryWithSnapshot.replaceChild(INodeDirectoryWithSnapshot.java:598) 
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedReplaceINodeFile(FSDirectory.java:1548) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.replaceINodeFile(FSDirectory.java:1537) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.loadFilesUnderConstruction(FSImageFormat.java:855) at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:350) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:910) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:899) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:751) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:720) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:266) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:784) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:563) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:422) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:472) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:670) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:655) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1245) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1311) 2013-10-24 21:07:03,050 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1 2013-10-24 21:07:03,052 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5428) under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode
[ https://issues.apache.org/jira/browse/HDFS-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845443#comment-13845443 ] Hudson commented on HDFS-5428: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt under construction files deletion after snapshot+checkpoint+nn restart leads nn safemode Key: HDFS-5428 URL: https://issues.apache.org/jira/browse/HDFS-5428 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 2.2.0 Reporter: Vinay Assignee: Jing Zhao Fix For: 2.3.0 Attachments: HDFS-5428-v2.patch, HDFS-5428.000.patch, HDFS-5428.001.patch, HDFS-5428.002.patch, HDFS-5428.003.patch, HDFS-5428.004.patch, HDFS-5428.patch 1. allow snapshots under dir /foo 2. create a file /foo/test/bar and start writing to it 3. create a snapshot s1 under /foo after block is allocated and some data has been written to it 4. Delete the directory /foo/test 5. wait till checkpoint or do saveNameSpace 6. restart NN. NN enters to safemode. Analysis: Snapshot nodes loaded from fsimage are always complete and all blocks will be in COMPLETE state. So when the Datanode reports RBW blocks those will not be updated in blocksmap. Some of the FINALIZED blocks will be marked as corrupt due to length mismatch. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5443) Delete 0-sized block when deleting an under-construction file that is included in snapshot
[ https://issues.apache.org/jira/browse/HDFS-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845448#comment-13845448 ] Hudson commented on HDFS-5443: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) Move HDFS-5257,HDFS-5427,HDFS-5443,HDFS-5476,HDFS-5425,HDFS-5474,HDFS-5504,HDFS-5428 into branch-2.3 section. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1550011) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Delete 0-sized block when deleting an under-construction file that is included in snapshot -- Key: HDFS-5443 URL: https://issues.apache.org/jira/browse/HDFS-5443 Project: Hadoop HDFS Issue Type: Bug Components: snapshots Affects Versions: 3.0.0, 2.2.0 Reporter: Uma Maheswara Rao G Assignee: Jing Zhao Fix For: 2.3.0 Attachments: 5443-test.patch, HDFS-5443.000.patch The Namenode can get stuck in safemode on restart if it crashes just after the addBlock logsync and a snapshot was taken of such a file. This issue was reported by Prakash and Sathish. On looking into the issue, the following happens: 1) The client adds a block at the NN, which does a logsync, so the NN has the block ID persisted. 2) Before the addBlock response is returned to the client, a snapshot is taken of the root or a parent directory of that file. 3) The parent directory of that file is deleted. 4) The NN crashes without responding success to the client for that addBlock call. On restart, the Namenode gets stuck in safemode. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5580) Infinite loop in Balancer.waitForMoveCompletion
[ https://issues.apache.org/jira/browse/HDFS-5580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845442#comment-13845442 ] Hudson commented on HDFS-5580: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1635 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1635/]) HDFS-5580. Fix infinite loop in Balancer.waitForMoveCompletion. (Binglin Chang via junping_du) (junping_du: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550074) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java Infinite loop in Balancer.waitForMoveCompletion --- Key: HDFS-5580 URL: https://issues.apache.org/jira/browse/HDFS-5580 Project: Hadoop HDFS Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Fix For: 2.4.0 Attachments: HDFS-5580.v1.patch, HDFS-5580.v2.patch, HDFS-5580.v3.patch, TestBalancerWithNodeGroupTimeout.log In recent [build|https://builds.apache.org/job/PreCommit-HDFS-Build/5592//testReport/org.apache.hadoop.hdfs.server.balancer/TestBalancerWithNodeGroup/testBalancerWithNodeGroup/] in HDFS-5574, TestBalancerWithNodeGroup timeout, this is also mentioned in HDFS-4376 [here|https://issues.apache.org/jira/browse/HDFS-4376?focusedCommentId=13799402page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13799402]. Looks like the bug is introduced by HDFS-3495. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
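For context on the fix referenced above, the general shape of the bug is a wait loop whose exit condition can never become true. A sketch of the pattern (hypothetical names; the actual Balancer code differs) with the two usual safeguards: the loop exits when every pending-move queue has drained, and it additionally bails out after a bounded wait so a wedged mover cannot spin forever.

```java
import java.util.Deque;
import java.util.List;

class MoveWaitSketch {
    // Returns true once all pending-move queues drain, false on timeout.
    static boolean waitForMoveCompletion(List<Deque<Runnable>> pendingMoves,
                                         long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (pendingMoves.stream().allMatch(Deque::isEmpty)) {
                return true;   // every scheduled move has completed
            }
            try {
                Thread.sleep(10); // re-check periodically, don't spin
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;  // treat interruption as not completed
            }
        }
        return false;          // bail out rather than loop forever
    }
}
```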
[jira] [Resolved] (HDFS-4331) checkpoint between NN and SNN (secure cluster) does not happen once NN TGT expires
[ https://issues.apache.org/jira/browse/HDFS-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benoy Antony resolved HDFS-4331. Resolution: Duplicate Release Note: Code already in place checkpoint between NN and SNN (secure cluster) does not happen once NN TGT expires --- Key: HDFS-4331 URL: https://issues.apache.org/jira/browse/HDFS-4331 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 1.1.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-4331.patch, nn-checkpoint-failed.log NameNode fails to download the new FSIMage from SNN. The error indicates that the NameNode TGT has expired. It seems that NN doesn't renew the ticket after the ticket expires (10 hours validity). NN - Checkpoint error is attached -- This message was sent by Atlassian JIRA (v6.1.4#6159)
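The renewal gap described above calls for a check-and-relogin guard before each checkpoint transfer; in Hadoop this is what UserGroupInformation.checkTGTAndReloginFromKeytab() provides. Below is a stand-in sketch of the idea with hypothetical types (not the actual security code):

```java
class TgtRenewalSketch {
    static final long TICKET_LIFETIME_MS = 10 * 3600_000L; // 10h validity

    long ticketExpiryMs; // absolute expiry time of the cached TGT
    int relogins = 0;

    TgtRenewalSketch(long initialExpiryMs) { this.ticketExpiryMs = initialExpiryMs; }

    // Call before every checkpoint transfer: re-login from the keytab
    // when the cached ticket has expired or is about to, instead of
    // assuming the ticket acquired at startup stays valid.
    void checkTgtAndRelogin(long nowMs, long slackMs) {
        if (nowMs + slackMs >= ticketExpiryMs) {
            ticketExpiryMs = nowMs + TICKET_LIFETIME_MS; // fresh ticket
            relogins++;
        }
    }
}
```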
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845529#comment-13845529 ] Kihwal Lee commented on HDFS-5496: -- It will be nice if the web UI says something if the replication queues are being initialized. Showing its progress will be a plus. Make replication queue initialization asynchronous -- Key: HDFS-5496 URL: https://issues.apache.org/jira/browse/HDFS-5496 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Kihwal Lee Attachments: HDFS-5496.patch, HDFS-5496.patch Today, initialization of replication queues blocks safe mode exit and certain HA state transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem write lock held. During this time, important requests (e.g. initial block reports, heartbeat, etc) are blocked. The effect of delaying the initialization would be not starting replication right away, but I think the benefit outweighs. If we make it asynchronous, the work per iteration should be limited, so that the lock duration is capped. If full/incremental block reports and any other requests that modifies block state properly performs replication checks while the blocks are scanned and the queues populated in background, every block will be processed. (Some may be done twice) The replication monitor should run even before all blocks are processed. This will allow namenode to exit safe mode and start serving immediately even with a big name space. It will also reduce the HA failover latency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
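The "work per iteration should be limited" requirement in the proposal above can be sketched as chunked scanning under a repeatedly acquired and released lock (hypothetical names, not the actual patch): each critical section is capped at one chunk, so block reports and heartbeats can interleave between chunks.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

class AsyncQueueInitSketch {
    // Scans blocks in chunks; returns how many times the lock was taken.
    static int initReplicationQueues(List<Long> blocks, int chunkSize,
                                     List<Long> replQueue, ReentrantLock writeLock) {
        int acquisitions = 0;
        for (int i = 0; i < blocks.size(); i += chunkSize) {
            writeLock.lock();        // short, bounded critical section
            acquisitions++;
            try {
                int end = Math.min(i + chunkSize, blocks.size());
                replQueue.addAll(blocks.subList(i, end));
            } finally {
                writeLock.unlock();  // RPCs (block reports, heartbeats)
                                     // can run between chunks
            }
        }
        return acquisitions;
    }
}
```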
[jira] [Updated] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kihwal Lee updated HDFS-5496: - Assignee: Vinay Make replication queue initialization asynchronous -- Key: HDFS-5496 URL: https://issues.apache.org/jira/browse/HDFS-5496 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Kihwal Lee Assignee: Vinay Attachments: HDFS-5496.patch, HDFS-5496.patch Today, initialization of replication queues blocks safe mode exit and certain HA state transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem write lock held. During this time, important requests (e.g. initial block reports, heartbeat, etc) are blocked. The effect of delaying the initialization would be not starting replication right away, but I think the benefit outweighs. If we make it asynchronous, the work per iteration should be limited, so that the lock duration is capped. If full/incremental block reports and any other requests that modifies block state properly performs replication checks while the blocks are scanned and the queues populated in background, every block will be processed. (Some may be done twice) The replication monitor should run even before all blocks are processed. This will allow namenode to exit safe mode and start serving immediately even with a big name space. It will also reduce the HA failover latency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HDFS-5654) Add lock context support to FSNamesystemLock
Daryn Sharp created HDFS-5654: - Summary: Add lock context support to FSNamesystemLock Key: HDFS-5654 URL: https://issues.apache.org/jira/browse/HDFS-5654 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Supporting new methods of locking the namesystem, ie. coarse or fine-grain, needs an api to manage the locks (or any object conforming to Lock interface) held during access to the namespace. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HDFS-5655) Update FSNamesystem path operations to use a lock context
Daryn Sharp created HDFS-5655: - Summary: Update FSNamesystem path operations to use a lock context Key: HDFS-5655 URL: https://issues.apache.org/jira/browse/HDFS-5655 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Most path based methods should use the {{FSNamesystem.LockContext}} introduced by HDFS-5654. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5654) Add lock context support to FSNamesystemLock
[ https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-5654: -- Attachment: HDFS-5654.patch Adds an interface for a lock context to {{FSNameSystemLock}}, and provides a trivial implementation of a coarse locking context which just uses the {{FSNamesystemLock}} itself. I'll update some path-based {{FSNamesystem}} methods on a followup jira. Add lock context support to FSNamesystemLock Key: HDFS-5654 URL: https://issues.apache.org/jira/browse/HDFS-5654 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-5654.patch Supporting new methods of locking the namesystem, ie. coarse or fine-grain, needs an api to manage the locks (or any object conforming to Lock interface) held during access to the namespace. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
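A trivial coarse lock context like the one the patch describes could look roughly like this (hypothetical interface; the actual API in the attached patch may differ): the context conforms to a small locking interface, and the coarse implementation simply delegates every call to the one global lock, leaving room for fine-grained implementations later.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

interface LockContextSketch {
    void readLock();
    void readUnlock();
    void writeLock();
    void writeUnlock();
}

// Coarse implementation: every context shares the single global lock,
// so behavior is identical to locking the namesystem directly.
class CoarseLockContext implements LockContextSketch {
    private final ReentrantReadWriteLock global = new ReentrantReadWriteLock(true);
    public void readLock()    { global.readLock().lock(); }
    public void readUnlock()  { global.readLock().unlock(); }
    public void writeLock()   { global.writeLock().lock(); }
    public void writeUnlock() { global.writeLock().unlock(); }
    public boolean heldByCurrentThread() {
        return global.isWriteLockedByCurrentThread();
    }
}
```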
[jira] [Updated] (HDFS-5654) Add lock context support to FSNamesystemLock
[ https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daryn Sharp updated HDFS-5654: -- Status: Patch Available (was: Open) Add lock context support to FSNamesystemLock Key: HDFS-5654 URL: https://issues.apache.org/jira/browse/HDFS-5654 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-5654.patch Supporting new methods of locking the namesystem, ie. coarse or fine-grain, needs an api to manage the locks (or any object conforming to Lock interface) held during access to the namespace. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845567#comment-13845567 ] Jonathan Eagles commented on HDFS-5023: --- +1. Jing. If I don't hear anything on this issue today, I'll check this in tomorrow. TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2 --- Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: test Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
[ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845580#comment-13845580 ] Kihwal Lee commented on HDFS-5496: -- The following change would have been fine if leaving safe mode and initializing replication queues were synchronized. It appears {{checkMode()}} can start a background initialization before leaving the safe mode. Since the queues are unconditionally cleared right before the following, an on-going initialization should be stopped and redone.
{code}
-    if (!isInSafeMode() ||
-        (isInSafeMode() && safeMode.isPopulatingReplQueues())) {
+    // We only need to reprocess the queue in HA mode and not in safemode
+    if (!isInSafeMode() && haEnabled) {
{code}
There have been discussions regarding removing the safe mode extension and perhaps the safe mode monitor. That would make the check/logic simpler. Make replication queue initialization asynchronous -- Key: HDFS-5496 URL: https://issues.apache.org/jira/browse/HDFS-5496 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Kihwal Lee Assignee: Vinay Attachments: HDFS-5496.patch, HDFS-5496.patch Today, initialization of replication queues blocks safe mode exit and certain HA state transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem write lock held. During this time, important requests (e.g. initial block reports, heartbeats, etc.) are blocked. The effect of delaying the initialization would be not starting replication right away, but I think the benefit outweighs the cost. If we make it asynchronous, the work per iteration should be limited so that the lock duration is capped. If full/incremental block reports and any other request that modifies block state properly perform replication checks while the blocks are scanned and the queues populated in the background, every block will be processed. (Some may be done twice.) The replication monitor should run even before all blocks are processed. 
This will allow namenode to exit safe mode and start serving immediately even with a big name space. It will also reduce the HA failover latency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845591#comment-13845591 ] Jing Zhao commented on HDFS-5023: - +1. Thanks Mit! TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2 --- Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: test Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric
[ https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-5350: -- Priority: Minor (was: Major) Name Node should report fsimage transfer time as a metric - Key: HDFS-5350 URL: https://issues.apache.org/jira/browse/HDFS-5350 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rob Weltman Assignee: Jimmy Xiang Priority: Minor If the (Secondary) Name Node reported fsimage transfer times (perhaps the last ten of them), monitoring tools could detect slowdowns that might jeopardize cluster stability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5647) Merge INodeDirectory.Feature and INodeFile.Feature
[ https://issues.apache.org/jira/browse/HDFS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5647: - Attachment: HDFS-5647.003.patch Merge INodeDirectory.Feature and INodeFile.Feature -- Key: HDFS-5647 URL: https://issues.apache.org/jira/browse/HDFS-5647 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5647.000.patch, HDFS-5647.001.patch, HDFS-5647.002.patch, HDFS-5647.003.patch HDFS-4685 implements ACLs for HDFS, which can benefit from the INode features introduced in HDFS-5284. The current code separates the INode feature of INodeFile and INodeDirectory into two different class hierarchies. This hinders the implementation of ACL since ACL is a concept that applies to both INodeFile and INodeDirectory. This jira proposes to merge the two class hierarchies (i.e., INodeDirectory.Feature and INodeFile.Feature) to simplify the implementation of ACLs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API
[ https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5650: - Attachment: HDFS-5650.005.patch Thanks for the comments! Uploading the v5 patch to address the comments from Vinay and Chris. Remove AclReadFlag and AclWriteFlag in FileSystem API - Key: HDFS-5650 URL: https://issues.apache.org/jira/browse/HDFS-5650 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, HDFS-5650.005.patch AclReadFlag and AclWriteFlag were intended to capture various options used in getfacl and setfacl. These options determine whether the tool should traverse the filesystem recursively, follow symlinks, etc., but they are not part of the core ACL abstractions. The client program has more information and more flexibility to implement these options. This jira proposes to remove these flags to simplify the APIs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5654) Add lock context support to FSNamesystemLock
[ https://issues.apache.org/jira/browse/HDFS-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845671#comment-13845671 ] Hadoop QA commented on HDFS-5654: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618264/HDFS-5654.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5694//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5694//console This message is automatically generated. Add lock context support to FSNamesystemLock Key: HDFS-5654 URL: https://issues.apache.org/jira/browse/HDFS-5654 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-5654.patch Supporting new methods of locking the namesystem, ie. coarse or fine-grain, needs an api to manage the locks (or any object conforming to Lock interface) held during access to the namespace. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
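The "lock context" idea above — an API that manages whatever Lock objects are held during namespace access, so coarse or fine-grained locking can be swapped in later — might look roughly like this. This is a speculative sketch, not the HDFS-5654 patch; the class and method names are invented for illustration.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: callers obtain namespace locks through a context
// object that can hand back any Lock implementation (coarse-grained today,
// fine-grained later) without the call sites changing.
class FSLockContext {
    private final ReentrantReadWriteLock coarse = new ReentrantReadWriteLock(true);

    Lock readLock()  { return coarse.readLock(); }
    Lock writeLock() { return coarse.writeLock(); }
}
```

Because callers see only the `Lock` interface, a finer-grained implementation could later return per-subtree locks from the same methods.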
[jira] [Commented] (HDFS-5242) Reduce contention on DatanodeInfo instances
[ https://issues.apache.org/jira/browse/HDFS-5242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845675#comment-13845675 ] Kihwal Lee commented on HDFS-5242: -- +1 for the patch. However, the contention might have been unusually high if only a small number of data nodes were involved. Reduce contention on DatanodeInfo instances --- Key: HDFS-5242 URL: https://issues.apache.org/jira/browse/HDFS-5242 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: HDFS-5242.patch Synchronization in {{DatanodeInfo}} instances causes unnecessary contention between call handlers. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
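The kind of change that reduces this sort of handler contention can be sketched as below. This is not the HDFS-5242 patch itself, just an illustration of the general technique: a frequently-read field guarded by a synchronized getter forces concurrent RPC handlers to serialize on the instance monitor, whereas a volatile field lets them read without blocking.

```java
// Illustrative sketch (not the actual DatanodeInfo change): replacing a
// synchronized getter on a hot read path with a volatile field.
class NodeStats {
    // Before: every read and write takes the instance lock, so concurrent
    // call handlers contend on the monitor.
    private long capacitySync;
    synchronized long getCapacitySync() { return capacitySync; }
    synchronized void setCapacitySync(long c) { capacitySync = c; }

    // After: volatile gives safe publication of the 64-bit value without
    // blocking readers.
    private volatile long capacity;
    long getCapacity() { return capacity; }
    void setCapacity(long c) { capacity = c; }
}
```

Note that `volatile` is only a safe substitute for simple get/set pairs; compound updates would still need atomics or a lock.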
[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845687#comment-13845687 ] Colin Patrick McCabe commented on HDFS-5634: bq. Do we mean to setCachingStrategy in DFSInputStream#getBlockReader? Also, I get that there are a zillion parameters for the BRL constructor, but builders are for when there are optional arguments. Here, it looks like we want to set all of them. Actually, in the tests, we often don't set a lot of the arguments. For example, the unit tests don't use the FISCache, may not set readahead, etc. Also, I think there's value in naming the arguments, since otherwise updating the callsites gets very, very difficult. bq. We have both verifyChecksum and skipChecksum right now. Let's get rid of one, seems error-prone to be flipping booleans. OK. I updated to {{BlockReaderFactory#newShortCircuitBlockReader}} to use {{skipChecksums}} as well. A little note on the history here: prior to the introduction of mlock, it was more straightforward to have a simple positive boolean verifyChecksum than to have the skip boolean. But now that we have mlock, verifyChecksum = true might be a lie, since mlock might mean we don't verify. bq. The skipChecksum || mlocked.get() idiom is used in a few places, maybe should be a shouldSkipChecksum() method? OK. bq. IIUC, fillDataBuf fills the bounce buffer, and drainBounceBuffer empties it. Rename fillDataBuf to fillBounceBuffer for parity? I renamed {{drainBounceBuffer}} to {{drainDataBuf}} for symmetry. bq. I'm wondering what happens in the bounce buffer read paths when readahead is turned off. It looks like they use readahead to determine how much to read, regardless of the bytes needed, so what if it's zero? We always buffer at least a single chunk, even if readahead is turned off. The mechanics of checksumming require this. bq. 
For the slow lane, fillDataBuf doesn't actually fill the returned buf, so when we hit the EOF and break, it looks like we make the user read again to flush out the bounce buffer. Can we save this? Yeah, the current code could result in us doing an extra {{pread}} even after we know we're at EOF. Let me see if I can avoid that. bq. fillDataBuf doesn't fill just the data buf, it also fills the checksum buf and verifies checksums via fillBuffer. Would be nice to javadoc this. OK bq. I noticed there are two readahead config options too, dfs.client.cache.readahead and dfs.datanode.readahead.bytes. Seems like we should try to emulate the same behavior as remote reads which (according to hdfs-default.xml) use the DN setting, and override with the client setting. Right now it's just using the DN readahead in BRL, so the test that sets client readahead to 0 isn't doing much. Right now, the readahead is coming out of {{DFSClient#cachingStrategy}}, so it will be coming from {{dfs.client.cache.readahead}}, unless someone has overridden it for that {{DFSInputStream}} object. The problem with defaulting to the DN setting, is that we don't know what that is (we're on the client, not the DN). bq. I don't quite understand why we check needed maxReadahead... for the fast lane. Once we're checksum aligned via draining the bounce buffer, can't we just stay in the fast lane? Seems like the slow vs. fast lane determination should be based on read alignment, not bytes left. The issue is that we want to honor the readahead setting. We would not be doing this if we did a shorter read directly into the provided buffer. bq. It's a little weird to me that the readahead chunks is min'd with the buffer size (default 1MB). I get why (memory consumption), but this linkage should be documented somewhere. I added a comment. bq. DirectBufferPool, would it be better to use a Deque's stack operations rather than a Queue? Might give better cache locality to do LIFO rather than FIFO. Interesting point. 
I will try that and see what numbers I get. bq. TestEnhancedByteBufferAccess has an import-only change OK. I will avoid doing that to make merging easier. bq. Kinda unrelated, but should the dfs.client.read.shortcircuit.* keys be in hdfs-default.xml? I also noticed that dfs.client.cache.readahead says this setting causes the datanode to... so the readahead documentation might need to be updated too. I'll update it with the information about short-circuit. allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch
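The `skipChecksum || mlocked.get()` idiom discussed in the review above, factored into the suggested `shouldSkipChecksum()` helper, would look roughly like this. The field names follow the discussion, but the surrounding class is a minimal sketch, not the actual BlockReaderLocal code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of the shouldSkipChecksum() helper suggested in the review:
// checksums are skipped either when the reader was configured to skip them,
// or when the block is known to be mlocked (already verified in the cache).
class BlockReaderLocalSketch {
    private final boolean skipChecksum;   // fixed at construction time
    private final AtomicBoolean mlocked;  // may flip as cache notifications arrive

    BlockReaderLocalSketch(boolean skipChecksum, AtomicBoolean mlocked) {
        this.skipChecksum = skipChecksum;
        this.mlocked = mlocked;
    }

    boolean shouldSkipChecksum() {
        return skipChecksum || mlocked.get();
    }
}
```

Centralizing the check in one method avoids the error-prone boolean flipping the reviewer called out.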
[jira] [Created] (HDFS-5656) add some configuration keys to hdfs-default.xml
Colin Patrick McCabe created HDFS-5656: -- Summary: add some configuration keys to hdfs-default.xml Key: HDFS-5656 URL: https://issues.apache.org/jira/browse/HDFS-5656 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Priority: Minor Some configuration keys like {{dfs.client.read.shortcircuit}} are not present in {{hdfs-default.xml}} as they should be. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845689#comment-13845689 ] Colin Patrick McCabe commented on HDFS-5634: update: there are a bunch of things in DFSConfigKeys not in hdfs-default.xml. I created HDFS-5656 for this, since it's a change we'd want to do and quickly merge to branch-2.3, etc, and also because it can be decoupled from this JIRA. allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API
[ https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5650. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) Hadoop Flags: Reviewed +1 for the patch. I committed this to the HDFS-4685 branch. Thank you to Haohui for incorporating this valuable feedback on the API. Thank you to Vinay for code reviews. Remove AclReadFlag and AclWriteFlag in FileSystem API - Key: HDFS-5650 URL: https://issues.apache.org/jira/browse/HDFS-5650 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, HDFS-5650.005.patch AclReadFlag and AclWriteFlag were intended to capture various options used in getfacl and setfacl. These options determine whether the tool should traverse the filesystem recursively, follow symlinks, etc., but they are not part of the core ACL abstractions. The client program has more information and more flexibility to implement these options. This jira proposes to remove these flags to simplify the APIs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API
[ https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5650: Target Version/s: HDFS ACLs (HDFS-4685) Affects Version/s: HDFS ACLs (HDFS-4685) Remove AclReadFlag and AclWriteFlag in FileSystem API - Key: HDFS-5650 URL: https://issues.apache.org/jira/browse/HDFS-5650 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, HDFS-5650.005.patch AclReadFlag and AclWriteFlag were intended to capture various options used in getfacl and setfacl. These options determine whether the tool should traverse the filesystem recursively, follow symlinks, etc., but they are not part of the core ACL abstractions. The client program has more information and more flexibility to implement these options. This jira proposes to remove these flags to simplify the APIs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5650) Remove AclReadFlag and AclWriteFlag in FileSystem API
[ https://issues.apache.org/jira/browse/HDFS-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845718#comment-13845718 ] Chris Nauroth commented on HDFS-5650: - I have one more note on part of the change here. We remove the path member from {{AclStatus}}. The only reason for the path member was to support recursive getfacl. If the recursion had been done server-side, then the result set would have needed to specify the file for each returned ACL. Now that recursion will be driven from the client side, we don't need this member anymore. FsShell will always know which path it was working on, so it can still print each file during a recursive getfacl. Remove AclReadFlag and AclWriteFlag in FileSystem API - Key: HDFS-5650 URL: https://issues.apache.org/jira/browse/HDFS-5650 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5650.000.patch, HDFS-5650.001.patch, HDFS-5650.002.patch, HDFS-5650.003.patch, HDFS-5650.004.patch, HDFS-5650.005.patch AclReadFlag and AclWriteFlag were intended to capture various options used in getfacl and setfacl. These options determine whether the tool should traverse the filesystem recursively, follow symlinks, etc., but they are not part of the core ACL abstractions. The client program has more information and more flexibility to implement these options. This jira proposes to remove these flags to simplify the APIs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
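The point above — that a client driving the recursion already knows each path, so the ACL result needn't carry one — can be illustrated with a toy walk. This is not FsShell code; the in-memory tree and the `getAcl` function are stand-ins for the real filesystem and RPC calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of client-driven recursive getfacl: the caller pairs
// each path with the ACL it fetches, so the ACL result itself carries no path.
class RecursiveGetfacl {
    static List<String> getfaclRecursive(String path,
                                         Map<String, List<String>> children,
                                         Function<String, String> getAcl) {
        List<String> out = new ArrayList<>();
        out.add(path + ": " + getAcl.apply(path));   // client supplies the path
        for (String child : children.getOrDefault(path, List.of())) {
            out.addAll(getfaclRecursive(child, children, getAcl));
        }
        return out;
    }
}
```

Had recursion been server-side instead, each returned ACL would have needed to identify its file, which is exactly the path member being removed.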
[jira] [Updated] (HDFS-5477) Block manager as a service
[ https://issues.apache.org/jira/browse/HDFS-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nathan Roberts updated HDFS-5477: - Attachment: Proposal.pdf Fix formatting problems in PDF. Block manager as a service -- Key: HDFS-5477 URL: https://issues.apache.org/jira/browse/HDFS-5477 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Attachments: Proposal.pdf, Proposal.pdf, Standalone BM.pdf The block manager needs to evolve towards having the ability to run as a standalone service to improve NN vertical and horizontal scalability. The goal is reducing the memory footprint of the NN proper to support larger namespaces, and improve overall performance by decoupling the block manager from the namespace and its lock. Ideally, a distinct BM will be transparent to clients and DNs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-5023: -- Affects Version/s: 3.0.0 TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2 --- Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 3.0.0, 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: java7, test Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-5023: -- Summary: TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 (was: TestSnapshotPathINodes.testAllowSnapshot is failing in branch-2) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 - Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 3.0.0, 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: java7, test Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-5023: -- Labels: java7 test (was: test) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 - Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 3.0.0, 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: java7, test Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated HDFS-5023: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 - Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 3.0.0, 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: java7, test Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5023) TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7
[ https://issues.apache.org/jira/browse/HDFS-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845738#comment-13845738 ] Hudson commented on HDFS-5023: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4868 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4868/]) HDFS-5023. TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 (Mit Desai via jeagles) (jeagles: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550261) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSnapshotPathINodes.java TestSnapshotPathINodes.testAllowSnapshot is failing with jdk7 - Key: HDFS-5023 URL: https://issues.apache.org/jira/browse/HDFS-5023 Project: Hadoop HDFS Issue Type: Bug Components: snapshots, test Affects Versions: 3.0.0, 2.4.0 Reporter: Ravi Prakash Assignee: Mit Desai Labels: java7, test Fix For: 3.0.0, 2.4.0 Attachments: HDFS-5023.patch, HDFS-5023.patch, HDFS-5023.patch, TEST-org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes.xml, org.apache.hadoop.hdfs.server.namenode.TestSnapshotPathINodes-output.txt The assertion on line 91 is failing. I am using Fedora 19 + JDK7. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.
[ https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5607. - Resolution: Fixed I'm resolving this as won't fix. This is no longer relevant after the API design changes in HDFS-5650. libHDFS: add support for recursive flag in ACL functions. - Key: HDFS-5607 URL: https://issues.apache.org/jira/browse/HDFS-5607 Project: Hadoop HDFS Issue Type: Sub-task Components: libhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Implement and test handling of recursive flag for all ACL functions in libHDFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5599) DistributedFileSystem: add support for recursive flag in ACL methods.
[ https://issues.apache.org/jira/browse/HDFS-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5599. - Resolution: Won't Fix I'm resolving this as won't fix. This is no longer relevant after the API design changes in HDFS-5650. DistributedFileSystem: add support for recursive flag in ACL methods. - Key: HDFS-5599 URL: https://issues.apache.org/jira/browse/HDFS-5599 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Implement and test handling of recursive flag for all ACL methods in {{DistributedFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Reopened] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.
[ https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reopened HDFS-5607: - libHDFS: add support for recursive flag in ACL functions. - Key: HDFS-5607 URL: https://issues.apache.org/jira/browse/HDFS-5607 Project: Hadoop HDFS Issue Type: Sub-task Components: libhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Implement and test handling of recursive flag for all ACL functions in libHDFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5611) WebHDFS: add support for recursive flag in ACL operations.
[ https://issues.apache.org/jira/browse/HDFS-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5611. - Resolution: Won't Fix I'm resolving this as won't fix. This is no longer relevant after the API design changes in HDFS-5650. WebHDFS: add support for recursive flag in ACL operations. -- Key: HDFS-5611 URL: https://issues.apache.org/jira/browse/HDFS-5611 Project: Hadoop HDFS Issue Type: Sub-task Components: webhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Renil J Implement and test handling of recursive flag for all ACL operations in WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5607) libHDFS: add support for recursive flag in ACL functions.
[ https://issues.apache.org/jira/browse/HDFS-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5607. - Resolution: Won't Fix libHDFS: add support for recursive flag in ACL functions. - Key: HDFS-5607 URL: https://issues.apache.org/jira/browse/HDFS-5607 Project: Hadoop HDFS Issue Type: Sub-task Components: libhdfs Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Implement and test handling of recursive flag for all ACL functions in libHDFS. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-4201) NPE in BPServiceActor#sendHeartBeat
[ https://issues.apache.org/jira/browse/HDFS-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-4201: --- Resolution: Fixed Fix Version/s: (was: 3.0.0) 2.3.0 Target Version/s: 2.3.0 Status: Resolved (was: Patch Available) NPE in BPServiceActor#sendHeartBeat --- Key: HDFS-4201 URL: https://issues.apache.org/jira/browse/HDFS-4201 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Eli Collins Assignee: Jimmy Xiang Priority: Critical Fix For: 2.3.0 Attachments: trunk-4201.patch, trunk-4201_v2.patch, trunk-4201_v3.patch Saw the following NPE in a log. Think this is likely due to {{dn}} or {{dn.getFSDataset()}} being null, (not {{bpRegistration}}) due to a configuration or local directory failure. {code} 2012-09-25 04:33:20,782 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode svsrs00127/11.164.162.226:8020 using DELETEREPORT_INTERVAL of 30 msec BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; heartBeatInterval=3000 2012-09-25 04:33:20,782 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1678908700-11.164.162.226-1342785481826 (storage id DS-1031100678-11.164.162.251-5010-1341933415989) service to svsrs00127/11.164.162.226:8020 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:434) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:520) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5647) Merge INodeDirectory.Feature and INodeFile.Feature
[ https://issues.apache.org/jira/browse/HDFS-5647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845746#comment-13845746 ] Hadoop QA commented on HDFS-5647: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618277/HDFS-5647.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5695//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5695//console This message is automatically generated. Merge INodeDirectory.Feature and INodeFile.Feature -- Key: HDFS-5647 URL: https://issues.apache.org/jira/browse/HDFS-5647 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5647.000.patch, HDFS-5647.001.patch, HDFS-5647.002.patch, HDFS-5647.003.patch HDFS-4685 implements ACLs for HDFS, which can benefit from the INode features introduced in HDFS-5284. The current code separates the INode feature of INodeFile and INodeDirectory into two different class hierarchies. 
This hinders the implementation of ACL since ACL is a concept that applies to both INodeFile and INodeDirectory. This jira proposes to merge the two class hierarchies (i.e., INodeDirectory.Feature and INodeFile.Feature) to simplify the implementation of ACLs. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5596) Implement RPC stubs
[ https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5596: - Summary: Implement RPC stubs (was: DistributedFileSystem: implement getAcls and setAcl.) Implement RPC stubs --- Key: HDFS-5596 URL: https://issues.apache.org/jira/browse/HDFS-5596 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Implement and test {{getAcls}} and {{setAcl}} in {{DistributedFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-4201) NPE in BPServiceActor#sendHeartBeat
[ https://issues.apache.org/jira/browse/HDFS-4201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845747#comment-13845747 ] Hudson commented on HDFS-4201: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4869 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4869/]) HDFS-4201. NPE in BPServiceActor#sendHeartBeat (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1550269) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPOfferService.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBPOfferService.java NPE in BPServiceActor#sendHeartBeat --- Key: HDFS-4201 URL: https://issues.apache.org/jira/browse/HDFS-4201 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Eli Collins Assignee: Jimmy Xiang Priority: Critical Fix For: 2.3.0 Attachments: trunk-4201.patch, trunk-4201_v2.patch, trunk-4201_v3.patch Saw the following NPE in a log. Think this is likely due to {{dn}} or {{dn.getFSDataset()}} being null, (not {{bpRegistration}}) due to a configuration or local directory failure. 
{code} 2012-09-25 04:33:20,782 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode svsrs00127/11.164.162.226:8020 using DELETEREPORT_INTERVAL of 30 msec BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; heartBeatInterval=3000 2012-09-25 04:33:20,782 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1678908700-11.164.162.226-1342785481826 (storage id DS-1031100678-11.164.162.251-5010-1341933415989) service to svsrs00127/11.164.162.226:8020 java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:434) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:520) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:673) at java.lang.Thread.run(Thread.java:722) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5596) Implement RPC stubs
[ https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5596: - Description: Implement RPC stubs for both {{DistributedFileSystem}} and {{NameNodeRpcServer}}. (was: Implement and test {{getAcls}} and {{setAcl}} in {{DistributedFileSystem}}.) Implement RPC stubs --- Key: HDFS-5596 URL: https://issues.apache.org/jira/browse/HDFS-5596 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Implement RPC stubs for both {{DistributedFileSystem}} and {{NameNodeRpcServer}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845751#comment-13845751 ] Colin Patrick McCabe commented on HDFS-5634: bq. DirectBufferPool, would it be better to use a Deque's stack operations rather than a Queue? Might give better cache locality to do LIFO rather than FIFO. I examined this code more carefully, and I found that it was actually using FIFO at the moment. The reason is that it uses {{ConcurrentLinkedQueue#add}} to add the elements, which adds them to the end. It then uses {{ConcurrentLinkedQueue#poll}} to get the elements, which gets them from the beginning. allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
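The add-to-tail/poll-from-head mechanics described above can be contrasted with a Deque used as a stack in a minimal sketch (illustrative only, not the DirectBufferPool code):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedQueue;

public class PoolOrderDemo {
    public static void main(String[] args) {
        // ConcurrentLinkedQueue: add() appends to the tail, poll() takes
        // from the head, so buffers are recycled in FIFO order.
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        queue.add(1); queue.add(2); queue.add(3);
        System.out.println(queue.poll()); // 1 (oldest element first)

        // A Deque used as a stack: push() and pop() both operate on the
        // head, giving LIFO order -- the most recently returned buffer is
        // handed out first, which tends to be cache-warm.
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1); stack.push(2); stack.push(3);
        System.out.println(stack.pop()); // 3 (newest element first)
    }
}
```

Switching the pool from `add`/`poll` to `push`/`pop` (or `addFirst`/`pollFirst`) would be the one-line change that gets the cache-locality benefit the reviewer is suggesting.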
[jira] [Updated] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5634: --- Attachment: HDFS-5634.003.patch allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, HDFS-5634.003.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5596) Implement RPC stubs
[ https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5596: - Attachment: HDFS-5596.000.patch Implement RPC stubs --- Key: HDFS-5596 URL: https://issues.apache.org/jira/browse/HDFS-5596 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Attachments: HDFS-5596.000.patch Implement RPC stubs for both {{DistributedFileSystem}} and {{NameNodeRpcServer}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HDFS-5597) DistributedFileSystem: implement modifyAclEntries, removeAclEntries and removeAcl.
[ https://issues.apache.org/jira/browse/HDFS-5597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-5597. -- Resolution: Duplicate Assignee: Haohui Mai This jira is implemented within the scope of HDFS-5596. Marking it as a duplicate. DistributedFileSystem: implement modifyAclEntries, removeAclEntries and removeAcl. -- Key: HDFS-5597 URL: https://issues.apache.org/jira/browse/HDFS-5597 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Implement and test {{modifyAclEntries}}, {{removeAclEntries}} and {{removeAcl}} in {{DistributedFileSystem}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5431) support cachepool-based limit management in path-based caching
[ https://issues.apache.org/jira/browse/HDFS-5431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845756#comment-13845756 ] Andrew Wang commented on HDFS-5431: --- The checkLimit flag makes sense to me, except I'd prefer force, or if you'd like it flipped, enforce or strict. This is pretty easy. I agree on synchronously waiting on the CRM in the listed scenarios, and a CV would be a good way of doing this. It's a bit complicated though, since I don't think we can get a FSN CV, especially with the new lock context in HDFS-5453 coming down the pipe. I think kicking the CRM, releasing the FSN lock, waiting on the CRM CV, then re-acquiring the FSN lock should be okay, but it might be simpler to just call into CRM directly to do the rescan. I'll try the CV version, but if it looks too messy, we can go with a direct call. support cachepool-based limit management in path-based caching -- Key: HDFS-5431 URL: https://issues.apache.org/jira/browse/HDFS-5431 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Andrew Wang Attachments: hdfs-5431-1.patch, hdfs-5431-2.patch We should support cachepool-based quota management in path-based caching. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
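The kick-then-wait handshake being discussed can be sketched with a ReentrantLock and a Condition. This is a hedged illustration of the pattern only -- the class and method names are invented, not the CacheReplicationMonitor API:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative rescanner: a monitor thread calls finishScan() after each
// full pass; a waiter blocks on the condition until enough passes have
// completed. Condition.await() releases the lock while waiting and
// re-acquires it before returning, which is the "release, wait, re-take"
// behavior the comment above describes.
public class Rescanner {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition scanDone = lock.newCondition();
    private long completedScans = 0;

    /** Called by the monitor thread after each full rescan. */
    public void finishScan() {
        lock.lock();
        try {
            completedScans++;
            scanDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Block the caller until at least `target` total scans have completed. */
    public void awaitScans(long target) throws InterruptedException {
        lock.lock();
        try {
            while (completedScans < target) {
                scanDone.await(); // drops the lock while parked
            }
        } finally {
            lock.unlock();
        }
    }
}
```

Counting completed generations (rather than a boolean flag) is what makes the wait safe against the scan finishing before the waiter parks.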
[jira] [Resolved] (HDFS-5598) DistributedFileSystem: implement removeDefaultAcl.
[ https://issues.apache.org/jira/browse/HDFS-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-5598. -- Resolution: Duplicate Assignee: Haohui Mai This jira is implemented within the scope of HDFS-5596. Marking it as a duplicate. DistributedFileSystem: implement removeDefaultAcl. -- Key: HDFS-5598 URL: https://issues.apache.org/jira/browse/HDFS-5598 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Implement and test {{removeDefaultAcl}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HDFS-5657) race condition causes writeback state error in NFS gateway
Brandon Li created HDFS-5657: Summary: race condition causes writeback state error in NFS gateway Key: HDFS-5657 URL: https://issues.apache.org/jira/browse/HDFS-5657 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Brandon Li A race condition between NFS gateway writeback executor thread and new write handler thread can cause writeback state check failure, e.g., {noformat} 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending writes, fileId: 30938 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService (AsyncDataService.java:run(136)) - Asyn data service got error:java.lang.IllegalStateException: The openFileCtx has false async status at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890) at org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current filesize=917504 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 length:65536 stableHow:0 {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HDFS-5658) Implement ACL as a INode feature
Haohui Mai created HDFS-5658: Summary: Implement ACL as a INode feature Key: HDFS-5658 URL: https://issues.apache.org/jira/browse/HDFS-5658 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai HDFS-5284 introduces features as generic abstractions to extend the functionality of the inodes. The implementation of ACL should leverage the new abstractions. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5657) race condition causes writeback state error in NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845769#comment-13845769 ] Brandon Li commented on HDFS-5657: -- Here is how the race happens: {noformat}
/** Invoked by AsyncDataService to write back to HDFS */
void executeWriteBack() {
  Preconditions.checkState(asyncStatus,
      "The openFileCtx has false async status");  <== check failed here
  try {
    while (activeState) {
      WriteCtx toWrite = offerNextToWrite();
      if (toWrite != null) {
        // Do the write
        doSingleWrite(toWrite);  <== a synchronized method, which sets asyncStatus to false
        updateLastAccessTime();
      } else {
        break;
      }
    }
    if (!activeState && LOG.isDebugEnabled()) {
      LOG.debug("The openFileCtx is not active anymore, fileId: "
          + latestAttr.getFileId());
    }
  } finally {
    // make sure we reset asyncStatus to false
    asyncStatus = false;  <== before this line is executed, OpenFileCtx.checkAndStartWrite
                              sets asyncStatus to true and invokes a new task. When that task
                              calls executeWriteBack() again, the condition check fails.
} } {noformat} race condition causes writeback state error in NFS gateway -- Key: HDFS-5657 URL: https://issues.apache.org/jira/browse/HDFS-5657 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Brandon Li A race condition between NFS gateway writeback executor thread and new write handler thread can cause writeback state check failure, e.g., {noformat} 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending writes, fileId: 30938 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService (AsyncDataService.java:run(136)) - Asyn data service got error:java.lang.IllegalStateException: The openFileCtx has false async status at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890) at org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current filesize=917504 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 length:65536 stableHow:0 {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
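One way to close this window (a sketch of the general technique, not the actual HDFS-5657 patch; the class and method names here are illustrative) is to make the false-to-true transition in the scheduler and the true-to-false transition in the worker atomic, so a new task can only be queued once the previous worker has fully released the flag:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative model: asyncStatus may only transition false -> true via
// compareAndSet in the scheduler, and is released in the worker's finally
// block *inside* the queued task. A second task can therefore never be
// submitted while a previous worker still owns the flag.
public class WritebackState {
    private final AtomicBoolean asyncStatus = new AtomicBoolean(false);
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    /** Scheduler side: start a writeback task only if none is in flight. */
    public boolean checkAndStartWrite(Runnable writeBack) {
        if (asyncStatus.compareAndSet(false, true)) {
            pool.execute(() -> {
                try {
                    writeBack.run();
                } finally {
                    asyncStatus.set(false); // release ownership last
                }
            });
            return true;
        }
        return false; // a writeback is already in flight
    }

    public void shutdown() { pool.shutdown(); }
}
```

Because the release happens inside the queued task itself (and the executor is single-threaded), the release can never interleave with a new task's startup check the way the trace above shows.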
[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric
[ https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-5350: -- Attachment: trunk-5350.patch Attached a patch that added metrics for fsimage downloaded/uploaded. Name Node should report fsimage transfer time as a metric - Key: HDFS-5350 URL: https://issues.apache.org/jira/browse/HDFS-5350 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rob Weltman Assignee: Jimmy Xiang Priority: Minor Fix For: 3.0.0 Attachments: trunk-5350.patch If the (Secondary) Name Node reported fsimage transfer times (perhaps the last ten of them), monitoring tools could detect slowdowns that might jeopardize cluster stability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5350) Name Node should report fsimage transfer time as a metric
[ https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HDFS-5350: -- Fix Version/s: 3.0.0 Status: Patch Available (was: Open) Name Node should report fsimage transfer time as a metric - Key: HDFS-5350 URL: https://issues.apache.org/jira/browse/HDFS-5350 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rob Weltman Assignee: Jimmy Xiang Priority: Minor Fix For: 3.0.0 Attachments: trunk-5350.patch If the (Secondary) Name Node reported fsimage transfer times (perhaps the last ten of them), monitoring tools could detect slowdowns that might jeopardize cluster stability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5350) Name Node should report fsimage transfer time as a metric
[ https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845805#comment-13845805 ] Jimmy Xiang commented on HDFS-5350: --- I tested the patch on my cluster. Here is the new metrics from the jmx page: {noformat} GetImageNumOps : 56, GetImageAvgTime : 3.75, PutImageNumOps : 51, PutImageAvgTime : 80.0 {noformat} Name Node should report fsimage transfer time as a metric - Key: HDFS-5350 URL: https://issues.apache.org/jira/browse/HDFS-5350 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rob Weltman Assignee: Jimmy Xiang Priority: Minor Fix For: 3.0.0 Attachments: trunk-5350.patch If the (Secondary) Name Node reported fsimage transfer times (perhaps the last ten of them), monitoring tools could detect slowdowns that might jeopardize cluster stability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
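The NumOps/AvgTime pairs shown in the JMX output are the standard shape of Hadoop's rate metrics. As a rough model of the arithmetic behind such a pair (this sketch is illustrative; the real patch would use the metrics2 library rather than hand-rolled counters):

```java
// Minimal model of what a <Name>NumOps / <Name>AvgTime metric pair
// tracks for fsimage transfers: a count of operations and the running
// average of their durations in milliseconds.
public class TransferMetric {
    private long numOps = 0;
    private double totalTimeMs = 0;

    /** Record one completed transfer that took elapsedMs milliseconds. */
    public synchronized void add(long elapsedMs) {
        numOps++;
        totalTimeMs += elapsedMs;
    }

    public synchronized long numOps() { return numOps; }

    public synchronized double avgTimeMs() {
        return numOps == 0 ? 0 : totalTimeMs / numOps;
    }
}
```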
[jira] [Updated] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5634: --- Attachment: HDFS-5634.004.patch I optimized the CPU consumption a bit by caching the checksum size and bytes-per in final ints, and avoiding the need to re-do some multiplications a few times on every read. perf stat now gives me 305,384,306,460 cycles for TestParallelShortCircuitRead, as opposed to 321,040,227,686 cycles before. allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, HDFS-5634.003.patch, HDFS-5634.004.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5648) Get rid of perVolumeReplicaMap
[ https://issues.apache.org/jira/browse/HDFS-5648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5648: Attachment: h5648.08.patch Updated patch fixes an unrelated bug exposed by the earlier patch. DatanodeStorage was not overriding {{Object.equals()}} and {{Object.hashCode()}}. Get rid of perVolumeReplicaMap -- Key: HDFS-5648 URL: https://issues.apache.org/jira/browse/HDFS-5648 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: Heterogeneous Storage (HDFS-2832) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: h5648.02.patch, h5648.08.patch The perVolumeReplicaMap in FsDatasetImpl.java is not necessary and can be removed. We continue to use the existing volumeMap. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
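A class used as a map or set key silently misbehaves without both `equals()` and `hashCode()`, which is the class of bug the comment above describes. A generic illustration, assuming (as a simplification; `Storage` here is a stand-in, not the real DatanodeStorage) that identity is determined by the storage ID:

```java
// Simplified stand-in for DatanodeStorage: equality is keyed on the
// storage ID so two instances describing the same storage compare equal
// and hash to the same bucket in HashMap/HashSet lookups.
public class Storage {
    private final String storageID;

    public Storage(String storageID) { this.storageID = storageID; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Storage)) return false;
        return storageID.equals(((Storage) o).storageID);
    }

    @Override
    public int hashCode() { return storageID.hashCode(); }
}
```

With only the inherited `Object` implementations, two objects describing the same storage would be distinct keys, so map lookups by a freshly deserialized instance would miss existing entries.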
[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-2832: Attachment: h2832_20131211.patch Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories correspond to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose, change to the current model where Datanode * is a * storage, to Datanode * is a collection * of strorages. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845879#comment-13845879 ] Hadoop QA commented on HDFS-5634: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618297/HDFS-5634.003.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5696//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5696//console This message is automatically generated. 
allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, HDFS-5634.003.patch, HDFS-5634.004.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5596) Implement RPC stubs
[ https://issues.apache.org/jira/browse/HDFS-5596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845915#comment-13845915 ] Chris Nauroth commented on HDFS-5596: - Nice work, Haohui. A few comments: # {{ClientProtocol}}: All RPCs can be annotated idempotent. The implementation will be such that repeated application of the same request will yield the same result. For example, a retried {{removeDefaultAcl}} call yields the same result whether the first call reaches the server, the second call reaches the server, or both. The end result is always the prior ACL entries with all default entries removed. # {{ReadonlyIterableAdaptor}}: (Optional) Do you think this is worth promoting to a top-level class in {{org.apache.hadoop.hdfs.util}}? It's not directly coupled to the rest of the serialization code, and perhaps it will be useful elsewhere. # {{DFSClient}}: There are a few more exception types that would be helpful to unwrap on the modification operations. I think the full list of interesting exceptions for all modification operations would be: {{AccessControlException}}, {{FileNotFoundException}}, {{SafeModeException}}, {{UnresolvedPathException}}, {{SnapshotAccessControlException}}, and {{NSQuotaExceededException}}. However, I'm also wondering if we ought to simplify the whole thing and call {{unwrapRemoteException}} with no args for all of these new methods. What do you think? # {{TestPBHelper}}: Let's add a test for conversion of {{AclStatus}} too. Implement RPC stubs --- Key: HDFS-5596 URL: https://issues.apache.org/jira/browse/HDFS-5596 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Attachments: HDFS-5596.000.patch Implement RPC stubs for both {{DistributedFileSystem}} and {{NameNodeRpcServer}}. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
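The idempotency argument in point 1 can be made concrete with a toy model (illustrative only; the entry format and class below are invented, not the HDFS ACL representation): applying the same removal once or twice leaves the same final state, so a retried RPC is harmless.

```java
import java.util.ArrayList;
import java.util.List;

// Toy ACL: removeDefaultAcl() strips every "default:"-prefixed entry.
// Running it a second time is a no-op, which is exactly what makes the
// operation safe to retry (idempotent) at the RPC layer.
public class AclState {
    private final List<String> entries = new ArrayList<>();

    public void add(String entry) { entries.add(entry); }

    public void removeDefaultAcl() {
        entries.removeIf(e -> e.startsWith("default:"));
    }

    public List<String> entries() { return entries; }
}
```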
[jira] [Commented] (HDFS-5350) Name Node should report fsimage transfer time as a metric
[ https://issues.apache.org/jira/browse/HDFS-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845961#comment-13845961 ] Hadoop QA commented on HDFS-5350: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618308/trunk-5350.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5697//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5697//console This message is automatically generated. 
Name Node should report fsimage transfer time as a metric - Key: HDFS-5350 URL: https://issues.apache.org/jira/browse/HDFS-5350 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Rob Weltman Assignee: Jimmy Xiang Priority: Minor Fix For: 3.0.0 Attachments: trunk-5350.patch If the (Secondary) Name Node reported fsimage transfer times (perhaps the last ten of them), monitoring tools could detect slowdowns that might jeopardize cluster stability. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-5634) allow BlockReaderLocal to switch between checksumming and not
[ https://issues.apache.org/jira/browse/HDFS-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845977#comment-13845977 ] Hadoop QA commented on HDFS-5634: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618311/HDFS-5634.004.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestBPOfferService {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5698//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5698//console This message is automatically generated. 
allow BlockReaderLocal to switch between checksumming and not - Key: HDFS-5634 URL: https://issues.apache.org/jira/browse/HDFS-5634 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 3.0.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5634.001.patch, HDFS-5634.002.patch, HDFS-5634.003.patch, HDFS-5634.004.patch BlockReaderLocal should be able to switch between checksumming and non-checksumming, so that when we get notifications that something is mlocked (see HDFS-5182), we can avoid checksumming when reading from that block. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845992#comment-13845992 ] Hadoop QA commented on HDFS-2832: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12618314/h2832_20131211.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 48 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated -12 warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup org.apache.hadoop.hdfs.server.balancer.TestBalancerWithEncryptedTransfer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5699//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5699//console This message is automatically generated. 
Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories correspond to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose, change to the current model where Datanode * is a * storage, to Datanode * is a collection * of strorages. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5657: - Attachment: HDFS-5657.001.patch race condition causes writeback state error in NFS gateway -- Key: HDFS-5657 URL: https://issues.apache.org/jira/browse/HDFS-5657 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5657.001.patch A race condition between NFS gateway writeback executor thread and new write handler thread can cause writeback state check failure, e.g., {noformat} 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending writes, fileId: 30938 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService (AsyncDataService.java:run(136)) - Asyn data service got error:java.lang.IllegalStateException: The openFileCtx has false async status at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890) at org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current filesize=917504 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 length:65536 stableHow:0 {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HDFS-5657) race condition causes writeback state error in NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5657: - Status: Patch Available (was: Open) race condition causes writeback state error in NFS gateway -- Key: HDFS-5657 URL: https://issues.apache.org/jira/browse/HDFS-5657 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5657.001.patch A race condition between NFS gateway writeback executor thread and new write handler thread can cause writeback state check failure, e.g., {noformat} 2013-11-26 10:34:07,859 DEBUG nfs3.RpcProgramNfs3 (Nfs3Utils.java:writeChannel(113)) - WRITE_RPC_CALL_END__957880843 2013-11-26 10:34:07,863 DEBUG nfs3.OpenFileCtx (OpenFileCtx.java:offerNextToWrite(832)) - The asyn write task has no pending writes, fileId: 30938 2013-11-26 10:34:07,871 ERROR nfs3.AsyncDataService (AsyncDataService.java:run(136)) - Asyn data service got error:java.lang.IllegalStateException: The openFileCtx has false async status at com.google.common.base.Preconditions.checkState(Preconditions.java:145) at org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx.executeWriteBack(OpenFileCtx.java:890) at org.apache.hadoop.hdfs.nfs.nfs3.AsyncDataService$WriteBackTask.run(AsyncDataService.java:134) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 2013-11-26 10:34:07,901 DEBUG nfs3.RpcProgramNfs3 (RpcProgramNfs3.java:write(707)) - requesed offset=917504 and current filesize=917504 2013-11-26 10:34:07,902 DEBUG nfs3.WriteManager (WriteManager.java:handleWrite(131)) - handleWrite fileId: 30938 offset: 917504 length:65536 stableHow:0 {noformat} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HDFS-4273) Problem in DFSInputStream read retry logic may cause early failure
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846002#comment-13846002 ] Liang Xie commented on HDFS-4273: - Oh, my stupid:) Problem in DFSInputStream read retry logic may cause early failure -- Key: HDFS-4273 URL: https://issues.apache.org/jira/browse/HDFS-4273 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, TestDFSInputStream.java Assume the following call logic {noformat} readWithStrategy() - blockSeekTo() - readBuffer() - reader.doRead() - seekToNewSource() add currentNode to deadnode, wish to get a different datanode - blockSeekTo() - chooseDataNode() - block missing, clear deadNodes and pick the currentNode again seekToNewSource() return false readBuffer() re-throw the exception quit loop readWithStrategy() got the exception, and may fail the read call before tried MaxBlockAcquireFailures. {noformat} some issues of the logic: 1. seekToNewSource() logic is broken because it may clear deadNodes in the middle. 2. the variable int retries=2 in readWithStrategy seems have conflict with MaxBlockAcquireFailures, should it be removed? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
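The failure mode in the issue description above can be modeled in a few lines (a toy sketch, not the real DFSInputStream; names are illustrative): when every replica is in the dead-node set, the chooser clears the set and hands back the node that just failed, so the caller concludes there is no new source and aborts before exhausting its retry budget.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the broken retry flow: chooseNode() clears deadNodes when
// all replicas appear dead, so seekToNewSource() can return the very node
// the caller just marked dead -- and then reports "no new source".
public class NodeChooser {
    private final List<String> replicas;
    private final Set<String> deadNodes = new HashSet<>();

    public NodeChooser(List<String> replicas) { this.replicas = replicas; }

    public String chooseNode() {
        for (String r : replicas) {
            if (!deadNodes.contains(r)) return r;
        }
        deadNodes.clear();       // "block missing": forget all failures...
        return replicas.get(0);  // ...and pick a just-failed node again
    }

    /** Returns false when no genuinely different datanode is available. */
    public boolean seekToNewSource(String current) {
        deadNodes.add(current);
        return !chooseNode().equals(current); // false => caller aborts early
    }
}
```

With a single replica, `seekToNewSource` returns false on the first failure, which is the early-failure path the issue title describes; the clear-and-retry belongs at the outer retry loop (where MaxBlockAcquireFailures is counted), not inside the seek.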