[jira] [Commented] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
[ https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299675#comment-14299675 ] Hadoop QA commented on HDFS-7720: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695700/HDFS-7720.0.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.balancer.TestBalancer org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.web.TestWebHDFSXAttr Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9389//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9389//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9389//console This message is automatically generated. > Quota by Storage Type API, tools and ClientNameNode Protocol changes > > > Key: HDFS-7720 > URL: https://issues.apache.org/jira/browse/HDFS-7720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch > > > Split the patch into small ones based on the feedback. 
This one covers the > HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.
[ https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299664#comment-14299664 ] Hadoop QA commented on HDFS-7719: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695692/HDFS-7719.000.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1191 javac compiler warnings (more than the trunk's current 152 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 48 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9388//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/patchReleaseAuditProblems.txt Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9388//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9388//console This message is automatically generated. > BlockPoolSliceStorage could not remove storageDir. 
> -- > > Key: HDFS-7719 > URL: https://issues.apache.org/jira/browse/HDFS-7719 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-7719.000.patch > > > The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume > level directories, thus {{BlockPoolSliceStorage}} could not directly compare > its own {{StorageDirs}} with this volume-level directory. The result of that > is {{BlockPoolSliceStorage}} did not actually remove the targeted > {{StorageDirectory}}. > This causes a failure when removing a volume and then immediately adding a volume > back with the same mount point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
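One way to implement the volume-to-StorageDir comparison the description calls for is a component-wise parent-path test. A minimal sketch, assuming the inputs can be reduced to plain path strings (the class and method names here are illustrative, not Hadoop's actual storage types):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative sketch: does a block-pool storage dir live under a volume root? */
public class VolumeMatch {
    public static boolean isUnderVolume(String storageDir, String volumeRoot) {
        Path dir = Paths.get(storageDir).normalize();
        Path root = Paths.get(volumeRoot).normalize();
        // Path#startsWith compares whole name components, so a "/mnt/disk10"
        // storage dir does not falsely match a "/mnt/disk1" volume root.
        return dir.startsWith(root);
    }
}
```

Comparing whole path components, rather than raw string prefixes, is what avoids the `/mnt/disk1` vs. `/mnt/disk10` false-match pitfall.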
[jira] [Commented] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
[ https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299665#comment-14299665 ] Hadoop QA commented on HDFS-7647: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695696/HDFS-7647-3.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.shortcircuit.TestShortCircuitCache org.apache.hadoop.hdfs.TestDataTransferKeepalive org.apache.hadoop.hdfs.TestDecommission org.apache.hadoop.hdfs.TestBlockReaderFactory Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9387//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9387//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9387//console This message is automatically generated. 
> DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs > -- > > Key: HDFS-7647 > URL: https://issues.apache.org/jira/browse/HDFS-7647 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Milan Desai >Assignee: Milan Desai > Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch > > > DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside > each LocatedBlock, but does not touch the array of StorageIDs and > StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are > mismatched. The method is called by FSNamesystem.getBlockLocations(), so the > client will not know which StorageID/Type corresponds to which DatanodeInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
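The mismatch described above is the classic parallel-array hazard: sorting one array while leaving its siblings untouched. A self-contained sketch of the lockstep sort such a fix needs (the names and the distance metric here are illustrative, not the actual Hadoop types):

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch: sort parallel arrays in lockstep so entries stay matched. */
public class LockstepSort {
    /** Sorts nodes by a distance metric and permutes storageIds/storageTypes identically. */
    public static void sortInLockstep(String[] nodes, String[] storageIds,
                                      String[] storageTypes, int[] distance) {
        Integer[] order = new Integer[nodes.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort an index array by the node metric, then apply the same
        // permutation to every parallel array.
        Arrays.sort(order, Comparator.comparingInt((Integer i) -> distance[i]));
        apply(order, nodes);
        apply(order, storageIds);
        apply(order, storageTypes);
    }

    private static void apply(Integer[] order, String[] a) {
        String[] copy = a.clone();
        for (int i = 0; i < a.length; i++) a[i] = copy[order[i]];
    }
}
```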
[jira] [Commented] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
[ https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299649#comment-14299649 ] Hadoop QA commented on HDFS-7720: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695711/HDFS-7720.1.patch against trunk revision 054a947. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestSnapshotRename The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9390//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9390//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9390//console This message is automatically generated. 
> Quota by Storage Type API, tools and ClientNameNode Protocol changes > > > Key: HDFS-7720 > URL: https://issues.apache.org/jira/browse/HDFS-7720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch > > > Split the patch into small ones based on the feedback. This one covers the > HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7712) Switch blockStateChangeLog to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299581#comment-14299581 ] Hadoop QA commented on HDFS-7712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695654/hdfs-7712.002.patch against trunk revision 09ad9a8. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistFiles org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes org.apache.hadoop.hdfs.TestFileCreationDelete org.apache.hadoop.hdfs.TestFileAppend4 org.apache.hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality org.apache.hadoop.hdfs.TestFileAppend2 org.apache.hadoop.hdfs.TestFileAppend3 org.apache.hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes org.apache.hadoop.hdfs.TestRenameWhileOpen org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestScrLazyPersistFiles org.apache.hadoop.hdfs.server.datanode.TestNNHandlesBlockReportPerStorage org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode org.apache.hadoop.hdfs.TestDatanodeDeath org.apache.hadoop.hdfs.server.datanode.TestNNHandlesCombinedBlockReport org.apache.hadoop.hdfs.TestFileCorruption The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestFileCreation Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9386//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9386//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9386//console This message is automatically generated. > Switch blockStateChangeLog to use slf4j > --- > > Key: HDFS-7712 > URL: https://issues.apache.org/jira/browse/HDFS-7712 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7712.001.patch, hdfs-7712.002.patch > > > As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will > save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
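The string-construction saving mentioned in the description is the usual slf4j argument: with parameterized logging, the message is only assembled when the level is actually enabled. A stdlib-only sketch of that idea (an illustration of the pattern, not slf4j's actual implementation):

```java
/** Stdlib-only sketch of why slf4j-style parameterized logging saves work:
 *  the message string is only built when the level is enabled. */
public class LazyLog {
    public static int formatCount = 0;  // instrumentation for the demo only
    private final boolean debugEnabled;

    public LazyLog(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    /** Mimics slf4j's logger.debug("added block {} to {}", a, b). */
    public void debug(String pattern, Object... args) {
        if (!debugEnabled) {
            return;  // no string construction at all on the hot path
        }
        formatCount++;
        StringBuilder sb = new StringBuilder();
        int from = 0, arg = 0;
        for (int at; (at = pattern.indexOf("{}", from)) >= 0 && arg < args.length; from = at + 2) {
            sb.append(pattern, from, at).append(args[arg++]);
        }
        sb.append(pattern.substring(from));
        System.out.println(sb);
    }
}
```

With the old commons-logging style, `log.debug("added block " + b + " to " + dn)` concatenates even when debug is off; the parameterized form defers that cost, which matters for something as chatty as blockStateChangeLog.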
[jira] [Updated] (HDFS-7723) Quota By Storage Type namenode implementation
[ https://issues.apache.org/jira/browse/HDFS-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7723: - Attachment: HDFS-7723.0.patch This patch assumes the ClientNameNode RPC protocol changes (HDFS-7720) are in. I will defer submitting the patch until the review for HDFS-7720 is finished. > Quota By Storage Type namenode implementation > > > Key: HDFS-7723 > URL: https://issues.apache.org/jira/browse/HDFS-7723 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao > Attachments: HDFS-7723.0.patch > > > This includes: 1) a new editlog op to persist quota by storage type 2) > corresponding fsimage load/save of the new op 3) a QuotaCount refactor to update > usage of the storage types for quota enforcement 4) snapshot support 5) unit > test updates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299554#comment-14299554 ] Tsz Wo Nicholas Sze commented on HDFS-5631: --- extdataset is missing in the branch-2 patch. Forgot to add the new files? > Expose interfaces required by FsDatasetSpi implementations > -- > > Key: HDFS-5631 > URL: https://issues.apache.org/jira/browse/HDFS-5631 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: Joe Pallas >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5631-LazyPersist.patch, > HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, > HDFS-5631.patch > > > This sub-task addresses section 4.1 of the document attached to HDFS-5194, > the exposure of interfaces needed by a FsDatasetSpi implementation. > Specifically it makes ChunkChecksum public and BlockMetadataHeader's > readHeader() and writeHeader() methods public. > The changes to BlockReaderUtil (and related classes) discussed by section > 4.1 are only needed if supporting short-circuit, and should be addressed > as part of an effort to provide such support rather than this JIRA. > To help ensure these changes are complete and are not regressed in the > future, tests that gauge the accessibility (though *not* behavior) > of interfaces needed by a FsDatasetSpi subclass are also included. > These take the form of a dummy FsDatasetSpi subclass -- a successful > compilation is effectively a pass. Trivial unit tests are included so > that there is something tangible to track. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
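The "successful compilation is effectively a pass" idea in HDFS-5631 can be shown with a toy analog. The interface below is a hypothetical stand-in, not the real FsDatasetSpi:

```java
/** Toy analog of the accessibility test: if the plugin interface (or any
 *  helper type it needs) ever becomes non-public, this dummy stops compiling,
 *  which is exactly the regression signal the JIRA wants. */
interface DatasetPlugin {            // stand-in for FsDatasetSpi
    long blockLength(String blockId);
}

public class DummyDataset implements DatasetPlugin {
    // Deliberately trivial: behavior is irrelevant, visibility is the test.
    @Override
    public long blockLength(String blockId) {
        return 0L;
    }
}
```

The trivial unit test that instantiates the dummy exists only so the build has something tangible to run and report.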
[jira] [Created] (HDFS-7723) Quota By Storage Type namenode implementation
Xiaoyu Yao created HDFS-7723: Summary: Quota By Storage Type namenode implementation Key: HDFS-7723 URL: https://issues.apache.org/jira/browse/HDFS-7723 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao This includes: 1) a new editlog op to persist quota by storage type 2) corresponding fsimage load/save of the new op 3) a QuotaCount refactor to update usage of the storage types for quota enforcement 4) snapshot support 5) unit test updates -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
[ https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7720: - Attachment: HDFS-7720.1.patch > Quota by Storage Type API, tools and ClientNameNode Protocol changes > > > Key: HDFS-7720 > URL: https://issues.apache.org/jira/browse/HDFS-7720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-7720.0.patch, HDFS-7720.1.patch > > > Split the patch into small ones based on the feedback. This one covers the > HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-7707: - Target Version/s: 2.7.0 > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into the edit log for the fileY under dirX, thus corrupting the edit log > (restarting the NN with that edit log would fail). > What HDFS-6825 does to fix this issue is detect whether fileY is already > deleted by checking the ancestor dirs on its path: if any of them doesn't > exist, then fileY is already deleted, and OP_CLOSE is not put into the edit log for > the file. > For this new edit log corruption, what I found was that the client first deleted > dirX recursively, then created another dir with exactly the same name as dirX > right away. Because HDFS-6825 counts on the namespace check (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this check, so OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. > What we need is a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
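Why the name-based ancestor check is defeated by delete-then-recreate can be shown in miniature: the recreated directory has the same name but is a different object, so a robust check must compare identity, not names. A toy sketch (the map and method names are illustrative, not HDFS internals):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: a name-based "was it deleted?" check is fooled when a directory
 *  is deleted and recreated under the same name, while an identity check
 *  (against the inode object captured when the file was opened) is not. */
public class DeletionCheck {
    static final Map<String, Object> namespace = new HashMap<>();  // name -> inode

    public static boolean stillPresentByName(String name) {
        return namespace.containsKey(name);         // HDFS-6825-style check
    }

    public static boolean stillPresentByIdentity(String name, Object inodeAtOpen) {
        return namespace.get(name) == inodeAtOpen;  // survives recreation
    }
}
```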
[jira] [Assigned] (HDFS-7701) Support quota by storage type output with "hadoop fs -count -q"
[ https://issues.apache.org/jira/browse/HDFS-7701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-7701: Assignee: Xiaoyu Yao > Support quota by storage type output with "hadoop fs -count -q" > --- > > Key: HDFS-7701 > URL: https://issues.apache.org/jira/browse/HDFS-7701 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > > "hadoop fs -count -q" currently shows name space/disk space quota and > remaining quota information. With HDFS-7584, we want to display per storage > type quota and its remaining information as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
[ https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7720: - Attachment: HDFS-7720.0.patch > Quota by Storage Type API, tools and ClientNameNode Protocol changes > > > Key: HDFS-7720 > URL: https://issues.apache.org/jira/browse/HDFS-7720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-7720.0.patch > > > Split the patch into small ones based on the feedback. This one covers the > HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
Lei (Eddy) Xu created HDFS-7722: --- Summary: DataNode#checkDiskError should also remove Storage when error is found. Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. As a result, we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
[ https://issues.apache.org/jira/browse/HDFS-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7720: - Status: Patch Available (was: Open) > Quota by Storage Type API, tools and ClientNameNode Protocol changes > > > Key: HDFS-7720 > URL: https://issues.apache.org/jira/browse/HDFS-7720 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao > Attachments: HDFS-7720.0.patch > > > Split the patch into small ones based on the feedback. This one covers the > HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7721) TestBlockScanner.testScanRateLimit may fail
Tsz Wo Nicholas Sze created HDFS-7721: - Summary: TestBlockScanner.testScanRateLimit may fail Key: HDFS-7721 URL: https://issues.apache.org/jira/browse/HDFS-7721 Project: Hadoop HDFS Issue Type: Bug Components: datanode Reporter: Tsz Wo Nicholas Sze - https://builds.apache.org/job/PreCommit-HDFS-Build/9375//testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockScanner/testScanRateLimit/ - https://builds.apache.org/job/PreCommit-HDFS-Build/9365//testReport/org.apache.hadoop.hdfs.server.datanode/TestBlockScanner/testScanRateLimit/ {code} java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.hadoop.hdfs.server.datanode.TestBlockScanner.testScanRateLimit(TestBlockScanner.java:439) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7696) FsDatasetImpl.getTmpInputStreams(..) may leak file descriptors
[ https://issues.apache.org/jira/browse/HDFS-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299518#comment-14299518 ] Brandon Li commented on HDFS-7696: -- +1. The patch looks good to me. > FsDatasetImpl.getTmpInputStreams(..) may leak file descriptors > -- > > Key: HDFS-7696 > URL: https://issues.apache.org/jira/browse/HDFS-7696 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo Nicholas Sze >Assignee: Tsz Wo Nicholas Sze > Attachments: h7696_20150128.patch > > > getTmpInputStreams(..) opens a block file and a meta file, and then returns > them as ReplicaInputStreams. The caller is responsible for closing those streams. > In case of errors, an exception is thrown without closing the files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
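The leak pattern described here, where the first open succeeds and the second throws, calls for the standard close-on-error idiom: release what you already hold before propagating the exception. A self-contained sketch (the `StreamSource` interface is a hypothetical stand-in for the file-opening calls):

```java
import java.io.IOException;
import java.io.InputStream;

/** Sketch of the close-on-error pattern such a fix needs: if the second
 *  open fails, the first stream is closed before the exception propagates. */
public class TwoStreams {
    /** Hypothetical stand-in for opening the block file / meta file. */
    public interface StreamSource {
        InputStream open() throws IOException;
    }

    public static InputStream[] openBoth(StreamSource block, StreamSource meta)
            throws IOException {
        InputStream blockIn = block.open();
        try {
            InputStream metaIn = meta.open();
            return new InputStream[] { blockIn, metaIn };  // caller closes both
        } catch (IOException e) {
            blockIn.close();  // don't leak the descriptor we already hold
            throw e;
        }
    }
}
```

On the success path, ownership of both streams transfers to the caller, which matches the contract stated in the description.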
[jira] [Updated] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
[ https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7647: -- Attachment: HDFS-7647-3.patch Thanks [~arpitagarwal] for the review, and sorry about the delay. 1. I returned the fields for {{storageIDs}} and {{storageTypes}} to store their cached versions. 2. Introduced method {{invalidateCachedStorageInfos}} to invalidate the arrays for {{storageIDs}} and {{storageTypes}}. It is called by {{sortLocatedBlocks}} after the sorting. 3. Added unit test {{TestDatanodeManager.testSortLocatedBlocks}}. 4. I added a comment to {{getLocations()}} saying the returned array is not expected to be modified, and if it is, caller must immediately invoke {{invalidateCachedStorageInfos}} from (2) Will open a separate Jira for making {{locs}} an immutable list. > DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs > -- > > Key: HDFS-7647 > URL: https://issues.apache.org/jira/browse/HDFS-7647 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Milan Desai >Assignee: Milan Desai > Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch > > > DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside > each LocatedBlock, but does not touch the array of StorageIDs and > StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are > mismatched. The method is called by FSNamesystem.getBlockLocations(), so the > client will not know which StorageID/Type corresponds to which DatanodeInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7647) DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs
[ https://issues.apache.org/jira/browse/HDFS-7647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Milan Desai updated HDFS-7647: -- Status: Patch Available (was: In Progress) > DatanodeManager.sortLocatedBlocks() sorts DatanodeInfos but not StorageIDs > -- > > Key: HDFS-7647 > URL: https://issues.apache.org/jira/browse/HDFS-7647 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Milan Desai >Assignee: Milan Desai > Attachments: HDFS-7647-2.patch, HDFS-7647-3.patch, HDFS-7647.patch > > > DatanodeManager.sortLocatedBlocks() sorts the array of DatanodeInfos inside > each LocatedBlock, but does not touch the array of StorageIDs and > StorageTypes. As a result, the DatanodeInfos and StorageIDs/StorageTypes are > mismatched. The method is called by FSNamesystem.getBlockLocations(), so the > client will not know which StorageID/Type corresponds to which DatanodeInfo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-5631: - Target Version/s: 3.0.0, 2.7.0 (was: 3.0.0) > Expose interfaces required by FsDatasetSpi implementations > -- > > Key: HDFS-5631 > URL: https://issues.apache.org/jira/browse/HDFS-5631 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: Joe Pallas >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5631-LazyPersist.patch, > HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, > HDFS-5631.patch > > > This sub-task addresses section 4.1 of the document attached to HDFS-5194, > the exposure of interfaces needed by a FsDatasetSpi implementation. > Specifically it makes ChunkChecksum public and BlockMetadataHeader's > readHeader() and writeHeader() methods public. > The changes to BlockReaderUtil (and related classes) discussed by section > 4.1 are only needed if supporting short-circuit, and should be addressed > as part of an effort to provide such support rather than this JIRA. > To help ensure these changes are complete and are not regressed in the > future, tests that gauge the accessibility (though *not* behavior) > of interfaces needed by a FsDatasetSpi subclass are also included. > These take the form of a dummy FsDatasetSpi subclass -- a successful > compilation is effectively a pass. Trivial unit tests are included so > that there is something tangible to track. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7720) Quota by Storage Type API, tools and ClientNameNode Protocol changes
Xiaoyu Yao created HDFS-7720: Summary: Quota by Storage Type API, tools and ClientNameNode Protocol changes Key: HDFS-7720 URL: https://issues.apache.org/jira/browse/HDFS-7720 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Split the patch into small ones based on the feedback. This one covers the HDFS API changes, tool changes and ClientNameNode protocol changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-5631) Expose interfaces required by FsDatasetSpi implementations
[ https://issues.apache.org/jira/browse/HDFS-5631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-5631: - Attachment: HDFS-5631-branch-2.patch Added patch for branch-2. > Expose interfaces required by FsDatasetSpi implementations > -- > > Key: HDFS-5631 > URL: https://issues.apache.org/jira/browse/HDFS-5631 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: Joe Pallas >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5631-LazyPersist.patch, > HDFS-5631-LazyPersist.patch, HDFS-5631-branch-2.patch, HDFS-5631.patch, > HDFS-5631.patch > > > This sub-task addresses section 4.1 of the document attached to HDFS-5194, > the exposure of interfaces needed by a FsDatasetSpi implementation. > Specifically it makes ChunkChecksum public and BlockMetadataHeader's > readHeader() and writeHeader() methods public. > The changes to BlockReaderUtil (and related classes) discussed by section > 4.1 are only needed if supporting short-circuit, and should be addressed > as part of an effort to provide such support rather than this JIRA. > To help ensure these changes are complete and are not regressed in the > future, tests that gauge the accessibility (though *not* behavior) > of interfaces needed by a FsDatasetSpi subclass are also included. > These take the form of a dummy FsDatasetSpi subclass -- a successful > compilation is effectively a pass. Trivial unit tests are included so > that there is something tangible to track. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Moved] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.
[ https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu moved HADOOP-11530 to HDFS-7719: -- Target Version/s: 3.0.0, 2.7.0 (was: 3.0.0, 2.7.0) Affects Version/s: (was: 2.6.0) 2.6.0 Key: HDFS-7719 (was: HADOOP-11530) Project: Hadoop HDFS (was: Hadoop Common) > BlockPoolSliceStorage could not remove storageDir. > -- > > Key: HDFS-7719 > URL: https://issues.apache.org/jira/browse/HDFS-7719 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > > The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume > level directories, thus {{BlockPoolSliceStorage}} could not directly compare > its own {{StorageDirs}} with this volume-level directory. The result of that > is {{BlockPoolSliceStorage}} did not actually remove the targeted > {{StorageDirectory}}. > This causes a failure when removing a volume and then immediately adding a volume > back with the same mount point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.
[ https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7719: Status: Patch Available (was: Open) > BlockPoolSliceStorage could not remove storageDir. > -- > > Key: HDFS-7719 > URL: https://issues.apache.org/jira/browse/HDFS-7719 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Lei (Eddy) Xu >Assignee: Lei (Eddy) Xu > Attachments: HDFS-7719.000.patch > > > The parameter of {{BlockPoolSliceStorage#removeVolumes()}} is a set of volume > level directories, thus {{BlockPoolSliceStorage}} could not directly compare > its own {{StorageDirs}} with this volume-level directory. The result of that > is {{BlockPoolSliceStorage}} did not actually remove the targeted > {{StorageDirectory}}. > This causes a failure when removing a volume and then immediately adding a volume > back with the same mount point. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7719) BlockPoolSliceStorage could not remove storageDir.
[ https://issues.apache.org/jira/browse/HDFS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei (Eddy) Xu updated HDFS-7719: Attachment: HDFS-7719.000.patch This patch makes {{BlockPoolSliceStorage#removeVolumes}} check whether each targeted directory is a parent directory of a {{StorageDirectory}}, rather than comparing the paths directly. A test is added to enforce the behavior.
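The parent-directory check described in the patch comment can be sketched in isolation. This is not the actual HDFS-7719 patch: {{BlockPoolSliceStorage}} holds richer {{StorageDirectory}} objects, so the plain {{java.io.File}} collections and the helper names below are illustrative assumptions that only demonstrate why walking up the parents matches where direct path comparison does not.

```java
import java.io.File;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Generic sketch of the parent-directory check: a storage directory such as
// /data1/dfs/current/BP-xxx must be removed when the caller passes the
// volume-level directory /data1/dfs. Comparing the two paths for equality
// (the reported bug) never matches; walking up the parents (the fix) does.
class RemoveVolumesSketch {
    /** True if {@code dir} equals or lies under one of {@code volumes}. */
    static boolean isUnderVolume(File dir, Set<File> volumes) {
        for (File f = dir; f != null; f = f.getParentFile()) {
            if (volumes.contains(f)) {
                return true;
            }
        }
        return false;
    }

    /** Remove every storage dir that belongs to one of the given volumes. */
    static void removeVolumes(List<File> storageDirs, Set<File> volumes) {
        for (Iterator<File> it = storageDirs.iterator(); it.hasNext(); ) {
            if (isUnderVolume(it.next(), volumes)) {
                it.remove();
            }
        }
    }
}
```

With this check in place, removing a volume and immediately re-adding the same mount point no longer collides with a stale {{StorageDirectory}} entry, which is the failure mode the issue describes.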
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299493#comment-14299493 ] Zhe Zhang commented on HDFS-7339: - bq. I guess the concern is that with EC we will be going through the block ID space much faster since you'll allocate 9 IDs per physical block. Is that correct? We have used Jing's proposal and allocated negative block IDs to EC blocks. Within that range ({{LONG.MIN ~ 0}}), 16 IDs will be allocated to each group. A physical block _could_ use 16 IDs, if the containing file is smaller than a block. In large files, each block group will have multiple blocks. > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, > Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. 
> The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work.
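The allocation scheme discussed in the comment above (group IDs drawn sequentially from the negative half of the block ID space, with 16 IDs reserved per group) can be sketched as follows. The class and method names are invented for illustration and do not mirror the HDFS-7339 patch.

```java
// Sketch of sequential block-group ID allocation in the negative ID space:
// 16 IDs are reserved per group, so a group ID always has its low 4 bits
// clear and member blocks occupy groupId .. groupId + 15. Names are
// illustrative, not taken from the actual patch.
class BlockGroupIdSketch {
    // 2^4 = 16 IDs per group; the low 4 bits index the block within a group.
    static final int BITS_PER_GROUP = 4;
    static final long GROUP_SIZE = 1L << BITS_PER_GROUP;

    private long nextGroupId = Long.MIN_VALUE;  // allocation starts at LONG.MIN

    /** Allocate the next block-group ID (negative, multiple of 16). */
    synchronized long nextBlockGroupId() {
        long id = nextGroupId;
        nextGroupId += GROUP_SIZE;
        return id;
    }

    /** ID of the i-th block (0..15) inside a group. */
    static long memberBlockId(long groupId, int indexInGroup) {
        return groupId + indexInGroup;
    }

    /** Recover the group ID from any member block ID by masking the low bits. */
    static long groupIdOf(long blockId) {
        return blockId & ~(GROUP_SIZE - 1);
    }
}
```

This also illustrates the trade-off raised in the thread: a small file that fits in one physical block still consumes all 16 IDs of its group, so the ID space is consumed faster than under replication, but the negative half of a 64-bit space leaves ample headroom.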
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299475#comment-14299475 ] Arpit Agarwal commented on HDFS-7339: - Yes I think not reserving will be fine. I guess the concern is that with EC we will be going through the block ID space much faster since you'll allocate 9 IDs per physical block. Is that correct?
[jira] [Updated] (HDFS-7718) DFSClient objects created by AbstractFileSystem objects created by FileContext are not closed and results in thread leakage
[ https://issues.apache.org/jira/browse/HDFS-7718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun Suresh updated HDFS-7718: -- Description: Currently, the {{FileContext}} class used by clients such as {{YARNRunner}} creates a new {{AbstractFileSystem}} object on initialization, which creates a new {{DFSClient}} object, which in turn creates a KeyProvider object. If encryption and https are both turned on, the KeyProvider implementation (the {{KMSClientProvider}}) creates a {{ReloadingX509TrustManager}} thread per instance; these threads are never stopped and can lead to a thread leak. (was: Currently, the {{FileContext}} class used by clients such as (for eg. {{YARNRunner}}) creates new {{AbstractFilesystem}} object on initialization.. which creates new {{DFSClient}} objects.. which in turn creates KeyProvider objects.. If Encryption is turned on, and https is turned on, the keyprovider implementation (the {{KMSClientProvider}}) will create a {{ReloadingX509TrustManager}} per instance... which are never killed and can leak)
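The leak pattern described here, a constructor that spawns a background refresher thread which nothing ever stops, can be reproduced in miniature. {{LeakyClient}} below is purely illustrative: it stands in for the {{KMSClientProvider}}/{{ReloadingX509TrustManager}} pair and assumes the refresher runs as a daemon thread; it is not Hadoop code.

```java
// Miniature reproduction of the leak: each client starts a background
// refresher thread in its constructor (as KMSClientProvider does with
// ReloadingX509TrustManager). Without close(), the threads outlive their
// owners and accumulate. Class and method names are invented.
class ThreadLeakSketch {
    static class LeakyClient implements AutoCloseable {
        final Thread refresher = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(50);  // stand-in for periodic truststore reload
                }
            } catch (InterruptedException e) {
                // interrupted: fall through and let the thread exit
            }
        });

        LeakyClient() {
            refresher.setDaemon(true);  // assumed daemon, like the real refresher
            refresher.start();
        }

        @Override
        public void close() {
            refresher.interrupt();  // the step the leaking callers never perform
        }
    }

    /** Create n clients; return how many refresher threads are still alive. */
    static int leakedThreads(int n, boolean closeThem) {
        LeakyClient[] clients = new LeakyClient[n];
        for (int i = 0; i < n; i++) {
            clients[i] = new LeakyClient();
        }
        if (closeThem) {
            for (LeakyClient c : clients) {
                c.close();
            }
            try {
                Thread.sleep(300);  // give interrupted threads time to exit
            } catch (InterruptedException ignored) {
            }
        }
        int alive = 0;
        for (LeakyClient c : clients) {
            if (c.refresher.isAlive()) {
                alive++;
            }
        }
        return alive;
    }
}
```

The point of the sketch is that the fix belongs on the owner's side: whichever layer creates the {{DFSClient}} (here, the {{FileContext}}-created {{AbstractFileSystem}}) must eventually close it so the per-instance thread is stopped.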
[jira] [Commented] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in
[ https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299467#comment-14299467 ] Chris Nauroth commented on HDFS-7520: - My best guess is that this happens when the build finds an OpenSSL, but it's too old for us to use. According to the CMake logic, we'd skip compilation of OpensslCipher.c:
{code}
if (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
    GET_FILENAME_COMPONENT(HADOOP_OPENSSL_LIBRARY ${OPENSSL_LIBRARY} NAME)
    INCLUDE(CheckCSourceCompiles)
    SET(OLD_CMAKE_REQUIRED_INCLUDES ${CMAKE_REQUIRED_INCLUDES})
    SET(CMAKE_REQUIRED_INCLUDES ${OPENSSL_INCLUDE_DIR})
    CHECK_C_SOURCE_COMPILES("#include \"${OPENSSL_INCLUDE_DIR}/openssl/evp.h\"\nint main(int argc, char **argv) { return !EVP_aes_256_ctr; }" HAS_NEW_ENOUGH_OPENSSL)
    SET(CMAKE_REQUIRED_INCLUDES ${OLD_CMAKE_REQUIRED_INCLUDES})
    if(NOT HAS_NEW_ENOUGH_OPENSSL)
        MESSAGE("The OpenSSL library installed at ${OPENSSL_LIBRARY} is too old. You need a version at least new enough to have EVP_aes_256_ctr.")
    else(NOT HAS_NEW_ENOUGH_OPENSSL)
        SET(USABLE_OPENSSL 1)
    endif(NOT HAS_NEW_ENOUGH_OPENSSL)
endif (OPENSSL_LIBRARY AND OPENSSL_INCLUDE_DIR)
if (USABLE_OPENSSL)
    SET(OPENSSL_SOURCE_FILES
        "${D}/crypto/OpensslCipher.c"
        "${D}/crypto/random/OpensslSecureRandom.c")
{code}
However, the check for {{buildSupportsOpenssl}} is driven by {{HADOOP_OPENSSL_LIBRARY}}, and I believe the CMake logic still left that defined:
{code}
JNIEXPORT jboolean JNICALL Java_org_apache_hadoop_util_NativeCodeLoader_buildSupportsOpenssl
  (JNIEnv *env, jclass clazz)
{
#ifdef HADOOP_OPENSSL_LIBRARY
  return JNI_TRUE;
#else
  return JNI_FALSE;
#endif
}
{code}
At the Java layer, this would cause it to think the build supports OpenSSL, so it calls {{initIDs}}, but the symbol isn't really in libhadoop.so. Therefore, it's an {{UnsatisfiedLinkError}} with its message set to the signature of the Java native method.
Colin, if you know you saw this happening with a particular version of OpenSSL, would you please comment? That would help Anu with a repro. Thanks! > checknative should display a nicer error message when openssl support is not > compiled in > > > Key: HDFS-7520 > URL: https://issues.apache.org/jira/browse/HDFS-7520 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Anu Engineer > > checknative should display a nicer error message when openssl support is not > compiled in. Currently, it displays this: > {code} > [cmccabe@keter hadoop]$ hadoop checknative > 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized > native-bzip2 library system-native > 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized > native-zlib library > Native library checking: > hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 > zlib:true /lib64/libz.so.1 > snappy: true /usr/lib64/libsnappy.so.1 > lz4: true revision:99 > bzip2: true /lib64/libbz2.so.1 > openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V > {code} > Instead, we should display something like this, if openssl is not supported > by the current build: > {code} > openssl: false Hadoop was built without openssl support. > {code}
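The nicer message requested in this issue essentially means consulting the build flag before attempting to load the cipher. The sketch below captures only that decision flow; the boolean and string arguments stand in for {{NativeCodeLoader.buildSupportsOpenssl()}} and a loading-failure reason, and the class and method names are invented, not Hadoop's actual checknative code.

```java
// Sketch of the improved checknative reporting logic requested in HDFS-7520:
// when the build was compiled without OpenSSL, print a readable reason
// instead of surfacing the raw UnsatisfiedLinkError (which is just the JNI
// method signature). Names here are stand-ins, not the real implementation.
class ChecknativeSketch {
    /** Decide what the "openssl:" line of checknative should print. */
    static String opensslStatusLine(boolean buildSupportsOpenssl,
                                    String loadFailureReason) {
        if (!buildSupportsOpenssl) {
            // The case this issue is about: say so plainly.
            return "openssl: false Hadoop was built without openssl support.";
        }
        if (loadFailureReason != null) {
            // Built with OpenSSL, but the library failed to load at runtime.
            return "openssl: false " + loadFailureReason;
        }
        return "openssl: true";
    }
}
```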
[jira] [Created] (HDFS-7718) DFSClient objects created by AbstractFileSystem objects created by FileContext are not closed and results in thread leakage
Arun Suresh created HDFS-7718: - Summary: DFSClient objects created by AbstractFileSystem objects created by FileContext are not closed and results in thread leakage Key: HDFS-7718 URL: https://issues.apache.org/jira/browse/HDFS-7718 Project: Hadoop HDFS Issue Type: Bug Reporter: Arun Suresh Assignee: Arun Suresh Currently, the {{FileContext}} class used by clients such as {{YARNRunner}} creates a new {{AbstractFileSystem}} object on initialization, which creates a new {{DFSClient}} object, which in turn creates a KeyProvider object. If encryption and https are both turned on, the KeyProvider implementation (the {{KMSClientProvider}}) creates a {{ReloadingX509TrustManager}} per instance; these are never stopped and can leak.
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299454#comment-14299454 ] Jing Zhao commented on HDFS-7339: - I think not reserving currently should be fine. If we find we need to reserve, we can reserve from the other end of the block group id space.
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299446#comment-14299446 ] Zhe Zhang commented on HDFS-7339: - Thanks [~arpitagarwal]. I feel reserving some block IDs makes sense, but the current value is probably too large. Would be nice to hear from others.
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299443#comment-14299443 ] Arpit Agarwal commented on HDFS-7339: - bq. Do you know why we reserve 1 billion block IDs (LAST_RESERVED_BLOCK_ID) in the current block ID generator? So we could assign a special meaning to some block IDs in the future, if necessary. However the reservation was not useful in hindsight. We can free up this range for use.
[jira] [Created] (HDFS-7717) Erasure Coding: provide a tool for convert files between replication and erasure coding
Jing Zhao created HDFS-7717: --- Summary: Erasure Coding: provide a tool for convert files between replication and erasure coding Key: HDFS-7717 URL: https://issues.apache.org/jira/browse/HDFS-7717 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao We need a tool to do offline conversion between replication and erasure coding. The tool itself can either utilize MR just like the current distcp, or act like the balancer/mover.
[jira] [Resolved] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang resolved HDFS-7339. - Resolution: Fixed
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299434#comment-14299434 ] Zhe Zhang commented on HDFS-7339: - I just committed the patch to HDFS-EC. Thanks a lot for the reviews from [~jingzhao], [~szetszwo], [~andrew.wang], and [~vinayrpet]!
[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7339: Hadoop Flags: Reviewed
[jira] [Created] (HDFS-7716) Erasure Coding: extend BlockInfo to handle EC info
Jing Zhao created HDFS-7716: --- Summary: Erasure Coding: extend BlockInfo to handle EC info Key: HDFS-7716 URL: https://issues.apache.org/jira/browse/HDFS-7716 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao The current BlockInfo implementation only supports the replication mechanism. To reuse the same blocksMap for handling a block group and its data/parity blocks, we need to define a new BlockGroupInfo class.
[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7339: Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299420#comment-14299420 ] Jing Zhao commented on HDFS-7339: - Thanks for the quick update, Zhe! bq. Do you know why we reserve 1 billion block IDs (LAST_RESERVED_BLOCK_ID) in the current block ID generator? Actually I'm not very sure about the reason. Maybe [~arpitagarwal] can comment. +1 for the current patch. In the meanwhile I just created a jira to address {{BlockGroupInfo}} and {{BlockInfo}}.
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299413#comment-14299413 ] Andrew Wang commented on HDFS-7411: --- RAT complains about a psd file? seems spurious. > Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, > hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, > hdfs-7411.009.patch, hdfs-7411.010.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299410#comment-14299410 ] Hadoop QA commented on HDFS-7339: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695650/HDFS-7339-008.patch against trunk revision 8dc59cb. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9385//console This message is automatically generated. > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, > Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. 
> The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in
[ https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299407#comment-14299407 ] Anu Engineer commented on HDFS-7520: I looked at this code path and it looks like if Hadoop was indeed compiled without OpenSSL you would have gotten the following message "build does not support openssl." This failure seems to have come from the initIDs call, which calls into native code. To understand why the loading failed, I need to know which OS you are running, your LD config info and which version of the OpenSSL shared objects are in your path. In other words, I need more info on how to reproduce this bug. This error message is certainly not due to Hadoop being compiled without openssl support. It is most probably due to a runtime error. > checknative should display a nicer error message when openssl support is not > compiled in > > > Key: HDFS-7520 > URL: https://issues.apache.org/jira/browse/HDFS-7520 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Anu Engineer > > checknative should display a nicer error message when openssl support is not > compiled in. Currently, it displays this: > {code} > [cmccabe@keter hadoop]$ hadoop checknative > 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized > native-bzip2 library system-native > 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized > native-zlib library > Native library checking: > hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 > zlib:true /lib64/libz.so.1 > snappy: true /usr/lib64/libsnappy.so.1 > lz4: true revision:99 > bzip2: true /lib64/libbz2.so.1 > openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V > {code} > Instead, we should display something like this, if openssl is not supported > by the current build: > {code} > openssl: false Hadoop was built without openssl support. 
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7712: -- Attachment: hdfs-7712.002.patch Woops missed a file. > Switch blockStateChangeLog to use slf4j > --- > > Key: HDFS-7712 > URL: https://issues.apache.org/jira/browse/HDFS-7712 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7712.001.patch, hdfs-7712.002.patch > > > As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will > save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhe Zhang updated HDFS-7339: Attachment: HDFS-7339-008.patch > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, HDFS-7339-008.patch, > Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. > The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299387#comment-14299387 ] Colin Patrick McCabe commented on HDFS-7648: bq. The original design of DirectoryScanner is to reconciles the differences between the block information maintained in memory and the actual blocks stored in disks. So it does fix the in-memory data structure. Fixing the in-memory data structure is different than fixing the on-disk data structure. I do not think that the DirectoryScanner should modify the files on the disk. It just introduces too much potential for error and mistakes in the scanner to cause data loss. bq. Yet more questions if the blocks are not fixed: should the block report include those blocks? How to access those blocks? How and when to fix those blocks? The only way we could ever get into this state is: * if someone manually renamed some block files on ext4 * if someone introduced a bug in the datanode code that put blocks in the wrong place. * if there is serious ext4 filesystem corruption None of those cases seems like something we should be trying to automatically recover from. > Verify the datanode directory layout > > > Key: HDFS-7648 > URL: https://issues.apache.org/jira/browse/HDFS-7648 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo Nicholas Sze >Assignee: Rakesh R > > HDFS-6482 changed datanode layout to use block ID to determine the directory > to store the block. We should have some mechanism to verify it. Either > DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7712) Switch blockStateChangeLog to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299378#comment-14299378 ] Hadoop QA commented on HDFS-7712: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695641/hdfs-7712.001.patch against trunk revision 8635822. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9384//console This message is automatically generated. > Switch blockStateChangeLog to use slf4j > --- > > Key: HDFS-7712 > URL: https://issues.apache.org/jira/browse/HDFS-7712 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7712.001.patch > > > As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will > save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299370#comment-14299370 ] Zhe Zhang commented on HDFS-7339: - Thanks [~jingzhao] for the helpful review! bq. Instead of the current ID division mechanism (calculating the mid point between LAST_RESERVED_BLOCK_ID and LONG.MAX), can we simply let the block group id take all the negative long space (i.e., with first bit set to 1)? In this way we can utilize larger space and use simple bit manipulations for id generation/checking. I think this is a good idea. With the current HDFS block ID generator, negative IDs will be used only when all positive ones are used up (i.e., the long value [reaches max | http://stackoverflow.com/questions/8513826/atomicinteger-incrementation]). With your proposal, regular block IDs are less likely to "grow into" the block group ID space. bq. Why do we need to reserve the first 1024 block group ids? Do you know why we reserve 1 billion block IDs ({{LAST_RESERVED_BLOCK_ID}}) in the current block ID generator? I couldn't figure out the exact reason, so I chose to do the same for block groups. bq. If we directly extend the current BlockInfo to BlockGroupInfo, the semantic of the triplets may be different for BlockGroupInfo. One possible solution is to let triplets's size be 3*(k+m), where k is the number of data blocks and m is the number of the parity blocks. The 007 patch already attempts to do that but didn't finish -- if the file {{isStriped()}}, then the group size (currently hardcoded; it will be configurable with HDFS-7337) will be used instead of {{getReplication()}} to choose targets. The updated patch will further use this logic to create the {{BlockInfo}} object. Then there will naturally be {{3*(k+m)}} elements in {{triplets}}. bq. The above #3 and #4 may need some extra refactoring work on the current BlockInfo class. I'm also fine with moving this part of work to a separate jira. I agree. 
{{BlockGroupInfo}} is for optimization. We should commit this patch faster to facilitate a working prototype. I took it out in the new patch. > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. > The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
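[Editor's illustration] The sign-bit scheme Jing proposed above (block group IDs take the entire negative long space) can be sketched as follows. This is a hypothetical illustration, not the patch's actual ID generator; the method names are assumptions:

```java
// Sketch of block-group ID handling when group IDs occupy the negative
// long space: generation and membership checks are single bit operations.
public class BlockGroupIdSketch {
    // A block group ID has the top (sign) bit set, so it is negative.
    static boolean isBlockGroupId(long id) {
        return id < 0;
    }

    // Derive a block group ID from a sequential counter by setting the sign bit.
    static long toBlockGroupId(long counter) {
        return counter | Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long groupId = toBlockGroupId(42L);
        System.out.println(isBlockGroupId(groupId)); // true
        System.out.println(isBlockGroupId(1000L));   // false: a regular block ID
    }
}
```

Regular block IDs stay positive until the generator's long counter overflows, so under this scheme they are unlikely to collide with the group ID space.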
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299353#comment-14299353 ] Hadoop QA commented on HDFS-7411: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695598/hdfs-7411.010.patch against trunk revision 951b360. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9383//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9383//artifact/patchprocess/patchReleaseAuditProblems.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9383//console This message is automatically generated. 
> Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, > hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, > hdfs-7411.009.patch, hdfs-7411.010.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7712: -- Status: Patch Available (was: Open) > Switch blockStateChangeLog to use slf4j > --- > > Key: HDFS-7712 > URL: https://issues.apache.org/jira/browse/HDFS-7712 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7712.001.patch > > > As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will > save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7712) Switch blockStateChangeLog to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7712: -- Attachment: hdfs-7712.001.patch Patch attached. [~kihwal] willing to review? > Switch blockStateChangeLog to use slf4j > --- > > Key: HDFS-7712 > URL: https://issues.apache.org/jira/browse/HDFS-7712 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.7.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7712.001.patch > > > As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will > save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.
[ https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299338#comment-14299338 ] Kihwal Lee commented on HDFS-7714: -- On a related note, I've seen similar symptoms when the two namenodes' ctimes in their storage are different. After a datanode registers with one nn, it won't be able to register with the other, causing the actor thread to die. Depending on which namenode each datanode talks to first, the datanodes will be divided into two sets, each talking to only one namenode, thus creating a split-brain situation. Of course, running two namenodes with different storage versions is a mistake, but I've seen people make this kind of mistake multiple times. Whenever it happened, I wished for a way to start the actor thread back up. The refreshNamenodes dfs admin command does not work for HA configurations. > Simultaneous restart of HA NameNodes and DataNode can cause DataNode to > register successfully with only one NameNode. > - > > Key: HDFS-7714 > URL: https://issues.apache.org/jira/browse/HDFS-7714 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Chris Nauroth > > In an HA deployment, DataNodes must register with both NameNodes and send > periodic heartbeats and block reports to both. However, if NameNodes and > DataNodes are restarted simultaneously, then this can trigger a race > condition in registration. The end result is that the {{BPServiceActor}} for > one NameNode terminates, but the {{BPServiceActor}} for the other NameNode > remains alive. The DataNode process is then in a "half-alive" state where it > only heartbeats and sends block reports to one of the NameNodes. This could > cause a loss of storage capacity after an HA failover. The DataNode process > would have to be restarted to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299300#comment-14299300 ] Tsz Wo Nicholas Sze commented on HDFS-7411: --- > This statement is false. Configuration compatibility was the core of the > above discussion. ... Sure, there is a discussion of how to be compatible with the old conf. However, it never mentions that the decision is to have an incompatible change. Anyway, thanks for the update. Will review the patch. > Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, > hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, > hdfs-7411.009.patch, hdfs-7411.010.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7608: - Attachment: HDFS-7608.0.patch Posted a patch for the DFSClient newConnectedPeer write timeout. > hdfs dfsclient newConnectedPeer has no write timeout > - > > Key: HDFS-7608 > URL: https://issues.apache.org/jira/browse/HDFS-7608 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, fuse-dfs >Affects Versions: 2.3.0, 2.6.0 > Environment: hdfs 2.3.0 hbase 0.98.6 >Reporter: zhangshilong >Assignee: Xiaoyu Yao > Labels: patch > Fix For: 2.6.0 > > Attachments: HDFS-7608.0.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > problem: > hbase compactSplitThread may lock forever on reading datanode blocks. > debug found: epollwait timeout set to 0, so epollwait never times out. > cause: in hdfs 2.3.0 > hbase uses DFSClient to read and write blocks. > DFSClient creates one socket using newConnectedPeer(addr), but has no read > or write timeout. > In v2.6.0, newConnectedPeer added a readTimeout to deal with the > problem, but did not add a writeTimeout. Why was a write timeout not added? > I think NioInetPeer needs a default socket timeout, so applications will not > need to add timeouts themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao reassigned HDFS-7608: Assignee: Xiaoyu Yao > hdfs dfsclient newConnectedPeer has no write timeout > - > > Key: HDFS-7608 > URL: https://issues.apache.org/jira/browse/HDFS-7608 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, fuse-dfs >Affects Versions: 2.3.0, 2.6.0 > Environment: hdfs 2.3.0 hbase 0.98.6 >Reporter: zhangshilong >Assignee: Xiaoyu Yao > Labels: patch > Fix For: 2.6.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > problem: > hbase compactSplitThread may lock forever on reading datanode blocks. > debug found: epollwait timeout set to 0, so epollwait never times out. > cause: in hdfs 2.3.0 > hbase uses DFSClient to read and write blocks. > DFSClient creates one socket using newConnectedPeer(addr), but has no read > or write timeout. > In v2.6.0, newConnectedPeer added a readTimeout to deal with the > problem, but did not add a writeTimeout. Why was a write timeout not added? > I think NioInetPeer needs a default socket timeout, so applications will not > need to add timeouts themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7715) Implement the Hitchhiker erasure coding algorithm
Zhe Zhang created HDFS-7715: --- Summary: Implement the Hitchhiker erasure coding algorithm Key: HDFS-7715 URL: https://issues.apache.org/jira/browse/HDFS-7715 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25% and 45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms. The existing implementation is based on HDFS-RAID. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7710) Remove dead code in BackupImage.java
[ https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299260#comment-14299260 ] Hadoop QA commented on HDFS-7710: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695580/HDFS-7710.0.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9382//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9382//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9382//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9382//console This message is automatically generated. 
> Remove dead code in BackupImage.java > > > Key: HDFS-7710 > URL: https://issues.apache.org/jira/browse/HDFS-7710 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Attachments: HDFS-7710.0.patch > > > BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is > proposed to clean it up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use
[ https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299233#comment-14299233 ] Tianyin Xu commented on HDFS-7684: -- Yes, exactly. It seems that Hadoop has a bunch of such trimming issues that bothered a number of users... Thanks, Xiaoyu! ~t > The host:port settings of dfs.namenode.secondary.http-address should be > trimmed before use > -- > > Key: HDFS-7684 > URL: https://issues.apache.org/jira/browse/HDFS-7684 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.1 >Reporter: Tianyin Xu >Assignee: Anu Engineer > > With the following setting, > > dfs.namenode.secondary.http-address > myhostname:50090 > > The secondary NameNode could not be started > $ hadoop-daemon.sh start secondarynamenode > starting secondarynamenode, logging to > /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out > /home/hadoop/hadoop-2.4.1/bin/hdfs > Exception in thread "main" java.lang.IllegalArgumentException: Does not > contain a valid host:port authority: myhostname:50090 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:192) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651) > We were really confused and misled by the log message: we thought about the > DNS problems (changed to IP address but no success) and the network problem > (tried to test the connections with no success...) 
> It turned out to be that the setting is not trimmed and the additional space > character in the end of the setting caused the problem... OMG!!!... > Searching on the Internet, we find we are really not alone. So many users > encountered similar trim problems! The following lists a few: > http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html > http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh > https://issues.apache.org/jira/browse/HDFS-2799 > https://issues.apache.org/jira/browse/HBASE-6973 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use
[ https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299229#comment-14299229 ] Xiaoyu Yao commented on HDFS-7684: -- Thanks [~tianyin] for reporting this. The one that you hit can be fixed by changing conf.get to conf.getTrimmed. {code} final String httpsAddrString = conf.get( DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_KEY, DFSConfigKeys.DFS_NAMENODE_SECONDARY_HTTPS_ADDRESS_DEFAULT); InetSocketAddress httpsAddr = NetUtils.createSocketAddr(httpsAddrString); {code} Searching for calls to NetUtils.createSocketAddr() in the HDFS code, I found many other places with similar untrimmed host:port issues, for example in DatanodeManager#DatanodeManager() below. I think we should fix them as well with this JIRA. {code} this.defaultXferPort = NetUtils.createSocketAddr( conf.get(DFSConfigKeys.DFS_DATANODE_ADDRESS_KEY, DFSConfigKeys.DFS_DATANODE_ADDRESS_DEFAULT)).getPort(); this.defaultInfoPort = NetUtils.createSocketAddr( conf.get(DFSConfigKeys.DFS_DATANODE_HTTP_ADDRESS_KEY, DFSConfigKeys.DFS_DATANODE_HTTP_ADDRESS_DEFAULT)).getPort(); this.defaultInfoSecurePort = NetUtils.createSocketAddr( conf.get(DFSConfigKeys.DFS_DATANODE_HTTPS_ADDRESS_KEY, DFSConfigKeys.DFS_DATANODE_HTTPS_ADDRESS_DEFAULT)).getPort(); this.defaultIpcPort = NetUtils.createSocketAddr( conf.get(DFSConfigKeys.DFS_DATANODE_IPC_ADDRESS_KEY, DFSConfigKeys.DFS_DATANODE_IPC_ADDRESS_DEFAULT)).getPort(); {code} > The host:port settings of dfs.namenode.secondary.http-address should be > trimmed before use > -- > > Key: HDFS-7684 > URL: https://issues.apache.org/jira/browse/HDFS-7684 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.1 >Reporter: Tianyin Xu >Assignee: Anu Engineer > > With the following setting, > > dfs.namenode.secondary.http-address > myhostname:50090 > > The secondary NameNode could not be started > $ hadoop-daemon.sh start secondarynamenode > starting
secondarynamenode, logging to > /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out > /home/hadoop/hadoop-2.4.1/bin/hdfs > Exception in thread "main" java.lang.IllegalArgumentException: Does not > contain a valid host:port authority: myhostname:50090 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651) > We were really confused and misled by the log message: we thought about the > DNS problems (changed to IP address but no success) and the network problem > (tried to test the connections with no success...) > It turned out to be that the setting is not trimmed and the additional space > character in the end of the setting caused the problem... OMG!!!... > Searching on the Internet, we find we are really not alone. So many users > encountered similar trim problems! The following lists a few: > http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html > http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh > https://issues.apache.org/jira/browse/HDFS-2799 > https://issues.apache.org/jira/browse/HBASE-6973 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
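A minimal, self-contained sketch of the failure mode and the trimming fix discussed above. The helpers below are hypothetical stand-ins for {{Configuration.getTrimmed}} and the host:port validation inside {{NetUtils.createSocketAddr}}; they are illustrative only, not the actual Hadoop code.

```java
public class TrimmedConfDemo {
    // Stand-in for the difference between Configuration.get() and
    // Configuration.getTrimmed(): the latter strips surrounding whitespace
    // before the value is parsed as host:port.
    static String getTrimmed(String raw, String defaultValue) {
        if (raw == null) {
            return defaultValue;
        }
        String trimmed = raw.trim();
        return trimmed.isEmpty() ? defaultValue : trimmed;
    }

    // Simplified stand-in for the host:port validation that
    // NetUtils.createSocketAddr() performs before building the address.
    static boolean isValidHostPort(String target) {
        int colon = target.lastIndexOf(':');
        if (colon < 1 || colon == target.length() - 1) {
            return false;
        }
        String port = target.substring(colon + 1);
        return port.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        String raw = "myhostname:50090 ";   // trailing space copied from the XML config
        System.out.println(isValidHostPort(raw));                              // false
        System.out.println(isValidHostPort(getTrimmed(raw, "0.0.0.0:50090"))); // true
    }
}
```

The untrimmed value fails validation solely because of the trailing space, which matches the confusing "Does not contain a valid host:port authority" symptom in the report.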
[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299227#comment-14299227 ] Chris Nauroth commented on HDFS-2882: - I'm linking this to HDFS-7714, where I reported that a bug in this part of the code can cause a DataNode process to remain running in a "half-alive" state registered to only one NameNode with no opportunity to re-register to the other one. I don't think this patch introduced the problem though. > DN continues to start up, even if block pool fails to initialize > > > Key: HDFS-2882 > URL: https://issues.apache.org/jira/browse/HDFS-2882 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.2-alpha >Reporter: Todd Lipcon >Assignee: Vinayakumar B > Fix For: 2.4.1 > > Attachments: HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, > HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, HDFS-2882.patch, > HDFS-2882.patch, hdfs-2882.txt > > > I started a DN on a machine that was completely out of space on one of its > drives. I saw the following: > 2012-02-02 09:56:50,499 FATAL > org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for > block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id > DS-507718931-172.29.5.194-11072-12978 > 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021 > java.io.IOException: Mkdirs failed to create > /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp > at > org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.<init>(FSDataset.java:335) > but the DN continued to run, spewing NPEs when it tried to do block reports, > etc. This was on the HDFS-1623 branch but may affect trunk as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.
[ https://issues.apache.org/jira/browse/HDFS-7714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299223#comment-14299223 ] Chris Nauroth commented on HDFS-7714: - Here are more details on what I've observed. I saw that the main {{BPServiceActor#run}} loop was active for one NameNode, but for the other one, it had reported the fatal "Initialization failed" error from this part of the code:
{code}
while (true) {
  // init stuff
  try {
    // setup storage
    connectToNNAndHandshake();
    break;
  } catch (IOException ioe) {
    // Initial handshake, storage recovery or registration failed
    runningState = RunningState.INIT_FAILED;
    if (shouldRetryInit()) {
      // Retry until all namenode's of BPOS failed initialization
      LOG.error("Initialization failed for " + this + " "
          + ioe.getLocalizedMessage());
      sleepAndLogInterrupts(5000, "initializing");
    } else {
      runningState = RunningState.FAILED;
      LOG.fatal("Initialization failed for " + this + ". Exiting. ", ioe);
      return;
    }
  }
}
{code}
The {{ioe}} was an {{EOFException}} while trying the {{registerDatanode}} RPC. Lining up timestamps from NN and DN logs, I could see that the NN had restarted at the same time, causing it to abandon this RPC connection, ultimately triggering the {{EOFException}} on the DataNode side. Most importantly, the fact that it was on the code path with the fatal-level logging means that it would never reattempt registration with this NameNode. {{shouldRetryInit()}} must have returned {{false}}. The implementation of {{BPOfferService#shouldRetryInit}} is that it should only retry if the other one already registered successfully:
{code}
/*
 * Let the actor retry for initialization until all namenodes of cluster have
 * failed.
 */
boolean shouldRetryInit() {
  if (hasBlockPoolId()) {
    // One of the namenode registered successfully. lets continue retry for
    // other.
    return true;
  }
  return isAlive();
}
{code}
Tying that all together, this bug happens when the first attempted NameNode registration fails but the second succeeds. The DataNode process remains running, but with only one live {{BPServiceActor}}. HDFS-2882 had a lot of discussion of DataNode startup failure scenarios. I think the summary of that discussion is that the DataNode should in general retry its NameNode registrations, but it should abort right away if there is no possibility for registration to be successful (i.e. there is a misconfiguration or a hardware failure). I think the change we need here is that we should keep retrying the {{registerDatanode}} RPC if there is NameNode downtime or a transient connectivity failure. Other failure reasons should still cause an abort. > Simultaneous restart of HA NameNodes and DataNode can cause DataNode to > register successfully with only one NameNode. > - > > Key: HDFS-7714 > URL: https://issues.apache.org/jira/browse/HDFS-7714 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.6.0 >Reporter: Chris Nauroth > > In an HA deployment, DataNodes must register with both NameNodes and send > periodic heartbeats and block reports to both. However, if NameNodes and > DataNodes are restarted simultaneously, then this can trigger a race > condition in registration. The end result is that the {{BPServiceActor}} for > one NameNode terminates, but the {{BPServiceActor}} for the other NameNode > remains alive. The DataNode process is then in a "half-alive" state where it > only heartbeats and sends block reports to one of the NameNodes. This could > cause a loss of storage capacity after an HA failover. The DataNode process > would have to be restarted to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
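The proposed behavior — keep retrying registration on NameNode downtime or transient connectivity loss, abort on everything else — could be sketched as a failure classifier. This is a hypothetical illustration, not the actual {{BPServiceActor}} code; a real fix would need a more carefully chosen set of exception types.

```java
import java.io.EOFException;
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;

public class RegistrationRetryDemo {
    // Classify a registerDatanode failure: transient failures should be
    // retried indefinitely; anything else (misconfiguration, hardware
    // failure, protocol mismatch, ...) should still abort the actor.
    static boolean isTransient(IOException ioe) {
        return ioe instanceof EOFException          // NN dropped the RPC mid-flight
            || ioe instanceof ConnectException       // NN not up (yet)
            || ioe instanceof NoRouteToHostException; // transient network trouble
    }

    public static void main(String[] args) {
        System.out.println(isTransient(new EOFException()));                // retry
        System.out.println(isTransient(new IOException("bad storage ID"))); // abort
    }
}
```

Under this scheme, the {{EOFException}} observed in the logs above would land on the retry path instead of the fatal-logging path.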
[jira] [Created] (HDFS-7714) Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode.
Chris Nauroth created HDFS-7714: --- Summary: Simultaneous restart of HA NameNodes and DataNode can cause DataNode to register successfully with only one NameNode. Key: HDFS-7714 URL: https://issues.apache.org/jira/browse/HDFS-7714 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Chris Nauroth In an HA deployment, DataNodes must register with both NameNodes and send periodic heartbeats and block reports to both. However, if NameNodes and DataNodes are restarted simultaneously, then this can trigger a race condition in registration. The end result is that the {{BPServiceActor}} for one NameNode terminates, but the {{BPServiceActor}} for the other NameNode remains alive. The DataNode process is then in a "half-alive" state where it only heartbeats and sends block reports to one of the NameNodes. This could cause a loss of storage capacity after an HA failover. The DataNode process would have to be restarted to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299221#comment-14299221 ] Yongjun Zhang commented on HDFS-7707: - Thank you so much Kihwal! What happened was that the user manually deleted the dir by issuing the {{hadoop fs -rm -r -skipTrash}} command. So it still seems related to delayed block removal. It appears that a snapshot is involved, but I will confirm. > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into edit log for the fileY under dirX, thus corrupting the edit log > (restarting NN with the edit log would fail). > What HDFS-6825 does to fix this issue is, to detect whether fileY is already > deleted by checking the ancestor dirs on it's path, if any of them doesn't > exist, then fileY is already deleted, and don't put OP_CLOSE to edit log for > the file. > For this new edit log corruption, what I found was, the client first deleted > dirX recursively, then create another dir with exactly the same name as dirX > right away. Because HDFS-6825 count on the namespace checking (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this checking, thus OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. > What we need to do is to have a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7520) checknative should display a nicer error message when openssl support is not compiled in
[ https://issues.apache.org/jira/browse/HDFS-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer reassigned HDFS-7520: -- Assignee: Anu Engineer > checknative should display a nicer error message when openssl support is not > compiled in > > > Key: HDFS-7520 > URL: https://issues.apache.org/jira/browse/HDFS-7520 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Colin Patrick McCabe >Assignee: Anu Engineer > > checknative should display a nicer error message when openssl support is not > compiled in. Currently, it displays this: > {code} > [cmccabe@keter hadoop]$ hadoop checknative > 14/12/12 14:08:43 INFO bzip2.Bzip2Factory: Successfully loaded & initialized > native-bzip2 library system-native > 14/12/12 14:08:43 INFO zlib.ZlibFactory: Successfully loaded & initialized > native-zlib library > Native library checking: > hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0 > zlib:true /lib64/libz.so.1 > snappy: true /usr/lib64/libsnappy.so.1 > lz4: true revision:99 > bzip2: true /lib64/libbz2.so.1 > openssl: false org.apache.hadoop.crypto.OpensslCipher.initIDs()V > {code} > Instead, we should display something like this, if openssl is not supported > by the current build: > {code} > openssl: false Hadoop was built without openssl support. > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7684) The host:port settings of dfs.namenode.secondary.http-address should be trimmed before use
[ https://issues.apache.org/jira/browse/HDFS-7684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer reassigned HDFS-7684: -- Assignee: Anu Engineer > The host:port settings of dfs.namenode.secondary.http-address should be > trimmed before use > -- > > Key: HDFS-7684 > URL: https://issues.apache.org/jira/browse/HDFS-7684 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.1 >Reporter: Tianyin Xu >Assignee: Anu Engineer > > With the following setting, > > dfs.namenode.secondary.http-address > myhostname:50090 > > The secondary NameNode could not be started > $ hadoop-daemon.sh start secondarynamenode > starting secondarynamenode, logging to > /home/hadoop/hadoop-2.4.1/logs/hadoop-hadoop-secondarynamenode-xxx.out > /home/hadoop/hadoop-2.4.1/bin/hdfs > Exception in thread "main" java.lang.IllegalArgumentException: Does not > contain a valid host:port authority: myhostname:50090 > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:196) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:163) > at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:152) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.getHttpAddress(SecondaryNameNode.java:203) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:214) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192) > at > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:651) > We were really confused and misled by the log message: we thought about the > DNS problems (changed to IP address but no success) and the network problem > (tried to test the connections with no success...) > It turned out to be that the setting is not trimmed and the additional space > character in the end of the setting caused the problem... OMG!!!... > Searching on the Internet, we find we are really not alone. 
So many users > encountered similar trim problems! The following lists a few: > http://solaimurugan.blogspot.com/2013/10/hadoop-multi-node-cluster-configuration.html > http://stackoverflow.com/questions/11263664/error-while-starting-the-hadoop-using-strat-all-sh > https://issues.apache.org/jira/browse/HDFS-2799 > https://issues.apache.org/jira/browse/HBASE-6973 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299188#comment-14299188 ] Kihwal Lee commented on HDFS-7707: -- bq. Do you mean that we could get a wrong iFile here? Since the block collection of a block won't magically get updated to a new inode file, I don't see how it can be a wrong inode file. So it may not be due to delayed block removal. bq. what's the reason that tmpParent won't get a null at the dirX when trying to get the parent of dirX (if this happened)? If a snapshot is not involved, the parent will be set to null during delete while in the fsn write lock. Lack of a memory barrier can cause stale values to be used in a multi-processor, multi-threaded env, but I am not sure whether that is the cause here. If {{commitBlockSynchronization()}} was involved, was it initiated by the client (e.g. recoverLease() or create/append())? > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into edit log for the fileY under dirX, thus corrupting the edit log > (restarting NN with the edit log would fail). > What HDFS-6825 does to fix this issue is, to detect whether fileY is already > deleted by checking the ancestor dirs on it's path, if any of them doesn't > exist, then fileY is already deleted, and don't put OP_CLOSE to edit log for > the file. > For this new edit log corruption, what I found was, the client first deleted > dirX recursively, then create another dir with exactly the same name as dirX > right away. 
Because HDFS-6825 count on the namespace checking (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this checking, thus OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. > What we need to do is to have a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299158#comment-14299158 ] Andrew Wang commented on HDFS-7411: --- I split the logging changes off to HDFS-7706 which I just committed. New rev posted. bq. It seems the discussion above did not consider the incompatibility. I guess the unnecessarily complicated and large patch did hide the important details. We need to revisit it. This statement is false. Configuration compatibility was the core of the above discussion. In fact, my 003 rev of this patch tried to keep compatibility with the old key, and based on the discussion we decided to change that. This newest rev does bring fallback support for the old key though, which satisfies your comment. > Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, > hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, > hdfs-7411.009.patch, hdfs-7411.010.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299153#comment-14299153 ] Yongjun Zhang commented on HDFS-7707: - Hi Kihwal, Thanks a lot for your further comments. I did the analysis based on the edit log. I assumed {{commitBlockSynchronization()}} is involved due to the delayed block removal — basically the same code path as examined by HDFS-6825. I will take a look at other paths too. Assuming {{commitBlockSynchronization}} is involved, the {{iNodeFile}} is obtained by the following code:
{code}
BlockCollection blockCollection = storedBlock.getBlockCollection();
INodeFile iFile = ((INode)blockCollection).asFile();
{code}
Do you mean that we could get a wrong iFile here? BTW, your comment rang a bell to me: when we delete a dir, what's the reason that {{tmpParent}} won't get a null at the {{dirX}} when trying to get the parent of {{dirX}} (if this happened)?
{code}
while (true) {
  if (tmpParent == null ||
      tmpParent.searchChildren(tmpChild.getLocalNameBytes()) < 0) {
    return true;
  }
  if (tmpParent.isRoot()) {
    break;
  }
  tmpChild = tmpParent;
  tmpParent = tmpParent.getParent();
}
{code}
Thanks. > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into edit log for the fileY under dirX, thus corrupting the edit log > (restarting NN with the edit log would fail). > What HDFS-6825 does to fix this issue is, to detect whether fileY is already > deleted by checking the ancestor dirs on it's path, if any of them doesn't > exist, then fileY is already deleted, and don't put OP_CLOSE to edit log for > the file. 
> For this new edit log corruption, what I found was, the client first deleted > dirX recursively, then create another dir with exactly the same name as dirX > right away. Because HDFS-6825 count on the namespace checking (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this checking, thus OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. > What we need to do is to have a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
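The failure mode under discussion can be reduced to a toy example. The map-based "namespace" below is hypothetical, not the FSDirectory code: it only illustrates why a name-based existence check cannot distinguish a recreated dirX from the deleted one, while an identity (reference) comparison can.

```java
import java.util.HashMap;
import java.util.Map;

public class DeletedCheckDemo {
    // Toy namespace: child name -> inode object.
    static final Map<String, Object> root = new HashMap<>();

    // Name-based check, in the spirit of the HDFS-6825 fix:
    // "does an ancestor with this name still exist?"
    static boolean deletedByName(String name) {
        return !root.containsKey(name);
    }

    // Identity-based check: "is the ancestor still the *same* inode?"
    static boolean deletedByIdentity(String name, Object originalInode) {
        return root.get(name) != originalInode;
    }

    public static void main(String[] args) {
        Object dirX = new Object();
        root.put("dirX", dirX);
        root.remove("dirX");              // recursive delete of dirX
        root.put("dirX", new Object());   // same-name dir recreated right away

        System.out.println(deletedByName("dirX"));           // false: check defeated
        System.out.println(deletedByIdentity("dirX", dirX)); // true: still detected
    }
}
```

This suggests one direction for the "more robust way" mentioned in the description: compare inode identity (or inode IDs) along the path rather than names alone.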
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: hdfs-7411.010.patch > Refactor and improve decommissioning logic into DecommissionManager > --- > > Key: HDFS-7411 > URL: https://issues.apache.org/jira/browse/HDFS-7411 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.1 >Reporter: Andrew Wang >Assignee: Andrew Wang > Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, > hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, > hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, > hdfs-7411.009.patch, hdfs-7411.010.patch > > > Would be nice to split out decommission logic from DatanodeManager to > DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7713) Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs, and setting replication
Ravi Prakash created HDFS-7713: -- Summary: Improve the HDFS Web UI browser to allow chowning / chmoding, creating dirs, and setting replication Key: HDFS-7713 URL: https://issues.apache.org/jira/browse/HDFS-7713 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Ravi Prakash Assignee: Ravi Prakash This JIRA is for improving the NN UI (everything except file uploads) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7697) Document the scope of the PB OIV tool
[ https://issues.apache.org/jira/browse/HDFS-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299129#comment-14299129 ] Lei (Eddy) Xu commented on HDFS-7697: - [~wheat9] Thanks very much for filing this. Where should I add the documentation? > Document the scope of the PB OIV tool > - > > Key: HDFS-7697 > URL: https://issues.apache.org/jira/browse/HDFS-7697 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Haohui Mai > > As per HDFS-6673, we need to document the applicable scope of the new PB OIV > tool so that it won't catch users by surprise. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7648) Verify the datanode directory layout
[ https://issues.apache.org/jira/browse/HDFS-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299128#comment-14299128 ] Tsz Wo Nicholas Sze commented on HDFS-7648: --- > It's not the goal of DirectoryScanner to fix anything. ... The original design of DirectoryScanner is to reconcile the differences between the block information maintained in memory and the actual blocks stored on disk. So it does fix the in-memory data structure. > What would be the suggested way to fix these unmatched blocks. Also, if it is > not fixed then this warning message will be printed repeatedly during the > directory scanning interval. Yet more questions arise if the blocks are not fixed: should the block report include those blocks? How to access those blocks? How and when to fix those blocks? It seems fixing the blocks is better. Of course, we should still log an error message for those blocks. > Verify the datanode directory layout > > > Key: HDFS-7648 > URL: https://issues.apache.org/jira/browse/HDFS-7648 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Reporter: Tsz Wo Nicholas Sze >Assignee: Rakesh R > > HDFS-6482 changed datanode layout to use block ID to determine the directory > to store the block. We should have some mechanism to verify it. Either > DirectoryScanner or block report generation could do the check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7706) Switch BlockManager logging to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299122#comment-14299122 ] Hudson commented on HDFS-7706: -- FAILURE: Integrated in Hadoop-trunk-Commit #6970 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6970/]) HDFS-7706. Switch BlockManager logging to use slf4j. (wang: rev 951b3608a8cb1d9063b9be9c740b524c137b816f) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestPendingInvalidateBlock.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestPipelinesFailover.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyBlockManagement.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/PendingReplicationBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/InvalidateBlocks.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestStandbyIsHot.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Switch BlockManager logging to use slf4j > > > Key: HDFS-7706 > URL: https://issues.apache.org/jira/browse/HDFS-7706 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.6.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7706.001.patch > > > Nice little refactor to do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299116#comment-14299116 ] Jing Zhao commented on HDFS-7339: - Thanks Zhe! The patch looks good overall. Some comments and questions: # Instead of the current ID division mechanism (calculating the mid point between LAST_RESERVED_BLOCK_ID and LONG.MAX), can we simply let the block group id take all the negative long space (i.e., with first bit set to 1)? In this way we can utilize larger space and use simple bit manipulations for id generation/checking. # Why do we need to reserve the first 1024 block group ids? # If we directly extend the current BlockInfo to BlockGroupInfo, the semantic of the {{triplets}} may be different for BlockGroupInfo. One possible solution is to let {{triplets}}'s size be {{3*(k+m)}}, where k is the number of data blocks and m is the number of the parity blocks. # The current BlockGroupInfo's constructor calls BlockInfo's copy constructor, which constructs triplets based on the replication factor. We may still need to revisit BlockInfo and BlockGroupInfo to make sure BlockGroupInfo is strictly separated from replication operations and logic. The above #3 and #4 may need some extra refactoring work on the current BlockInfo class. I'm also fine with moving this part of work to a separate jira. > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. 
A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. > The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
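The sign-bit scheme suggested in comment #1 above can be sketched in a few lines. This is a hypothetical illustration of the idea (block group IDs occupying the negative long space), not committed HDFS code.

```java
public class BlockGroupIdDemo {
    // A long is a block group ID iff its sign bit is set,
    // i.e. it lies in the negative long space.
    static boolean isBlockGroupId(long id) {
        return id < 0; // sign bit set
    }

    // Map a non-negative sequence number into the block group ID space
    // by forcing the sign bit on.
    static long toBlockGroupId(long sequence) {
        return sequence | Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long g = toBlockGroupId(42L);
        System.out.println(isBlockGroupId(g));   // true
        System.out.println(isBlockGroupId(42L)); // false: ordinary block ID
    }
}
```

The check and the conversion are each a single bit operation, which is the "simple bit manipulations" advantage over computing a midpoint between LAST_RESERVED_BLOCK_ID and Long.MAX_VALUE.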
[jira] [Updated] (HDFS-7706) Switch BlockManager logging to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7706: -- Resolution: Fixed Fix Version/s: 2.7.0 Status: Resolved (was: Patch Available) Thanks again for reviewing all, committed to trunk and branch-2. I'll work on blockStateChangeLog in HDFS-7712. > Switch BlockManager logging to use slf4j > > > Key: HDFS-7706 > URL: https://issues.apache.org/jira/browse/HDFS-7706 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.6.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Fix For: 2.7.0 > > Attachments: hdfs-7706.001.patch > > > Nice little refactor to do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7712) Switch blockStateChangeLog to use slf4j
Andrew Wang created HDFS-7712: - Summary: Switch blockStateChangeLog to use slf4j Key: HDFS-7712 URL: https://issues.apache.org/jira/browse/HDFS-7712 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.7.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor As pointed out in HDFS-7706, updating blockStateChangeLog to use slf4j will save a lot of string construction costs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
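The string construction costs mentioned above can be illustrated with a minimal stand-in for an slf4j-style parameterized call. Note that slf4j still evaluates the argument expression itself; the savings is that the {} formatting and the argument's toString() are skipped when the level is disabled. The classes below are hypothetical, not the slf4j API.

```java
public class LazyLogDemo {
    static boolean debugEnabled = false;

    // Stand-in for LOG.debug("...{}...", arg): the placeholder is only
    // substituted (and the argument only stringified) if debug is on.
    static void debug(String format, Object arg) {
        if (debugEnabled) {
            System.out.println(format.replace("{}", String.valueOf(arg)));
        }
    }

    // Pretend this object is expensive to stringify, like a block dump.
    static class BlockDump {
        static int toStringCalls = 0;
        @Override public String toString() {
            toStringCalls++;
            return "block dump";
        }
    }

    public static void main(String[] args) {
        BlockDump dump = new BlockDump();
        // The old concatenation style pays for toString() and the
        // concatenation even when debug is off:
        //   LOG.debug("state: " + dump);
        debug("state: {}", dump);                    // parameterized style
        System.out.println(BlockDump.toStringCalls); // 0: toString() never ran
    }
}
```

With commons-logging style guarded by string concatenation, every call site builds the full message first; the parameterized form defers that work behind the level check.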
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299106#comment-14299106 ] Kihwal Lee commented on HDFS-7707: -- {{isFileDeleted()}} is always called with the fsn lock held, so no modification is done while in the method, and {{tmpParent}} is obtained by calling {{file.getParent()}}. So {{tmpParent}} cannot be a newly created directory inode, unless something is automatically setting the file inode's parent to the new directory inode. If {{isFileDeleted()}} is called with a wrong file inode, then it is possible to hit this condition. That means both the parent dir and the file were recreated and the NN got confused. Does this case also involve {{commitBlockSynchronization()}}? > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into edit log for the fileY under dirX, thus corrupting the edit log > (restarting NN with the edit log would fail). > What HDFS-6825 does to fix this issue is, to detect whether fileY is already > deleted by checking the ancestor dirs on it's path, if any of them doesn't > exist, then fileY is already deleted, and don't put OP_CLOSE to edit log for > the file. > For this new edit log corruption, what I found was, the client first deleted > dirX recursively, then create another dir with exactly the same name as dirX > right away. 
Because HDFS-6825 counts on the namespace check (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this check, thus OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. > What we need to do is to have a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
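The delete-then-recreate race described above can be sketched with a toy namespace. The `Dir` class and check methods below are illustrative stand-ins, not the real INode code; they show why a name-based existence check passes while an identity-based one would not:

```java
import java.util.HashMap;
import java.util.Map;

// Toy directory node; a stand-in for a directory INode.
class Dir {
    final Map<String, Dir> children = new HashMap<>();
}

public class DeleteCheckDemo {
    // Name-based check (what the HDFS-6825 fix relies on): is there a child
    // with this name under the parent?
    static boolean existsByName(Dir parent, String name) {
        return parent.children.containsKey(name);
    }

    // Identity-based check: is this exact node object still the child?
    static boolean existsByIdentity(Dir parent, String name, Dir node) {
        return parent.children.get(name) == node;
    }

    // Reproduces the delete-then-recreate sequence; returns the results of
    // the two checks on the deleted dirX.
    static boolean[] demo() {
        Dir root = new Dir();
        Dir dirX = new Dir();
        root.children.put("dirX", dirX);

        // client deletes dirX, then immediately recreates one with the same name
        root.children.remove("dirX");
        root.children.put("dirX", new Dir());

        // the name-based check is fooled: "dirX" still resolves, so a file
        // under the deleted dirX looks alive and OP_CLOSE reaches the edit log
        return new boolean[] {
            existsByName(root, "dirX"),           // true (fooled)
            existsByIdentity(root, "dirX", dirX)  // false (catches the recreate)
        };
    }
}
```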
[jira] [Commented] (HDFS-7706) Switch BlockManager logging to use slf4j
[ https://issues.apache.org/jira/browse/HDFS-7706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299104#comment-14299104 ] Andrew Wang commented on HDFS-7706: --- Hey Kihwal, I'd like to hit that in a follow-on patch. I split this out from HDFS-7411 to aid reviewers, and I'd like to rev it in the meantime; this one actually doesn't touch {{blockStateChangeLog}}. Promise I'll get right to it, just would prefer not to wait out the latency of another Jenkins run. Xiaoyu, I'll take care of the import in the follow-on too. Thanks for reviewing. I ran the failed test locally and it passed, so it looks like a flake. I'll commit this shortly based on Yi's +1, thanks Yi for reviewing :) > Switch BlockManager logging to use slf4j > > > Key: HDFS-7706 > URL: https://issues.apache.org/jira/browse/HDFS-7706 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 2.6.0 >Reporter: Andrew Wang >Assignee: Andrew Wang >Priority: Minor > Attachments: hdfs-7706.001.patch > > > Nice little refactor to do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7710) Remove dead code in BackupImage.java
[ https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299094#comment-14299094 ] Haohui Mai commented on HDFS-7710: -- +1 pending jenkins. > Remove dead code in BackupImage.java > > > Key: HDFS-7710 > URL: https://issues.apache.org/jira/browse/HDFS-7710 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Attachments: HDFS-7710.0.patch > > > BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is > proposed to clean it up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7608) hdfs dfsclient newConnectedPeer has no write timeout
[ https://issues.apache.org/jira/browse/HDFS-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7608: Summary: hdfs dfsclient newConnectedPeer has no write timeout (was: hdfs dfsclient newConnectedPeer has no read or write timeout) I updated the title to make it clear that the write timeout is still missing. HDFS-7005 already added the read timeout. > hdfs dfsclient newConnectedPeer has no write timeout > - > > Key: HDFS-7608 > URL: https://issues.apache.org/jira/browse/HDFS-7608 > Project: Hadoop HDFS > Issue Type: Bug > Components: dfsclient, fuse-dfs >Affects Versions: 2.3.0, 2.6.0 > Environment: hdfs 2.3.0 hbase 0.98.6 >Reporter: zhangshilong > Labels: patch > Fix For: 2.6.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > problem: > hbase compactSplitThread may block forever while reading datanode blocks. > debugging found: the epoll wait timeout was set to 0, so epollwait never times out. > cause: in hdfs 2.3.0, > hbase uses DFSClient to read and write blocks. > DFSClient creates a socket using newConnectedPeer(addr), but sets no read > or write timeout. > In v2.6.0, newConnectedPeer added a readTimeout to deal with the > problem, but did not add a writeTimeout. Why was no write timeout added? > I think NioInetPeer needs a default socket timeout, so applications will not > need to add timeouts themselves. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
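A hedged sketch of the connect/read timeout setup being discussed, using plain java.net sockets (the method name and timeout value are assumptions; the real Hadoop code wraps the socket channel in its own stream classes to get a timed write, which is the piece HDFS-7608 reports as missing):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PeerTimeoutDemo {
    // Illustrative value; not taken from any Hadoop default.
    static final int SOCKET_TIMEOUT_MS = 60_000;

    static Socket connectWithTimeouts(InetSocketAddress addr) throws IOException {
        Socket s = new Socket();
        s.connect(addr, SOCKET_TIMEOUT_MS);  // connect timeout
        s.setSoTimeout(SOCKET_TIMEOUT_MS);   // read timeout (what HDFS-7005 added)
        // java.net.Socket has no built-in write timeout: setSoTimeout only
        // bounds reads. Bounding writes requires a timed wrapper around the
        // channel, which is exactly the gap this issue asks to close.
        return s;
    }
}
```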
[jira] [Commented] (HDFS-7339) Allocating and persisting block groups in NameNode
[ https://issues.apache.org/jira/browse/HDFS-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299031#comment-14299031 ] Zhe Zhang commented on HDFS-7339: - The build failure is because of the divergence of HDFS-EC and trunk (HDFS-7347). [~jingzhao], [~szetszwo]: please let me know if the patch addresses the issues we discussed during the meeting. Thanks. > Allocating and persisting block groups in NameNode > -- > > Key: HDFS-7339 > URL: https://issues.apache.org/jira/browse/HDFS-7339 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Zhe Zhang > Attachments: HDFS-7339-001.patch, HDFS-7339-002.patch, > HDFS-7339-003.patch, HDFS-7339-004.patch, HDFS-7339-005.patch, > HDFS-7339-006.patch, HDFS-7339-007.patch, Meta-striping.jpg, NN-stripping.jpg > > > All erasure codec operations center around the concept of _block group_; they > are formed in initial encoding and looked up in recoveries and conversions. A > lightweight class {{BlockGroup}} is created to record the original and parity > blocks in a coding group, as well as a pointer to the codec schema (pluggable > codec schemas will be supported in HDFS-7337). With the striping layout, the > HDFS client needs to operate on all blocks in a {{BlockGroup}} concurrently. > Therefore we propose to extend a file’s inode to switch between _contiguous_ > and _striping_ modes, with the current mode recorded in a binary flag. An > array of BlockGroups (or BlockGroup IDs) is added, which remains empty for > “traditional” HDFS files with contiguous block layout. > The NameNode creates and maintains {{BlockGroup}} instances through the new > {{ECManager}} component; the attached figure has an illustration of the > architecture. As a simple example, when a {_Striping+EC_} file is created and > written to, it will serve requests from the client to allocate new > {{BlockGroups}} and store them under the {{INodeFile}}. 
In the current phase, > {{BlockGroups}} are allocated both in initial online encoding and in the > conversion from replication to EC. {{ECManager}} also facilitates the lookup > of {{BlockGroup}} information for block recovery work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
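A minimal sketch of the {{BlockGroup}} concept as described above: original (data) blocks plus parity blocks, with a reference to the codec schema. Field and class shapes are assumptions based on the description, not the actual HDFS-7339 patch:

```java
import java.util.ArrayList;
import java.util.List;

// Lightweight record of one erasure-coding group, per the description:
// data block IDs, parity block IDs, and a pointer to the codec schema
// (pluggable schemas are the subject of HDFS-7337).
public class BlockGroup {
    final long groupId;
    final String codecSchema;  // e.g. a schema name; representation is assumed
    final List<Long> dataBlockIds = new ArrayList<>();
    final List<Long> parityBlockIds = new ArrayList<>();

    BlockGroup(long groupId, String codecSchema) {
        this.groupId = groupId;
        this.codecSchema = codecSchema;
    }

    // Total blocks the client must operate on concurrently in striping mode.
    int size() {
        return dataBlockIds.size() + parityBlockIds.size();
    }
}
```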
[jira] [Updated] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements
[ https://issues.apache.org/jira/browse/HDFS-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7711: --- Description: 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. >> I did not see the above two properties in the code... I feel this should be >> *{color:green}dfs.hosts/dfs.hosts.exclude{color}* 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* */yarn-env.sh* >> Most of the places are written as conf dir, but currently the conf dir is not >> present in the hadoop distribution. It's better to give *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh was: 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. >> I did not seen above two properties in code...This should >> dfs.hosts/dfs.hosts.exclude 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* */yarn-env.sh* >> Most of the places written as conf dir,,but currently conf dir will not >> present in hadoop distribution. It's better to give *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh > [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements > - > > Key: HDFS-7711 > URL: https://issues.apache.org/jira/browse/HDFS-7711 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Brahma Reddy Battula > > 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. > >> I did not see the above two properties in the code... I feel this should be > >> *{color:green}dfs.hosts/dfs.hosts.exclude{color}* > 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* > */yarn-env.sh* > >> Most of the places are written as conf dir, but currently the conf dir is not > >> present in the hadoop distribution. 
> It's better to give *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or > *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements
[ https://issues.apache.org/jira/browse/HDFS-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-7711: --- Description: 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. >> I did not see the above two properties in the code... This should be >> dfs.hosts/dfs.hosts.exclude 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* */yarn-env.sh* >> Most of the places are written as conf dir, but currently the conf dir is not >> present in the hadoop distribution. It's better to give *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh was: 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. >> I did not seen above two properties in code...This should >> dfs.hosts/dfs.hosts.exclude 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* */yarn-env.sh* >> Most of the places written as conf dir,,but currently conf dir will not >> present in hadoop distribution. It can be HADOOP_CONF_DIR/hadoop-env.sh or HADOOP_HOME/etc/hadoop/hadoop-env.sh > [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements > - > > Key: HDFS-7711 > URL: https://issues.apache.org/jira/browse/HDFS-7711 > Project: Hadoop HDFS > Issue Type: Bug > Components: documentation >Affects Versions: 2.6.0 >Reporter: Brahma Reddy Battula > > 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. > >> I did not see the above two properties in the code... This should be > >> dfs.hosts/dfs.hosts.exclude > 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* > */yarn-env.sh* > >> Most of the places are written as conf dir, but currently the conf dir is not > >> present in the hadoop distribution. > It's better to give *{color:green}HADOOP_CONF_DIR{color}* /hadoop-env.sh or > *{color:green}HADOOP_HOME/etc/hadoop{color}* /hadoop-env.sh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7711) [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements
Brahma Reddy Battula created HDFS-7711: -- Summary: [ HDFS DOC ] Various Typos in ClusterSetup.html and improvements Key: HDFS-7711 URL: https://issues.apache.org/jira/browse/HDFS-7711 Project: Hadoop HDFS Issue Type: Bug Components: documentation Affects Versions: 2.6.0 Reporter: Brahma Reddy Battula 1) dfs.namenode.hosts / dfs.namenode.hosts.exclude. >> I did not see the above two properties in the code... This should be >> dfs.hosts/dfs.hosts.exclude 2) *{color:red}conf{color}* */hadoop-env.sh* and *{color:red}conf{color}* */yarn-env.sh* >> Most of the places are written as conf dir, but currently the conf dir is not >> present in the hadoop distribution. It can be HADOOP_CONF_DIR/hadoop-env.sh or HADOOP_HOME/etc/hadoop/hadoop-env.sh -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7710) Remove dead code in BackupImage.java
[ https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7710: - Status: Patch Available (was: Open) > Remove dead code in BackupImage.java > > > Key: HDFS-7710 > URL: https://issues.apache.org/jira/browse/HDFS-7710 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Attachments: HDFS-7710.0.patch > > > BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is > proposed to clean it up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7710) Remove dead code in BackupImage.java
[ https://issues.apache.org/jira/browse/HDFS-7710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7710: - Attachment: HDFS-7710.0.patch > Remove dead code in BackupImage.java > > > Key: HDFS-7710 > URL: https://issues.apache.org/jira/browse/HDFS-7710 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Xiaoyu Yao >Assignee: Xiaoyu Yao >Priority: Minor > Attachments: HDFS-7710.0.patch > > > BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is > proposed to clean it up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7710) Remove dead code in BackupImage.java
Xiaoyu Yao created HDFS-7710: Summary: Remove dead code in BackupImage.java Key: HDFS-7710 URL: https://issues.apache.org/jira/browse/HDFS-7710 Project: Hadoop HDFS Issue Type: Improvement Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Priority: Minor BackupImage#saveCheckpoint() is not being used anywhere. This JIRA is proposed to clean it up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298945#comment-14298945 ] Hadoop QA commented on HDFS-4265: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12695560/0006-HDFS-4265.patch against trunk revision f2c9109. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9381//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9381//artifact/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9381//artifact/patchprocess/newPatchFindbugsWarningsbkjournal.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9381//console This message is automatically generated. 
> BKJM doesn't take advantage of speculative reads > > > Key: HDFS-4265 > URL: https://issues.apache.org/jira/browse/HDFS-4265 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: 2.2.0 >Reporter: Ivan Kelly >Assignee: Rakesh R > Attachments: 0005-HDFS-4265.patch, 0006-HDFS-4265.patch, > 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, > 004-HDFS-4265.patch > > > BookKeeperEditLogInputStream reads entry at a time, so it doesn't take > advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7658) HDFS Space Quota not working as expected
[ https://issues.apache.org/jira/browse/HDFS-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299675#comment-14298939 ] Xiaoyu Yao commented on HDFS-7658: -- bq. HDFS doesnt know the complete size of the file ahead. It considers default blocksize ( in your case 256MB) for calculation while adding the new block. His default blocksize should be 128 MB without modifying dfs.blocksize in hdfs-site.xml. If it were a 256MB block size, copying the first 10MB file would fail with RF=2 and a 500MB space quota, as you mentioned. > HDFS Space Quota not working as expected > > > Key: HDFS-7658 > URL: https://issues.apache.org/jira/browse/HDFS-7658 > Project: Hadoop HDFS > Issue Type: Bug > Environment: CDH4.6 >Reporter: Puttaswamy > > I am implementing hdfs quota in a cdh4.6 cluster. The hdfs name quota has been > working properly, but the hdfs space quota has not been working as > expected, i.e., > I set the space quota of 500MB for a directory, say /test-space-quota. > Then I put a file of 10 Mb into /test-space-quota, which worked. Now the space > available is 480 MB ( 500 - 10*2) where 2 is rep factor. > Then I put a file of 50Mb into /test-space-quota, which worked too as > expected. Now the space available is 380 MB (480 - 50*2) > "I am checking the quota left from the command hadoop fs -count -q > /test-space-quota" > Then I tried to put a file of 100 Mb. It should work since it will just consume > 200 Mb of space with replication. But when I put that I got an error > "DataStreamer Exception > org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota > of /test is exceeded: quota = 524288000 B = 500 MB but diskspace consumed = > 662700032 B = 632 MB" > But the quota says > hadoop fs -count -q /test-space-quota > none inf 524288000 398458880 1 2 > 62914560 /test-space-quota > Could you please help on this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7658) HDFS Space Quota not working as expected
[ https://issues.apache.org/jira/browse/HDFS-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298921#comment-14298921 ] Xiaoyu Yao commented on HDFS-7658: -- [~putta_jammy], I've been working on a quota-related feature recently, so I took a quick look, but I can't repro this with dfs.block.size=128MB and a replication factor of 2. Based on the information you posted, the intended quota usage for the first block of the last file is 632 MB - 140 MB ~= 492 MB. Considering the replication factor of 2, the first block allocated for the last file should be around ~250 MB. You could get a quota exceeded exception if dfs.block.size was changed (e.g., from 128MB to 256MB) for the last file, or if your last file's size is greater than 128MB. > HDFS Space Quota not working as expected > > > Key: HDFS-7658 > URL: https://issues.apache.org/jira/browse/HDFS-7658 > Project: Hadoop HDFS > Issue Type: Bug > Environment: CDH4.6 >Reporter: Puttaswamy > > I am implementing hdfs quota in a cdh4.6 cluster. The hdfs name quota has been > working properly, but the hdfs space quota has not been working as > expected, i.e., > I set the space quota of 500MB for a directory, say /test-space-quota. > Then I put a file of 10 Mb into /test-space-quota, which worked. Now the space > available is 480 MB ( 500 - 10*2) where 2 is rep factor. > Then I put a file of 50Mb into /test-space-quota, which worked too as > expected. Now the space available is 380 MB (480 - 50*2) > "I am checking the quota left from the command hadoop fs -count -q > /test-space-quota" > Then I tried to put a file of 100 Mb. It should work since it will just consume > 200 Mb of space with replication. 
But when I put that I got an error > "DataStreamer Exception > org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota > of /test is exceeded: quota = 524288000 B = 500 MB but diskspace consumed = > 662700032 B = 632 MB" > But the quota says > hadoop fs -count -q /test-space-quota > none inf 524288000 398458880 1 2 > 62914560 /test-space-quota > Could you please help on this? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
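The numbers in the report are consistent with a 256 MB block size: quota is charged one full block per replica when a new block is allocated, and the 120 MB already consumed (10 MB + 50 MB, each times RF=2) plus 2 × 256 MB is exactly the 662700032 B (632 MB) in the exception. A small sketch of that arithmetic (illustrative, not the NameNode's actual quota code):

```java
// Reproduces the quota math from the report: 500 MB quota, RF=2, two files
// (10 MB and 50 MB) already written, then a new block allocation.
public class QuotaDemo {
    static final long MB = 1024L * 1024L;

    // Quota charged when a new block is allocated: existing usage plus one
    // full block per replica, before any bytes are written.
    static long consumedOnNewBlock(long existingBytes, long blockSize, int replication) {
        return existingBytes + blockSize * replication;
    }

    public static void main(String[] args) {
        long quota = 500 * MB;
        long existing = (10 + 50) * 2 * MB;  // 120 MB already charged
        long with128 = consumedOnNewBlock(existing, 128 * MB, 2);  // 376 MB, fits
        long with256 = consumedOnNewBlock(existing, 256 * MB, 2);  // 632 MB, exceeds
        System.out.println(with128 <= quota);
        System.out.println(with256 <= quota);  // false: matches the 632 MB error
    }
}
```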
[jira] [Updated] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes
[ https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-5782: - Target Version/s: 3.0.0, 2.7.0 (was: 3.0.0) > BlockListAsLongs should take lists of Replicas rather than concrete classes > --- > > Key: HDFS-5782 > URL: https://issues.apache.org/jira/browse/HDFS-5782 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: Joe Pallas >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5782-branch-2.patch, HDFS-5782.patch, > HDFS-5782.patch > > > From HDFS-5194: > {quote} > BlockListAsLongs's constructor takes a list of Blocks and a list of > ReplicaInfos. On the surface, the former is mildly irritating because it is > a concrete class, while the latter is a greater concern due to being a > File-based implementation of Replica. > On deeper inspection, BlockListAsLongs passes members of both to an internal > method that accepts just Blocks, which conditionally casts them *back* to > ReplicaInfos (this cast only happens to the latter, though this isn't > immediately obvious to the reader). > Conveniently, all methods called on these objects are found in the Replica > interface, and all functional (i.e. non-test) consumers of this interface > pass in Replica subclasses. If this constructor took Lists of Replicas > instead, it would be more generally useful and its implementation would be > cleaner as well. > {quote} > Fixing this indeed makes the business end of BlockListAsLongs cleaner while > requiring no changes to FsDatasetImpl. As suggested by the above > description, though, the HDFS tests use BlockListAsLongs differently from the > production code -- they pretty much universally provide a list of actual > Blocks. To handle this: > - In the case of SimulatedFSDataset, providing a list of Replicas is actually > less work. 
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly > invasive. Instead, the patch creates a second constructor in > BlockListAsLongs specifically for the use of NNThroughputBenchmark. It turns > the stomach a little, but is clearer and requires less code than the > alternatives (and isn't without precedent). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
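The essence of the refactor can be sketched with a simplified Replica interface (the names below are stand-ins for the HDFS ones, not the real signatures): a method taking `List<? extends Replica>` lets production ReplicaInfo lists and test doubles share one code path without casting back and forth.

```java
import java.util.List;

// Simplified stand-in for the HDFS Replica interface.
interface Replica {
    long getBlockId();
    long getNumBytes();
}

public class BlockListDemo {
    // Accepting the interface (with a bounded wildcard) rather than a
    // concrete class means any Replica implementation can be passed --
    // no conditional casts from Block back to ReplicaInfo needed.
    static long totalBytes(List<? extends Replica> replicas) {
        long sum = 0;
        for (Replica r : replicas) {
            sum += r.getNumBytes();
        }
        return sum;
    }
}
```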
[jira] [Commented] (HDFS-7709) Fix Findbug Warnings
[ https://issues.apache.org/jira/browse/HDFS-7709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298917#comment-14298917 ] Rakesh R commented on HDFS-7709: I can see lots of findbugs warnings showing up in the pre-commit build. One option is to exclude them by adding entries to the findbugs exclude file; otherwise they need to be fixed. What would be the best way, any thoughts? > Fix Findbug Warnings > > > Key: HDFS-7709 > URL: https://issues.apache.org/jira/browse/HDFS-7709 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Rakesh R >Assignee: Rakesh R > > There are many findbugs warnings related to the warning types, > - DM_DEFAULT_ENCODING, > - RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE, > - RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE > https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html > https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-rumen.html > https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
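For reference, an exclude-filter entry for the listed warning types would follow the standard FindBugs filter-file format, roughly like the sketch below (the file location and whether each Match should be scoped to specific classes would need to be confirmed against the actual modules):

```xml
<!-- Sketch of findbugs exclude entries for the warning types above.
     Unscoped Match elements suppress the pattern everywhere, which is
     usually too broad; real entries would add Class/Package scoping. -->
<FindBugsFilter>
  <Match>
    <Bug pattern="DM_DEFAULT_ENCODING"/>
  </Match>
  <Match>
    <Bug pattern="RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE"/>
  </Match>
  <Match>
    <Bug pattern="RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE"/>
  </Match>
</FindBugsFilter>
```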
[jira] [Updated] (HDFS-5782) BlockListAsLongs should take lists of Replicas rather than concrete classes
[ https://issues.apache.org/jira/browse/HDFS-5782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joe Pallas updated HDFS-5782: - Attachment: HDFS-5782-branch-2.patch Added a patch for branch-2. > BlockListAsLongs should take lists of Replicas rather than concrete classes > --- > > Key: HDFS-5782 > URL: https://issues.apache.org/jira/browse/HDFS-5782 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Affects Versions: 3.0.0 >Reporter: David Powell >Assignee: Joe Pallas >Priority: Minor > Fix For: 3.0.0 > > Attachments: HDFS-5782-branch-2.patch, HDFS-5782.patch, > HDFS-5782.patch > > > From HDFS-5194: > {quote} > BlockListAsLongs's constructor takes a list of Blocks and a list of > ReplicaInfos. On the surface, the former is mildly irritating because it is > a concrete class, while the latter is a greater concern due to being a > File-based implementation of Replica. > On deeper inspection, BlockListAsLongs passes members of both to an internal > method that accepts just Blocks, which conditionally casts them *back* to > ReplicaInfos (this cast only happens to the latter, though this isn't > immediately obvious to the reader). > Conveniently, all methods called on these objects are found in the Replica > interface, and all functional (i.e. non-test) consumers of this interface > pass in Replica subclasses. If this constructor took Lists of Replicas > instead, it would be more generally useful and its implementation would be > cleaner as well. > {quote} > Fixing this indeed makes the business end of BlockListAsLongs cleaner while > requiring no changes to FsDatasetImpl. As suggested by the above > description, though, the HDFS tests use BlockListAsLongs differently from the > production code -- they pretty much universally provide a list of actual > Blocks. To handle this: > - In the case of SimulatedFSDataset, providing a list of Replicas is actually > less work. 
> - In the case of NNThroughputBenchmark, rewriting to use Replicas is fairly > invasive. Instead, the patch creates a second constructor in > BlockListAsLongs specifically for the use of NNThroughputBenchmark. It turns > the stomach a little, but is clearer and requires less code than the > alternatives (and isn't without precedent). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7709) Fix Findbug Warnings
Rakesh R created HDFS-7709: -- Summary: Fix Findbug Warnings Key: HDFS-7709 URL: https://issues.apache.org/jira/browse/HDFS-7709 Project: Hadoop HDFS Issue Type: Bug Reporter: Rakesh R Assignee: Rakesh R There are many findbug warnings related to the warning types, - DM_DEFAULT_ENCODING, - RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE, - RCN_REDUNDANT_NULLCHECK_WOULD_HAVE_BEEN_A_NPE https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs-httpfs.html https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-rumen.html https://builds.apache.org/job/PreCommit-HADOOP-Build/5542//artifact/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-4265: --- Attachment: 0006-HDFS-4265.patch > BKJM doesn't take advantage of speculative reads > > > Key: HDFS-4265 > URL: https://issues.apache.org/jira/browse/HDFS-4265 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: ha >Affects Versions: 2.2.0 >Reporter: Ivan Kelly >Assignee: Rakesh R > Attachments: 0005-HDFS-4265.patch, 0006-HDFS-4265.patch, > 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, > 004-HDFS-4265.patch > > > BookKeeperEditLogInputStream reads entry at a time, so it doesn't take > advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298796#comment-14298796 ] Yongjun Zhang commented on HDFS-7707: - Hi [~brahmareddy] and [~kihwal], Thanks a lot for your comments! Currently {{isFileDeleted()}} does the following: {code} if (tmpParent == null || tmpParent.searchChildren(tmpChild.getLocalNameBytes()) < 0) { return true; } {code} which checks whether a child name exists in the parent directory. That's the part I was referring to that gets defeated. I hope my understanding is correct. I described a possible solution in the first comment; would you please share some insight? Thanks. > Edit log corruption due to delayed block removal again > -- > > Key: HDFS-7707 > URL: https://issues.apache.org/jira/browse/HDFS-7707 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Edit log corruption is seen again, even with the fix of HDFS-6825. > Prior to the HDFS-6825 fix, if dirX is deleted recursively, an OP_CLOSE can get > into the edit log for the fileY under dirX, thus corrupting the edit log > (restarting the NN with the edit log would fail). > What HDFS-6825 does to fix this issue is to detect whether fileY is already > deleted by checking the ancestor dirs on its path; if any of them doesn't > exist, then fileY is already deleted, and OP_CLOSE is not put into the edit log for > the file. > For this new edit log corruption, what I found was that the client first deleted > dirX recursively, then created another dir with exactly the same name as dirX > right away. Because HDFS-6825 counts on the namespace check (whether dirX > exists in its parent dir) to decide whether a file has been deleted, the > newly created dirX defeats this check, thus OP_CLOSE for the already > deleted file gets into the edit log, due to delayed block removal. 
> What we need to do is to have a more robust way to detect whether a file has > been deleted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
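To make the failure mode above concrete, here is a hedged sketch. It is a deliberately simplified model, not Hadoop's actual INode/FSDirectory code: a directory is reduced to a map from child name to an inode id, and the HDFS-6825-style check is reduced to a name lookup. Recreating dirX under the same name makes the name check pass even though the inode that fileY belonged to is gone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model (not Hadoop's classes) of why a purely
// name-based deletion check can be defeated: delete dirX, then immediately
// recreate a directory with the same name.
public class NameCheckSketch {
    // A directory is just a map from child name to an inode id here.
    static Map<String, Long> root = new HashMap<>();

    // Name-based deletion check: "is there still a child with this name?"
    static boolean isFileDeletedByName(String dirName) {
        return !root.containsKey(dirName);
    }

    public static void main(String[] args) {
        root.put("dirX", 1L);            // original dirX (inode id 1)
        long fileParentId = root.get("dirX");

        root.remove("dirX");             // recursive delete of dirX
        root.put("dirX", 2L);            // client recreates dirX (new inode id 2)

        // The name-based check claims the file's parent still exists...
        System.out.println("name check says deleted: " + isFileDeletedByName("dirX"));
        // ...but the inode the file actually belonged to is a different one.
        System.out.println("same inode: " + (root.get("dirX") == fileParentId));
    }
}
```

This only models the reasoning in the comment; a more robust check would have to compare inode identity rather than names.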
[jira] [Commented] (HDFS-7603) The background replication queue initialization may not let others run
[ https://issues.apache.org/jira/browse/HDFS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298754#comment-14298754 ] Hudson commented on HDFS-7603: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2040 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2040/]) HDFS-7603. The background replication queue initialization may not let others run. Contributed by Kihwal Lee. (kihwal: rev 89b07490f8354bb83a67b7ffc917bfe99708e615) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > The background replication queue initialization may not let others run > -- > > Key: HDFS-7603 > URL: https://issues.apache.org/jira/browse/HDFS-7603 > Project: Hadoop HDFS > Issue Type: Bug > Components: rolling upgrades >Reporter: Kihwal Lee >Assignee: Kihwal Lee >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7603.patch, HDFS-7603.patch > > > The background replication queue initialization processes a configured number > of blocks at a time and then releases the namesystem write lock. This was to let the > namenode start serving right after a standby-to-active transition or after leaving > safe mode. However, it does not let other threads run much if lock > fairness is set to "unfair" for higher throughput. > I propose adding a delay between unlocking and re-locking in the async repl > queue init thread.
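The fix described above can be sketched with standard java.util.concurrent primitives. This is an illustrative sketch under assumed names (ChunkedInitSketch, BLOCKS_PER_CHUNK), not BlockManager's actual code: with a non-fair lock, a loop that unlocks and immediately re-locks tends to win the lock back, so a short pause between unlock and the next lock gives waiting threads a chance to acquire it.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch of the HDFS-7603 fix: a background thread processing
// blocks in chunks under an unfair write lock, sleeping briefly between
// unlock and the next lock so other lock waiters can run.
public class ChunkedInitSketch {
    // false = non-fair mode, chosen for throughput, as in the issue description
    private static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(false);
    private static final int BLOCKS_PER_CHUNK = 1000;   // assumed config value

    static void initQueues(int totalBlocks) throws InterruptedException {
        int processed = 0;
        while (processed < totalBlocks) {
            lock.writeLock().lock();
            try {
                // process one chunk of blocks under the write lock
                processed += Math.min(BLOCKS_PER_CHUNK, totalBlocks - processed);
            } finally {
                lock.writeLock().unlock();
            }
            // the key change: a short delay between unlock and the next lock,
            // so waiting threads can acquire the (unfair) lock
            Thread.sleep(1);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        initQueues(5000);
        System.out.println("processed all blocks");
    }
}
```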
[jira] [Commented] (HDFS-7707) Edit log corruption due to delayed block removal again
[ https://issues.apache.org/jira/browse/HDFS-7707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298736#comment-14298736 ] Kihwal Lee commented on HDFS-7707: -- How is the {{isFileDeleted()}} check defeated? The check walks up the tree following the parent reference, not symbolically using the path name. Creation of another directory (i.e. a different INode) with the same name should not affect the check.
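Kihwal's point can be illustrated with a small hypothetical model (not Hadoop's INode class): a check that follows actual parent references reaches a detached node and reports the file deleted, regardless of whether a new directory with the same name exists under the root.

```java
// Hypothetical minimal model of a reference-based deletion check: walk up
// parent pointers; recreating a same-named directory is a different node,
// so it cannot reconnect the deleted file to the root.
public class ParentWalkSketch {
    static class Node {
        final String name;
        Node parent;          // null for the root, or once detached by delete
        Node(String name, Node parent) { this.name = name; this.parent = parent; }
    }

    // Follow parent references; if the chain never reaches the root,
    // the file has been deleted.
    static boolean isDeleted(Node file, Node root) {
        for (Node n = file; n != null; n = n.parent) {
            if (n == root) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Node root = new Node("/", null);
        Node dirX = new Node("dirX", root);
        Node fileY = new Node("fileY", dirX);

        dirX.parent = null;                    // recursive delete detaches dirX
        Node newDirX = new Node("dirX", root); // same name, different node

        // newDirX does not affect the walk from fileY, which dead-ends at dirX
        System.out.println("fileY deleted: " + isDeleted(fileY, root));
    }
}
```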
[jira] [Commented] (HDFS-7603) The background replication queue initialization may not let others run
[ https://issues.apache.org/jira/browse/HDFS-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298719#comment-14298719 ] Hudson commented on HDFS-7603: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #90 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/90/]) HDFS-7603. The background replication queue initialization may not let others run. Contributed by Kihwal Lee. (kihwal: rev 89b07490f8354bb83a67b7ffc917bfe99708e615) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt