[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568109#comment-14568109 ]

Arpit Agarwal commented on HDFS-8392:
-------------------------------------

Thanks for the review Jitendra. I fixed the first point and committed to branch-7240 since the delta is trivial.

bq. Ideally we should change the tests to use the datasetsMap instead of legacy field 'data'. That can be taken in a separate jira.

I'll file a Jira to make sure we fix it in trunk post merge. Deferring to avoid frequent merging pain.

Delta for reference:
{code}
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
@@ -2598,7 +2598,7 @@ public void scheduleAllBlockReport(long delay) {
    *
    * @return the fsdataset that stores the blocks
    */
-  @VisibleForTesting
+  @Deprecated
   public FsDatasetSpi<?> getFSDataset() {
     Preconditions.checkState(datasets.size() <= 1,
         "Did not expect more than one Dataset here.");
@@ -2621,7 +2621,7 @@ public BlockScanner getBlockScanner() {
    *
    * @return
    */
-  @VisibleForTesting
+  @Deprecated
   DirectoryScanner getDirectoryScanner() {
     return directoryScannersMap.get(getFSDataset());
   }
{code}

DataNode support for multiple datasets
--------------------------------------

Key: HDFS-8392
URL: https://issues.apache.org/jira/browse/HDFS-8392
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Attachments: HDFS-8392-HDFS-7240.01.patch, HDFS-8392-HDFS-7240.02.patch, HDFS-8392-HDFS-7240.03.patch

For HDFS-7240 we would like to share available DataNode storage across HDFS blocks and Ozone objects. The DataNode already supports sharing available storage across multiple block pool IDs for the federation feature. However all federated block pools use the same dataset implementation i.e. {{FsDatasetImpl}}.

We can extend the DataNode to support multiple dataset implementations so the same storage space can be shared across one or more HDFS block pools and one or more Ozone block pools.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566953#comment-14566953 ]

Jitendra Nath Pandey commented on HDFS-8392:
--------------------------------------------

[~arpitagarwal], the patch looks very good. I have a couple of minor comments.
# Datanode#getFSDataset(), getDirectoryScanner: Please annotate them as deprecated, and we should probably remove @VisibleForTesting, so that new tests are not written using these methods.
# Ideally we should change the tests to use the datasetsMap instead of the legacy field 'data'. That can be taken up in a separate jira.

I am +1 on the patch.
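The deprecation requested in the first review point can be illustrated with a minimal sketch. It is not the DataNode code: DatasetHolder, addDataset, getDataset and getOnlyDataset are hypothetical names, the type parameter T stands in for a dataset type such as FsDatasetSpi, and a plain IllegalStateException replaces the Preconditions check. The idea is only that the legacy accessor stays usable while a single dataset exists, and is marked so that new callers move to the keyed lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical holder for per-service datasets; not the real DataNode class.
 * T stands in for a dataset type such as FsDatasetSpi.
 */
class DatasetHolder<T> {
  private final Map<String, T> datasets = new LinkedHashMap<>();

  /** Registers a dataset under an illustrative service name. */
  void addDataset(String serviceName, T dataset) {
    datasets.put(serviceName, dataset);
  }

  /** Preferred lookup for new callers: name the dataset explicitly. */
  T getDataset(String serviceName) {
    return datasets.get(serviceName);
  }

  /**
   * Legacy accessor kept for older tests. Valid only while at most one
   * dataset is registered; returns null when none is registered.
   *
   * @deprecated use {@link #getDataset(String)} instead.
   */
  @Deprecated
  T getOnlyDataset() {
    if (datasets.size() > 1) {
      throw new IllegalStateException(
          "Did not expect more than one Dataset here.");
    }
    return datasets.isEmpty() ? null : datasets.values().iterator().next();
  }
}
```

With this shape, existing single-dataset tests keep working unchanged, while any caller that reaches the legacy accessor after a second dataset is registered fails fast instead of silently picking one.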
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557061#comment-14557061 ]

Hadoop QA commented on HDFS-8392:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch HDFS-7240 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 32s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 35s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 14s | The applied patch generated 8 new checkstyle issues (total was 665, now 659). |
| {color:red}-1{color} | whitespace | 0m 9s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 162m 28s | Tests failed in hadoop-hdfs. |
| | | 205m 41s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734938/HDFS-8392-HDFS-7240.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7240 / 770ed92 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/2/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/2/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/2/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/2/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/2/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555264#comment-14555264 ]

Arpit Agarwal commented on HDFS-8392:
-------------------------------------

The attached patch is the first step toward supporting alternate Dataset implementations. It fixes assumptions in the DataNode code that there is a single dataset instance. Instead, the block pool ID is used to look up the dataset. Dataset instantiation is keyed off the service type, so all federated block pools share the same dataset instance.

I left {{Datanode#data}} and {{Datanode#getFSDataset}} in place to avoid massive changes to existing test cases and tagged them {{@VisibleForTesting}}. HDFS unit tests will never have more than one dataset instance.
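The lookup scheme described in the comment above (one dataset instance per service type, resolved through the block pool ID) might be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the patch: Dataset, ServiceType, DatasetRegistry and all method names are hypothetical, and Dataset is a trivial stand-in rather than the real FsDatasetSpi.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a dataset implementation; not FsDatasetSpi. */
interface Dataset {
  String name();
}

/** Hypothetical service types; the constant names are assumptions. */
enum ServiceType { HDFS, OBJECT_STORE }

/** Sketch: one dataset per service type, looked up by block pool ID. */
class DatasetRegistry {
  private final Map<ServiceType, Dataset> datasetsByService = new HashMap<>();
  private final Map<String, Dataset> datasetsByBlockPool = new HashMap<>();

  /**
   * Registers a block pool. The dataset for its service type is created on
   * first use and reused for every later block pool of the same type.
   */
  synchronized Dataset addBlockPool(String bpid, ServiceType type) {
    Dataset ds = datasetsByService.get(type);
    if (ds == null) {
      final String label = type + "-dataset";
      ds = () -> label;
      datasetsByService.put(type, ds);
    }
    datasetsByBlockPool.put(bpid, ds);
    return ds;
  }

  /** Resolves the dataset that serves the given block pool. */
  synchronized Dataset getDataset(String bpid) {
    Dataset ds = datasetsByBlockPool.get(bpid);
    if (ds == null) {
      throw new IllegalArgumentException("Unknown block pool: " + bpid);
    }
    return ds;
  }

  /** Number of distinct dataset instances created so far. */
  synchronized int datasetCount() {
    return datasetsByService.size();
  }
}
```

In this sketch, two federated HDFS block pools resolve to the same dataset object, while a block pool registered under the object store service type gets its own instance.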
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555459#comment-14555459 ]

Hadoop QA commented on HDFS-8392:
---------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 41s | Pre-patch HDFS-7240 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 4 new or modified test files. |
| {color:green}+1{color} | javac | 7m 28s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 39s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 2m 12s | The applied patch generated 38 new checkstyle issues (total was 665, now 691). |
| {color:red}-1{color} | whitespace | 0m 8s | The patch has 7 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 3m 4s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 168m 28s | Tests failed in hadoop-hdfs. |
| | | 211m 32s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.ha.TestHAStateTransitions |
| | hadoop.hdfs.server.namenode.TestCheckpoint |
| | hadoop.hdfs.server.datanode.fsdataset.impl.TestWriteToReplica |
| | hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes |
| | hadoop.hdfs.server.datanode.TestDeleteBlockPool |
| | hadoop.hdfs.server.namenode.TestFsckWithMultipleNameNodes |
| | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.TestDecommission |
| | hadoop.hdfs.server.datanode.TestDataNodeMultipleRegistrations |
| | hadoop.hdfs.TestAppendSnapshotTruncate |
| | hadoop.fs.viewfs.TestViewFileSystemHdfs |
| | hadoop.hdfs.server.datanode.TestRefreshNamenodes |
| | hadoop.hdfs.server.datanode.TestDataNodeExit |
| | hadoop.hdfs.server.datanode.TestBlockScanner |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12734668/HDFS-8392-HDFS-7240.01.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | HDFS-7240 / 770ed92 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/11092/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/11092/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/11092/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/11092/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/11092/console |

This message was automatically generated.
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544064#comment-14544064 ]

Arpit Agarwal commented on HDFS-8392:
-------------------------------------

FsDatasetSpi is geared towards storing and retrieving files. In the object store we want to be able to store and retrieve metadata containers and data containers. Files may not be the best abstraction for these containers, so we'll introduce a StorageContainerDataset for them. We don't foresee a third dataset type right now.

The DataNode already supports multiple block pools per storage volume, and most of the difficult work was done as part of the federation feature. It is relatively straightforward to extend it to support the notion of a dataset per block pool. So in a cluster running non-federated HDFS and object store services, the DataNodes would have two block pools and two datasets, each servicing one block pool.

Hope that's a little clearer. I intend to post a patch next week.
[jira] [Commented] (HDFS-8392) DataNode support for multiple datasets
[ https://issues.apache.org/jira/browse/HDFS-8392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542952#comment-14542952 ]

Joe Pallas commented on HDFS-8392:
----------------------------------

The current Ozone Architecture document seems to say that storage would simply use different block pools, so it isn't clear what motivates this. The datanode has the notion of a single dataset pretty firmly at present, and it isn't clear how multiple datasets might share the same volumes (if that is the intent).

Could you elaborate on what problems this would be trying to solve?