[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15347794#comment-15347794 ] Hudson commented on HDFS-9543: -- SUCCESS: Integrated in Hadoop-trunk-Commit #10014 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10014/]) HDFS-9543. DiskBalancer: Add Data mover. Contributed by Anu Engineer. (arp: rev 1594b472bb9df7537dbc001411c99058cc11ba41) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/planner/MoveStep.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/datamodel/DiskBalancerVolume.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/datamodel/DiskBalancerVolumeSet.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/datamodel/DiskBalancerDataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DiskBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/diskbalancer/TestDiskBalancer.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/diskbalancer/planner/Step.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/diskbalancer/TestPlanner.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Fix For: HDFS-1312 > > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, > HDFS-9543-HDFS-1312.004.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263367#comment-15263367 ] Hadoop QA commented on HDFS-9543: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 11s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 46s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 51s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 42s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 180 unchanged - 1 fixed = 180 total (was 181) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 24s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 66m 51s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 159m 17s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.server.datanode.TestDataNodeMetrics | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.TestDFSUpgradeFromImage | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.tools.TestHdfsConfigFields | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaRecovery | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.balancer.TestBa
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263212#comment-15263212 ] Anu Engineer commented on HDFS-9543: [~arpitagarwal] & [~eddyxu] Thanks for the code review and comments. The test failures are not related to this patch. I will commit this patch shortly. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, > HDFS-9543-HDFS-1312.004.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263179#comment-15263179 ] Hadoop QA commented on HDFS-9543: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 59s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 19s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 10s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 180 unchanged - 1 fixed = 180 total (was 181) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 4s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 67m 52s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 68m 14s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 175m 15s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestHFlush | | | hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.tools.TestHdfsConfigFields | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.balancer.TestBalancer | | JDK v1.8.0_91 Timed out junit tests | org.apache.hadoop.hdfs.TestWri
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263155#comment-15263155 ] Arpit Agarwal commented on HDFS-9543: - Thanks [~anu]. +1 pending Jenkins. The computeDelay calculation is correct as Anu explained above, I was not reading it correctly. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch, HDFS-9543-HDFS-1312.003.patch, > HDFS-9543-HDFS-1312.004.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15258637#comment-15258637 ] Arpit Agarwal commented on HDFS-9543: - Hi Anu, thanks for the updated patch. My comments: # In copyBlocks, the openPoolIters call should be inside the while loop, just before you open the try block. # I didn't understand the new computeDelay calculation. Your original calculation (v1 patch) looked correct, all that remained was to subtract {{timeUsed}} and check for negative result. # Thanks for the pointer to {{DirectoryScanner#scan}}. The synchronization is needed only when making changes to the in-memory block map. {{FsDatasetImpl#moveBlock}} and its callees already synchronize on the dataset when updating the map so we should remove the {{synchronized}} block in {{DiskBalancerMover#copyBlocks}}. # Nitpick: Line 818: We can also log the maximum error count value here for quick reference. Also a typo _cound_ -> _count_. # bq. I will make this comment clearer in the next update. Balancer supports a flag called -blockpools – HDFS-8890 seems to have added this. Just leaving a note that we don't do this yet. I think the -blockpools flag just restricts the balancing to a subset of blockpools. We cannot copy blocks across blockpools. Would you consider removing the TODO and filing a Jira if you think a similar flag will be useful for disk balancer. Sorry about not catching some of these on the last pass. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256730#comment-15256730 ] Anu Engineer commented on HDFS-9543: error seems to network related -- Failed to fetch http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/binary-amd64/Packages Hash Sum mismatch > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256725#comment-15256725 ] Hadoop QA commented on HDFS-9543: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 22s {color} | {color:red} Docker failed to build yetus/hadoop:fbe3e86. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12800583/HDFS-9543-HDFS-1312.002.patch | | JIRA Issue | HDFS-9543 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15281/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch, > HDFS-9543-HDFS-1312.002.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255717#comment-15255717 ] Anu Engineer commented on HDFS-9543: on minor comments : I will fix both the issues in the next patch. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255715#comment-15255715 ] Anu Engineer commented on HDFS-9543: [~arpitagarwal] Thanks for the code review comments. I will post a revised patch soon. bq. I expect this situation will be seen only if the storage type of a disk has changed since the planner generated the plan? We allow disk balancer plans to be edited if needed. This prevents users from making a mistake. We can avoid it but produces no harm. Please let me know if you need this removed. bq. That's a heavyweight lock to hold for the move. I don't think we should synchronize on the Dataset. Completely agree. I did this to play fair with DirectoryScanner. This lock is being held by directoryScanner to prevent changes to blockMap. I wanted to make sure that diskBalancer and DirectoryScanner play nicely. Please look at {{DirectoryScanner#scan line 586}}. Please do let me know if this is not needed in DiskBalancer. bq. think computeDelay should substract the time taken for the move else the computed delay will be too conservative. What do you think? good catch. Thank you, I will fix this in next update. bq. Precondition.checkState is redundant, subsequent lines will NPE if the either object is null. Agree and Precondition is going to throw somethign similar. But it does make the invariant clear to the next person who is reading code instead of wondering if I forgot to check a condition. bq. I did not get this TODO. What does it mean to move blocks across block pools I will make this comment clearer in the next update. Balancer supports a flag called -blockpools -- HDFS-8890 seems to have added this. Just leaving a note that we don't do this yet. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15255656#comment-15255656 ] Arpit Agarwal commented on HDFS-9543: - Thanks for adding the mover implementation [~anu]. My comments below, all in DiskBalancer.java: # Line 892: I expect this situation will be seen only if the storage type of a disk has changed since the planner generated the plan? # Lie 943: That's a heavyweight lock to hold for the move. I don't think we should synchronize on the Dataset. # Line 756: I think computeDelay should substract the time taken for the move else the computed delay will be too conservative. What do you think? # Lines 822/823: Precondition.checkState is redundant, subsequent lines will NPE if the either object is null. # Line 889: I did not get this TODO. What does it mean to move blocks across block pools? Minor: # Line 958: Don't need the logging guards since you are using SLF4J and the parameters are cheap to generate. # Should we rename getBlockTolerance to getBlockTolerancePercentage to make it clear its a percentage and not a fraction? I will take a look at the tests later. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246011#comment-15246011 ] Anu Engineer commented on HDFS-9543: [~eddyxu] Thank you for the code review comments. Please see some of my thoughts on your suggestions. bq.Could you put getNextBlock() logic into a separate Iterator, and make it Closable, which will include getBlockToCopy(), openPoolIters(), getNextBlock(), closePoolIters(). There are a few draw backs of separating them into different functions. I thought that this is an excellent idea, and I explored this path of code organization. But I got stuck in an intractable problem which makes me want to abandon this path. Please do let me know if you have suggestions on how you think I can solve this. In supporting an Iterator interface, we have to support hasNext. In our case it is a scan of the block list and finding a block that is small than the required move size. There are 2 ways to do this, look for the block and report true if found, but there is no gurantee that it will indeed be returned in the next() call since these blocks can go away underneath us. So a common pattern of iterator code becomes complex to write -- while(hasNext()) next() -- pattern now needs to worry next() failing even when hasNext has been successful. We can keep a pointer to the found block in memory, and return that in the next call, but that means that we have to do some unnecessary block state management in the iterator. I ended up writing all this and found that code is getting more complex instead of simpler and has kind of decided to abandon this approach. bq. 1) The states (i.e. poolIndex,) are stored outside these functions, the caller needs maintain these states. These are part of BlockMover class, and copyblocks in the only call made by other classes. So it is not visible to caller at all. bq. 2) poolIndex is never initialized and is not able be reset. PoolIndex is a index to a circular array, if you like I can initialize this to 0, but in most cases we just move to next block pool and get the next block. Before each block fetch we init the count variable so that we know if we have visited all the block pools. In other words, users should not be able to see this, nor need to reset this variable. bq. Please always log the IOEs. And I think it is better to throw IOE here as well as many other places. I will log a debug trace here. The reason why we are not throwing is because it is possible that we might encounter a damaged block when we try to move large number of blocks from one volume to another. Instead of failing or aborting the action we keep a count of errors we have encountered. We will just ignore that block and continue copy, until we hit max_error_counts. For each move -- that is a source disk, destination disk, bytes to move -- you can specify the max error count. Hence we are not throwing but keeping track of the error count. bq. Can it be a private static List openPoolIters() ? Since we are not doing the iterator interface, I am skipping this too. bq. In a few such places, should we actually break the while loop? Wouldn't continue here just generate a lot of LOGS and spend CPU cycles? You are right and I did think of writing a break here. The reason that I chose continue over break was this. It is easier to reason about a loop if it has only one exit point. With break, you have to reason about all exit points. This loop is a pretty complicated one since it can exit in many ways. Here are some : # when we meet the maximum error count. # if we have reach close enough to move bytes -- say close to 10% of the target # if we get an interrupt call from the client. # if we get a shutdown call. # if we are not able to find any blocks to move. # if we are out of destination space. so instead of making people reason about each of these, we set exit flag and loop back up. Since the while will turn to false, it will have one single exit, and will exit without any extra logging. bq. Why do you need to change float to double. In this case, wouldn't float good enough ? This is a stylistic change, Java docs recommend double as the default choice over float. Hence this fix. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243878#comment-15243878 ] Lei (Eddy) Xu commented on HDFS-9543: - Hi, [~anu] Thanks for the patches A few comments: Could you put {{getNextBlock()}} logic into a separate Iterator, and make it {{Closable}}, which will include {{getBlockToCopy(), openPoolIters(), getNextBlock(), closePoolIters()}}. There are a few draw backs of separating them into different functions. 1) The states (i.e. poolIndex,) are stored outside these functions, the caller needs maintain these states. 2) {{poolIndex}} is never initialized and is not able be reset. {code} } catch (IOException e) { item.incErrorCount(); } {code} Please always log the IOEs. And I think it is better to throw {{IOE}} here as well as many other places. {code} private void openPoolIters(); {code} Can it be a {{private static List openPoolIters()}}? {code} // Check for the max error count constraint. if (item.getErrorCount() > getMaxError(item)) { LOG.error("Exceeded the max error count. source {}, dest: {} " + "error count: {}", source.getBasePath(), dest.getBasePath(), item.getErrorCount()); this.setExitFlag(); continue; } {code} In a few such places, should we actually {{break}} the while loop? Wouldn't {{continue}} here just generate a lot of LOGS and spend CPU cycles? Why do you need to change {{float}} to {{double}}. In this case, wouldn't {{float}} good enough ? I think a {{5%}} of errors are OK for these tasks. Thanks very much! > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the actual mover logic to the datanode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243672#comment-15243672 ] Anu Engineer commented on HDFS-9543: Test failures are not related to this patch. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > Attachments: HDFS-9543-HDFS-1312.001.patch > > > This patch adds the RPCs and mover logic that allows data to be moved from > one storage partition to another -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243664#comment-15243664 ] Hadoop QA commented on HDFS-9543: - | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 12m 50s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 39s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 30s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s {color} | {color:green} HDFS-1312 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 43s {color} | {color:green} HDFS-1312 passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 43s {color} | {color:green} HDFS-1312 passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} hadoop-hdfs-project/hadoop-hdfs: patch generated 0 new + 180 unchanged - 1 fixed = 180 total (was 181) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 40s {color} | {color:green} the patch passed with JDK v1.8.0_77 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 45s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m 13s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_77. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 23s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 32s {color} | {color:red} Patch generated 2 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 119m 6s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_77 Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestReadStripedFileWithMissingBlocks | | | hadoop.hdfs.server.namenode.ha.TestEditLogTailer | | | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead | | | hadoop.hdfs.security.TestDelegationTokenForProxyUser | | | hadoop.hdfs.TestFileAppend | | | hadoop.hdfs.tools.TestDFSAdmin | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | J
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237809#comment-15237809 ] Anu Engineer commented on HDFS-9543: actively working on this, should be able to post a patch soon. > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > > This patch adds the RPCs and mover logic that allows data to be moved from > one storage partition to another -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-9543) DiskBalancer : Add Data mover
[ https://issues.apache.org/jira/browse/HDFS-9543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237802#comment-15237802 ] Lei (Eddy) Xu commented on HDFS-9543: - Hi, [~anu] Any update on this? > DiskBalancer : Add Data mover > -- > > Key: HDFS-9543 > URL: https://issues.apache.org/jira/browse/HDFS-9543 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Anu Engineer >Assignee: Anu Engineer > > This patch adds the RPCs and mover logic that allows data to be moved from > one storage partition to another -- This message was sent by Atlassian JIRA (v6.3.4#6332)