[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935602#comment-15935602 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Oh, interesting! Thanks for digging in, Manoj. If there are three DNs each doing reconstruction, that's less efficient, since it takes 3 * num_data_blocks network reads, versus having one DN read num_data_blocks blocks to make all three missing blocks and then copy two of them to other DNs. Also something to investigate.

> BlockManager reconstruction work scheduling should correctly adhere to EC
> block placement policy
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rui Gao
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have
>             Fix For: 3.0.0-alpha3
>
>         Attachments: HDFS-10530.1.patch, HDFS-10530.2.patch,
>         HDFS-10530.3.patch, HDFS-10530.4.patch, HDFS-10530.5.patch
>
>
> This issue was found by [~tfukudom].
> Under the RS-DEFAULT-6-3-64k EC policy:
> 1. Create an EC file; the file was written to all 5 racks (2 DNs each) of the cluster.
> 2. Reconstruction work would be scheduled if a 6th rack is added.
> 3. But adding a 7th or more racks does not trigger reconstruction work.
> Based on the default EC block placement policy defined in
> "BlockPlacementPolicyRackFaultTolerant.java", an EC file should be able to be
> scheduled to distribute to 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*,
> *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()*
> instead of *getRealDataBlockNum()*.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
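Andrew's read-amplification point can be made concrete with a back-of-the-envelope sketch. This is illustrative Java, not Hadoop source; the class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope model of the two reconstruction strategies described
 * in the comment, for a stripe with dataBlocks data blocks and `missing`
 * blocks to rebuild. Illustrative only, not Hadoop code.
 */
public class ReconstructionCost {

    // Strategy A: each of the `missing` target DNs independently reads
    // dataBlocks source blocks to decode its single target block.
    static int transfersPerTargetDn(int dataBlocks, int missing) {
        return missing * dataBlocks;
    }

    // Strategy B: a single DN reads dataBlocks sources once, decodes all
    // `missing` blocks locally, then copies missing - 1 of them out to
    // the other target DNs.
    static int transfersSingleDn(int dataBlocks, int missing) {
        return dataBlocks + (missing - 1);
    }

    public static void main(String[] args) {
        // RS-6-3 with all three parity blocks missing, as in this thread:
        System.out.println(transfersPerTargetDn(6, 3)); // 3 * 6 = 18 block transfers
        System.out.println(transfersSingleDn(6, 3));    // 6 reads + 2 copies = 8
    }
}
```

Under this model, the single-DN strategy moves 8 blocks over the network instead of 18, which is the efficiency gap being flagged.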
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933920#comment-15933920 ]

Manoj Govindassamy commented on HDFS-10530:
-------------------------------------------

Filed HDFS-11552 and HDFS-11552 to track the improvements needed as per the discussion in [comment|https://issues.apache.org/jira/browse/HDFS-10530?focusedCommentId=15928996&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15928996]

{code}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{code}

bq.
Based on this, I think there's one DN doing reconstruction work to make three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

I see distinct src addresses, though: src: /127.0.0.1:47340, src: /127.0.0.1:54478, src: /127.0.0.1:51622. So I thought these reconstructions were coming from different DNs. Is that not the case? Will check the code.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929217#comment-15929217 ]

Manoj Govindassamy commented on HDFS-10530:
-------------------------------------------

Thanks for the review and commit help, [~andrew.wang]. Sure, I will file a JIRA to track parity blocks not being written out when there are insufficient DNs. And I will track all the other logging-related issues in a separate JIRA.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929054#comment-15929054 ]

Hudson commented on HDFS-10530:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11417 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11417/])
HDFS-10530. BlockManager reconstruction work scheduling should correctly (wang: rev 4812518b23cac496ab5cdad5258773bcd9728770)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928998#comment-15928998 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

This looks good to me. I can fix up the checkstyle whitespace issue at commit time. +1, will commit shortly.

I raised a bunch of other questions in my previous comment, which we can address in follow-on JIRAs.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928996#comment-15928996 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Thanks for digging in, Manoj! A few follow-up questions:

bq. DFSStripedOutputStream verifies if the allocated block locations length is at least equals numDataBlocks, otherwise it throws IOException and the client halts. So, the relaxation is only for the parity blocks.

Ran the test myself and looked through the output. It looks like with 6 DNs, we don't allocate any locations for the parity blocks (only 6 replicas):

{noformat}
2017-03-16 13:57:38,902 [IPC Server handler 0 on 45189] INFO hdfs.StateChange (FSDirWriteFileOp.java:logAllocatedBlock(777)) - BLOCK* allocate blk_-9223372036854775792_1001, replicas=127.0.0.1:37655, 127.0.0.1:33575, 127.0.0.1:38319, 127.0.0.1:46751, 127.0.0.1:44029, 127.0.0.1:37065 for /ec/test1
{noformat}

Could you file a JIRA to dig into this? It looks like we can't write blocks from the same EC group to the same DN. It's still better to write the parities than not at all, though.

bq. WARN hdfs.DFSOutputStream (DFSStripedOutputStream.java:logCorruptBlocks(1117)) - Block group <1> has 3 corrupt blocks. It's at high risk of losing data.

Agreed that this log is not accurate; mind filing a JIRA to correct this message? "Corrupt" means we have data loss. Here, we haven't lost data yet, but are suffering extremely lowered durability. I'd prefer we also quantify the risk in the message, e.g. "loss of any block" or "loss of two blocks will result in data loss".
{noformat}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{noformat}

Based on this, I think there's one DN doing reconstruction work to make three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

Seems like we've got a serious lack of logging in ECWorker / StripedBlockReconstructor / etc., though, since I determined the above via code inspection. I'd like to see logs for what blocks are being read in, for decoding, and also for writing the blocks out. Another JIRA?
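The "only 6 replicas" allocation in the log above is consistent with a one-block-per-DN constraint for an EC group. The following is a hypothetical model of that behavior, not the actual DFSStripedOutputStream or block placement logic:

```java
/**
 * Hypothetical model of the allocation behavior observed in the logs:
 * at most one block of an EC group per DataNode, data blocks mandatory,
 * parity blocks best-effort. Illustrative only, not HDFS source.
 */
public class StripedAllocationModel {

    static int allocatedLocations(int liveDataNodes, int dataBlocks, int parityBlocks) {
        // One block of the group per DN, capped at the full group size.
        int allocated = Math.min(liveDataNodes, dataBlocks + parityBlocks);
        // The client halts unless at least the data blocks get locations.
        if (allocated < dataBlocks) {
            throw new IllegalStateException("fewer DataNodes than data blocks");
        }
        return allocated;
    }

    public static void main(String[] args) {
        // 6 DNs, RS-6-3: only the 6 data blocks get locations, no parity.
        System.out.println(allocatedLocations(6, 6, 3)); // 6
        // 9 DNs: the whole group, data and parity, can be placed.
        System.out.println(allocatedLocations(9, 6, 3)); // 9
    }
}
```

Under this model, the stripe written on a 6-DN cluster starts life with zero parity blocks, which is exactly the state the reconstruction work later has to repair.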
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928919#comment-15928919 ]

Hadoop QA commented on HDFS-10530:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 12m 39s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 36s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 373 unchanged - 0 fixed = 374 total (was 373) |
| +1 | mvnsite | 0m 52s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 53s | the patch passed |
| +1 | javadoc | 0m 37s | the patch passed |
| -1 | unit | 69m 20s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
|    |  | 94m 15s |  |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12859151/HDFS-10530.5.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 0df06d50311b 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 09ad8ef |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927364#comment-15927364 ]

Hadoop QA commented on HDFS-10530:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 12m 42s | trunk passed |
| +1 | compile | 0m 46s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| -0 | checkstyle | 0m 38s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 373 unchanged - 0 fixed = 374 total (was 373) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
| +1 | unit | 64m 44s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
|    |  | 89m 40s |  |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12858793/HDFS-10530.4.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3739459772d2 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 615ac09 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927311#comment-15927311 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Hi Manoj, thanks for working on this. Hoping you can help me understand the current behavior; a few review comments too.

In the unit test, it looks like we start a cluster with 6 racks, write a 6+3 file, add 3 more hosts, then wait for parity blocks to be written to these 3 new nodes.

* Are these necessarily the parity blocks, or could they be any of the blocks that are co-located on the first 6 racks?
* Also, does this happen via EC reconstruction, or do we simply copy the blocks over to the new racks? If it's the first, we should file a follow-on to do the second when possible.
* Is the BPP violated before entering the waitFor? If so, we should assert that. This may require pausing reconstruction work and resuming it later.

A few other notes:

* "Block recovery" refers specifically to the process of recovering from a DN failure while writing. I'm guessing this is actually more like "replication", or since it's EC, "reconstruction" work?
* Do you think TestBPPRackFaultTolerant needs any additional unit tests along these lines?

{code}
cluster.startDataNodes(conf, 3, true, null, null,
    new String[]{"host3", "host4", "host5"}, null);
{code}

Looks like these have the same names as the initial DNs, as Takanobu noted. Might be nice to specify the racks too, to be explicit.

{code}
lb = DFSTestUtil.getAllBlocks(dfs, testFileUnsatisfied).get(0);
blockInfo = bm.getStoredBlock(lb.getBlock().getLocalBlock());
dumpStoragesFor(blockInfo);
// But, there will not be any block placement changes yet
assertFalse("Block group of testFileUnsatisfied should not be placement"
    + " policy satisfied", bm.isPlacementPolicySatisfied(blockInfo));
{code}

If we later enhance the NN to automatically fix up misplaced EC blocks, this assert will be flaky. Maybe add a comment?
Or we could pause reconstruction work, add the DNs, and resume after the assert.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925588#comment-15925588 ]

Takanobu Asanuma commented on HDFS-10530:
-----------------------------------------

Thanks for the patch, [~manojg]! Mostly it looks good to me. A couple of comments:

* It would be more readable if the names of the additional DNs were different from the first DNs.

{code}
cluster.startDataNodes(conf, 3, true, null,
    new String[]{"/rack3", "/rack4", "/rack5"},
    new String[]{"host3-2", "host4-2", "host5-2"}, null);
cluster.triggerHeartbeats();
{code}

* It seems the block placement policy keeps being satisfied during the EC reconstruction, so the {{GenericTestUtils.waitFor}} does not assure it's finished. We can use {{DFSTestUtil.waitForReplication}} instead of the {{GenericTestUtils.waitFor}}.

{code}
DFSTestUtil.waitForReplication(dfs, testFileUnsatisfied,
    (short) (numDataBlocks + numParityBlocks), 15 * 1000);
{code}
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907354#comment-15907354 ] Takanobu Asanuma commented on HDFS-10530:

Thanks for your detailed investigation, [~manojg]. [~demongaorui] is not mainly working on HDFS-EC at the moment, so if you are interested in this task, please feel free to reassign it to yourself. I will contact him directly and let him know.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905906#comment-15905906 ] Manoj Govindassamy commented on HDFS-10530:

bq. isPlacementPolicySatisfied should use getRealTotalBlockNum

The patch 3 change, making isPlacementPolicySatisfied() use getRealTotalBlockNum() instead of getRealDataBlockNum(), will definitely fix blocks that are newly created or already detected as needing reconstruction. But for files that were created earlier and are mis-replicated, I am not seeing this patch recover them. I will post an addendum patch soon with the latest rebase and the modified test as explained in my previous comment.
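The effect of the two counters can be illustrated with a simplified, self-contained model of the rack check (hypothetical names, not the actual BlockManager code): placement is considered satisfied once the block's replicas span min(numReplicas, totalRacks) racks, so which replica count is passed in decides how many racks are demanded.

```java
// Simplified model of the rack check in isPlacementPolicySatisfied():
// satisfied once racksUsed reaches min(numReplicas, totalRacks).
class PlacementCheckModel {
    static boolean isPlacementSatisfied(int racksUsed, int numReplicas,
                                        int totalRacks) {
        return racksUsed >= Math.min(numReplicas, totalRacks);
    }

    public static void main(String[] args) {
        int dataBlocks = 6, parityBlocks = 3;   // RS-6-3: 9 internal blocks
        int racksUsed = 6, totalRacks = 7;      // 7th rack just added

        // Counting only data blocks: 6 racks already satisfy min(6, 7),
        // so no reconstruction is scheduled for the new rack.
        System.out.println(isPlacementSatisfied(racksUsed, dataBlocks, totalRacks));
        // Counting all internal blocks: min(9, 7) = 7 racks are required,
        // so the placement is flagged and reconstruction can be scheduled.
        System.out.println(isPlacementSatisfied(racksUsed,
            dataBlocks + parityBlocks, totalRacks));
    }
}
```

This matches the reported behavior: with the data-block count, anything beyond 6 racks looks satisfied, so adding the 7th rack triggers nothing.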
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905898#comment-15905898 ] Andrew Wang commented on HDFS-10530:

BTW, do you have any review comments for the v3 patch? We need a new Jenkins run since it's gone stale, but the idea at least LGTM.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905897#comment-15905897 ] Andrew Wang commented on HDFS-10530:

Nice investigation Manoj. This relates to a long-standing HDFS supportability issue, where the only way to fix mis-replicated blocks was to setrep the cluster to 4 and then back to 3. I think the priority, then, is to give admins some way to trigger the big scan. When I looked earlier, fsck seemed like a good place to hook in since it runs on the server side. Since this patch is pretty self-contained, maybe we can pursue the manual triggering as a separate JIRA.

As a follow-on, it'd be nice to find some way of fixing these up automatically in the background. One idea I had is to do it during FBR processing. We want to be careful not to put this on the fast path though, since FBR processing is already expensive; possibly we could use your ideas regarding tracking topology changes. Non-FBR ideas would also be quite welcome, since the FBR approach duplicates work and it takes a really long time for all DNs to FBR. Anything that can incrementally iterate the block map would work. I think it's also fair to handle some of this in a follow-on JIRA.
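The "incrementally iterate the block map" idea could look roughly like this hypothetical sketch (not Hadoop code; all names here are invented): scan a bounded batch of blocks per pass and remember the cursor between passes, so misplacement checks run in the background without one expensive whole-map scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical incremental scanner: each pass visits at most batchSize
// entries and resumes where the previous pass stopped, wrapping around.
class IncrementalScanModel {
    private final List<String> blockMap;  // stand-in for the NN block map
    private int cursor = 0;               // survives across passes

    IncrementalScanModel(List<String> blockMap) {
        this.blockMap = blockMap;
    }

    // Returns the batch of block ids visited in this pass.
    List<String> scanNextBatch(int batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize && !blockMap.isEmpty(); i++) {
            batch.add(blockMap.get(cursor));
            cursor = (cursor + 1) % blockMap.size();
        }
        return batch;
    }
}
```

A real implementation would run a placement check on each visited block and queue mis-placed ones for the Replication Monitor; the point of the sketch is only the bounded, resumable iteration.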
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905864#comment-15905864 ] Manoj Govindassamy commented on HDFS-10530:

[~demongaorui], [~andrew.wang], [~zhz],

The test in HDFS-10530.3.patch provided by [~demongaorui] shows that adding datanodes in new racks doesn't necessarily rebalance the existing striped blocks for better placement. The test does the following:
1. Start a cluster with 6 DNs in 6 racks.
2. Create a file with the RS-6-3 EC policy. Parity blocks couldn't be created owing to the non-availability of DNs.
3. Verify the block placement is satisfied, as there are only 6 racks available (though the ideal would be 9).
4. Add 3 more DNs in 3 more racks.
5. Verify the block placement is still not satisfied for the previously created file.
6. Create a new striped file and verify that the new file's block placement policy is satisfied.

I modified the test a little more to do the following:
4. Add 3 more DNs in the existing racks.
5. Wait for the Replication Monitor to recover under-replicated blocks.
6. Parity blocks now get created on the newly added DNs.
7. Now the file has 9 blocks, on 9 DNs in 6 racks.
8. Verify this block placement is satisfied, as there are only 6 racks.
9. Add 3 more DNs in 3 more racks.
10. With NO fix applied, verify this block placement is NOT satisfied, as there are now 9 racks but the blocks are not striped across all of them.
11. Create a new striped file and verify that the new file's block placement policy is satisfied.

*Fix Proposal:*
1. The goal is to find whether there are under-replicated/mis-placed blocks for EC files upon DN addition in a new rack.
2. {{DatanodeManager#addDatanode}} can detect the new topology changes. There is already a method {{#checkIfClusterIsNowMultiRack}}, which was specifically written for racks growing from 1 to more. But I believe we need a more generic check here in the context of EC.
3. {{checkIfClusterIsNowMultiRack}} => {{processMisReplicatedBlocks}} => {{processMisReplicatesAsync}} is heavyweight, as it scans the entire BlockMap. Though rack addition is a rare operation, doing this whole-world scan for every rack addition seems quite time consuming. The ideal thing would be to take some clues and trigger {{processMisReplicatesAsync}} only when needed, so that it detects mis-placements and triggers proper block placements.
4. How about checking the list of EC policies ever set on any file? Just like the enabled EC policy list, we can maintain an active EC policy list (at most the system policies count). This can serve as a clue to potential block mis-placements, since we can deduce the rack requirements from the EC policy schema. Once we have the clue, we can trigger {{processMisReplicatesAsync}}, which can then feed the work to the Replication Monitor, which runs continuously.

Please share your thoughts on the above proposal. Would love to hear your suggestions on better alternative approaches. Thanks.
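Proposal point 4 can be made concrete with a hypothetical trigger check (none of these names exist in Hadoop; this is only a model of the idea): a rack addition warrants the expensive full scan only if some EC policy ever used on the cluster was still rack-constrained before the addition.

```java
import java.util.List;

// Hypothetical sketch of the proposed check in DatanodeManager#addDatanode:
// trigger processMisReplicatesAsync only when a new rack can actually
// improve the spread of some EC policy in use.
class EcRescanTrigger {
    // activePolicyTotalUnits: (data + parity) per active policy, e.g. RS-6-3 -> 9
    static boolean shouldRescan(List<Integer> activePolicyTotalUnits,
                                int racksBefore, int racksAfter) {
        if (racksAfter <= racksBefore) {
            return false;  // no new rack; topology did not grow
        }
        for (int totalUnits : activePolicyTotalUnits) {
            // A policy was rack-constrained if the old rack count was below
            // its ideal spread; the new rack can improve its placement.
            if (racksBefore < totalUnits) {
                return true;
            }
        }
        return false;      // every active policy already had enough racks
    }
}
```

For RS-6-3 this flags a 6-to-7 rack growth (a full scan is worthwhile) but skips a 9-to-10 growth, where no policy can spread further.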
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817065#comment-15817065 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 14m 29s | trunk passed |
| +1 | compile | 0m 54s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 55s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
| -1 | mvninstall | 0m 42s | hadoop-hdfs in the patch failed. |
| -1 | compile | 0m 43s | hadoop-hdfs in the patch failed. |
| -1 | javac | 0m 43s | hadoop-hdfs in the patch failed. |
| -0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 372 unchanged - 0 fixed = 374 total (was 372) |
| -1 | mvnsite | 0m 43s | hadoop-hdfs in the patch failed. |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 0m 15s | hadoop-hdfs in the patch failed. |
| +1 | javadoc | 0m 36s | the patch passed |
| -1 | unit | 0m 44s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 25m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12815405/HDFS-10530.3.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 991eb2ce5c51 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4db119b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt |
| compile | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| mvnsite | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356891#comment-15356891 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 9m 21s | trunk passed |
| +1 | compile | 1m 16s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 1m 30s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 2m 27s | trunk passed |
| +1 | javadoc | 1m 14s | trunk passed |
| +1 | mvninstall | 1m 12s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 390 unchanged - 0 fixed = 392 total (was 390) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 90m 28s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 115m 30s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.namenode.ha.TestHAAppend |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12815405/HDFS-10530.3.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 355feed0992a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 846ada2 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/console |
| Powered by | Apache
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354466#comment-15354466 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 0s | trunk passed |
| +1 | compile | 0m 50s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 0m 56s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 50s | trunk passed |
| +1 | javadoc | 0m 58s | trunk passed |
| +1 | mvninstall | 0m 54s | the patch passed |
| +1 | compile | 0m 47s | the patch passed |
| +1 | javac | 0m 47s | the patch passed |
| -0 | checkstyle | 0m 29s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 388 unchanged - 2 fixed = 390 total (was 390) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 55s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed |
| -1 | unit | 62m 28s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 82m 43s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.server.namenode.TestStartup |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12814461/HDFS-10530.2.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 5275a422366e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 77031a9 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354199#comment-15354199 ] GAO Rui commented on HDFS-10530:

Patch 2 has been attached to address the TestBalancer failures. {{hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer}} seems unrelated to the block placement changes, and the other tests should pass this time.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350895#comment-15350895 ] GAO Rui commented on HDFS-10530:

I've investigated the failure of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}. I found that reconstruction work triggered by the block placement policy interfered with the {{Balancer}}'s attempts to balance DatanodeStorage utilization. We could indeed have a conflict between the block placement policy and the balancer policy. As in the scenario of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}, the last added datanode would be filled up with internal/parity blocks of all block groups according to {{BlockPlacementPolicyRackFaultTolerant}}, which would make this datanode always be recognized as {{over-utilized}} by the {{Balancer}}. This could prevent the {{Balancer}} from ever finishing its work successfully. I suggest we make the {{Balancer}} tolerate a certain percentage (say 10%) of datanodes as {{over-utilized}}: after {{Balancer#runOneIteration()}} has run 5 times and less than 10% of datanodes are {{over-utilized}}, we let the {{Balancer}} finish its work successfully. [~zhz], could you share your opinions? Thank you.
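The proposed exit rule could be sketched as follows (hypothetical code, not the actual Balancer; the class and method names are invented, and the constants mirror the suggested 5 iterations and 10% tolerance):

```java
// Hypothetical sketch of the proposed Balancer exit rule: after enough
// iterations, tolerate a small fraction of over-utilized DNs instead of
// looping forever on nodes the EC placement policy keeps full.
class BalancerExitRule {
    static final int MIN_ITERATIONS = 5;          // proposed iteration floor
    static final double TOLERATED_FRACTION = 0.10; // proposed 10% tolerance

    static boolean canFinish(int iterationsRun, int overUtilizedNodes,
                             int totalNodes) {
        if (overUtilizedNodes == 0) {
            return true;  // fully balanced, finish as today
        }
        // Otherwise finish only once we have tried enough iterations and
        // the remaining over-utilized nodes are within tolerance.
        return iterationsRun >= MIN_ITERATIONS
            && (double) overUtilizedNodes / totalNodes < TOLERATED_FRACTION;
    }
}
```

Under this rule, one stubbornly over-utilized DN in a 20-node cluster (5%) no longer blocks the Balancer from declaring success after its fifth iteration.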
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350390#comment-15350390 ] Hadoop QA commented on HDFS-10530: -- -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 51s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 52s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 184 unchanged - 0 fixed = 186 total (was 184) |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 47s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
| -1 | unit | 74m 6s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 93m 48s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
| | hadoop.hdfs.TestLeaseRecoveryStriped |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12813576/HDFS-10530.1.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 93a4016d67e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 73615a7 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345772#comment-15345772 ] Zhe Zhang commented on HDFS-10530: -- Thanks [~demongaorui] for reporting this. Yes, I believe the right behavior in the example is to distribute across 9 racks, and I agree {{isPlacementPolicySatisfied}} should use {{getRealTotalBlockNum}}. A drawback is that this might cause significantly more traffic when a new rack is added. But considering that 1) adding a new rack is a rare scenario, and 2) we are throttling reconstruction work, I think we can make this change for better fault tolerance. We should add a follow-on task to put this kind of reconstruction task at the lowest priority in {{LowRedundancyBlocks}}.
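The effect of switching from data-block count to total-block count can be illustrated with a small standalone check. This is a hypothetical simplification of {{BlockManager#isPlacementPolicySatisfied}}: the class, method, and the min-based rack requirement are illustrative, not the actual Hadoop code.

```java
// Hypothetical simplification: for a striped block group, the number of racks
// the placement policy should demand is derived from the TOTAL number of
// internal blocks (data + parity, i.e. getRealTotalBlockNum()), not just the
// data blocks (getRealDataBlockNum()).
public class PlacementCheck {
    // For RS-6-3: dataBlocks = 6, parityBlocks = 3, so 9 internal blocks.
    static boolean isPlacementSatisfied(int racksUsed, int totalRacks,
                                        int dataBlocks, int parityBlocks) {
        int totalBlockNum = dataBlocks + parityBlocks; // stands in for getRealTotalBlockNum()
        // Spread over as many racks as available, capped by the number of
        // internal blocks in the group.
        int requiredRacks = Math.min(totalRacks, totalBlockNum);
        return racksUsed >= requiredRacks;
    }

    public static void main(String[] args) {
        // 5-rack cluster, all racks used: nothing more can be done, satisfied.
        System.out.println(isPlacementSatisfied(5, 5, 6, 3)); // true
        // A 7th rack was added but blocks still sit on only 6 racks:
        // the check fails, so reconstruction work should be scheduled.
        System.out.println(isPlacementSatisfied(6, 7, 6, 3)); // false
    }
}
```

Had the cap been the data-block count (6) instead, the second case would wrongly report the policy as satisfied, which matches the reported symptom that adding a 7th rack never triggers reconstruction.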
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331473#comment-15331473 ] GAO Rui commented on HDFS-10530: Any comments are welcome. Let's discuss and try to reach an agreement on this issue.