[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935602#comment-15935602 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Oh, interesting! Thanks for digging in, Manoj. If there are three DNs each doing reconstruction, that's less efficient, since it takes 3 * num_data_blocks network reads, versus having one DN read num_data_blocks blocks to make all three missing blocks and then copy two of them to other DNs. Also something to investigate.

> BlockManager reconstruction work scheduling should correctly adhere to EC
> block placement policy
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rui Gao
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have
>             Fix For: 3.0.0-alpha3
>
>         Attachments: HDFS-10530.1.patch, HDFS-10530.2.patch,
>         HDFS-10530.3.patch, HDFS-10530.4.patch, HDFS-10530.5.patch
>
>
> This issue was found by [~tfukudom].
> Under the RS-DEFAULT-6-3-64k EC policy:
> 1. Create an EC file; the file was written to all 5 racks (2 DNs each) of the cluster.
> 2. Reconstruction work would be scheduled if a 6th rack is added.
> 3. But adding a 7th or more racks does not trigger reconstruction work.
> Based on the default EC block placement policy defined in
> "BlockPlacementPolicyRackFaultTolerant.java", an EC file should be able to be
> scheduled to distribute to 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*,
> *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()*
> instead of *getRealDataBlockNum()*.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
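Andrew's read-amplification point can be made concrete with a back-of-the-envelope sketch. This is illustrative Java, not Hadoop source; the class and method names are hypothetical.

```java
/**
 * Back-of-the-envelope model of the two reconstruction strategies described
 * in the comment, for a stripe with dataBlocks data blocks and `missing`
 * blocks to rebuild. Illustrative only, not Hadoop code.
 */
public class ReconstructionCost {

    // Strategy A: each of the `missing` target DNs independently reads
    // dataBlocks source blocks to decode its single target block.
    static int transfersPerTargetDn(int dataBlocks, int missing) {
        return missing * dataBlocks;
    }

    // Strategy B: a single DN reads dataBlocks sources once, decodes all
    // `missing` blocks locally, then copies missing - 1 of them out to
    // the other target DNs.
    static int transfersSingleDn(int dataBlocks, int missing) {
        return dataBlocks + (missing - 1);
    }

    public static void main(String[] args) {
        // RS-6-3 with all three parity blocks missing, as in this thread:
        System.out.println(transfersPerTargetDn(6, 3)); // 3 * 6 = 18 block transfers
        System.out.println(transfersSingleDn(6, 3));    // 6 reads + 2 copies = 8
    }
}
```

Under this model, the single-DN strategy moves 8 blocks over the network instead of 18, which is the efficiency gap being flagged.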
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933920#comment-15933920 ]

Manoj Govindassamy commented on HDFS-10530:
-------------------------------------------

Filed HDFS-11552 and HDFS-11552 to track the improvements needed as per the discussion in [comment|https://issues.apache.org/jira/browse/HDFS-10530?focusedCommentId=15928996&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15928996]

{code}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{code}

bq.
Based on this, I think there's one DN doing reconstruction work to make three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

I see distinct src addresses, though: src: /127.0.0.1:47340, src: /127.0.0.1:54478, src: /127.0.0.1:51622. So I thought these reconstructions were coming from different DNs. Is that not the case? Will check the code.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929217#comment-15929217 ]

Manoj Govindassamy commented on HDFS-10530:
-------------------------------------------

Thanks for the review and commit help, [~andrew.wang]. Sure, I will file a JIRA to track parity blocks not being written out when there are insufficient DNs. And I will track all the other logging-related issues in a separate JIRA.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15929054#comment-15929054 ]

Hudson commented on HDFS-10530:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #11417 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11417/])
HDFS-10530. BlockManager reconstruction work scheduling should correctly (wang: rev 4812518b23cac496ab5cdad5258773bcd9728770)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928998#comment-15928998 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

This looks good to me. I can fix up the checkstyle whitespace issue at commit time. +1, will commit shortly.

I raised a bunch of other questions in my previous comment, which we can address in follow-on JIRAs.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928996#comment-15928996 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Thanks for digging in, Manoj! A few follow-up questions:

bq. DFSStripedOutputStream verifies if the allocated block locations length is at least equals numDataBlocks, otherwise it throws IOException and the client halts. So, the relaxation is only for the parity blocks.

Ran the test myself and looked through the output. It looks like with 6 DNs, we don't allocate any locations for the parity blocks (only 6 replicas):

{noformat}
2017-03-16 13:57:38,902 [IPC Server handler 0 on 45189] INFO hdfs.StateChange (FSDirWriteFileOp.java:logAllocatedBlock(777)) - BLOCK* allocate blk_-9223372036854775792_1001, replicas=127.0.0.1:37655, 127.0.0.1:33575, 127.0.0.1:38319, 127.0.0.1:46751, 127.0.0.1:44029, 127.0.0.1:37065 for /ec/test1
{noformat}

Could you file a JIRA to dig into this? It looks like we can't write blocks from the same EC group to the same DN. It's still better to write the parities than not at all, though.

bq. WARN hdfs.DFSOutputStream (DFSStripedOutputStream.java:logCorruptBlocks(1117)) - Block group <1> has 3 corrupt blocks. It's at high risk of losing data.

Agreed that this log is not accurate; mind filing a JIRA to correct this message? "Corrupt" means we have data loss. Here, we haven't lost data yet, but are suffering extremely lowered durability. I'd prefer we also quantify the risk in the message, e.g. "loss of any block" or "loss of two blocks will result in data loss".
{noformat}
2017-03-16 13:57:40,898 [DataNode: [[[DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data17, [DISK]file:/home/andrew/dev/hadoop/trunk/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data18]] heartbeating to localhost/127.0.0.1:45189] INFO datanode.DataNode (BPOfferService.java:processCommandFromActive(738)) - DatanodeCommand action: DNA_ERASURE_CODING_RECOVERY
2017-03-16 13:57:40,943 [DataXceiver for client at /127.0.0.1:47340 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775786_1002 src: /127.0.0.1:47340 dest: /127.0.0.1:44841
2017-03-16 13:57:40,944 [DataXceiver for client at /127.0.0.1:54478 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775785_1002 src: /127.0.0.1:54478 dest: /127.0.0.1:38977
2017-03-16 13:57:40,945 [DataXceiver for client at /127.0.0.1:51622 [Receiving block BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002]] INFO datanode.DataNode (DataXceiver.java:writeBlock(717)) - Receiving BP-1145201309-127.0.1.1-1489697856256:blk_-9223372036854775784_1002 src: /127.0.0.1:51622 dest: /127.0.0.1:41895
{noformat}

Based on this, I think there's one DN doing reconstruction work to make three parity blocks, which get written to the three new nodes. The above logs are all from the receiving DNs.

Seems like we've got a serious lack of logging in ECWorker / StripedBlockReconstructor / etc., though, since I determined the above via code inspection. I'd like to see logs for what blocks are being read in, for decoding, and also for writing the blocks out. Another JIRA?
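The "only 6 replicas" allocation in the log above is consistent with a one-block-per-DN constraint for an EC group. The following is a hypothetical model of that behavior, not the actual DFSStripedOutputStream or block placement logic:

```java
/**
 * Hypothetical model of the allocation behavior observed in the logs:
 * at most one block of an EC group per DataNode, data blocks mandatory,
 * parity blocks best-effort. Illustrative only, not HDFS source.
 */
public class StripedAllocationModel {

    static int allocatedLocations(int liveDataNodes, int dataBlocks, int parityBlocks) {
        // One block of the group per DN, capped at the full group size.
        int allocated = Math.min(liveDataNodes, dataBlocks + parityBlocks);
        // The client halts unless at least the data blocks get locations.
        if (allocated < dataBlocks) {
            throw new IllegalStateException("fewer DataNodes than data blocks");
        }
        return allocated;
    }

    public static void main(String[] args) {
        // 6 DNs, RS-6-3: only the 6 data blocks get locations, no parity.
        System.out.println(allocatedLocations(6, 6, 3)); // 6
        // 9 DNs: the whole group, data and parity, can be placed.
        System.out.println(allocatedLocations(9, 6, 3)); // 9
    }
}
```

Under this model, the stripe written on a 6-DN cluster starts life with zero parity blocks, which is exactly the state the reconstruction work later has to repair.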
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928919#comment-15928919 ]

Hadoop QA commented on HDFS-10530:
----------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 14s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 12m 39s | trunk passed |
| +1 | compile | 0m 47s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 12s | trunk passed |
| +1 | findbugs | 1m 43s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 36s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 373 unchanged - 0 fixed = 374 total (was 373) |
| +1 | mvnsite | 0m 52s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 53s | the patch passed |
| +1 | javadoc | 0m 37s | the patch passed |
| -1 | unit | 69m 20s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
|    |  | 94m 15s |  |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12859151/HDFS-10530.5.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 0df06d50311b 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 09ad8ef |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18743/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927364#comment-15927364 ]

Hadoop QA commented on HDFS-10530:
----------------------------------

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
|  0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 12m 42s | trunk passed |
| +1 | compile | 0m 46s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 0m 51s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 0m 39s | trunk passed |
| +1 | mvninstall | 0m 46s | the patch passed |
| +1 | compile | 0m 42s | the patch passed |
| +1 | javac | 0m 42s | the patch passed |
| -0 | checkstyle | 0m 38s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 373 unchanged - 0 fixed = 374 total (was 373) |
| +1 | mvnsite | 0m 50s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 38s | the patch passed |
| +1 | unit | 64m 44s | hadoop-hdfs in the patch passed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
|    |  | 89m 40s |  |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12858793/HDFS-10530.4.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3739459772d2 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 615ac09 |
| Default Java | 1.8.0_121 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18735/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927311#comment-15927311 ]

Andrew Wang commented on HDFS-10530:
------------------------------------

Hi Manoj, thanks for working on this. Hoping you can help me understand the current behavior; a few review comments too.

In the unit test, it looks like we start a cluster with 6 racks, write a 6+3 file, add 3 more hosts, then wait for parity blocks to be written to these 3 new nodes.

* Are these necessarily the parity blocks, or could they be any of the blocks that are co-located on the first 6 racks?
* Also, does this happen via EC reconstruction, or do we simply copy the blocks over to the new racks? If it's the first, we should file a follow-on to do the second when possible.
* Is the BPP violated before entering the waitFor? If so, we should assert that. This may require pausing reconstruction work and resuming it later.

A few other notes:

* "Block recovery" refers specifically to the process of recovering from a DN failure while writing. I'm guessing this is actually more like "replication", or since it's EC, "reconstruction" work?
* Do you think TestBPPRackFaultTolerant needs any additional unit tests along these lines?

{code}
cluster.startDataNodes(conf, 3, true, null, null,
    new String[]{"host3", "host4", "host5"}, null);
{code}

Looks like these have the same names as the initial DNs, as Takanobu noted. Might be nice to specify the racks too, to be explicit.

{code}
lb = DFSTestUtil.getAllBlocks(dfs, testFileUnsatisfied).get(0);
blockInfo = bm.getStoredBlock(lb.getBlock().getLocalBlock());
dumpStoragesFor(blockInfo);
// But, there will not be any block placement changes yet
assertFalse("Block group of testFileUnsatisfied should not be placement"
    + " policy satisfied", bm.isPlacementPolicySatisfied(blockInfo));
{code}

If we later enhance the NN to automatically fix up misplaced EC blocks, this assert will be flaky. Maybe add a comment?
Or we could pause reconstruction work, add the DNs, and resume after the assert.
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925588#comment-15925588 ]

Takanobu Asanuma commented on HDFS-10530:
-----------------------------------------

Thanks for the patch, [~manojg]! Mostly it looks good to me. A couple of comments:

* It would be more readable if the names of the additional DNs were different from the first DNs.

{code}
cluster.startDataNodes(conf, 3, true, null,
    new String[]{"/rack3", "/rack4", "/rack5"},
    new String[]{"host3-2", "host4-2", "host5-2"}, null);
cluster.triggerHeartbeats();
{code}

* It seems the block placement policy keeps being satisfied during the EC reconstruction, so the {{GenericTestUtils.waitFor}} does not assure it's finished. We can use {{DFSTestUtil.waitForReplication}} instead of the {{GenericTestUtils.waitFor}}.

{code}
DFSTestUtil.waitForReplication(dfs, testFileUnsatisfied,
    (short) (numDataBlocks + numParityBlocks), 15 * 1000);
{code}
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15907354#comment-15907354 ] Takanobu Asanuma commented on HDFS-10530:

Thanks for your detailed investigation, [~manojg]. [~demongaorui] is not mainly working on HDFS-EC at the moment, so if you are interested in this task, please feel free to reassign it to yourself. I will contact him directly and let him know.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905906#comment-15905906 ] Manoj Govindassamy commented on HDFS-10530:

bq. isPlacementPolicySatisfied should use getRealTotalBlockNum

The patch 3 change, making isPlacementPolicySatisfied() use getRealTotalBlockNum() instead of getRealDataBlockNum(), will definitely fix blocks that are newly created or already detected as needing reconstruction. But for files that were created earlier and are mis-replicated, I am not seeing this patch recover them. I will post an addendum patch soon with the latest rebase and the modified test as explained in my previous comment.
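The effect of the two counters can be illustrated with a simplified, self-contained model of the rack check (hypothetical names, not the actual BlockManager code): placement is considered satisfied once the block's replicas span min(numReplicas, totalRacks) racks, so which replica count is passed in decides how many racks are demanded.

```java
// Simplified model of the rack check in isPlacementPolicySatisfied():
// satisfied once racksUsed reaches min(numReplicas, totalRacks).
class PlacementCheckModel {
    static boolean isPlacementSatisfied(int racksUsed, int numReplicas,
                                        int totalRacks) {
        return racksUsed >= Math.min(numReplicas, totalRacks);
    }

    public static void main(String[] args) {
        int dataBlocks = 6, parityBlocks = 3;   // RS-6-3: 9 internal blocks
        int racksUsed = 6, totalRacks = 7;      // 7th rack just added

        // Counting only data blocks: 6 racks already satisfy min(6, 7),
        // so no reconstruction is scheduled for the new rack.
        System.out.println(isPlacementSatisfied(racksUsed, dataBlocks, totalRacks));
        // Counting all internal blocks: min(9, 7) = 7 racks are required,
        // so the placement is flagged and reconstruction can be scheduled.
        System.out.println(isPlacementSatisfied(racksUsed,
            dataBlocks + parityBlocks, totalRacks));
    }
}
```

This matches the reported behavior: with the data-block count, anything beyond 6 racks looks satisfied, so adding the 7th rack triggers nothing.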
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905898#comment-15905898 ] Andrew Wang commented on HDFS-10530:

BTW, do you have any review comments for the v3 patch? We need a new Jenkins run since it's gone stale, but the idea at least LGTM.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905897#comment-15905897 ] Andrew Wang commented on HDFS-10530:

Nice investigation Manoj. This relates to a long-standing HDFS supportability issue, where the only way to fix mis-replicated blocks was to setrep the cluster to 4 and then back to 3. I think the priority, then, is to give admins some way to trigger the big scan. When I looked earlier, fsck seemed like a good place to hook in since it runs on the server side. Since this patch is pretty self-contained, maybe we can pursue the manual triggering as a separate JIRA.

As a follow-on, it'd be nice to find some way of fixing these up automatically in the background. One idea I had is to do it during FBR processing. We want to be careful not to put this on the fast path though, since FBR processing is already expensive; possibly we could use your ideas regarding tracking topology changes. Non-FBR ideas would also be quite welcome, since the FBR approach duplicates work and it takes a really long time for all DNs to FBR. Anything that can incrementally iterate the block map would work. I think it's also fair to handle some of this in a follow-on JIRA.
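The "incrementally iterate the block map" idea could look roughly like this hypothetical sketch (not Hadoop code; all names here are invented): scan a bounded batch of blocks per pass and remember the cursor between passes, so misplacement checks run in the background without one expensive whole-map scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical incremental scanner: each pass visits at most batchSize
// entries and resumes where the previous pass stopped, wrapping around.
class IncrementalScanModel {
    private final List<String> blockMap;  // stand-in for the NN block map
    private int cursor = 0;               // survives across passes

    IncrementalScanModel(List<String> blockMap) {
        this.blockMap = blockMap;
    }

    // Returns the batch of block ids visited in this pass.
    List<String> scanNextBatch(int batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < batchSize && !blockMap.isEmpty(); i++) {
            batch.add(blockMap.get(cursor));
            cursor = (cursor + 1) % blockMap.size();
        }
        return batch;
    }
}
```

A real implementation would run a placement check on each visited block and queue mis-placed ones for the Replication Monitor; the point of the sketch is only the bounded, resumable iteration.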
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15905864#comment-15905864 ] Manoj Govindassamy commented on HDFS-10530:

[~demongaorui], [~andrew.wang], [~zhz],

The test in HDFS-10530.3.patch provided by [~demongaorui] shows that adding datanodes in new racks doesn't necessarily rebalance the existing striped blocks for better placement. The test does the following:
1. Start a cluster with 6 DNs in 6 racks.
2. Create a file with the RS-6-3 EC policy. Parity blocks couldn't be created owing to the non-availability of DNs.
3. Verify the block placement is satisfied, as there are only 6 racks available (though the ideal would be 9).
4. Add 3 more DNs in 3 more racks.
5. Verify the block placement is still not satisfied for the previously created file.
6. Create a new striped file and verify that the new file's block placement policy is satisfied.

I modified the test a little more to do the following:
4. Add 3 more DNs in the existing racks.
5. Wait for the Replication Monitor to recover under-replicated blocks.
6. Parity blocks now get created on the newly added DNs.
7. Now the file has 9 blocks, on 9 DNs in 6 racks.
8. Verify this block placement is satisfied, as there are only 6 racks.
9. Add 3 more DNs in 3 more racks.
10. With NO fix applied, verify this block placement is NOT satisfied, as there are now 9 racks but the blocks are not striped across all of them.
11. Create a new striped file and verify that the new file's block placement policy is satisfied.

*Fix Proposal:*
1. The goal is to find whether there are under-replicated/mis-placed blocks for EC files upon DN addition in a new rack.
2. {{DatanodeManager#addDatanode}} can detect the new topology changes. There is already a method {{#checkIfClusterIsNowMultiRack}}, which was specifically written for racks growing from 1 to more. But I believe we need a more generic check here in the context of EC.
3. {{checkIfClusterIsNowMultiRack}} => {{processMisReplicatedBlocks}} => {{processMisReplicatesAsync}} is heavyweight, as it scans the entire BlockMap. Though rack addition is a rare operation, doing this whole-world scan for every rack addition seems quite time consuming. The ideal thing would be to take some clues and trigger {{processMisReplicatesAsync}} only when needed, so that it detects mis-placements and triggers proper block placements.
4. How about checking the list of EC policies ever set on any file? Just like the enabled EC policy list, we can maintain an active EC policy list (at most the system policies count). This can serve as a clue to potential block mis-placements, since we can deduce the rack requirements from the EC policy schema. Once we have the clue, we can trigger {{processMisReplicatesAsync}}, which can then feed the work to the Replication Monitor, which runs continuously.

Please share your thoughts on the above proposal. Would love to hear your suggestions on better alternative approaches. Thanks.
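Proposal point 4 can be made concrete with a hypothetical trigger check (none of these names exist in Hadoop; this is only a model of the idea): a rack addition warrants the expensive full scan only if some EC policy ever used on the cluster was still rack-constrained before the addition.

```java
import java.util.List;

// Hypothetical sketch of the proposed check in DatanodeManager#addDatanode:
// trigger processMisReplicatesAsync only when a new rack can actually
// improve the spread of some EC policy in use.
class EcRescanTrigger {
    // activePolicyTotalUnits: (data + parity) per active policy, e.g. RS-6-3 -> 9
    static boolean shouldRescan(List<Integer> activePolicyTotalUnits,
                                int racksBefore, int racksAfter) {
        if (racksAfter <= racksBefore) {
            return false;  // no new rack; topology did not grow
        }
        for (int totalUnits : activePolicyTotalUnits) {
            // A policy was rack-constrained if the old rack count was below
            // its ideal spread; the new rack can improve its placement.
            if (racksBefore < totalUnits) {
                return true;
            }
        }
        return false;      // every active policy already had enough racks
    }
}
```

For RS-6-3 this flags a 6-to-7 rack growth (a full scan is worthwhile) but skips a 9-to-10 growth, where no policy can spread further.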
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817065#comment-15817065 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 14m 29s | trunk passed |
| +1 | compile | 0m 54s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 1m 0s | trunk passed |
| +1 | mvneclipse | 0m 15s | trunk passed |
| +1 | findbugs | 1m 55s | trunk passed |
| +1 | javadoc | 0m 40s | trunk passed |
| -1 | mvninstall | 0m 42s | hadoop-hdfs in the patch failed. |
| -1 | compile | 0m 43s | hadoop-hdfs in the patch failed. |
| -1 | javac | 0m 43s | hadoop-hdfs in the patch failed. |
| -0 | checkstyle | 0m 27s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 372 unchanged - 0 fixed = 374 total (was 372) |
| -1 | mvnsite | 0m 43s | hadoop-hdfs in the patch failed. |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| -1 | findbugs | 0m 15s | hadoop-hdfs in the patch failed. |
| +1 | javadoc | 0m 36s | the patch passed |
| -1 | unit | 0m 44s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 25m 46s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:a9ad5d6 |
| JIRA Issue | HDFS-10530 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12815405/HDFS-10530.3.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 991eb2ce5c51 3.13.0-103-generic #150-Ubuntu SMP Thu Nov 24 10:34:17 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 4db119b |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| mvninstall | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-mvninstall-hadoop-hdfs-project_hadoop-hdfs.txt |
| compile | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| javac | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-compile-hadoop-hdfs-project_hadoop-hdfs.txt |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| mvnsite | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-mvnsite-hadoop-hdfs-project_hadoop-hdfs.txt |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/18144/artifact/patchprocess/patch-findbugs-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit |
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15356891#comment-15356891 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 25s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 9m 21s | trunk passed |
| +1 | compile | 1m 16s | trunk passed |
| +1 | checkstyle | 0m 40s | trunk passed |
| +1 | mvnsite | 1m 30s | trunk passed |
| +1 | mvneclipse | 0m 16s | trunk passed |
| +1 | findbugs | 2m 27s | trunk passed |
| +1 | javadoc | 1m 14s | trunk passed |
| +1 | mvninstall | 1m 12s | the patch passed |
| +1 | compile | 0m 45s | the patch passed |
| +1 | javac | 0m 45s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 390 unchanged - 0 fixed = 392 total (was 390) |
| +1 | mvnsite | 0m 53s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 49s | the patch passed |
| +1 | javadoc | 0m 52s | the patch passed |
| -1 | unit | 90m 28s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 115m 30s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.namenode.ha.TestHAAppend |
| | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
| Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12815405/HDFS-10530.3.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 355feed0992a 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 846ada2 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15954/console |
| Powered by | Apache
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354466#comment-15354466 ] Hadoop QA commented on HDFS-10530:

-1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 20s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| +1 | mvninstall | 7m 0s | trunk passed |
| +1 | compile | 0m 50s | trunk passed |
| +1 | checkstyle | 0m 32s | trunk passed |
| +1 | mvnsite | 0m 56s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 50s | trunk passed |
| +1 | javadoc | 0m 58s | trunk passed |
| +1 | mvninstall | 0m 54s | the patch passed |
| +1 | compile | 0m 47s | the patch passed |
| +1 | javac | 0m 47s | the patch passed |
| -0 | checkstyle | 0m 29s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 388 unchanged - 2 fixed = 390 total (was 390) |
| +1 | mvnsite | 0m 55s | the patch passed |
| +1 | mvneclipse | 0m 9s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 55s | the patch passed |
| +1 | javadoc | 0m 51s | the patch passed |
| -1 | unit | 62m 28s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 22s | The patch does not generate ASF License warnings. |
| | | 82m 43s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.balancer.TestBalancerWithMultipleNameNodes |
| | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.blockmanagement.TestBlockManager |
| | hadoop.hdfs.server.namenode.TestStartup |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12814461/HDFS-10530.2.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 5275a422366e 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 77031a9 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15939/console |
| Powered by | Apache Yetus 0.4.0-SNAPSHOT http://yetus.apache.org |
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354199#comment-15354199 ] GAO Rui commented on HDFS-10530:

Patch 2 has been attached to address the TestBalancer failures. {{hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer}} seems unrelated to the block placement changes, and the other tests should pass this time.
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350895#comment-15350895 ] GAO Rui commented on HDFS-10530:

I've investigated the failure of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}. I found that reconstruction work triggered by the block placement policy interfered with the {{Balancer}}'s attempts to balance DatanodeStorage utilization. We could indeed have a conflict between the block placement policy and the balancer policy. As in the scenario of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}, the last added datanode would be filled up with internal/parity blocks of all block groups according to {{BlockPlacementPolicyRackFaultTolerant}}, which would make this datanode always be recognized as {{over-utilized}} by the {{Balancer}}. This could prevent the {{Balancer}} from ever finishing its work successfully. I suggest we make the {{Balancer}} tolerate a certain percentage (say 10%) of datanodes as {{over-utilized}}: after {{Balancer#runOneIteration()}} has run 5 times and less than 10% of datanodes are {{over-utilized}}, we let the {{Balancer}} finish its work successfully. [~zhz], could you share your opinions? Thank you.
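The proposed exit rule could be sketched as follows (hypothetical code, not the actual Balancer; the class and method names are invented, and the constants mirror the suggested 5 iterations and 10% tolerance):

```java
// Hypothetical sketch of the proposed Balancer exit rule: after enough
// iterations, tolerate a small fraction of over-utilized DNs instead of
// looping forever on nodes the EC placement policy keeps full.
class BalancerExitRule {
    static final int MIN_ITERATIONS = 5;          // proposed iteration floor
    static final double TOLERATED_FRACTION = 0.10; // proposed 10% tolerance

    static boolean canFinish(int iterationsRun, int overUtilizedNodes,
                             int totalNodes) {
        if (overUtilizedNodes == 0) {
            return true;  // fully balanced, finish as today
        }
        // Otherwise finish only once we have tried enough iterations and
        // the remaining over-utilized nodes are within tolerance.
        return iterationsRun >= MIN_ITERATIONS
            && (double) overUtilizedNodes / totalNodes < TOLERATED_FRACTION;
    }
}
```

Under this rule, one stubbornly over-utilized DN in a 20-node cluster (5%) no longer blocks the Balancer from declaring success after its fifth iteration.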
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350390#comment-15350390 ] Hadoop QA commented on HDFS-10530: -- -1 overall

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 30s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| +1 | mvninstall | 6m 51s | trunk passed |
| +1 | compile | 0m 45s | trunk passed |
| +1 | checkstyle | 0m 28s | trunk passed |
| +1 | mvnsite | 0m 52s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 1m 44s | trunk passed |
| +1 | javadoc | 0m 55s | trunk passed |
| +1 | mvninstall | 0m 47s | the patch passed |
| +1 | compile | 0m 43s | the patch passed |
| +1 | javac | 0m 43s | the patch passed |
| -0 | checkstyle | 0m 26s | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 184 unchanged - 0 fixed = 186 total (was 184) |
| +1 | mvnsite | 0m 49s | the patch passed |
| +1 | mvneclipse | 0m 10s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 1m 47s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
| -1 | unit | 74m 6s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 19s | The patch does not generate ASF License warnings. |
| | | 93m 48s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer |
| | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped |
| | hadoop.hdfs.server.balancer.TestBalancer |
| | hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
| | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength |
| | hadoop.hdfs.TestLeaseRecoveryStriped |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:85209cc |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12813576/HDFS-10530.1.patch |
| JIRA Issue | HDFS-10530 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 93a4016d67e4 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 73615a7 |
| Default Java | 1.8.0_91 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/artifact/patchprocess/diff-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15916/testReport/ |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output |
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15345772#comment-15345772 ] Zhe Zhang commented on HDFS-10530: -- Thanks [~demongaorui] for reporting this. Yes, I believe the right behavior in the example is to distribute across 9 racks, and I agree {{isPlacementPolicySatisfied}} should use {{getRealTotalBlockNum}}. A drawback is that this might cause significantly more traffic when a new rack is added. But considering that 1) adding a new rack is a rare scenario, and 2) we are throttling reconstruction work, I think we can make this change for better fault tolerance. We should add a follow-on task to put this kind of reconstruction task at the lowest priority in {{LowRedundancyBlocks}}.
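The effect of switching from data-block count to total-block count can be illustrated with a small standalone check. This is a hypothetical simplification of {{BlockManager#isPlacementPolicySatisfied}}: the class, method, and the min-based rack requirement are illustrative, not the actual Hadoop code.

```java
// Hypothetical simplification: for a striped block group, the number of racks
// the placement policy should demand is derived from the TOTAL number of
// internal blocks (data + parity, i.e. getRealTotalBlockNum()), not just the
// data blocks (getRealDataBlockNum()).
public class PlacementCheck {
    // For RS-6-3: dataBlocks = 6, parityBlocks = 3, so 9 internal blocks.
    static boolean isPlacementSatisfied(int racksUsed, int totalRacks,
                                        int dataBlocks, int parityBlocks) {
        int totalBlockNum = dataBlocks + parityBlocks; // stands in for getRealTotalBlockNum()
        // Spread over as many racks as available, capped by the number of
        // internal blocks in the group.
        int requiredRacks = Math.min(totalRacks, totalBlockNum);
        return racksUsed >= requiredRacks;
    }

    public static void main(String[] args) {
        // 5-rack cluster, all racks used: nothing more can be done, satisfied.
        System.out.println(isPlacementSatisfied(5, 5, 6, 3)); // true
        // A 7th rack was added but blocks still sit on only 6 racks:
        // the check fails, so reconstruction work should be scheduled.
        System.out.println(isPlacementSatisfied(6, 7, 6, 3)); // false
    }
}
```

Had the cap been the data-block count (6) instead, the second case would wrongly report the policy as satisfied, which matches the reported symptom that adding a 7th rack never triggers reconstruction.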
[jira] [Commented] (HDFS-10530) BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331473#comment-15331473 ] GAO Rui commented on HDFS-10530: Any comments are welcome. Let's discuss and try to reach an agreement on this issue.