[jira] [Comment Edited] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-26 Thread Xiao Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221224#comment-16221224
 ] 

Xiao Chen edited comment on HDFS-12725 at 10/26/17 8:54 PM:


Patch 1 to reproduce the error and fix it.
This keeps the {{getMaxNodesPerRack}} calculation so normal placements can 
still be as even as possible. The modified unit test fails before the fix and 
passes after it.

In real clusters this shows up as the message {{Cannot allocate parity block}} 
on the client and 
{{org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 3 to reach 14}} in the NN.
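The failure mode can be illustrated with a small simulation. It assumes the per-rack cap is the round-up of total replicas over racks (the spirit of {{getMaxNodesPerRack}}, not its exact code) and uses made-up rack sizes of 1, 2, 6 and 6 datanodes; the class and method names are hypothetical and are not part of the patch.

```java
import java.util.Arrays;

// Hypothetical simulation of a capped, even-spread placement pass.
// Rack sizes and names are illustrative; this is not HDFS code.
class CappedPlacementSketch {

    // Round-up division: the most replicas any one rack may take
    // if replicas are to be spread as evenly as possible.
    static int maxNodesPerRack(int totalReplicas, int numRacks) {
        return (totalReplicas - 1) / numRacks + 1;
    }

    // First pass: each rack takes at most `cap` replicas,
    // and never more than the datanodes it actually has.
    static int[] placeCapped(int[] nodesPerRack, int totalReplicas) {
        int cap = maxNodesPerRack(totalReplicas, nodesPerRack.length);
        int[] placed = new int[nodesPerRack.length];
        int remaining = totalReplicas;
        for (int r = 0; r < nodesPerRack.length && remaining > 0; r++) {
            placed[r] = Math.min(Math.min(cap, nodesPerRack[r]), remaining);
            remaining -= placed[r];
        }
        return placed;
    }

    // How many replicas the capped pass could not place.
    static int shortfall(int[] placed, int totalReplicas) {
        return totalReplicas - Arrays.stream(placed).sum();
    }
}
```

With 14 replicas over 4 racks the cap is 4, so the capped pass places 1 + 2 + 4 + 4 = 11 and is 3 short even though the hypothetical cluster has 15 datanodes; a best-effort fallback is what would fill that gap.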

[~andrew.wang] and [~eddyxu], could you take a look?


was (Author: xiaochen):
Patch 1 to reproduce the error and fix. [~andrew.wang] and [~eddyxu], could you 
take a look?

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few 
> nodes
> 
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HDFS-12725.01.patch, HDFS-12725.01.patch
>
>
> HDFS-12567 tries to fix the scenario where EC blocks may not be allocated in 
> an extremely rack-imbalanced cluster.
> The fall-back step added by that fix could be improved to do a best-effort 
> placement. This is more likely to happen in testing than in real clusters.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-10-30 Thread Wei-Chiu Chuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225679#comment-16225679
 ] 

Wei-Chiu Chuang edited comment on HDFS-12725 at 10/30/17 9:23 PM:
--

Hi [~xiaochen], thanks for filing the JIRA and posting a patch.

It might make the code more readable to refactor the catch block into a 
separate method, since it is pretty long.

A little nit in the debug message:
{code}
LOG.debug("Best effort placement failed.");
{code}
It might also be useful to log numResultsOflastChoose and totalReplicaExpected.
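Both suggestions (moving the long catch block into its own method and putting the counts in the debug message) can be sketched as below. Everything here is hypothetical: the exception class is a local stand-in, the placement passes are reduced to integer arithmetic, {{java.util.logging}} replaces whatever logger the real class uses, and only the names {{numResultsOflastChoose}} and {{totalReplicaExpected}} come from the review.

```java
import java.util.logging.Logger;

// Hypothetical sketch: not the patch's code, only its shape.
class PlacementFallbackSketch {
    private static final Logger LOG =
        Logger.getLogger(PlacementFallbackSketch.class.getName());

    // Local stand-in for the exception a capped pass throws when it falls short.
    static class NotEnoughReplicasException extends Exception {
        NotEnoughReplicasException(String message) { super(message); }
    }

    // Hypothetical even-spread pass: fails if it cannot reach the target.
    static int chooseEvenly(int availableEven, int totalReplicaExpected)
            throws NotEnoughReplicasException {
        if (availableEven < totalReplicaExpected) {
            throw new NotEnoughReplicasException(
                "only " + availableEven + " of " + totalReplicaExpected);
        }
        return totalReplicaExpected;
    }

    // The caller stays short: the former catch-block body is one call.
    static int chooseTargets(int availableEven, int availableAny, int totalReplicaExpected) {
        try {
            return chooseEvenly(availableEven, totalReplicaExpected);
        } catch (NotEnoughReplicasException e) {
            return chooseBestEffort(availableAny, totalReplicaExpected);
        }
    }

    // Extracted fallback: choose as many as possible and log both counts on failure.
    static int chooseBestEffort(int availableAny, int totalReplicaExpected) {
        int numResultsOflastChoose = Math.min(availableAny, totalReplicaExpected);
        if (numResultsOflastChoose < totalReplicaExpected) {
            LOG.fine(bestEffortFailureMessage(numResultsOflastChoose, totalReplicaExpected));
        }
        return numResultsOflastChoose;
    }

    static String bestEffortFailureMessage(int numResultsOflastChoose, int totalReplicaExpected) {
        return String.format("Best effort placement failed: expecting %d replicas, only chose %d.",
            totalReplicaExpected, numResultsOflastChoose);
    }
}
```

Keeping the message in its own method makes its wording easy to assert in a unit test, independent of the logger configuration.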

One question regarding the test code:

{code}
for (int i = 0; i <= numSingleDnRacks; i++) {
  racks[i] = "/rack" + i;
}
for (int i = numSingleDnRacks + 1; i < numDatanodes; i++) {
  racks[i] = "/rack" + (numSingleDnRacks + (i % (
      numRacks - numSingleDnRacks)));
}
{code}
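To see what the two loops produce, the sketch below replays them and counts datanodes per rack. The values {{numDatanodes = 9}}, {{numRacks = 5}} and {{numSingleDnRacks = 2}} are hypothetical and not taken from the patch; the class and method names are also made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Replays the two rack-assignment loops quoted above.
// Hypothetical helper; the real test may use different values.
class RackLayoutSketch {

    static String[] assignRacks(int numDatanodes, int numRacks, int numSingleDnRacks) {
        String[] racks = new String[numDatanodes];
        // First loop, as quoted: note the inclusive upper bound.
        for (int i = 0; i <= numSingleDnRacks; i++) {
            racks[i] = "/rack" + i;
        }
        // Second loop: spread the remaining datanodes over the multi-node racks.
        for (int i = numSingleDnRacks + 1; i < numDatanodes; i++) {
            racks[i] = "/rack" + (numSingleDnRacks + (i % (numRacks - numSingleDnRacks)));
        }
        return racks;
    }

    // Count datanodes per rack, sorted by rack name.
    static Map<String, Integer> countPerRack(String[] racks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String rack : racks) {
            counts.merge(rack, 1, Integer::sum);
        }
        return counts;
    }
}
```

With these values, {{/rack0}} and {{/rack1}} each hold one datanode, while {{/rack2}} (index {{numSingleDnRacks}}) gets the node from the first loop plus two more from the second loop (i = 3 and i = 6), so it does not end up as a single-node rack.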
Can you explain why the first loop runs over 0 <= i <= numSingleDnRacks? Why 
isn't it 0 <= i < numSingleDnRacks?

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few 
> nodes
> 
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch






[jira] [Comment Edited] (HDFS-12725) BlockPlacementPolicyRackFaultTolerant still fails with racks with very few nodes

2017-11-02 Thread Lei (Eddy) Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16236556#comment-16236556
 ] 

Lei (Eddy) Xu edited comment on HDFS-12725 at 11/2/17 8:42 PM:
---

LGTM +1 pending the following comment:

{noformat}
" (maxNodesPerRack={}, numOfReplicas={}) nodes " +
    "evenly across racks, falling back to uneven placement. 
{noformat}

It should say something like "falling back to even placement on the remaining 
racks".
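One way to read "even placement on the remaining racks" is sketched below: after the capped pass, the missing replicas are spread round-robin, one per rack per round, over the racks that still have free datanodes. This is an assumption about the intended behavior, not the patch's implementation; the class name and the free-node counts are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of a best-effort fallback that stays as even as
// possible over the racks that still have free datanodes.
class RemainingRacksFallbackSketch {

    // freeNodes[r] = unused datanodes on rack r after the capped pass.
    // Returns how many extra replicas each rack receives.
    static int[] placeRemaining(int[] freeNodes, int needed) {
        int[] free = Arrays.copyOf(freeNodes, freeNodes.length); // do not mutate input
        int[] added = new int[free.length];
        boolean progress = true;
        // Each round gives at most one replica to every rack with a free node.
        while (needed > 0 && progress) {
            progress = false;
            for (int r = 0; r < free.length && needed > 0; r++) {
                if (free[r] > 0) {
                    free[r]--;
                    added[r]++;
                    needed--;
                    progress = true;
                }
            }
        }
        // Best effort: if every rack runs out, return what could be placed.
        return added;
    }
}
```

For example, with free nodes {0, 0, 2, 2} and 3 replicas still needed, the sketch adds {0, 0, 2, 1}; asking for 5 adds {0, 0, 2, 2} and leaves one replica unplaced rather than failing outright.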

Thanks Xiao.  

Btw, please verify the test failures. 


was (Author: eddyxu):
LGTM +1 pending the following comment:

{preformat}
" (maxNodesPerRack={}, numOfReplicas={}) nodes " +
    "evenly across racks, falling back to uneven placement. 
{preformat}

It should say something like "falling back to even placement on the remaining 
racks".

Thanks Xiao.  

Btw, please verify the test failures. 

> BlockPlacementPolicyRackFaultTolerant still fails with racks with very few 
> nodes
> 
>
> Key: HDFS-12725
> URL: https://issues.apache.org/jira/browse/HDFS-12725
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>Priority: Major
>  Labels: hdfs-ec-3.0-must-do
> Attachments: HDFS-12725.01.patch, HDFS-12725.02.patch, 
> HDFS-12725.03.patch, HDFS-12725.04.patch, HDFS-12725.05.patch


