[ 
https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350895#comment-15350895
 ] 

GAO Rui commented on HDFS-10530:
--------------------------------

I've investigated the failure of
{{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}} and found
that reconstruction work triggered by the block placement policy interferes
with {{Balancer}}'s attempt to balance the utilization of DatanodeStorages, so
the block placement policy and the balancer policy really can conflict. In the
scenario of
{{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}, the last
added datanode ends up holding an internal (data or parity) block of every
block group, as required by {{BlockPlacementPolicyRackFaultTolerant}}, so that
datanode is always classified as {{over-utilized}} by {{Balancer}}. As a
result, {{Balancer}} may never be able to finish its work successfully.
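
As a rough, self-contained illustration (all the numbers below are assumptions
made up for the example, not values taken from the test), the following toy
model of {{Balancer}}'s classification shows why a single-datanode rack that
must hold one internal block of every block group stays above the
average-plus-threshold line:

{code:java}
// Toy model of how Balancer classifies an over-utilized datanode
// (utilization > cluster average + threshold, default threshold 10
// percentage points). This is NOT Balancer code; topology and sizes
// below are assumptions for illustration only.
public class OverUtilizedSketch {
  public static void main(String[] args) {
    final double capacityGb = 100.0;       // assumed capacity of every DN
    final int blockGroups = 180;           // assumed number of RS-6-3 groups
    final double internalBlockGb = 0.384;  // assumed internal block size

    // The newly added DN holds 1 internal block of every group; the 10
    // original DNs share the remaining 8 internal blocks per group.
    double newDnUsed = blockGroups * 1.0 * internalBlockGb;
    double oldDnUsed = blockGroups * (8.0 / 10.0) * internalBlockGb;

    double avgUtil = (newDnUsed + 10 * oldDnUsed) / (11 * capacityGb) * 100;
    double newDnUtil = newDnUsed / capacityGb * 100;
    double threshold = 10.0;  // Balancer's default threshold

    System.out.printf("avg=%.1f%%, new DN=%.1f%%, over-utilized=%b%n",
        avgUtil, newDnUtil, newDnUtil > avgUtil + threshold);
    // Prints avg=56.6%, new DN=69.1%, over-utilized=true. As long as the
    // placement policy keeps one internal block of every group on this DN,
    // no number of Balancer iterations can change the answer.
  }
}
{code}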

I suggest we make {{Balancer}} tolerate a certain percentage (say 10%) of
datanodes being {{over-utilized}}: after {{Balancer#runOneIteration()}} has run
5 times, if fewer than 10% of the datanodes are still {{over-utilized}}, we let
{{Balancer}} finish its work successfully, as sketched below. [~zhz], could you
share your opinion? Thank you.
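
A minimal sketch of that exit condition (the class, method, and constant names
here are hypothetical placeholders, not existing {{Balancer}} code):

{code:java}
// Hypothetical sketch of the proposed tolerance; none of these names exist
// in Balancer today.
class BalancerExitSketch {
  static final int MIN_ITERATIONS_BEFORE_TOLERANCE = 5;
  static final double TOLERATED_OVER_UTILIZED_FRACTION = 0.10;  // 10%

  /** Whether Balancer may report success despite a few stuck datanodes. */
  static boolean shouldFinishSuccessfully(int iterationsRun,
      int overUtilizedDatanodes, int totalDatanodes) {
    if (overUtilizedDatanodes == 0) {
      return true;  // fully balanced, same as today
    }
    // After 5 iterations, tolerate a small tail of over-utilized datanodes
    // that the block placement policy prevents Balancer from draining.
    return iterationsRun >= MIN_ITERATIONS_BEFORE_TOLERANCE
        && overUtilizedDatanodes
            < TOLERATED_OVER_UTILIZED_FRACTION * totalDatanodes;
  }
}
{code}

For example, in a 30-datanode cluster this tolerance would let {{Balancer}}
exit successfully after 5 iterations if at most 2 datanodes remain
{{over-utilized}}.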


> BlockManager reconstruction work scheduling should correctly adhere to EC 
> block placement policy
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: GAO Rui
>            Assignee: GAO Rui
>         Attachments: HDFS-10530.1.patch
>
>
> This issue was found by [~tfukudom].
> Under RS-DEFAULT-6-3-64k EC policy, 
> 1. Create an EC file; the file was written to all 5 racks (2 DNs each) of
> the cluster.
> 2. Reconstruction work is scheduled when the 6th rack is added.
> 3. However, adding a 7th or further racks does not trigger any
> reconstruction work.
> Based on the default EC block placement policy defined in
> “BlockPlacementPolicyRackFaultTolerant.java”, the EC file should be able to
> be distributed across 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*,
> *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()*
> instead of *getRealDataBlockNum()*.
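
Regarding the last point above, here is a self-contained illustration (not
Hadoop code; {{verifyRackSpread}} is only a stand-in for the rack-fault-tolerant
placement check) of why counting only the data blocks hides the violation once
6 racks are in use:

{code:java}
// Not Hadoop code: verifyRackSpread is a simplified stand-in for the check
// performed through BlockPlacementPolicyRackFaultTolerant.
public class PlacementCheckSketch {
  /** Satisfied when the internal blocks span enough racks. */
  static boolean verifyRackSpread(int racksInUse, int numReplicas,
      int totalRacksInCluster) {
    return racksInUse >= Math.min(numReplicas, totalRacksInCluster);
  }

  public static void main(String[] args) {
    int racksInUse = 6;        // state after the 6th rack was added
    int totalRacks = 7;        // a 7th rack has just been added
    int realDataBlockNum = 6;  // RS-6-3: data blocks only
    int realTotalBlockNum = 9; // RS-6-3: data + parity blocks

    // Using getRealDataBlockNum(): 6 racks already "satisfy" the policy,
    // so no reconstruction work is scheduled for the 7th rack.
    System.out.println(verifyRackSpread(racksInUse, realDataBlockNum,
        totalRacks));   // true
    // Using getRealTotalBlockNum(): the group should span 7 racks, so the
    // violation is visible and reconstruction to the new rack is scheduled.
    System.out.println(verifyRackSpread(racksInUse, realTotalBlockNum,
        totalRacks));   // false
  }
}
{code}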



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
