[ https://issues.apache.org/jira/browse/HDFS-10530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350895#comment-15350895 ]
GAO Rui commented on HDFS-10530:
--------------------------------

I've investigated the failure of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}} and found that reconstruction work triggered by the block placement policy interferes with the {{Balancer}} as it tries to balance utilization across DatanodeStorages.

The block placement policy and the balancer policy can indeed conflict. In the scenario of {{hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFSStriped}}, {{BlockPlacementPolicyRackFaultTolerant}} causes the last added datanode to be filled with internal blocks/parity blocks of all block groups, so the {{Balancer}} always recognizes this datanode as {{over-utilized}}. As a result, the {{Balancer}} may never finish its work successfully.

I suggest we make the {{Balancer}} tolerate a certain percentage (say 10%) of datanodes being {{over-utilized}}: once {{Balancer#runOneIteration()}} has run 5 times and fewer than 10% of datanodes are {{over-utilized}}, the {{Balancer}} finishes its work successfully.

[~zhz], could you share your opinion? Thank you.

> BlockManager reconstruction work scheduling should correctly adhere to EC block placement policy
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10530
>                 URL: https://issues.apache.org/jira/browse/HDFS-10530
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: GAO Rui
>            Assignee: GAO Rui
>         Attachments: HDFS-10530.1.patch
>
>
> This issue was found by [~tfukudom].
> Under RS-DEFAULT-6-3-64k EC policy,
> 1. Create an EC file. The file was written to all 5 racks (2 DNs each) of the cluster.
> 2. Reconstruction work is scheduled when the 6th rack is added.
> 3. Adding the 7th rack or more racks does not trigger reconstruction work.
> Based on the default EC block placement policy defined in "BlockPlacementPolicyRackFaultTolerant.java", an EC file should be scheduled for distribution across 9 racks if possible.
> In *BlockManager#isPlacementPolicySatisfied(BlockInfo storedBlock)*, *numReplicas* of striped blocks should probably be *getRealTotalBlockNum()* instead of *getRealDataBlockNum()*.
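
To make the rack arithmetic in the description concrete, here is a minimal sketch in Java. It is not HDFS source code: the class {{RackCheckSketch}} and the method {{isSatisfied}} are hypothetical, and the rule it encodes (placement counts as satisfied once the block group spans min(required racks, racks in the cluster)) is an assumed simplification of what a rack-fault-tolerant check might do. Under that assumption, passing the data block count (6) as the requirement explains why a 7th rack triggers nothing, while passing the total block count (9) would keep reconstruction going.

```java
/**
 * Hypothetical, simplified model of the rack-count check discussed in
 * HDFS-10530. Not taken from the Hadoop code base.
 */
public class RackCheckSketch {
    // RS-DEFAULT-6-3-64k: 6 data blocks and 3 parity blocks per block group.
    static final int DATA_BLOCKS = 6;
    static final int PARITY_BLOCKS = 3;

    /**
     * Assumed rule: placement is satisfied once the block group spans at
     * least min(requiredRacks, clusterRacks) racks.
     */
    static boolean isSatisfied(int racksUsed, int requiredRacks, int clusterRacks) {
        return racksUsed >= Math.min(requiredRacks, clusterRacks);
    }

    public static void main(String[] args) {
        int totalBlocks = DATA_BLOCKS + PARITY_BLOCKS;

        // Blocks sit on 5 racks and a 6th rack is added, requirement = data blocks.
        System.out.println("6 racks, data count:  " + isSatisfied(5, DATA_BLOCKS, 6));

        // Blocks now span 6 racks and a 7th rack is added, requirement = data blocks.
        System.out.println("7 racks, data count:  " + isSatisfied(6, DATA_BLOCKS, 7));

        // Same situation, but the requirement is the total block count.
        System.out.println("7 racks, total count: " + isSatisfied(6, totalBlocks, 7));
    }
}
```

In this model the check only returns false at 7 or more racks when the requirement is the total block count, which is the behavior the description asks for.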