[jira] [Commented] (HDFS-16799) The dn space size is not consistent, and Balancer can not work, resulting in a very unbalanced space

2023-02-15 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688914#comment-17688914
 ] 

ruiliang commented on HDFS-16799:
-

ok

> The dn space size is not consistent, and Balancer can not work, resulting in 
> a very unbalanced space
> 
>
> Key: HDFS-16799
> URL: https://issues.apache.org/jira/browse/HDFS-16799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.0
>Reporter: ruiliang
>Priority: Blocker
>
>  
> {code:java}
> echo 'A DFS Used 99.8% to ip' > sorucehost  
> hdfs --debug  balancer  -fs hdfs://xxcluster06  -threshold 10 -source -f 
> sorucehost  
> 
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.243:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.247:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-15-10/10.12.65.214:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-02-08/10.12.14.8:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-13/10.12.15.154:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-04/10.12.65.218:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.143:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-05/10.12.12.200:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.217:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.142:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.246:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.219:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.147:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-15-10/10.12.65.186:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-13/10.12.15.153:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-03-07/10.12.19.23:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-04-14/10.12.65.119:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.131:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-04/10.12.12.210:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-11/10.12.14.168:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.245:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-03-02/10.12.17.26:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.241:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-13/10.12.15.152:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.249:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-07-14/10.12.64.71:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-03-03/10.12.17.35:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.195:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.242:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.248:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.240:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-15-12/10.12.65.196:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-13/10.12.15.150:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.222:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.145:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-01-08/10.12.65.244:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-03-07/10.12.19.22:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.221:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.136:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-12-03/10.12.65.129:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-05-15/10.12.15.163:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: 
> /4F08-0

[jira] [Commented] (HDFS-16799) The dn space size is not consistent, and Balancer can not work, resulting in a very unbalanced space

2022-10-09 Thread ruiliang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614703#comment-17614703
 ] 

ruiliang commented on HDFS-16799:
-

It seems that the empty nodes are concentrated on only a few racks, so the 
placement policy cannot select enough distinct racks. In this case, is the only 
option to spread these nodes across more racks?
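
To make the reasoning above concrete, here is a minimal sketch of how a per-rack 
cap can leave a reconstruction short of targets when the free nodes sit on one 
rack. It is a simplified model, not the Hadoop source: the class RackCapSketch, 
both method names, the rounded-up cap formula, the 9-block group and the 
20-rack count are all assumptions made for illustration. Only the rack names 
and IP addresses are taken from the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RackCapSketch {

    // Assumed cap: replicas are spread evenly across racks, rounded up.
    static int maxNodesPerRack(int totalReplicas, int totalRacks) {
        return (totalReplicas - 1) / totalRacks + 1;
    }

    // Pick up to `needed` nodes, taking at most `cap` nodes from any one rack.
    static List<String> chooseTargets(Map<String, List<String>> freeNodesByRack,
                                      int needed, int cap) {
        List<String> chosen = new ArrayList<>();
        for (List<String> nodes : freeNodesByRack.values()) {
            int takenFromRack = 0;
            for (String node : nodes) {
                if (chosen.size() == needed || takenFromRack == cap) {
                    break;
                }
                chosen.add(node);
                takenFromRack++;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 9 blocks per group and 20 racks are assumed numbers, not from the issue.
        int cap = maxNodesPerRack(9, 20);

        // All free nodes on a single rack.
        Map<String, List<String>> concentrated = new LinkedHashMap<>();
        concentrated.put("/4F08-01-08",
                Arrays.asList("10.12.65.240", "10.12.65.241", "10.12.65.242"));

        // The same number of free nodes, one per rack.
        Map<String, List<String>> spread = new LinkedHashMap<>();
        spread.put("/4F08-01-08", Arrays.asList("10.12.65.240"));
        spread.put("/4F08-12-03", Arrays.asList("10.12.65.129"));
        spread.put("/4F08-05-13", Arrays.asList("10.12.15.150"));

        System.out.println("cap per rack: " + cap);
        System.out.println("concentrated: "
                + chooseTargets(concentrated, 3, cap).size() + " of 3 targets");
        System.out.println("spread: "
                + chooseTargets(spread, 3, cap).size() + " of 3 targets");
    }
}
```

Under these assumptions the concentrated layout yields fewer targets than 
requested, while the spread layout yields all of them. That would be consistent 
with the TOO_MANY_NODES_ON_RACK reason in the log below, but the real policy 
also weighs load, storage type and existing replica locations.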
{code:java}
  Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen 
nodes.
2022-10-09 19:27:18,407 DEBUG blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseLocalRack(637)) - Failed to choose from 
local rack (location = /4F08-05-15), retry with the rack of the next replica 
(location = /4F08-12-03)
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
 
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:629)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:589)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:218)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:94)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:295)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:148)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:60)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1862)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1814)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4655)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4522)
        at java.lang.Thread.run(Thread.java:748)
2022-10-09 19:27:18,416 DEBUG blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseRandom(824)) - [
Node /4F08-01-08/10.12.65.242:1019 [
  Datanode 10.12.65.242:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.248:1019 [
  Datanode 10.12.65.248:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.195:1019 [
  Datanode 10.12.65.195:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.241:1019 [
  Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.243:1019 [
  Datanode 10.12.65.243:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.244:1019 [
  Datanode 10.12.65.244:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.249:1019 [
  Datanode 10.12.65.249:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.245:1019 [
  Datanode 10.12.65.245:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.240:1019 [
  Datanode 10.12.65.240:1019 is not chosen since the rack has too many chosen 
nodes.
Node /4F08-01-08/10.12.65.247:1019 [
  Datanode 10.12.65.247:1019 is not chosen since the rack has too many chosen 
nodes.
2022-10-09 19:27:18,416 INFO  blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseRandom(832)) - Not enough replicas was 
chosen. Reason:{TOO_MANY_NODES_ON_RACK=10}
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy 
(BlockPlacementPolicyDefault.java:chooseFromNextRack(669)) - Failed to choose 
from the next rack (location = /4F08-01-08), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
 
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:665)
        at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlac