[jira] [Commented] (HDFS-16799) The dn space size is not consistent, and Balancer can not work, resulting in a very unbalanced space
[ https://issues.apache.org/jira/browse/HDFS-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688914#comment-17688914 ]

ruiliang commented on HDFS-16799:

ok

> The dn space size is not consistent, and Balancer can not work, resulting in
> a very unbalanced space
>
> Key: HDFS-16799
> URL: https://issues.apache.org/jira/browse/HDFS-16799
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.0
> Reporter: ruiliang
> Priority: Blocker
>
> {code:java}
> echo 'A DFS Used 99.8% to ip' > sorucehost
> hdfs --debug balancer -fs hdfs://xxcluster06 -threshold 10 -source -f sorucehost
>
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.243:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.247:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-10/10.12.65.214:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-02-08/10.12.14.8:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.154:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-04/10.12.65.218:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.143:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-05/10.12.12.200:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.217:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.142:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.246:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.219:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.147:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-10/10.12.65.186:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.153:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-07/10.12.19.23:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-04-14/10.12.65.119:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.131:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-04/10.12.12.210:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-11/10.12.14.168:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.245:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-02/10.12.17.26:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.241:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.152:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.249:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-07-14/10.12.64.71:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-03/10.12.17.35:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.195:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.242:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.248:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.240:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-12/10.12.65.196:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.150:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.222:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.145:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.244:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-07/10.12.19.22:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.221:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.136:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.129:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-15/10.12.15.163:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-0
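For context on `-threshold 10`: as I understand the HDFS Balancer, it compares each DataNode's utilization (DFS used / capacity) with the cluster-wide average utilization, and treats a node as over- or under-utilized only when it deviates from that average by more than the threshold, in percentage points. The Java sketch below illustrates only that classification, under those assumptions. The class, enum, and node names are hypothetical, the numbers are illustrative (loosely echoing the 99.8% DFS Used node above), and it ignores storage types, `-source` filtering, and the actual block-move scheduling.

```java
import java.util.Arrays;
import java.util.List;

public class BalancerThresholdSketch {

    // Hypothetical labels for the four utilization groups the Balancer distinguishes.
    enum Category { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

    // Hypothetical per-DataNode view: DFS used and capacity in arbitrary but consistent units.
    static final class Node {
        final String name;
        final long used;
        final long capacity;

        Node(String name, long used, long capacity) {
            this.name = name;
            this.used = used;
            this.capacity = capacity;
        }

        double utilizationPercent() {
            return 100.0 * used / capacity;
        }
    }

    // Assumed cluster average: total used / total capacity, not the mean of per-node percentages.
    static double averageUtilization(List<Node> nodes) {
        long used = 0;
        long capacity = 0;
        for (Node n : nodes) {
            used += n.used;
            capacity += n.capacity;
        }
        return 100.0 * used / capacity;
    }

    // A node counts as "over" or "under" only if it deviates from the average by more than the threshold.
    static Category classify(Node n, double average, double thresholdPercent) {
        double diff = n.utilizationPercent() - average;
        boolean beyondThreshold = Math.abs(diff) > thresholdPercent;
        if (diff > 0) {
            return beyondThreshold ? Category.OVER_UTILIZED : Category.ABOVE_AVERAGE;
        }
        return beyondThreshold ? Category.UNDER_UTILIZED : Category.BELOW_AVERAGE;
    }

    public static void main(String[] args) {
        // Illustrative numbers only; not measurements from the cluster in this issue.
        List<Node> nodes = Arrays.asList(
                new Node("full-node", 998, 1000),
                new Node("empty-node-1", 0, 1000),
                new Node("empty-node-2", 0, 1000),
                new Node("mid-node", 400, 1000));
        double average = averageUtilization(nodes);
        System.out.printf("average utilization = %.2f%%%n", average);
        for (Node n : nodes) {
            System.out.println(n.name + " -> " + classify(n, average, 10.0));
        }
    }
}
```

With these numbers the average is 34.95%, so `full-node` (99.8%) is over-utilized, both empty nodes are under-utilized, and `mid-node` (40%) is only above average, so under this model it would not be treated as a source that needs to shed data.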
[jira] [Commented] (HDFS-16799) The dn space size is not consistent, and Balancer can not work, resulting in a very unbalanced space
[ https://issues.apache.org/jira/browse/HDFS-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614703#comment-17614703 ]

ruiliang commented on HDFS-16799:

It seems that the empty nodes are concentrated in the racks at the back, so enough distinct racks cannot be selected first. In this case, is breaking up the rack the only option?
{code:java}
Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen nodes.
2022-10-09 19:27:18,407 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseLocalRack(637)) - Failed to choose from local rack (location = /4F08-05-15), retry with the rack of the next replica (location = /4F08-12-03)
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:629)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:589)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:218)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:94)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:295)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:148)
	at org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:60)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1862)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1814)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4655)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4522)
	at java.lang.Thread.run(Thread.java:748)
2022-10-09 19:27:18,416 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(824)) - [
Node /4F08-01-08/10.12.65.242:1019 [
  Datanode 10.12.65.242:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.248:1019 [
  Datanode 10.12.65.248:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.195:1019 [
  Datanode 10.12.65.195:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.241:1019 [
  Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.243:1019 [
  Datanode 10.12.65.243:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.244:1019 [
  Datanode 10.12.65.244:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.249:1019 [
  Datanode 10.12.65.249:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.245:1019 [
  Datanode 10.12.65.245:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.240:1019 [
  Datanode 10.12.65.240:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.247:1019 [
  Datanode 10.12.65.247:1019 is not chosen since the rack has too many chosen nodes.
2022-10-09 19:27:18,416 INFO blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(832)) - Not enough replicas was chosen. Reason:{TOO_MANY_NODES_ON_RACK=10}
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseFromNextRack(669)) - Failed to choose from the next rack (location = /4F08-01-08), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:665)
	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlac
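The trace above runs through `ErasureCodingWork.chooseTargets` and `BlockPlacementPolicyRackFaultTolerant`, i.e. EC reconstruction under a rack-fault-tolerant placement policy. As far as I understand that policy, it caps how many targets of one block group may land on a single rack at roughly ceil(totalTargets / numRacks), and at 1 when there are more racks than targets; once a rack has reached the cap, every further candidate on it is skipped with `TOO_MANY_NODES_ON_RACK`. The Java sketch below models only that rule and is not the Hadoop implementation. The class and method names are hypothetical, and the RS-6-3 block group (9 internal blocks) is an assumption, since the EC policy in use is not stated here; the rack count of 16 is the number of distinct racks visible in the truncated topology log from the issue description.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RackLimitSketch {

    // Assumed per-rack cap: spread targets evenly (rounding up); one per rack if racks outnumber targets.
    static int maxNodesPerRack(int totalTargets, int numRacks) {
        if (numRacks <= 1 || totalTargets <= 1) {
            return totalTargets; // nothing to spread across racks
        }
        if (totalTargets < numRacks) {
            return 1;
        }
        return (totalTargets - 1) / numRacks + 1; // integer ceiling of totalTargets / numRacks
    }

    // Counts candidates skipped because their rack already holds maxPerRack chosen targets.
    static int countTooManyOnRack(List<String> candidateRacks,
                                  Map<String, Integer> chosenPerRack,
                                  int maxPerRack) {
        int rejected = 0;
        for (String rack : candidateRacks) {
            if (chosenPerRack.getOrDefault(rack, 0) >= maxPerRack) {
                rejected++;
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        int cap = maxNodesPerRack(9, 16); // assumed RS-6-3 group, 16 racks
        System.out.println("max targets per rack = " + cap);

        // Hypothetical state: one internal block of the group already sits on /4F08-01-08.
        Map<String, Integer> chosen = new HashMap<>();
        chosen.put("/4F08-01-08", 1);

        // The ten /4F08-01-08 candidates that the DEBUG output lists as rejected.
        List<String> candidates = Collections.nCopies(10, "/4F08-01-08");
        System.out.println("TOO_MANY_NODES_ON_RACK = " + countTooManyOnRack(candidates, chosen, cap));
    }
}
```

Under these assumptions the cap is 1, so all ten candidates on /4F08-01-08 are rejected, which matches the shape of `Reason:{TOO_MANY_NODES_ON_RACK=10}`. If most of the empty nodes share one rack, this model would let only one of them receive a block from any given group, which is consistent with the question above about breaking up the rack.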