[ https://issues.apache.org/jira/browse/HDFS-16799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17614703#comment-17614703 ]
ruiliang commented on HDFS-16799:
---------------------------------

It seems that the empty nodes are concentrated on a few racks, so the placement policy cannot select enough racks first. In this case, is the only option to break up those racks?

{code:java}
Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen nodes.
2022-10-09 19:27:18,407 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseLocalRack(637)) - Failed to choose from local rack (location = /4F08-05-15), retry with the rack of the next replica (location = /4F08-12-03)
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlentPolicyDefault.java:629)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:589)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:218)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:94)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:295)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:148)
    at org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:60)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1862)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1814)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4655)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4522)
    at java.lang.Thread.run(Thread.java:748)
2022-10-09 19:27:18,416 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(824)) - [
Node /4F08-01-08/10.12.65.242:1019 [
  Datanode 10.12.65.242:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.248:1019 [
  Datanode 10.12.65.248:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.195:1019 [
  Datanode 10.12.65.195:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.241:1019 [
  Datanode 10.12.65.241:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.243:1019 [
  Datanode 10.12.65.243:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.244:1019 [
  Datanode 10.12.65.244:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.249:1019 [
  Datanode 10.12.65.249:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.245:1019 [
  Datanode 10.12.65.245:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.240:1019 [
  Datanode 10.12.65.240:1019 is not chosen since the rack has too many chosen nodes.
Node /4F08-01-08/10.12.65.247:1019 [
  Datanode 10.12.65.247:1019 is not chosen since the rack has too many chosen nodes.
2022-10-09 19:27:18,416 INFO blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(832)) - Not enough replicas was chosen. Reason:{TOO_MANY_NODES_ON_RACK=10}
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseFromNextRack(669)) - Failed to choose from the next rack (location = /4F08-01-08), retry choosing randomly
org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException:
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:834)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:722)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseFromNextRack(BlockPlacementPolicyDefault.java:665)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:641)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:589)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseOnce(BlockPlacementPolicyRackFaultTolerant.java:218)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant.chooseTargetInOrder(BlockPlacementPolicyRackFaultTolerant.java:94)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:419)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:295)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:148)
    at org.apache.hadoop.hdfs.server.blockmanagement.ErasureCodingWork.chooseTargets(ErasureCodingWork.java:60)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReconstructionWorkForBlocks(BlockManager.java:1862)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeBlockReconstructionWork(BlockManager.java:1814)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:4655)
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$RedundancyMonitor.run(BlockManager.java:4522)
    at java.lang.Thread.run(Thread.java:748)
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.19.22:1019 does not have enough DISK space (required=523719088, scheduled=1047438176, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.19.20:1019 does not have enough DISK space (required=523719088, scheduled=523719088, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.65.186:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.64.71:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.19.23:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.15.162:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.12.200:1019 does not have enough DISK space (required=523719088, scheduled=1571157264, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.15.154:1019 does not have enough DISK space (required=523719088, scheduled=2618595440, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.14.8:1019 does not have enough DISK space (required=523719088, scheduled=523719088, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.15.163:1019 does not have enough DISK space (required=523719088, scheduled=523719088, remaining=0).
2022-10-09 19:27:18,417 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.17.27:1019 does not have enough DISK space (required=523719088, scheduled=1047438176, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.17.35:1019 does not have enough DISK space (required=523719088, scheduled=1047438176, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.14.168:1019 does not have enough DISK space (required=523719088, scheduled=1047438176, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.17.26:1019 does not have enough DISK space (required=523719088, scheduled=523719088, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.64.72:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.12.210:1019 does not have enough DISK space (required=523719088, scheduled=523719088, remaining=0).
2022-10-09 19:27:18,418 DEBUG blockmanagement.BlockPlacementPolicy (DatanodeDescriptor.java:chooseStorage4Block(768)) - The node 10.12.65.196:1019 does not have enough DISK space (required=523719088, scheduled=0, remaining=0).
2022-10-09 19:27:18,418 INFO blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(832)) - Not enough replicas was chosen. Reason:{TOO_MANY_NODES_ON_RACK=1, NOT_ENOUGH_STORAGE_SPACE=2}
2022-10-09 19:27:18,418 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(432)) - Failed to place enough replicas, still in need of 1 to reach 5 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) [
{code}

> The dn space size is not consistent, and Balancer can not work, resulting in a very unbalanced space
> ----------------------------------------------------------------------------------------------------
>
> Key: HDFS-16799
> URL: https://issues.apache.org/jira/browse/HDFS-16799
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.1.0
> Reporter: ruiliang
> Priority: Blocker
>
>
> {code:java}
> echo 'A DFS Used 99.8% to ip' > sorucehost
> hdfs --debug balancer -fs hdfs://xxcluster06 -threshold 10 -source -f sorucehost
> ....
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.243:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.247:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-10/10.12.65.214:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-02-08/10.12.14.8:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.154:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-04/10.12.65.218:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.143:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-05/10.12.12.200:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.217:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.142:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.246:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.219:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.147:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-10/10.12.65.186:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.153:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-07/10.12.19.23:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-04-14/10.12.65.119:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.131:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-04/10.12.12.210:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-11/10.12.14.168:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.245:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-02/10.12.17.26:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.241:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.152:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.249:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-07-14/10.12.64.71:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-03/10.12.17.35:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.195:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.242:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.248:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.240:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-15-12/10.12.65.196:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.150:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.222:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.145:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-01-08/10.12.65.244:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-07/10.12.19.22:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.221:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.136:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.129:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-15/10.12.15.163:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-07-14/10.12.64.72:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-13/10.12.15.149:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.130:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.220:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-01/10.12.17.27:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-05-15/10.12.15.162:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.216:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-03-07/10.12.19.20:1019
> 22/10/09 16:43:52 INFO net.NetworkTopology: Adding a new node: /4F08-12-03/10.12.65.140:1019
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.65.214:1019[DISK] has utilization=93.89380810507323 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.14.8:1019[DISK] has utilization=94.505152970232 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.154:1019[DISK] has utilization=99.76244188398033 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.65.218:1019[DISK] has utilization=79.45606721913859 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.12.200:1019[DISK] has utilization=99.71656371224418 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.65.186:1019[DISK] has utilization=93.95068743459564 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.153:1019[DISK] has utilization=99.75232310558265 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.19.23:1019[DISK] has utilization=99.79429219050178 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.65.119:1019[DISK] has utilization=99.76831865018094 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.12.210:1019[DISK] has utilization=99.80222443678781 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.14.168:1019[DISK] has utilization=99.78579344694371 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.17.26:1019[DISK] has utilization=99.78780750171514 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.64.71:1019[DISK] has utilization=99.76463495521105 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.17.35:1019[DISK] has utilization=99.79786446831488 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.65.196:1019[DISK] has utilization=93.94608887965242 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.150:1019[DISK] has utilization=99.79476181819427 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.19.22:1019[DISK] has utilization=99.73035889533327 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.163:1019[DISK] has utilization=99.73735827464508 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.64.72:1019[DISK] has utilization=99.75988510473643 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.149:1019[DISK] has utilization=99.78330434441013 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.17.27:1019[DISK] has utilization=99.80551686705469 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.15.162:1019[DISK] has utilization=99.80826338040788 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: DDatanode:10.12.19.20:1019[DISK] has utilization=99.8238594786125 >= average=51.54652411051956 but it is not specified as a source; skipping it.
> 22/10/09 16:43:52 INFO balancer.Balancer: 1 over-utilized: [10.12.15.152:1019:DISK]
> 22/10/09 16:43:52 INFO balancer.Balancer: 26 underutilized: [10.12.65.243:1019:DISK, 10.12.65.247:1019:DISK, 10.12.65.143:1019:DISK, 10.12.65.217:1019:DISK, 10.12.65.142:1019:DISK, 10.12.65.246:1019:DISK, 10.12.65.219:1019:DISK, 10.12.65.147:1019:DISK, 10.12.65.131:1019:DISK, 10.12.65.245:1019:DISK, 10.12.65.241:1019:DISK, 10.12.65.249:1019:DISK, 10.12.65.195:1019:DISK, 10.12.65.242:1019:DISK, 10.12.65.248:1019:DISK, 10.12.65.240:1019:DISK, 10.12.65.222:1019:DISK, 10.12.65.145:1019:DISK, 10.12.65.244:1019:DISK, 10.12.65.221:1019:DISK, 10.12.65.136:1019:DISK, 10.12.65.129:1019:DISK, 10.12.65.130:1019:DISK, 10.12.65.220:1019:DISK, 10.12.65.216:1019:DISK, 10.12.65.140:1019:DISK]
> 22/10/09 16:43:52 INFO balancer.Balancer: Need to move 238.67 TB to make the cluster balanced.
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.152:1019:DISK to 10.12.65.243:1019:DISK
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
> 22/10/09 16:43:52 INFO balancer.Balancer: Will move 10 GB in this iteration
> 22/10/09 16:43:52 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
> 22/10/09 16:43:52 INFO balancer.Dispatcher: Allocating 100 threads per target.
> No block has been moved for 5 iterations. Exiting...
> Oct 9, 2022 4:43:52 PM 4 0 B 238.67 TB 10 GB
> Oct 9, 2022 4:43:52 PM Balancing took 37.497 seconds
> {code}
> Were no blocks moved at all?
> hdfs dfsadmin -fs hdfs://xxcluster06/ -report | egrep 'DFS Used|Rack' | grep -v 'Non'
> {code:java}
> DFS Used: 1749575862956140 (1.55 PB)
> DFS Used%: 51.74%
> Rack: /4F08-05-05
> DFS Used: 40696335411457 (37.01 TB)
> DFS Used%: 99.72%
> Rack: /4F08-05-04
> DFS Used: 40731295752330 (37.04 TB)
> DFS Used%: 99.80%
> Rack: /4F08-05-11
> DFS Used: 40724586650507 (37.04 TB)
> DFS Used%: 99.79%
> Rack: /4F08-02-08
> DFS Used: 83937719357159 (76.34 TB)
> DFS Used%: 94.51%
> Rack: /4F08-05-13
> DFS Used: 40723581615787 (37.04 TB)
> DFS Used%: 99.78%
> Rack: /4F08-05-13
> DFS Used: 40728246491332 (37.04 TB)
> DFS Used%: 99.79%
> Rack: /4F08-05-13
> DFS Used: 40713774137344 (37.03 TB)
> DFS Used%: 99.76%
> Rack: /4F08-05-13
> DFS Used: 37318355548626 (33.94 TB)
> DFS Used%: 99.75%
> Rack: /4F08-05-13
> DFS Used: 37322141246998 (33.94 TB)
> DFS Used%: 99.76%
> Rack: /4F08-05-15
> DFS Used: 40733756869637 (37.05 TB)
> DFS Used%: 99.81%
> Rack: /4F08-05-15
> DFS Used: 40704825253902 (37.02 TB)
> DFS Used%: 99.74%
> Rack: /4F08-03-02
> DFS Used: 40725411589924 (37.04 TB)
> DFS Used%: 99.79%
> Rack: /4F08-03-01
> DFS Used: 40732646551276 (37.05 TB)
> DFS Used%: 99.81%
> Rack: /4F08-03-03
> DFS Used: 40729516644144 (37.04 TB)
> DFS Used%: 99.80%
> Rack: /4F08-03-07
> DFS Used: 40740129858581 (37.05 TB)
> DFS Used%: 99.82%
> Rack: /4F08-03-07
> DFS Used: 40701966553095 (37.02 TB)
> DFS Used%: 99.73%
> Rack: /4F08-03-07
> DFS Used: 37334050457730 (33.96 TB)
> DFS Used%: 99.79%
> Rack: /4F08-07-14
> DFS Used: 33929974795045 (30.86 TB)
> DFS Used%: 99.76%
> Rack: /4F08-07-14
> DFS Used: 33928351336350 (30.86 TB)
> DFS Used%: 99.76%
> Rack: /4F08-04-14
> DFS Used: 73843637794466 (67.16 TB)
> DFS Used%: 99.77%
> Rack: /4F08-12-03
> DFS Used: 31197554331648 (28.37 TB)
> DFS Used%: 35.13%
> Rack: /4F08-12-03
> DFS Used: 31301323333632 (28.47 TB)
> DFS Used%: 35.24%
> Rack: /4F08-12-03
> DFS Used: 31256658031218 (28.43 TB)
> DFS Used%: 35.19%
> Rack: /4F08-12-03
> DFS Used: 28617560723456 (26.03 TB)
> DFS Used%: 32.22%
> Rack: /4F08-12-03
> DFS Used: 14908274495495 (13.56 TB)
> DFS Used%: 16.79%
> Rack: /4F08-12-03
> DFS Used: 28507104452608 (25.93 TB)
> DFS Used%: 32.10%
> Rack: /4F08-12-03
> DFS Used: 28551347621836 (25.97 TB)
> DFS Used%: 32.15%
> Rack: /4F08-12-03
> DFS Used: 28613518264435 (26.02 TB)
> DFS Used%: 32.22%
> Rack: /4F08-12-03
> DFS Used: 28713366585344 (26.11 TB)
> DFS Used%: 32.33%
> Rack: /4F08-15-10
> DFS Used: 38343166373888 (34.87 TB)
> DFS Used%: 93.95%
> Rack: /4F08-01-08
> DFS Used: 31815393923072 (28.94 TB)
> DFS Used%: 35.82%
> Rack: /4F08-15-12
> DFS Used: 38341297147501 (34.87 TB)
> DFS Used%: 93.95%
> Rack: /4F08-15-10
> DFS Used: 38319948818315 (34.85 TB)
> DFS Used%: 93.89%
> Rack: /4F08-12-03
> DFS Used: 28658795847680 (26.07 TB)
> DFS Used%: 32.27%
> Rack: /4F08-12-03
> DFS Used: 29024956961463 (26.40 TB)
> DFS Used%: 32.68%
> Rack: /4F08-12-04
> DFS Used: 70571404823817 (64.18 TB)
> DFS Used%: 79.46%
> Rack: /4F08-12-03
> DFS Used: 9769506916811 (8.89 TB)
> DFS Used%: 11.00%
> Rack: /4F08-12-03
> DFS Used: 9784800338859 (8.90 TB)
> DFS Used%: 11.02%
> Rack: /4F08-12-03
> DFS Used: 9788633366528 (8.90 TB)
> DFS Used%: 11.02%
> Rack: /4F08-12-03
> DFS Used: 9832897995355 (8.94 TB)
> DFS Used%: 11.07%
> Rack: /4F08-01-08
> DFS Used: 31535518071572 (28.68 TB)
> DFS Used%: 35.51%
> Rack: /4F08-01-08
> DFS Used: 31551866050268 (28.70 TB)
> DFS Used%: 35.52%
> Rack: /4F08-01-08
> DFS Used: 31948597213697 (29.06 TB)
> DFS Used%: 35.97%
> Rack: /4F08-01-08
> DFS Used: 31566025621504 (28.71 TB)
> DFS Used%: 35.54%
> Rack: /4F08-01-08
> DFS Used: 31385835208711 (28.55 TB)
> DFS Used%: 35.34%
> Rack: /4F08-01-08
> DFS Used: 31431967515314 (28.59 TB)
> DFS Used%: 35.39%
> Rack: /4F08-01-08
> DFS Used: 31484318341316 (28.63 TB)
> DFS Used%: 35.45%
> Rack: /4F08-01-08
> DFS Used: 32099065245696 (29.19 TB)
> DFS Used%: 36.14%
> Rack: /4F08-01-08
> DFS Used: 31287860413899 (28.46 TB)
> DFS Used%: 35.23%
> Rack: /4F08-01-08
> DFS Used: 32366995005512 (29.44 TB)
> DFS Used%: 36.44%
> {code}
> It seems that the absolute DFS Used is about the same size on each node. Is that what the Balancer judges by?
> Is DFS Used% not used by the Balancer? Why?
>
> hdfs --debug balancer -fs hdfs://yycluster06 -threshold 10
> * The same happens when balancing all nodes: no blocks are moved?
> {code:java}
> 22/10/09 17:13:32 INFO balancer.Balancer: Need to move 350.03 TB to make the cluster balanced.
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.19.23:1019:DISK to 10.12.65.246:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.162:1019:DISK to 10.12.65.147:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.152:1019:DISK to 10.12.65.143:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.65.186:1019:DISK to 10.12.65.220:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.17.35:1019:DISK to 10.12.65.221:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.19.22:1019:DISK to 10.12.65.216:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.65.214:1019:DISK to 10.12.65.131:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.150:1019:DISK to 10.12.65.142:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.17.27:1019:DISK to 10.12.65.136:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.12.210:1019:DISK to 10.12.65.240:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.65.218:1019:DISK to 10.12.65.242:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.64.71:1019:DISK to 10.12.65.222:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.149:1019:DISK to 10.12.65.249:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.14.168:1019:DISK to 10.12.65.245:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.17.26:1019:DISK to 10.12.65.244:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.14.8:1019:DISK to 10.12.65.145:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.153:1019:DISK to 10.12.65.241:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.12.200:1019:DISK to 10.12.65.247:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.154:1019:DISK to 10.12.65.243:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.65.119:1019:DISK to 10.12.65.129:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.15.163:1019:DISK to 10.12.65.130:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.64.72:1019:DISK to 10.12.65.195:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.65.196:1019:DISK to 10.12.65.217:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: Decided to move 10 GB bytes from 10.12.19.20:1019:DISK to 10.12.65.219:1019:DISK
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
> 22/10/09 17:13:32 INFO balancer.Balancer: Will move 240 GB in this iteration
> 22/10/09 17:13:32 INFO balancer.Dispatcher: Allocating 12 threads per target.
> No block has been moved for 5 iterations. Exiting...
> Oct 9, 2022 5:13:34 PM 4 0 B 350.03 TB 240 GB
> Oct 9, 2022 5:13:34 PM Balancing took 42.665 seconds
> {code}
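
On the DFS Used% question: the `utilization=... >= average=...` lines in the Balancer output suggest that each node's used percentage is compared against the cluster average, with `-threshold` as the allowed deviation. Below is a minimal Java sketch of that bucketing into the four groups named in the `chooseStorageGroups` lines. The class, enum, and method names are hypothetical and this is not Hadoop's code; the exact boundary handling is an assumption.

```java
public class UtilizationBuckets {
    // Group names mirror the ones in the Balancer log lines above.
    enum Bucket { OVER_UTILIZED, ABOVE_AVERAGE, BELOW_AVERAGE, UNDER_UTILIZED }

    // Classify a node by how far its used percentage is from the cluster
    // average, relative to the threshold (all values in percent).
    // Assumption: a node exactly at average +/- threshold counts as
    // above/below average rather than over/under-utilized.
    static Bucket classify(double utilizationPct, double averagePct, double thresholdPct) {
        double diff = utilizationPct - averagePct;
        if (diff > thresholdPct) {
            return Bucket.OVER_UTILIZED;
        }
        if (diff >= 0) {
            return Bucket.ABOVE_AVERAGE;
        }
        if (diff >= -thresholdPct) {
            return Bucket.BELOW_AVERAGE;
        }
        return Bucket.UNDER_UTILIZED;
    }

    public static void main(String[] args) {
        double average = 51.54652411051956; // average from the log above
        double threshold = 10.0;            // -threshold 10

        // Utilization of 10.12.15.154 taken from the log above.
        System.out.println(classify(99.76244188398033, average, threshold)); // OVER_UTILIZED
        // DFS Used% of one /4F08-01-08 node from the dfsadmin report.
        System.out.println(classify(35.82, average, threshold));             // UNDER_UTILIZED
    }
}
```

Under this reading, the nodes at about 99% fall in the over-utilized group and the nodes at 11% to 36% fall in the under-utilized group, which matches the "26 underutilized" list. In the first run only 10.12.15.152 is reported as over-utilized because the other full nodes were not specified as a source, as the log says.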