Sathish Kumar created HDFS-17600:
------------------------------------
Summary: HDFS Balancer not honouring upgrade domain policy
Key: HDFS-17600
URL: https://issues.apache.org/jira/browse/HDFS-17600
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.1.1
Reporter: Sathish Kumar
The cluster has three upgrade domains: up1, up2 and up3. Two of them (up1 and
up2) have 5 DataNodes each, and one (up3) has 4 DataNodes.
The upgrade domains with 5 DataNodes are balanced within themselves, but for
the upgrade domain with 4 DataNodes (up3) the balancer does not honour the
upgrade domain policy.
When the balancer runs, it copies blocks from upgrade domain up3 to up2 or
up1. Example output from a balancer run:
INFO balancer.Dispatcher: Successfully moved blk_2628472659_1554764988 with
size=3305207 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2830371192_1756682484 with
size=107537592 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
1.5.34.68:9866
INFO balancer.Dispatcher: Successfully moved blk_2712919527_1639220270 with
size=1358289 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
1.5.34.69:9866
INFO balancer.Dispatcher: Successfully moved blk_3018060755_1944407960 with
size=22866627 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
1.5.34.68:9866
INFO balancer.Dispatcher: Successfully moved blk_2528373000_1454657120 with
size=5898128 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628472715_1554765044 with
size=5254384 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2876876269_1803191123 with
size=15647542 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_1144306578_70566613 with
size=104746420 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628470767_1554763096 with
size=4183391 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2628470533_1554762862 with
size=3461325 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through 10.x.x.9:9866
INFO balancer.Dispatcher: Successfully moved blk_2612635299_1538926325 with
size=22033489 from 10.x.x.9:9866:DISK to 10.x.x.8:9866:DISK through
10.x.x.9:9866
Here the node with IP 10.x.x.9 belongs to the up3 upgrade domain and the node
with IP 10.x.x.8 belongs to the up2 upgrade domain. Because of this, the copied
block is treated as an excess replica and is deleted from the up2 domain, so
the balancer does not balance the cluster properly.
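To make the failure mode concrete, here is a minimal sketch of the check one
would expect before such a move. It is a simplified model, not the HDFS
implementation: it assumes a block with three replicas, one per upgrade domain,
and it treats a move as acceptable only if the block still spans as many
upgrade domains afterwards. The DataNode names, the layout and the function
names are hypothetical.

```python
# Simplified model, not HDFS source code: a move is acceptable only if it does
# not reduce the number of distinct upgrade domains holding the block.

def domains_after_move(replicas, source, target, domain_of):
    """Upgrade domains of the replicas once `source` is replaced by `target`."""
    return [domain_of[target if node == source else node] for node in replicas]


def keeps_domain_spread(replicas, source, target, domain_of):
    """True if the move leaves the block on at least as many domains as before."""
    before = {domain_of[node] for node in replicas}
    after = set(domains_after_move(replicas, source, target, domain_of))
    return len(after) >= len(before)


# Hypothetical DataNode names and block layout, for illustration only.
DOMAIN_OF = {
    "dn-up1-a": "up1",
    "dn-up2-a": "up2",
    "dn-up2-b": "up2",
    "dn-up3-a": "up3",
    "dn-up3-b": "up3",
}
REPLICAS = ["dn-up1-a", "dn-up2-a", "dn-up3-a"]  # one replica per domain

# up3 -> up2: two replicas end up in up2, so the spread drops from 3 to 2.
cross_domain_ok = keeps_domain_spread(REPLICAS, "dn-up3-a", "dn-up2-b", DOMAIN_OF)
# up3 -> up3: the block still spans up1, up2 and up3.
same_domain_ok = keeps_domain_spread(REPLICAS, "dn-up3-a", "dn-up3-b", DOMAIN_OF)
print(cross_domain_ok, same_domain_ok)
```

In this model the moves shown in the log (up3 to up2) would be rejected, while
a move between two up3 nodes would be allowed.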
The only workaround is to add the nodes of the other upgrade domains (up2 and
up1) to the exclude list and then run the balancer, which then balances only
within the up3 upgrade domain.
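The workaround could be scripted roughly as follows. This is only a sketch: the
host names, the host-to-domain mapping and the exclude file path are
hypothetical, and the helper names are made up for illustration. It relies on
the balancer's documented "-exclude -f <hosts-file>" option and only builds the
command line; it does not run it.

```python
# Sketch of the workaround: exclude every DataNode outside the upgrade domain
# that should be balanced, then run the balancer with that exclude file.

def hosts_outside_domain(domain_of, keep_domain):
    """Sorted hosts whose upgrade domain differs from `keep_domain`."""
    return sorted(host for host, dom in domain_of.items() if dom != keep_domain)


def write_exclude_file(path, hosts):
    """Write one host per line, the layout expected for a hosts file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(hosts) + "\n")


def balancer_command(exclude_file):
    """Argument list for running the balancer with an exclude file."""
    return ["hdfs", "balancer", "-exclude", "-f", exclude_file]


# Hypothetical host-to-upgrade-domain mapping, for illustration only.
HOST_DOMAINS = {
    "dn-up1-a.example.com": "up1",
    "dn-up2-a.example.com": "up2",
    "dn-up3-a.example.com": "up3",
    "dn-up3-b.example.com": "up3",
}

excluded = hosts_outside_domain(HOST_DOMAINS, "up3")
command = balancer_command("/tmp/balancer-exclude-up3.txt")  # hypothetical path
print(excluded)
print(" ".join(command))
```

Calling write_exclude_file("/tmp/balancer-exclude-up3.txt", excluded) before
running the printed command would restrict the balancer to the up3 nodes.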
On further search, the Jira below appears to be related to this issue, although
it is already marked as fixed.
https://issues.apache.org/jira/browse/HDFS-9007