liuyiyang created HDFS-12128:
--------------------------------
Summary: Namenode failover may make balancer's efforts be in vain
Key: HDFS-12128
URL: https://issues.apache.org/jira/browse/HDFS-12128
Project: Hadoop HDFS
Issue Type: Bug
Components: balancer & mover
Affects Versions: 2.6.0
Reporter: liuyiyang
The problem can be reproduced as follows:
1. In an HA cluster with imbalanced datanode usage, plan to run "start-balancer.sh" to balance the cluster;
2. Before starting the balancer, trigger a namenode failover; this causes the new active namenode to mark all datanodes as stale;
3. Start the balancer to balance the datanode usage;
4. While the balancer runs, the usage of under-utilized datanodes increases, but the usage of over-utilized datanodes stays unchanged for a long time.
Since all datanodes are marked as stale, block deletions on them are postponed. During balancing, the replicas on the source datanodes cannot be deleted immediately, so the total usage of the cluster increases and will not decrease until the datanodes' stale state is cleared.
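A toy model (not HDFS code; names are illustrative) of the effect described above: while every datanode is stale, the deletion of the source replica after each balancer move is postponed, so total cluster usage grows even though the balancer only moves blocks rather than adding data.

```python
# Toy model of a balancer move when deletions on stale datanodes are postponed.
from dataclasses import dataclass, field

@dataclass
class Node:
    used: int = 0                # bytes currently used
    stale: bool = False          # marked stale by the active namenode
    postponed: list = field(default_factory=list)  # deletions deferred

def balancer_move(src, dst, block):
    dst.used += block                 # the replica is copied to the target first
    if src.stale:
        src.postponed.append(block)   # deletion deferred until stale state clears
    else:
        src.used -= block             # normal case: source replica removed at once

src, dst = Node(used=100, stale=True), Node(used=10, stale=True)
balancer_move(src, dst, 5)
print(src.used + dst.used)   # 115: total usage grew by the block size
```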
When a datanode sends its next block report to the namenode (the default interval is 6h), the active namenode clears its stale state. I found that when a replica on a source datanode cannot be deleted immediately during an OP_REPLACE operation (signaled via the del_hint sent to the namenode), the namenode instead schedules deletion of the replica on the datanode with the least remaining space. Unfortunately, the datanode with the least remaining space may be the balancer's target datanode, which leads to imbalanced datanode usage again.
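A minimal sketch of the selection behavior described above (the function name and node names are hypothetical, not the real BlockManager API): when the del_hint cannot be honored, falling back to the node with the least remaining space can pick the very node the balancer just copied the block to.

```python
# Hypothetical sketch: choosing which excess replica to delete when the
# balancer's del_hint (the source datanode) cannot be honored.

def choose_excess_replica(replicas, del_hint=None, hint_usable=True):
    """replicas: list of (node_name, remaining_bytes).
    Returns the node whose replica gets deleted."""
    if del_hint is not None and hint_usable:
        return del_hint
    # Fallback: delete from the node with the least remaining space.
    return min(replicas, key=lambda r: r[1])[0]

# The balancer copied a block from "source" to "target"; in this scenario
# "target" happens to have the least remaining space among the replica holders.
replicas = [("source", 5 * 2**30), ("nodeB", 40 * 2**30), ("target", 2 * 2**30)]

# While deletions are postponed, the del_hint is unusable:
print(choose_excess_replica(replicas, del_hint="source", hint_usable=False))
# -> target: the replica the balancer just placed there is scheduled for deletion
```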
If the balancer finishes before the next block report, all postponed over-replicated replicas are deleted based on the datanodes' remaining space, which can render the balancer's efforts fruitless.
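For reference, the block report interval mentioned above is governed by the following hdfs-site.xml property (the value shown is the upstream default of 6 hours); shortening it in a test cluster narrows the window during which postponed deletions accumulate.

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value> <!-- full block report interval in ms; default is 6h -->
</property>
```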
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]