Frode Halvorsen created HDFS-7787:
-------------------------------------
Summary: Wrong priority of replication
Key: HDFS-7787
URL: https://issues.apache.org/jira/browse/HDFS-7787
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.6.0
Environment: 2 NameNodes (HA), 6 DataNodes in two racks
Reporter: Frode Halvorsen
Each file is configured with 3 replicas, split across different racks.
After a simulated crash of one rack (shutdown of all its nodes, deletion of
their data directories, then restart of the nodes) and decommissioning of one
of the nodes in the other rack, replication does not follow the 'normal'
rules...
My cluster has approximately 25 million files, and the node I am now trying to
decommission has 9 million under-replicated blocks and 3.5 million blocks with
'no live replicas'. After a restart of the node it starts to replicate both
types of blocks, but after a while it only replicates under-replicated blocks
that still have other live copies. I would expect the 'normal' behaviour to be
that all blocks for which this node holds the only copy are the first to be
replicated/balanced. Another problem is that this takes 'forever': at the
current rate it will run for a couple of months before I can take the node
down for maintenance, even though it only holds approximately 250 GB of data
in total.
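
As an illustration of the ordering I would expect (just a sketch; the class
and field names ExpectedReplicationOrder, PendingBlock and
liveReplicasElsewhere are made up and are not HDFS code): pending replication
work should be sorted by the number of live replicas remaining on nodes other
than the one being decommissioned, so blocks with no other live copy go first.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only -- not HDFS code. It shows the ordering I would
// expect: blocks whose only remaining copy is on the decommissioning node
// are scheduled for replication before blocks that still have live copies
// elsewhere.
public class ExpectedReplicationOrder {

    // Simplified stand-in for a block awaiting replication.
    static final class PendingBlock {
        final String blockId;
        final int liveReplicasElsewhere; // copies on nodes NOT being decommissioned
        PendingBlock(String blockId, int liveReplicasElsewhere) {
            this.blockId = blockId;
            this.liveReplicasElsewhere = liveReplicasElsewhere;
        }
    }

    public static void main(String[] args) {
        List<PendingBlock> pending = new ArrayList<>();
        pending.add(new PendingBlock("blk_a", 2)); // under-replicated, but safe
        pending.add(new PendingBlock("blk_b", 0)); // only copy is on the decommissioning node
        pending.add(new PendingBlock("blk_c", 1));

        // Fewest live replicas elsewhere first: blk_b, then blk_c, then blk_a.
        pending.sort(Comparator.comparingInt(b -> b.liveReplicasElsewhere));

        for (PendingBlock b : pending) {
            System.out.println("replicate " + b.blockId
                    + " (live replicas elsewhere: " + b.liveReplicasElsewhere + ")");
        }
    }
}

With such an ordering, the 3.5 million 'no live replicas' blocks would drain
before the merely under-replicated ones, which is what I would expect during a
decommission.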