On 2011-12-07 11:43, Erik Forsberg wrote:
Hi!

I'm facing the problem where datanodes are marked as down due to them
being too slow in doing block reports, which in turn is due to too many
blocks per node. I.e. https://issues.apache.org/jira/browse/HADOOP-4584,
but I can't easily upgrade to 0.21.

So I came up with a possible workaround - run multiple datanode
instances on each physical node, each handling a subset of the disks on
that node. Not sure it will work, but could be worth a try.

So I set up a second datanode on one of my nodes, running on a different
set of ports, with the two datanode instances using half of the disks
each.
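
For reference, here's roughly what the second instance's hdfs-site.xml
looks like. The paths and port numbers below are just examples (the
defaults plus one); the instance also gets its own pid and log
directories via hadoop-env.sh:

    <property>
      <name>dfs.data.dir</name>
      <value>/data/disk4/dfs,/data/disk5/dfs,/data/disk6/dfs</value>
    </property>
    <property>
      <name>dfs.datanode.address</name>
      <value>0.0.0.0:50011</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50076</value>
    </property>
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>0.0.0.0:50021</value>
    </property>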

However, when starting up this configuration, I get the below exception
(UnregisteredDatanodeException) in the namenode log, and the datanode
then shuts down after reporting the same.

I found a way:

1) Configure second datanode with a set of fresh empty directories.
2) Start second datanode, let it register with namenode.
3) Shut down both datanodes, then move the blk* files and subdir* directories from the first datanode's data dirs into the second datanode's fresh data dirs (rough shell sketch below).
4) Start both datanodes.
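
In shell terms, step 3 is roughly the following (paths invented for the
example; say the first instance owned /data/disk[1-6]/dfs and the second
was freshly formatted with /data/disk[4-6]/dfs2):

    # With this many blocks a plain "mv blk_*" can overflow the shell's
    # argument list, so let find do the moving.
    for d in 4 5 6; do
      find /data/disk$d/dfs/current -maxdepth 1 \
        \( -name 'blk_*' -o -name 'subdir*' \) \
        -exec mv {} /data/disk$d/dfs2/current/ \;
    done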

This seems to work as intended. However, after some thinking I started to worry about replication: HDFS now considers the two datanode instances on the same host to be two different hosts, which may cause block placement to put two replicas of the same block on the same physical machine.

It's probably not going to happen very often given that there's some randomness involved. And in my case there's always a third copy on another rack.

Still, it's less than optimal. Is there any way to fool HDFS into always placing all replicas on different physical hosts in this rather messed-up configuration?
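
The only knob I know of is the rack awareness script
(topology.script.file.name), but as far as I can tell the namenode only
hands it IPs/hostnames, never ports, so the two instances on one box
resolve to the same location anyway. Just to illustrate, a script that
gives every physical host its own "rack":

    #!/bin/sh
    # Print one rack path per IP/hostname argument.
    # Caveat 1: both instances on a host share an IP, so this can't
    # tell them apart.
    # Caveat 2: the default placement policy puts replicas 2 and 3 in
    # the same remote rack, which with one-host racks means the same
    # physical machine again.
    while [ $# -gt 0 ]; do
      echo "/per-host-rack/$1"
      shift
    done

So that doesn't really help either.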

Thanks,
\EF
