> I found a way:
> 
> 1) Configure second datanode with a set of fresh empty directories.
> 2) Start second datanode, let it register with namenode.
> 3) Shut down first and second datanode, then move blk* and subdir dirs
> from data dirs of first node to data dirs of second datanode.
> 4) Start first and second datanode.
> 
> This seems to work as intended. However, after some thinking I came to
> worry about the replication. HDFS will now consider the two datanode
> instances on the same host as two different hosts, which may cause
> replication to put two copies of the same file on the same host.
> 
> It's probably not going to happen very often given that there's some
> randomness involved. And in my case there's always a third copy on
> another rack.
> 
> Still, it's less than optimal. Are there any ways to fool HDFS into
> always placing all copies on different physical hosts in this rather
> messed up configuration?
> 
> Thanks,
> \EF


This is the same issue as for running multiple virtual machines on each 
physical host.  I've found (on 0.20.2) that this gives consistently better 
performance than a single VM or a native OS instance 
(http://www.vmware.com/resources/techresources/10222), at least for 
I/O-intensive apps.  I'm still investigating why, but one possibility is that a 
datanode can't efficiently handle too many disks (I have either 10 or 12 per 
physical host).  So I'm very interested in seeing if multiple datanodes has a 
similar performance effect as multiple VMs (each with one DN).

Back to replication:  Hadoop doesn't know that the machines it's running on 
might share a physical host, so there is a possibility that 2 copies end up on 
the same host.  What I'm doing now is define each host as a rack, so the second 
copy is guaranteed to go to a different host. I have a single physical rack.  
I'm tempted to call physical racks "super racks" to distinguish them from 
logical racks.  A better scheme may be to divide the physical rack into 2 
logical racks, so that most of the time the third copy goes on a different host 
than the second.  I think that is the best that can be done today.  Ideally we 
want to modify the block placement algorithm to recognize another level in the 
topology hierarchy for the multiple VM/DN case.  A simpler solution would be to 
add an option where the third copy is placed in a third rack when available 
(and extended to n replicas on n racks instead of random placement for n>3).  
This would work for the single physical rack case with each host defined as a 
rack for the topology.  Placing replicas on separate racks may be desirable for 
some conventional configurations also (e.g., ones with good inter-rack 
bandwidth).

Jeff

Reply via email to