Jeff Hammerbacher wrote:
Hey Vishal,
Check out the chooseTarget() method(s) of ReplicationTargetChooser.java in
the org.apache.hadoop.hdfs.server.namenode package:
http://svn.apache.org/viewvc/hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java?view=markup
In words: assuming you're using the default replication level (3), the
default strategy puts the first replica of each block on the local node
(the one doing the write), the second on a node in a remote rack, and the
third on a different node in that same remote rack.
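
To make the ordering concrete, here is a rough sketch of that three-replica
rule. It is not the code from ReplicationTargetChooser.java: the class name
DefaultPlacementSketch, the Node type, and chooseTargets(writer, cluster) are
all made up for illustration, and it simply takes the first matching node
where the real chooser picks among candidates and checks things like load and
free space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DefaultPlacementSketch {

    /** Hypothetical stand-in for a datanode: a name plus the rack it lives in. */
    public static final class Node {
        public final String name;
        public final String rack;

        public Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return rack + "/" + name;
        }
    }

    /**
     * Returns up to three targets: the writer's own node, the first node
     * found on a different rack, then another node on that remote rack.
     * Assumption: deterministic first-match selection, for readability only.
     */
    public static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();

        // Replica 1: the local node.
        targets.add(writer);

        // Replica 2: any node on a rack other than the writer's.
        Node remote = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) {
                remote = n;
                break;
            }
        }
        if (remote == null) {
            return targets; // single-rack cluster: this sketch stops here
        }
        targets.add(remote);

        // Replica 3: a different node on the same remote rack as replica 2.
        for (Node n : cluster) {
            if (n != remote && n.rack.equals(remote.rack)) {
                targets.add(n);
                break;
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Node a1 = new Node("a1", "rackA");
        Node a2 = new Node("a2", "rackA");
        Node b1 = new Node("b1", "rackB");
        Node b2 = new Node("b2", "rackB");
        List<Node> cluster = Arrays.asList(a1, a2, b1, b2);

        // prints [rackA/a1, rackB/b1, rackB/b2]
        System.out.println(chooseTargets(a1, cluster));
    }
}
```

With the writer on a1, only one copy stays on rackA and two land on rackB, so
losing either rack still leaves at least one replica.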
Note that HADOOP-3799 (http://issues.apache.org/jira/browse/HADOOP-3799)
proposes making this strategy pluggable.
Yes, there are some good reasons for having different placement algorithms
for different datacentres, and I could even imagine different MR
sequences providing hints about where they want data, depending on what
they want to do afterwards.