Jeff Hammerbacher wrote:
Hey Vishal,

Check out the chooseTarget() method(s) of ReplicationTargetChooser.java in
the org.apache.hadoop.hdfs.server.namenode package:
http://svn.apache.org/viewvc/hadoop/core/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java?view=markup

In words: assuming you're using the default replication level (3), the
default strategy will put one replica of each block on the local node, one
on a node in a remote rack, and the third on a different node in that same
remote rack.
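To make the three-replica rule concrete, here is a simplified Java sketch. It is hypothetical and is not Hadoop's code: the `Node` record and `chooseTargets()` method are made up for illustration, the writer is assumed to be a datanode itself, and candidates are taken in list order, whereas the real chooser picks them at random and falls back when a cluster cannot satisfy the rule. It needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of rack-aware placement for replication factor 3 (not Hadoop's code). */
public class RackAwarePlacementSketch {

    /** Hypothetical datanode type: a name plus the rack it lives in. */
    record Node(String name, String rack) {}

    /**
     * Picks three targets: the writer's own node, a node on some other rack,
     * and a second, different node on that same remote rack.
     * First match wins here; the real chooser randomizes its picks.
     */
    static List<Node> chooseTargets(Node writer, List<Node> cluster) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer); // replica 1: the local (writer's) node

        Node remote = null;
        for (Node n : cluster) { // replica 2: any node on a different rack
            if (!n.rack().equals(writer.rack())) {
                remote = n;
                break;
            }
        }
        if (remote == null) {
            throw new IllegalStateException("no remote rack available");
        }
        targets.add(remote);

        for (Node n : cluster) { // replica 3: another node on replica 2's rack
            if (n.rack().equals(remote.rack()) && !n.equals(remote)) {
                targets.add(n);
                return targets;
            }
        }
        throw new IllegalStateException("remote rack has only one node");
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dn1", "/rack1"), new Node("dn2", "/rack1"),
                new Node("dn3", "/rack2"), new Node("dn4", "/rack2"));
        System.out.println(chooseTargets(cluster.get(0), cluster));
    }
}
```

With dn1 as the writer, the sketch returns dn1, then dn3 and dn4 on /rack2. Putting two replicas on one remote rack means the write pipeline crosses racks only once, while the block still survives the loss of either rack.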

Note that HADOOP-3799 (http://issues.apache.org/jira/browse/HADOOP-3799)
proposes making this strategy pluggable.
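One common way to make such a choice pluggable is a strategy interface plus an implementation loaded by class name. The Java sketch below shows that shape only. `PlacementPolicy`, `SameRackPolicy`, and `load()` are hypothetical names, not the interface proposed in HADOOP-3799, and the same-rack policy is a toy alternative rather than a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a pluggable placement strategy (not the HADOOP-3799 design). */
public class PluggablePlacementSketch {

    /** Hypothetical datanode type: a name plus the rack it lives in. */
    record Node(String name, String rack) {}

    /** Hypothetical strategy interface: choose up to `replication` targets for a new block. */
    interface PlacementPolicy {
        List<Node> chooseTargets(Node writer, List<Node> cluster, int replication);
    }

    /**
     * Toy alternative policy: keep every replica on the writer's rack.
     * The writer comes first; other nodes on that rack fill the remaining slots,
     * so fewer than `replication` targets come back if the rack is small.
     */
    static final class SameRackPolicy implements PlacementPolicy {
        @Override
        public List<Node> chooseTargets(Node writer, List<Node> cluster, int replication) {
            List<Node> targets = new ArrayList<>();
            if (replication <= 0) {
                return targets;
            }
            targets.add(writer);
            for (Node n : cluster) {
                if (targets.size() >= replication) {
                    break;
                }
                if (n.rack().equals(writer.rack()) && !n.equals(writer)) {
                    targets.add(n);
                }
            }
            return targets;
        }
    }

    /** Instantiates a policy from its class name, so a deployment can swap strategies without code changes. */
    static PlacementPolicy load(String className) throws ReflectiveOperationException {
        return (PlacementPolicy) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        List<Node> cluster = List.of(
                new Node("dn1", "/rack1"), new Node("dn2", "/rack1"),
                new Node("dn3", "/rack2"));
        PlacementPolicy policy = load(SameRackPolicy.class.getName());
        System.out.println(policy.chooseTargets(cluster.get(0), cluster, 3));
    }
}
```

Keeping all replicas on one rack gives up the rack-failure protection of the default strategy, which is exactly the kind of trade-off a per-site policy would let an operator make deliberately.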


Yes, there are some good reasons for having different placement algorithms for different datacentres, and I could even imagine different sequences of MapReduce jobs providing hints about where they want their data, depending on what they plan to do with it afterwards.
