HDFS architecture documentation describes outdated placement policy
-------------------------------------------------------------------
Key: HADOOP-5734
URL: https://issues.apache.org/jira/browse/HADOOP-5734
Project: Hadoop Core
Issue Type: Bug
Components: documentation
Affects Versions: 0.20.0
Reporter: Konstantin Boudnik
Priority: Minor
The "Replica Placement: The First Baby Steps" section of HDFS architecture
document states:
"...
For the common case, when the replication factor is three, HDFS's placement
policy is to put one replica on one node in the local rack, another on a
different node in the local rack, and the last on a different node in a
different rack. This policy cuts the inter-rack write traffic which generally
improves write performance.
..."
However, according to the code of ReplicationTargetChooser.chooseTarget(), the
actual logic is to put both the second and the third replica on a different
(remote) rack. So you end up with two replicas on different nodes of the remote
rack and one (the initial replica) on a node in the local rack. Thus, the
sentence should say something like this:
"For the common case, when the replication factor is three, HDFS's placement
policy is to put one replica on one node in the local rack, another on a node
in a different (remote) rack, and the last on a different node in the same
remote rack. This policy cuts the inter-rack write traffic which generally
improves write performance."