There's nothing like reading the manual:
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_design.html#Replica+Placement%3A+The+First+Baby+Steps

Quote:

"For the common case, when the replication factor is three, HDFS’s
placement policy is to put one replica on one node in the local rack,
another on a different node in the local rack, and the last on a
different node in a different rack. "

So if you write the data from only one machine, and that machine is
also running a DataNode, every block will get one replica on that
machine (although you can run the balancer afterwards to spread the
blocks out).
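
In case it helps, here is a minimal sketch of how to check where the
blocks actually landed and then rebalance. The path /user/hadoop/mydata
is just a placeholder, and the commands assume a 0.20-era install run
from HADOOP_HOME:

  # list which datanodes hold each block of a file
  bin/hadoop fsck /user/hadoop/mydata -files -blocks -locations

  # move blocks around until every datanode is within 10% of the
  # cluster-wide average utilization
  bin/start-balancer.sh -threshold 10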

J-D

On Thu, Mar 4, 2010 at 7:25 AM, openresearch
<qiming...@openresearchinc.com> wrote:
>
> I am building a small two node cluster following
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
>
> Everything seems to be working, except I notice the data is NOT evenly
> distributed across the two physical boxes.
> e.g., when I "hadoop dfs -put" a ~6 GB file, I expect ~3 GB on each node
> (taking turns every ~64 MB block). However, I checked dfshealth.jsp and
> ran "du -k" on the local box, and found the uploaded data resides ONLY on
> the physical box where I ran "dfs -put". Doesn't that defeat the whole
> (data locality) purpose of Hadoop?!
>
> Please help.
>
> Thanks
>
