Hadoop's block placement policy is:

If the client writing the file is not itself a data node, the first replica of
each block is placed on a randomly chosen node.  If there are multiple racks,
the replicas of a block cannot all end up on the same rack.
If the client writing the file is a data node itself, the first replica of each
block is placed on that node, and the remaining replicas elsewhere.  Again, if
there are multiple racks, the replicas cannot all end up on the same rack.
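If you want to see where the replicas of each block actually ended up, fsck
will list the datanodes holding them (the path below is just a placeholder for
whatever you uploaded):

    hadoop fsck /user/hadoop/mydata -files -blocks -locations

Each block is printed with the addresses of the datanodes that hold a replica,
so it is easy to confirm whether everything landed on one box.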

I assume your problem is that you are running "dfs -put" from a datanode, and
that the replication factor is 1.  In that case it is expected that all of the
data ends up on the server you submitted it from.  You might want to set the
replication factor to 2, or submit the data from somewhere else.
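For example (the path below is again just a placeholder), you can raise the
replication factor on data that is already in HDFS:

    hadoop dfs -setrep 2 /user/hadoop/mydata

or change the default for future writes by setting dfs.replication in
conf/hdfs-site.xml:

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>

Note that -setrep only affects files that already exist, while dfs.replication
only applies to files written after the change.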

On Mar 4, 2010, at 7:25 AM, openresearch wrote:

> 
> I am building a small two-node cluster following
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
> 
> Everything seems to be working, except I notice the data is NOT evenly
> distributed across the two physical boxes.
> e.g., when I "hadoop dfs -put" ~6 GB of data, I expect ~3 GB on each node
> (taking turns every ~64 MB block); however, I checked dfshealth.jsp and ran
> "du -k" on the local box, and found the uploaded data resides ONLY on the
> physical box where I ran "dfs -put".  That defeats the whole (data locality)
> purpose of Hadoop?!
> 
> Please help.
> 
> Thanks
> 
> 
