On Thu, Mar 4, 2010 at 10:25 AM, openresearch
<qiming...@openresearchinc.com> wrote:
>
> I am building a small two node cluster following
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)
>
> Everything seems to be working, except I notice the data are NOT evenly
> distributed across the physical boxes.
> e.g., when I "hadoop dfs -put" a 6 GB file, I expect ~3 GB on each node
> (taking turns every ~64 MB block). However, I checked dfshealth.jsp and
> "du -k" on the local box, and found the uploaded data reside ONLY on the
> physical box where I ran "dfs -put". That defeats the whole (data
> locality) purpose of Hadoop?!
>
> Please help.
>
> Thanks
>
> --
> View this message in context: 
> http://old.nabble.com/should-data-be-evenly-distributed-to-each-%28physical%29-node-tp27782215p27782215.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>

The distribution of data across the datanodes will not be exactly even,
but blocks should end up on different boxes. Check the namenode web
interface at http://nn-ip:50070 and make sure all the datanodes are
listed. Make sure the Java DataNode process is running on every node it
should be. Also check the datanode logs on the servers holding no
blocks. You probably have a misconfiguration.
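A few commands that can help confirm this, assuming a Hadoop install of
that era with the `hadoop` script on the PATH (the HDFS path
/user/hadoop/data below is just a placeholder for wherever you put the
file):

```shell
# Summarize capacity and per-datanode block counts (run from the namenode)
hadoop dfsadmin -report

# Show which datanodes actually hold the blocks of the uploaded file;
# replace /user/hadoop/data with the path you used for "dfs -put"
hadoop fsck /user/hadoop/data -files -blocks -locations

# On each slave box, confirm the DataNode JVM is up
jps | grep DataNode
```

If `dfsadmin -report` lists fewer live datanodes than you expect, or
`fsck` shows every block on one host, the logs of the missing datanodes
(under $HADOOP_HOME/logs) will usually name the problem.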

Shameless plug... sorry...
Take a look at http://www.jointhegrid.com/acod/index.jsp
I made it to generate and push out Hadoop configurations. One of the
target audiences was first-time multi-node setups. If you get a chance,
give it a try and let me know if it helps or makes things worse.
