Also, look at the namenode's file system browser (part of its web UI) for your cluster to see the chunking in action:
http://wiki.apache.org/hadoop/WebApp%20URLs

Roshan

On Thu, Jun 18, 2009 at 6:28 AM, Harish Mallipeddi <harish.mallipe...@gmail.com> wrote:

> On Thu, Jun 18, 2009 at 3:43 PM, rajeev gupta <graj1...@yahoo.com> wrote:
>
> > I have this doubt regarding HDFS. Suppose I have 3 machines in my HDFS
> > cluster and replication factor is 1. A large file is there on one of those
> > three cluster machines in its local file system. If I put that file in HDFS
> > will it be divided and distributed across all three machines? I had this
> > doubt as HDFS "moving computation is cheaper than moving data".
> >
> > If file is distributed across all three machines, lots of data transfer
> > will be there, whereas, if file is NOT distributed then compute power of
> > other machine will be unused. Am I missing something here?
> >
> > -Raj
>
> Irrespective of what you set as the replication factor, large files will
> always be split into chunks (chunk size is what you set as your HDFS
> block-size) and they'll be distributed across your entire cluster.
>
> --
> Harish Mallipeddi
> http://blog.poundbang.in
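
To make the chunking concrete, here is a rough sketch of the arithmetic only. It assumes a 64 MB block size (your configured HDFS block size may differ), and the function name split_into_blocks is made up for illustration; it is not part of any Hadoop API. It does not model which datanode receives each block, since placement is decided by the namenode.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return a list of (offset, length) pairs, one per block.

    Illustrative only: block_size defaults to an assumed 64 MB.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is full-sized except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# Example: a 200 MB file is cut into three full blocks plus one short block.
blocks = split_into_blocks(200 * MB)
for index, (offset, length) in enumerate(blocks):
    print("block %d: offset=%d MB, length=%d MB" % (index, offset // MB, length // MB))
```

Note that the last block only occupies as much space as the data it actually holds, so a file that is not an exact multiple of the block size does not waste a full block.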