I have a question about HDFS. Suppose I have 3 machines in my HDFS cluster 
and the replication factor is 1. A large file sits in the local file system of 
one of those three machines. If I put that file into HDFS, will it be divided 
and distributed across all three machines? I ask because of the HDFS principle 
that "moving computation is cheaper than moving data". 

If the file is distributed across all three machines, there will be a lot of 
data transfer, whereas if the file is NOT distributed, the compute power of the 
other machines will go unused. Am I missing something here?

-Raj



      
