I may be missing part of your question. If you only want one copy of the data stored, but you want the 128MB distributed across your datanodes, then you need to set the replication factor to 1. I'm surprised it let you set the factor to 0 at all.
If you were to set the replication value higher than 1, then multiple copies of each block would exist for redundancy, distributed across the three nodes.
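For reference, the default replication and block size can be set in conf/hadoop-site.xml; a minimal sketch, assuming a stock 0.15.x install (the 10M block size matches what you described; values are in bytes):

    <!-- conf/hadoop-site.xml: defaults applied to newly created files -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>          <!-- keep a single copy of each block -->
    </property>
    <property>
      <name>dfs.block.size</name>
      <value>10485760</value>   <!-- 10MB blocks, as in your test -->
    </property>

If your release's shell supports it, you can also change replication on a file that is already in HDFS:

    # set replication to 1 on an existing file
    bin/hadoop dfs -setrep 1 input/file.gz

One caveat, if memory serves: when the client doing the put runs on one of the datanodes, HDFS places the first replica on that local node, so even with replication 1 the whole file can end up on the node where you ran the command.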
Hope this helps,
Grant

On Jul 14 2008, Yi Zhao wrote:
hi, all

I have a hadoop cluster which has one master and three datanodes. I want to put a local file of about 128M into HDFS, and I have set the block size to 10M.

When I set the replication to 0, I found that all the data went to the node where I executed the command 'bin/hadoop dfs -put file.gz input', so this node used about 128M of disk space, but the other nodes used none.

When I set the replication to 3, I found that every node has the same data, so every node used about 128M of disk space.

What should I do? I'm using hadoop-0.15.2. Can anyone help me? thanks.
--
Grant Mackey
UCF Researcher
Eng. III Rm238
