To add to the question: how does one decide on the optimal replication factor for a cluster? For instance, what would be an appropriate replication factor for a cluster of 5 nodes?

Mithila
On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:

> Did you load any files when replication was set to 3? If so, you'll have
> to rebalance:
>
> <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
>
> Note that most people run HDFS with a replication factor of 3. There have
> been cases when clusters running with a replication of 2 discovered new
> bugs, because replication is so often set to 3. That said, if you can do
> it, it's probably advisable to run with a replication factor of 3 instead
> of 2.
>
> Alex
>
> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>
> > Hi
> >
> > I am a new Hadoop user. I have a small cluster with 3 Datanodes. In
> > hadoop-site.xml the value of the dfs.replication property is 2, but
> > it is still replicating data on 3 machines.
> >
> > Please tell me why this is happening.
> >
> > Regards,
> > Aseem Puri
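
On the 5-node question at the top of the thread, a rough way to compare candidate replication factors is to look at what each one costs in usable space and buys in failure tolerance. The sketch below is only back-of-the-envelope arithmetic, not anything taken from Hadoop itself: the function name `replication_tradeoff` and the 1 TB-per-node figure are made up for illustration, and it ignores non-DFS disk usage and rack placement.

```python
def replication_tradeoff(datanodes, raw_tb_per_node, replication):
    """Return (effective_replication, usable_tb, node_failures_tolerated).

    Hypothetical helper for rough capacity planning; not a Hadoop API.
    """
    if datanodes < 1 or replication < 1:
        raise ValueError("datanodes and replication must both be >= 1")

    # A datanode stores at most one replica of a given block, so the
    # number of replicas actually kept is capped by the datanode count.
    effective = min(replication, datanodes)

    # Every block is stored `effective` times, so usable space is the
    # raw space divided by that factor (other overhead is ignored here).
    usable_tb = datanodes * raw_tb_per_node / effective

    # A block stays readable as long as one of its replicas survives.
    tolerated = effective - 1

    return effective, usable_tb, tolerated


if __name__ == "__main__":
    # Assumed example: 5 datanodes with 1 TB of raw disk each.
    for factor in (1, 2, 3):
        eff, usable, tolerated = replication_tradeoff(5, 1.0, factor)
        print(f"replication={factor}: effective={eff}, "
              f"usable={usable:.2f} TB, node failures tolerated={tolerated}")
```

With numbers like these, the choice comes down to how much raw disk you are willing to give up for each extra node failure you want to survive, which fits Alex's point that 3 is the common default when the space is available.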