To add to the question: how does one decide on the optimal replication
factor for a cluster? For instance, what would be an appropriate
replication factor for a cluster consisting of 5 nodes?
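
To make the trade-off concrete, here is a rough sketch of the arithmetic as
I understand it, not a recommendation. With replication factor r, every
block is stored r times, so usable capacity is roughly the raw capacity
divided by r, and any single block survives the loss of up to r - 1 of the
nodes holding it. The function name and the 1000 GB per-node figure are
made up for illustration. The sketch also assumes r cannot usefully exceed
the number of datanodes, since HDFS keeps at most one replica of a block on
each datanode.

```python
def replication_tradeoff(num_nodes, node_capacity_gb, replication):
    """Return (usable_gb, node_failures_tolerated) for a replication factor.

    Hypothetical helper for illustration only; it ignores non-DFS disk
    usage and any re-replication that happens after a node is lost.
    """
    if not 1 <= replication <= num_nodes:
        raise ValueError(
            "replication must be between 1 and the number of datanodes"
        )
    raw_gb = num_nodes * node_capacity_gb
    usable_gb = raw_gb / replication      # each block is stored `replication` times
    failures_tolerated = replication - 1  # one surviving replica is enough to read a block
    return usable_gb, failures_tolerated


if __name__ == "__main__":
    # Assumed example: 5 datanodes with 1000 GB each.
    for r in (1, 2, 3):
        usable, tolerated = replication_tradeoff(5, 1000, r)
        print(f"replication={r}: ~{usable:.0f} GB usable, "
              f"tolerates {tolerated} node failure(s) per block")
```

On these assumed numbers, going from 2 to 3 replicas costs about a third of
the usable space in exchange for tolerating one more node failure.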
Mithila

On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:

> Did you load any files when replication was set to 3?  If so, you'll
> have to rebalance:
>
> <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
>
> Note that most people run HDFS with a replication factor of 3.  There
> have been cases where clusters running with a replication factor of 2
> discovered new bugs, because replication is so often set to 3 that other
> settings see less use.  So, if you can do it, it's probably advisable to
> run with a replication factor of 3 instead of 2.
>
> Alex
>
> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com>
> wrote:
>
> > Hi,
> >
> > I am a new Hadoop user. I have a small cluster with 3 datanodes. In
> > hadoop-site.xml the dfs.replication property is set to 2, but data is
> > still being replicated on 3 machines.
> >
> > Could you please tell me why this is happening?
> >
> > Regards,
> >
> > Aseem Puri