Changing the default replication in hadoop-site.xml does not affect files already loaded into HDFS. The replication factor is controlled on a per-file basis.
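(A quick way to see a file's current factor, assuming a hypothetical path /user/aseem/words.txt: the second column of an ls listing is the per-file replication factor, e.g.

    $ bin/hadoop fs -ls /user/aseem
    -rw-r--r--   3 aseem supergroup       1366 2009-04-10 10:12 /user/aseem/words.txt

where "3" is the factor the file was written with; the output shown is illustrative.)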
You need to use the command `hadoop fs -setrep n path...` to set the replication factor to "n" for a particular path already present in HDFS. It can also take a -R for recursive.

- Aaron

On Fri, Apr 10, 2009 at 10:34 AM, Alex Loddengaard <a...@cloudera.com> wrote:

> Aseem,
>
> How are you verifying that blocks are not being replicated? Have you run
> fsck? *bin/hadoop fsck /*
>
> I'd be surprised if replication really wasn't happening. Can you run fsck
> and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
> In fact, can you just copy-paste the output of fsck?
>
> Alex
>
> On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>
> > Hi
> > I also tried the command $ bin/hadoop balancer, but still the
> > same problem.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
> > Sent: Friday, April 10, 2009 11:18 AM
> > To: core-user@hadoop.apache.org
> > Subject: RE: More Replication on dfs
> >
> > Hi Alex,
> >
> > Thanks for sharing your knowledge. So far I have three machines, and I
> > want to check the behavior of Hadoop with a replication factor of 2. I
> > started my Hadoop server with a replication factor of 3. After that I
> > uploaded 3 files to run a word count program. But since all my files are
> > stored on one machine and then replicated to the other datanodes, my
> > MapReduce program takes input from one datanode only. I want my files to
> > be spread across different datanodes so I can check the functionality of
> > MapReduce properly.
> >
> > Also, before starting my Hadoop server again with replication factor 2,
> > I formatted all datanodes and deleted all the old data manually.
> >
> > Please suggest what I should do now.
> >
> > Regards,
> > Aseem Puri
> >
> > -----Original Message-----
> > From: Mithila Nagendra [mailto:mnage...@asu.edu]
> > Sent: Friday, April 10, 2009 10:56 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > To add to the question: how does one decide the optimal replication
> > factor for a cluster? For instance, what would be an appropriate
> > replication factor for a cluster consisting of 5 nodes?
> >
> > Mithila
> >
> > On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:
> >
> > > Did you load any files when replication was set to 3? If so, you'll
> > > have to rebalance:
> > >
> > > <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> > > <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
> > >
> > > Note that most people run HDFS with a replication factor of 3. There
> > > have been cases when clusters running with a replication factor of 2
> > > discovered new bugs, because replication is so often set to 3. That
> > > said, if you can do it, it's probably advisable to run with a
> > > replication factor of 3 instead of 2.
> > >
> > > Alex
> > >
> > > On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > >
> > > > Hi
> > > >
> > > > I am a new Hadoop user. I have a small cluster with 3 datanodes. In
> > > > hadoop-site.xml the value of the dfs.replication property is 2, but
> > > > it is still replicating data on 3 machines.
> > > >
> > > > Please tell me why this is happening?
> > > >
> > > > Regards,
> > > > Aseem Puri
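Putting the advice in this thread together, a minimal sketch of a session on a small cluster like Aseem's (the paths are hypothetical and the fsck output lines are representative, not verbatim):

    # Set the replication factor of already-loaded files to 2, recursively
    $ bin/hadoop fs -setrep -R 2 /user/aseem/input

    # Check the cluster-wide summary counters that fsck prints
    $ bin/hadoop fsck / | grep -iE "Under-replicated|Mis-replicated"
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)

    # Spread already-written blocks more evenly across the datanodes
    $ bin/hadoop balancer

Note that the balancer evens out disk usage across datanodes; it does not change replication factors, which is what -setrep is for.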