That setting tells future file writes to use a replication factor of two. It has no bearing on existing files: replication is set on a per-file basis, so files already in the DFS keep the replication factor they were written with.
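To see what the existing files are actually set to, something like the following may help. This is a minimal sketch, assuming it is run from the Hadoop install directory; the HADOOP and DFS_PATH defaults and the run helper are illustrative placeholders (the path is borrowed from the fsck output quoted further down), and the helper only prints the command when the hadoop binary is not found.

```shell
#!/bin/sh
# Sketch only: HADOOP and DFS_PATH are placeholders; adjust them for your install.
HADOOP="${HADOOP:-bin/hadoop}"
DFS_PATH="${DFS_PATH:-/user/HadoopAdmin/input}"

# Run a hadoop subcommand, or just print it when the binary is not available here.
run() {
  if [ -x "$HADOOP" ]; then
    "$HADOOP" "$@"
  else
    echo "would run: $HADOOP $*"
  fi
}

# For files, the second column of the listing is that file's replication factor.
run fs -ls "$DFS_PATH"

# fsck with -files -blocks lists each block together with its replication.
run fsck "$DFS_PATH" -files -blocks
```

Files written while dfs.replication was 3 should still show 3 here, whatever hadoop-site.xml says now.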
Use the command:

  bin/hadoop fs -setrep [-R] repl_factor filename...

to change the replication factor for files already in HDFS.

- Aaron

On Wed, Apr 15, 2009 at 10:04 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:

> Hi
> My problem is not that my data is under replicated. I have 3
> data nodes. In my hadoop-site.xml I also set the configuration as:
>
> <property>
> <name>dfs.replication</name>
> <value>2</value>
> </property>
>
> But after this also data is replicated on 3 nodes instead of two nodes.
>
> Now, please tell what can be the problem?
>
> Thanks & Regards
> Aseem Puri
>
> -----Original Message-----
> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
> Sent: Wednesday, April 15, 2009 2:58 AM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
>
> Aseem,
>
> Regd over-replication, it is mostly app related issue as Alex mentioned.
>
> But if you are concerned about under-replicated blocks in fsck output:
>
> These blocks should not stay under-replicated if you have enough nodes
> and enough space on them (check NameNode webui).
>
> Try grep-ing for one of the blocks in NameNode log (and datanode logs as
> well, since you have just 3 nodes).
>
> Raghu.
>
> Puri, Aseem wrote:
> > Alex,
> >
> > Output of $ bin/hadoop fsck / command after running HBase data insert
> > command in a table is:
> >
> > .....
> > .....
> > .....
> > .....
> > .....
> > /hbase/test/903188508/tags/info/4897652949308499876: Under replicated blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/data: Under replicated blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/index: Under replicated blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
> > .
> > /user/HadoopAdmin/hbase table.doc: Under replicated blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/bin.doc: Under replicated blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file01.txt: Under replicated blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file02.txt: Under replicated blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/test.txt: Under replicated blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
> > ...
> > /user/HadoopAdmin/output/part-00000: Under replicated blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
> > Status: HEALTHY
> > Total size: 48510226 B
> > Total dirs: 492
> > Total files: 439 (Files currently being written: 2)
> > Total blocks (validated): 401 (avg. block size 120973 B) (Total open file blocks (not validated): 2)
> > Minimally replicated blocks: 401 (100.0 %)
> > Over-replicated blocks: 0 (0.0 %)
> > Under-replicated blocks: 399 (99.50124 %)
> > Mis-replicated blocks: 0 (0.0 %)
> > Default replication factor: 2
> > Average block replication: 1.3117207
> > Corrupt blocks: 0
> > Missing replicas: 675 (128.327 %)
> > Number of data-nodes: 2
> > Number of racks: 1
> >
> > The filesystem under path '/' is HEALTHY
> >
> > Please tell what is wrong.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Alex Loddengaard [mailto:a...@cloudera.com]
> > Sent: Friday, April 10, 2009 11:04 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > Aseem,
> >
> > How are you verifying that blocks are not being replicated? Have you
> > run fsck? *bin/hadoop fsck /*
> >
> > I'd be surprised if replication really wasn't happening. Can you run fsck
> > and pay attention to "Under-replicated blocks" and "Mis-replicated blocks?"
> > In fact, can you just copy-paste the output of fsck?
> >
> > Alex
> >
> > On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> >
> >> Hi
> >> I also tried the command $ bin/hadoop balancer. But still the
> >> same problem.
> >>
> >> Aseem
> >>
> >> -----Original Message-----
> >> From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
> >> Sent: Friday, April 10, 2009 11:18 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: RE: More Replication on dfs
> >>
> >> Hi Alex,
> >>
> >> Thanks for sharing your knowledge. Till now I have three
> >> machines and I have to check the behavior of Hadoop so I want
> >> replication factor should be 2. I started my Hadoop server with
> >> replication factor 3. After that I upload 3 files to implement word
> >> count program. But as my all files are stored on one machine and
> >> replicated to other datanodes also, so my map reduce program takes input
> >> from one Datanode only. I want my files to be on different data node so
> >> to check functionality of map reduce properly.
> >>
> >> Also before starting my Hadoop server again with replication
> >> factor 2 I formatted all Datanodes and deleted all old data manually.
> >>
> >> Please suggest what I should do now.
> >>
> >> Regards,
> >> Aseem Puri
> >>
> >> -----Original Message-----
> >> From: Mithila Nagendra [mailto:mnage...@asu.edu]
> >> Sent: Friday, April 10, 2009 10:56 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: More Replication on dfs
> >>
> >> To add to the question, how does one decide what is the optimal replication
> >> factor for a cluster. For instance what would be the appropriate replication
> >> factor for a cluster consisting of 5 nodes.
> >> Mithila
> >>
> >> On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:
> >>
> >>> Did you load any files when replication was set to 3? If so, you'll have to
> >>> rebalance:
> >>>
> >>> <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> >>> <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
> >>>
> >>> Note that most people run HDFS with a replication factor of 3. There have
> >>> been cases when clusters running with a replication of 2 discovered new
> >>> bugs, because replication is so often set to 3. That said, if you can do
> >>> it, it's probably advisable to run with a replication factor of 3 instead
> >>> of 2.
> >>>
> >>> Alex
> >>>
> >>> On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> >>>> Hi
> >>>>
> >>>> I am a new Hadoop user. I have a small cluster with 3
> >>>> Datanodes. In hadoop-site.xml values of dfs.replication property is 2
> >>>> but then also it is replicating data on 3 machines.
> >>>>
> >>>> Please tell why is it happening?
> >>>>
> >>>> Regards,
> >>>>
> >>>> Aseem Puri