Ah, I didn't realize you were using HBase. It could definitely be the case that HBase is explicitly setting the replication factor to 1 for certain files. Unfortunately I don't know enough about HBase to say whether or why it does that. This might be a good question for the HBase user list, or perhaps grepping the HBase source would tell you something interesting. Can you confirm that the under-replicated files were created by HBase?
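If you want to dig a little before mailing the list, something like the
following might help. This is only a rough sketch; the grep pattern and
source layout are guesses on my part, so adjust for your HBase checkout
and Hadoop version:

  # In the HBase source tree: look for places that set replication explicitly
  grep -rn "setReplication" src/

  # On the cluster: the second column of the listing is the per-file
  # replication factor, so this shows which files under /hbase were
  # created with replication 1
  bin/hadoop fs -lsr /hbase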
Alex

On Sat, Apr 11, 2009 at 7:00 AM, Puri, Aseem <aseem.p...@honeywell.com> wrote:

> Alex,
>
> Output of the $ bin/hadoop fsck / command after running the HBase data
> insert command on a table is:
>
> .....
> .....
> .....
> .....
> .....
> /hbase/test/903188508/tags/info/4897652949308499876: Under replicated
> blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> .
> /hbase/test/903188508/tags/mapfiles/4897652949308499876/data: Under replicated
> blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
> .
> /hbase/test/903188508/tags/mapfiles/4897652949308499876/index: Under replicated
> blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
> .
> /user/HadoopAdmin/hbase table.doc: Under replicated
> blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/bin.doc: Under replicated
> blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/file01.txt: Under replicated
> blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/file02.txt: Under replicated
> blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
> .
> /user/HadoopAdmin/input/test.txt: Under replicated
> blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
> ...
> /user/HadoopAdmin/output/part-00000: Under replicated
> blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
>
> Status: HEALTHY
>  Total size:                    48510226 B
>  Total dirs:                    492
>  Total files:                   439 (Files currently being written: 2)
>  Total blocks (validated):      401 (avg. block size 120973 B)
>                                 (Total open file blocks (not validated): 2)
>  Minimally replicated blocks:   401 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       399 (99.50124 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    2
>  Average block replication:     1.3117207
>  Corrupt blocks:                0
>  Missing replicas:              675 (128.327 %)
>  Number of data-nodes:          2
>  Number of racks:               1
>
> The filesystem under path '/' is HEALTHY
>
> Please tell me what is wrong.
>
> Aseem
>
> -----Original Message-----
> From: Alex Loddengaard [mailto:a...@cloudera.com]
> Sent: Friday, April 10, 2009 11:04 PM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
>
> Aseem,
>
> How are you verifying that blocks are not being replicated? Have you run
> fsck? *bin/hadoop fsck /*
>
> I'd be surprised if replication really wasn't happening. Can you run fsck
> and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
> In fact, can you just copy-paste the output of fsck?
>
> Alex
>
> On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
>
> > Hi,
> >         I also tried the command $ bin/hadoop balancer, but I still have
> > the same problem.
> >
> > Aseem
> >
> > -----Original Message-----
> > From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
> > Sent: Friday, April 10, 2009 11:18 AM
> > To: core-user@hadoop.apache.org
> > Subject: RE: More Replication on dfs
> >
> > Hi Alex,
> >
> >         Thanks for sharing your knowledge. So far I have three machines,
> > and I want to check the behavior of Hadoop, so the replication factor
> > should be 2. I started my Hadoop server with replication factor 3, and
> > after that I uploaded 3 files to run the word count program. But since
> > all my files are stored on one machine and then replicated to the other
> > datanodes, my MapReduce program takes its input from one datanode only.
> > I want my files to be spread across different datanodes so I can check
> > that MapReduce works properly.
> >
> > Also, before starting my Hadoop server again with replication factor 2,
> > I formatted all datanodes and deleted all the old data manually.
> >
> > Please suggest what I should do now.
> >
> > Regards,
> > Aseem Puri
> >
> > -----Original Message-----
> > From: Mithila Nagendra [mailto:mnage...@asu.edu]
> > Sent: Friday, April 10, 2009 10:56 AM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > To add to the question: how does one decide the optimal replication
> > factor for a cluster? For instance, what would be the appropriate
> > replication factor for a cluster consisting of 5 nodes?
> >
> > Mithila
> >
> > On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:
> >
> > > Did you load any files when replication was set to 3? If so, you'll
> > > have to rebalance:
> > >
> > > <http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
> > > <http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
> > >
> > > Note that most people run HDFS with a replication factor of 3. There
> > > have been cases when clusters running with a replication of 2 discovered
> > > new bugs, because replication is so often set to 3. That said, if you
> > > can do it, it's probably advisable to run with a replication factor of 3
> > > instead of 2.
> > >
> > > Alex
> > >
> > > On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am a new Hadoop user. I have a small cluster with 3 datanodes. In
> > > > hadoop-site.xml the value of the dfs.replication property is 2, but
> > > > it is still replicating data on 3 machines.
> > > >
> > > > Please tell me why this is happening?
> > > >
> > > > Regards,
> > > > Aseem Puri
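P.S. On the earlier question in the quoted thread (dfs.replication changed
from 3 to 2, but existing files still reporting a target of 3): the
replication factor is a per-file attribute fixed when the file is created,
so changing dfs.replication in hadoop-site.xml only affects files written
afterwards. Here is a sketch of how existing files can be brought to the
configured factor; please verify the flags against the fs shell docs for
your Hadoop version:

  # Recursively set replication to 2 for everything already in HDFS;
  # -w waits until the new replication target is actually met
  bin/hadoop fs -setrep -R -w 2 /

  # Then re-run fsck to confirm the under-replication warnings are gone
  bin/hadoop fsck /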