Aseem,

Regarding over-replication: it is mostly an application-related issue, as Alex mentioned.

But if you are concerned about the under-replicated blocks in the fsck output:

These blocks should not stay under-replicated if you have enough nodes and enough space on them (check NameNode webui).

Try grep-ing for one of the blocks in the NameNode log (and in the datanode logs as well, since you have just 3 nodes).
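
For example, taking one of the block IDs from your fsck output below (the log paths assume the default logs/ directory under your Hadoop install and the usual hadoop-<user>-<daemon>-<host>.log naming; adjust to your setup):

  grep blk_-1213602857020415242 logs/hadoop-*-namenode-*.log
  grep blk_-1213602857020415242 logs/hadoop-*-datanode-*.log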

Raghu.

Puri, Aseem wrote:
Alex,

Output of the $ bin/hadoop fsck / command after running an HBase data
insert command on a table is:

.....
.....
.....
.....
.....
/hbase/test/903188508/tags/info/4897652949308499876:  Under replicated blk_-5193695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/data:  Under replicated blk_-1213602857020415242_3132. Target Replicas is 3 but found 1 replica(s).
.
/hbase/test/903188508/tags/mapfiles/4897652949308499876/index:  Under replicated blk_3934493034551838567_3132. Target Replicas is 3 but found 1 replica(s).
.
/user/HadoopAdmin/hbase table.doc:  Under replicated blk_4339521803948458144_1031. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/bin.doc:  Under replicated blk_-3661765932004150973_1030. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file01.txt:  Under replicated blk_2744169131466786624_1001. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/file02.txt:  Under replicated blk_2021956984317789924_1002. Target Replicas is 3 but found 2 replica(s).
.
/user/HadoopAdmin/input/test.txt:  Under replicated blk_-3062256167060082648_1004. Target Replicas is 3 but found 2 replica(s).
...
/user/HadoopAdmin/output/part-00000:  Under replicated blk_8908973033976428484_1010. Target Replicas is 3 but found 2 replica(s).
Status: HEALTHY
 Total size:    48510226 B
 Total dirs:    492
 Total files:   439 (Files currently being written: 2)
 Total blocks (validated):      401 (avg. block size 120973 B) (Total open file blocks (not validated): 2)
 Minimally replicated blocks:   401 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       399 (99.50124 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     1.3117207
 Corrupt blocks:                0
 Missing replicas:              675 (128.327 %)
 Number of data-nodes:          2
 Number of racks:               1


The filesystem under path '/' is HEALTHY
Please tell me what is wrong.

Aseem

-----Original Message-----
From: Alex Loddengaard [mailto:a...@cloudera.com]
Sent: Friday, April 10, 2009 11:04 PM
To: core-user@hadoop.apache.org
Subject: Re: More Replication on dfs

Aseem,

How are you verifying that blocks are not being replicated?  Have you run
fsck?  *bin/hadoop fsck /*

I'd be surprised if replication really wasn't happening.  Can you run fsck
and pay attention to "Under-replicated blocks" and "Mis-replicated blocks"?
In fact, can you just copy-paste the output of fsck?
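
If you also want to see which datanodes hold each block, fsck can print
that too; from memory (double-check the fsck usage output on your version),
it's something like: *bin/hadoop fsck / -files -blocks -locations*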

Alex

On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:

Hi
       I also tried the command $ bin/hadoop balancer, but I still have
the same problem.

Aseem

-----Original Message-----
From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
Sent: Friday, April 10, 2009 11:18 AM
To: core-user@hadoop.apache.org
Subject: RE: More Replication on dfs

Hi Alex,

       Thanks for sharing your knowledge. So far I have three
machines, and I want to check the behavior of Hadoop, so I want the
replication factor to be 2. I started my Hadoop server with a
replication factor of 3. After that I uploaded 3 files for a word
count program. But as all my files are stored on one machine and
replicated to the other datanodes as well, my map reduce program takes
input from one datanode only. I want my files to be on different
datanodes so I can check the functionality of map reduce properly.

       Also, before starting my Hadoop server again with replication
factor 2, I formatted all Datanodes and deleted all the old data manually.

Please suggest what I should do now.

Regards,
Aseem Puri


-----Original Message-----
From: Mithila Nagendra [mailto:mnage...@asu.edu]
Sent: Friday, April 10, 2009 10:56 AM
To: core-user@hadoop.apache.org
Subject: Re: More Replication on dfs

To add to the question, how does one decide the optimal replication
factor for a cluster? For instance, what would be the appropriate
replication factor for a cluster consisting of 5 nodes?
Mithila

On Fri, Apr 10, 2009 at 8:20 AM, Alex Loddengaard <a...@cloudera.com> wrote:

Did you load any files when replication was set to 3?  If so, you'll have
to rebalance:

<http://hadoop.apache.org/core/docs/r0.19.1/commands_manual.html#balancer>
<http://hadoop.apache.org/core/docs/r0.19.1/hdfs_user_guide.html#Rebalancer>
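
Also note that the balancer only moves replicas around; it does not change
the replica count of files that were already written with 3 replicas. If
that's what you're after, something like this should do it (the -R flag
and syntax are from memory, so double-check the FS shell docs):

  bin/hadoop fs -setrep -R 2 /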
Note that most people run HDFS with a replication factor of 3.  There have
been cases when clusters running with a replication of 2 discovered new
bugs, because replication is so often set to 3.  That said, if you can do
it, it's probably advisable to run with a replication factor of 3 instead
of 2.

Alex

On Thu, Apr 9, 2009 at 9:56 PM, Puri, Aseem <aseem.p...@honeywell.com> wrote:
Hi

           I am a new Hadoop user. I have a small cluster with 3
Datanodes. In hadoop-site.xml the value of the dfs.replication property
is 2, but it is still replicating data to 3 machines.
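
The entry in my hadoop-site.xml is of this form (2 is the value I set):

  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>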



Please tell me why this is happening?



Regards,

Aseem Puri