On Wed, May 9, 2012 at 6:04 PM, Serge Blazhiyevskyy <
serge.blazhiyevs...@nice.com> wrote:

>
> What does the response from fsck look like?
>
>
[snip lots of stuff about under replicated blocks]

......Status: HEALTHY
 Total size:    246858876262 B (Total open files size: 372 B)
 Total dirs:    14914
 Total files:   39248 (Files currently being written: 4)
 Total blocks (validated):      40657 (avg. block size 6071743 B) (Total open file blocks (not validated): 4)
 Minimally replicated blocks:   40657 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       1410 (3.4680374 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     2.9911454
 Corrupt blocks:                0
 Missing replicas:              2831 (2.3279145 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Wed May 09 19:19:11 UTC 2012 in 980 milliseconds
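
As a sanity check on the figures above, here is a minimal Python sketch that
re-derives the two percentages from the raw counts. It is not part of the fsck
output. The helper name parse_fsck_summary is made up for illustration, and
the idea that the "Missing replicas" percentage is taken relative to the
number of live replicas (blocks times average replication) is an assumption
inferred from the numbers, not something fsck states.

```python
import re

# Four lines copied from the fsck summary above.
FSCK_SUMMARY = """\
 Total blocks (validated):      40657 (avg. block size 6071743 B)
 Under-replicated blocks:       1410 (3.4680374 %)
 Average block replication:     2.9911454
 Missing replicas:              2831 (2.3279145 %)
"""


def parse_fsck_summary(text):
    """Map each 'label: value' line to the first number after the colon."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s+([0-9.]+)", line)
        if match:
            fields[match.group(1).strip()] = float(match.group(2))
    return fields


stats = parse_fsck_summary(FSCK_SUMMARY)
blocks = stats["Total blocks (validated)"]

# Share of blocks that have fewer replicas than the target factor.
under_pct = 100 * stats["Under-replicated blocks"] / blocks

# Assumption: live replicas = blocks * average replication, and the
# missing-replica percentage is reported relative to that count.
live_replicas = blocks * stats["Average block replication"]
missing_pct = 100 * stats["Missing replicas"] / live_replicas

print(f"under-replicated blocks: {under_pct:.4f} %")
print(f"live replicas (approx.): {live_replicas:.0f}")
print(f"missing replicas:        {missing_pct:.4f} %")
```

Under that assumption both derived percentages agree with what fsck printed,
so the summary is at least internally consistent: roughly 2 missing replicas
per under-replicated block on average.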


To add to this: the problem appears to be affecting 2 nodes in the cluster,
one more than the other.  In the last couple of hours one of the nodes also
experienced high load.  That load has now dropped, but the namenode now
considers both of these nodes dead.  The load on the first box is still
increasing, currently 234!  I think I might have to reboot it via IPMI.


>
> hadoop fsck /
>
>
> It might be the case that some of the blocks are misreplicated
>
>
> Serge
>
> Hadoopway.blogspot.com
>
>
>
>
>
> On 5/9/12 9:58 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
>
> >On Wed, May 9, 2012 at 5:56 PM, Serge Blazhiyevskyy <
> >serge.blazhiyevs...@nice.com> wrote:
> >
> >> Take a look at your data distribution for that cluster. Maybe, it is
> >> unbalanced.
> >>
> >>
> >> Run balancer, if it is...
> >>
> >
> >The cluster is balanced, I ran balancer yesterday.  Oddly enough the
> >problem started after I had run the balancer.
> >
> >I'm running CDH3 btw.
> >
> >
> >
> >>
> >> Regards,
> >> Serge
> >>
> >> hadoopway.blogspot.com
> >>
> >>
> >>
> >> On 5/9/12 9:52 AM, "Darrell Taylor" <darrell.tay...@gmail.com> wrote:
> >>
> >> >Hi,
> >> >
> >> >I wonder if someone could give some pointers with a problem I'm having?
> >> >
> >> >I have a 7 machine cluster set up for testing and we have been pouring
> >> >data into it for a week without issue.  We have learnt several things
> >> >along the way and solved all the problems up to now by searching online,
> >> >but now I'm stuck.  One of the data nodes decided to have a load of 70+
> >> >this morning.  Stopping the datanode and tasktracker brought it back to
> >> >normal, but every time I start the datanode again the load shoots
> >> >through the roof, and all I get in the logs is:
> >> >
> >> >STARTUP_MSG: Starting DataNode
> >> >
> >> >
> >> >STARTUP_MSG:   host = pl464/10.20.16.64
> >> >
> >> >
> >> >STARTUP_MSG:   args = []
> >> >
> >> >
> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
> >> >
> >> >
> >> >STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
> >> >-************************************************************/
> >> >
> >> >
> >> >2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >
> >> >2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
> >> >
> >> >Nothing else.
> >> >
> >> >The load seems to max out only 1 of the CPUs, but the machine becomes
> >> >*very* unresponsive.
> >> >
> >> >Anybody got any pointers of things I can try?
> >> >
> >> >Thanks
> >> >Darrell.
> >>
> >>
>
>
