That's real weird..

If you can reproduce this after a reboot, I'd recommend letting the DN
run for a minute, then capturing the output of "jstack <pid of dn>" and
"top -H -p <pid of dn> -b -n 5" and sending both to the list.
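
A minimal sketch of that capture step, in case it helps. The function names
(capture_dn_diag, tid_to_nid) and the output file names are made up for
illustration; only the jstack and top invocations come from the suggestion
above. jstack generally needs to run as the same user that owns the JVM.

```shell
#!/bin/sh
# Sketch only: helper names and output paths are hypothetical.

# Capture a thread dump and per-thread CPU usage for one JVM process.
# $1 = pid of the DataNode, $2 = output directory (default ./dn-diag)
capture_dn_diag() {
    pid="$1"
    outdir="${2:-./dn-diag}"
    mkdir -p "$outdir" || return 1
    # Thread dump of the DataNode JVM
    jstack "$pid" > "$outdir/jstack.txt" 2>&1
    # Per-thread view (-H), batch mode (-b), 5 iterations (-n 5)
    top -H -p "$pid" -b -n 5 > "$outdir/top-threads.txt" 2>&1
    # Print where the files were written
    echo "$outdir"
}

# Convert a decimal thread ID (the PID column of "top -H") to the hex
# form that HotSpot thread dumps print as nid=0x...
tid_to_nid() {
    printf '0x%x\n' "$1"
}
```

Usage would be something like `capture_dn_diag <pid of dn>`. To see what a
busy thread from the top output is doing, `tid_to_nid <thread id>` gives the
value to search for after `nid=` in jstack.txt.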

What JVM/JDK are you using? What OS version?
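
One possible way to gather those two answers in a form that can be pasted
into a reply. The helper name collect_env_info is hypothetical, and the java
found on PATH is an assumption: the DataNode may run under a different JVM if
JAVA_HOME is set elsewhere (for example in hadoop-env.sh).

```shell
#!/bin/sh
# Sketch only: collect_env_info is a hypothetical helper name.

# Print JVM, kernel, and distribution details.
collect_env_info() {
    echo "== JVM =="
    if command -v java >/dev/null 2>&1; then
        # java -version writes to stderr, so redirect it to stdout
        java -version 2>&1
    else
        echo "java not found on PATH"
    fi
    echo "== Kernel =="
    uname -a
    echo "== Distribution =="
    if command -v lsb_release >/dev/null 2>&1; then
        lsb_release -d
    elif [ -r /etc/os-release ]; then
        cat /etc/os-release
    else
        echo "unknown (no lsb_release or /etc/os-release)"
    fi
}
```

Running `collect_env_info > env.txt` leaves a file that can be attached
alongside the jstack and top output.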

-Todd


On Wed, May 9, 2012 at 11:57 PM, Darrell Taylor
<darrell.tay...@gmail.com> wrote:
> On Wed, May 9, 2012 at 10:52 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
>
>> The picture is either too small or too pixelated for my eyes :-)
>>
>
> There should be a zoom option in the top right of the page that allows you
> to view it full size
>
>
>>
>> Can you login to the box and send the output of top? If the system is
>> unresponsive, it has to be something more than an unbalanced hdfs cluster,
>> methinks.
>>
>
> Sorry, I'm unable to login to the box, it's completely unresponsive.
>
>
>>
>> Raj
>>
>>
>>
>> >________________________________
>> > From: Darrell Taylor <darrell.tay...@gmail.com>
>> >To: common-user@hadoop.apache.org; Raj Vishwanathan <rajv...@yahoo.com>
>> >Sent: Wednesday, May 9, 2012 2:40 PM
>> >Subject: Re: High load on datanode startup
>> >
>> >On Wed, May 9, 2012 at 10:23 PM, Raj Vishwanathan <rajv...@yahoo.com> wrote:
>> >
>> >> When you say 'load', what do you mean? CPU load or something else?
>> >>
>> >
>> >I mean in the unix sense of load average, i.e. top would show a load of
>> >(currently) 376.
>> >
>> >Looking at Ganglia stats for the box it's not CPU load as such, the graphs
>> >show actual CPU usage as 30%, but the number of running processes is
>> >simply growing in a linear manner - screenshot of ganglia page here:
>> >
>> >https://picasaweb.google.com/lh/photo/Q0uFSzyLiriDuDnvyRUikXVR0iWwMibMfH0upnTwi28?feat=directlink
>> >
>> >
>> >
>> >>
>> >> Raj
>> >>
>> >>
>> >>
>> >> >________________________________
>> >> > From: Darrell Taylor <darrell.tay...@gmail.com>
>> >> >To: common-user@hadoop.apache.org
>> >> >Sent: Wednesday, May 9, 2012 9:52 AM
>> >> >Subject: High load on datanode startup
>> >> >
>> >> >Hi,
>> >> >
>> >> >I wonder if someone could give some pointers with a problem I'm having?
>> >> >
>> >> >I have a 7 machine cluster set up for testing and we have been pouring data
>> >> >into it for a week without issue, have learnt several things along the way
>> >> >and solved all the problems up to now by searching online, but now I'm
>> >> >stuck.  One of the data nodes decided to have a load of 70+ this morning,
>> >> >stopping datanode and tasktracker brought it back to normal, but every time
>> >> >I start the datanode again the load shoots through the roof, and all I get
>> >> >in the logs is :
>> >> >
>> >> >STARTUP_MSG: Starting DataNode
>> >> >STARTUP_MSG:   host = pl464/10.20.16.64
>> >> >STARTUP_MSG:   args = []
>> >> >STARTUP_MSG:   version = 0.20.2-cdh3u3
>> >> >STARTUP_MSG:   build = file:///data/1/tmp/nightly_2012-03-20_13-13-48_3/hadoop-0.20-0.20.2+923.197-1~squeeze
>> >> >-************************************************************/
>> >> >
>> >> >
>> >> >2012-05-09 16:12:05,925 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>> >> >
>> >> >2012-05-09 16:12:06,139 INFO org.apache.hadoop.security.UserGroupInformation: JAAS Configuration already set up for Hadoop, not re-installing.
>> >> >
>> >> >Nothing else.
>> >> >
>> >> >The load seems to max out only 1 of the CPUs, but the machine becomes
>> >> >*very* unresponsive
>> >> >
>> >> >Anybody got any pointers of things I can try?
>> >> >
>> >> >Thanks
>> >> >Darrell.
>> >> >
>> >> >
>> >> >
>> >>
>> >
>> >
>> >
>>



-- 
Todd Lipcon
Software Engineer, Cloudera
