Li, just to narrow your search: in my experience this is usually caused by an
OutOfMemoryError (OOME) on the JobTracker (JT). Check the logs for
OutOfMemoryError and see what you find. You may need to configure the JT to
retain fewer jobs in memory, or increase its heap size.
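As a starting point for that log check, here is a minimal sketch of a standalone Java tool that prints every line of a log file mentioning OutOfMemoryError, with its line number. It is an illustration under assumptions, not part of Hadoop: the class name `OomScan` is hypothetical, and the log file name in the usage note assumes the usual `hadoop-<user>-jobtracker-<host>.log` naming under the Hadoop `logs/` directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: reports lines in a JobTracker log that mention
 * OutOfMemoryError. The log path is passed on the command line.
 */
public class OomScan {

    // Substring the JVM uses when it runs out of memory, e.g.
    // "java.lang.OutOfMemoryError: Java heap space".
    static final String MARKER = "OutOfMemoryError";

    /** Returns every matching line, prefixed with its 1-based line number. */
    static List<String> findOomLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(MARKER)) {
                hits.add((i + 1) + ": " + lines.get(i));
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java OomScan <jobtracker-log-file>");
            System.exit(2);
        }
        // ISO-8859-1 maps every byte to a char, so stray non-UTF-8 bytes
        // in the log cannot cause a decoding exception.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        List<String> hits = findOomLines(lines);
        if (hits.isEmpty()) {
            System.out.println("No OutOfMemoryError found in " + args[0]);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

Usage, with an assumed file name: `java OomScan logs/hadoop-hadoop-jobtracker-myhost.log`. It may be worth running it on the matching `.out` file as well, since anything the JVM writes to stderr (such as an uncaught error's stack trace) is redirected there by the daemon start script.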

-Todd

On Thu, Jun 17, 2010 at 5:03 PM, Li, Tan <t...@shopping.com> wrote:

> Thanks for your tips, Ted.
> All of our QA is done on 0.20.1, and I have a feeling the problem is not
> version-related.
> I will run jstack and jmap once the problem happens again and I may need
> your help to analyze the result.
>
> Tan
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhih...@gmail.com]
> Sent: Thursday, June 17, 2010 2:39 PM
> To: common-user@hadoop.apache.org
> Subject: Re: Hadoop JobTracker Hanging
>
> Is upgrading to hadoop-0.20.2+228 possible?
>
> Use jstack to get a stack trace of the job tracker process when this happens
> again.
> Use jmap to get shared object memory maps or heap memory details.
>
> On Thu, Jun 17, 2010 at 2:00 PM, Li, Tan <t...@shopping.com> wrote:
>
> > Folks,
> >
> > I need some help on job tracker.
> > I am running two Hadoop clusters (with 30+ nodes) on Ubuntu. One runs
> > version 0.19.1 (Apache) and the other runs version 0.20.1+169.68
> > (Cloudera).
> >
> > I have the same problem with both clusters: the job tracker hangs
> > about once a day.
> > Symptoms: the job tracker web page cannot be loaded, the command "hadoop
> > job -list" hangs, and the jobtracker.log file stops being updated.
> > I cannot find any useful information in the job tracker log file.
> > The symptoms go away after I restart the job tracker, and the cluster runs
> > fine for another 20+ hours. Then the symptoms come back.
> >
> > I do not have any serious problems with HDFS.
> >
> > Any ideas about the causes? Is there any configuration parameter I can
> > change to reduce the chance of this happening?
> > Any tips for diagnosing and troubleshooting?
> >
> > Thanks!
> >
> > Tan
> >
>
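For reading the jstack output Ted suggests capturing (for example with `jstack <jobtracker-pid> > jt.jstack`, where the file name is only an illustration), a small summary can help before going through the full dump. The sketch below is a hypothetical helper, not a Hadoop or JDK tool: it assumes the HotSpot jstack text format, in which each thread begins with a line holding its quoted name and is usually followed by a `java.lang.Thread.State: ...` line. It counts threads per state and lists the BLOCKED ones. Many threads blocked at once would be consistent with a hang, but the `- waiting to lock` and `- locked` lines in the dump itself are what show which lock they are contending for.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: summarizes a HotSpot jstack thread dump. */
public class JstackSummary {

    static final String STATE_PREFIX = "java.lang.Thread.State: ";

    /** Counts threads per state, e.g. {BLOCKED=12, RUNNABLE=20}. */
    static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith(STATE_PREFIX)) {
                // "BLOCKED (on object monitor)" -> "BLOCKED"
                String state = line.substring(STATE_PREFIX.length()).split(" ")[0];
                counts.merge(state, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Names of threads whose state is BLOCKED (waiting to enter a monitor). */
    static List<String> blockedThreads(List<String> lines) {
        List<String> blocked = new ArrayList<>();
        String current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Thread header line: the name sits between the first two quotes.
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
            } else if (current != null && line.startsWith(STATE_PREFIX + "BLOCKED")) {
                blocked.add(current);
            }
        }
        return blocked;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JstackSummary <jstack-output-file>");
            System.exit(2);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        System.out.println("Threads per state: " + countStates(lines));
        List<String> blocked = blockedThreads(lines);
        System.out.println(blocked.size() + " BLOCKED thread(s):");
        blocked.forEach(name -> System.out.println("  " + name));
    }
}
```

Taking two or three dumps a few seconds apart and comparing them can show whether the same threads stay blocked, which points more clearly at a stuck lock than a single snapshot does.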



-- 
Todd Lipcon
Software Engineer, Cloudera
