Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-01 Thread Juwei Shi
Hi,

I have a problem: jobs are still in the running state after I execute "hadoop
job -kill jobId". I restarted the cluster, but the job still cannot be
killed.

The hadoop version is 0.20.2.

Any idea?

Thanks in advance!

-- 
- Juwei


RE: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-05 Thread Jeff.Schmitz
Um kill  -9 "pid" ?


Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-05 Thread Edward Capriolo
On Tue, Jul 5, 2011 at 10:05 AM,  wrote:

> Um kill  -9 "pid" ?
>
This happens sometimes. A task gets orphaned from the TaskTracker and never
goes away. It is a good idea to have a Nagios check for very old tasks,
because the orphans slowly suck your memory away, especially if the task
launches with a big Xmx. You really *should not* need to be nuking tasks
like this, but occasionally it happens.

Edward


Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-05 Thread Juwei Shi
We sometimes have hundreds of map or reduce tasks for a job, so I think it is
hard to find all of them and kill the corresponding JVM processes. If we do
not want to restart Hadoop, is there an automatic method?


Re: Jobs are still in running state after executing "hadoop job -kill jobId"

2011-07-05 Thread Edward Capriolo
On Tue, Jul 5, 2011 at 11:45 AM, Juwei Shi  wrote:

> We sometimes have hundreds of map or reduce tasks for a job, so I think it is
> hard to find all of them and kill the corresponding JVM processes. If we do
> not want to restart Hadoop, is there an automatic method?

I do not think they pop up very often, but after days and months of running,
a few orphans can still be alive. The way I would handle it is to write a check
that runs over Nagios (NRPE) and looks for Hadoop task processes using ps that
are older than a certain age, such as 1 day or 1 week. Then you can decide
whether you want Nagios to terminate these orphans or do it by hand.

Edward
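
A minimal sketch of the kind of check described above, written in Python. It is
not from the thread: the marker string org.apache.hadoop.mapred.Child (assumed
to appear in the command line of task JVMs), the one-day threshold, and the
function names are all illustrative assumptions. It reads ps output, parses the
elapsed-time column, and reports matching processes older than the threshold
using the usual Nagios plugin exit codes (0 OK, 1 WARNING, 3 UNKNOWN). It only
reports; it does not kill anything.

```python
"""Nagios-style check for long-running Hadoop task JVMs (illustrative sketch)."""
import subprocess

# Assumption: task JVMs show this main class in their command line.
TASK_MARKER = "org.apache.hadoop.mapred.Child"
# Assumption: anything older than one day is suspicious; tune per cluster.
MAX_AGE_SECONDS = 24 * 60 * 60


def parse_etime(etime):
    """Convert the ps 'etime' format ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_old_tasks(ps_output, max_age=MAX_AGE_SECONDS, marker=TASK_MARKER):
    """Return [(pid, age_seconds)] for marker processes older than max_age."""
    old = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, etime, full command line
        if len(fields) < 3 or marker not in fields[2]:
            continue
        try:
            pid, age = int(fields[0]), parse_etime(fields[1])
        except ValueError:
            continue  # skip lines that do not parse
        if age > max_age:
            old.append((pid, age))
    return old


def main():
    """Run ps, print one Nagios status line, and return the exit code."""
    try:
        out = subprocess.check_output(
            ["ps", "-eo", "pid,etime,args"], universal_newlines=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print("UNKNOWN - could not run ps: %s" % exc)
        return 3
    old = find_old_tasks(out)
    if old:
        pids = ", ".join(str(pid) for pid, _ in old)
        print("WARNING - %d old Hadoop task process(es): %s" % (len(old), pids))
        return 1
    print("OK - no Hadoop task processes older than %d s" % MAX_AGE_SECONDS)
    return 0
```

To use it as an NRPE command, a wrapper script would end with
sys.exit(main()). Whether to kill the reported PIDs automatically or by hand
is left to the operator, as suggested above.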