[ 
https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664053#action_12664053
 ] 

Vivek Ratan commented on HADOOP-5048:
-------------------------------------

This happens with the Capacity Scheduler. 

Jobs are killed after they are initialized, and before they are run. 
JobQueuesManager receives an event for the job's status being changed, and 
removes it from the run queue. The removal of the job from the wait queue is 
left to the initialization poller. The latter is unable to remove the job from 
the wait queue because of the bug in HADOOP-5020. Hence the job remains in the 
scheduler's wait queue and shows up in the jobqueue_details.jsp page. 

I recommend we do the following:
* JobQueuesManager should be responsible for removing a job from both the run 
and wait queues when the job completes. It already does so when the job's 
priority is changed, so it is already aware that a job can be in both queues 
and thus needs to be removed from both. With this fix, the job will be removed 
from the wait queue regardless of the fix for HADOOP-5020, as the 
JobQueuesManager receives the job state change event with the old job state. 
* The JobInitializationPoller needs some refactoring. It's really doing two 
separate things: it builds up a collection of jobs being initialized by walking 
through the wait queue. Separately, it needs to clean up job objects in its 
collection by walking through them and removing those jobs which have started 
running and those that have completed. This makes it responsible for its own 
collection and the JobQueuesManager responsible for its run/wait queues. 
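
The split of responsibilities proposed above could look roughly like the 
sketch below. This is a minimal illustration, not the actual Capacity Scheduler 
code: QueueCleanupSketch, QueuesManager, InitializationPoller, the State enum 
and all method names are hypothetical stand-ins, and jobs are reduced to plain 
string ids. It also does not model the old-state/new-state details of the real 
job state change event.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: every class and method name here is a hypothetical
// stand-in, not the actual Capacity Scheduler code.
final class QueueCleanupSketch {

    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    static final Set<State> COMPLETE =
        EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

    /** Owns the run and wait queues (stand-in for JobQueuesManager). */
    static final class QueuesManager {
        final Set<String> waitQueue = new LinkedHashSet<>();
        final Set<String> runQueue = new LinkedHashSet<>();

        void jobAdded(String jobId) {
            waitQueue.add(jobId);
        }

        /** Called when a job state change event arrives. */
        void jobStateChanged(String jobId, State newState) {
            if (COMPLETE.contains(newState)) {
                // A job may sit in either queue, or in both, so remove it
                // from both instead of relying on the poller.
                runQueue.remove(jobId);
                waitQueue.remove(jobId);
            }
        }
    }

    /** Owns only its own collection (stand-in for the initialization poller). */
    static final class InitializationPoller {
        final Set<String> initialized = new LinkedHashSet<>();

        /** Task 1: pick up jobs by walking the wait queue. */
        void collectFrom(Set<String> waitQueue) {
            initialized.addAll(waitQueue);
        }

        /** Task 2: drop jobs that have started running or have completed. */
        void cleanUp(Map<String, State> jobStates) {
            initialized.removeIf(id -> {
                State s = jobStates.get(id);
                return s != null && (s == State.RUNNING || COMPLETE.contains(s));
            });
        }
    }

    public static void main(String[] args) {
        QueuesManager manager = new QueuesManager();
        InitializationPoller poller = new InitializationPoller();
        Map<String, State> states = new HashMap<>();

        manager.jobAdded("job_1");
        states.put("job_1", State.PREP);
        poller.collectFrom(manager.waitQueue); // initialized, still waiting

        // The job is killed after initialization but before it runs.
        states.put("job_1", State.KILLED);
        manager.jobStateChanged("job_1", State.KILLED);
        poller.cleanUp(states);

        System.out.println("wait queue: " + manager.waitQueue);
        System.out.println("poller collection: " + poller.initialized);
    }
}
```

In this sketch the poller never touches the wait queue, so a killed job leaves 
the queues as soon as the state change event is handled, independently of when 
the poller next runs its own cleanup. 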


> Sometimes job is still displayed in jobqueue_details page for long time after 
> job was killed.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5048
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Karam Singh
>
> When I tried to kill all running jobs, I noticed that two jobs were listed 
> on the jobqueue_details.jsp page and were also listed under failed jobs on 
> the jobtracker.jsp page.
> When I checked the status of each, it was displayed as "killed" with the 
> Cleanup task status as "Successful", but both jobs remained on the 
> jobqueue_details.jsp page for a long time, e.g. up to 10-15 mins after I 
> restarted JobTracker.
> Before killing the jobs, the status of both jobs was running and no task 
> from them was scheduled.
> I noticed this behavior on 3 different occasions, but it is random and not 
> always reproducible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
