[
https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664053#action_12664053
]
Vivek Ratan commented on HADOOP-5048:
-------------------------------------
This happens with the Capacity Scheduler when a job is killed after it has been
initialized but before it has started running. The JobQueuesManager receives an
event for the job's status change and removes the job from the run queue.
Removing the job from the wait queue is left to the initialization poller, which
fails to do so because of the bug in HADOOP-5020. Hence the job remains in the
scheduler's wait queue and keeps showing up on the jobqueue_details.jsp page.
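The failure path above can be modeled in a few lines. This is a minimal, hypothetical sketch that assumes each queue can be treated as a plain set of job IDs. `StaleWaitQueueDemo`, `managerOnJobKilled` and `pollerCleanup` are illustrative names, not the Capacity Scheduler's real API, and the poller's cleanup is stubbed out as a no-op to stand in for the HADOOP-5020 bug.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical, simplified model of the stale wait-queue entry.
 * These are illustrative stand-ins, not the Capacity Scheduler's classes.
 */
public class StaleWaitQueueDemo {

    // Initialized jobs wait here until they are scheduled to run.
    final Set<String> waitQueue = new LinkedHashSet<>();
    // Jobs that have started running.
    final Set<String> runQueue = new LinkedHashSet<>();

    // Current behaviour: on a kill, the manager only touches the run queue
    // and leaves wait-queue cleanup to the initialization poller.
    void managerOnJobKilled(String jobId) {
        runQueue.remove(jobId);
    }

    // Models the poller's cleanup being ineffective (the HADOOP-5020 bug):
    // it never removes the killed job from the wait queue.
    void pollerCleanup(String jobId) {
        // intentionally a no-op
    }

    public static void main(String[] args) {
        StaleWaitQueueDemo q = new StaleWaitQueueDemo();
        q.waitQueue.add("job_1");        // initialized, not yet running
        q.managerOnJobKilled("job_1");   // killed before it was run
        q.pollerCleanup("job_1");
        // The killed job is still listed, as on jobqueue_details.jsp.
        System.out.println("wait=" + q.waitQueue + " run=" + q.runQueue);
        // prints "wait=[job_1] run=[]"
    }
}
```

Because neither component ends up removing the job from the wait queue, the entry stays there until something else clears the scheduler's state.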
I recommend we do the following:
* The JobQueuesManager should be responsible for removing a job from both the run
and wait queues when the job completes. It already does this when a job's
priority changes, so it is already aware that a job may be in either queue and
must be removed from both. With this change, the job is removed from the wait
queue independently of the fix for HADOOP-5020, because the JobQueuesManager
receives the job state change event along with the old job state.
* The JobInitializationPoller needs some refactoring. It is really doing two
separate things. First, it builds up a collection of jobs being initialized by
walking through the wait queue. Second, it needs to clean up the job objects in
its collection by walking through them and removing jobs that have started
running or have completed. Separating these two tasks makes the poller
responsible only for its own collection, and the JobQueuesManager responsible
for its run and wait queues.
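A rough sketch of both recommendations, under the same simplifying assumption that queues are sets of job IDs, plus the assumption that job states can be looked up in a map. `JobQueuesManager` and `JobInitializationPoller` below are simplified stand-ins named after the real components; `jobStateChanged`, `selectJobsToInitialize`, `cleanUpInitializedJobs` and the `JobState` enum are hypothetical and do not reproduce Hadoop's actual signatures.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed split of responsibilities. */
public class ProposedFixSketch {

    enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // True for terminal states; null (unknown job) is treated as not complete.
    static boolean isComplete(JobState s) {
        return s == JobState.SUCCEEDED || s == JobState.FAILED
            || s == JobState.KILLED;
    }

    /** Owns the run and wait queues. */
    static class JobQueuesManager {
        final Set<String> waitQueue = new LinkedHashSet<>();
        final Set<String> runQueue = new LinkedHashSet<>();

        // Recommendation 1: on completion, remove the job from BOTH queues,
        // since it may be in either one depending on its old state.
        void jobStateChanged(String jobId, JobState newState) {
            if (isComplete(newState)) {
                waitQueue.remove(jobId);
                runQueue.remove(jobId);
            }
        }
    }

    /** Owns only its own collection of jobs being initialized. */
    static class JobInitializationPoller {
        final Set<String> initializedJobs = new LinkedHashSet<>();

        // Recommendation 2, phase 1: walk the wait queue and collect jobs
        // to initialize (simplified: no limits are applied).
        void selectJobsToInitialize(JobQueuesManager mgr) {
            initializedJobs.addAll(mgr.waitQueue);
        }

        // Phase 2: prune the poller's own collection of jobs that are running
        // or complete; the manager's queues are never touched here.
        void cleanUpInitializedJobs(Map<String, JobState> states) {
            initializedJobs.removeIf(id -> {
                JobState s = states.get(id);
                return s == JobState.RUNNING || isComplete(s);
            });
        }
    }

    public static void main(String[] args) {
        JobQueuesManager mgr = new JobQueuesManager();
        JobInitializationPoller poller = new JobInitializationPoller();
        Map<String, JobState> states = new HashMap<>();

        mgr.waitQueue.add("job_1");
        states.put("job_1", JobState.PREP);
        poller.selectJobsToInitialize(mgr);   // job_1 initialized, still waiting

        states.put("job_1", JobState.KILLED); // killed before it ever ran
        mgr.jobStateChanged("job_1", JobState.KILLED);
        poller.cleanUpInitializedJobs(states);

        System.out.println(mgr.waitQueue + " " + mgr.runQueue + " "
            + poller.initializedJobs);
        // prints "[] [] []"
    }
}
```

Keeping the two responsibilities apart means a failure in the poller's cleanup can no longer leave a killed job in the manager's wait queue.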
> Sometimes job is still displayed in jobqueue_details page for long time after
> job was killed.
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-5048
> URL: https://issues.apache.org/jira/browse/HADOOP-5048
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Karam Singh
>
> When I tried to kill all running jobs, I noticed that two jobs were listed on
> the jobqueue_details.jsp page and were also listed under failed jobs on the
> jobtracker.jsp page.
> When I checked the status of each, it was displayed as "killed" and the Cleanup
> task status as "Successful", but both jobs remained on the jobqueue_details.jsp
> page for a long time, e.g. up to 10-15 minutes after I restarted the JobTracker.
> Before I killed the jobs, both were in the running state and no task from
> either had been scheduled.
> I noticed this behavior on 3 different occasions, but it is random and not
> always reproducible.
--
This message is automatically generated by JIRA.