[ 
https://issues.apache.org/jira/browse/HADOOP-5048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665222#action_12665222
 ] 

Hemanth Yamijala commented on HADOOP-5048:
------------------------------------------

A few comments:

- I feel it would be better to make initializedJobs a Map of jobid to 
JobInProgress. What worries me is that there's no equals or hashcode defined on 
the JobInProgress, and we want to lookup only by JobId for semantic reasons. 
Elsewhere, for e.g. in the JobTracker, if we want to lookup JobInProgress 
objects, we have a map of jobid to JobInProgress. It would be good to retain 
the same model.
- In the javadocs for getJobsToInitialize, where 'n' is introduced, it would be 
nice to mention that the computation of 'n' is shown later. I originally 
thought n was a parameter to the method. In that sense, it was a little 
confusing.
- Also, it mentions that the method 'picks' first n jobs. However, if the job 
is already initialized but not running, we don't pick it again. So, maybe 
saying it 'looks at' first n jobs is a better phrase.
- The pseudo code written as comments in the method is going to be hard to 
maintain, and as such was hard for me to even understand. It is also incorrect 
in a few details. For e.g. it says, if job is found in initialized job list:  
noOfUsers++. What if there are 2 jobs for the same user that are initialized. 
The code doesn't even have a variable for noOfUsers. It is much easier to 
follow code if the comments are inline with the code. A summary of the 
algorithm would be good if it is required (in this case, I don't think it is), 
but not the entire pseudo code at any rate.
- cleanupInitializedJobList has a redundant continue in the if condition 
checking for job status == Running.
- testJobRemovals can be improved. Once again, I feel a summary and not the 
complete steps in pseudo code makes the comment readable. I think it must only 
focus on removals, and not on the state of running jobs. Note that 
testJobMovement already tests the movement of jobs across the queues. I would 
recommend that the test does the following:
-- initialize, run and complete a job and make sure it is removed from both 
lists.
-- initialize and kill a job, make sure it is removed from both lists (this 
condition verifies the actual bug fix)
-- kill a job before initialization, make sure it is removed from the waiting 
list.
And it is more pertinent to check that the job is not present, rather than 
check only for the counts.

> Sometimes job is still displayed in jobqueue_details page for long time after 
> job was killed.
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5048
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5048
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/capacity-sched
>            Reporter: Karam Singh
>            Assignee: Sreekanth Ramakrishnan
>         Attachments: HADOOP-5048-1.patch, HADOOP-5048-2.patch
>
>
> When I tried kill all running job, I noticed that were two jobs were listed 
> on jobqueue_details.jsp page page as well as they were also listed under 
> failed job on jobtracker.jsp page.
> When I checked status of each that was displayed "killed" and Cleanup task 
> status as "Successful", but both jobs were also being on jobqueue_details.jsp 
> page for longtime e.g up to 10 -15 mins after I restarted JobTracker.
> Before killing the jobs, status of both jobs was running and no task of from 
> them was scheduled.
> I noticed this behavior on 3 different occasions. But is this random, not 
> always reproducible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to