[
https://issues.apache.org/jira/browse/HADOOP-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635317#action_12635317
]
Amar Kamat commented on HADOOP-4261:
------------------------------------
A few comments w.r.t. job recovery:
1) Upon restart, the task-completion-events/task-reports for the setup tasks
should also match.
2) It would make more sense to set the job run-state to {{SETUP}} when
{{logInited()}} is invoked. While recovering, check whether the SETUP state was
reached before calling {{init()}}.
3) Check if {{JobInProgress.obtainSetupTask()}} can reuse
{{JobInProgress.addRunningTaskToTIP()}}.
4) I think {{JobInProgress.canLaunchSetupTask()}} can also be written as
{code}
private synchronized boolean canLaunchSetupTask() {
  // true only if the job is in PREP, its tasks are initialized,
  // and the setup task has not been launched yet
  return status.getRunState() == JobStatus.PREP && tasksInited.get() &&
         !launchedSetup;
}
{code}
5) I don't see any code that deals with the setup task in job recovery, i.e. in
the recovery manager. Just make sure that the effect of scheduling setup tasks
before restart is the same as the effect of replaying them from history. I
assume that when the JIP is given a task-attempt update, it figures out whether
the task is a setup task or not. Ideally, the way setup is launched from the
recovery manager should mimic the way it is invoked from the real (live)
jobtracker.
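To illustrate points 4 and 5, here is a rough sketch of what "replay should mimic the live path" could look like. This is not taken from the patch: {{SketchJob}}, {{SketchRecovery}}, the string-based history records and every method other than {{canLaunchSetupTask()}} are hypothetical stand-ins, not the real {{JobInProgress}} or recovery-manager API. The idea is only that live scheduling and history replay go through the same entry points, so the resulting job state and event list are identical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a job; NOT the real JobInProgress.
class SketchJob {
    enum RunState { PREP, RUNNING }

    private RunState state = RunState.PREP;
    private boolean tasksInited = false;
    private boolean launchedSetup = false;
    // Stand-in for job history / task-completion events.
    private final List<String> events = new ArrayList<>();

    synchronized void init() {
        tasksInited = true;
        events.add("INITED");
    }

    // Same guard as suggested in point 4.
    synchronized boolean canLaunchSetupTask() {
        return state == RunState.PREP && tasksInited && !launchedSetup;
    }

    // Single entry point for launching setup, used by live scheduling and replay.
    synchronized boolean launchSetupTask(String attemptId) {
        if (!canLaunchSetupTask()) {
            return false;
        }
        launchedSetup = true;
        events.add("SETUP_LAUNCHED " + attemptId);
        return true;
    }

    // Setup finished: the job leaves PREP and becomes schedulable.
    synchronized boolean completeSetup() {
        if (!launchedSetup || state != RunState.PREP) {
            return false;
        }
        state = RunState.RUNNING;
        events.add("SETUP_DONE");
        return true;
    }

    synchronized RunState getState() { return state; }

    synchronized List<String> getEvents() { return new ArrayList<>(events); }
}

// Hypothetical stand-in for the recovery manager.
class SketchRecovery {
    private static final String LAUNCH_PREFIX = "SETUP_LAUNCHED ";

    // Replays history records through the same methods the live path uses.
    static SketchJob replay(List<String> history) {
        SketchJob job = new SketchJob();
        for (String record : history) {
            if (record.equals("INITED")) {
                job.init();
            } else if (record.startsWith(LAUNCH_PREFIX)) {
                job.launchSetupTask(record.substring(LAUNCH_PREFIX.length()));
            } else if (record.equals("SETUP_DONE")) {
                job.completeSetup();
            }
        }
        return job;
    }
}
```

With this shape, a job that was initialized and had its setup launched before the restart ends up, after {{SketchRecovery.replay(live.getEvents())}}, with the same events and run-state as the live job, and a second setup launch is refused in both cases.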
> Jobs failing in the init stage will never cleanup
> -------------------------------------------------
>
> Key: HADOOP-4261
> URL: https://issues.apache.org/jira/browse/HADOOP-4261
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Reporter: Amar Kamat
> Assignee: Amareshwari Sriramadasu
> Priority: Blocker
> Fix For: 0.19.0
>
> Attachments: patch-4261.txt
>
>
> Pre HADOOP-3150, if the job failed in the init stage, {{job.kill()}} was
> called. This made sure that the job was cleaned up w.r.t.
> - status set to KILLED/FAILED
> - job files from the system dir are deleted
> - closing of job history files
> - making jobtracker aware of this through {{jobTracker.finalizeJob()}}
> - cleaning up the data structures via {{JobInProgress.garbageCollect()}}
> Now if the job fails in the init stage, {{job.fail()}} is called, which doesn't
> do the cleanup. HADOOP-3150 introduces cleanup tasks which are launched once
> the job completes, i.e. is killed/failed/succeeded. The jobtracker will never
> consider this job for scheduling as the job will be in the {{PREP}} state
> forever.