[jira] Commented: (HADOOP-1876) Persisting completed jobs status

Devaraj Das (JIRA) Fri, 04 Jan 2008 05:40:57 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12555927#action_12555927
 ]


Devaraj Das commented on HADOOP-1876:
-------------------------------------

Alejandro, did you evaluate the approach of tweaking the jobhistory 
component in Hadoop? The jobhistory component already saves the status of 
jobs/tasks on the local filesystem. It could save the history on the DFS 
instead, and on top of that you would need a wrapper around it that returns 
JobStatus objects.
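
To make the wrapper idea concrete, here is a minimal sketch in Python (chosen for brevity, not Hadoop code). It assumes a simplified, hypothetical history record format of KEY="value" pairs per line; the field names and the JobStatusView class are illustrative stand-ins, not the actual jobhistory format or the Hadoop JobStatus class.

```python
import re
from dataclasses import dataclass

# Assumption: each history record is one line of KEY="value" pairs.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class JobStatusView:
    """Illustrative stand-in for a JobStatus object (not the Hadoop class)."""
    job_id: str
    state: str
    finished_maps: int
    finished_reduces: int


def status_from_history(lines):
    """Fold history records into one status view.

    Records are applied in order, so later records override earlier ones.
    """
    fields = {}
    for line in lines:
        fields.update(dict(_PAIR.findall(line)))
    return JobStatusView(
        job_id=fields["JOBID"],
        state=fields.get("JOB_STATUS", "UNKNOWN"),
        finished_maps=int(fields.get("FINISHED_MAPS", 0)),
        finished_reduces=int(fields.get("FINISHED_REDUCES", 0)),
    )
```

The point of the sketch is that the history records stay the single source of truth, and the status object is derived from them on demand.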

> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt
>
>
> Currently the JobTracker keeps information about completed jobs in memory. 
> This information is flushed from the cache when it has outlived the retire 
> interval (#RETIRE_JOB_INTERVAL) or because the limit of completed jobs in 
> memory has been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY). 
> Also, if the JobTracker is restarted (due to being recycled or due to a 
> crash) information about completed jobs is lost.
> If any of the above scenarios happens before the job information is queried 
> by a hadoop client (normally the job submitter or a monitoring component) 
> there is no way to obtain such information.
> A way to avoid this is for the JobTracker to persist the completed job 
> information in DFS upon job completion. This would be done at the time the 
> job is moved to the completed jobs queue. Then, when querying the JobTracker 
> for information about a completed job, if it is not found in the memory 
> queue, a lookup in DFS would be done to retrieve the completed job 
> information. 
> A directory in DFS (under mapred/system) would be used to persist completed 
> job information. For each completed job there would be a directory named 
> after the job ID, and within that directory all the information about the 
> job: status, jobprofile, counters and completion events.
> A configuration property will indicate for how long persisted job 
> information should be kept in DFS. After that period it will be cleaned up 
> automatically.
> This improvement would not introduce API changes.
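
The flow described in the quoted issue (persist on completion, fall back to the persisted copy when the in-memory cache misses, clean up after a retention period) can be sketched as below. This is only an illustration under stated assumptions: a local directory stands in for DFS, and the class name, the info.json file and the parameters are hypothetical, not part of the attached patch.

```python
import json
import os
import shutil
import time


class CompletedJobStore:
    """Keeps recent completed jobs in memory and a copy of each on disk."""

    def __init__(self, root, max_in_memory=100, retain_seconds=3600.0):
        self.root = root                  # assumption: stands in for a DFS directory
        self.max_in_memory = max_in_memory
        self.retain_seconds = retain_seconds
        self.memory = {}                  # job_id -> info, in insertion order
        os.makedirs(root, exist_ok=True)

    def job_completed(self, job_id, info):
        """Persist the job at completion time, then cache it in memory."""
        job_dir = os.path.join(self.root, job_id)
        os.makedirs(job_dir, exist_ok=True)
        with open(os.path.join(job_dir, "info.json"), "w") as f:
            json.dump(info, f)
        self.memory[job_id] = info
        while len(self.memory) > self.max_in_memory:
            # Evict the oldest cached entry; the persisted copy remains.
            self.memory.pop(next(iter(self.memory)))

    def get(self, job_id):
        """Return job info from memory, else from disk, else None."""
        if job_id in self.memory:
            return self.memory[job_id]
        path = os.path.join(self.root, job_id, "info.json")
        if not os.path.exists(path):
            return None
        with open(path) as f:
            return json.load(f)

    def cleanup(self, now=None):
        """Delete persisted jobs older than the retention period."""
        now = time.time() if now is None else now
        for job_id in os.listdir(self.root):
            job_dir = os.path.join(self.root, job_id)
            if now - os.path.getmtime(job_dir) > self.retain_seconds:
                shutil.rmtree(job_dir)
```

Because every completed job is written before it is cached, a restart or a cache eviction only costs one extra read from the persisted copy; callers of get() see the same interface either way.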

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
