[ 
https://issues.apache.org/jira/browse/HADOOP-1876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557173#action_12557173
 ] 

Alejandro Abdelnur commented on HADOOP-1876:
--------------------------------------------

The JobHistory has a log file per job with the details of that job, but it also 
has a single master file with some basic info. This makes it difficult to use 
DFS, as append is not supported at the moment.

The reason for using DFS to store the job info is that if the JobTracker box 
dies I could bring it up on another box and still have the job info of 
completed jobs from previous runs. The directory configuration could point to 
a local FS directory if this is not a concern.

The way JobHistory splits and processes the log information would make it 
difficult, and would require much more code than the proposed patch, to 
recreate the RunningJob object from the JobHistory log files. 
RunningJob/Counters/JobStatus/CompletedTasks are all Writable implementations, 
so writing/reading is already taken care of by them automatically.
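To illustrate why the Writable implementations make this cheap, here is a minimal, self-contained sketch of the Writable pattern. The interface stand-in and the JobStatus fields below are illustrative only (the real interface is org.apache.hadoop.io.Writable, and Hadoop's JobStatus has more fields); this is not Hadoop's actual code:

```java
import java.io.*;

// Stand-in for org.apache.hadoop.io.Writable, for illustration only.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, simplified JobStatus: the fields are illustrative.
class JobStatus implements Writable {
    String jobId = "";
    int runState;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(jobId);
        out.writeInt(runState);
    }

    public void readFields(DataInput in) throws IOException {
        jobId = in.readUTF();
        runState = in.readInt();
    }
}

public class WritableSketch {
    public static void main(String[] args) throws IOException {
        JobStatus status = new JobStatus();
        status.jobId = "job_0001";
        status.runState = 2;

        // Persisting: write the Writable to a byte stream (in the patch
        // this would be a file in the configured DFS directory).
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        status.write(new DataOutputStream(bytes));

        // Recovering: read it back into a fresh object.
        JobStatus restored = new JobStatus();
        restored.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(restored.jobId + " " + restored.runState);
    }
}
```

Since each of these classes already knows how to serialize itself, persisting a completed job is just a matter of calling write() on a DFS output stream and readFields() on the way back.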

It seems to me that it would be much easier to retrofit JobHistory to use the 
info from the files the patch is writing than the other way around.

In my opinion the use of the JobHistory log files is very different from the 
use of the proposed patch.

Also note that by default the proposed patch does not persist any job info; it 
does so only if explicitly configured. So there is no performance/storage 
penalty if it is not activated.
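The memory-first, DFS-fallback lookup the patch proposes can be sketched as follows. A local directory stands in for the DFS path under mapred/system, and the class and method names here are hypothetical, not Hadoop's actual API:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Sketch of the proposed lookup: consult the in-memory map of completed
// jobs first, then fall back to the persisted per-job directory.
public class CompletedJobStore {
    private final Map<String, String> inMemory = new HashMap<>();
    private final Path persistDir;

    CompletedJobStore(Path persistDir) { this.persistDir = persistDir; }

    // On completion: keep in memory and persist under <dir>/<jobId>/status.
    void jobCompleted(String jobId, String status) throws IOException {
        inMemory.put(jobId, status);
        Path jobDir = persistDir.resolve(jobId);
        Files.createDirectories(jobDir);
        Files.write(jobDir.resolve("status"), status.getBytes("UTF-8"));
    }

    // Simulate retirement: the entry leaves memory but stays on disk.
    void retire(String jobId) { inMemory.remove(jobId); }

    // Query: memory first, then the persisted copy, else null.
    String getStatus(String jobId) throws IOException {
        String s = inMemory.get(jobId);
        if (s != null) return s;
        Path f = persistDir.resolve(jobId).resolve("status");
        return Files.exists(f)
                ? new String(Files.readAllBytes(f), "UTF-8") : null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("completed-jobs");
        CompletedJobStore store = new CompletedJobStore(dir);
        store.jobCompleted("job_0001", "SUCCEEDED");
        store.retire("job_0001");  // flushed from memory...
        System.out.println(store.getStatus("job_0001"));  // ...still on disk
    }
}
```

When persistence is not configured, the fallback branch simply never fires, which is why there is no cost for users who leave it off.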


> Persisting completed jobs status
> --------------------------------
>
>                 Key: HADOOP-1876
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1876
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 0.16.0
>
>         Attachments: patch1876.txt, patch1876.txt
>
>
> Currently the JobTracker keeps information about completed jobs in memory. 
> This information is flushed from the cache when it has outlived 
> #RETIRE_JOB_INTERVAL or when the limit of completed jobs in memory has 
> been reached (#MAX_COMPLETE_USER_JOBS_IN_MEMORY). 
> Also, if the JobTracker is restarted (due to being recycled or due to a 
> crash), information about completed jobs is lost.
> If any of the above scenarios happens before the job information is queried 
> by a hadoop client (normally the job submitter or a monitoring component), 
> there is no way to obtain such information.
> A way to avoid this is for the JobTracker to persist the completed jobs 
> information in DFS upon job completion. This would be done at the time the 
> job is moved to the completed jobs queue. Then, when querying the JobTracker 
> for information about a completed job, if it is not found in the memory 
> queue, a lookup in DFS would be done to retrieve the completed job 
> information. 
> A directory in DFS (under mapred/system) would be used to persist completed 
> job information; for each completed job there would be a directory named 
> with the job ID, and within that directory all the information about the 
> job: status, job profile, counters and completion events.
> A configuration property will indicate for how long persisted job 
> information should be kept in DFS. After such a period it will be cleaned 
> up automatically.
> This improvement would not introduce API changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
