[ 
https://issues.apache.org/jira/browse/HADOOP-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated HADOOP-5083:
-------------------------------

    Attachment: HADOOP-5083-v1.9.patch

Attaching a new patch with the following changes :
# _JobHistoryServerProfile_ : The profile for the server. It contains server 
info that remains unchanged for the lifetime of the server.
# _JobHistoryServerStatus_ : The status for the server. It contains server info 
that frequently changes.
# _Min-job-retire-interval_ : Jobs are kept in the JobTracker's memory for some 
time before retiring. The parameter that controls this is 
{{mapred.jobtracker.retirejob.interval}}. Once this interval has elapsed, the 
job is purged. There are several reasons for keeping jobs in memory this long :
 ## The {{JobClient}} periodically polls for job status, and removing the job 
immediately might result in exceptions.
 ## Test cases rely on the job status and reports being available after the 
job is complete.
# The testcase configuration now retains jobs for 24 hrs, to make sure that a 
job is still available after it finishes.
# Services in {{JobHistoryServer.java}} now follow the naming conventions 
suggested by Steve.
# Added a testcase to test if the jobs are removed from memory.
# Added a testcase to test if jobhistory is properly served.
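
As an illustration of the retire interval described above, here is a minimal 
sketch. It is not code from the patch: {{RetiringJobStore}} and its methods 
are hypothetical names, and it assumes the interval is expressed in 
milliseconds. Finished jobs stay in memory until they are older than the 
interval and are then purged.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not the JobTracker implementation.
class RetiringJobStore {
    // Assumed to be in milliseconds, e.g. the value read from
    // mapred.jobtracker.retirejob.interval.
    private final long retireIntervalMs;

    // Job id -> finish time in milliseconds.
    private final Map<String, Long> finishedJobs = new ConcurrentHashMap<>();

    RetiringJobStore(long retireIntervalMs) {
        this.retireIntervalMs = retireIntervalMs;
    }

    // Record a completed job; it stays queryable until it is retired.
    void jobFinished(String jobId, long finishTimeMs) {
        finishedJobs.put(jobId, finishTimeMs);
    }

    // True while status and reports for the job can still be served from memory.
    boolean isInMemory(String jobId) {
        return finishedJobs.containsKey(jobId);
    }

    // Purge every job that finished more than retireIntervalMs before nowMs.
    // Returns the number of jobs purged.
    int retireJobs(long nowMs) {
        int retired = 0;
        Iterator<Map.Entry<String, Long>> it = finishedJobs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > retireIntervalMs) {
                it.remove();
                retired++;
            }
        }
        return retired;
    }
}
```

With a 24 hr interval ({{24L * 60 * 60 * 1000}}), a client polling shortly 
after job completion still finds the job, which is what the testcase 
configuration relies on.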

_Todo :_
# Check whether the jsp files in the _webapps/job_ folder can be retained, 
with access to each set (the _job_ set and the _jobhistory_ set) granted based 
on the type of server accessing it.
 ## What about index.html for jobhistory? As of now there is a separate index 
file for the {{JobHistoryServer}}.
 ## What about security and access to the jsp files? With this patch there is 
no way to accidentally allow access to the jobhistory files. 
# Make {{JobHistoryServer}} resilient to memory issues. As of today 
{{loadhistory.jsp}} loads a job's history from the filesystem and caches the 
result in memory for subsequent access to the same job. One important question 
to ask is what if multiple users access the {{JobHistoryServer}} 
simultaneously? 
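
One possible direction for the memory question above, sketched under 
assumptions: bound the number of cached histories and evict the least recently 
used one, with all access synchronized so that concurrent users share one 
cache safely. {{BoundedHistoryCache}} and the loader callback are hypothetical 
names, not part of the patch or of {{loadhistory.jsp}}.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; V stands for whatever parsed-history type is cached.
class BoundedHistoryCache<V> {
    private final int maxEntries;
    private final Map<String, V> cache;

    BoundedHistoryCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true makes iteration order least-recently-used first,
        // so the "eldest" entry is the one that was touched longest ago.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > BoundedHistoryCache.this.maxEntries;
            }
        };
    }

    // Return the cached history for jobId, loading it (e.g. from the
    // filesystem) on a miss. Holding the lock during the load is a
    // simplification: it serializes loads across users.
    synchronized V get(String jobId, Function<String, V> loader) {
        V value = cache.get(jobId);
        if (value == null) {
            value = loader.apply(jobId);
            cache.put(jobId, value);
        }
        return value;
    }

    synchronized boolean isCached(String jobId) {
        return cache.containsKey(jobId);
    }

    synchronized int size() {
        return cache.size();
    }
}
```

The cap keeps memory use roughly proportional to {{maxEntries}} no matter how 
many distinct jobs users browse. The cost is that an evicted job's history has 
to be reloaded on its next access.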

> Optionally a separate daemon should serve JobHistory
> ----------------------------------------------------
>
>                 Key: HADOOP-5083
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5083
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Amar Kamat
>         Attachments: HADOOP-5083-v1.2.patch, HADOOP-5083-v1.9.patch
>
>
> Currently the JobTracker serves the JobHistory to end-users from files on 
> local-disk/hdfs. Running very large clusters with a large user-base 
> might result in lots of traffic for job-history, which needlessly taxes the 
> JobTracker. The proposal is to have an optional daemon which handles serving 
> of job-history requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
