[
https://issues.apache.org/jira/browse/HADOOP-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amar Kamat updated HADOOP-5083:
-------------------------------
Attachment: HADOOP-5083-v1.9.patch
Attaching a new patch with the following changes:
# _JobHistoryServerProfile_ : The profile for the server. It contains server
info that remains unchanged for the lifetime of the server.
# _JobHistoryServerStatus_ : The status for the server. It contains server info
that frequently changes.
# _Min-job-retire-interval_ : Jobs are kept in the JobTracker's memory for some
time before retiring. The parameter that controls this is
{{mapred.jobtracker.retirejob.interval}}. Once this interval elapses, the job
is purged. There are several reasons for keeping completed jobs in memory for
a while:
## The {{JobClient}} periodically polls for job status, and removing the job
immediately might result in exceptions.
## Test cases rely on the job status and reports being available after the
job is complete.
# Testcase configuration now retains jobs for 24 hrs to make sure that job
information is still available after the jobs finish.
# Services in {{JobHistoryServer.java}} now follow the naming conventions
suggested by Steve.
# Added a testcase to test if the jobs are removed from memory.
# Added a testcase to test if jobhistory is properly served.
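The retire-interval behaviour described in the _Min-job-retire-interval_ item can be pictured with a minimal, self-contained sketch. This is not code from the patch: the class {{RetireIntervalSketch}} and its methods are hypothetical, the interval is assumed to be in milliseconds, and it is assumed to be measured from the job's finish time.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the JobTracker's actual retirement code.
class RetireIntervalSketch {
    // Stand-in for the value of mapred.jobtracker.retirejob.interval
    // (assumed here to be in milliseconds).
    private final long retireIntervalMs;
    // Job id -> finish time (ms) for completed jobs still held in memory.
    private final Map<String, Long> completedJobs = new LinkedHashMap<>();

    RetireIntervalSketch(long retireIntervalMs) {
        this.retireIntervalMs = retireIntervalMs;
    }

    // Record a completed job; it stays pollable until it is retired.
    void jobCompleted(String jobId, long finishTimeMs) {
        completedJobs.put(jobId, finishTimeMs);
    }

    // Purge every job whose retire interval has elapsed; returns the count purged.
    int retireJobs(long nowMs) {
        int purged = 0;
        Iterator<Map.Entry<String, Long>> it = completedJobs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() >= retireIntervalMs) {
                it.remove();
                purged++;
            }
        }
        return purged;
    }

    // True while a client polling for status would still find the job.
    boolean isInMemory(String jobId) {
        return completedJobs.containsKey(jobId);
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000; // the 24 hrs used by the testcase configuration
        RetireIntervalSketch tracker = new RetireIntervalSketch(day);
        tracker.jobCompleted("job_1", 0L); // "job_1" is an illustrative id
        tracker.retireJobs(day - 1);
        System.out.println(tracker.isInMemory("job_1")); // interval not yet elapsed
        tracker.retireJobs(day);
        System.out.println(tracker.isInMemory("job_1")); // interval elapsed, job purged
    }
}
```

With a long interval such as 24 hrs, a polling client or a test that reads status right after completion never observes a purged job, which is the motivation given in the sub-items above.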
_Todo :_
# Check whether the jsp files in the _webapps/job_ folder can be retained, with
access to each set (_job_ and _jobhistory_) granted based on the type of
server accessing it.
## What about index.html for jobhistory? As of now there is a separate index
file for the {{JobHistoryServer}}.
## What about security and access to the jsp files? With this patch there is
no way to accidentally allow access to the jobhistory files.
# Make {{JobHistoryServer}} resilient to memory issues. As of today
{{loadhistory.jsp}} loads a job's history from the filesystem and caches the
result in memory for subsequent access to the same job. One important question
is what happens when multiple users access the {{JobHistoryServer}}
simultaneously.
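One possible way to approach the memory concern in the last todo item, sketched under assumptions: bound the in-memory cache with least-recently-used eviction and make it safe for concurrent requests. This is not what {{loadhistory.jsp}} does today and not part of the patch. The class {{BoundedHistoryCacheSketch}} and its {{newCache}} method are hypothetical, and the cached value is left generic because the parsed-history type is not specified here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a bounded, thread-safe LRU cache for parsed job history.
class BoundedHistoryCacheSketch {

    // Returns a map that holds at most maxEntries entries and evicts the
    // least recently accessed entry when a new one would exceed that bound.
    static <K, V> Map<K, V> newCache(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-accessed first.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        // In access-order mode even get() reorders entries, so every call must
        // be synchronized when multiple request threads share the cache.
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        // Keys stand in for job ids, values for a loaded job history.
        Map<String, String> cache = newCache(2);
        cache.put("job_1", "history of job_1");
        cache.put("job_2", "history of job_2");
        cache.get("job_1");                     // job_1 becomes most recently used
        cache.put("job_3", "history of job_3"); // evicts job_2, the eldest entry
        System.out.println(cache.keySet());
    }
}
```

The bound caps memory no matter how many distinct jobs users browse; the trade-off is that an evicted job's history has to be reloaded from the filesystem on its next access.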
> Optionally a separate daemon should serve JobHistory
> ----------------------------------------------------
>
> Key: HADOOP-5083
> URL: https://issues.apache.org/jira/browse/HADOOP-5083
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Amar Kamat
> Attachments: HADOOP-5083-v1.2.patch, HADOOP-5083-v1.9.patch
>
>
> Currently the JobTracker serves the JobHistory to end-users off files on
> local-disk/hdfs. Running very large clusters with a large user-base
> might result in lots of traffic for job-history, which needlessly taxes the
> JobTracker. The proposal is to have an optional daemon which handles serving
> of job-history requests.