[ https://issues.apache.org/jira/browse/HADOOP-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gautam Kowshik updated HADOOP-1032:
-----------------------------------
Fix Version/s: 0.12.0
Status: Patch Available (was: Open)
Putting up a first patch for review. I have added an API to add jars to the
classpath.
The user is expected to do the following:
- upload the jars, once, to a predefined location in DFS
- for every job submission, register those jars with the DFS cache using
DistributedCache.addCacheArchive()
- use conf.addClassPath() or setClassPath() to mark them to be included in the
job's classpath
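The three steps above can be modelled in plain Java to make their order explicit. This is a minimal sketch under stated assumptions, not the code in the patch: `JobCacheSetup`, `uploadOnce`, `registerCacheArchive` and `addClassPathEntry` are hypothetical names standing in for the DFS upload, `DistributedCache.addCacheArchive()` and `conf.addClassPath()`, no Hadoop classes are used, and it assumes a jar may only be marked for the classpath after it has been registered with the cache for the current job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the proposed workflow (not Hadoop's API).
class JobCacheSetup {
    private final Set<String> dfsStore;                              // jars uploaded once, shared by all jobs
    private final Set<String> cacheArchives = new LinkedHashSet<>(); // registered with the cache for this job
    private final List<String> classPath = new ArrayList<>();        // marked for this job's classpath

    JobCacheSetup(Set<String> dfsStore) {
        this.dfsStore = dfsStore;
    }

    // Step 1: upload a jar once; returns false if it was already uploaded.
    static boolean uploadOnce(Set<String> dfsStore, String dfsPath) {
        return dfsStore.add(dfsPath);
    }

    // Step 2: on every job submission, register an already-uploaded jar with the cache.
    void registerCacheArchive(String dfsPath) {
        if (!dfsStore.contains(dfsPath)) {
            throw new IllegalStateException("not uploaded: " + dfsPath);
        }
        cacheArchives.add(dfsPath);
    }

    // Step 3: mark a registered jar for inclusion in the job's classpath (duplicates ignored).
    void addClassPathEntry(String dfsPath) {
        if (!cacheArchives.contains(dfsPath)) {
            throw new IllegalStateException("not registered with cache: " + dfsPath);
        }
        if (!classPath.contains(dfsPath)) {
            classPath.add(dfsPath);
        }
    }

    // Classpath entries in the order they were added.
    List<String> classPath() {
        return Collections.unmodifiableList(classPath);
    }
}
```

A second job would create its own `JobCacheSetup` against the same store and repeat only steps 2 and 3, which is the point of uploading once.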
Comments?
> Support for caching Job JARs
> -----------------------------
>
> Key: HADOOP-1032
> URL: https://issues.apache.org/jira/browse/HADOOP-1032
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Affects Versions: 0.11.2
> Reporter: Gautam Kowshik
> Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HADOOP-1032.patch
>
>
> Often jobs need to be rerun a number of times, like a job that reads from
> crawled data again and again, so having to upload job jars to every node is
> cumbersome. We need a caching mechanism to boost performance. Here are the
> proposed features for job-specific caching of jars/conf files:
> - Ability to resubmit jobs with jars without having to propagate the same jar
> to all nodes.
> The idea is to keep a store (path specified by the user in job.xml?) local to
> the task node so as to speed up task initiation on tasktrackers. This assumes
> that the jar does not change during an MR task.
> - An independent DFS store to upload jars to (Distributed File Cache?) that is
> not cleaned up between jobs.
> This might need user-level configuration telling the JobClient to
> upload files to the DFSCache instead of the DFS.
> https://issues.apache.org/jira/browse/HADOOP-288 facilitates this. Our local
> cache can be a client of the DFS Cache.
> - A standard cache mechanism that checks for changes in the local store and
> fetches from DFS if a file is found dirty.
> This does away with versioning. The DFSCache supports an MD5 checksum
> check, which we can use.
> Anything else? Suggestions? Thoughts?
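The checksum-based dirty check proposed in the issue description can be sketched as follows. This is a minimal illustration under stated assumptions, not the DFSCache implementation from HADOOP-288: `RemoteStore` is a hypothetical in-memory stand-in for DFS, `LocalJarCache` and its `get` method are hypothetical names, and the node keeps one local copy per path and re-fetches it only when the MD5 of that copy differs from the MD5 reported for the current remote copy.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DFS store: path -> jar contents.
class RemoteStore {
    private final Map<String, byte[]> files = new HashMap<>();

    void put(String path, byte[] data) {
        files.put(path, data.clone());
    }

    byte[] read(String path) {
        byte[] data = files.get(path);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return data.clone();
    }

    // Checksum reported by the store; the client compares it instead of downloading the jar.
    byte[] md5(String path) {
        return LocalJarCache.md5Of(read(path));
    }
}

// Hypothetical per-node cache that re-fetches a jar only when its local copy is dirty.
class LocalJarCache {
    private final RemoteStore remote;
    private final Map<String, byte[]> local = new HashMap<>();
    private int fetchCount = 0;

    LocalJarCache(RemoteStore remote) {
        this.remote = remote;
    }

    static byte[] md5Of(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns the jar contents, fetching from the remote store if absent or stale.
    byte[] get(String path) {
        byte[] cached = local.get(path);
        if (cached == null || !Arrays.equals(md5Of(cached), remote.md5(path))) {
            cached = remote.read(path);
            local.put(path, cached);
            fetchCount++;
        }
        return cached.clone();
    }

    int fetchCount() {
        return fetchCount;
    }
}
```

Comparing checksums rather than version numbers is what lets the proposal "do away with versioning": an unchanged jar is served locally no matter how many jobs reuse it, and any change to its bytes triggers exactly one re-fetch.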
--
This message is automatically generated by JIRA.