[
https://issues.apache.org/jira/browse/HADOOP-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577391#action_12577391
]
Amareshwari Sriramadasu commented on HADOOP-2622:
-------------------------------------------------
To address the issue itself, i.e. making -file use the distributed cache,
we can do the following:
1. Leave the streaming jar as job.jar.
2. Create a jar file from the files/directories given with the -file option.
3. Copy the jar file created in step 2 to the DFS at a job-specific location,
say submitJobDir/_jobFiles (${mapred.system.dir}/jobid/_jobFiles).
4. Add the jar file to the distributed cache using addArchiveToClassPath.
Thoughts?
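As a rough illustration of step 2 only, here is a minimal sketch that packs
the -file arguments into a single jar using nothing but the JDK. The class
name FileOptionJarSketch and the method createJar are hypothetical and are
not taken from the attached patch. Steps 3 and 4 are left out because they
need a live cluster; presumably they would copy the result to the job
directory (for example with FileSystem.copyFromLocalFile) and then register
it with DistributedCache.addArchiveToClassPath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop Streaming or of patch-2622.txt.
public class FileOptionJarSketch {

    /**
     * Packs every input into a new jar at jarPath. A plain file becomes a
     * top-level entry; a directory is added recursively under its own name.
     * Assumption: entry names do not collide across inputs (a duplicate
     * name makes JarOutputStream throw a ZipException).
     */
    public static void createJar(List<Path> inputs, Path jarPath) throws IOException {
        try (OutputStream out = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(out)) {
            for (Path input : inputs) {
                Path abs = input.toAbsolutePath().normalize();
                // Entry names are relative to the parent of each input.
                Path base = abs.getParent() != null ? abs.getParent() : abs;
                List<Path> files;
                try (Stream<Path> walk = Files.walk(abs)) {
                    files = walk.filter(Files::isRegularFile)
                                .sorted()
                                .collect(Collectors.toList());
                }
                for (Path file : files) {
                    // Jar entries always use '/' as the separator.
                    String name = base.relativize(file).toString().replace('\\', '/');
                    jar.putNextEntry(new JarEntry(name));
                    Files.copy(file, jar);
                    jar.closeEntry();
                }
            }
        }
    }
}
```

With inputs such as mapper.sh and a directory lib/, the resulting jar would
contain mapper.sh and lib/... entries, which is the layout a task would see
once the archive is unpacked from the cache.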
> Fix -file option in Streaming to use Distributed Cache
> ------------------------------------------------------
>
> Key: HADOOP-2622
> URL: https://issues.apache.org/jira/browse/HADOOP-2622
> Project: Hadoop Core
> Issue Type: Bug
> Components: contrib/streaming
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.17.0
>
> Attachments: patch-2622.txt
>
>
> The -file option works by putting the script into the job's jar file by
> unjar-ing, copying and then jar-ing it again.
> We should rework the -file option to use the DistributedCache and the symlink
> option it provides.