[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879468#comment-13879468
 ] 

Rohini Palaniswamy commented on PIG-2672:
-----------------------------------------

> JobSubmissionFiles.getStagingDir(jobClient, conf);  or We create 
> /tmp/$user.name/jarcache 
     I think we should create /user/$user.name/.pig/filecache (not calling it 
jarcache, as we can have files used in streaming as well) and set the 
permissions of filecache to 700. That way it is cleaner (long-term user 
data lives in the user directory), and we also don't have to rely on Hadoop 
APIs to get the .staging dir location.  Please do not modify the mtime of the 
jar. If the mtime of a distributed cache jar is modified when a job completes, 
Hadoop fails the job.
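
A rough sketch of the layout suggested above, using only the JDK so it can be 
run without a cluster. It is not the patch itself: the class and method names 
(FileCacheSketch, fileCacheDir, createCache, addToCache) are hypothetical, a 
local POSIX directory stands in for the HDFS root, and reusing a cached file 
by name alone is a simplification made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

final class FileCacheSketch {

    // Hypothetical helper: builds the per-user cache location proposed in the comment.
    static String fileCacheDir(String user) {
        return "/user/" + user + "/.pig/filecache";
    }

    // Creates the cache directory under `root` (a local stand-in for the HDFS root)
    // and restricts it to the owner (mode 700). Assumes a POSIX file system.
    static Path createCache(Path root, String user) throws IOException {
        // Drop the leading '/' so the path resolves relative to the stand-in root.
        Path dir = root.resolve(fileCacheDir(user).substring(1));
        Files.createDirectories(dir);
        // Set the mode explicitly so the result does not depend on the process umask.
        Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwx------"));
        return dir;
    }

    // Copies a file into the cache only if no entry with that name exists yet.
    // An existing entry is returned untouched, so its mtime never changes.
    // Simplification: entries are keyed by file name only.
    static Path addToCache(Path cacheDir, Path file) throws IOException {
        Path target = cacheDir.resolve(file.getFileName());
        if (!Files.exists(target)) {
            Files.copy(file, target);
        }
        return target;
    }
}
```

The point of addToCache is that a second job shipping the same file gets the 
existing entry back instead of rewriting it, which leaves the mtime alone.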

> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.13.0
>
>         Attachments: PIG-2672-5.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of:
>    * Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space on the tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
