[ https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879468#comment-13879468 ]
Rohini Palaniswamy commented on PIG-2672:
-----------------------------------------

> JobSubmissionFiles.getStagingDir(jobClient, conf); or We create
> /tmp/$user.name/jarcache

I think we should create /user/$user.name/.pig/filecache (not calling it jarcache, since it can also hold files used in streaming) and set the permissions of filecache to 700. That is cleaner, since long-term user data lives in the user directory, and we also don't have to rely on Hadoop APIs to get the .staging dir location.

Please do not modify the mtime of the jar. If a distributed cache jar's mtime is modified when a job completes, Hadoop fails the job.

> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.13.0
>
>         Attachments: PIG-2672-5.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds
> them to DistributedCache for each job launched. This is inefficient in terms
> of
>    * Space - The jars are distributed to task trackers for every job, taking
> up a lot of local temporary space on the tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
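
As a rough illustration of the layout proposed in the comment above, here is a minimal Python sketch. It is not the patch attached to the issue: the local filesystem stands in for HDFS, the function names `cache_dir_for` and `put_if_absent` are hypothetical, and keying entries by a SHA-1 of the file content is an assumption of this sketch rather than something stated in the thread. It shows a per-user `.pig/filecache` directory with mode 700 and a lookup that reuses an existing copy without rewriting it, so the cached file's mtime is left alone.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def cache_dir_for(root: Path, user: str) -> Path:
    """Return <root>/user/<user>/.pig/filecache, creating it if needed."""
    path = root / "user" / user / ".pig" / "filecache"
    path.mkdir(parents=True, exist_ok=True)
    os.chmod(path, 0o700)  # owner-only access, as suggested in the comment
    return path


def put_if_absent(cache: Path, src: Path) -> Path:
    """Copy src into the cache once; later calls reuse the existing copy.

    Keying by content hash is an assumption of this sketch. A cache hit
    returns early without writing, so the cached file's mtime is untouched.
    """
    digest = hashlib.sha1(src.read_bytes()).hexdigest()
    dest = cache / digest / src.name
    if dest.exists():
        return dest
    dest.parent.mkdir(exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())  # stand-in for the HDFS root
    jar = root / "udfs.jar"
    jar.write_bytes(b"placeholder jar contents")
    cache = cache_dir_for(root, "alice")
    print(put_if_absent(cache, jar))
    print(put_if_absent(cache, jar))  # same path, no second copy
```

In a real implementation the directory creation and copy would go through the Hadoop filesystem API instead of `pathlib` and `shutil`; the point here is only the directory layout, the 700 permission, and the reuse-without-touching behaviour.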