[ 
https://issues.apache.org/jira/browse/PIG-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880051#comment-13880051
 ] 

Rohini Palaniswamy commented on PIG-2672:
-----------------------------------------

stagingRootDir + user + "/.pig" is good. But if stagingRootDir starts with 
fs.getHomeDirectory(), can you make it stagingRootDir + "/.pig"? This will 
avoid creating /user/<username>/<username>/.pig.
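
To make the suggestion above concrete, here is a minimal sketch of the path selection using plain strings. The class name PigCacheDirSketch, the method resolveCacheDir, and the example paths are hypothetical and not taken from the patch; real code would presumably work with Hadoop Path objects and fs.getHomeDirectory() rather than strings.

```java
public class PigCacheDirSketch {

    // Hypothetical helper: picks the directory for the shared jar cache.
    // homeDir stands in for the value of fs.getHomeDirectory().
    static String resolveCacheDir(String stagingRootDir, String homeDir, String user) {
        String root = stripTrailingSlash(stagingRootDir);
        String home = stripTrailingSlash(homeDir);
        // Compare on a path-component boundary so that /user/alicebob is not
        // treated as being under /user/alice.
        String homePrefix = home.endsWith("/") ? home : home + "/";
        boolean underHome = root.equals(home) || root.startsWith(homePrefix);
        // Under the home directory the user name is already part of the path,
        // so appending it again would give /user/<username>/<username>/.pig.
        return underHome ? root + "/.pig" : root + "/" + user + "/.pig";
    }

    static String stripTrailingSlash(String path) {
        return (path.length() > 1 && path.endsWith("/"))
                ? path.substring(0, path.length() - 1)
                : path;
    }

    public static void main(String[] args) {
        // Staging root inside the home directory: no extra user component.
        System.out.println(resolveCacheDir("/user/alice/.staging", "/user/alice", "alice"));
        // Staging root elsewhere: keep the per-user component.
        System.out.println(resolveCacheDir("/tmp/staging", "/user/alice", "alice"));
    }
}
```

With these example inputs the first call yields /user/alice/.staging/.pig and the second yields /tmp/staging/alice/.pig.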

bq. This way, files are reused for 1 week and then thrown away later 
automatically by a Trash cleanup.
The cache will be reused by other Pig jobs both during and after that week, and 
we will not be updating the modification time of the files. So we can't put it 
under .Trash, as the Trash cleanup would delete files that are still in use.

> Optimize the use of DistributedCache
> ------------------------------------
>
>                 Key: PIG-2672
>                 URL: https://issues.apache.org/jira/browse/PIG-2672
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.13.0
>
>         Attachments: PIG-2672-5.patch, PIG-2672.patch
>
>
> Pig currently copies jar files to a temporary location in HDFS and then adds 
> them to DistributedCache for each job launched. This is inefficient in terms 
> of:
>    * Space - The jars are distributed to task trackers for every job, taking 
> up a lot of local temporary space on the tasktrackers.
>    * Performance - The jar distribution impacts the job launch time.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
