[ 
https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891383#action_12891383
 ] 

Richard Ding commented on PIG-787:
----------------------------------

Currently, Pig bundles UDFs and their dependencies (including pig.jar) into 
job.jar and sends it to the JobTracker via the job configuration. Hadoop then 
copies the jar to HDFS and pushes it to all the nodes. This is essentially the 
same as using the distributed cache (except that Pig doesn't have to copy the 
jar to HDFS itself).

One use case for the distributed cache is when some UDF jars are already on 
HDFS. In that case, instead of adding them to job.jar, Pig can add them 
directly to Hadoop's distributed cache. This reduces the size of job.jar and 
avoids copying those jars to HDFS again.
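As a sketch of the use case above (the namenode address, jar path, and UDF class name are hypothetical, for illustration only), a script could register a jar by its HDFS location, which would let Pig hand it to the distributed cache rather than bundling it into job.jar:

```
-- hypothetical jar path and UDF name; REGISTER with an hdfs:// URL
-- would signal that the jar already lives on HDFS
REGISTER hdfs://namenode:8020/libs/myudfs.jar;
A = LOAD 'input' AS (line:chararray);
B = FOREACH A GENERATE com.example.MyUpper(line);
```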

Are there any other use cases where the distributed cache would be helpful for 
distributing UDFs and their dependencies? 

> Allow UDFs and their dependencies to be distributed via Hadoop's distributed 
> cache
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-787
>                 URL: https://issues.apache.org/jira/browse/PIG-787
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
