[ https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891383#action_12891383 ]
Richard Ding commented on PIG-787:
----------------------------------
Currently, Pig bundles UDFs and their dependencies (including pig.jar) into
job.jar and sends it to the JobTracker via the jobconf. Hadoop then copies the
jar to HDFS and pushes it to all the task nodes. This is essentially the same
as using the distributed cache (though Pig itself doesn't need to copy the jar
to HDFS first).
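To make the comparison concrete, here is a minimal sketch against the
Hadoop 0.20-era API showing both distribution paths side by side. The file
paths and the class name are hypothetical, not taken from Pig's code:

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class JarShippingSketch {
    public static void configure(JobConf conf) throws Exception {
        // Path 1 (what Pig does today): point the jobconf at a local job.jar
        // bundling the UDFs and pig.jar; Hadoop copies it to HDFS and ships
        // it to every task node.
        conf.setJar("/tmp/job.jar"); // hypothetical local bundle

        // Path 2 (the distributed-cache equivalent): the jar already sits on
        // HDFS, and Hadoop puts it on each task's classpath without Pig
        // bundling or re-copying it.
        DistributedCache.addFileToClassPath(
            new Path("/user/shared/udfs.jar"), conf); // hypothetical HDFS path
    }
}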
One use case for the distributed cache is when some UDF jars are already on
HDFS. In that case, instead of adding them to job.jar, Pig can add them
directly to Hadoop's distributed cache. This would reduce the size of job.jar
and avoid copying those jars to HDFS again.
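A hedged sketch of what such routing could look like on the Pig side, assuming
a URI-scheme check decides where each registered jar goes; the class, method,
and field names here are invented for illustration, not Pig's actual
implementation:

import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class UdfJarPlacement {
    // Jars that still have to be merged into job.jar.
    private final List<String> bundledJars = new ArrayList<String>();

    public void registerJar(String jarPath, JobConf conf) throws Exception {
        URI uri = new URI(jarPath);
        if ("hdfs".equals(uri.getScheme())) {
            // Already on HDFS: skip job.jar and add the jar straight to the
            // distributed cache so nothing is copied to HDFS twice.
            DistributedCache.addFileToClassPath(new Path(uri.getPath()), conf);
        } else {
            // Local jar: keep the existing behavior and bundle it.
            bundledJars.add(jarPath);
        }
    }
}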
Are there any other use cases where the distributed cache would be helpful for
distributing UDFs and their dependencies?
> Allow UDFs and their dependencies to be distributed via Hadoop's distributed
> cache
> ----------------------------------------------------------------------------------
>
> Key: PIG-787
> URL: https://issues.apache.org/jira/browse/PIG-787
> Project: Pig
> Issue Type: New Feature
> Reporter: Olga Natkovich
> Assignee: Richard Ding
> Fix For: 0.8.0
>
>