[ https://issues.apache.org/jira/browse/PIG-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1752:
----------------------------

    Status: Patch Available  (was: Open)

This patch changes the EvalFunc class to allow UDFs to declare a list of 
files they want to put in the distributed cache.  It adds a new method:

{code}
    /**
     * Allow a UDF to specify a list of files it would like placed in the
     * distributed cache.  These files will be put in the cache for every
     * job the UDF is used in.
     * The default implementation returns null.
     * @return A list of files
     */
    public List<String> getCacheFiles() {
        return null;
    }
    
{code}

This change is backward compatible since EvalFunc is an abstract class and the 
default implementation returns null. 
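
As a rough illustration of the compatibility argument, the sketch below contrasts a UDF that ignores the new method with one that overrides it.  EvalFuncSketch is a hypothetical stand-in that models only getCacheFiles(), so the example compiles without Pig on the classpath; the class names, the HDFS path and the "lookup" symlink name are all made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Pig's EvalFunc. It models only the method this
// patch adds; the real class has a different exec() signature.
abstract class EvalFuncSketch<T> {
    abstract T exec(String input);

    // Default described in the patch: no cache files requested.
    public List<String> getCacheFiles() {
        return null;
    }
}

// A UDF written before the patch. It does not override getCacheFiles(),
// so it inherits the null default and needs no changes.
class UpperCase extends EvalFuncSketch<String> {
    @Override
    String exec(String input) {
        return input.toUpperCase();
    }
}

// A UDF that opts in by returning the files it wants in the cache.
class CountryLookup extends EvalFuncSketch<String> {
    @Override
    String exec(String input) {
        return input; // lookup logic omitted from this sketch
    }

    @Override
    public List<String> getCacheFiles() {
        // Assumed example path; "lookup" is the symlink name after '#'.
        return Arrays.asList("hdfs://namenode/data/countries.txt#lookup");
    }
}

class CacheFilesDemo {
    public static void main(String[] args) {
        System.out.println(new UpperCase().getCacheFiles());
        System.out.println(new CountryLookup().getCacheFiles());
    }
}
```

A real UDF would extend org.apache.pig.EvalFunc and override only getCacheFiles() in addition to its usual exec() method.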

Any files returned by getCacheFiles are captured and placed in the physical 
plan during logical->physical translation.  The JobControlCompiler then visits 
each UDF and adds the files returned to the list of files to load into the 
distributed cache for this job.

No special handling is provided for the files.  Users must ensure the files 
are already on HDFS.  Each filename should be of the form:

hdfs://namenode/path#symlink

where symlink is the name under which the file will be linked in the task's 
local directory.  The UDF can then access the file in the backend by opening 
that symlink as a local file.
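
To make the naming convention concrete, here is a small sketch that splits such an entry into its HDFS path and its symlink name using java.net.URI.  CacheFileEntry is an illustrative helper, not something the patch provides; it only shows that the symlink is the URI fragment after '#'.

```java
import java.net.URI;

// Illustrative helper (not part of Pig or this patch): pulls apart a
// cache-file entry of the form hdfs://namenode/path#symlink.
class CacheFileEntry {
    // The symlink name is the URI fragment, i.e. the text after '#'.
    // Returns null when the entry has no '#'.
    static String symlinkOf(String entry) {
        return URI.create(entry).getFragment();
    }

    // The path of the file on HDFS, without scheme, host or fragment.
    static String hdfsPathOf(String entry) {
        return URI.create(entry).getPath();
    }
}

class CacheFileEntryDemo {
    public static void main(String[] args) {
        String entry = "hdfs://namenode/path#symlink";
        System.out.println(CacheFileEntry.hdfsPathOf(entry));
        System.out.println(CacheFileEntry.symlinkOf(entry));
    }
}
```

In the backend the UDF would then open the symlink name as a relative local path, for example with new java.io.FileReader("symlink").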


> UDFs should be able to indicate files to load in the distributed cache
> ----------------------------------------------------------------------
>
>                 Key: PIG-1752
>                 URL: https://issues.apache.org/jira/browse/PIG-1752
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Minor
>         Attachments: PIG-1752.patch
>
>
> Currently there is no way for a UDF to load a file into the distributed cache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
