[ 
https://issues.apache.org/jira/browse/PIG-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1752:
----------------------------

    Release Note: 
We add a new method to EvalFunc:
public List<String> getCacheFiles();

Users can override this method to return a list of HDFS files that need to be
shipped to the distributed cache. Inside the EvalFunc, users can assume these
files already exist in the distributed cache.

For example:
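import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class Udfcachetest extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        // "smallfile" is the name given after '#' in getCacheFiles().
        BufferedReader d = new BufferedReader(new FileReader("./smallfile"));
        try {
            return d.readLine();
        } finally {
            d.close();
        }
    }

    public List<String> getCacheFiles() {
        // Entry format: <HDFS path>#<local name seen by the task>
        List<String> list = new ArrayList<String>(1);
        list.add("/user/pig/tests/data/small#smallfile");
        return list;
    }
}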

a = load '1.txt';
b = foreach a generate Udfcachetest(*);
dump b;
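
The entry returned by getCacheFiles() in the example has the form
"<HDFS path>#<local name>", and exec() then opens the file by that local name.
The sketch below only illustrates how such an entry splits into its two parts.
CacheFileSpec is a hypothetical helper, not part of Pig or Hadoop, and the
fallback to the file's base name when no '#' is present is an assumption of
this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Pig API): splits "path#name" cache entries. */
final class CacheFileSpec {
    final String hdfsPath;   // location of the file on HDFS
    final String localName;  // name the UDF would open, e.g. "./smallfile"

    CacheFileSpec(String hdfsPath, String localName) {
        this.hdfsPath = hdfsPath;
        this.localName = localName;
    }

    /** Splits one entry at its last '#'. */
    static CacheFileSpec parse(String entry) {
        int hash = entry.lastIndexOf('#');
        if (hash < 0) {
            // Assumption of this sketch: with no '#', use the base name.
            String base = entry.substring(entry.lastIndexOf('/') + 1);
            return new CacheFileSpec(entry, base);
        }
        return new CacheFileSpec(entry.substring(0, hash),
                                 entry.substring(hash + 1));
    }

    /** Returns the local names for a whole getCacheFiles()-style list. */
    static List<String> localNames(List<String> entries) {
        List<String> names = new ArrayList<String>(entries.size());
        for (String entry : entries) {
            names.add(parse(entry).localName);
        }
        return names;
    }
}
```

For the entry used above, parse("/user/pig/tests/data/small#smallfile") yields
the HDFS path "/user/pig/tests/data/small" and the local name "smallfile",
which matches the "./smallfile" path opened in exec().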

  was:
We add a new method to EvalFunc:
public List<String> getCacheFiles();

Users can override this method to return a list of HDFS files that need to be
shipped to the distributed cache. Inside the EvalFunc, users can assume these
files already exist in the distributed cache.


> UDFs should be able to indicate files to load in the distributed cache
> ----------------------------------------------------------------------
>
>                 Key: PIG-1752
>                 URL: https://issues.apache.org/jira/browse/PIG-1752
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1752.patch
>
>
> Currently there is no way for a UDF to load a file into the distributed cache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
