[ 
https://issues.apache.org/jira/browse/PIG-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai closed PIG-1301.
---------------------------


> Problem pruning columns with UDF
> --------------------------------
>
>                 Key: PIG-1301
>                 URL: https://issues.apache.org/jira/browse/PIG-1301
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Andrew Groh
>             Fix For: 0.7.0
>
>
> I just upgraded to Pig 0.6.0.
> I have a Pig script like this:
> raw = load 'foo.csv' using PigStorage() as (field1:chararray, field2:chararray);
> define contains com.mycompany.pig.Contains();
> rawactions = foreach raw generate contains(field1, field2) as junk,  field1;
> reqcnt = foreach rawactions generate field1;
> dump reqcnt;
> When I try to run this code, I get an error:
> Problem with input: (Name: Project 1-40 Projections: [1] Overloaded: false 
> Operator Key: 1-40) of User-defined function: (Name: UserFunc 1-39 function: 
> com.mycompany.pig.Contains Operator Key: 1-39)
> Thrown from line 98 of LOUserFunction.java
> This was caused by another FrontEndException, thrown from Schema.java:
> Attempt to access field: 1 from schema: {field1: chararray}
> I also experimented with changing the Pig script.
> If you change
> rawactions = foreach raw generate contains(field1, field2) as junk,  field1;
> to either
> rawactions = foreach raw generate contains(field2, field2) as junk,  field1;
> or
> rawactions = foreach raw generate contains(field1, field1) as junk,  field1;
> or if you change
> reqcnt = foreach rawactions generate field1;
> to
> reqcnt = foreach rawactions generate field1, junk;
> then in each case the script runs correctly.
> The problem appears to be that Pig prunes out field2 but then fails to 
> remove the plan associated with the UDF contains, because that plan also 
> references field1, which is not pruned.  So if the UDF references only 
> field2, its plan is removed; if it references only field1, that field has 
> not been pruned and the UDF can run.
> I eventually tracked this down to the code around line 947 of LOForEach.java:
>             for (LOProject loProject : projectFinder.getProjectSet()) {
>                 Pair<Integer, Integer> pair = new Pair<Integer, Integer>(0,
>                         loProject.getCol());
>                 if (!columns.contains(pair)) {
>                     allPruned = false;
>                     break;
>                 }
>             }
>             if (allPruned) {
>                 planToRemove.add(i);
>             }
> In the example script, allPruned is false for the plan associated with the 
> UDF, because field1 is a column both for the UDF and for the ForEach in 
> general.  Since field1 is not pruned, the plan is not removed, and the 
> error above occurs later when the plan tries to access the pruned field2.
> I don't really understand the pruning code all that well, so I don't have a 
> fix for it.  I hope that it will be clear to someone who understands this 
> code better.  I can provide a better test case for this if necessary.
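
The check quoted above can be modeled in isolation to see why only the mixed case fails. The following is a minimal sketch, not Pig code: the class name PruneCheckSketch and the method allPruned are hypothetical, java.util.AbstractMap.SimpleEntry stands in for Pig's Pair, and field1 and field2 are assumed to be columns 0 and 1 of input 0, as the load schema and the error message suggest.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PruneCheckSketch {

    // Mirrors the quoted loop: a plan is marked for removal only if EVERY
    // column it projects is in the set of pruned (input, column) pairs.
    static boolean allPruned(List<Integer> projectedCols,
                             Set<Map.Entry<Integer, Integer>> prunedColumns) {
        boolean allPruned = true;
        for (int col : projectedCols) {
            // Input index is fixed at 0, as in the quoted code.
            Map.Entry<Integer, Integer> pair = new SimpleEntry<>(0, col);
            if (!prunedColumns.contains(pair)) {
                allPruned = false;
                break;
            }
        }
        return allPruned;
    }

    public static void main(String[] args) {
        // Assumption: only field2 (column 1) has been pruned, because the
        // final foreach uses field1 and never uses junk.
        Set<Map.Entry<Integer, Integer>> pruned = new HashSet<>();
        pruned.add(new SimpleEntry<>(0, 1));

        // contains(field1, field2): the plan is kept although field2 is gone.
        System.out.println(allPruned(Arrays.asList(0, 1), pruned)); // false

        // contains(field2, field2): the plan is removed, so nothing breaks.
        System.out.println(allPruned(Arrays.asList(1, 1), pruned)); // true

        // contains(field1, field1): the plan is kept, and field1 still exists.
        System.out.println(allPruned(Arrays.asList(0, 0), pruned)); // false
    }
}
```

Only the first case leaves a surviving plan that projects a pruned column, which matches the "Attempt to access field: 1" error in the report. The sketch reproduces the check only and does not propose a fix.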

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
