[ https://issues.apache.org/jira/browse/PIG-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-435:
-------------------------------

    Fix Version/s: 0.9.0

> wrong columns produced if incomplete definition provided during load
> --------------------------------------------------------------------
>
>                 Key: PIG-435
>                 URL: https://issues.apache.org/jira/browse/PIG-435
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> Script:
> A = load 'studenttab10k' as (name); -- note that data has more than 1 column
> B = load 'votertab10k' as (name, age, reg, contrib);
> D = COGROUP A by name, B by name;  
> E = foreach D generate flatten(A), flatten(B); 
> F = foreach E generate reg, contrib;
> dump F;
> The dump produces the wrong columns. This is because even though we declared 
> only one column, we actually load all columns of A. So any place where we 
> explicitly or implicitly use A.*, as is the case in flatten, we would produce 
> the wrong results.
> The long-term solution is to push projections into the load. In the shorter 
> term, the proposal is to notice if the script uses A.* and stick a project 
> after the load. Note that we don't need to do that if types are declared, 
> because there will already be a casting foreach there.
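
For illustration only, here is a sketch of what the short-term proposal would amount to if the projection were written by hand in the script. The alias A0 is hypothetical and does not appear in the report, and this is not necessarily how the fix is implemented inside Pig. It assumes the final projection refers to the aliases declared for B (reg, contrib).

```pig
-- Load A without a schema, then project only the declared column,
-- so that flatten(A) expands to a single field.
A0 = load 'studenttab10k';            -- hypothetical intermediate alias
A = foreach A0 generate $0 as name;   -- explicit project after the load
B = load 'votertab10k' as (name, age, reg, contrib);
D = COGROUP A by name, B by name;
E = foreach D generate flatten(A), flatten(B);
F = foreach E generate reg, contrib;
dump F;
```

With the explicit foreach in place, A carries exactly one column, so the positions of B's fields after the flatten should match the declared schema.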

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
