[ 
https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786430#comment-13786430
 ] 

Koji Noguchi commented on PIG-3492:
-----------------------------------

We're only seeing this issue on complicated scripts with hundreds of lines.
This is the shortest reproduction I could get.  The script needs to be run 
with '-t PushUpFilter'.
{noformat}
pig> cat test.pig
A = load './test.txt' as (a:int, b:chararray);
B = FOREACH A generate a;
C = GROUP B by a;
D = filter C by group > 0 and group < 100;
E = FOREACH D {
         F = LIMIT B 1 ;
         GENERATE B.a as mya, FLATTEN(F.a) as setting;
    }
G = FOREACH E GENERATE mya, setting as setting;
dump G;
{noformat}
Relation G should contain two columns, 'mya' and 'setting', but the result 
contains only one column.

{noformat}
pig> cat test.txt
3       i
3       i
1       i
2       i
2       i
3       i
pig> pig -x local  -t PushUpFilter ./test.pig
({(1)})
({(2),(2)})
({(3),(3),(3)})
{noformat}

By also disabling ColumnMapKeyPrune or SplitFilter, you get the correct result:
{noformat}
pig> pig -x local  -t PushUpFilter -t ColumnMapKeyPrune ./test.pig
or
pig> pig -x local  -t PushUpFilter -t SplitFilter  ./test.pig
({(1)},1)
({(2),(2)},2)
({(3),(3),(3)},3)
{noformat}

Explain shows that the second column was cut off.
{noformat}
Incorrect case (-t PushUpFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)})ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]
Correct case (-t PushUpFilter -t SplitFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)},setting#63:int)ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]
{noformat}
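
To make the suspected mechanism from the issue description quoted below a bit more concrete, here is a toy model in Python. It is only a sketch of my reading of the problem, not Pig's actual ColumnPrune code: the prune() helper, the (alias, uid) pairs and the regenerated UID 64 are all made up for illustration. Only the required set {60, 63} is taken from the InputUids shown in the explain output above.

```python
# Toy model only: NOT Pig's real ColumnPrune implementation.
# A schema is a list of (alias, uid) pairs; all names here are hypothetical.

def prune(schema, required_uids):
    """Keep only the fields whose UID is still required downstream."""
    return [(alias, uid) for alias, uid in schema if uid in required_uids]

# UIDs the downstream operator asks for (InputUids=[63, 60] in explain).
required_downstream = {60, 63}

# Assumption: after a rewrite the upstream schema is regenerated and
# 'setting' receives a fresh UID (64 is an invented value).
regenerated_upstream = [("mya", 60), ("setting", 64)]

# Stale case: downstream still asks for 63, so 'setting' looks unused.
pruned_stale = prune(regenerated_upstream, required_downstream)

# Propagated case: downstream learns the new UID and both columns survive.
required_propagated = {uid for _, uid in regenerated_upstream}
pruned_fresh = prune(regenerated_upstream, required_propagated)

print(pruned_stale)
print(pruned_fresh)
```

In this sketch the stale case keeps only 'mya', which matches the shape of the incorrect output above, while the propagated case keeps both columns.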


> ColumnPrune dropping used column due to 
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3492
>                 URL: https://issues.apache.org/jira/browse/PIG-3492
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1, 0.12.1, 0.13.0
>            Reporter: Koji Noguchi
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> The chain SplitFilter -> schemaResetter -> LOGenerate.getSchema -> 
> LogicalRelationalOperator.fixDuplicateUids() creates a new UID, but that UID 
> is not propagated to the entire plan (since SplitFilter.reportChanges only 
> returns the subplan).
> As a result, I am seeing ColumnPrune cut off columns that are still in use.



