[ https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786430#comment-13786430 ]
Koji Noguchi commented on PIG-3492:
-----------------------------------

We're only seeing this issue on complicated scripts with hundreds of lines. This is the shortest reproduction I could get. This test needs to be called with '-t PushUpFilter'.

{noformat}
pig> cat test.pig
A = load './test.txt' as (a:int, b:chararray);
B = FOREACH A generate a;
C = GROUP B by a;
D = filter C by group > 0 and group < 100;
E = FOREACH D {
    F = LIMIT B 1 ;
    GENERATE B.a as mya, FLATTEN(F.a) as setting;
}
G = FOREACH E GENERATE mya, setting as setting;
dump G;
{noformat}

Relation G should contain two columns, 'mya' and 'setting', but the result contains only one column.

{noformat}
pig> cat test.txt
3 i
3 i
1 i
2 i
2 i
3 i

pig> pig -x local -t PushUpFilter ./test.pig
({(1)})
({(2),(2)})
({(3),(3),(3)})
{noformat}

By also skipping ColumnMapKeyPrune or SplitFilter, you get the correct result:

{noformat}
pig> pig -x local -t PushUpFilter -t ColumnMapKeyPrune ./test.pig
or
pig> pig -x local -t PushUpFilter -t SplitFilter ./test.pig
({(1)},1)
({(2),(2)},2)
({(3),(3),(3)},3)
{noformat}

Explain shows that the second column was cut off.

{noformat}
Incorrect case (-t PushUpFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)})ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]

Correct case (-t PushUpFilter -t SplitFilter)
G: (Name: LOStore Schema: mya#60:bag{#59:tuple(a#23:int)},setting#63:int)ColumnPrune:InputUids=[63, 60]ColumnPrune:OutputUids=[63, 60]
{noformat}

> ColumnPrune dropping used column due to
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3492
>                 URL: https://issues.apache.org/jira/browse/PIG-3492
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1, 0.12.1, 0.13.0
>            Reporter: Koji Noguchi
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema ->
> LogicalRelationalOperator.fixDuplicateUids() creates a new UID, but that UID
> is not propagated to the entire plan (since SplitFilter.reportChanges only
> returns the subplan).
> As a result, I am seeing ColumnPruning cut off those used columns.

--
This message was sent by Atlassian JIRA
(v6.1#6144)
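
For anyone following the thread, the failure mode described above (a UID renumbered in one part of the plan while the rest of the plan still refers to the old one) can be illustrated with a small toy model. This is only a sketch in Python, not Pig's optimizer code: `Column`, `Operator`, `renumber`, `propagate` and `prune` are hypothetical names, and UID 64 is invented for the example. Only the column names and UIDs 60 and 63 are taken from the explain output.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    uid: int


@dataclass
class Operator:
    name: str
    schema: list  # list of Column produced by this operator


def renumber(op, old_uid, new_uid):
    """Toy stand-in for a schema reset that assigns a fresh UID
    to one column of a single operator (hypothetical helper)."""
    for col in op.schema:
        if col.uid == old_uid:
            col.uid = new_uid


def propagate(required_uids, old_uid, new_uid):
    """Rewrite the downstream requirement set to use the new UID."""
    return {new_uid if u == old_uid else u for u in required_uids}


def prune(op, required_uids):
    """Toy column pruner: keep only columns whose UID is required downstream."""
    return [col.name for col in op.schema if col.uid in required_uids]


# E produces mya#60 and setting#63; the downstream operator needs both.
e = Operator("E", [Column("mya", 60), Column("setting", 63)])
required = {60, 63}
before = prune(e, required)            # both columns kept

# The UID changes inside E only; the downstream requirement set is stale.
renumber(e, 63, 64)                    # 64 is an invented UID
stale = prune(e, required)             # 'setting' no longer matches and is dropped

# Once the change reaches the rest of the plan, pruning is correct again.
fixed = prune(e, propagate(required, 63, 64))

print(before, stale, fixed)
```

In this model the pruner itself is behaving consistently; the column is lost because the required-UID set was computed against UIDs that no longer exist upstream, which matches the symptom of G listing UID 63 in its ColumnPrune annotations while its schema lacks the column.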