[ 
https://issues.apache.org/jira/browse/PIG-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890942#comment-13890942
 ] 

Koji Noguchi commented on PIG-3347:
-----------------------------------

bq. UID is to track column lineage in the logical optimizer, so that we can 
freely move operators up and down; ProjectionPatcher will reposition the column 
according to uid

I think part of my confusion comes from the UID serving two purposes: 
(1) tracking column lineage, and (2) letting ProjectionPatcher reposition 
columns, which requires the UID to be unique within each relation.

Because of (2), a new UID is created whenever a column is referenced more 
than once. For example:
A = load 'a.txt' as (a:int);
B = foreach A generate a as col1, a as col2; 

This would create a schema like 
{noformat}
1-2: (Name: LOStore Schema: col1#1:int,col2#2:int)
...
    |---A: (Name: LOLoad Schema: a#1:int)RequiredFields:null
{noformat}

So without traversing the lineage, I cannot connect 'col2' to the original 'a'.
However, optimizer rules like PushUpFilter and FilterAboveForeach seem to use 
only the UID to determine field usage.
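
To make the concern concrete, here is a toy Python model of the two schemas 
above. It is only a sketch and not Pig's actual classes: the names Field, 
find_by_uid, origin and the lineage map are all hypothetical, and the map simply 
assumes that some record of "output uid derived from input uid" exists. It shows 
that a lookup by UID alone finds 'a' for col1 but not for col2.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Field:
    alias: str
    uid: int


# Schemas copied from the plan dump above:
# a#1 in the load, col1#1 and col2#2 after the foreach.
load_schema: List[Field] = [Field("a", 1)]
foreach_schema: List[Field] = [Field("col1", 1), Field("col2", 2)]

# Hypothetical lineage record: output uid -> input uid it was derived from.
# The second reference to 'a' received a fresh uid (2), so only this map
# remembers that it came from uid 1.
lineage: Dict[int, int] = {1: 1, 2: 1}


def find_by_uid(schema: List[Field], uid: int) -> Optional[Field]:
    """Look up a field by uid alone, with no lineage traversal."""
    for field in schema:
        if field.uid == uid:
            return field
    return None


def origin(uid: int) -> Optional[Field]:
    """Resolve an output uid to its source field via the lineage map."""
    return find_by_uid(load_schema, lineage[uid])


col1, col2 = foreach_schema
print(find_by_uid(load_schema, col1.uid))  # finds 'a': the uid was preserved
print(find_by_uid(load_schema, col2.uid))  # finds nothing: uid 2 is not in the load schema
print(origin(col2.uid))                    # finds 'a' once lineage is consulted
```

In this model, any rule that decides field usage from the UID alone would treat 
col2 as unrelated to 'a', which is the situation I am worried about.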

But this is outside the scope of this JIRA.  I need to spend more time learning 
how the Pig compiler works.

> Store invocation brings side effect
> -----------------------------------
>
>                 Key: PIG-3347
>                 URL: https://issues.apache.org/jira/browse/PIG-3347
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.11
>         Environment: local mode
>            Reporter: Sergey
>            Assignee: Daniel Dai
>            Priority: Critical
>             Fix For: 0.12.1
>
>         Attachments: PIG-3347-1.patch, PIG-3347-2-testonly.patch, 
> PIG-3347-3.patch, PIG-3347-4-testonly.patch
>
>
> The problem is that an intermediate 'store' invocation "changes" the final 
> store output. It looks like it introduces some kind of side effect. We ran 
> the script in 'local' mode.
> Here is the input data:
> 1
> 1
> Here is the script:
> {code}
> a = load 'test';
> a_group = group a by $0;
> b = foreach a_group {
>   a_distinct = distinct a.$0;
>   generate group, a_distinct;
> }
> --store b into 'b';
> c = filter b by SIZE(a_distinct) == 1;
> store c into 'out';
> {code}
> We expect the output to be:
> 1 1
> The actual output is an empty file.
> Uncomment the {code}--store b into 'b';{code} line and see the difference.
> You would get the expected output.



