[ https://issues.apache.org/jira/browse/PIG-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13224076#comment-13224076 ]

Daniel Dai commented on PIG-2563:
---------------------------------

bq. Cheap code style comments
Sure, will change.

bq. More expensive code content comments
Not sure I completely understand your point, so let me explain the design of the
foreach nested plan and why I made the change. Let me know if you need further
explanation. The uid and schema inference process is at the very core of the
logical plan: anyone who changes any part of it needs to make sure existing
functionality is not broken. In the patch, I change the way project infers its
uid, because previously it did not generate a new uid for the new bag produced
by a nested foreach. Here is how uids work in a foreach inner plan:
# Every foreach inner plan starts with LOInnerLoad operators and ends with LOGenerate.
# A simple foreach should keep uids. For example, in foreach a generate $1, $2 we keep the uids of $1 and $2, even if a column is a bag; a couple of places in the code rely on this assumption.
# If the input column is a bag, LOInnerLoad takes the bag's inner schema. For example, if $1 is bag#2{t#3(x#4, y#5)}, LOInnerLoad has the schema (x#4, y#5) and can be followed by nested operators.
# LOGenerate rebuilds the bag at the end of the inner operator pipeline (in this case bag#2{t#3(x#4, y#5)}), and it needs to keep the uids.
# Currently no nested operator changes uids except ForEach. That is the approach I took in the patch: reuse the uid unless a ForEach is seen.
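The rule in the last item can be sketched as a small Java model. This is an illustrative sketch, not Pig's implementation: the NestedOp enum, the nextUid counter and bagUidForGenerate are hypothetical names, and the only behavior encoded is the rule stated above (reuse the input bag's uid unless the nested pipeline contains a ForEach).

{code:java}
import java.util.List;

/**
 * Illustrative sketch of the nested-plan uid rule; NOT Pig's implementation.
 * NestedOp, nextUid and bagUidForGenerate are hypothetical names.
 */
public class NestedUidRule {

    /** Nested operators, loosely named after LOFilter, LOSort, LODistinct, LOLimit, LOForEach. */
    enum NestedOp { INNER_LOAD, FILTER, SORT, DISTINCT, LIMIT, FOREACH }

    // Hypothetical uid source; real Pig allocates uids through its own mechanism.
    private static long nextUid = 100;

    /** As described in the comment, ForEach is the only nested operator that changes uids. */
    static boolean changesUid(List<NestedOp> pipeline) {
        return pipeline.contains(NestedOp.FOREACH);
    }

    /** Uid for the bag LOGenerate rebuilds: reuse the input uid unless a ForEach was seen. */
    static long bagUidForGenerate(long inputBagUid, List<NestedOp> pipeline) {
        return changesUid(pipeline) ? nextUid++ : inputBagUid;
    }

    public static void main(String[] args) {
        long a2Uid = 2; // a2:bag#2 in the examples
        System.out.println(bagUidForGenerate(a2Uid, List.of(NestedOp.INNER_LOAD)));                   // 2
        System.out.println(bagUidForGenerate(a2Uid, List.of(NestedOp.INNER_LOAD, NestedOp.FILTER)));  // 2
        System.out.println(bagUidForGenerate(a2Uid, List.of(NestedOp.INNER_LOAD, NestedOp.FOREACH))); // 100
    }
}
{code}

The sketch only tracks the outer bag uid; a fuller model would also allocate a new uid for the inner tuple when a ForEach is present.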

Here are complete examples:
{code}
b = foreach a generate a1, a2; -- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                    LOGenerate(a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})
{code}

{code}
b = foreach a { c = filter a2 by x==1; generate a1, c; }; -- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                     \        LOFilter(x#4, y#5)
                      \        /
                    LOGenerate(a1:chararray#1, c:bag#2{t#3(x#4, y#5)})
{code}

{code}
b = foreach a { c = a2.x; generate a1, c; }; -- a: (a0:xxxx, a1:chararray#1, a2:bag#2{t#3(x#4, y#5)})

LOInnerLoad(a1:chararray)     LOInnerLoad(x#4, y#5)
                    \            /
                     \        LOForEach(x#4)
                      \        /
                    LOGenerate(a1:chararray#1, c:bag#7{t#6(x#4)})
{code}
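The three examples can be reproduced with a toy schema model. This is a hypothetical sketch assuming Java 16+ (for records): Field, Kind, innerLoad, filter, project and generateBag are illustrative names, not Pig classes or methods, and fields print as name#uid with types omitted. The uid counter starts at 6 only so that the third example lines up with c:bag#7{t#6(x#4)}.

{code:java}
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical model of the nested foreach examples; not Pig code. */
public class NestedForeachExamples {

    enum Kind { ATOM, TUPLE, BAG }

    /** Schema field printed as name#uid, with {..} for bags and (..) for tuples. */
    record Field(String name, Kind kind, int uid, List<Field> inner) {
        static Field atom(String name, int uid) {
            return new Field(name, Kind.ATOM, uid, List.of());
        }

        @Override
        public String toString() {
            String body = inner.stream().map(Field::toString).collect(Collectors.joining(", "));
            return switch (kind) {
                case ATOM -> name + "#" + uid;
                case TUPLE -> name + "#" + uid + "(" + body + ")";
                case BAG -> name + "#" + uid + "{" + body + "}";
            };
        }
    }

    // Hypothetical uid counter; 6 is chosen only to match the third example.
    private int nextUid = 6;

    /** LOInnerLoad on a bag column exposes the fields of the bag's tuple. */
    static List<Field> innerLoad(Field bag) {
        return bag.inner().get(0).inner();
    }

    /** LOFilter drops rows, not columns, so fields and uids pass through. */
    static List<Field> filter(List<Field> fields) {
        return fields;
    }

    /** Nested projection (a nested ForEach): keeps the selected fields with their uids. */
    static List<Field> project(List<Field> fields, String... names) {
        List<String> wanted = List.of(names);
        return fields.stream().filter(f -> wanted.contains(f.name())).collect(Collectors.toList());
    }

    /** LOGenerate rebuilds the bag: reuse bag and tuple uids unless a ForEach was seen. */
    Field generateBag(String name, Field inputBag, List<Field> fields, boolean sawForEach) {
        Field tuple = inputBag.inner().get(0);
        if (!sawForEach) {
            return new Field(name, Kind.BAG, inputBag.uid(),
                    List.of(new Field(tuple.name(), Kind.TUPLE, tuple.uid(), fields)));
        }
        int tupleUid = nextUid++;
        int bagUid = nextUid++;
        return new Field(name, Kind.BAG, bagUid,
                List.of(new Field(tuple.name(), Kind.TUPLE, tupleUid, fields)));
    }

    public static void main(String[] args) {
        Field a1 = Field.atom("a1", 1);
        Field a2 = new Field("a2", Kind.BAG, 2, List.of(
                new Field("t", Kind.TUPLE, 3, List.of(Field.atom("x", 4), Field.atom("y", 5)))));
        NestedForeachExamples gen = new NestedForeachExamples();

        // b = foreach a generate a1, a2;                       prints a1#1, a2#2{t#3(x#4, y#5)}
        System.out.println(a1 + ", " + gen.generateBag("a2", a2, innerLoad(a2), false));
        // b = foreach a { c = filter a2 by x==1; generate a1, c; };  prints a1#1, c#2{t#3(x#4, y#5)}
        System.out.println(a1 + ", " + gen.generateBag("c", a2, filter(innerLoad(a2)), false));
        // b = foreach a { c = a2.x; generate a1, c; };           prints a1#1, c#7{t#6(x#4)}
        System.out.println(a1 + ", " + gen.generateBag("c", a2, project(innerLoad(a2), "x"), true));
    }
}
{code}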
                
> IndexOutOfBoundsException: while projecting fields from a bag
> -------------------------------------------------------------
>
>                 Key: PIG-2563
>                 URL: https://issues.apache.org/jira/browse/PIG-2563
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.9.1, 0.10
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>             Fix For: 0.10, 0.11
>
>         Attachments: PIG-2563-1.patch
>
>
> The script below fails with Pig 0.9 / Pig 0.10 but works fine with Pig 0.8.
> {code}
> A = load 'i1' as (a,b,c:chararray);
> B = load 'i2' as (d,e,f:chararray);
> C = cogroup A by a, B by d;
> D = foreach C { 
>   tmp = B.d;
>   tmp_dis = distinct tmp;
>   generate A,B,tmp_dis ; } ;
> E = foreach D generate B.(d,e) as v;
> dump E;
> {code}
> The script fails with the exception below. It looks like DereferenceExpression
> is using the wrong schema to build the inner schema.
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>       at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>       at java.util.ArrayList.get(ArrayList.java:322)
>       at org.apache.pig.newplan.logical.relational.LogicalSchema.getField(LogicalSchema.java:653)
>       at org.apache.pig.newplan.logical.expression.DereferenceExpression.getFieldSchema(DereferenceExpression.java:167)
>       at org.apache.pig.newplan.logical.relational.LOGenerate.getSchema(LOGenerate.java:88)
>       at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:160)
>       at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:242)
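One plausible reading of this trace, consistent with the uid explanation in the comment but not confirmed by the report, is that B and tmp_dis end up sharing a bag uid (tmp = B.d is a nested projection that, before the patch, did not get a new uid, and distinct keeps it), so resolving B's inner schema can return tmp_dis's one-field schema {(d)}. The hypothetical Java sketch below (UidCollisionSketch, innerSchemaByUid and dereference are illustrative names, not Pig code, and the uid-keyed map is an assumed simplification) shows how that would produce an IndexOutOfBoundsException when the second field e is dereferenced.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a uid collision; not Pig code. */
public class UidCollisionSketch {

    /** Assumed uid-keyed lookup of a bag column's inner field names. */
    static List<String> innerSchemaByUid(Map<Integer, List<String>> schemas, int uid) {
        return schemas.get(uid);
    }

    /** Positional dereference: B.(d,e) needs positions 0 (d) and 1 (e). */
    static String dereference(List<String> innerSchema, int position) {
        return innerSchema.get(position); // IndexOutOfBoundsException if the schema is too short
    }

    public static void main(String[] args) {
        int sharedUid = 10; // hypothetical uid that B and tmp_dis both end up with
        Map<Integer, List<String>> schemas = new HashMap<>();
        schemas.put(sharedUid, new ArrayList<>(List.of("d", "e", "f"))); // B: {(d, e, f)}
        schemas.put(sharedUid, new ArrayList<>(List.of("d")));           // tmp_dis: {(d)} replaces B's entry
        List<String> resolved = innerSchemaByUid(schemas, sharedUid);
        System.out.println(dereference(resolved, 0)); // d
        try {
            dereference(resolved, 1); // e is no longer present
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
{code}

Giving the projected bag a fresh uid, as the patch does, would keep the two entries apart in a model like this.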
