[ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487965#comment-13487965
 ] 

Rohini Palaniswamy commented on PIG-2970:
-----------------------------------------

Gianmarco,
 Thanks for the pointer. They are related. The plan for your example in 
PIG-2219 gets generated as 

{noformat}
a = load 'b.txt' AS (id:chararray, num:int);
b = group a by id;
c = foreach b { 
  d = order a by num DESC;
  n = COUNT(a);
  e = limit d 1;
  generate n;
}

|---c: (Name: LOForEach Schema: id#1:chararray,num#2:int)
    |   |
    |   e: (Name: LOLimit Schema: id#1:chararray,num#2:int)
    |   |
    |   |---d: (Name: LOSort Schema: id#1:chararray,num#2:int)
    |       |   |
    |       |   num:(Name: Project Type: int Uid: 2 Input: 0 Column: 1)
    |       |
    |       |---a: (Name: LOInnerLoad[1] Schema: id#1:chararray,num#2:int)
    |   
    |   (Name: LOGenerate[false] Schema: #6:long)
    |   |   |
    |   |   (Name: UserFunc(org.apache.pig.builtin.COUNT) Type: long Uid: 6)
    |   |   |
    |   |   |---a:(Name: Project Type: bag Uid: 3 Input: 0 Column: (*))
    |   |
    |   |---a: (Name: LOInnerLoad[1] Schema: id#1:chararray,num#2:int)

For the query in this example:
|---c: (Name: LOForEach Schema: v1#2:bytearray)
        |   |
        |   e: (Name: LODistinct Schema: v1#2:bytearray)
        |   |
        |   |---d: (Name: LOForEach Schema: v1#2:bytearray)
        |       |   |
        |       |   (Name: LOGenerate[false] Schema: v1#2:bytearray)
        |       |   |   |
        |       |   |   v1:(Name: Project Type: bytearray Uid: 2 Input: 0 Column: (*))
        |       |   |
        |       |   |---(Name: LOInnerLoad[1] Schema: v1#2:bytearray)
        |       |
        |       |---a: (Name: LOInnerLoad[1] Schema: id#1:bytearray,v1#2:bytearray)
        |   
        |   (Name: LOGenerate[false,false] Schema: group#1:bytearray,a#3:bag{#5:tuple(id#1:bytearray,v1#2:bytearray)})
        |   |   |
        |   |   group:(Name: Project Type: bytearray Uid: 1 Input: 0 Column: (*))
        |   |   |
        |   |   a:(Name: Project Type: bag Uid: 3 Input: 1 Column: (*))
        |   |
        |   |---(Name: LOInnerLoad[0] Schema: group#1:bytearray)
        |   |
        |   |---a: (Name: LOInnerLoad[1] Schema: id#1:bytearray,v1#2:bytearray)


{noformat}

The problem is that the schema for the ForEach gets set based on the first 
leaf of its inner plan. In your case, the schemas of both leaves contained the 
same required fields, so there was no error. In Koji's case the two leaves 
have different schemas, hence the error. 

Koji,
   Connecting the two separate nodes just to get the schema correct changes 
the plan so that the node is no longer dangling. My take on this is that we 
should move the DanglingNestedNodeRemover (which was written to handle this 
scenario) from HExecutionEngine to LogicalPlanBuilder.buildForeachOp(), 
before the SchemaResetter is called through expandAndResetVisitor, so that the 
dangling node is removed during construction itself and the correct schema is 
set by SchemaResetter. Thoughts?
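
To make the ordering argument concrete, here is a toy Python model of the 
two leaves from Koji's plan. It is only a sketch and not Pig's actual code: 
`Leaf`, `schema_from_first_leaf` and `remove_dangling` are hypothetical names, 
and treating every non-LOGenerate leaf as dangling is a simplifying 
assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Leaf:
    """A sink of the nested plan inside a foreach (toy stand-in, not a Pig class)."""
    name: str          # operator name, e.g. "LODistinct" or "LOGenerate"
    schema: List[str]  # field names the operator produces


def schema_from_first_leaf(leaves: List[Leaf]) -> List[str]:
    # Models the behaviour described above: the foreach schema is taken
    # from whichever leaf happens to come first.
    return leaves[0].schema


def remove_dangling(leaves: List[Leaf]) -> List[Leaf]:
    # Assumption for this sketch: any leaf that is not the generate
    # operator is dangling, because nothing consumes its output.
    return [leaf for leaf in leaves if leaf.name == "LOGenerate"]


# Leaves in the order they appear in the plan for this issue.
leaves = [
    Leaf("LODistinct", ["v1"]),          # e: the unused distinct
    Leaf("LOGenerate", ["group", "a"]),  # the real output of the foreach
]

# Schema computed while the dangling leaf is still present.
buggy = schema_from_first_leaf(leaves)

# Schema computed after the dangling leaf has been removed first.
fixed = schema_from_first_leaf(remove_dangling(leaves))

print(buggy)
print(fixed)
```

In this model, resetting the schema before removal picks up the schema of the 
distinct, while removing the dangling leaf first leaves only the generate 
schema to choose from.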
                
> Nested foreach getting incorrect schema when having unrelated inner query
> -------------------------------------------------------------------------
>
>                 Key: PIG-2970
>                 URL: https://issues.apache.org/jira/browse/PIG-2970
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>             Fix For: 0.11, 0.12
>
>         Attachments: pig-2970-trunk-v01.txt
>
>
> While looking at PIG-2968, hit a weird error message.
> {noformat}
> $ cat -n test/foreach2.pig
>      1  daily = load 'nyse' as (exchange, symbol);
>      2  grpd = group daily by exchange;
>      3  unique = foreach grpd {
>      4          sym = daily.symbol;
>      5          uniq_sym = distinct sym;
>      6          --ignoring uniq_sym result
>      7          generate group, daily;
>      8  };
>      9  describe unique;
>     10  zzz = foreach unique generate group;
>     11  explain zzz;
> % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
> ...
> unique: {symbol: bytearray}
> 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: 
> <file test/foreach2.pig, line 10, column 30> Invalid field projection. Projected field [group] does not exist in schema: symbol:bytearray.
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
