[ https://issues.apache.org/jira/browse/PIG-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914749#comment-13914749 ]
Koji Noguchi commented on PIG-3782:
-----------------------------------

This error happens when PushDownForEachFlatten inserts a FOREACH after 'd=join' to move the flatten after the join for optimization. Somehow, this new foreach contains completely new UIDs for q1 and q2. You can see below that the new foreach has q1#25 and q2#26 instead of the q1#13 and q2#14 that are used later. This breaks the lineage tracking of ColumnMapKeyPrune.

Before PushDownForEachFlatten
{noformat}
|---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
|   |
|   (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
|   |   |
|   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
|   |   |
|   |   c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
|   |   |
|   |   c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
|   |
|   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
|   |
|   |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
|   |
|   |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
|
|---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray,b::b0#7:int,b::b1#8:bytearray)
|   |
|   a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
|   |
|   b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
|
|---c: (Name: LOForEach Schema: a0#1:int,q1#13:bytearray,q2#14:bytearray)
{noformat}

After PushDownForEachFlatten
{noformat}
|---e: (Name: LOForEach Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
|   |
|   (Name: LOGenerate[false,false,false] Schema: c::a0#1:int,c::q1#13:bytearray,c::q2#14:bytearray)
|   |   |
|   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
|   |   |
|   |   c::q1:(Name: Project Type: bytearray Uid: 13 Input: 1 Column: (*))
|   |   |
|   |   c::q2:(Name: Project Type: bytearray Uid: 14 Input: 2 Column: (*))
|   |
|   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
|   |
|   |---(Name: LOInnerLoad[1] Schema: c::q1#13:bytearray)
|   |
|   |---(Name: LOInnerLoad[2] Schema: c::q2#14:bytearray)
|
|---d: (Name: LOForEach Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
|   |
|   (Name: LOGenerate[false,true,false,false] Schema: c::a0#1:int,q1#25:bytearray,q2#26:bytearray,b::b0#7:int,b::b1#8:bytearray)
|   |   |
|   |   c::a0:(Name: Project Type: int Uid: 1 Input: 0 Column: (*))
|   |   |
|   |   c::a2:(Name: Project Type: bag Uid: 3 Input: 1 Column: (*))
|   |   |
|   |   b::b0:(Name: Project Type: int Uid: 7 Input: 2 Column: (*))
|   |   |
|   |   b::b1:(Name: Project Type: bytearray Uid: 8 Input: 3 Column: (*))
|   |
|   |---(Name: LOInnerLoad[0] Schema: c::a0#1:int)
|   |
|   |---c::a2: (Name: LOInnerLoad[1] Schema: null)
|   |
|   |---(Name: LOInnerLoad[2] Schema: b::b0#7:int)
|   |
|   |---(Name: LOInnerLoad[3] Schema: b::b1#8:bytearray)
|
|---d: (Name: LOJoin(HASH) Schema: c::a0#1:int,c::a2#3:bag{#4:tuple()},b::b0#7:int,b::b1#8:bytearray)
|   |
|   a0:(Name: Project Type: int Uid: 1 Input: 0 Column: 0)
|   |
|   b0:(Name: Project Type: int Uid: 7 Input: 1 Column: 0)
|
|---c: (Name: LOForEach Schema: a0#1:int,a2#3:bag{#4:tuple()})
{noformat}

> PushDownForEachFlatten + ColumnMapKeyPrune with user defined schema failing
> due to incorrect UID assignment
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3782
>                 URL: https://issues.apache.org/jira/browse/PIG-3782
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>
> {noformat}
> a = load '1.txt' as (a0:int, a1, a2:bag{});
> b = load '2.txt' as (b0:int, b1);
> c = foreach a generate a0, flatten(a2) as (q1, q2);
> d = join c by a0, b by b0;
> e = foreach d generate a0, q1, q2;
> f = foreach e generate a0, (int)q1, (int)q2;
> store f into 'output';
> {noformat}
> This Pig script fails with:
> 2014-02-27 11:49:45,657 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR
> 2229: Couldn't find matching uid -1 for project (Name: Project Type:
> bytearray Uid: 13 Input: 0 Column: 1)

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
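
The UID mismatch described in the comment can be illustrated with a toy model. The sketch below is not Pig's implementation: `find_column_by_uid`, `resolve`, and the schema lists are hypothetical names introduced only for illustration, and the `-1` return value for a missing UID is an assumption chosen to mirror the "matching uid -1" wording in the error message. It only shows why a projection that refers to UIDs 13 and 14 can no longer be matched once the inserted FOREACH exposes q1#25 and q2#26 instead.

```python
# Toy model of UID-based column lookup. NOT Pig's code; all names here are
# hypothetical and exist only to illustrate the mismatch in the plan dumps.

def find_column_by_uid(schema, uid):
    """Return the position of the field carrying `uid`, or -1 if absent."""
    for position, (_alias, field_uid) in enumerate(schema):
        if field_uid == uid:
            return position
    return -1

def resolve(schema, uids):
    """Look up each projected UID in the predecessor's schema."""
    return [find_column_by_uid(schema, uid) for uid in uids]

# Schema of 'd' before the rewrite: the join output keeps q1#13 and q2#14.
d_before = [("c::a0", 1), ("c::q1", 13), ("c::q2", 14), ("b::b0", 7), ("b::b1", 8)]

# Schema of the FOREACH inserted by the rewrite: q1 and q2 carry new UIDs.
d_after = [("c::a0", 1), ("q1", 25), ("q2", 26), ("b::b0", 7), ("b::b1", 8)]

# 'e' still projects by the original UIDs (a0#1, q1#13, q2#14).
e_projected_uids = [1, 13, 14]

print(resolve(d_before, e_projected_uids))  # [0, 1, 2]
print(resolve(d_after, e_projected_uids))   # [0, -1, -1]
```

In this model the lookup for UIDs 13 and 14 succeeds against the pre-rewrite schema and fails against the post-rewrite one, which is the kind of broken lineage the plan dumps show.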