[ https://issues.apache.org/jira/browse/PIG-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anthony Hsu updated PIG-3972: ----------------------------- Description: {code:title=test1.txt} {(,A1111,A),(,B222,B),(,C333,C)} {code} {code:title=test2.txt} A Helloworld B Pig C Hive {code} {code:title=tt.pig} A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)}); A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray); B = LOAD 'test2.txt' AS (id:chararray, content:chararray); C = JOIN A1 BY id LEFT OUTER, B BY id; dump C; {code} {code} $ pig -x local -f tt.pig .... java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:604) at java.util.ArrayList.get(ArrayList.java:382) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441) [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failu re if you want Pig to stop immediately on failure. ============== {code} If we change the code into this one: {code:title=tt.pig} A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)}); A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray); A1 = FOREACH A1 generate title,name,id; B = LOAD 'test2.txt' AS (id:chararray, content:chararray); C = JOIN A1 BY id LEFT OUTER, B BY id; dump C; {code} The job succeed, and here is the result of execution. {code} ======== (,A1111,A,A,Helloworld) (,B222,B,B,Pig) (,C333,C,C,Hive) (,,,,) ======== {code} was: $ cat test1.txt {(,A1111,A),(,B222,B),(,C333,C)} $ cat test2.txt A Helloworld B Pig C Hive $ cat tt.pig A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)}); A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray); B = LOAD 'test2.txt' AS (id:chararray, content:chararray); C = JOIN A1 BY id LEFT OUTER, B BY id; dump C; $ pig -x local -f tt.pig .... java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:604) at java.util.ArrayList.get(ArrayList.java:382) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441) [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failu re if you want Pig to stop immediately on failure. ============== If we change the code into this one: $ cat tt.pig A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, id:chararray)}); A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, id:chararray); A1 = FOREACH A1 generate title,name,id; B = LOAD 'test2.txt' AS (id:chararray, content:chararray); C = JOIN A1 BY id LEFT OUTER, B BY id; dump C; The job succeed, and here is the result of execution. ======== (,A1111,A,A,Helloworld) (,B222,B,B,Pig) (,C333,C,C,Hive) (,,,,) ======== > java.lang.IndexOutOfBoundsException when flatten meets an empty row > ------------------------------------------------------------------- > > Key: PIG-3972 > URL: https://issues.apache.org/jira/browse/PIG-3972 > Project: Pig > Issue Type: Bug > Affects Versions: 0.12.0, 0.13.0 > Reporter: Bing Jiang > Assignee: Mona Chitnis > > {code:title=test1.txt} > {(,A1111,A),(,B222,B),(,C333,C)} > {code} > {code:title=test2.txt} > A Helloworld > B Pig > C Hive > {code} > {code:title=tt.pig} > A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, > id:chararray)}); > A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, > id:chararray); > B = LOAD 'test2.txt' AS (id:chararray, content:chararray); > C = JOIN A1 BY id LEFT OUTER, B BY id; > dump C; > {code} > {code} > $ pig -x local -f tt.pig > .... > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.rangeCheck(ArrayList.java:604) > at java.util.ArrayList.get(ArrayList.java:382) > at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:115) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getValueTuple(POPackage.java:350) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:273) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.processOnePackageOutput(PigGenericMapReduce.java:425) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:416) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.reduce(PigGenericMapReduce.java:256) > at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:170) > at > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:636) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:396) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:441) > [main] WARN > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Ooops! Some job has failed! Specify -stop_on_failu > re if you want Pig to stop immediately on failure. > ============== > {code} > If we change the code into this one: > {code:title=tt.pig} > A = LOAD 'test1.txt' AS (mybag:bag{t:tuple(title:chararray,name:chararray, > id:chararray)}); > A1 = FOREACH A generate flatten(mybag) as (title:chararray,name:chararray, > id:chararray); > A1 = FOREACH A1 generate title,name,id; > B = LOAD 'test2.txt' AS (id:chararray, content:chararray); > C = JOIN A1 BY id LEFT OUTER, B BY id; > dump C; > {code} > The job succeed, and here is the result of execution. > {code} > ======== > (,A1111,A,A,Helloworld) > (,B222,B,B,Pig) > (,C333,C,C,Hive) > (,,,,) > ======== > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)