We ran into what looks like an edge-case bug in Pig that causes it to throw an IndexOutOfBoundsException (stack trace below). The script just joins two relations. It looks like our data was generated incorrectly and the join result is empty, which may be what's causing the failure. It also appears to happen only when at least one of the inputs is on the large side (at least a few hundred MB). Any ideas on what could be happening and how to narrow down the underlying cause? We are running unmodified trunk.
Script:

register datagen.jar;
E = load 'Employee' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,name,cc,dc);
D = load 'Department' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (dept_id,dept_nm);
P = load 'Project' using org.apache.pig.test.utils.datagen.PigPerformanceLoader() as (id,emp_id,role);
R1 = JOIN E by dc, D by dept_id;
R2 = JOIN R1 by E::id, P by emp_id;
store R2 into 'TestCase2Output';

The R2 join fails with the stack trace below. It also fails if we pre-compute R1, store it, and load it directly (that is, load R1, load P, and join R1 by $0, P by emp_id). We've verified that the records in R1 and R2 have the expected fields, etc.

Stack trace:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)
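One possible reading of the trace, offered as a hypothesis rather than a diagnosis: DefaultTuple.get is asked for field index 1 of a tuple that holds only one field, while POLocalRearrange is projecting a join key on the map side. In this script the only join key at index 1 is emp_id in P, so a Project record that parsed into a single field would fail exactly this way, and a stray malformed line becomes more likely as input grows. The Java sketch below uses plain java.util lists rather than Pig's Tuple API. It reproduces that failure mode and scans raw lines for records with too few fields. The helpers toTuple, project and findShortRecords, the sample records, and the Ctrl-A delimiter are all hypothetical; PigPerformanceLoader's actual parsing and delimiter may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShortRecordCheck {

    // Hypothetical helper: split a raw line on a single-character delimiter,
    // keeping empty trailing fields. The real loader's parsing may differ.
    static List<Object> toTuple(String line, char delimiter) {
        List<Object> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start));
        return fields;
    }

    // Mimics projecting field `index` from a list-backed tuple:
    // ArrayList.get throws IndexOutOfBoundsException if the tuple is too short.
    static Object project(List<Object> tuple, int index) {
        return tuple.get(index);
    }

    // Return the 1-based line numbers whose records have fewer than minFields fields.
    static List<Integer> findShortRecords(List<String> lines, char delimiter, int minFields) {
        List<Integer> bad = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            if (toTuple(lines.get(n), delimiter).size() < minFields) {
                bad.add(n + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        char delim = (char) 1; // assumed Ctrl-A delimiter; adjust to match the loader
        int empIdIndex = 1;    // emp_id is field 1 of P (id, emp_id, role)

        List<String> projectLines = Arrays.asList(
                "100" + delim + "7" + delim + "lead",
                "101",                                  // malformed: only one field
                "102" + delim + "8" + delim + "dev");

        // Locate records that cannot supply emp_id.
        System.out.println("short records at lines: "
                + findShortRecords(projectLines, delim, empIdIndex + 1));

        // Projecting emp_id from the malformed record reproduces the failure mode.
        try {
            project(toTuple(projectLines.get(1), delim), empIdIndex);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("projection failed: " + e.getMessage());
        }
    }
}
```

Running the same kind of field-count check over the real Project (and stored R1) files, with the loader's actual delimiter, would show whether any short records exist. The exact exception message wording depends on the JDK version.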