Yes and yes. In any case, the latest from SVN doesn't have this issue. Guessing it was 921 that did it.
-D

On Tue, Oct 13, 2009 at 4:01 PM, Alan Gates <ga...@yahoo-inc.com> wrote:
> Have you checked that each record in your input data has at least the
> number of fields you specify? Have you checked that the field separator
> in your data matches the default for PigPerformanceLoader (^A, I think)?
>
> Alan.
>
> On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote:
>
>> We ran into what looks like an edge-case bug in Pig, which causes it
>> to throw an IndexOutOfBoundsException (stack trace below). The script
>> just joins two relations. It looks like our data was generated
>> incorrectly and the join is empty, which may be what's causing the
>> failure. It also appears to happen only when at least one of the
>> inputs is on the large side (at least a few hundred megs). Any ideas
>> on what could be happening and how to zoom in on the underlying cause?
>> We are running off unmodified trunk.
>>
>> Script:
>>
>> register datagen.jar;
>> E = load 'Employee' using
>>     org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>>     (id, name, cc, dc);
>> D = load 'Department' using
>>     org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>>     (dept_id, dept_nm);
>> P = load 'Project' using
>>     org.apache.pig.test.utils.datagen.PigPerformanceLoader() as
>>     (id, emp_id, role);
>> R1 = JOIN E by dc, D by dept_id;
>> R2 = JOIN R1 by E::id, P by emp_id;
>> store R2 into 'TestCase2Output';
>>
>> The R2 join fails with the stack trace below. It also fails if we
>> pre-calculate R1, store it, and load it directly (so: load R1, load P,
>> join R1 by $0, P by emp_id). We've verified that the records in R1 and
>> R2 have the expected fields, etc.
>>
>> Stack trace:
>>
>> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:148)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:226)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:260)
>>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
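For anyone hitting this thread later: Alan's two checks (field count per record, and the ^A separator) can be done outside Pig with a one-line awk pass over the raw input. This is just a sketch; the sample file, its path, and the expected column count of 4 (matching the Employee schema above) are assumptions for illustration:

```shell
# Build a tiny hypothetical sample with Ctrl-A (\001) separated fields:
# one record with the expected 4 fields, one short record with only 2.
printf 'a\001b\001c\001d\nx\001y\n' > /tmp/employee_sample

# Use \001 as the field separator and count records whose field count
# (NF) differs from the 4 columns declared in the load schema.
awk -F'\001' 'NF != 4 { bad++ } END { print bad + 0 }' /tmp/employee_sample
```

A nonzero count means some records would come up short when Pig projects the declared fields, which matches the symptom of an IndexOutOfBoundsException inside POProject.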