[ https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037483#comment-13037483 ]
ey-chih chow commented on AVRO-792:
-----------------------------------

Sorry, I found that no data was sent to the reducer for the above job, so I still need to investigate further.

> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
>                 Key: AVRO-792
>                 URL: https://issues.apache.org/jira/browse/AVRO-792
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Mac with VMWare running Linux training-vm-Ubuntu
>            Reporter: ey-chih chow
>            Priority: Blocker
>             Fix For: 1.5.2
>
>         Attachments: AVRO-792-2.patch, AVRO-792-3.patch, AVRO-792.patch, part-00000.avro, part-00000.avro, part-00001.avro, part-00001.avro
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We have an Avro map/reduce job that used to work with Avro 1.4 but is broken
> with Avro 1.5. The M/R job with Avro 1.5 worked fine in our debugging
> environment but broke when we moved to a real cluster. In one test run, the
> job had 23 reducers. Four of them succeeded and the rest failed with an
> ArrayIndexOutOfBoundsException.
> Here are two instances of the stack traces:
> =====================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>     at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>     at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>     at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>     at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
>     at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
>     at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>     at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>     at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>     at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>     at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>     at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>     at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>     at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
>     at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
>     at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>     at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================
> The signature of our map() is:
>     public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
>     public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/deserialization between
> Avro 1.4 and 1.5, but we could not figure out why the exceptions were generated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
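
A note on reading the two stack traces: both fail in Symbol$Alternative.getSymbol, reached from ResolvingDecoder.readIndex, which is where a union branch index is read. The Avro binary encoding stores that index as a zig-zag varint, so a reader that starts decoding at the wrong byte can come back with an arbitrary integer, negative or far larger than the number of branches. The sketch below only illustrates that mechanism in plain Java. It is not a diagnosis of this issue and does not use Avro's classes; the class name UnionIndexSketch, the helper names, and the two-branch union are all hypothetical.

```java
// Illustration only: how a misaligned read of a zig-zag varint can yield an
// out-of-range union index. None of these names come from Avro itself.
class UnionIndexSketch {

    // Decode one zig-zag varint int starting at offset pos in buf.
    static int readZigZagInt(byte[] buf, int pos) {
        int raw = 0;
        int shift = 0;
        while (true) {
            int b = buf[pos++] & 0xff;
            raw |= (b & 0x7f) << shift;   // low 7 bits carry data
            if ((b & 0x80) == 0) {        // high bit clear: last byte of the varint
                break;
            }
            shift += 7;
            if (shift > 28) {
                throw new IllegalArgumentException("varint too long for an int");
            }
        }
        return (raw >>> 1) ^ -(raw & 1);  // undo zig-zag encoding
    }

    // Pick a union branch by the index decoded at pos. An index outside
    // [0, branches.length) throws ArrayIndexOutOfBoundsException.
    static String selectBranch(String[] branches, byte[] buf, int pos) {
        int index = readZigZagInt(buf, pos);
        return branches[index];
    }

    public static void main(String[] args) {
        String[] branches = {"null", "string"};   // hypothetical two-branch union
        byte[] data = {0x02, 0x50};               // 0x02 is index 1; 0x50 is unrelated payload

        // Aligned read: decodes index 1 and selects "string".
        System.out.println(selectBranch(branches, data, 0));

        // Misaligned read, one byte late: the payload byte is decoded as an index.
        try {
            selectBranch(branches, data, 1);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

In this sketch the payload byte 0x50 decodes to 40, which is out of range for a two-branch union. Whether the reducers in this report hit a misaligned or corrupted stream is an open question that the traces alone do not settle.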