[ 
https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016669#comment-13016669
 ] 

Scott Carey commented on AVRO-792:
----------------------------------

The performance changes for the single-threaded use case are acceptable now.
There is no difference for any use case that re-uses the DatumReader.  For 
those that create a new GenericDatumReader for each read, there is a 5% 
decrease.

However, Perf.java does not test multi-threaded use.  A thread using a 
GenericDatumReader that it did not create is not so lucky.

I added a new test to Perf.java for this case.  This case is still quite slow.
The code for that is below:
{code}
  static class GenericThreadCreateReader extends GenericTest {
    public GenericThreadCreateReader() throws IOException {
      super("GenericThreadCreateReader_");
      isWriteTest = false;
    }
    GenericDatumReader<Object> r = null;
    
    @Override
    protected GenericDatumReader<Object> getReader() {
      Thread t = new Thread() {
        @Override
        public void run() {
          r = newReader();
        }
      };
      // start() runs newReader() on the new thread; run() would execute it on the caller's thread
      t.start();
      try {
        t.join();
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      return r;
    }
  }
{code}
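
To illustrate why a thread that did not create the reader could be slower, here is a minimal sketch of one design that would produce that asymmetry: the creating thread gets a plain field as a fast path, and every other thread falls back to a ThreadLocal map lookup. This is an assumption for illustration only, not the code in the attached patches; CachingReader, getResolver() and buildResolver() are hypothetical names, and a String stands in for a real schema.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reader that caches an expensive per-schema resolver. */
class CachingReader {
  // Slow path: each thread keeps its own cache, reached through a ThreadLocal lookup.
  private static final ThreadLocal<Map<String, Object>> PER_THREAD =
      ThreadLocal.withInitial(HashMap::new);

  private final String schema; // stand-in for a real schema object
  private final Thread creator = Thread.currentThread();
  private Object creatorResolver; // read and written only by the creator thread

  // Counters so the demo can show which path was taken.
  final AtomicInteger fastPath = new AtomicInteger();
  final AtomicInteger slowPath = new AtomicInteger();

  CachingReader(String schema) {
    this.schema = schema;
  }

  Object getResolver() {
    if (Thread.currentThread() == creator) {
      // Fast path: no map lookup, no synchronization.
      fastPath.incrementAndGet();
      if (creatorResolver == null) {
        creatorResolver = buildResolver();
      }
      return creatorResolver;
    }
    // Slow path: any thread other than the creator pays for the lookup on every call.
    slowPath.incrementAndGet();
    return PER_THREAD.get().computeIfAbsent(schema, s -> buildResolver());
  }

  private Object buildResolver() {
    return new Object(); // placeholder for the expensive construction
  }
}

class CachingReaderDemo {
  /** Creates the reader on a separate thread and hands it back to the caller. */
  static CachingReader createOnOtherThread(String schema) {
    CachingReader[] holder = new CachingReader[1];
    Thread t = new Thread(() -> holder[0] = new CachingReader(schema));
    t.start();
    try {
      t.join(); // join() makes the write to holder[0] visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return holder[0];
  }

  public static void main(String[] args) {
    CachingReader own = new CachingReader("s");
    own.getResolver();
    CachingReader foreign = createOnOtherThread("s");
    foreign.getResolver();
    System.out.println("own:     fast=" + own.fastPath + " slow=" + own.slowPath);
    System.out.println("foreign: fast=" + foreign.fastPath + " slow=" + foreign.slowPath);
  }
}
```

In this sketch the reader handed over from another thread never reaches the fast path, which is the situation the test above is meant to exercise.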

This case may not be very important for now.
I plan to propose some changes to the DatumReaders for performance and thread 
safety for the 1.6.0 timeframe that will avoid all this by making them 
immutable.  setSchema() makes things very complicated.
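
As a rough sketch of what "immutable" could look like (an assumption for illustration, not a proposed Avro API): the schema is fixed in the constructor, all fields are final, and changing the schema means building a new reader instead of calling a setter. ImmutableReader, withSchema() and the String schema are hypothetical stand-ins.

```java
import java.util.Objects;

/** Hypothetical reader whose schema is fixed at construction time. */
final class ImmutableReader {
  private final String schema; // stand-in for a real schema object

  ImmutableReader(String schema) {
    this.schema = Objects.requireNonNull(schema, "schema");
  }

  String schema() {
    return schema;
  }

  /** Takes the place of a setter: returns a new reader and leaves this one untouched. */
  ImmutableReader withSchema(String newSchema) {
    return new ImmutableReader(newSchema);
  }

  /** Placeholder read; it depends only on final state, so it is safe to call from any thread. */
  String read(String datum) {
    return schema + ":" + datum;
  }
}
```

With no setter there is no per-thread or per-creator state to track, so an instance like this could be shared between threads without the special cases discussed above.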

Lastly, this contains an API change, making a public method private.  This is 
probably a good thing here -- the method in question was one I did not know 
what to do with previously and was a bit opaque for public consumption.  This 
might not be acceptable for 1.5.1, so the method may need to remain public -- 
though we can mark it deprecated and plan to hide it for 1.6.0.
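
A minimal sketch of that deprecate-then-hide approach, with hypothetical names (LegacyApi, opaqueHelper and helperImpl are not the actual class or method in the patch): the public method stays for now but is marked deprecated and delegates to a private implementation, so hiding it later removes only the wrapper.

```java
/** Hypothetical class illustrating deprecate-now, hide-later. */
class LegacyApi {
  /**
   * @deprecated kept public for compatibility in the current release line;
   *     intended to become non-public in a later release.
   */
  @Deprecated
  public int opaqueHelper(int x) {
    return helperImpl(x);
  }

  // Internal callers use the private implementation directly.
  private int helperImpl(int x) {
    return x + 1; // placeholder logic
  }

  public int read(int x) {
    return helperImpl(x);
  }
}
```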


> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
>                 Key: AVRO-792
>                 URL: https://issues.apache.org/jira/browse/AVRO-792
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>         Environment: Mac with VMWare running Linux training-vm-Ubuntu
>            Reporter: ey-chih chow
>            Assignee: Thiruvalluvan M. G.
>            Priority: Blocker
>             Fix For: 1.5.1
>
>         Attachments: AVRO-792-2.patch, AVRO-792.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We have an avro map/reduce job that used to work with avro 1.4 but is broken 
> with avro 1.5.  The M/R job with avro 1.5 worked fine in our debugging 
> environment, but broke when we moved to a real cluster.  In one instance of 
> testing, the job had 23 reducers.  Four of them succeeded and the rest failed 
> because of an ArrayIndexOutOfBoundsException.  Here are two 
> instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
>       at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>       at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>       at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>       at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>       at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>       at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>       at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
>       at com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
>       at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>       at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
>       at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>       at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>       at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>       at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>       at org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>       at org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>       at org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>       at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
>       at com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
>       at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>       at org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
> public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> collector, Reporter reporter) throws IOException;
> and reduce() is:
> public void reduce(Utf8 key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/deserialization between 
> avro 1.4 and 1.5, but we could not figure out why the exceptions were generated.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
