[ 
https://issues.apache.org/jira/browse/AVRO-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015590#comment-13015590
 ] 

Scott Carey commented on AVRO-792:
----------------------------------

-1

This patch drops performance of GenericDatumReader by a factor of 3 to 5.  The 
portion of time that is in GenericDatumReader and not BinaryDecoder has 
increased by up to a factor of 8.

Perf.java results with JVM args: -Xmx256 -Xms256 -server -XX:+UseParallelGC 
-XX:+UseCompressedOops
and params: -nowrite -generic -generic-onetime

before:
{code}
Executing tests: 
[GenericTest, GenericNested, GenericNestedFake, GenericWithDefault, 
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse, 
GenericOneTimeReaderUse, GenericOneTimeUse]
 readTests:true
 writeTests:false
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  
bytes/cycle
                  GenericRead:   5054 ms       3.297       127.953        808498
           GenericNested_Read:   7701 ms       2.164        83.980        808498
       GenericNestedFake_Read:   2869 ms       5.809       225.433        808498
      GenericWithDefault_Read:   8993 ms       1.853        71.922        808498
   GenericWithOutOfOrder_Read:   5051 ms       3.300       128.050        808498
    GenericWithPromotion_Read:   5210 ms       3.199       124.132        808498
GenericOneTimeDecoderUse_Read:   5060 ms       3.293       127.804        808498
 GenericOneTimeReaderUse_Read:   6433 ms       2.590       100.531        808498
       GenericOneTimeUse_Read:   6306 ms       2.643       102.558        808498
{code}

After:

{code}
Executing tests: 
[GenericTest, GenericNested, GenericNestedFake, GenericWithDefault, 
GenericWithOutOfOrder, GenericWithPromotion, GenericOneTimeDecoderUse, 
GenericOneTimeReaderUse, GenericOneTimeUse]
 readTests:true
 writeTests:false
 cycles=800
                    test name     time    M entries/sec   M bytes/sec  
bytes/cycle
                  GenericRead:  18918 ms       0.881        34.189        808498
           GenericNested_Read:  21704 ms       0.768        29.800        808498
       GenericNestedFake_Read:   2899 ms       5.747       223.037        808498
      GenericWithDefault_Read:  25627 ms       0.650        25.238        808498
   GenericWithOutOfOrder_Read:  21413 ms       0.778        30.206        808498
    GenericWithPromotion_Read:  21254 ms       0.784        30.432        808498
GenericOneTimeDecoderUse_Read:  18831 ms       0.885        34.346        808498
 GenericOneTimeReaderUse_Read:  18650 ms       0.894        34.680        808498
       GenericOneTimeUse_Read:  18698 ms       0.891        34.592        808498
{code}

I got a few dozen stack traces while the latter was executing, and the most 
common stack traces are:

{noformat}
"main" prio=5 tid=103800800 nid=0x100501000 runnable [100500000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.avro.io.Perf$GenericTest.getReader(Perf.java:1087)
        at org.apache.avro.io.Perf$GenericTest.readInternal(Perf.java:1110)
        at org.apache.avro.io.Perf$BasicTest.readTest(Perf.java:313)
        at org.apache.avro.io.Perf.main(Perf.java:210)

"main" prio=5 tid=103800800 nid=0x100501000 runnable [100500000]
   java.lang.Thread.State: RUNNABLE
        at java.util.Arrays.copyOf(Arrays.java:2734)
        at org.apache.avro.io.parsing.Parser.expandStack(Parser.java:66)
        at org.apache.avro.io.parsing.Parser.pushProduction(Parser.java:151)
        at org.apache.avro.io.parsing.Parser.advance(Parser.java:100)
        at 
org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:144)
        at 
org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:126)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:102)
        at 
org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:93)
        at org.apache.avro.io.Perf$GenericTest.readInternal(Perf.java:1110)
        at org.apache.avro.io.Perf$BasicTest.readTest(Perf.java:313)
        at org.apache.avro.io.Perf.main(Perf.java:210)
{noformat}

> map reduce job for avro 1.5 generates ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
>                 Key: AVRO-792
>                 URL: https://issues.apache.org/jira/browse/AVRO-792
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.5.0
>         Environment: Mac with VMWare running Linux training-vm-Ubuntu
>            Reporter: ey-chih chow
>            Assignee: Thiruvalluvan M. G.
>            Priority: Blocker
>             Fix For: 1.5.1
>
>         Attachments: AVRO-792.patch
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We have an avro map/reduce job used to be working with avro 1.4, but broken 
> with avro 1.5.  The M/R job with avro 1.5 worked fine under our debugging 
> environment, but broken when we moved to a real cluster.  At one instance f 
> testing, the job had 23 reducers.  Four of them succeeded and the rest failed 
> because of the ArrayIndexOutOfBoundsException generated.  Here are two 
> instances of the stack traces:
> =================================================================================
> java.lang.ArrayIndexOutOfBoundsException: -1576799025
>       at 
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>       at 
> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>       at 
> org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at 
> org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:232)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>       at 
> org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>       at 
> org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>       at 
> org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>       at 
> org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>       at 
> com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:46)
>       at 
> com.ngmoco.ngpipes.etl.NgEventETLReducer.reduce(NgEventETLReducer.java:1)
>       at 
> org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>       at 
> org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>       at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> java.lang.ArrayIndexOutOfBoundsException: 40
>       at 
> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>       at 
> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>       at 
> org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>       at 
> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>       at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>       at 
> org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:86)
>       at 
> org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer.deserialize(AvroSerialization.java:68)
>       at 
> org.apache.hadoop.mapred.Task$ValuesIterator.readNextValue(Task.java:1136)
>       at org.apache.hadoop.mapred.Task$ValuesIterator.next(Task.java:1076)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.moveToNext(ReduceTask.java:246)
>       at 
> org.apache.hadoop.mapred.ReduceTask$ReduceValuesIterator.next(ReduceTask.java:242)
>       at 
> org.apache.avro.mapred.HadoopReducerBase$ReduceIterable.next(HadoopReducerBase.java:47)
>       at 
> com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:74)
>       at 
> com.ngmoco.ngpipes.sourcing.sessions.NgSessionReducer.reduce(NgSessionReducer.java:1)
>       at 
> org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:60)
>       at 
> org.apache.avro.mapred.HadoopReducerBase.reduce(HadoopReducerBase.java:30)
>       at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>       at org.apache.hadoop.mapred.Child.main(Child.java:234)
> =====================================================================================================
> The signature of our map() is:
> public void map(Utf8 input, AvroCollector<Pair<Utf8, GenericRecord>> 
> collector, Reporter reporter) throws IOException;
> and reduce() is:
> public void reduce(Utf8 key, Iterable<GenericRecord> values, 
> AvroCollector<GenericRecord> collector, Reporter reporter) throws IOException;
> All the GenericRecords are of the same schema.
> There are many changes in the area of serialization/de-serailization between 
> avro 1.4 and 1.5, but could not figure out why the exceptions were generated. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to