[ https://issues.apache.org/jira/browse/AVRO-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698788#comment-16698788 ]
ASF GitHub Bot commented on AVRO-2247:
--------------------------------------

unchuckable opened a new pull request #391: AVRO-2247 - improved java reading performance with new reader
URL: https://github.com/apache/avro/pull/391

Cannot reopen the original PR (#354), since I've rebased to current master.

I've tried to address the points that @rstata brought up with my approach. The feature switch between the traditional and the newly suggested reader mechanism is now done inside `GenericDatumReader`. All tests provided with the avro project run smoothly (I stole @rstata's idea to trigger the tests an additional time with the feature switch enabled).

Also fixed defaulting in a way that takes advantage of immutable values and only actually re-reads default objects with a distinct decoder when really required. If there are any more things that need testing, please do give me a pointer.

Overall, the newly proposed reader sacrifices time building a `DatumReader`, allowing it to perform the actual reading at a highly improved rate. For all applications that are remotely "big data", that tradeoff should turn out highly beneficial.

I also included a small module (`benchmark`) that uses JMH to test the performance of the proposed reader approach against the current generic reader. Using JMH should be preferable to Perf.java, since it allows benchmarks to be run in a controlled and statistically significant way.

As stated in the last PR, I'm open to any changes, fire ahead. It's the overall concept and its apparent reader performance gains that I'm chasing after, not having my implementation find its way into the main branch 1:1.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Improve Java reading performance with a new reader
> --------------------------------------------------
>
>                 Key: AVRO-2247
>                 URL: https://issues.apache.org/jira/browse/AVRO-2247
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Martin Jubelgas
>            Priority: Major
>             Fix For: 1.9.0
>
>         Attachments: Perf-Comparison.md
>
>
> Complementary to AVRO-2090, I have been working on decoding of Avro objects
> in Java and am suggesting a new implementation of a DatumReader that improves
> read performance for both generic and specific records by approximately 20%
> (and even more in cases of nested objects with defaults, a case I encounter a
> lot in practical use).
> Key concept is to create a detailed execution plan once at DatumReader. This
> execution plan contains all required defaulting/lookup values so they need
> not be looked up during object traversal while reading.
> The reader implementation can be enabled and disabled per GenericData
> instance. The system default is set via the system variable
> "org.apache.avro.fastread" (defaults to "false").
> Attached a performance comparison of the existing implementation with the
> proposed one. Will open a pull request with respective code in a bit (not
> including interoperability with the optimizations of AVRO-2090 yet). Please
> let me know your opinion of whether this is worth pursuing further.
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
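
The "execution plan" idea from the issue description (resolve field lookups and defaults once, then reuse them for every record) can be illustrated with a toy sketch in plain Java. This is not Avro's API and not the code from the pull request: `ExecutionPlanSketch`, `Field`, `Step`, `buildPlan`, `read` and `fastReadEnabled` are hypothetical names invented here, and reading the switch through `System.getProperty` is an assumption about what "system variable" means in the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only: hypothetical names, not Avro classes. */
final class ExecutionPlanSketch {

    /** Hypothetical reader-side field: a name plus the default used when the writer lacks it. */
    static final class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    /** One precomputed step of the plan; it does no lookups when applied. */
    interface Step {
        void apply(Object[] input, Map<String, Object> out);
    }

    /** Builds the plan once: positions and defaults are resolved here, not per record. */
    static List<Step> buildPlan(List<String> writerFields, List<Field> readerFields) {
        List<Step> plan = new ArrayList<>();
        for (Field f : readerFields) {
            int pos = writerFields.indexOf(f.name);
            if (pos >= 0) {
                // Field exists in the written data: copy from the resolved position.
                plan.add((in, out) -> out.put(f.name, in[pos]));
            } else {
                // Field is missing in the written data: reuse the (immutable) default.
                Object dflt = f.defaultValue;
                plan.add((in, out) -> out.put(f.name, dflt));
            }
        }
        return plan;
    }

    /** Per-record work: just run the precomputed steps in order. */
    static Map<String, Object> read(List<Step> plan, Object[] input) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Step step : plan) {
            step.apply(input, out);
        }
        return out;
    }

    /** Assumption: the switch is a JVM system property that defaults to "false". */
    static boolean fastReadEnabled() {
        return Boolean.parseBoolean(System.getProperty("org.apache.avro.fastread", "false"));
    }
}
```

Under that assumption, such a switch could be set on the command line with `-Dorg.apache.avro.fastread=true`. The plan is more expensive to build than a single read, which matches the tradeoff described in the pull request: pay once at construction, then read many records cheaply.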