[ https://issues.apache.org/jira/browse/AVRO-3527?focusedWorklogId=777936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777936 ]
ASF GitHub Bot logged work on AVRO-3527:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Jun/22 05:32
            Start Date: 03/Jun/22 05:32
    Worklog Time Spent: 10m
      Work Description: steven-aerts opened a new pull request, #1708:
URL: https://github.com/apache/avro/pull/1708

Update the compiler to generate the implementations of the `.equals()` and `.hashCode()` functions, instead of relying on the implementation in GenericData.
This improves the performance of those functions significantly.

The generated implementations are a factor of 10 to 20 faster for `.equals()` and a factor of 5 to 10 faster for `.hashCode()`.

The generated implementation produces the same hashCode as GenericData, which is validated by existing tests.

Result of the perf test before the change:
```
Benchmark              Mode  Cnt          Score           Error  Units
SpecficTest.equals    thrpt    3   12598610.194 +/-  11160265.279  ops/s
SpecficTest.hashCode  thrpt    3   24729446.862 +/-  29051332.794  ops/s
```

Results using generated functions:
```
Benchmark              Mode  Cnt          Score           Error  Units
SpecficTest.equals    thrpt    3  211314296.950 +/- 104154793.126  ops/s
SpecficTest.hashCode  thrpt    3  180349506.632 +/- 143639246.771  ops/s
```

### Jira

- [x] My PR addresses the following: [AVRO-3527](https://issues.apache.org/jira/browse/AVRO-3527) Generated equals() and hashCode() for SpecificRecords

### Tests

- [x] My PR adds the following unit tests:
  * TestUtf8#testHashCodeSameAsString()
  * TestGeneratedCode#ignoredFields()
  * JMH test for SpecificRecords `equals()` and `hashCode()`

### Commits

- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and classes in the PR contain Javadoc that explains what they do

Issue Time Tracking
-------------------

    Worklog Id:     (was: 777936)
    Remaining Estimate: 0h
            Time Spent: 10m

> Generated equals() and hashCode() for SpecificRecords
> -----------------------------------------------------
>
>                 Key: AVRO-3527
>                 URL: https://issues.apache.org/jira/browse/AVRO-3527
>             Project: Apache Avro
>          Issue Type: Improvement
>          Components: java
>            Reporter: Steven Aerts
>            Priority: Major
>         Attachments: equals_hashcode_after.txt, equals_hashcode_before.txt,
> flame_graph.jpeg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When profiling our production system, we found that it was spending almost
> 40% of its overall time in the {{SpecificRecordBase.hashCode()}} and
> {{SpecificRecordBase.equals()}} implementations.
> In some sections of its logic, almost all of the time is spent in those
> functions, as can be seen in the attached flame graph (blue "pyramids").
> !flame_graph.jpeg|width=385,height=99!
> By generating {{.equals()}} and {{.hashCode()}}, all of this overhead
> disappeared and the application became 35% faster overall.
> On other Avro-heavy applications we also saw noticeable performance gains
> from this improvement where we had not expected them.
> A generated implementation of {{.hashCode()}} is 5 to 10 times faster
> than its generic counterpart. For {{.equals()}} it is 10 to 20 times faster.
> This is also visible in the attached JMH benchmarks.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
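
To illustrate the idea described in the issue above, here is a minimal hand-written sketch of the kind of field-by-field `equals()` and `hashCode()` a code generator could emit, in place of walking a schema generically at runtime. It is not the output of the Avro compiler: the class `UserRecord`, its fields, and the `31 * result + ...` accumulation are assumptions chosen for illustration, as is the choice to skip one field on the assumption that it is declared with `"order": "ignore"` in the schema.

```java
import java.util.Objects;

// Hypothetical record class; names and fields are illustrative only,
// not actual Avro compiler output.
final class UserRecord {
    private final String name; // takes part in equals/hashCode
    private final int age;     // takes part in equals/hashCode
    private final String note; // assumed to be an "order": "ignore" field, so skipped

    UserRecord(String name, int age, String note) {
        this.name = name;
        this.age = age;
        this.note = note;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof UserRecord)) return false;
        UserRecord other = (UserRecord) o;
        // Direct field comparisons: no schema lookup, no per-field type dispatch.
        return age == other.age && Objects.equals(name, other.name);
    }

    @Override
    public int hashCode() {
        // Assumed accumulation scheme: start at 1, fold in each compared field.
        int result = 1;
        result = 31 * result + (name == null ? 0 : name.hashCode());
        result = 31 * result + Integer.hashCode(age);
        return result;
    }

    public static void main(String[] args) {
        UserRecord a = new UserRecord("ada", 36, "first");
        UserRecord b = new UserRecord("ada", 36, "second");
        System.out.println(a.equals(b));                  // prints "true"
        System.out.println(a.hashCode() == b.hashCode()); // prints "true"
    }
}
```

The two instances differ only in the skipped field, so they compare equal and hash identically. The point of the sketch is that every comparison is a direct field access known at compile time, which is the kind of overhead reduction the benchmarks in the pull request measure.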