[ https://issues.apache.org/jira/browse/AVRO-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863703#comment-13863703 ]
Doug Cutting commented on AVRO-1428:
------------------------------------
Your patch does not apply cleanly. Can you please provide a proper patch file,
generated with 'svn diff'? Thanks!
Also, can you please provide your benchmark program?
> Schema.computeHash() to add if check to avoid unnecessary hashcode computation
> ------------------------------------------------------------------------------
>
> Key: AVRO-1428
> URL: https://issues.apache.org/jira/browse/AVRO-1428
> Project: Avro
> Issue Type: Improvement
> Components: java
> Reporter: Tie Liu
> Attachments: AVRO-1428.patch
>
>
> In the current Schema.java we have the following implementation:
>     public final int hashCode() {
>       if (hashCode == NO_HASHCODE)
>         hashCode = computeHash();
>       return hashCode;
>     }
>     int computeHash() { return getType().hashCode() + props.hashCode(); }
> While hashCode() checks "if (hashCode == NO_HASHCODE)", the computeHash()
> method does not. However, computeHash() is called directly from
> Schema$Field.hashCode() and from the subclasses' hashCode implementations,
> as in the following:
>     // this is from the Schema$Field class
>     public int hashCode() { return name.hashCode() + schema.computeHash(); }
> As a result, the hash code is recomputed on every such call, which is
> unnecessary. The proposed change is to add the "if" check inside the
> computeHash method instead:
>     int computeHash() {
>       if (hashCode == NO_HASHCODE) {
>         hashCode = getType().hashCode() + props.hashCode();
>       }
>       return hashCode;
>     }
> We ran a simple test to compare the two implementations; the heap summaries
> below show the difference.
> As a test I wrote a small program that creates a HashMap<Schema.Field,
> Integer> and enters a loop that simply checks whether various Schema.Field
> instances are keys in the map. Obviously this is a pathological test case,
> but when run with the current implementation of Schema.Field it used (in
> about 30 seconds) nearly 8 GBytes of heap instantiating intermediate
> objects associated with calling Schema.computeHash():
> Heap
>  PSYoungGen      total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
>   eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000)
>   from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
>   to   space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
>  ParOldGen       total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
>   object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
>  PSPermGen       total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
>   object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000)
> When running with the modified implementation (and no other change) all the
> object allocation vanishes:
> Heap
>  PSYoungGen      total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000)
>   eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000)
>   from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000)
>   to   space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000)
>  ParOldGen       total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000)
>   object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000)
>  PSPermGen       total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000)
>   object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000)
> As a side effect, the test runs 3x faster with the modified hashCode()
> implementation.
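
For illustration only, the test described in the issue could be approximated
along the following lines. This is a minimal sketch, not the reporter's
benchmark and not Avro code: FakeSchema, FakeField, countComputations and the
NO_HASHCODE value are hypothetical stand-ins, and the sketch counts hash
computations rather than measuring heap usage.

```java
import java.util.HashMap;
import java.util.Map;

public class HashCacheSketch {
    // Sentinel meaning "hash not computed yet" (value assumed for this sketch).
    static final int NO_HASHCODE = Integer.MIN_VALUE;

    /** Hypothetical stand-in for Schema; this is not the Avro class. */
    static class FakeSchema {
        final String type;
        final Map<String, String> props = new HashMap<>();
        int hashCode = NO_HASHCODE;
        int computations = 0; // how many times the full hash was computed

        FakeSchema(String type) {
            this.type = type;
        }

        // Behaviour described in the issue: no guard, recomputes on every call.
        int computeHashUnguarded() {
            computations++;
            return type.hashCode() + props.hashCode();
        }

        // Proposed behaviour: compute once, then return the cached value.
        int computeHashGuarded() {
            if (hashCode == NO_HASHCODE) {
                computations++;
                hashCode = type.hashCode() + props.hashCode();
            }
            return hashCode;
        }
    }

    /** Hypothetical stand-in for Schema.Field. */
    static class FakeField {
        final String name;
        final FakeSchema schema;
        final boolean guarded;

        FakeField(String name, FakeSchema schema, boolean guarded) {
            this.name = name;
            this.schema = schema;
            this.guarded = guarded;
        }

        @Override
        public int hashCode() {
            int schemaHash = guarded
                    ? schema.computeHashGuarded()
                    : schema.computeHashUnguarded();
            return name.hashCode() + schemaHash;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof FakeField)) return false;
            FakeField that = (FakeField) other;
            return name.equals(that.name) && schema == that.schema;
        }
    }

    // Puts one field in a map, looks it up repeatedly, and reports how many
    // times the schema hash was actually computed.
    static int countComputations(boolean guarded, int lookups) {
        FakeSchema schema = new FakeSchema("string");
        schema.props.put("doc", "example");
        FakeField field = new FakeField("f", schema, guarded);
        Map<FakeField, Integer> map = new HashMap<>();
        map.put(field, 1);
        for (int i = 0; i < lookups; i++) {
            map.containsKey(field);
        }
        return schema.computations;
    }

    public static void main(String[] args) {
        System.out.println("unguarded: " + countComputations(false, 1000));
        System.out.println("guarded:   " + countComputations(true, 1000));
    }
}
```

With the guard in place the full hash should be computed only once, however
many lookups run; without it, every map lookup recomputes it, including the
hash of the props map.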