Tony, This sounds like it could be a useful change. I have some concerns, but these would better be addressed in Jira once a patch has been posted.
https://cwiki.apache.org/confluence/display/AVRO/How+To+Contribute Thanks, Doug On Thu, Jan 2, 2014 at 11:21 AM, Liu, Tie <[email protected]> wrote: > Hi, > In current Schma.java we have following implementation: > public final int hashCode() { > if (hashCode == NO_HASHCODE) > hashCode = computeHash(); > return hashCode; > } > > int computeHash() { return getType().hashCode() + props.hashCode(); } > > While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the > computeHash method is not. But the computeHash method is being called from > Schema$Field.hashCode and the subclasses hashCode implementation like > following: > public int hashCode() { return name.hashCode() + schema.computeHash(); } > //this is from Schema$Field class > > This is causing the the calculation of hashCode getting called > unnecessarily extensively. The proposed changed is to add the "if" check > inside the computeHash method instead: > > int computeHash() > { > if (hashCode == NO_HASHCODE) > { > hashCode = getType().hashCode() + props.hashCode(); > } > return hashCode; > } > > > We did a simple test to compare the performance difference, below is a > summary of the heap snapshot of comparing the difference: > > As a test I wrote a small program that creates a HashMap<Schema.Field, > Integer>() and enters a loop simply identifying whether various Schema.Field > instances are keys in the map. Obviously this is a pathological test case, > but when running with the current implementation of Schema.Field it has (in > about 30 seconds) used up nearly 8 GBytes of heap in instantiating > intermediate objects associated with calling Schema.computeHash(): > > Heap > PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, > 0x0000000800000000, 0x0000000800000000) > eden space 14942208K, 58% used > [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000) > from space 2490368K, 0% used > [0x0000000768000000,0x0000000768000000,0x0000000800000000) > to space 2490368K, 0% used > [0x00000006d0000000,0x00000006d0000000,0x0000000768000000) > ParOldGen total 1048576K, used 0K [0x0000000300000000, > 0x0000000340000000, 0x0000000340000000) > object space 1048576K, 0% used > [0x0000000300000000,0x0000000300000000,0x0000000340000000) > PSPermGen total 21504K, used 5782K [0x00000002fae00000, > 0x00000002fc300000, 0x0000000300000000) > object space 21504K, 26% used > [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000) > > When running with the modified implementation (and no other change) all the > object allocation vanishes: > > Heap > PSYoungGen total 17432576K, used 896532K [0x0000000340000000, > 0x0000000800000000, 0x0000000800000000) > eden space 14942208K, 6% used > [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000) > from space 2490368K, 0% used > [0x0000000768000000,0x0000000768000000,0x0000000800000000) > to space 2490368K, 0% used > [0x00000006d0000000,0x00000006d0000000,0x0000000768000000) > ParOldGen total 1048576K, used 0K [0x0000000300000000, > 0x0000000340000000, 0x0000000340000000) > object space 1048576K, 0% used > [0x0000000300000000,0x0000000300000000,0x0000000340000000) > PSPermGen total 21504K, used 5768K [0x00000002fae00000, > 0x00000002fc300000, 0x0000000300000000) > object space 21504K, 26% used > [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000) > > As a side-effect the test runs x3 faster with the modified hashCode() > implementation. > > Please let us know how you think of this proposal. If it's ok, I can create a > jira and submit a patch for the change. > > Thanks, > > Tony > > > > ---------------------------------------------------------------------- > This message, and any attachments, is for the intended recipient(s) only, may > contain information that is privileged, confidential and/or proprietary and > subject to important terms and conditions available at > http://www.bankofamerica.com/emaildisclaimer. If you are not the intended > recipient, please delete this message.
