Re: Hashtable's and the tale of runtime
Tom Tromey wrote: At some point we changed libgcj's hash function to this: // This was chosen to yield relatively well distributed results on // both 32- and 64-bit architectures. Note 0x7fff is prime. return (jint) ((unsigned long) obj % 0x7fff); What if you're on a 32-bit machine and all the object addresses are less than 0x8000.. won't that give you the same result as just returning (jint)obj ? -Archie __ Archie Cobbs *CTO, Awarix* http://www.awarix.com
Hashtable's and the tale of runtime
Hi! We had a curious problem with SPECjvm98 _213_javac and cacao. A newly recompiled cacao ended with 10% preformance degradation. After further investigation we ended with VMSystem.identityHashCode() and it's return value. _213_javac make heavy use of Hashtable's and the objects stored in the hashtable use VMSystem.identityHashCode() as their hash function. Since alignment is normally 8 or 16-bytes, the distribution of the objects seems to be not optimal. We ended with 4 Object.equals() calls more and a doubled runtime for Hashtable.get() (measured clock cycles), as Object.equals() is used in Hashtable.get(). A shifting of the VMSystem.identityHashCode() pointer value helps (sometimes), but shifting by 3 makes it even worse (what was my guess to make it better). Any thoughts? Should we change Hashtable.hash()? Or return a better value for VMSystem.identityHashCode()? But most runtimes only return the memory pointer... TWISTI
Re: Hashtable's and the tale of runtime
Christian Thalinger writes: Hi! We had a curious problem with SPECjvm98 _213_javac and cacao. A newly recompiled cacao ended with 10% preformance degradation. After further investigation we ended with VMSystem.identityHashCode() and it's return value. _213_javac make heavy use of Hashtable's and the objects stored in the hashtable use VMSystem.identityHashCode() as their hash function. Since alignment is normally 8 or 16-bytes, the distribution of the objects seems to be not optimal. We ended with 4 Object.equals() calls more and a doubled runtime for Hashtable.get() (measured clock cycles), as Object.equals() is used in Hashtable.get(). This seems a bit strange. Hashtable.rehash() seems always to generate a new capacity based on the old capacity * 2 + 1, so you wouldn't expect to see a great many collisions as the definition of hash() is key.hashCode() % buckets.length Okay, buckets.length isn't necessarily prime, but you wouldn't expect this to have a very non-uniform distribution. A shifting of the VMSystem.identityHashCode() pointer value helps (sometimes), but shifting by 3 makes it even worse (what was my guess to make it better). Any thoughts? Should we change Hashtable.hash()? Or return a better value for VMSystem.identityHashCode()? But most runtimes only return the memory pointer... There are things that we can try, such a using a better way to turn an address into a hash. But it would be nice first to know the real cause of the very non-uniform distribution, if indeed that's what is happening. Andrew.
Re: Hashtable's and the tale of runtime
Andrew == Andrew Haley [EMAIL PROTECTED] writes: Andrew There are things that we can try, such a using a better way to turn an Andrew address into a hash. At some point we changed libgcj's hash function to this: // This was chosen to yield relatively well distributed results on // both 32- and 64-bit architectures. Note 0x7fff is prime. return (jint) ((unsigned long) obj % 0x7fff); It used to simply return the address. I think we were having trouble because it was losing all the upper bits on 64 bit platforms. As far as I know we haven't looked into how effective, or not, this is (on different workloads, etc... I don't remember hearing about Hashtable showing up in a profile before). Twisti, if you try it out, let us know :-) Tom