Re: HashMap collision speed (regression 7->8)

Peter Levart Sun, 11 Jan 2015 15:27:49 -0800


On 01/11/2015 10:00 PM, Doug Lea wrote:

On 01/11/2015 02:26 PM, Peter Levart wrote:
Although majority of entries constitute the bins of size 13 or 14,there's only
a single hashCode value per bin.
So in this benchmark, treeifying with non-comparable keys is a wasteof effort.
On the other hand, the waste seems to only cost about 10% in your runs.
I wonder why the original report using jdk7 vs jdk8 seemed larger.

I don't know. I ran the same benchmark with same VM options on JDK 7too. Here are all results together:


Original JDK 7 HashMap (and JVM):

Benchmark (initialSize) ModeSamples Score Score error Unitsj.t.HashMapCollision.badDistNoComp 16 ss 602839.458 157.299 msj.t.HashMapCollision.badDistWithComp 16 ss 602673.924 187.063 msj.t.HashMapCollision.goodDistNoComp 16 ss 60686.972 32.928 msj.t.HashMapCollision.goodDistWithComp 16 ss 60631.001 6.574 ms


Original JDK 9 HashMap:

Benchmark (initialSize) ModeSamples Score Score error Unitsj.t.HashMapCollision.badDistNoComp 16 ss 603011.738 78.249 msj.t.HashMapCollision.badDistWithComp 16 ss 602984.280 48.315 msj.t.HashMapCollision.goodDistNoComp 16 ss 60682.060 52.341 msj.t.HashMapCollision.goodDistWithComp 16 ss 60685.705 55.183 ms


Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:

Benchmark (initialSize) ModeSamples Score Score error Unitsj.t.HashMapCollision.badDistNoComp 16 ss 602780.771 236.647 msj.t.HashMapCollision.badDistWithComp 16 ss 602541.740 233.429 msj.t.HashMapCollision.goodDistNoComp 16 ss 60757.364 67.869 msj.t.HashMapCollision.goodDistWithComp 16 ss 60671.617 54.943 ms

Caching of comparableClassFor (in ClassRepository - good forheterogeneous keys too):

Benchmark (initialSize) ModeSamples Score Score error Unitsj.t.HashMapCollision.badDistNoComp 16 ss 603014.888 71.778 msj.t.HashMapCollision.badDistWithComp 16 ss 602279.757 54.159 msj.t.HashMapCollision.goodDistNoComp 16 ss 60760.743 70.674 msj.t.HashMapCollision.goodDistWithComp 16 ss 60725.188 67.853 ms


Caching of comparableClassFor (internally - good for homogeneous keys only):

Benchmark (initialSize) ModeSamples Score Score error Unitsj.t.HashMapCollision.badDistNoComp 16 ss 603026.707 84.571 msj.t.HashMapCollision.badDistWithComp 16 ss 602137.296 66.140 msj.t.HashMapCollision.goodDistNoComp 16 ss 60635.964 8.213 msj.t.HashMapCollision.goodDistWithComp 16 ss 60685.129 46.783 ms


Are there (non-forged) sets of non-comparable keys with hashCodes where
treeifying makes sense?


Try using a class like:

class FHC { float f; int hashCode() { returnFloat.floatToIntBits(f); } }

and populate with instances with integral values for f.

Similarly for doubles.

Pre-jdk8, we devised a bit-smearing function that (among other
constraints) did OK for float/double keys with integral values,
that are not all that rare.  With treeification, we don't need to
penalize classes with decent hashCodes by bit-smearing to still
get OK performance for these kinds of cases where the tree-based
hashCode comparison helps more than Comparability per se.

I see. These keys actually have unique or near unique hashCodes butwhich are not good for power of two length tables without bit-smearing.With tree bins we don't need heavy bit-smearing to get decentperformance in speed, but the table gets quite sparse anyway (althoughthis is the smaller of the space overheads - tree nodes are bigger). Forexample, for 1M integral Floats, we get the following:


>>> Float ...
                 Capacity: 2097152
              Load factor: 0.75
                     Size: 1000000

Bin sizes: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*07*0 8*0 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000

               Empty bins: 93.8 %

Unique hash codes per bin: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0 7*08*0 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000


Also...

It looks like the simplest path to a minor improvement is
just to cache internally (your fourth test below). But I now
recall not doing this because it adds to footprint and
the field could prevent class unloading, for only a small
benefit.

Footprint, yes (one reference field in HM instance), while classunloading is taken care of using WeakReference:


http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/


(Every time HashMap has changed, there have been reports of
performance regressions even though typical performance
generally improves.)

-Doug


Regards, Peter

Original JDK9 HashMap:

Benchmark                               (initialSize)   Mode Samples
Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16 ss        60
3011.738       78.249       ms
j.t.HashMapCollision.badDistWithComp               16 ss        60
2984.280       48.315       ms
j.t.HashMapCollision.goodDistNoComp                16 ss        60
682.060       52.341       ms
j.t.HashMapCollision.goodDistWithComp              16 ss        60
685.705       55.183       ms

Original JDK9 HashMap with TREEIFY_THRESHOLD = 1 << 20:

Benchmark                               (initialSize)   Mode Samples
Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16 ss        60
2780.771      236.647       ms
j.t.HashMapCollision.badDistWithComp               16 ss        60
2541.740      233.429       ms
j.t.HashMapCollision.goodDistNoComp                16 ss        60
757.364       67.869       ms
j.t.HashMapCollision.goodDistWithComp              16 ss        60
671.617       54.943       ms

Caching of comparableClassFor (in ClassRepository - good forheterogeneous

keys too):

Benchmark                               (initialSize)   Mode Samples
Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16 ss        60
3014.888       71.778       ms
j.t.HashMapCollision.badDistWithComp               16 ss        60
2279.757       54.159       ms
j.t.HashMapCollision.goodDistNoComp                16 ss        60
760.743       70.674       ms
j.t.HashMapCollision.goodDistWithComp              16 ss        60
725.188       67.853       ms

Caching of comparableClassFor (internally - good for homogeneouskeys only):


Benchmark                               (initialSize)   Mode Samples
Score  Score error    Units
j.t.HashMapCollision.badDistNoComp                 16 ss        60
3026.707       84.571       ms
j.t.HashMapCollision.badDistWithComp               16 ss        60
2137.296       66.140       ms
j.t.HashMapCollision.goodDistNoComp                16 ss        60
635.964        8.213       ms
j.t.HashMapCollision.goodDistWithComp              16 ss        60
685.129       46.783       ms

Re: HashMap collision speed (regression 7->8)

Reply via email to