On 01/11/2015 10:00 PM, Doug Lea wrote:
On 01/11/2015 02:26 PM, Peter Levart wrote:
Although majority of entries constitute the bins of size 13 or 14,
there's only
a single hashCode value per bin.
So in this benchmark, treeifying with non-comparable keys is a waste
of effort.
On the other hand, the waste seems to only cost about 10% in your runs.
I wonder why the original report using jdk7 vs jdk8 seemed larger.
I don't know. I ran the same benchmark with same VM options on JDK 7
too. Here are all results together:
Original JDK 7 HashMap (and JVM):
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
2839.458 157.299 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2673.924 187.063 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
686.972 32.928 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
631.001 6.574 ms
Original JDK 9 HashMap:
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3011.738 78.249 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2984.280 48.315 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
682.060 52.341 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
685.705 55.183 ms
Original JDK 9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
2780.771 236.647 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2541.740 233.429 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
757.364 67.869 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
671.617 54.943 ms
Caching of comparableClassFor (in ClassRepository - good for
heterogeneous keys too):
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3014.888 71.778 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2279.757 54.159 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
760.743 70.674 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
725.188 67.853 ms
Caching of comparableClassFor (internally - good for homogeneous keys only):
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3026.707 84.571 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2137.296 66.140 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
635.964 8.213 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
685.129 46.783 ms
Are there (non-forged) sets of non-comparable keys with hashCodes where
treeifying makes sense?
Try using a class like:
class FHC { float f; int hashCode() { return
Float.floatToIntBits(f); } }
and populate with instances with integral values for f.
Similarly for doubles.
Pre-jdk8, we devised a bit-smearing function that (among other
constraints) did OK for float/double keys with integral values,
that are not all that rare. With treeification, we don't need to
penalize classes with decent hashCodes by bit-smearing to still
get OK performance for these kinds of cases where the tree-based
hashCode comparison helps more than Comparability per se.
I see. These keys actually have unique or near unique hashCodes but
which are not good for power of two length tables without bit-smearing.
With tree bins we don't need heavy bit-smearing to get decent
performance in speed, but the table gets quite sparse anyway (although
this is the smaller of the space overheads - tree nodes are bigger). For
example, for 1M integral Floats, we get the following:
>>> Float ...
Capacity: 2097152
Load factor: 0.75
Size: 1000000
Bin sizes: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0
7*0 8*0 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
Empty bins: 93.8 %
Unique hash codes per bin: 0*1966080 1*0 2*0 3*24288 4*41248 5*0 6*0 7*0
8*0 9*0 10*4456 11*22963 12*30554 13*7539 14*24 total=1000000
Also...
It looks like the simplest path to a minor improvement is
just to cache internally (your fourth test below). But I now
recall not doing this because it adds to footprint and
the field could prevent class unloading, for only a small
benefit.
Footprint, yes (one reference field in HM instance), while class
unloading is taken care of using WeakReference:
http://cr.openjdk.java.net/~plevart/jdk9-dev/HM.comparableClassFor/HomogeneousKeysCache/webrev.01/
(Every time HashMap has changed, there have been reports of
performance regressions even though typical performance
generally improves.)
-Doug
Regards, Peter
Original JDK9 HashMap:
Benchmark (initialSize) Mode Samples
Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3011.738 78.249 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2984.280 48.315 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
682.060 52.341 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
685.705 55.183 ms
Original JDK9 HashMap with TREEIFY_THRESHOLD = 1 << 20:
Benchmark (initialSize) Mode Samples
Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
2780.771 236.647 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2541.740 233.429 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
757.364 67.869 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
671.617 54.943 ms
Caching of comparableClassFor (in ClassRepository - good for
heterogeneous
keys too):
Benchmark (initialSize) Mode Samples
Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3014.888 71.778 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2279.757 54.159 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
760.743 70.674 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
725.188 67.853 ms
Caching of comparableClassFor (internally - good for homogeneous
keys only):
Benchmark (initialSize) Mode Samples
Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 ss 60
3026.707 84.571 ms
j.t.HashMapCollision.badDistWithComp 16 ss 60
2137.296 66.140 ms
j.t.HashMapCollision.goodDistNoComp 16 ss 60
635.964 8.213 ms
j.t.HashMapCollision.goodDistWithComp 16 ss 60
685.129 46.783 ms