Hi, Bernd
The initial discussion of the change that set TREEIFY_THRESHOLD to 8 (it
was initially 16) can be read here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-July/018685.html
...continuing here...
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-August/019853.html
The code review discussion is archived here:
http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-August/020131.html
-Brent
On 1/8/15 4:10 PM, Bernd Eckenfels wrote:
Hello Vitaly,
the TREEIFY_THRESHOLD in HashMap is 8 (6 for UNTREEIFY). Not sure if
there have been benchmarks for this, I dont see an justification in the
source comments. There is a comment, that it is expected to waste a
factor of two time and space when not compareable.
I would also expect that larger thresholds make sense (especially when key is
not
compareable).
The JEP 180 also does not mention results from the benchmarks, maybe
somebody knows details?
http://openjdk.java.net/jeps/180
Gruss
Bernd
Am Thu, 8 Jan 2015 18:38:53 -0500
schrieb Vitaly Davidovich <vita...@gmail.com>:
Hmmm, 15 entries doesn't seem like a large enough number to drop
linear search. I'd have expected something like 60-80 entries (in
some binary vs linear search benchmarks I've come across, that seems
to be crossover point). It's hard to beat linear search for small
sets when the comparison function is dirt cheap.
Sent from my phone
On Jan 8, 2015 6:25 PM, "Bernd Eckenfels" <e...@zusammenkunft.net>
wrote:
Hello Peter,
hm not sure what you mean, i was not suggesting the regression is
caused by simpler hashCode bits. (do you mean my comment about the
removed randomizer? that is not a problem for the benchmark, but it
might be an opportunity for DOS (again)).
I think the regression itself is caused by the fact that a tree is
used instead of the linear search. The benchmark does colide
heavily, but only places 15 or so entries in one bucket. This is
enough to trigger the migration from linear search to tree, but not
enough to hurt for linear search performance (at least I think this
is the case).
The way the nodes are constructed they actually do sort pretty good
if compareable is implemented. It would most likely not help if
Compareable cannot distinguish more than hashCode would.
Gruss
Bernd
Am Thu, 08 Jan 2015 23:10:10 +0100
schrieb Peter Levart <peter.lev...@gmail.com>:
Bernd,
I tried to change the "comparableClassFor" myself and it didn't
work (HashMap is used very early in boot-up sequence and
initializing ClassValue at that time triggers a NPE).
Anyway, caching of "comparableClassFor" result would only
potentially improve the "badDistWithComp" result. But can not
improve "badDistNoComp" which is the one with speed regression as
you're benchmark suggests.
But your feeling that this is caused by "simpler" hashCode bits
spreading function is not correct. I tried to replace the hash()
method with the one that was in HM before and I get comparable
results. This is the JDK8 HashMap.hash() method:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>>
16); }
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 avgt 4
3171.264 1152.995 ms/op
j.t.HashMapCollision.badDistWithComp 16 avgt 4
2819.342 422.861 ms/op
j.t.HashMapCollision.goodDistNoComp 16 avgt 4
1026.064 72.049 ms/op
j.t.HashMapCollision.goodDistWithComp 16 avgt 4
1025.312 39.858 ms/op
...and this is my re-interpretation of pre JDK8 HashMap.hash():
static final int randomHash =
mix32(System.currentTimeMillis() ^ System.nanoTime());
private static int mix32(long z) {
z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
return (int)(((z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L) >>>
32); }
static final int hash(Object k) {
int h = k == null ? randomHash : randomHash ^
k.hashCode();
// This function ensures that hashCodes that differ only
by // constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load
factor). h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
Benchmark (initialSize) Mode
Samples Score Score error Units
j.t.HashMapCollision.badDistNoComp 16 avgt 4
3257.348 1079.088 ms/op
j.t.HashMapCollision.badDistWithComp 16 avgt 4
2866.740 414.687 ms/op
j.t.HashMapCollision.goodDistNoComp 16 avgt 4
1041.068 99.370 ms/op
j.t.HashMapCollision.goodDistWithComp 16 avgt 4
1041.653 53.925 ms/op
Your benchmark does not show much difference. Perhaps the
regression for "badDistNoComp" case could be caused by the fact
that with really bad hashCode and no Comparable interface, the
red-black tree becomes less performant to search than a simple
linked list of Nodes...
Regards, Peter
On 01/08/2015 08:41 PM, Bernd Eckenfels wrote:
Hello Peter,
yes it is only keys without an Compareable interface, but they
are quite common. I think the main problem with the internal
comparator (based on instance identity) is, that it would work
for looking up the same instance again, but not for the case
where the actual instance is re-created (as in my example code).
I would love to test your modified code, but I don't have a
OpenJDK build environment handy. Or actually I can try to get
one, is there somewhere a Virtulisation or Cloud Image
available which is pre-installed?
I have (1.6) a compiled benchmark.jar here, in case anyone
wants to try it:
https://onedrive.live.com/redir?resid=A98B6F4E09966AFD!20440&authkey=!AFFk03-5jq21Xz0&ithint=file%2cjar
BTW: I am (as usual) not expecting commoents on that, but just
to mention it: the expected good behavior of degenerated
hashmaps (due to the tree) was reason for removing the hashcode
secret randomization. I wonder if that was such a good idea if
colliding lookups (with more than a handfull of entries in a
bucket) have this 50% penalty.
Greetings
Bernd
Am Thu, 08 Jan 2015 20:22:13 +0100
schrieb Peter Levart <peter.lev...@gmail.com>:
Hi Bernd,
It seems that only bad hash codes (without comparable keys) in
JDK8 HM are worse than JDK7 HM.
Since you have already taken time to measure JDK7 vs JDK8 HM,
could you try to take the JDK8 source and just replace the
internal "comparableClassFor" method with the following
implementation:
static final ClassValue<Boolean> selfComparable = new
ClassValue<Boolean>() {
@Override
protected Boolean computeValue(Class<?> c) {
Type[] as; ParameterizedType p;
for (Type t : c.getGenericInterfaces()) {
if (t instanceof ParameterizedType &&
(p = (ParameterizedType) t).getRawType()
== Comparable.class &&
(as = p.getActualTypeArguments()).length
== 1 && as[0] == c) // type arg is c
return true;
}
return false;
}
};
static Class<?> comparableClassFor(Object x) {
if (x instanceof Comparable) {
Class<?> c = x.getClass();
if (c == String.class || selfComparable.get(c)) {
return c;
}
}
return null;
}
...and retry your measurements. I just want to see if it has
any impact.
Thanks, Peter
On 01/08/2015 05:38 PM, Bernd Eckenfels wrote:
Hello,
I think it was topic before, but I just wanted to point out,
that it is still an topic on the internetz. :)
Motivated by a StackOverflow question regarding HashMap
performance regression in Java 8
http://stackoverflow.com/questions/27759527/using-java-7-hashmap-in-java-8/27760442
I made a JMH test and compared 7 and 8 speed. (the test is not
very scientific as I dont really know how to squeeze the
longrunning loop which alters state into the harness, but the
results seem to be consitent wth theory and stopwatch testing)
https://gist.github.com/ecki/9f69773eb29428a36077
What can be seen is, that with a good distribution of hash
keys 8 looks faster than 7, and with a bad distribution of
hash keys Java 7 is faster (unless you supply a Comparator
for the key). (and a good distributed hashkey with comparable
seems to be a bit slower)
I think the regression is somewhat expected, but I guess its
not widely known.
(I do not use a cached hashcode, but it has a nearly trivial
implementation just to make it more life like. the tests also
compares different initial sizes, but they do not have an
measurable effect on the performance, I show only default size
below:)
java version "1.7.0_72"
Benchmark (initialSize) Mode Samp Score
Error Units n.e.j.h.HashMapCollision.badDistNoComp 16 avgt
4 10847,318 ± 5596,690 ms/op
n.e.j.h.HashMapCollision.badDistWithComp 16 avgt 4
10761,430 ± 5376,975 ms/op
n.e.j.h.HashMapCollision.goodDistNoComp 16 avgt 4
3613,923 ± 254,823 ms/op
n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4
3656,229 ± 274,350 ms/op java version "1.8.0_25"
Benchmark (initialSize) Mode Samp Score
Error Units n.e.j.h.HashMapCollision.badDistNoComp 16 avgt
4 14309,880 ± 1811,709 ms/op <-slower
n.e.j.h.HashMapCollision.badDistWithComp 16 avgt 4
8232,037 ± 5974,530 ms/op
n.e.j.h.HashMapCollision.goodDistNoComp 16 avgt 4
3304,698 ± 116,866 ms/op
n.e.j.h.HashMapCollision.goodDistWithComp 16 avgt 4
3425,762 ± 107,659 ms/op
Greetings
Bernd