On 07/11/2014 09:08 PM, Doug Lea wrote:
> I've been content to just observe Martin and Peter's nice efforts
> on this, but one note:
>
> On 07/11/2014 03:00 PM, Martin Buchholz wrote:
>> On Wed, Jul 9, 2014 at 3:17 PM, Peter Levart <peter.lev...@gmail.com> wrote:
>>> IHM resizing is arranged so that the table is always 33% ... 66% full
>>> (if nothing is ever removed from it), except when capacity reaches
>>> 2**29; then it can be filled up to the top.
>>>
>>> Avg. # of probes can be taken as a rough estimation of average
>>> slow-down; max. # of probes can be taken as a rough estimation of
>>> worst-case-operation slow-down.
>>
>> So where to draw the line?
>>
>> Linear probing has long been known to be prone to long worst-case
>> probes, but with recent computer architectures this is offset by its
>> extreme cache friendliness.
>
> Bear in mind that the number of bits of identityHashCode is less
> than 32 on all JVMs I know. It can be as low as 23, which means that
> you are sure to see a lot of exact collisions on IHMs with only
> tens of millions of elements, and there's nothing much you can
> do that will help.
>
> -Doug
We have max. size = 2^29 - 1. This cannot change because of serialization
backwards compatibility. It could be a little less, but not much:
experiments show that it is possible to fill an IHM up to 99% of that size
without too much CPU effort, and if such an IHM is serialized, we must be
able to deserialize it...
But why do we have to limit the table length to 2^30? Because it has to be
a power of 2, and 2^31 already overflows to a negative int...
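(As a tiny standalone illustration of that constraint; the class and method
names here are made up for the demo, this is not JDK code:)

public class PowerOfTwoLimitDemo {
    public static void main(String[] args) {
        // Array lengths are ints, and 1 << 31 overflows to a negative value,
        // so 2^30 is the largest power-of-two table length available.
        System.out.println(1 << 30);            // 1073741824
        System.out.println(1 << 31);            // -2147483648 (Integer.MIN_VALUE)

        // A power-of-two length lets the slot index be computed with a cheap
        // mask instead of a modulo; the result is always in [0, length).
        int length = 1 << 30;
        int h = System.identityHashCode(new Object());
        int index = (h << 1) & (length - 1);    // even index, as in IHM's key/value slot pairs
        System.out.println(index);
    }
}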
What would happen if the max. table length were 2^30 + 2^29 instead?
Here's what we get now:
// max. size = 2^29 - 1
// table length = 2^30
10% max. size, probes: 0.1 avg., 9 max.
20% max. size, probes: 0.1 avg., 15 max.
30% max. size, probes: 0.2 avg., 25 max.
40% max. size, probes: 0.3 avg., 38 max.
50% max. size, probes: 0.5 avg., 64 max.
60% max. size, probes: 0.7 avg., 93 max.
65% max. size, probes: 0.9 avg., 147 max.
70% max. size, probes: 1.2 avg., 224 max.
75% max. size, probes: 1.5 avg., 354 max.
80% max. size, probes: 2.0 avg., 441 max.
85% max. size, probes: 2.8 avg., 789 max.
90% max. size, probes: 4.5 avg., 1869 max.
91% max. size, probes: 5.1 avg., 2377 max.
92% max. size, probes: 5.7 avg., 3409 max.
93% max. size, probes: 6.6 avg., 3804 max.
94% max. size, probes: 7.8 avg., 5824 max.
95% max. size, probes: 9.5 avg., 7021 max.
96% max. size, probes: 12.0 avg., 12607 max.
97% max. size, probes: 16.2 avg., 17643 max.
98% max. size, probes: 24.5 avg., 34561 max.
99% max. size, probes: 49.6 avg., 131371 max.
100% ... haven't waited long enough....
If the table length were 2^30 + 2^29, we would only ever fill it up to
66%, as always, and there would be no slow-down:
// max. size = 2^29 - 1
// table length = 2^30 + 2^29
10% max. size, probes: 0.0 avg., 7 max.
20% max. size, probes: 0.1 avg., 11 max.
30% max. size, probes: 0.1 avg., 13 max.
40% max. size, probes: 0.2 avg., 20 max.
50% max. size, probes: 0.3 avg., 28 max.
60% max. size, probes: 0.4 avg., 40 max.
65% max. size, probes: 0.4 avg., 42 max.
70% max. size, probes: 0.5 avg., 52 max.
75% max. size, probes: 0.5 avg., 65 max.
80% max. size, probes: 0.6 avg., 87 max.
85% max. size, probes: 0.7 avg., 87 max.
90% max. size, probes: 0.8 avg., 128 max.
91% max. size, probes: 0.8 avg., 128 max.
92% max. size, probes: 0.8 avg., 128 max.
93% max. size, probes: 0.8 avg., 129 max.
94% max. size, probes: 0.9 avg., 129 max.
95% max. size, probes: 0.9 avg., 131 max.
96% max. size, probes: 0.9 avg., 150 max.
97% max. size, probes: 0.9 avg., 150 max.
98% max. size, probes: 1.0 avg., 159 max.
99% max. size, probes: 1.0 avg., 159 max.
100% max. size, probes: 1.0 avg., 159 max.
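(For what it's worth, here is a rough, simplified sketch of how probe
statistics like the above can be gathered. It is illustrative only, not the
harness that produced the numbers: it simulates plain linear probing over
random hash values on a scaled-down table and ignores the key/value slot
pairing of the real IHM table.)

import java.util.concurrent.ThreadLocalRandom;

public class ProbeStatsSketch {
    public static void main(String[] args) {
        int tableLength = 1 << 20;                      // scaled-down power-of-two table
        double fill = 0.66;                             // target fill level
        int size = (int) (tableLength * fill);
        boolean[] used = new boolean[tableLength];
        long totalProbes = 0;
        int maxProbes = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int n = 0; n < size; n++) {
            int i = rnd.nextInt() & (tableLength - 1);  // initial slot from the hash
            int probes = 0;
            while (used[i]) {                           // linear probing to the next free slot
                probes++;
                i = (i + 1) & (tableLength - 1);
            }
            used[i] = true;
            totalProbes += probes;
            maxProbes = Math.max(maxProbes, probes);
        }
        System.out.printf("fill %.0f%%: %.2f avg., %d max.%n",
                          fill * 100, (double) totalProbes / size, maxProbes);
    }
}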
If we're willing to pay the price of special-casing the non-power-of-2
MAX_CAPACITY = (1 << 29) + (1 << 28), the cost is approximately 4% of
throughput:
Original code:

Benchmark             Mode  Samples         Score  Score error  Units
j.t.IHMBench.ihm0Put  thrpt      30  25217087.593    50117.867  ops/s
j.t.IHMBench.ihm1Get  thrpt      30  43017677.457   230902.599  ops/s

Patched code:

Benchmark             Mode  Samples         Score  Score error  Units
j.t.IHMBench.ihm0Put  thrpt      30  24213091.899   122980.408  ops/s
j.t.IHMBench.ihm1Get  thrpt      30  40754537.027   936380.022  ops/s
Using this JMH benchmark:

import java.util.IdentityHashMap;
import java.util.Map;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Thread)
public class IHMBench {

    // 1,000,000 distinct keys; their identity hash codes are assigned by the JVM
    static final Object[] keys = new Object[1000_000];
    static {
        for (int i = 0; i < keys.length; i++) {
            keys[i] = new Object();
        }
    }

    static final Object value = new Object();

    // pre-sized for keys.length mappings, so the puts never trigger a resize
    final Map<Object, Object> map = new IdentityHashMap<Object, Object>(keys.length);

    int i = 0;

    @Benchmark
    public void ihm0Put(Blackhole bh) {
        bh.consume(map.put(keys[i], value));
        int j = i + 1;
        i = j < keys.length ? j : 0;   // cycle through the keys
    }

    @Benchmark
    public void ihm1Get(Blackhole bh) {
        bh.consume(map.get(keys[i]));
        int j = i + 1;
        i = j < keys.length ? j : 0;   // cycle through the keys
    }
}
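(In case anyone wants to reproduce this: the class is built and run in the
usual JMH way, e.g. from a project generated with the
jmh-java-benchmark-archetype and then something like
"java -jar target/benchmarks.jar IHMBench -f 1 -wi 5 -i 30". The
fork/warmup/iteration counts in that command line are only an example, not
necessarily the settings used for the numbers above.)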
Then here's a possible solution:
http://cr.openjdk.java.net/~plevart/jdk9-dev/IdentityHashMap/webrev.06/
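To give a feel for the kind of special-casing this involves, here is a
purely illustrative sketch of an index computation that tolerates the one
non-power-of-two table length. It is not the code in the webrev, which may
take a different approach:

// Illustrative only (not the webrev change): slot index for a table whose
// length is either a power of two or the special maximum 2^30 + 2^29.
private static int index(int h, int length) {
    if ((length & (length - 1)) == 0) {
        // power-of-two length: a single mask gives an even index in [0, length)
        return (h << 1) & (length - 1);
    }
    // the one non-power-of-two length: mask to a non-negative even value
    // first, then fold anything past the end back into range
    int i = (h << 1) & 0x7ffffffe;
    return i < length ? i : i - length;
}

An extra check like that on every index computation is the kind of cost the
~4% figure above reflects.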
Is it worth it?
Regards, Peter