Map.of()

Claes Redestad Wed, 06 Dec 2017 12:23:34 -0800

Hi,

please help review this patch to consolidate the number ofimplementation classes returned by the static collection factories:


http://cr.openjdk.java.net/~redestad/8193128/open.00/

I set out to explore options for addressing small inefficiencies we'vebeen running into, the latest after replacing use of @Stable arrays injava.lang.invoke with List.of() (JDK-8184777):

- List.indexOf deferred to the iterator in AbstractList, which check forconcurrent modification- Warmup takes a bit longer due to having to warm up four differentclasses and associated methods- Slowdowns in certain code paths attributable to certain calls becomingmegamorphic

Microbenchmark:http://cr.openjdk.java.net/~redestad/scratch/less_immutables/ListMorphism.java

http://cr.openjdk.java.net/~redestad/scratch/less_immutables/benchmarks.jar

The benchmark explores how call-sites behave when performing some naiveoperations on a few different Lists.

For every benchmark using List.of() there's a variant using ArrayListfor comparison:


Baseline:

Benchmark                             Mode  Cnt    Score Error   Units
ListMorphism.finalGetFromArrayList   thrpt   25   92.659 ±  3.058 ops/us

ListMorphism.finalGetFromList thrpt 25 335.245 ± 0.244 ops/us# 3.6x

ListMorphism.finalSumSizesArrayList  thrpt   25  245.020 ±  0.106 ops/us

ListMorphism.finalSumSizesList thrpt 25 335.107 ± 0.439 ops/us# 1.4x

ListMorphism.getFromArrayList        thrpt   25   70.343 ±  0.972 ops/us

ListMorphism.getFromList thrpt 25 37.121 ± 0.135 ops/us# 0.53x

ListMorphism.getFromArrayList12      thrpt   25 109.890 ±  0.286  ops/us

ListMorphism.getFromList12 thrpt 25 109.552 ± 6.972 ops/us# 1.0x

ListMorphism.sumSizesArrayList       thrpt   25  131.467 ±  4.672 ops/us

ListMorphism.sumSizesList thrpt 25 57.877 ± 0.102 ops/us# 0.45x

ListMorphism.sumSizesArrayList12     thrpt   25 208.652 ± 11.294  ops/us

ListMorphism.sumSizesList12 thrpt 25 227.269 ± 0.961 ops/us# 1.1x

The good: When dealing with List literals (the final* benchmarks),List.of() allows really nice speed-ups compared to ArrayList.

The bad: When not used as a literal, List.of() implementations atcall-sites can cause a substantial slowdown (megamorphic dispatch)


The ugly:

After some explorations[1], I narrowed in on the following experiment:http://cr.openjdk.java.net/~redestad/scratch/less_immutables/webrev/

Basically: Merge List1 and List2 into a single class, List12, mergeList0 into ListN (List0 has a singleton instance, so might as well be aninstance of ListN). Same for Set0,1,2,N. Map0 is merged into MapN.

This strikes a balance between throughput, footprint and slightly betterstartup/warmup behavior.

According to jol estimates[2] the size of List12 is the same as bothList1 and List2 in the current JDK implementation. Set12 is footprintneutral compared to Set2 on all platforms but lose a few bytes on 64-bitVMs compared to Set1.


Benchmark                             Mode  Cnt    Score   Error Units
ListMorphism.finalGetFromArrayList   thrpt   25   93.046 ± 0.569 ops/us

ListMorphism.finalGetFromList thrpt 25 335.280 ± 0.154 ops/us# 3.6x

ListMorphism.finalSumSizesArrayList  thrpt   25  244.595 ± 1.085 ops/us

ListMorphism.finalSumSizesList thrpt 25 303.351 ± 0.182 ops/us# 1.2x

ListMorphism.getFromArrayList        thrpt   25   70.631 ± 0.070 ops/us

ListMorphism.getFromList thrpt 25 73.258 ± 2.955 ops/us# 1.04x

ListMorphism.getFromArrayList12      thrpt   25 109.921 ± 0.096  ops/us

ListMorphism.getFromList12 thrpt 25 127.392 ± 0.088 ops/us# 1.16x

ListMorphism.sumSizesArrayList       thrpt   25  131.393 ± 4.882 ops/us

ListMorphism.sumSizesList thrpt 25 107.686 ± 5.286 ops/us# 0.82x

ListMorphism.sumSizesArrayList12     thrpt   25  212.350 ± 0.134  ops/us

ListMorphism.sumSizesList12 thrpt 25 198.778 ± 0.479 ops/us# 0.94x

The experiment has a flag to change number of specialized List/Set/Mapclasses (-Djdk.ImmutableCollections.specializations=0|1|2, default=2).

1 specialization (List1 + ListN, Set1 + SetN) is more or less the sameas 2, some ~1-2% improvements, mainly in sumSizes micros.

0 specializations (List-, Set, MapN only) achieves a small increase inthroughput on some micros by ensuring callsites stay monomorphic, butit's not very substantial by my measures (~5%, but mostly in sumSizesmicros).

Keeping the footprint more or less the same, while evening out a fewrough edges and improving startup and static footprint seems like theoverall best option. An alternative would be to keep the experimentalflag, but I don't think a 5% gain on micros warrants the extramaintenance burden and testing that entails.

The proposed patch is more or less this experiment with 2specializations, but having removed the flag and code movement needed tosupport it removed (along with a fix in the writeReplace methods ofList12/Set12)


Thanks!

/Claes

[1] Older experiments:
http://cr.openjdk.java.net/~redestad/immutable/list12N.0/

-- simply merge L0 into LN (still have L1, L2 and LN) - nothing reallychanged, though


http://cr.openjdk.java.net/~redestad/immutable/list1N.0/

-- L0 merged into LN, drop L2. Substantial performance boost onmicros. Footprint drop for 2-element lists.


http://cr.openjdk.java.net/~redestad/immutable/listNdouble.0/

-- L0+L1+L2+LN merged into one implementation. Same footprint with asingle class, but loses a lot on throughput in micros.


http://cr.openjdk.java.net/~redestad/immutable/listNsingle.0/

-- L0+L1+LN merged, drop L2. Simplification of the previous. Like thelist1N.0 experiment in footprint, but a loss in throughput on all measures.

No approach seemed a win across the board here: we either had to accepta footprint reduction, or see throughput suffer dramatically.


[2] http://openjdk.java.net/projects/code-tools/jol/

[10?] RFR: 8193128: Reduce number of implementation classes returned by List/Set/Map.of()

Reply via email to