Hi,
please help review this patch to consolidate the number of
implementation classes returned by the static collection factories:
http://cr.openjdk.java.net/~redestad/8193128/open.00/
I set out to explore options for addressing small inefficiencies we've
been running into, the latest after replacing use of @Stable arrays in
java.lang.invoke with List.of() (JDK-8184777):
- List.indexOf deferred to the iterator in AbstractList, which check for
concurrent modification
- Warmup takes a bit longer due to having to warm up four different
classes and associated methods
- Slowdowns in certain code paths attributable to certain calls becoming
megamorphic
Microbenchmark:
http://cr.openjdk.java.net/~redestad/scratch/less_immutables/ListMorphism.java
http://cr.openjdk.java.net/~redestad/scratch/less_immutables/benchmarks.jar
The benchmark explores how call-sites behave when performing some naive
operations on a few different Lists.
For every benchmark using List.of() there's a variant using ArrayList
for comparison:
Baseline:
Benchmark Mode Cnt Score Error Units
ListMorphism.finalGetFromArrayList thrpt 25 92.659 ± 3.058 ops/us
ListMorphism.finalGetFromList thrpt 25 335.245 ± 0.244 ops/us
# 3.6x
ListMorphism.finalSumSizesArrayList thrpt 25 245.020 ± 0.106 ops/us
ListMorphism.finalSumSizesList thrpt 25 335.107 ± 0.439 ops/us
# 1.4x
ListMorphism.getFromArrayList thrpt 25 70.343 ± 0.972 ops/us
ListMorphism.getFromList thrpt 25 37.121 ± 0.135 ops/us
# 0.53x
ListMorphism.getFromArrayList12 thrpt 25 109.890 ± 0.286 ops/us
ListMorphism.getFromList12 thrpt 25 109.552 ± 6.972 ops/us
# 1.0x
ListMorphism.sumSizesArrayList thrpt 25 131.467 ± 4.672 ops/us
ListMorphism.sumSizesList thrpt 25 57.877 ± 0.102 ops/us
# 0.45x
ListMorphism.sumSizesArrayList12 thrpt 25 208.652 ± 11.294 ops/us
ListMorphism.sumSizesList12 thrpt 25 227.269 ± 0.961 ops/us
# 1.1x
The good: When dealing with List literals (the final* benchmarks),
List.of() allows really nice speed-ups compared to ArrayList.
The bad: When not used as a literal, List.of() implementations at
call-sites can cause a substantial slowdown (megamorphic dispatch)
The ugly:
After some explorations[1], I narrowed in on the following experiment:
http://cr.openjdk.java.net/~redestad/scratch/less_immutables/webrev/
Basically: Merge List1 and List2 into a single class, List12, merge
List0 into ListN (List0 has a singleton instance, so might as well be an
instance of ListN). Same for Set0,1,2,N. Map0 is merged into MapN.
This strikes a balance between throughput, footprint and slightly better
startup/warmup behavior.
According to jol estimates[2] the size of List12 is the same as both
List1 and List2 in the current JDK implementation. Set12 is footprint
neutral compared to Set2 on all platforms but lose a few bytes on 64-bit
VMs compared to Set1.
Benchmark Mode Cnt Score Error Units
ListMorphism.finalGetFromArrayList thrpt 25 93.046 ± 0.569 ops/us
ListMorphism.finalGetFromList thrpt 25 335.280 ± 0.154 ops/us
# 3.6x
ListMorphism.finalSumSizesArrayList thrpt 25 244.595 ± 1.085 ops/us
ListMorphism.finalSumSizesList thrpt 25 303.351 ± 0.182 ops/us
# 1.2x
ListMorphism.getFromArrayList thrpt 25 70.631 ± 0.070 ops/us
ListMorphism.getFromList thrpt 25 73.258 ± 2.955 ops/us
# 1.04x
ListMorphism.getFromArrayList12 thrpt 25 109.921 ± 0.096 ops/us
ListMorphism.getFromList12 thrpt 25 127.392 ± 0.088 ops/us
# 1.16x
ListMorphism.sumSizesArrayList thrpt 25 131.393 ± 4.882 ops/us
ListMorphism.sumSizesList thrpt 25 107.686 ± 5.286 ops/us
# 0.82x
ListMorphism.sumSizesArrayList12 thrpt 25 212.350 ± 0.134 ops/us
ListMorphism.sumSizesList12 thrpt 25 198.778 ± 0.479 ops/us
# 0.94x
The experiment has a flag to change number of specialized List/Set/Map
classes (-Djdk.ImmutableCollections.specializations=0|1|2, default=2).
1 specialization (List1 + ListN, Set1 + SetN) is more or less the same
as 2, some ~1-2% improvements, mainly in sumSizes micros.
0 specializations (List-, Set, MapN only) achieves a small increase in
throughput on some micros by ensuring callsites stay monomorphic, but
it's not very substantial by my measures (~5%, but mostly in sumSizes
micros).
Keeping the footprint more or less the same, while evening out a few
rough edges and improving startup and static footprint seems like the
overall best option. An alternative would be to keep the experimental
flag, but I don't think a 5% gain on micros warrants the extra
maintenance burden and testing that entails.
The proposed patch is more or less this experiment with 2
specializations, but having removed the flag and code movement needed to
support it removed (along with a fix in the writeReplace methods of
List12/Set12)
Thanks!
/Claes
[1] Older experiments:
http://cr.openjdk.java.net/~redestad/immutable/list12N.0/
-- simply merge L0 into LN (still have L1, L2 and LN) - nothing really
changed, though
http://cr.openjdk.java.net/~redestad/immutable/list1N.0/
-- L0 merged into LN, drop L2. Substantial performance boost on
micros. Footprint drop for 2-element lists.
http://cr.openjdk.java.net/~redestad/immutable/listNdouble.0/
-- L0+L1+L2+LN merged into one implementation. Same footprint with a
single class, but loses a lot on throughput in micros.
http://cr.openjdk.java.net/~redestad/immutable/listNsingle.0/
-- L0+L1+LN merged, drop L2. Simplification of the previous. Like the
list1N.0 experiment in footprint, but a loss in throughput on all measures.
No approach seemed a win across the board here: we either had to accept
a footprint reduction, or see throughput suffer dramatically.
[2] http://openjdk.java.net/projects/code-tools/jol/