Hi Claes, Looking at http://cr.openjdk.java.net/~redestad/8193128/open.00/src/java.base/share/classes/java/util/ImmutableCollections.java.cdiff.html, there are sections labelled --- 646,657 ---- and --- 834,845 ---- where lines like `Objects.requireNonNull(0 /* zero */);` are written. I believe that they were supposed to be either removed or made to be written like `Objects.requireNonNull(o /* lowercase o */)`. Is my belief/understanding correct?
Cheers, Jonathan On 6 December 2017 at 20:21, Claes Redestad <claes.redes...@oracle.com> wrote: > Hi, > > please help review this patch to consolidate the number of implementation > classes returned by the static collection factories: > > http://cr.openjdk.java.net/~redestad/8193128/open.00/ > > I set out to explore options for addressing small inefficiencies we've > been running into, the latest after replacing use of @Stable arrays in > java.lang.invoke with List.of() (JDK-8184777): > > - List.indexOf deferred to the iterator in AbstractList, which check for > concurrent modification > - Warmup takes a bit longer due to having to warm up four different > classes and associated methods > - Slowdowns in certain code paths attributable to certain calls becoming > megamorphic > > Microbenchmark: http://cr.openjdk.java.net/~re > destad/scratch/less_immutables/ListMorphism.java > http://cr.openjdk.java.net/~redestad/scratch/less_immutables > /benchmarks.jar > > The benchmark explores how call-sites behave when performing some naive > operations on a few different Lists. > > For every benchmark using List.of() there's a variant using ArrayList for > comparison: > > Baseline: > > Benchmark Mode Cnt Score Error Units > ListMorphism.finalGetFromArrayList thrpt 25 92.659 ± 3.058 ops/us > ListMorphism.finalGetFromList thrpt 25 335.245 ± 0.244 ops/us > # 3.6x > ListMorphism.finalSumSizesArrayList thrpt 25 245.020 ± 0.106 ops/us > ListMorphism.finalSumSizesList thrpt 25 335.107 ± 0.439 ops/us > # 1.4x > ListMorphism.getFromArrayList thrpt 25 70.343 ± 0.972 ops/us > ListMorphism.getFromList thrpt 25 37.121 ± 0.135 ops/us > # 0.53x > ListMorphism.getFromArrayList12 thrpt 25 109.890 ± 0.286 ops/us > ListMorphism.getFromList12 thrpt 25 109.552 ± 6.972 ops/us > # 1.0x > ListMorphism.sumSizesArrayList thrpt 25 131.467 ± 4.672 ops/us > ListMorphism.sumSizesList thrpt 25 57.877 ± 0.102 ops/us > # 0.45x > ListMorphism.sumSizesArrayList12 thrpt 25 208.652 ± 11.294 ops/us > ListMorphism.sumSizesList12 thrpt 25 227.269 ± 0.961 ops/us > # 1.1x > > The good: When dealing with List literals (the final* benchmarks), > List.of() allows really nice speed-ups compared to ArrayList. > > The bad: When not used as a literal, List.of() implementations at > call-sites can cause a substantial slowdown (megamorphic dispatch) > > The ugly: > > After some explorations[1], I narrowed in on the following experiment: > http://cr.openjdk.java.net/~redestad/scratch/less_immutables/webrev/ > > Basically: Merge List1 and List2 into a single class, List12, merge List0 > into ListN (List0 has a singleton instance, so might as well be an instance > of ListN). Same for Set0,1,2,N. Map0 is merged into MapN. > > This strikes a balance between throughput, footprint and slightly better > startup/warmup behavior. > > According to jol estimates[2] the size of List12 is the same as both List1 > and List2 in the current JDK implementation. Set12 is footprint neutral > compared to Set2 on all platforms but lose a few bytes on 64-bit VMs > compared to Set1. > > Benchmark Mode Cnt Score Error Units > ListMorphism.finalGetFromArrayList thrpt 25 93.046 ± 0.569 ops/us > ListMorphism.finalGetFromList thrpt 25 335.280 ± 0.154 ops/us # > 3.6x > ListMorphism.finalSumSizesArrayList thrpt 25 244.595 ± 1.085 ops/us > ListMorphism.finalSumSizesList thrpt 25 303.351 ± 0.182 ops/us # > 1.2x > ListMorphism.getFromArrayList thrpt 25 70.631 ± 0.070 ops/us > ListMorphism.getFromList thrpt 25 73.258 ± 2.955 ops/us # > 1.04x > ListMorphism.getFromArrayList12 thrpt 25 109.921 ± 0.096 ops/us > ListMorphism.getFromList12 thrpt 25 127.392 ± 0.088 ops/us > # 1.16x > ListMorphism.sumSizesArrayList thrpt 25 131.393 ± 4.882 ops/us > ListMorphism.sumSizesList thrpt 25 107.686 ± 5.286 ops/us # > 0.82x > ListMorphism.sumSizesArrayList12 thrpt 25 212.350 ± 0.134 ops/us > ListMorphism.sumSizesList12 thrpt 25 198.778 ± 0.479 ops/us > # 0.94x > > The experiment has a flag to change number of specialized List/Set/Map > classes (-Djdk.ImmutableCollections.specializations=0|1|2, default=2). > > 1 specialization (List1 + ListN, Set1 + SetN) is more or less the same as > 2, some ~1-2% improvements, mainly in sumSizes micros. > > 0 specializations (List-, Set, MapN only) achieves a small increase in > throughput on some micros by ensuring callsites stay monomorphic, but it's > not very substantial by my measures (~5%, but mostly in sumSizes micros). > > Keeping the footprint more or less the same, while evening out a few rough > edges and improving startup and static footprint seems like the overall > best option. An alternative would be to keep the experimental flag, but I > don't think a 5% gain on micros warrants the extra maintenance burden and > testing that entails. > > The proposed patch is more or less this experiment with 2 specializations, > but having removed the flag and code movement needed to support it removed > (along with a fix in the writeReplace methods of List12/Set12) > > Thanks! > > /Claes > > [1] Older experiments: > http://cr.openjdk.java.net/~redestad/immutable/list12N.0/ > -- simply merge L0 into LN (still have L1, L2 and LN) - nothing really > changed, though > > http://cr.openjdk.java.net/~redestad/immutable/list1N.0/ > -- L0 merged into LN, drop L2. Substantial performance boost on micros. > Footprint drop for 2-element lists. > > http://cr.openjdk.java.net/~redestad/immutable/listNdouble.0/ > -- L0+L1+L2+LN merged into one implementation. Same footprint with a > single class, but loses a lot on throughput in micros. > > http://cr.openjdk.java.net/~redestad/immutable/listNsingle.0/ > -- L0+L1+LN merged, drop L2. Simplification of the previous. Like the > list1N.0 experiment in footprint, but a loss in throughput on all measures. > > No approach seemed a win across the board here: we either had to accept a > footprint reduction, or see throughput suffer dramatically. > > [2] http://openjdk.java.net/projects/code-tools/jol/ >