Hi everyone, My implementation is at https://github.com/AugustNagro/presize-collectors-bench and I have a webrev here: http://august.nagro.us/presized-collectors/webrev/.
The changes that I made were: - Add `default IntFunction<A> sizedSupplier()` to Collector - Add 2 new Collector.of helper methods taking a sized supplier - Update the applicable collectors in Collectors to provide a sizedSupplier - Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate Some notes/questions I have: - I considered adding Collector.Characteristics.PRESIZEABLE, but figured it was not needed given that sizedSupplier should always delegate to supplier. - Collector.sizedSupplier could be a LongFunction instead of IntFunction (since Sink.begin provides long sizes), but every Collection type I've looked at uses int for initialCapacity, so I think IntFunction makes more sense. - StringJoiner should be updated to take an initalCapacity (should I commit this change in the webrev, or leave for another enhancement?) - What tests should I add for this enhancement? There are some benchmarks in the github repo. I was initially surprised when I saw that my bare-bones parallel stream did not have a throughput improvement like the serial one, but after doing some debugging I think it is from Collector.combiner dominating the runtime. Regards, August On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amae...@gmail.com> wrote: > Hello! > > I wouldn't use presized HashSet, because you never know how many > duplicates are expected. What if the input stream has a million of > elements, but only two of them are distinct? Do you really want to allocate > a hash table for million elements in advance? > > For toMap() without custom supplier and merger preallocation sounds > reasonable as in this case duplicating key is an exceptional case, so we > may expect a specific number of elements. > > чт, 28 февр. 2019 г., 11:38 August Nagro augustna...@gmail.com: > >> Tagir, >> >> Great to see the validating benchmarks. >> >> > I think it's the best solution given the fact that very few collectors may >> benefit from the exact known size, and this benefit usually disappears >> when collectors are composed (e.g. using groupingBy: downstream toList() >> would not win anything if it provides a sizedSupplier). >> >> I would like to see your benchmarks for that statement too. After all, >> Hashmap, HashSet, etc all have presizing constructors, aren't those there >> for a reason? >> >> So I am still convinced that presizing is important for other collector >> types, including custom collectors. Although I'm open to having my mind >> changed. >> >> I'm happy to contribute an implementation as well, I was hoping that >> this this (smallish) patch would help me learn the OpenJdk process and make >> future contributions easier! >> >> - August >> >> On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amae...@gmail.com> wrote: >> >>> Hello! >>> >>> > A less intrusive API direction might be a version of Collector whose >>> > supplier function took a size-estimate argument; this might even help >>> in >>> > parallel since it allows for intermediate results to start with a >>> better >>> > initial size guess. (And this could be implemented as a default that >>> > delegates to the existing supplier.) Still, not really sure this >>> > carries its weight. >>> >>> It's interesting that such optimization is possible without API >>> change, using the "hidden knowledge". >>> While public API stays the same, the collect method implementation may >>> check if custom >>> non-public Collector implementation is supplied, and in this case may >>> use the sized supplier. >>> I think it's the best solution given the fact that very few collectors >>> may benefit from the exact >>> known size, and this benefit usually disappears when collectors are >>> composed (e.g. using groupingBy: >>> downstream toList() would not win anything if it provides a >>> sizedSupplier). >>> >>> I created a quick patch and launched a quick benchmark like this: >>> return new SplittableRandom(0).ints(length, 0, >>> 100).boxed().collect(Collectors.toList()); >>> For length = 100, 10000, 1000000 >>> >>> Here's results for vanilla 13-ea+8: >>> length = 100, avg time = 1434,469 ± 27,147 ns/op, alloc rate = 1712 B/op >>> length = 10000, avg time = 155032,390 ± 2657,927 ns/op, alloc rate = >>> 169280 B/op >>> length = 1000000, avg time = 27099621,763 ± 1979054,500 ns/op, alloc >>> rate = 14586750 B/op >>> >>> Patched: >>> length = 100, avg time = 1414,480 ± 30,040 ns/op, alloc rate = 768 B/op >>> length = 10000, avg time = 137338,320 ± 1316,789 ns/op, alloc rate = >>> 40368 B/op >>> length = 1000000, avg time = 17654635,871 ± 210607,441 ns/op, alloc >>> rate = 4000382 B/op >>> >>> As you can see, exact allocation really helps to reduce alloc pressure >>> and produces better performance for big lists. >>> >>> If such kind of patch is acceptable for Stream API, I can file a >>> ticket and submit a webrev. >>> >>> With best regards, >>> Tagir Valeev >>> >>> The patch follows: >>> >>> --- src/java.base/share/classes/java/util/stream/Collectors.java >>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) >>> +++ src/java.base/share/classes/java/util/stream/Collectors.java >>> (revision 53626+:e2fc434b410a+) >>> @@ -50,6 +50,7 @@ >>> import java.util.function.BinaryOperator; >>> import java.util.function.Consumer; >>> import java.util.function.Function; >>> +import java.util.function.IntFunction; >>> import java.util.function.Predicate; >>> import java.util.function.Supplier; >>> import java.util.function.ToDoubleFunction; >>> @@ -194,23 +195,34 @@ >>> */ >>> static class CollectorImpl<T, A, R> implements Collector<T, A, R> { >>> private final Supplier<A> supplier; >>> + private final IntFunction<A> sizedSupplier; >>> private final BiConsumer<A, T> accumulator; >>> private final BinaryOperator<A> combiner; >>> private final Function<A, R> finisher; >>> private final Set<Characteristics> characteristics; >>> >>> CollectorImpl(Supplier<A> supplier, >>> + IntFunction<A> sizedSupplier, >>> BiConsumer<A, T> accumulator, >>> BinaryOperator<A> combiner, >>> Function<A,R> finisher, >>> Set<Characteristics> characteristics) { >>> this.supplier = supplier; >>> + this.sizedSupplier = sizedSupplier; >>> this.accumulator = accumulator; >>> this.combiner = combiner; >>> this.finisher = finisher; >>> this.characteristics = characteristics; >>> } >>> >>> + CollectorImpl(Supplier<A> supplier, >>> + BiConsumer<A, T> accumulator, >>> + BinaryOperator<A> combiner, >>> + Function<A,R> finisher, >>> + Set<Characteristics> characteristics) { >>> + this(supplier, null, accumulator, combiner, finisher, >>> characteristics); >>> + } >>> + >>> CollectorImpl(Supplier<A> supplier, >>> BiConsumer<A, T> accumulator, >>> BinaryOperator<A> combiner, >>> @@ -228,6 +240,10 @@ >>> return supplier; >>> } >>> >>> + IntFunction<A> sizedSupplier() { >>> + return sizedSupplier; >>> + } >>> + >>> @Override >>> public BinaryOperator<A> combiner() { >>> return combiner; >>> @@ -275,8 +291,11 @@ >>> */ >>> public static <T> >>> Collector<T, ?, List<T>> toList() { >>> - return new CollectorImpl<>((Supplier<List<T>>) >>> ArrayList::new, List::add, >>> + return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, >>> + ArrayList::new, >>> + List::add, >>> (left, right) -> { >>> left.addAll(right); return left; }, >>> + castingIdentity(), >>> CH_ID); >>> } >>> >>> Index: src/java.base/share/classes/java/util/stream/ReduceOps.java >>> IDEA additional info: >>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP >>> <+>UTF-8 >>> =================================================================== >>> --- src/java.base/share/classes/java/util/stream/ReduceOps.java >>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) >>> +++ src/java.base/share/classes/java/util/stream/ReduceOps.java >>> (revision 53626+:e2fc434b410a+) >>> @@ -36,6 +36,7 @@ >>> import java.util.function.BinaryOperator; >>> import java.util.function.DoubleBinaryOperator; >>> import java.util.function.IntBinaryOperator; >>> +import java.util.function.IntFunction; >>> import java.util.function.LongBinaryOperator; >>> import java.util.function.ObjDoubleConsumer; >>> import java.util.function.ObjIntConsumer; >>> @@ -155,13 +156,20 @@ >>> public static <T, I> TerminalOp<T, I> >>> makeRef(Collector<? super T, I, ?> collector) { >>> Supplier<I> supplier = >>> Objects.requireNonNull(collector).supplier(); >>> + @SuppressWarnings("unchecked") >>> + IntFunction<I> sizedSupplier = collector instanceof >>> Collectors.CollectorImpl ? >>> + ((Collectors.CollectorImpl<?, I, ?>) >>> collector).sizedSupplier() : null; >>> BiConsumer<I, ? super T> accumulator = collector.accumulator(); >>> BinaryOperator<I> combiner = collector.combiner(); >>> class ReducingSink extends Box<I> >>> implements AccumulatingSink<T, I, ReducingSink> { >>> @Override >>> public void begin(long size) { >>> - state = supplier.get(); >>> + if (sizedSupplier != null && size >= 0 && size <= >>> Integer.MAX_VALUE) { >>> + state = sizedSupplier.apply((int) size); >>> + } else { >>> + state = supplier.get(); >>> + } >>> } >>> >>> @Override >>> >>