Tagir, I slapped my forehead when I saw that StringBuilder's initialCapacity is the number of chars, and not added elements!
The CollectorOp class I added for ReferencePipeline.collect(Collector) is so a concurrent + unordered collector may be pre-sized. I thought it was needed for completeness (let concurrent collectors benefit from pre-sizing just like serial). But maybe not worth the change? I hope I can change your mind about the spec change. We agree that pre-sizing collections is important, and the benchmarks show that Streams supporting pre-sizing can reduce allocations (up to 60% for ArrayLists) while often improving throughput. The question then is if the ends are worth the means? Take a look at this (non-exhaustive) list of Collections that stand to benefit from having Collector.sizedSupplier(): - Eclipse Collections. The library provides its own Collectors2 utility class [1], and the authors have high performance / low memory usage as a goal. - MutableList - Used in Collectors2.toList(). Collector.sizedSupplier() could be specified with Lists.mutable.ofInitialCapacity(int). - BooleanArrayList(int initialCapacity) - HashBag(int size) - BooleanList, IntList, etc. - Guava. A quick google search found libraries like https://github.com/yanaga/guava-stream that aggregate Collectors for guava collections. - The FutureJoiner I shared earlier - JDK classes that don't have Collectors factory method (ex. ArrayDeque(int numElements)) Thanks Tagir for the comments, and I updated the webrev: http://august.nagro.us/presized-collectors/webrev2/index.html - August [1]: https://www.eclipse.org/collections/javadoc/9.2.0/org/eclipse/collections/impl/collector/Collectors2.html On Sun, Mar 3, 2019 at 9:08 PM Tagir Valeev <amae...@gmail.com> wrote: > Hello August. > > I'm not supporting the proposed spec change, but I have few comments > about implementation (in case OpenJDK reviewers would support it). > > 1. joining(): Setting initial StringBuilder size to the number of > characters which is equivalent to the number of stream elements would > be a good idea only under assumption that every stream element has the > length 1. Which is wrong in 99.9% of real-life cases. > 2. flatMapping(), filtering(): Bypassing the original size to the > downstream collector is plain wrong here as number of resulting > elements can be either bigger (in flatMapping case) or arbitrarily > smaller (in both cases) than the stream size. In the latter case you > may just waste big amount of memory: a stream of million elements > could be collapsed to 1-2 elements, but you would still allocate an > ArrayList of a million of elements. > 3. toMap(): a HashMap constructor parameter is an initial capacity, > not the number of elements. Supplying it and ignoring the load factor > you may force the HashMap to resize the table once which is > suboptimal. You should pass an estimated number of elements divided by > default load factor (and check corner cases as well). > 4. teeing(): you defer null-check to the actual call of supplier or > sizedSupplier. This is very bad as broken collector (e.g. returning > null from sizedSupplier) may not fail fast for a long time if it's > never used for sized stream. > 5. ReduceOps: also you defer a null-check and capture not the > function, but Collector object itself. This produces subtle behavioral > change for no reason. > 6. ReferencePipeline::collect: I don't see why such huge change is > necessary. > > With best regards, > Tagir Valeev. > > On Mon, Mar 4, 2019 at 7:36 AM August Nagro <augustna...@gmail.com> wrote: > > > > Hi everyone, > > > > My implementation is at > https://github.com/AugustNagro/presize-collectors-bench and I have a > webrev here: http://august.nagro.us/presized-collectors/webrev/. > > > > The changes that I made were: > > > > Add `default IntFunction<A> sizedSupplier()` to Collector > > Add 2 new Collector.of helper methods taking a sized supplier > > Update the applicable collectors in Collectors to provide a sizedSupplier > > Have ReferencePipeline & ReduceOps call sizedSupplier when appropriate > > > > Some notes/questions I have: > > > > I considered adding Collector.Characteristics.PRESIZEABLE, but figured > it was not needed given that sizedSupplier should always delegate to > supplier. > > Collector.sizedSupplier could be a LongFunction instead of IntFunction > (since Sink.begin provides long sizes), but every Collection type I've > looked at uses int for initialCapacity, so I think IntFunction makes more > sense. > > StringJoiner should be updated to take an initalCapacity (should I > commit this change in the webrev, or leave for another enhancement?) > > What tests should I add for this enhancement? > > > > There are some benchmarks in the github repo. I was initially surprised > when I saw that my bare-bones parallel stream did not have a throughput > improvement like the serial one, but after doing some debugging I think it > is from Collector.combiner dominating the runtime. > > > > Regards, > > > > August > > > > On Thu, Feb 28, 2019 at 12:56 AM Tagir Valeev <amae...@gmail.com> wrote: > >> > >> Hello! > >> > >> I wouldn't use presized HashSet, because you never know how many > duplicates are expected. What if the input stream has a million of > elements, but only two of them are distinct? Do you really want to allocate > a hash table for million elements in advance? > >> > >> For toMap() without custom supplier and merger preallocation sounds > reasonable as in this case duplicating key is an exceptional case, so we > may expect a specific number of elements. > >> > >> чт, 28 февр. 2019 г., 11:38 August Nagro augustna...@gmail.com: > >>> > >>> Tagir, > >>> > >>> Great to see the validating benchmarks. > >>> > >>> > I think it's the best solution given the fact that very few > collectors may benefit from the exact known size, and this benefit usually > disappears when collectors are composed (e.g. using groupingBy: downstream > toList() would not win anything if it provides a sizedSupplier). > >>> > >>> I would like to see your benchmarks for that statement too. After all, > Hashmap, HashSet, etc all have presizing constructors, aren't those there > for a reason? > >>> > >>> So I am still convinced that presizing is important for other > collector types, including custom collectors. Although I'm open to having > my mind changed. > >>> > >>> I'm happy to contribute an implementation as well, I was hoping that > this this (smallish) patch would help me learn the OpenJdk process and make > future contributions easier! > >>> > >>> - August > >>> > >>> On Wed, Feb 27, 2019 at 1:06 AM Tagir Valeev <amae...@gmail.com> > wrote: > >>>> > >>>> Hello! > >>>> > >>>> > A less intrusive API direction might be a version of Collector whose > >>>> > supplier function took a size-estimate argument; this might even > help in > >>>> > parallel since it allows for intermediate results to start with a > better > >>>> > initial size guess. (And this could be implemented as a default > that > >>>> > delegates to the existing supplier.) Still, not really sure this > >>>> > carries its weight. > >>>> > >>>> It's interesting that such optimization is possible without API > >>>> change, using the "hidden knowledge". > >>>> While public API stays the same, the collect method implementation may > >>>> check if custom > >>>> non-public Collector implementation is supplied, and in this case may > >>>> use the sized supplier. > >>>> I think it's the best solution given the fact that very few collectors > >>>> may benefit from the exact > >>>> known size, and this benefit usually disappears when collectors are > >>>> composed (e.g. using groupingBy: > >>>> downstream toList() would not win anything if it provides a > sizedSupplier). > >>>> > >>>> I created a quick patch and launched a quick benchmark like this: > >>>> return new SplittableRandom(0).ints(length, 0, > >>>> 100).boxed().collect(Collectors.toList()); > >>>> For length = 100, 10000, 1000000 > >>>> > >>>> Here's results for vanilla 13-ea+8: > >>>> length = 100, avg time = 1434,469 ± 27,147 ns/op, alloc rate = 1712 > B/op > >>>> length = 10000, avg time = 155032,390 ± 2657,927 ns/op, alloc rate = > >>>> 169280 B/op > >>>> length = 1000000, avg time = 27099621,763 ± 1979054,500 ns/op, alloc > >>>> rate = 14586750 B/op > >>>> > >>>> Patched: > >>>> length = 100, avg time = 1414,480 ± 30,040 ns/op, alloc rate = 768 > B/op > >>>> length = 10000, avg time = 137338,320 ± 1316,789 ns/op, alloc rate = > 40368 B/op > >>>> length = 1000000, avg time = 17654635,871 ± 210607,441 ns/op, alloc > >>>> rate = 4000382 B/op > >>>> > >>>> As you can see, exact allocation really helps to reduce alloc pressure > >>>> and produces better performance for big lists. > >>>> > >>>> If such kind of patch is acceptable for Stream API, I can file a > >>>> ticket and submit a webrev. > >>>> > >>>> With best regards, > >>>> Tagir Valeev > >>>> > >>>> The patch follows: > >>>> > >>>> --- src/java.base/share/classes/java/util/stream/Collectors.java > >>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) > >>>> +++ src/java.base/share/classes/java/util/stream/Collectors.java > >>>> (revision 53626+:e2fc434b410a+) > >>>> @@ -50,6 +50,7 @@ > >>>> import java.util.function.BinaryOperator; > >>>> import java.util.function.Consumer; > >>>> import java.util.function.Function; > >>>> +import java.util.function.IntFunction; > >>>> import java.util.function.Predicate; > >>>> import java.util.function.Supplier; > >>>> import java.util.function.ToDoubleFunction; > >>>> @@ -194,23 +195,34 @@ > >>>> */ > >>>> static class CollectorImpl<T, A, R> implements Collector<T, A, > R> { > >>>> private final Supplier<A> supplier; > >>>> + private final IntFunction<A> sizedSupplier; > >>>> private final BiConsumer<A, T> accumulator; > >>>> private final BinaryOperator<A> combiner; > >>>> private final Function<A, R> finisher; > >>>> private final Set<Characteristics> characteristics; > >>>> > >>>> CollectorImpl(Supplier<A> supplier, > >>>> + IntFunction<A> sizedSupplier, > >>>> BiConsumer<A, T> accumulator, > >>>> BinaryOperator<A> combiner, > >>>> Function<A,R> finisher, > >>>> Set<Characteristics> characteristics) { > >>>> this.supplier = supplier; > >>>> + this.sizedSupplier = sizedSupplier; > >>>> this.accumulator = accumulator; > >>>> this.combiner = combiner; > >>>> this.finisher = finisher; > >>>> this.characteristics = characteristics; > >>>> } > >>>> > >>>> + CollectorImpl(Supplier<A> supplier, > >>>> + BiConsumer<A, T> accumulator, > >>>> + BinaryOperator<A> combiner, > >>>> + Function<A,R> finisher, > >>>> + Set<Characteristics> characteristics) { > >>>> + this(supplier, null, accumulator, combiner, finisher, > >>>> characteristics); > >>>> + } > >>>> + > >>>> CollectorImpl(Supplier<A> supplier, > >>>> BiConsumer<A, T> accumulator, > >>>> BinaryOperator<A> combiner, > >>>> @@ -228,6 +240,10 @@ > >>>> return supplier; > >>>> } > >>>> > >>>> + IntFunction<A> sizedSupplier() { > >>>> + return sizedSupplier; > >>>> + } > >>>> + > >>>> @Override > >>>> public BinaryOperator<A> combiner() { > >>>> return combiner; > >>>> @@ -275,8 +291,11 @@ > >>>> */ > >>>> public static <T> > >>>> Collector<T, ?, List<T>> toList() { > >>>> - return new CollectorImpl<>((Supplier<List<T>>) > >>>> ArrayList::new, List::add, > >>>> + return new CollectorImpl<>((Supplier<List<T>>) > ArrayList::new, > >>>> + ArrayList::new, > >>>> + List::add, > >>>> (left, right) -> { > >>>> left.addAll(right); return left; }, > >>>> + castingIdentity(), > >>>> CH_ID); > >>>> } > >>>> > >>>> Index: src/java.base/share/classes/java/util/stream/ReduceOps.java > >>>> IDEA additional info: > >>>> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP > >>>> <+>UTF-8 > >>>> =================================================================== > >>>> --- src/java.base/share/classes/java/util/stream/ReduceOps.java > >>>> (revision 53626:e2fc434b410a35b28ab433c29863c8a26e4e813a) > >>>> +++ src/java.base/share/classes/java/util/stream/ReduceOps.java > >>>> (revision 53626+:e2fc434b410a+) > >>>> @@ -36,6 +36,7 @@ > >>>> import java.util.function.BinaryOperator; > >>>> import java.util.function.DoubleBinaryOperator; > >>>> import java.util.function.IntBinaryOperator; > >>>> +import java.util.function.IntFunction; > >>>> import java.util.function.LongBinaryOperator; > >>>> import java.util.function.ObjDoubleConsumer; > >>>> import java.util.function.ObjIntConsumer; > >>>> @@ -155,13 +156,20 @@ > >>>> public static <T, I> TerminalOp<T, I> > >>>> makeRef(Collector<? super T, I, ?> collector) { > >>>> Supplier<I> supplier = > Objects.requireNonNull(collector).supplier(); > >>>> + @SuppressWarnings("unchecked") > >>>> + IntFunction<I> sizedSupplier = collector instanceof > >>>> Collectors.CollectorImpl ? > >>>> + ((Collectors.CollectorImpl<?, I, ?>) > >>>> collector).sizedSupplier() : null; > >>>> BiConsumer<I, ? super T> accumulator = > collector.accumulator(); > >>>> BinaryOperator<I> combiner = collector.combiner(); > >>>> class ReducingSink extends Box<I> > >>>> implements AccumulatingSink<T, I, ReducingSink> { > >>>> @Override > >>>> public void begin(long size) { > >>>> - state = supplier.get(); > >>>> + if (sizedSupplier != null && size >= 0 && size <= > >>>> Integer.MAX_VALUE) { > >>>> + state = sizedSupplier.apply((int) size); > >>>> + } else { > >>>> + state = supplier.get(); > >>>> + } > >>>> } > >>>> > >>>> @Override >