Hello again,

after some excellent feedback, I have changed the EmptyStream implementation to contain state. This means we don't get the object allocation down to zero, but it is very close thanks to escape analysis. The speedup is impressive. For an empty ArrayList, we get the following improvements:

minimal - 2.13x faster:
stream.max(Integer::compare)

basic - 2.47x faster:
stream.filter(Objects::nonNull)
           .map(Function.identity())
           .max(Integer::compare)

complex - 4.75x faster:
stream.filter(Objects::nonNull)
           .map(Function.identity())
           .filter(Objects::nonNull)
           .sorted()
           .distinct()
           .max(Integer::compare)

crossover - 9.37x faster:
stream.filter(Objects::nonNull)
           .map(String::valueOf)
           .filter(s -> s.length() > 0)
           .mapToInt(Integer::parseInt)
           .map(i -> i * 2)
           .mapToLong(i -> i + 1000)
           .mapToDouble(i -> i * 3.5)
           .boxed()
           .mapToLong(Double::intValue)
           .mapToInt(d -> (int) d)
           .boxed()
           .max(Integer::compare)

Other collections like ConcurrentLinkedQueue, ConcurrentSkipListSet, CopyOnWriteArrayList, ConcurrentHashMap have similar speedups.

There is no detectable slowdown once we have non-empty streams, since the only extra work in those cases is an additional if (isEmpty()) check. Even for concurrent collections, isEmpty() is fast.

There are still some issues that need to be solved, specifically lazy stream creation. However, besides that, the empty streams behave exactly as normal streams would in terms of characteristics and exceptions.

The jdk_util tests are still not working, as they are down-casting to AbstractPipeline. Since documentation on that is scarce, I would appreciate a bit of guidance on how to fix those.

Regards

Heinz
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
Java Champion - www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz

On 2021/11/06 18:45, Dr Heinz M. Kabutz wrote:
Good evening,

a couple of months ago a fellow Java Champion told me that he had "banned" streams at his company, or at least discouraged their use. The reason was their high allocation rates with empty collections. With traditional for loops, if the collection is empty, then hardly any objects are allocated and it is very fast. But if we have a stream, then we first have to build up the entire pipeline, only to discover that we didn't need all those objects and throw them away again.

When communicating with Brian Goetz last week, I mentioned this to him and he suggested that perhaps we could have the stream() method inside Collection check whether the collection is empty, and if so, return a specialized class EmptyStream that returns "this" for methods such as filter() and map(). I spent a bit of time trying to write such a class, together with EmptyIntStream, EmptyLongStream and EmptyDoubleStream. I've also written a set of tests that compare our Empty[Int|Long|Double]Streams to what [Int|Long|Double]Stream.empty() would return, and a little benchmark to demonstrate the effectiveness of the approach.
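
To make the idea concrete, here is a minimal sketch. It is not the code from the PR. MiniStream, EmptyMiniStream, WrappingMiniStream and the static stream() helper are hypothetical names invented for illustration, and MiniStream is a drastically reduced stand-in for java.util.stream.Stream with only three methods. The sketch assumes a stateless empty stream, so it does not detect reuse.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

class EmptyStreamSketch {
    // Hypothetical, heavily reduced stand-in for java.util.stream.Stream.
    interface MiniStream<T> {
        MiniStream<T> filter(Predicate<? super T> predicate);
        <R> MiniStream<R> map(Function<? super T, ? extends R> mapper);
        Optional<T> max(Comparator<? super T> comparator);
    }

    // Empty case: intermediate operations return "this", so no pipeline is built.
    static final class EmptyMiniStream<T> implements MiniStream<T> {
        private static final EmptyMiniStream<?> INSTANCE = new EmptyMiniStream<>();

        @SuppressWarnings("unchecked")
        static <T> EmptyMiniStream<T> instance() {
            return (EmptyMiniStream<T>) INSTANCE;
        }

        public MiniStream<T> filter(Predicate<? super T> predicate) {
            Objects.requireNonNull(predicate); // keep the null check of a normal stream
            return this;
        }

        @SuppressWarnings("unchecked")
        public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
            Objects.requireNonNull(mapper);
            return (MiniStream<R>) this; // safe: there are no elements to observe
        }

        public Optional<T> max(Comparator<? super T> comparator) {
            Objects.requireNonNull(comparator);
            return Optional.empty();
        }
    }

    // Non-empty case: delegate to the ordinary stream pipeline.
    static final class WrappingMiniStream<T> implements MiniStream<T> {
        private final Stream<T> delegate;

        WrappingMiniStream(Stream<T> delegate) {
            this.delegate = delegate;
        }

        public MiniStream<T> filter(Predicate<? super T> predicate) {
            return new WrappingMiniStream<>(delegate.filter(predicate));
        }

        public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
            return new WrappingMiniStream<>(delegate.map(mapper));
        }

        public Optional<T> max(Comparator<? super T> comparator) {
            return delegate.max(comparator);
        }
    }

    // The single isEmpty() check that decides which implementation is used.
    static <T> MiniStream<T> stream(Collection<T> collection) {
        return collection.isEmpty()
                ? EmptyMiniStream.instance()
                : new WrappingMiniStream<>(collection.stream());
    }

    public static void main(String[] args) {
        MiniStream<Integer> empty = stream(new ArrayList<Integer>());
        System.out.println(empty.filter(Objects::nonNull) == empty);
        System.out.println(empty.map(x -> x).max(Integer::compare));
        System.out.println(stream(List.of(1, 3, 2)).map(x -> x * 2).max(Integer::compare));
    }
}
```

In this sketch the empty chain never allocates a pipeline stage, while a non-empty collection pays only for the isEmpty() check before falling through to the normal stream.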

You can see what I've done here:

https://github.com/openjdk/jdk/pull/6275

(I think I was premature in issuing the PR)

However, I have hit a brick wall with the way that the streams are currently being tested in the JDK. First off, there are several tests that make assumptions about how Stream is implemented and down-cast it to an AbstractPipeline. Since our EmptyStream is not an AbstractPipeline, the tests fail.

Secondly, with a normal stream, some of the methods can only be called once, for example filter() and map(). They return a new stream and we have to continue working with those. With my EmptyStream, since filter() and map() return "this", we would not get an exception if we continued using it.
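
The single-use behavior of a normal stream can be reproduced with a short check. This is only an illustrative sketch; the class and method names (StreamReuseDemo, secondUseThrows) are made up for this example and are not part of the PR or of the JDK tests.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second intermediate operation on the same
    // stream object is rejected with an IllegalStateException.
    static boolean secondUseThrows(Stream<Integer> stream) {
        stream.filter(Objects::nonNull);   // first use: links the stream into a pipeline
        try {
            stream.map(i -> i + 1);        // second use of the same object
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows(List.of(1, 2, 3).stream()));
    }
}
```

With a stream from a non-empty list on a stock JDK, secondUseThrows returns true. An EmptyStream whose filter() and map() simply return "this" would return false here, which is the difference described above.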

Thirdly, with a normal stream, the method parallel() changes the state of the current stream, but then returns "this". In order to keep the EmptyStream consistent with the current Stream.empty() behavior, I return StreamSupport.stream(Spliterators.emptySpliterator(), true) from the parallel() method. Thus the EmptyStream does the opposite of what normal streams currently happen to do: parallel() returns a new object, while filter() and map() return "this". The Javadocs say that the parallel() method "may return itself", but it does not have to, whereas the documentation for filter() seems to suggest that the result would be a new stream object, but it does not prescribe that it absolutely has to be.
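
The current identity behavior of normal streams is easy to observe. The following is a small sketch with made-up names (ParallelIdentityDemo and its helper methods); it only shows what the standard implementation happens to do, not anything the Javadocs guarantee.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

class ParallelIdentityDemo {
    // True if parallel() hands back the very same stream object.
    static boolean parallelReturnsSame(Stream<Integer> stream) {
        return stream.parallel() == stream;
    }

    // True if filter() hands back the very same stream object.
    static boolean filterReturnsSame(Stream<Integer> stream) {
        return stream.filter(Objects::nonNull) == stream;
    }

    public static void main(String[] args) {
        System.out.println(parallelReturnsSame(List.of(1, 2, 3).stream()));
        System.out.println(filterReturnsSame(List.of(1, 2, 3).stream()));
    }
}
```

On a stock JDK with a non-empty list, the first check is true and the second is false. With the EmptyStream described above, both results would be reversed.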

How important is the white-box testing with the streams? And could we perhaps make special cases for empty streams?

Regards

Heinz
