Re: EmptyStream to boost performance

Dr Heinz M. Kabutz Sat, 13 Nov 2021 05:58:39 -0800

Hello again,

after some excellent feedback, I have changed the EmptyStream implementation to contain state. This means we don't get the object allocation down to zero, but it is very close thanks to escape analysis. The speedup is impressive. For an empty ArrayList, we get the following improvements:


minimal - 2.13x faster:
stream.max(Integer::compare)

basic - 2.47x faster:
stream.filter(Objects::nonNull)
           .map(Function.identity())
           .max(Integer::compare)

complex - 4.75x faster:
stream.filter(Objects::nonNull)
           .map(Function.identity())
           .filter(Objects::nonNull)
           .sorted()
           .distinct()
           .max(Integer::compare)

crossover - 9.37x faster:
stream.filter(Objects::nonNull)
           .map(String::valueOf)
           .filter(s -> s.length() > 0)
           .mapToInt(Integer::parseInt)
           .map(i -> i * 2)
           .mapToLong(i -> i + 1000)
           .mapToDouble(i -> i * 3.5)
           .boxed()
           .mapToLong(Double::intValue)
           .mapToInt(d -> (int) d)
           .boxed()
           .max(Integer::compare)

Other collections like ConcurrentLinkedQueue, ConcurrentSkipListSet, CopyOnWriteArrayList and ConcurrentHashMap have similar speedups.

There is no detectable slowdown once we have non-empty streams, since the only extra work in those cases is an additional if (isEmpty()) check. Even for concurrent collections, isEmpty() is fast.
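To illustrate where that extra check sits, here is a minimal sketch. The helper streamOf() is hypothetical and only stands in for the proposed change to Collection.stream(); Stream.empty() is used as a placeholder for the specialized EmptyStream, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class EmptyGuardSketch {
    // Hypothetical helper: mimics the isEmpty() check proposed for Collection.stream().
    // Stream.empty() is only a placeholder for the specialized EmptyStream class.
    static <T> Stream<T> streamOf(Collection<T> c) {
        if (c.isEmpty()) {
            return Stream.empty();   // fast path: no pipeline over the collection
        }
        return c.stream();           // non-empty: the usual stream, one extra branch paid
    }

    static Optional<Integer> max(Collection<Integer> c) {
        return streamOf(c).max(Integer::compare);
    }

    public static void main(String[] args) {
        System.out.println(max(new ArrayList<>()));   // prints Optional.empty
        System.out.println(max(List.of(3, 7, 5)));    // prints Optional[7]
    }
}
```

For a non-empty collection the only added cost in this sketch is the single isEmpty() branch before delegating to the normal stream.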

There are still some issues that need to be solved, specifically lazy stream creation. However, besides that, the empty streams behave exactly as normal streams would in terms of characteristics and exceptions.

The jdk_util tests are still not working, as they are down-casting to AbstractPipeline. Since documentation on that is scarce, I would appreciate a bit of guidance on how to fix those.


Regards

Heinz
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
Java Champion - www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz

On 2021/11/06 18:45, Dr Heinz M. Kabutz wrote:

Good evening,
a couple of months ago a fellow Java Champion told me that he had "banned" streams at his company, or at least discouraged their use. The reason was their high allocation rates with empty collections. With traditional for loops, if the collection is empty, then hardly any objects are allocated and it is very fast. But if we have a stream, then we first have to build up the entire pipeline, only to discover that we didn't need all those objects and throw them away again.
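As a small illustration of the two styles being compared, the following sketch computes the same maximum with a for loop and with a stream. The class and method names are made up for this example; it shows the shape of the code only and does not measure allocation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

class LoopVersusStream {
    // Traditional loop: for an empty list the body never runs.
    static Integer maxWithLoop(List<Integer> list) {
        Integer max = null;
        for (Integer i : list) {
            if (i != null && (max == null || i > max)) {
                max = i;
            }
        }
        return max;
    }

    // Stream version: every stage of the pipeline is built before the
    // terminal operation finds out that there are no elements.
    static Integer maxWithStream(List<Integer> list) {
        return list.stream()
                .filter(Objects::nonNull)
                .map(Function.identity())
                .max(Integer::compare)
                .orElse(null);
    }

    public static void main(String[] args) {
        List<Integer> empty = new ArrayList<>();
        System.out.println(maxWithLoop(empty));    // prints null
        System.out.println(maxWithStream(empty));  // prints null
    }
}
```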
When communicating with Brian Goetz last week, I mentioned this to him and he suggested that perhaps we could have the stream() method inside Collection check whether it is empty, and if so, return a specialized class EmptyStream that returns "this" for methods such as filter() and map(). I spent a bit of time trying to write such a class, together with EmptyIntStream, EmptyLongStream and EmptyDoubleStream. I've also written a set of tests that compare our Empty[Int|Long|Double]Streams to what would be returned by [Int|Long|Double]Stream.empty(). I've also written a little benchmark to demonstrate its effectiveness.
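A rough sketch of the "returns this" idea follows. It is not the code from the PR: MiniStream is a hypothetical, cut-down stand-in for java.util.stream.Stream with only three methods, so that the example stays short and self-contained.

```java
import java.util.Comparator;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

class EmptyStreamSketch {
    // Hypothetical, cut-down stand-in for java.util.stream.Stream.
    interface MiniStream<T> {
        MiniStream<T> filter(Predicate<? super T> predicate);
        <R> MiniStream<R> map(Function<? super T, ? extends R> mapper);
        Optional<T> max(Comparator<? super T> comparator);
    }

    // Stateless empty implementation: intermediate operations allocate nothing.
    static final class EmptyMiniStream<T> implements MiniStream<T> {
        @Override
        public MiniStream<T> filter(Predicate<? super T> predicate) {
            Objects.requireNonNull(predicate);   // keep the usual null check
            return this;
        }

        @Override
        @SuppressWarnings("unchecked")
        public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
            Objects.requireNonNull(mapper);
            // Safe only because there are no elements whose type could be observed.
            return (MiniStream<R>) (MiniStream<?>) this;
        }

        @Override
        public Optional<T> max(Comparator<? super T> comparator) {
            Objects.requireNonNull(comparator);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        MiniStream<Integer> s = new EmptyMiniStream<>();
        System.out.println(s.filter(Objects::nonNull) == s);                        // prints true
        System.out.println(s.map(i -> i.toString()).max(Comparator.naturalOrder())); // prints Optional.empty
    }
}
```

Because filter() and map() hand back the same object, a chain of any length costs no allocation in this sketch, which is also why reuse of the stream can no longer be detected.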
You can see what I've done here:

https://github.com/openjdk/jdk/pull/6275

(I think I was premature in issuing the PR)
However, I have hit a brick wall with the way that the streams are currently being tested in the JDK. First off, there are several tests that make assumptions about how Stream is implemented and down-cast it to an AbstractPipeline. Since our EmptyStream is not an AbstractPipeline, the tests fail.
Secondly, with a normal stream, some of the methods can only be called once, for example filter() and map(). They return a new stream and we have to continue working with those. With my EmptyStream, since filter() and map() return "this", we would not get an exception if we continued using it.
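The "only once" behaviour of a normal stream can be seen with a few lines of standard JDK code. The class name below is made up for the example; the stream calls are the regular java.util.stream API.

```java
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second intermediate operation on the same
    // stream object is rejected with an IllegalStateException.
    static boolean secondUseThrows() {
        Stream<Integer> stream = List.of(1, 2, 3).stream();
        stream.filter(Objects::nonNull);        // first use: links a new stage to 'stream'
        try {
            stream.map(Function.identity());    // second use of the already linked object
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows());  // prints true
    }
}
```

An empty stream whose filter() and map() simply return "this" has no such record of having been linked, so the second call would succeed silently.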
Thirdly, with a normal stream, the method parallel() changes the state of the current stream, but then returns "this". In order to keep the EmptyStream consistent with the current Stream.empty() behavior, I return StreamSupport.stream(Spliterators.emptySpliterator(), true) from the parallel() method. Thus with the EmptyStream this is opposite to how it currently happens to work. The Javadocs say that the parallel() method "may return itself", but it does not have to, whereas the filter() method seems to suggest that it would return a new stream object, but it also does not prescribe that it absolutely has to.
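The identity behaviour described here can be checked directly. The sketch below reflects what the current OpenJDK implementation happens to do, not something the Javadocs guarantee; the class and method names are made up for the example.

```java
import java.util.List;
import java.util.Objects;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ParallelIdentityDemo {
    // On current OpenJDK, parallel() flips a flag and returns the same object.
    static boolean parallelReturnsSameObject() {
        Stream<Integer> s = List.of(1, 2, 3).stream();
        return s.parallel() == s;
    }

    // filter() appends a stage, so a different object comes back.
    static boolean filterReturnsNewObject() {
        Stream<Integer> s = List.of(1, 2, 3).stream();
        return s.filter(Objects::nonNull) != s;
    }

    // The replacement stream mentioned above for EmptyStream.parallel().
    static <T> Stream<T> emptyParallel() {
        return StreamSupport.stream(Spliterators.emptySpliterator(), true);
    }

    public static void main(String[] args) {
        System.out.println(parallelReturnsSameObject());   // prints true on OpenJDK
        System.out.println(filterReturnsNewObject());      // prints true
        System.out.println(emptyParallel().isParallel());  // prints true
    }
}
```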
How important is the white-box testing with the streams? And could we perhaps make special cases for empty streams?
Regards

Heinz
