EmptyStream to boost performance

Dr Heinz M. Kabutz Sat, 06 Nov 2021 09:46:05 -0700

Good evening,

a couple of months ago a fellow Java Champion told me that he had "banned" streams at his company, or at least discouraged their use. The reason was their high allocation rates with empty collections. With traditional for loops, if the collection is empty, then hardly any objects are allocated and it is very fast. But if we have a stream, then we first have to build up the entire pipeline, only to discover that we didn't need all those objects and throw them away again.
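To make the contrast concrete, here is a minimal sketch of the two styles applied to the same list. The class and method names are made up for illustration, and the sketch measures nothing; it only shows where the pipeline objects come from.

```java
import java.util.List;

class EmptyLoopVsStream {
    // Traditional loop: with an empty list the body never runs and
    // at most an iterator is created.
    static int sumWithLoop(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            if (v > 0) {
                sum += v * 2;
            }
        }
        return sum;
    }

    // Stream version: stream(), filter() and mapToInt() each create a
    // pipeline stage object (plus the lambdas) before sum() finds out
    // that there is nothing to process.
    static int sumWithStream(List<Integer> values) {
        return values.stream()
                .filter(v -> v > 0)
                .mapToInt(v -> v * 2)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> empty = List.of();
        System.out.println(sumWithLoop(empty));
        System.out.println(sumWithStream(empty));
    }
}
```

Both methods give the same answer for any input; the difference for an empty list lies only in how many short-lived objects are created along the way.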

When communicating with Brian Goetz last week, I mentioned this to him and he suggested that perhaps we could have the stream() method inside Collection check whether it is empty and, if so, return a specialized class EmptyStream that returns "this" for methods such as filter() and map(). I spent a bit of time trying to write such a class, together with EmptyIntStream, EmptyLongStream and EmptyDoubleStream. I've also written a set of tests that compare our Empty[Int|Long|Double]Streams to what would be returned by Stream.empty() and [Int|Long|Double]Stream.empty(). I've also written a little benchmark to demonstrate its effectiveness.
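The idea can be sketched in a few lines. This is not the code from the PR: MiniStream, EmptyMiniStream, WrappingMiniStream and streamOf() are hypothetical names, and MiniStream is a deliberately cut-down stand-in for java.util.stream.Stream so that the example stays short.

```java
import java.util.Collection;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

class EmptyStreamSketch {
    /** Hypothetical, cut-down stand-in for java.util.stream.Stream. */
    interface MiniStream<T> {
        MiniStream<T> filter(Predicate<? super T> predicate);
        <R> MiniStream<R> map(Function<? super T, ? extends R> mapper);
        long count();
    }

    /** Empty case: intermediate operations allocate nothing and return "this". */
    static class EmptyMiniStream<T> implements MiniStream<T> {
        @Override
        public MiniStream<T> filter(Predicate<? super T> predicate) {
            return this;
        }

        @Override
        @SuppressWarnings("unchecked")
        public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
            // Safe because an empty stream never produces an element.
            MiniStream<?> self = this;
            return (MiniStream<R>) self;
        }

        @Override
        public long count() {
            return 0;
        }
    }

    /** Non-empty case: delegate to a real Stream, one new stage per operation. */
    static class WrappingMiniStream<T> implements MiniStream<T> {
        private final Stream<T> delegate;

        WrappingMiniStream(Stream<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public MiniStream<T> filter(Predicate<? super T> predicate) {
            return new WrappingMiniStream<>(delegate.filter(predicate));
        }

        @Override
        public <R> MiniStream<R> map(Function<? super T, ? extends R> mapper) {
            Stream<R> mapped = delegate.map(mapper);
            return new WrappingMiniStream<>(mapped);
        }

        @Override
        public long count() {
            return delegate.count();
        }
    }

    /** The emptiness check that Collection.stream() would perform in the proposal. */
    static <T> MiniStream<T> streamOf(Collection<T> source) {
        return source.isEmpty()
                ? new EmptyMiniStream<>()
                : new WrappingMiniStream<>(source.stream());
    }
}
```

With this shape, streamOf(emptyList).filter(...).map(...).count() creates a single object, whereas the wrapping path creates one per stage, as a regular stream pipeline does.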


You can see what I've done here:

https://github.com/openjdk/jdk/pull/6275

(I think I was premature in issuing the PR)

However, I have hit a brick wall with the way that the streams are currently being tested in the JDK. First off, there are several tests that make assumptions about how Stream is implemented and down-cast it to an AbstractPipeline. Since our EmptyStream is not an AbstractPipeline, the tests fail.

Secondly, with a normal stream, some of the methods can only be called once, for example filter() and map(). They return a new stream and we have to continue working with those. With my EmptyStream, since filter() and map() return "this", we would not get an exception if we continued using it.
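The call-once behavior is easy to observe with a small sketch. The class and method names below are made up for illustration; the exception is what current OpenJDK streams throw when a stage is linked a second time, although the Javadoc only says that an implementation "may" do so.

```java
import java.util.stream.Stream;

class StreamReuseDemo {
    // Returns true if a second filter() on the same stream object is
    // rejected with an IllegalStateException.
    static boolean secondFilterThrows(Stream<String> stream) {
        stream.filter(s -> !s.isEmpty());        // first use links a new stage
        try {
            stream.filter(s -> s.length() > 3);  // second use of the same object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondFilterThrows(Stream.empty()));
    }
}
```

An EmptyStream whose filter() returns "this" would never throw in secondFilterThrows(), and that is the behavioral difference the existing tests notice.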

Thirdly, with a normal stream, the method parallel() changes the state of the current stream, but then returns "this". In order to keep the EmptyStream consistent with the current Stream.empty() behavior, I return StreamSupport.stream(Spliterators.emptySpliterator(), true) from the parallel() method. Thus with the EmptyStream this is the opposite of how it currently happens to work. The Javadocs say that the parallel() method "may return itself", but it does not have to, whereas the Javadoc for filter() seems to suggest that it returns a new stream object, but it also does not prescribe that it absolutely has to.
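A short sketch of the identity behavior described here, as observed on the current OpenJDK implementation. It is an implementation detail rather than something the Javadoc guarantees, and the class and method names are made up for illustration.

```java
import java.util.stream.Stream;

class ParallelIdentityDemo {
    // parallel() flips a flag on the pipeline and hands back the same object.
    static boolean parallelReturnsSameInstance() {
        Stream<String> s = Stream.empty();
        Stream<String> p = s.parallel();
        return p == s && p.isParallel();
    }

    // filter() links a new stage, so the result is a different object.
    static boolean filterReturnsNewInstance() {
        Stream<String> s = Stream.empty();
        return s.filter(x -> true) != s;
    }

    public static void main(String[] args) {
        System.out.println(parallelReturnsSameInstance());
        System.out.println(filterReturnsNewInstance());
    }
}
```

The EmptyStream in the PR inverts both results: parallel() returns a different object and filter() returns the same one.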

How important is the white-box testing with the streams? And could we perhaps make special cases for empty streams?


Regards

Heinz
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
Java Champion - www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz
