Re: 8148250 Stream.limit parallel ordered performance

Paul Sandoz Tue, 26 Jan 2016 09:51:11 -0800

> On 26 Jan 2016, at 16:51, Tagir F. Valeev <amae...@gmail.com> wrote:
> 
> Hello!
> 
> Thank you for review! Here's the issue:
> https://bugs.openjdk.java.net/browse/JDK-8148250
> Will post complete webrev later.
> 
> PS> Note that it is still the case that in almost all scenarios this
> PS> is likely to be a bad form of parallel stream.
> 
> Well not always it's possible to estimate in advance the size of the
> stream. Consider that we have user-specified filter upstream which
> searches over the big collection (we want to return first 10 search
> results, order is important):
> 
> data.parallelStream()
>    .filter(element -> userQuery.test(element))
>    .limit(10).collect(toList());
> 
> If user query produces 10-15 results, using parallel stream is very
> reasonable, but if it produces millions of results it should not
> regress very much (at least should not become much slower than
> sequential version which is what we see currently).


I have my doubts that the cost of splitting and filtering a small number of 
elements concurrently will pay off in the majority of scenarios, hence the 
“almost all”.

It could work in cases where there is lots of data to be filtered and only a 
few items are reported that are proportionally spread out, or over small data 
and the filter operation is costly.

In any case it’s good to avoid the OOME, i am very glad you found a simple way 
to resolve that.


> 
> PS> I think the comment you refer to still applies but now for larger n, so 
> we should refine it.
> 
> Should we replace "regardless of the value of n" with "when
> n*parallelismLevel is sufficiently large”?

Yes, when N * P is sufficiently large e.g to pluck a number out of the air > 
2^32, say

Paul.

Re: 8148250 Stream.limit parallel ordered performance

Reply via email to