Hi Tagir,

I like it. Please open an issue and i will sponsor the fix for you.

Thanks,
Paul.

> On 8 Aug 2016, at 22:31, Tagir F. Valeev <amae...@gmail.com> wrote:
> 
> Hello!
> 
> I'd like to propose a simple performance patch for Collectors.toSet():
> 
> --- a/src/java.base/share/classes/java/util/stream/Collectors.java      Tue 
> Aug 02 07:19:06 2016 +0530
> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java      Tue 
> Aug 09 11:47:20 2016 +0700
> @@ -295,7 +295,12 @@
>     public static <T>
>     Collector<T, ?, Set<T>> toSet() {
>         return new CollectorImpl<>((Supplier<Set<T>>) HashSet::new, Set::add,
> -                                   (left, right) -> { left.addAll(right); 
> return left; },
> +                                   (left, right) -> {
> +                                       if (left.size() < right.size()) {
> +                                           right.addAll(left); return right;
> +                                       }
> +                                       left.addAll(right); return left;
> +                                   },
>                                    CH_UNORDERED_ID);
>     }
> 
> This will add the smaller set to the larger which may improve parallel
> performance when subtasks produced uneven results. In normal case the
> additional overhead is insignificant (additional HashSet size
> comparison per each combine operation which usually performed
> ~4*parallelism times).
> 
> Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
> Win7, JDK 9-ea+129 64bit) which confirms the speedup:
> http://cr.openjdk.java.net/~tvaleev/patches/toSet/
> 
> When data is unevenly distributed, it's significantly faster:
> 
> Original:
> ToSet.toSetParallel    10000      true  avgt   50     88,580 ±   0,874  us/op
> ToSet.toSetParallel   100000      true  avgt   50    676,243 ±  41,188  us/op
> ToSet.toSetParallel  1000000      true  avgt   50   9126,399 ±  83,260  us/op
> Patched:
> ToSet.toSetParallel    10000      true  avgt   50     62,121 ±   1,207  us/op
> ToSet.toSetParallel   100000      true  avgt   50    556,968 ±  37,404  us/op
> ToSet.toSetParallel  1000000      true  avgt   50   6572,372 ± 183,466  us/op
> 
> When data is evenly distributed, no statistically significant changes
> observed:
> 
> Original:
> ToSet.toSetParallel    10000     false  avgt   50    177,137 ±    5,941  us/op
> ToSet.toSetParallel   100000     false  avgt   50   2056,332 ±   87,433  us/op
> ToSet.toSetParallel  1000000     false  avgt   50  51864,198 ±  560,883  us/op
> Patched:
> ToSet.toSetParallel    10000     false  avgt   50    172,606 ±   7,321  us/op
> ToSet.toSetParallel   100000     false  avgt   50   1922,478 ±  79,593  us/op
> ToSet.toSetParallel  1000000     false  avgt   50  52648,100 ± 621,526  us/op
> 
> What do you think? Should I open an issue?
> 
> With best regards,
> Tagir Valeev.
> 

Reply via email to