Re: Collectors.toSet() small performance improvement

2016-08-17 Thread Paul Sandoz

> On 16 Aug 2016, at 20:08, Tagir F. Valeev  wrote:
> 
> Hello, Paul!
> 
> Thank you. Here's issue:
> https://bugs.openjdk.java.net/browse/JDK-8164189
> And webrev:
> http://cr.openjdk.java.net/~tvaleev/webrev/8164189/r1/
> 

Thanks will push today or tomorrow,
Paul.

> With best regards,
> Tagir Valeev.
> 
> PS> Hi Tagir,
> 
> PS> I like it. Please open an issue and i will sponsor the fix for you.
> 
> PS> Thanks,
> PS> Paul.
> 
>>> On 8 Aug 2016, at 22:31, Tagir F. Valeev  wrote:
>>> 
>>> Hello!
>>> 
>>> I'd like to propose a simple performance patch for Collectors.toSet():
>>> 
>>> --- a/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
>>> Aug 02 07:19:06 2016 +0530
>>> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
>>> Aug 09 11:47:20 2016 +0700
>>> @@ -295,7 +295,12 @@
>>>public static 
>>>Collector> toSet() {
>>>return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,
>>> -   (left, right) -> { left.addAll(right); 
>>> return left; },
>>> +   (left, right) -> {
>>> +   if (left.size() < right.size()) {
>>> +   right.addAll(left); return 
>>> right;
>>> +   }
>>> +   left.addAll(right); return left;
>>> +   },
>>>   CH_UNORDERED_ID);
>>>}
>>> 
>>> This will add the smaller set to the larger which may improve parallel
>>> performance when subtasks produced uneven results. In normal case the
>>> additional overhead is insignificant (additional HashSet size
>>> comparison per each combine operation which usually performed
>>> ~4*parallelism times).
>>> 
>>> Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
>>> Win7, JDK 9-ea+129 64bit) which confirms the speedup:
>>> http://cr.openjdk.java.net/~tvaleev/patches/toSet/
>>> 
>>> When data is unevenly distributed, it's significantly faster:
>>> 
>>> Original:
>>> ToSet.toSetParallel1  true  avgt   50 88,580 ±   0,874  
>>> us/op
>>> ToSet.toSetParallel   10  true  avgt   50676,243 ±  41,188  
>>> us/op
>>> ToSet.toSetParallel  100  true  avgt   50   9126,399 ±  83,260  
>>> us/op
>>> Patched:
>>> ToSet.toSetParallel1  true  avgt   50 62,121 ±   1,207  
>>> us/op
>>> ToSet.toSetParallel   10  true  avgt   50556,968 ±  37,404  
>>> us/op
>>> ToSet.toSetParallel  100  true  avgt   50   6572,372 ± 183,466  
>>> us/op
>>> 
>>> When data is evenly distributed, no statistically significant changes
>>> observed:
>>> 
>>> Original:
>>> ToSet.toSetParallel1 false  avgt   50177,137 ±5,941  
>>> us/op
>>> ToSet.toSetParallel   10 false  avgt   50   2056,332 ±   87,433  
>>> us/op
>>> ToSet.toSetParallel  100 false  avgt   50  51864,198 ±  560,883  
>>> us/op
>>> Patched:
>>> ToSet.toSetParallel1 false  avgt   50172,606 ±   7,321  
>>> us/op
>>> ToSet.toSetParallel   10 false  avgt   50   1922,478 ±  79,593  
>>> us/op
>>> ToSet.toSetParallel  100 false  avgt   50  52648,100 ± 621,526  
>>> us/op
>>> 
>>> What do you think? Should I open an issue?
>>> 
>>> With best regards,
>>> Tagir Valeev.
>>> 
> 



Re: Collectors.toSet() small performance improvement

2016-08-16 Thread Tagir F. Valeev
Hello, Paul!

Thank you. Here's issue:
https://bugs.openjdk.java.net/browse/JDK-8164189
And webrev:
http://cr.openjdk.java.net/~tvaleev/webrev/8164189/r1/

With best regards,
Tagir Valeev.

PS> Hi Tagir,

PS> I like it. Please open an issue and i will sponsor the fix for you.

PS> Thanks,
PS> Paul.

>> On 8 Aug 2016, at 22:31, Tagir F. Valeev  wrote:
>> 
>> Hello!
>> 
>> I'd like to propose a simple performance patch for Collectors.toSet():
>> 
>> --- a/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
>> Aug 02 07:19:06 2016 +0530
>> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
>> Aug 09 11:47:20 2016 +0700
>> @@ -295,7 +295,12 @@
>> public static 
>> Collector> toSet() {
>> return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,
>> -   (left, right) -> { left.addAll(right); 
>> return left; },
>> +   (left, right) -> {
>> +   if (left.size() < right.size()) {
>> +   right.addAll(left); return right;
>> +   }
>> +   left.addAll(right); return left;
>> +   },
>>CH_UNORDERED_ID);
>> }
>> 
>> This will add the smaller set to the larger which may improve parallel
>> performance when subtasks produced uneven results. In normal case the
>> additional overhead is insignificant (additional HashSet size
>> comparison per each combine operation which usually performed
>> ~4*parallelism times).
>> 
>> Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
>> Win7, JDK 9-ea+129 64bit) which confirms the speedup:
>> http://cr.openjdk.java.net/~tvaleev/patches/toSet/
>> 
>> When data is unevenly distributed, it's significantly faster:
>> 
>> Original:
>> ToSet.toSetParallel1  true  avgt   50 88,580 ±   0,874  us/op
>> ToSet.toSetParallel   10  true  avgt   50676,243 ±  41,188  us/op
>> ToSet.toSetParallel  100  true  avgt   50   9126,399 ±  83,260  us/op
>> Patched:
>> ToSet.toSetParallel1  true  avgt   50 62,121 ±   1,207  us/op
>> ToSet.toSetParallel   10  true  avgt   50556,968 ±  37,404  us/op
>> ToSet.toSetParallel  100  true  avgt   50   6572,372 ± 183,466  us/op
>> 
>> When data is evenly distributed, no statistically significant changes
>> observed:
>> 
>> Original:
>> ToSet.toSetParallel1 false  avgt   50177,137 ±5,941  
>> us/op
>> ToSet.toSetParallel   10 false  avgt   50   2056,332 ±   87,433  
>> us/op
>> ToSet.toSetParallel  100 false  avgt   50  51864,198 ±  560,883  
>> us/op
>> Patched:
>> ToSet.toSetParallel1 false  avgt   50172,606 ±   7,321  us/op
>> ToSet.toSetParallel   10 false  avgt   50   1922,478 ±  79,593  us/op
>> ToSet.toSetParallel  100 false  avgt   50  52648,100 ± 621,526  us/op
>> 
>> What do you think? Should I open an issue?
>> 
>> With best regards,
>> Tagir Valeev.
>> 



Re: Collectors.toSet() small performance improvement

2016-08-16 Thread Paul Sandoz
Hi Tagir,

I like it. Please open an issue and i will sponsor the fix for you.

Thanks,
Paul.

> On 8 Aug 2016, at 22:31, Tagir F. Valeev  wrote:
> 
> Hello!
> 
> I'd like to propose a simple performance patch for Collectors.toSet():
> 
> --- a/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
> Aug 02 07:19:06 2016 +0530
> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
> Aug 09 11:47:20 2016 +0700
> @@ -295,7 +295,12 @@
> public static 
> Collector> toSet() {
> return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,
> -   (left, right) -> { left.addAll(right); 
> return left; },
> +   (left, right) -> {
> +   if (left.size() < right.size()) {
> +   right.addAll(left); return right;
> +   }
> +   left.addAll(right); return left;
> +   },
>CH_UNORDERED_ID);
> }
> 
> This will add the smaller set to the larger which may improve parallel
> performance when subtasks produced uneven results. In normal case the
> additional overhead is insignificant (additional HashSet size
> comparison per each combine operation which usually performed
> ~4*parallelism times).
> 
> Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
> Win7, JDK 9-ea+129 64bit) which confirms the speedup:
> http://cr.openjdk.java.net/~tvaleev/patches/toSet/
> 
> When data is unevenly distributed, it's significantly faster:
> 
> Original:
> ToSet.toSetParallel1  true  avgt   50 88,580 ±   0,874  us/op
> ToSet.toSetParallel   10  true  avgt   50676,243 ±  41,188  us/op
> ToSet.toSetParallel  100  true  avgt   50   9126,399 ±  83,260  us/op
> Patched:
> ToSet.toSetParallel1  true  avgt   50 62,121 ±   1,207  us/op
> ToSet.toSetParallel   10  true  avgt   50556,968 ±  37,404  us/op
> ToSet.toSetParallel  100  true  avgt   50   6572,372 ± 183,466  us/op
> 
> When data is evenly distributed, no statistically significant changes
> observed:
> 
> Original:
> ToSet.toSetParallel1 false  avgt   50177,137 ±5,941  us/op
> ToSet.toSetParallel   10 false  avgt   50   2056,332 ±   87,433  us/op
> ToSet.toSetParallel  100 false  avgt   50  51864,198 ±  560,883  us/op
> Patched:
> ToSet.toSetParallel1 false  avgt   50172,606 ±   7,321  us/op
> ToSet.toSetParallel   10 false  avgt   50   1922,478 ±  79,593  us/op
> ToSet.toSetParallel  100 false  avgt   50  52648,100 ± 621,526  us/op
> 
> What do you think? Should I open an issue?
> 
> With best regards,
> Tagir Valeev.
> 



Re: Collectors.toSet() small performance improvement

2016-08-09 Thread Aleksey Shipilev
On 08/09/2016 10:19 AM, Tagir F. Valeev wrote:
> AS> Good trick, but does it work properly with the sets that care about the
> AS> add order, e.g. LinkedHashSet? I guess our saving grace here is
> AS> Collector.toSet() is declared UNORDERED, so we can disturb the encounter
> AS> order without violating the API contract.
> 
> This implementation is bound to HashSet which is created right here,
> in supplier:
> 
> return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,... )
> 
> No other set implementation is possible here. Users who want to use
> custom set should use Collectors.toCollection(LinkedHashSet::new)
> which I'm not going to change as indeed here order might be important.

Right. (seeps some green tea trying to wake up)

Looks good then.

Thanks,
-Aleksey



Re: Collectors.toSet() small performance improvement

2016-08-09 Thread Tagir F. Valeev
Hello, Aleksey!

Thank you for review.

AS> Good trick, but does it work properly with the sets that care about the
AS> add order, e.g. LinkedHashSet? I guess our saving grace here is
AS> Collector.toSet() is declared UNORDERED, so we can disturb the encounter
AS> order without violating the API contract.

This implementation is bound to HashSet which is created right here,
in supplier:

return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,... )

No other set implementation is possible here. Users who want to use
custom set should use Collectors.toCollection(LinkedHashSet::new)
which I'm not going to change as indeed here order might be important.

AS> I would make it a bit cleaner though:

AS> (left, right) -> {
AS> if (left.size() < right.size()) {
AS>right.addAll(left); return right;
AS> } else {
AS>left.addAll(right); return left;
AS> }
AS> }

Agreed.

With best regards,
Tagir Valeev.



Re: Collectors.toSet() small performance improvement

2016-08-09 Thread Aleksey Shipilev
On 08/09/2016 08:31 AM, Tagir F. Valeev wrote:
> --- a/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
> Aug 02 07:19:06 2016 +0530
> +++ b/src/java.base/share/classes/java/util/stream/Collectors.java  Tue 
> Aug 09 11:47:20 2016 +0700
> @@ -295,7 +295,12 @@
>  public static 
>  Collector> toSet() {
>  return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,
> -   (left, right) -> { left.addAll(right); 
> return left; },
> +   (left, right) -> {
> +   if (left.size() < right.size()) {
> +   right.addAll(left); return right;
> +   }
> +   left.addAll(right); return left;
> +   },
> CH_UNORDERED_ID);
>  }

Good trick, but does it work properly with the sets that care about the
add order, e.g. LinkedHashSet? I guess our saving grace here is
Collector.toSet() is declared UNORDERED, so we can disturb the encounter
order without violating the API contract.

I would make it a bit cleaner though:

(left, right) -> {
if (left.size() < right.size()) {
   right.addAll(left); return right;
} else {
   left.addAll(right); return left;
}
}

Thanks,
-Aleksey



Collectors.toSet() small performance improvement

2016-08-08 Thread Tagir F. Valeev
Hello!

I'd like to propose a simple performance patch for Collectors.toSet():

--- a/src/java.base/share/classes/java/util/stream/Collectors.java  Tue Aug 
02 07:19:06 2016 +0530
+++ b/src/java.base/share/classes/java/util/stream/Collectors.java  Tue Aug 
09 11:47:20 2016 +0700
@@ -295,7 +295,12 @@
 public static 
 Collector> toSet() {
 return new CollectorImpl<>((Supplier>) HashSet::new, Set::add,
-   (left, right) -> { left.addAll(right); 
return left; },
+   (left, right) -> {
+   if (left.size() < right.size()) {
+   right.addAll(left); return right;
+   }
+   left.addAll(right); return left;
+   },
CH_UNORDERED_ID);
 }

This will add the smaller set to the larger which may improve parallel
performance when subtasks produced uneven results. In normal case the
additional overhead is insignificant (additional HashSet size
comparison per each combine operation which usually performed
~4*parallelism times).

Here's simple JDK benchmark with results (Core-i7-4702MQ 4 HT cores,
Win7, JDK 9-ea+129 64bit) which confirms the speedup:
http://cr.openjdk.java.net/~tvaleev/patches/toSet/

When data is unevenly distributed, it's significantly faster:

Original:
ToSet.toSetParallel1  true  avgt   50 88,580 ±   0,874  us/op
ToSet.toSetParallel   10  true  avgt   50676,243 ±  41,188  us/op
ToSet.toSetParallel  100  true  avgt   50   9126,399 ±  83,260  us/op
Patched:
ToSet.toSetParallel1  true  avgt   50 62,121 ±   1,207  us/op
ToSet.toSetParallel   10  true  avgt   50556,968 ±  37,404  us/op
ToSet.toSetParallel  100  true  avgt   50   6572,372 ± 183,466  us/op

When data is evenly distributed, no statistically significant changes
observed:

Original:
ToSet.toSetParallel1 false  avgt   50177,137 ±5,941  us/op
ToSet.toSetParallel   10 false  avgt   50   2056,332 ±   87,433  us/op
ToSet.toSetParallel  100 false  avgt   50  51864,198 ±  560,883  us/op
Patched:
ToSet.toSetParallel1 false  avgt   50172,606 ±   7,321  us/op
ToSet.toSetParallel   10 false  avgt   50   1922,478 ±  79,593  us/op
ToSet.toSetParallel  100 false  avgt   50  52648,100 ± 621,526  us/op

What do you think? Should I open an issue?

With best regards,
Tagir Valeev.