Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]

2021-04-05 Thread Jorn Vernee
On Mon, 5 Apr 2021 15:39:39 GMT, Vladimir Ivanov  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Review comment: Rearrange code to explicitly reference erased setter
>
> Looks good.

Thanks for the reviews. I've submitted one more tier1-3 test run and if that's 
all green I'll go ahead and integrate this.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]

2021-04-05 Thread Vladimir Ivanov
On Mon, 5 Apr 2021 14:16:37 GMT, Jorn Vernee  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comment: Rearrange code to explicitly reference erased setter

Looks good.

-

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Jorn Vernee
On Mon, 5 Apr 2021 12:48:07 GMT, Vladimir Ivanov  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Revert back to injected constructor handle
>>   - Add lambda form sharing
>>   - Add test case for collecting a custom class
>
> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1883:
> 
>> 1881: 
>> 1882: MethodHandle newArray = 
>> MethodHandles.arrayConstructor(arrayType);
>> 1883: MethodHandle storeFunc = ArrayAccessor.getAccessor(arrayType, 
>> ArrayAccess.SET);
> 
> I'd move `storeFunc` into `makeCollectorForm()` and use erased accessor there 
> (`ArrayAccessor.OBJECT_ARRAY_SETTER`). 
> 
> (It is interesting that `OBJECT_ARRAY_SETTER` is `ARRAY_STORE` intrinsic.)

Done. I think it's also more obvious this way that the erased setter is being 
used. Good suggestion.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]

2021-04-05 Thread Jorn Vernee
> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code is creating a collector handle by combining a set of hard coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But, through investigation it turns out that the 
> intrinsification is only being applied in a very limited number of cases: 
> Object[] with at most 10 elements, and only for the intermediate collector 
> handles. Every other collector shape incurs overhead because it essentially 
> ends up calling the inefficient fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
> (as you can see, just the Object[] with arity less than 10 case is fast here)
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
> 
> Thanks,
> Jorn
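
The "collect arguments transform with the collector handle as filter" mentioned
in the quoted description has a direct public-API analogue. The sketch below is
not code from the patch; it only uses existing java.lang.invoke methods
(MethodHandles.collectArguments, MethodHandles.identity, asCollector) to show
that filtering arguments through a collector produces the same shape that
asCollector itself produces:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import static java.lang.invoke.MethodType.methodType;

public class CollectorAsFilterSketch {
    // Hypothetical target that consumes the collected array.
    static int count(String[] arr) { return arr.length; }

    public static void main(String[] args) throws Throwable {
        MethodHandle target = MethodHandles.lookup().findStatic(
                CollectorAsFilterSketch.class, "count",
                methodType(int.class, String[].class));

        // A "collector" handle: (String, String, String) -> String[]
        MethodHandle collector = MethodHandles.identity(String[].class)
                .asCollector(String[].class, 3);

        // Feeding the collector into collectArguments yields the same shape
        // that target.asCollector(String[].class, 3) produces directly.
        MethodHandle viaFilter = MethodHandles.collectArguments(target, 0, collector);
        MethodHandle direct    = target.asCollector(String[].class, 3);

        System.out.println((int) viaFilter.invokeExact("a", "b", "c")); // 3
        System.out.println((int) direct.invokeExact("a", "b", "c"));    // 3
    }
}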

Jorn Vernee has updated the pull request incrementally with one additional 
commit since the last revision:

  Review comment: Rearrange code to explicitly reference erased setter

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/dbba7910..72ccf282

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=02-03

  Stats: 11 lines in 1 file changed: 6 ins; 2 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Vladimir Ivanov
On Mon, 5 Apr 2021 11:57:08 GMT, Jorn Vernee  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   - Revert back to injected constructor handle
>   - Add lambda form sharing
>   - Add test case for collecting a custom class

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1883:

> 1881: 
> 1882: MethodHandle newArray = 
> MethodHandles.arrayConstructor(arrayType);
> 1883: MethodHandle storeFunc = ArrayAccessor.getAccessor(arrayType, 
> ArrayAccess.SET);

I'd move `storeFunc` into `makeCollectorForm()` and use erased accessor there 
(`ArrayAccessor.OBJECT_ARRAY_SETTER`). 

(It is interesting that `OBJECT_ARRAY_SETTER` is `ARRAY_STORE` intrinsic.)
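
`ArrayAccessor.OBJECT_ARRAY_SETTER` is JDK-internal, but the public
`MethodHandles.arrayElementSetter` gives a feel for what an "erased" setter
means: the handle's type only mentions Object, while the JVM's aastore check
still enforces the array's real element type at run time. A minimal sketch
(class and variable names here are illustrative, not from the patch):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class ErasedSetterSketch {
    public static void main(String[] args) throws Throwable {
        // Erased setter: type is (Object[], int, Object)void
        MethodHandle erasedSetter = MethodHandles.arrayElementSetter(Object[].class);

        Object[] arr = new String[2];        // runtime type is String[]
        erasedSetter.invoke(arr, 0, "ok");   // fine
        try {
            erasedSetter.invoke(arr, 1, new Object());
        } catch (ArrayStoreException e) {
            System.out.println("aastore check still applies: " + e);
        }
    }
}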

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Vladimir Ivanov
On Mon, 5 Apr 2021 11:57:08 GMT, Jorn Vernee  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   - Revert back to injected constructor handle
>   - Add lambda form sharing
>   - Add test case for collecting a custom class

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1909:

> 1907: boolean isSharedLambdaForm = parameterCount == 0 || 
> basicType.parameterType(0) == Object.class;
> 1908: if (isSharedLambdaForm) {
> 1909: LambdaForm lform = 
> basicType.form().cachedLambdaForm(MethodTypeForm.LF_COLLECTOR);

How does sharing work w.r.t. array store check? Unless you put `storeFunc` on 
the BMH, I don't see how sharing preserves proper check.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Vladimir Ivanov
On Mon, 5 Apr 2021 12:37:06 GMT, Vladimir Ivanov  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Revert back to injected constructor handle
>>   - Add lambda form sharing
>>   - Add test case for collecting a custom class
>
> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1909:
> 
>> 1907: boolean isSharedLambdaForm = parameterCount == 0 || 
>> basicType.parameterType(0) == Object.class;
>> 1908: if (isSharedLambdaForm) {
>> 1909: LambdaForm lform = 
>> basicType.form().cachedLambdaForm(MethodTypeForm.LF_COLLECTOR);
> 
> How does sharing work w.r.t. array store check? Unless you put `storeFunc` on 
> the BMH, I don't see how sharing preserves proper check.

Nevermind, it's the type of the method handle produced by `makeCollector` which 
ensures that all the arguments have proper type.
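
The same guarantee can be seen with the public API: even if the element stores
inside a collector are erased, the collector's MethodType forces callers to
supply correctly typed arguments up front. A small illustration (using
MethodHandles.identity/asCollector, not the internal makeCollector):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class CollectorTypeSketch {
    public static void main(String[] args) throws Throwable {
        MethodHandle collector = MethodHandles.identity(String[].class)
                .asCollector(String[].class, 2);

        // (String,String)String[]: the type itself ensures only Strings ever
        // reach the (possibly erased) array stores inside.
        System.out.println(collector.type());

        String[] arr = (String[]) collector.invokeExact("a", "b");
        System.out.println(arr.length); // 2
    }
}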

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Jorn Vernee
On Fri, 2 Apr 2021 13:23:07 GMT, Jorn Vernee  wrote:

>> That's an elegant solution.
>> 
>> At first I thought it might unduly perturb lambda form generation and 
>> caching, but you slotted a different lambda form implementation underneath 
>> the varargs implementation.
>
> I've addressed review comments, plus some other things:
> 
> - I realized that I was calling the uncached version of the store function 
> factory. Fixed that.
> - I also realized that there's already an `ARRAY_STORE` intrinsic, which I'm 
> now using to avoid generating a call.
> - I also realized that since we only have 1 array creation handle per lambda 
> form, we can instead generate a direct call to `Array::newInstance` instead 
> of going through the array constructor handle (which also avoids having to 
> use a BoundMethodHandle).
> - Finally, I added an intrinsic, under the old `NEW_ARRAY` name, that 
> intrinsifies a call to `Array::newInstance` if the component type argument is 
> constant (which it is in this case).
> 
> As a result, the lambda form is now fully intrinsified (no more calls in the 
> generated bytecode) e.g.:
> 
> static java.lang.Object collector001__L(java.lang.Object, 
> java.lang.Object, java.lang.Object, java.lang.Object);
> Code:
>    0: iconst_3
>    1: anewarray #12 // class java/lang/String
>    4: astore 4
>    6: aload 4
>    8: checkcast #14 // class "[Ljava/lang/String;"
>   11: dup
>   12: astore 4
>   14: iconst_0
>   15: aload_1
>   16: checkcast #12 // class java/lang/String
>   19: aastore
>   20: aload 4
>   22: iconst_1
>   23: aload_2
>   24: checkcast #12 // class java/lang/String
>   27: aastore
>   28: aload 4
>   30: iconst_2
>   31: aload_3
>   32: checkcast #12 // class java/lang/String
>   35: aastore
>   36: aload 4
>   38: areturn
> 
> Thanks,
> Jorn

Addressed latest review comments:
- Reverted back to using an injected constructor handle (to be able to take 
advantage of lambda form sharing)
- Added lambda form sharing for empty and reference arrays
- Added a test case for a custom class to VarargsArrayTest, which catches the 
illegal access error in the intrinsified case that Vlad pointed out (though the 
intrinsic itself is now removed).

I also had to switch back to the un-cached version for creating the array 
element setter, since the cached version casts the array to be a specific type, 
and if the lambda form is shared, this won't work (e.g. casting an Object[] to 
String[], depending on the order of creating the collectors). Adding caching 
there is left as a followup.
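
The failure mode described above can be reproduced with the public API alone
(a sketch of the problem, not the internal ArrayAccessor cache): a setter built
for String[] carries a cast to String[], so a lambda form shared across array
types cannot reuse it, while the erased Object[] setter works for any reference
array.

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import static java.lang.invoke.MethodType.methodType;

public class SetterSharingSketch {
    public static void main(String[] args) throws Throwable {
        // A String[]-specific setter, erased only at the signature level,
        // still casts its array argument to String[] internally...
        MethodHandle stringSetter = MethodHandles.arrayElementSetter(String[].class)
                .asType(methodType(void.class, Object[].class, int.class, Object.class));
        try {
            stringSetter.invoke(new Object[1], 0, "x");
        } catch (ClassCastException e) {
            System.out.println("cannot be shared across array types: " + e);
        }

        // ...whereas the truly erased setter works for any reference array.
        MethodHandle erased = MethodHandles.arrayElementSetter(Object[].class);
        erased.invoke(new Object[1], 0, "x");
        erased.invoke(new String[1], 0, "x");
    }
}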

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-05 Thread Jorn Vernee
On Fri, 2 Apr 2021 15:12:04 GMT, Vladimir Ivanov  wrote:

>> That's what I thought as well (but not 100% sure). I can partially roll back 
>> the last commit so the code still uses an injected array constructor handle, 
>> and then it should be easy to add caching in the cases where the component 
>> type is a reference type.
>
> Whatever you prefer. I'm fine with it either way.

I found it easier to add sharing now, since it makes the intrinsic redundant 
(and it can be removed again).

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]

2021-04-05 Thread Jorn Vernee
> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code is creating a collector handle by combining a set of hard coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But, through investigation it turns out that the 
> intrinsification is only being applied in a very limited number of cases: 
> Object[] with at most 10 elements, and only for the intermediate collector 
> handles. Every other collector shape incurs overhead because it essentially 
> ends up calling the inefficient fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
> (as you can see, just the Object[] with arity less than 10 case is fast here)
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
> 
> Thanks,
> Jorn

Jorn Vernee has updated the pull request incrementally with one additional 
commit since the last revision:

  - Revert back to injected constructor handle
  - Add lambda form sharing
  - Add test case for collecting a custom class

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/313ffaaf..dbba7910

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=01-02

  Stats: 68 lines in 4 files changed: 32 ins; 22 del; 14 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Vladimir Ivanov
On Fri, 2 Apr 2021 14:41:06 GMT, Jorn Vernee  wrote:

>> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java 
>> line 874:
>> 
>>> 872: case NEW_ARRAY:
>>> 873: Class rtype = 
>>> name.function.methodType().returnType();
>>> 874: if (isStaticallyNameable(rtype)) {
>> 
>> Why do you remove the `isStaticallyNameable(rtype)` check? 
>> 
>> It is there for a reason: only a very limited set of classes can be directly 
>> referenced from LF bytecode.
>
> I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in 
> the latest commit. I didn't add the check since I was not aware of the 
> restriction (looks like some of the tests failed as well).
> 
> Good to know, will add a check.

Also, please add a test case for such a scenario. It should be trivial: just use 
a class defined by the test (loaded by `AppClassLoader`).
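
A rough idea of such a test, using only the public API (this is not the test
that was actually added to VarargsArrayTest; the class and method names are
made up for illustration):

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class CustomClassCollectorSketch {
    static class MyType {}   // defined by the test, loaded by the app class loader

    public static void main(String[] args) throws Throwable {
        MethodHandle collector = MethodHandles.identity(MyType[].class)
                .asCollector(MyType[].class, 2);
        MyType[] arr = (MyType[]) collector.invokeExact(new MyType(), new MyType());
        if (arr.length != 2 || arr[0] == null || arr[1] == null)
            throw new AssertionError("collector produced a wrong array");
        System.out.println("ok");
    }
}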

>> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
>> 
>>> 1895: // field of the receiver handle.
>>> 1896: // This is left as a followup enhancement, as it needs to be 
>>> investigated if this causes
>>> 1897: // profile pollution.
>> 
>> Profile pollution shouldn't be a problem here: both types (element type + 
>> array type) will be constants attached to the "receiver" BMH instance and 
>> during inlining will be fetched from there.  
>> 
>> I'm fine with handling it as a separate RFE.
>
> That's what I thought as well (but not 100% sure). I can partially roll back 
> the last commit so the code still uses an injected array constructor handle, 
> and then it should be easy to add caching in the cases where the component 
> type is a reference type.

Whatever you prefer. I'm fine with it either way.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee
On Fri, 2 Apr 2021 13:56:31 GMT, Vladimir Ivanov  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Address review comments
>>   - Use cached version of store func getter
>>   - Use ARRAY_STORE intrinsic for array stores
>>   - Generate direct call to Array.newInstance instead of using an array 
>> constructor handle
>>   - Intrinsify call to Array.newInstance if the component type is constant
>
> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java 
> line 874:
> 
>> 872: case NEW_ARRAY:
>> 873: Class rtype = 
>> name.function.methodType().returnType();
>> 874: if (isStaticallyNameable(rtype)) {
> 
> Why do you remove the `isStaticallyNameable(rtype)` check? 
> 
> It is there for a reason: only a very limited set of classes can be directly 
> referenced from LF bytecode.

I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in 
the latest commit. I didn't add the check since I was not aware of the 
restriction.

Good to know, will add a check.

> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
> 
>> 1895: // field of the receiver handle.
>> 1896: // This is left as a followup enhancement, as it needs to be 
>> investigated if this causes
>> 1897: // profile pollution.
> 
> Profile pollution shouldn't be a problem here: both types (element type + 
> array type) will be constants attached to the "receiver" BMH instance and 
> during inlining will be fetched from there.  
> 
> I'm fine with handling it as a separate RFE.

That's what I thought as well (but not 100% sure). I can partially roll back 
the last commit so the code still uses an injected array constructor handle, 
and then it should be easy to add caching in the cases where the component type 
is a reference type.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Vladimir Ivanov
On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   - Address review comments
>   - Use cached version of store func getter
>   - Use ARRAY_STORE intrinsic for array stores
>   - Generate direct call to Array.newInstance instead of using an array 
> constructor handle
>   - Intrinsify call to Array.newInstance if the component type is constant

src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 
874:

> 872: case NEW_ARRAY:
> 873: Class rtype = 
> name.function.methodType().returnType();
> 874: if (isStaticallyNameable(rtype)) {

Why do you remove the `isStaticallyNameable(rtype)` check? 

It is there for a reason: only a very limited set of classes can be directly 
referenced from LF bytecode.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:

> 1895: // field of the receiver handle.
> 1896: // This is left as a followup enhancement, as it needs to be 
> investigated if this causes
> 1897: // profile pollution.

Profile pollution shouldn't be a problem here: both types (element type + array 
type) will be constants attached to the "receiver" BMH instance and during 
inlining will be fetched from there.

I'm fine with handling it as a separate RFE.

-

PR: https://git.openjdk.java.net/jdk/pull/3306

Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee
On Thu, 1 Apr 2021 19:19:10 GMT, Paul Sandoz  wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   - Address review comments
>>   - Use cached version of store func getter
>>   - Use ARRAY_STORE intrinsic for array stores
>>   - Generate direct call to Array.newInstance instead of using an array 
>> constructor handle
>>   - Intrinsify call to Array.newInstance if the component type is constant
>
> That's an elegant solution.
> 
> At first I thought it might unduly perturb lambda form generation and 
> caching, but you slotted a different lambda form implementation underneath 
> the varargs implementation.

I've addressed review comments, plus some other things:

- I realized that I was calling the uncached version of the store function 
factory. Fixed that.
- I also realized that there's already an `ARRAY_STORE` intrinsic, which I'm 
now using to avoid generating a call.
- I also realized that since we only have 1 array creation handle per lambda 
form, we can instead generate a direct call to `Array::newInstance` instead of 
going through the array constructor handle (which also avoids having to use a 
BoundMethodHandle).
- Finally, I added an intrinsic, under the old `NEW_ARRAY` name, that 
intrinsifies a call to `Array::newInstance` if the component type argument is 
constant (which it is in this case).

As a result, the lambda form is now fully intrinsified (no more calls in the 
generated bytecode) e.g.:

static java.lang.Object collector001__L(java.lang.Object, java.lang.Object, 
java.lang.Object, java.lang.Object);
Code:
   0: iconst_3
   1: anewarray #12 // class java/lang/String
   4: astore 4
   6: aload 4
   8: checkcast #14 // class "[Ljava/lang/String;"
  11: dup
  12: astore 4
  14: iconst_0
  15: aload_1
  16: checkcast #12 // class java/lang/String
  19: aastore
  20: aload 4
  22: iconst_1
  23: aload_2
  24: checkcast #12 // class java/lang/String
  27: aastore
  28: aload 4
  30: iconst_2
  31: aload_3
  32: checkcast #12 // class java/lang/String
  35: aastore
  36: aload 4
  38: areturn
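
For reference, the listing above corresponds roughly to the following Java
(a hand-written sketch of what the generated LambdaForm does, not actual
generated source; the unused first parameter is the bound method handle
itself):

static Object collector(Object mh, Object a0, Object a1, Object a2) {
    String[] arr = new String[3];   // iconst_3 + anewarray, no Array.newInstance call
    arr[0] = (String) a0;           // checkcast + aastore, no helper call
    arr[1] = (String) a1;
    arr[2] = (String) a2;
    return arr;
}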

Thanks,
Jorn

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]

2021-04-02 Thread Jorn Vernee
> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code is creating a collector handle by combining a set of hard coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But, through investigation it turns out that the 
> intrinsification is only being applied in a very limited number of cases: 
> Object[] with at most 10 elements, and only for the intermediate collector 
> handles. Every other collector shape incurs overhead because it essentially 
> ends up calling the inefficient fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
> (as you can see, just the Object[] with arity less than 10 case is fast here)
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
> 
> Thanks,
> Jorn

Jorn Vernee has updated the pull request incrementally with one additional 
commit since the last revision:

  - Address review comments
  - Use cached version of store func getter
  - Use ARRAY_STORE intrinsic for array stores
  - Generate direct call to Array.newInstance instead of using an array 
constructor handle
  - Intrinsify call to Array.newInstance if the component type is constant

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/46bc5d19..313ffaaf

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=00-01

  Stats: 57 lines in 2 files changed: 27 ins; 19 del; 11 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread Jorn Vernee
On Thu, 1 Apr 2021 19:09:53 GMT, John R Rose  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1903:
> 
>> 1901: }
>> 1902: 
>> 1903: private static LambdaForm makeCollectorForm(MethodType basicType, 
>> MethodHandle storeFunc) {
> 
> This reads very well.  It is the right way to do this job with LFs.
> There is a slight risk that the JIT will fail to inline the tiny MHs
> that (a) construct arrays and (b) poke elements into them.
> If that happens, then perhaps the bytecode generator can
> treat them as intrinsics.  Maybe that's worth a comment in
> the code, to help later readers?

Good idea, I'll leave some notes.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread Jorn Vernee
On Thu, 1 Apr 2021 19:09:14 GMT, John R Rose  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1342:
> 
>> 1340: }
>> 1341: 
>> 1342: // Array identity function (used as 
>> getConstantHandle(MH_arrayIdentity)).
> 
> If this is no longer used, remove it.

Yes, it looks like this can be removed. Good catch!

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread Jorn Vernee
On Thu, 1 Apr 2021 19:12:42 GMT, Paul Sandoz  wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited number of cases: 
>> Object[] with at most 10 elements, and only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the inefficient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1277:
> 
>> 1275: if (intrinsicName == Intrinsic.IDENTITY) {
>> 1276: MethodType resultType = 
>> type().asCollectorType(arrayType, type().parameterCount() - 1, arrayLength);
>> 1277: MethodHandle collector = 
>> MethodHandleImpl.makeCollector(arrayType, arrayLength);
> 
> Should `MethodHandleImpl.varargsArray` still be used here since it performs 
> arity checking and caching? 
> 
> Maybe the arity checks are performed by `asCollectorType`, but that would 
> still leave the caching.

Ah right, good catch! This is a leftover from moving things around and should 
indeed use `varargsArray`.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread Paul Sandoz
On Thu, 1 Apr 2021 13:25:05 GMT, Jorn Vernee  wrote:

> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code creates a collector handle by combining a set of hard-coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But investigation showed that the intrinsification is only 
> applied in a very limited number of cases: Object[] with at most 10 elements, 
> and only for the intermediate collector handles. Every other collector shape 
> incurs overhead because it essentially ends up calling the inefficient 
> fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
> (as you can see, only the Object[] case with arity less than 10 is fast here)
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
> 
> Thanks,
> Jorn

That's an elegant solution.

At first I thought it might unduly perturb lambda form generation and caching, 
but you slotted a different lambda form implementation underneath the varargs 
implementation.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1277:

> 1275: if (intrinsicName == Intrinsic.IDENTITY) {
> 1276: MethodType resultType = type().asCollectorType(arrayType, type().parameterCount() - 1, arrayLength);
> 1277: MethodHandle collector = MethodHandleImpl.makeCollector(arrayType, arrayLength);

Should `MethodHandleImpl.varargsArray` still be used here since it performs 
arity checking and caching? 

Maybe the arity checks are performed by `asCollectorType`, but that would still 
leave the caching.

-

PR: https://git.openjdk.java.net/jdk/pull/3306


Re: RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread John R Rose
On Thu, 1 Apr 2021 13:25:05 GMT, Jorn Vernee  wrote:

> This patch speeds up MethodHandle.asCollector handles where the array type is 
> not Object[], as well as speeding up all collectors where the arity is 
> greater than 10.
> 
> The old code creates a collector handle by combining a set of hard-coded 
> methods for collecting arguments into an Object[], up to a maximum of ten 
> elements, and then copying this intermediate array into a final array.
> 
> In principle, it shouldn't matter how slow (or fast) this handle is, because 
> it should be replaced by existing bytecode intrinsification which does the 
> right thing. But investigation showed that the intrinsification is only 
> applied in a very limited number of cases: Object[] with at most 10 elements, 
> and only for the intermediate collector handles. Every other collector shape 
> incurs overhead because it essentially ends up calling the inefficient 
> fallback handle.
> 
> Rather than sticking on a band aid (I tried this, but it turned out to be 
> very tricky to untangle the existing code), the new code replaces the 
> existing implementation with a collector handle implemented using a 
> LambdaForm, which removes the need for intrinsification, and also greatly 
> reduces code-complexity of the implementation. (I plan to expose this 
> collector using a public API in the future as well, so users don't have to go 
> through MHs::identity to make a collector).
> 
> The old code was also using a special lambda form transform for collecting 
> arguments into an array. I believe this was done to take advantage of the 
> existing-but-limited bytecode intrinsification, at least for Object[] with 
> less than 10 elements.
> 
> The new code just uses the existing collect arguments transform with the 
> newly added collector handle as filter, and this works just as well for the 
> existing case, but as a bonus is also much simpler, since no separate 
> transform is needed. Using the collect arguments transform should also 
> improve sharing.
> 
> As a result of these changes a lot of code was unused and has been removed in 
> this patch.
> 
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
> as well as another variant of the benchmark that used a declared static 
> identity method instead of MHs::identity (not included). Before/after 
> comparison of MethodHandleAs* benchmarks (no differences there).
> 
> Here are some numbers from the added benchmark:
> 
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
> (as you can see, only the Object[] case with arity less than 10 is fast here)
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
> 
> Thanks,
> Jorn

So many red deletion lines; I'm looking at a beautiful sunset!
It's a sunset for some very old code, some of the first code
I wrote for method handles, long before Lambda Forms
made this sort of task easier.

Thanks very much for cleaning this out.  See a few
minor comments on some diff lines.

src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 651:

> 649: }
> 650: 
> 651: LambdaForm collectArgumentArrayForm(int pos, MethodHandle arrayCollector) {

It's counter-intuitive that removing a LFE tactic would be harmless.
Each LFE tactic is a point where LFs can be shared, reducing footprint.
But in this case `collectArgumentArrayForm` is always paired with
`collectArgumentsForm`, so the latter takes care of sharing.
The actual code which makes the arrays is also shared, via
`Makers.TYPED_COLLECTORS` (unchanged).

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1342:

> 1340: }
> 1341: 
> 1342: // Array identity function (used as getConstantHandle(MH_arrayIdentity)).

If this is no longer used, remove it.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1903:

> 1901: }
> 1902: 
> 1903: private static LambdaForm m

RFR: 8264288: Performance issue with MethodHandle.asCollector

2021-04-01 Thread Jorn Vernee
This patch speeds up MethodHandle.asCollector handles where the array type is 
not Object[], as well as speeding up all collectors where the arity is greater 
than 10.

The old code creates a collector handle by combining a set of hard-coded methods 
for collecting arguments into an Object[], up to a maximum of ten elements, and 
then copying this intermediate array into a final array.

In principle, it shouldn't matter how slow (or fast) this handle is, because it 
should be replaced by existing bytecode intrinsification which does the right 
thing. But investigation showed that the intrinsification is only applied in a 
very limited number of cases: Object[] with at most 10 elements, and only for 
the intermediate collector handles. Every other collector shape incurs overhead 
because it essentially ends up calling the inefficient fallback handle.

Rather than sticking on a band aid, the new code replaces the existing 
implementation with a collector handle implemented using a LambdaForm, which 
removes the need for intrinsification, and also greatly reduces code-complexity 
of the implementation. (I plan to expose this collector using a public API in 
the future as well, so users don't have to go through MHs::identity to make a 
collector).
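
For illustration, here is a minimal sketch of that MHs::identity route; the
array type and arity below are just examples, not taken from the patch:

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.util.Arrays;

    public class IdentityCollectorSketch {
        public static void main(String[] args) throws Throwable {
            // identity(int[].class) has type (int[])int[]; asCollector turns its
            // trailing (and only) int[] parameter into three int parameters.
            MethodHandle collector = MethodHandles.identity(int[].class)
                    .asCollector(int[].class, 3);   // type: (int,int,int)int[]

            // A non-Object[] shape like this missed the old fast path.
            int[] collected = (int[]) collector.invokeExact(1, 2, 3);
            System.out.println(Arrays.toString(collected)); // prints [1, 2, 3]
        }
    }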

The old code was also using a special lambda form transform for collecting 
arguments into an array. I believe this was done to take advantage of the 
existing-but-limited bytecode intrinsification, at least for Object[] with less 
than 10 elements.

The new code just uses the existing collect arguments transform with the newly 
added collector handle as filter, and this works just as well for the existing 
case, but as a bonus is also much simpler, since no separate transform is 
needed. Using the collect arguments transform should also improve sharing.
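
Roughly, the public-API analogue of that construction is sketched below; the
target method, the filter position, and the use of asCollector to build the
filter are illustrative only (the patch builds the filter handle internally):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;

    public class CollectArgumentsSketch {
        static String describe(String name, Object[] rest) {
            return name + " plus " + rest.length + " collected args";
        }

        public static void main(String[] args) throws Throwable {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    CollectArgumentsSketch.class, "describe",
                    MethodType.methodType(String.class, String.class, Object[].class));

            // Stand-in for the internal collector handle: (Object,Object)Object[]
            MethodHandle filter = MethodHandles.identity(Object[].class)
                    .asCollector(Object[].class, 2);

            // Filter argument position 1 through the collector;
            // the resulting type is (String,Object,Object)String.
            MethodHandle combined = MethodHandles.collectArguments(target, 1, filter);
            System.out.println((String) combined.invoke("hello", "a", "b"));
        }
    }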

As a result of these changes a lot of code was unused and has been removed in 
this patch.

Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as 
well as another variant of the benchmark that used a declared static identity 
method instead of MHs::identity (not included). Before/after comparison of 
MethodHandleAs* benchmarks (no differences there).
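
For anyone who wants to poke at this locally, a minimal JMH shape in the spirit
of that benchmark could look like the sketch below (class name, arity, and the
measured handle are assumptions, not the actual TypedAsCollector source):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    public class AsCollectorBench {
        // Collector built via MHs::identity, as described above: (int,int,int)int[]
        static final MethodHandle INT_COLLECTOR =
                MethodHandles.identity(int[].class).asCollector(int[].class, 3);

        @Benchmark
        public int[] testIntCollect() throws Throwable {
            return (int[]) INT_COLLECTOR.invokeExact(1, 2, 3);
        }
    }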

Here are some numbers from the added benchmark:

Before:
Benchmark                                    Mode  Cnt    Score    Error  Units
TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
(as you can see, only the Object[] case with arity less than 10 is fast here)
After:
Benchmark                                    Mode  Cnt  Score   Error  Units
TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op

Thanks,
Jorn

-

Commit messages:
 - WIP - clean up
 - Completely implement collect in lambdaform
 - WIP - fix MH collector perf, with jumbo arity bandaid

Changes: https://git.openjdk.java.net/jdk/pull/3306/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264288
  Stats: 513 lines in 8 files changed: 154 ins; 337 del; 22 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306