On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee <jver...@openjdk.org> wrote:

>> This patch speeds up MethodHandle.asCollector handles where the array type 
>> is not Object[], as well as speeding up all collectors where the arity is 
>> greater than 10.
>> 
>> The old code is creating a collector handle by combining a set of hard coded 
>> methods for collecting arguments into an Object[], up to a maximum of ten 
>> elements, and then copying this intermediate array into a final array.
>> 
>> In principle, it shouldn't matter how slow (or fast) this handle is, because 
>> it should be replaced by existing bytecode intrinsification which does the 
>> right thing. But, through investigation it turns out that the 
>> intrinsification is only being applied in a very limited amount of cases: 
>> Object[] with max 10 elements only, only for the intermediate collector 
>> handles. Every other collector shape incurs overhead because it essentially 
>> ends up calling the ineffecient fallback handle.
>> 
>> Rather than sticking on a band aid (I tried this, but it turned out to be 
>> very tricky to untangle the existing code), the new code replaces the 
>> existing implementation with a collector handle implemented using a 
>> LambdaForm, which removes the need for intrinsification, and also greatly 
>> reduces code-complexity of the implementation. (I plan to expose this 
>> collector using a public API in the future as well, so users don't have to 
>> go through MHs::identity to make a collector).
>> 
>> The old code was also using a special lambda form transform for collecting 
>> arguments into an array. I believe this was done to take advantage of the 
>> existing-but-limited bytecode intrinsification, at least for Object[] with 
>> less than 10 elements.
>> 
>> The new code just uses the existing collect arguments transform with the 
>> newly added collector handle as filter, and this works just as well for the 
>> existing case, but as a bonus is also much simpler, since no separate 
>> transform is needed. Using the collect arguments transform should also 
>> improve sharing.
>> 
>> As a result of these changes a lot of code was unused and has been removed 
>> in this patch.
>> 
>> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), 
>> as well as another variant of the benchmark that used a declared static 
>> identity method instead of MHs::identity (not included). Before/after 
>> comparison of MethodHandleAs* benchmarks (no differences there).
>> 
>> Here are some numbers from the added benchmark:
>> 
>> Before:
>> Benchmark                                    Mode  Cnt    Score    Error  
>> Units
>> TypedAsCollector.testIntCollect              avgt   30  189.156 �  1.796  
>> ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 � 10.159  
>> ns/op
>> TypedAsCollector.testObjectCollect           avgt   30    7.092 �  0.042  
>> ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 �  0.546  
>> ns/op
>> TypedAsCollector.testStringCollect           avgt   30   28.511 �  0.243  
>> ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 �  0.635  
>> ns/op
>> (as you can see, just the Object[] with arity less than 10 case is fast here)
>> After:
>> Benchmark                                    Mode  Cnt  Score   Error  Units
>> TypedAsCollector.testIntCollect              avgt   30  6.569 � 0.131  ns/op
>> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 � 0.066  ns/op
>> TypedAsCollector.testObjectCollect           avgt   30  6.813 � 0.035  ns/op
>> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 � 0.066  ns/op
>> TypedAsCollector.testStringCollect           avgt   30  6.737 � 0.016  ns/op
>> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 � 0.052  ns/op
>> 
>> Thanks,
>> Jorn
>
> Jorn Vernee has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   - Address review comments
>   - Use cached version of store func getter
>   - Use ARRAY_STORE intrinsic for array stores
>   - Generate direct call to Array.newInstance instead of using an array 
> constructor handle
>   - Intrinsify call to Array.newInstance if the component type is constant

src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 
874:

> 872:                 case NEW_ARRAY:
> 873:                     Class<?> rtype = 
> name.function.methodType().returnType();
> 874:                     if (isStaticallyNameable(rtype)) {

Why do you remote `isStaticallyNameable(rtype)` check? 

It is there for a reason: only a very limited set of classes can be directly 
referenced from LF bytecode.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:

> 1895:         // field of the receiver handle.
> 1896:         // This is left as a followup enhancement, as it needs to be 
> investigated if this causes
> 1897:         // profile pollution.

Profile pollution shouldn't be a problem here: both types (element type + array 
type) will be constants attached to the "receiver" BMH instance and during 
inlining will be fetched from there.  

I'm fine with handling it as a separate RFE.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3306

Reply via email to