On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee <jver...@openjdk.org> wrote:
>> This patch speeds up MethodHandle.asCollector handles where the array type >> is not Object[], as well as speeding up all collectors where the arity is >> greater than 10. >> >> The old code is creating a collector handle by combining a set of hard coded >> methods for collecting arguments into an Object[], up to a maximum of ten >> elements, and then copying this intermediate array into a final array. >> >> In principle, it shouldn't matter how slow (or fast) this handle is, because >> it should be replaced by existing bytecode intrinsification which does the >> right thing. But, through investigation it turns out that the >> intrinsification is only being applied in a very limited amount of cases: >> Object[] with max 10 elements only, only for the intermediate collector >> handles. Every other collector shape incurs overhead because it essentially >> ends up calling the ineffecient fallback handle. >> >> Rather than sticking on a band aid (I tried this, but it turned out to be >> very tricky to untangle the existing code), the new code replaces the >> existing implementation with a collector handle implemented using a >> LambdaForm, which removes the need for intrinsification, and also greatly >> reduces code-complexity of the implementation. (I plan to expose this >> collector using a public API in the future as well, so users don't have to >> go through MHs::identity to make a collector). >> >> The old code was also using a special lambda form transform for collecting >> arguments into an array. I believe this was done to take advantage of the >> existing-but-limited bytecode intrinsification, at least for Object[] with >> less than 10 elements. >> >> The new code just uses the existing collect arguments transform with the >> newly added collector handle as filter, and this works just as well for the >> existing case, but as a bonus is also much simpler, since no separate >> transform is needed. Using the collect arguments transform should also >> improve sharing. >> >> As a result of these changes a lot of code was unused and has been removed >> in this patch. >> >> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), >> as well as another variant of the benchmark that used a declared static >> identity method instead of MHs::identity (not included). Before/after >> comparison of MethodHandleAs* benchmarks (no differences there). >> >> Here are some numbers from the added benchmark: >> >> Before: >> Benchmark Mode Cnt Score Error >> Units >> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 >> ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 >> ns/op >> TypedAsCollector.testObjectCollect avgt 30 7.092 � 0.042 >> ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 >> ns/op >> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 >> ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 >> ns/op >> (as you can see, just the Object[] with arity less than 10 case is fast here) >> After: >> Benchmark Mode Cnt Score Error Units >> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op >> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op >> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op >> >> Thanks, >> Jorn > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > - Address review comments > - Use cached version of store func getter > - Use ARRAY_STORE intrinsic for array stores > - Generate direct call to Array.newInstance instead of using an array > constructor handle > - Intrinsify call to Array.newInstance if the component type is constant src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 874: > 872: case NEW_ARRAY: > 873: Class<?> rtype = > name.function.methodType().returnType(); > 874: if (isStaticallyNameable(rtype)) { Why do you remote `isStaticallyNameable(rtype)` check? It is there for a reason: only a very limited set of classes can be directly referenced from LF bytecode. src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897: > 1895: // field of the receiver handle. > 1896: // This is left as a followup enhancement, as it needs to be > investigated if this causes > 1897: // profile pollution. Profile pollution shouldn't be a problem here: both types (element type + array type) will be constants attached to the "receiver" BMH instance and during inlining will be fetched from there. I'm fine with handling it as a separate RFE. ------------- PR: https://git.openjdk.java.net/jdk/pull/3306