Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]
On Mon, 5 Apr 2021 15:39:39 GMT, Vladimir Ivanov wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>   Review comment: Rearrange code to explicitly reference erased setter
>
> Looks good.

Thanks for the reviews. I've submitted one more tier 1-3 test run, and if
that's all green I'll go ahead and integrate this.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]
On Mon, 5 Apr 2021 14:16:37 GMT, Jorn Vernee wrote:

> Jorn Vernee has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Review comment: Rearrange code to explicitly reference erased setter

Looks good.

- Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
On Mon, 5 Apr 2021 12:48:07 GMT, Vladimir Ivanov wrote:

> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1883:
>
>> 1881:
>> 1882:     MethodHandle newArray = MethodHandles.arrayConstructor(arrayType);
>> 1883:     MethodHandle storeFunc = ArrayAccessor.getAccessor(arrayType, ArrayAccess.SET);
>
> I'd move `storeFunc` into `makeCollectorForm()` and use the erased accessor
> there (`ArrayAccessor.OBJECT_ARRAY_SETTER`).
>
> (It is interesting that `OBJECT_ARRAY_SETTER` is an `ARRAY_STORE` intrinsic.)

Done. I think it's also more obvious this way that the erased setter is being
used. Good suggestion.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v4]
> This patch speeds up MethodHandle.asCollector handles where the array type
> is not Object[], as well as speeding up all collectors where the arity is
> greater than 10.
>
> The old code created a collector handle by combining a set of hard-coded
> methods for collecting arguments into an Object[], up to a maximum of ten
> elements, and then copying this intermediate array into a final array.
>
> In principle, it shouldn't matter how slow (or fast) this handle is, because
> it should be replaced by the existing bytecode intrinsification, which does
> the right thing. But investigation showed that the intrinsification is only
> applied in a very limited number of cases: Object[] with at most 10 elements,
> and only for the intermediate collector handles. Every other collector shape
> incurs overhead because it essentially ends up calling the inefficient
> fallback handle.
>
> Rather than sticking on a band-aid (I tried this, but it turned out to be
> very tricky to untangle the existing code), the new code replaces the
> existing implementation with a collector handle implemented using a
> LambdaForm, which removes the need for intrinsification, and also greatly
> reduces the code complexity of the implementation. (I plan to expose this
> collector using a public API in the future as well, so users don't have to
> go through MHs::identity to make a collector.)
>
> The old code was also using a special lambda form transform for collecting
> arguments into an array. I believe this was done to take advantage of the
> existing-but-limited bytecode intrinsification, at least for Object[] with
> fewer than 10 elements.
>
> The new code just uses the existing collect-arguments transform with the
> newly added collector handle as filter. This works just as well for the
> existing case, but as a bonus is also much simpler, since no separate
> transform is needed. Using the collect-arguments transform should also
> improve sharing.
>
> As a result of these changes a lot of code was unused and has been removed
> in this patch.
>
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch),
> as well as another variant of the benchmark that used a declared static
> identity method instead of MHs::identity (not included). Before/after
> comparison of MethodHandleAs* benchmarks (no differences there).
>
> Here are some numbers from the added benchmark:
>
> Before:
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>
> (As you can see, only the Object[] with arity less than 10 case is fast here.)
>
> After:
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>
> Thanks,
> Jorn

Jorn Vernee has updated the pull request incrementally with one additional
commit since the last revision:

  Review comment: Rearrange code to explicitly reference erased setter

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/dbba7910..72ccf282

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=03
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=02-03

Stats: 11 lines in 1 file changed: 6 ins; 2 del; 3 mod
Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306
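For readers unfamiliar with the API under discussion: `asCollector` adapts a
method handle with a trailing array parameter so that it accepts the array
elements as separate arguments. A minimal sketch of the shape this patch
speeds up, built through `MHs::identity` exactly as the benchmark does (the
class and method names here are illustrative, not part of the patch):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class AsCollectorDemo {
    // Collects three String arguments into a String[] via MethodHandles.identity,
    // the same construction the TypedAsCollector benchmark uses.
    static String[] collectThree(String a, String b, String c) throws Throwable {
        MethodHandle identity = MethodHandles.identity(String[].class);
        // Resulting type: (String, String, String) -> String[]
        MethodHandle collector = identity.asCollector(String[].class, 3);
        return (String[]) collector.invokeExact(a, b, c);
    }

    public static void main(String[] args) throws Throwable {
        String[] r = collectThree("a", "b", "c");
        System.out.println(r.length);  // 3
    }
}
```

Before the patch, this `String[]`-typed shape fell onto the slow fallback
path; after it, the collector is a single intrinsified LambdaForm.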
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
On Mon, 5 Apr 2021 11:57:08 GMT, Jorn Vernee wrote:

> Jorn Vernee has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   - Revert back to injected constructor handle
>   - Add lambda form sharing
>   - Add test case for collecting a custom class

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1883:

> 1881:
> 1882:     MethodHandle newArray = MethodHandles.arrayConstructor(arrayType);
> 1883:     MethodHandle storeFunc = ArrayAccessor.getAccessor(arrayType, ArrayAccess.SET);

I'd move `storeFunc` into `makeCollectorForm()` and use the erased accessor
there (`ArrayAccessor.OBJECT_ARRAY_SETTER`).

(It is interesting that `OBJECT_ARRAY_SETTER` is an `ARRAY_STORE` intrinsic.)

- PR: https://git.openjdk.java.net/jdk/pull/3306
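The `ArrayAccessor` machinery being discussed is JDK-internal, but the public
API has a direct analogue that shows why an erased `Object[]` setter is safe
to share across array types: the JVM's `aastore` check still guards the
element type at runtime. A hedged sketch using the public
`MethodHandles.arrayElementSetter` (the demo class and method are
illustrative, not the patch's code):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class ErasedSetterDemo {
    // Public analogue of the internal erased OBJECT_ARRAY_SETTER:
    // its type is (Object[], int, Object) -> void.
    static final MethodHandle ERASED_SET =
            MethodHandles.arrayElementSetter(Object[].class);

    // Stores into any reference array through the erased setter and reads
    // the element back; a mistyped store fails with ArrayStoreException.
    static Object storeAndRead(Object[] array, Object value) throws Throwable {
        ERASED_SET.invoke(array, 0, value);
        return array[0];
    }

    public static void main(String[] args) throws Throwable {
        String[] strings = new String[1];
        System.out.println(storeAndRead(strings, "hello"));
        try {
            storeAndRead(strings, 42);  // Integer into String[]
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException as expected");
        }
    }
}
```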
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
On Mon, 5 Apr 2021 11:57:08 GMT, Jorn Vernee wrote:

> Jorn Vernee has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   - Revert back to injected constructor handle
>   - Add lambda form sharing
>   - Add test case for collecting a custom class

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1909:

> 1907:     boolean isSharedLambdaForm = parameterCount == 0 || basicType.parameterType(0) == Object.class;
> 1908:     if (isSharedLambdaForm) {
> 1909:         LambdaForm lform = basicType.form().cachedLambdaForm(MethodTypeForm.LF_COLLECTOR);

How does sharing work w.r.t. the array store check? Unless you put `storeFunc`
on the BMH, I don't see how sharing preserves the proper check.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
On Mon, 5 Apr 2021 12:37:06 GMT, Vladimir Ivanov wrote:

> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1909:
>
>> 1907:     boolean isSharedLambdaForm = parameterCount == 0 || basicType.parameterType(0) == Object.class;
>> 1908:     if (isSharedLambdaForm) {
>> 1909:         LambdaForm lform = basicType.form().cachedLambdaForm(MethodTypeForm.LF_COLLECTOR);
>
> How does sharing work w.r.t. the array store check? Unless you put
> `storeFunc` on the BMH, I don't see how sharing preserves the proper check.

Nevermind, it's the type of the method handle produced by `makeCollector`
which ensures that all the arguments have the proper type.

- PR: https://git.openjdk.java.net/jdk/pull/3306
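The point resolved here, that the collector handle's own method type (not the
erased setter inside the shared lambda form) enforces argument types, can be
seen at the public API level. A sketch under that assumption (class and
method names are illustrative): the `(String, String) -> String[]` type means
ill-typed arguments are rejected by `asType` conversion before any array
store happens.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class CollectorTypeCheckDemo {
    // Typed collector: (String, String) -> String[]
    static final MethodHandle COLLECTOR =
            MethodHandles.identity(String[].class).asCollector(String[].class, 2);

    // Generic invocation goes through asType, which casts each argument to
    // String; a non-String argument fails before any element is stored.
    static String[] collectPair(Object a, Object b) throws Throwable {
        return (String[]) COLLECTOR.invoke(a, b);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(collectPair("a", "b").length);  // 2
        try {
            collectPair("a", 42);  // Integer is not a String
        } catch (ClassCastException e) {
            System.out.println("ClassCastException as expected");
        }
    }
}
```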
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
On Fri, 2 Apr 2021 13:23:07 GMT, Jorn Vernee wrote:

>> That's an elegant solution.
>>
>> At first I thought it might unduly perturb lambda form generation and
>> caching, but you slotted a different lambda form implementation underneath
>> the varargs implementation.
>
> I've addressed review comments, plus some other things:
>
> - I realized that I was calling the uncached version of the store function
>   factory. Fixed that.
> - I also realized that there's already an `ARRAY_STORE` intrinsic, which I'm
>   now using to avoid generating a call.
> - I also realized that since we only have 1 array creation handle per lambda
>   form, we can instead generate a direct call to `Array::newInstance` instead
>   of going through the array constructor handle (which also avoids having to
>   use a BoundMethodHandle).
> - Finally, I added an intrinsic, under the old `NEW_ARRAY` name, that
>   intrinsifies a call to `Array::newInstance` if the component type argument
>   is constant (which it is in this case).
>
> As a result, the lambda form is now fully intrinsified (no more calls in the
> generated bytecode), e.g.:
>
>     static java.lang.Object collector001__L(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object);
>       Code:
>          0: iconst_3
>          1: anewarray     #12  // class java/lang/String
>          4: astore        4
>          6: aload         4
>          8: checkcast     #14  // class "[Ljava/lang/String;"
>         11: dup
>         12: astore        4
>         14: iconst_0
>         15: aload_1
>         16: checkcast     #12  // class java/lang/String
>         19: aastore
>         20: aload         4
>         22: iconst_1
>         23: aload_2
>         24: checkcast     #12  // class java/lang/String
>         27: aastore
>         28: aload         4
>         30: iconst_2
>         31: aload_3
>         32: checkcast     #12  // class java/lang/String
>         35: aastore
>         36: aload         4
>         38: areturn
>
> Thanks,
> Jorn

Addressed latest review comments:

- Reverted back to using an injected constructor handle (to be able to take
  advantage of lambda form sharing)
- Added lambda form sharing for empty and reference arrays
- Added a test case for a custom class to VarargsArrayTest, which catches the
  illegal access error in the intrinsified case that Vlad pointed out (though
  the intrinsic itself is now removed).

I also had to switch back to the un-cached version for creating the array
element setter, since the cached version casts the array to a specific type,
and if the lambda form is shared, this won't work (e.g. casting an Object[]
to String[], depending on the order of creating the collectors). Adding
caching there is left as a followup.

- PR: https://git.openjdk.java.net/jdk/pull/3306
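The `Array::newInstance` call mentioned above, which the generated bytecode
replaces with a direct `anewarray` when the component type is constant, is the
plain reflective array allocation shown below (a minimal illustration, not the
patch's code):

```java
import java.lang.reflect.Array;

public class NewArrayDemo {
    public static void main(String[] args) {
        // Reflectively allocate a typed array from a component class and a
        // length; this is what the collector's array-creation step performs
        // when it is not intrinsified.
        Object arr = Array.newInstance(String.class, 3);
        System.out.println(arr.getClass());          // class [Ljava.lang.String;
        System.out.println(((String[]) arr).length); // 3
    }
}
```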
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
On Fri, 2 Apr 2021 15:12:04 GMT, Vladimir Ivanov wrote: >> That's what I thought as well (but not 100% sure). I can partially roll back >> the last commit so the code still uses an injected array constructor handle, >> and then it should be easy to add caching in the cases where the component >> type is a reference type. > > Whatever you prefer. I'm fine with it either way. I found it easier to add sharing now, since it makes the intrinsic redundant (and it can be removed again). - PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v3]
> Jorn Vernee has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   - Revert back to injected constructor handle
>   - Add lambda form sharing
>   - Add test case for collecting a custom class

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: https://git.openjdk.java.net/jdk/pull/3306/files/313ffaaf..dbba7910

Webrevs:
  - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=02
  - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=01-02

Stats: 68 lines in 4 files changed: 32 ins; 22 del; 14 mod
Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306
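The "injected constructor handle" this revision reverts to corresponds, at the
public API level, to `MethodHandles.arrayConstructor`. A hedged sketch of what
such a handle looks like from the outside (the demo class is a stand-in, not
the internal injected handle):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class ArrayCtorDemo {
    // (int) -> String[]: an array constructor handle of the kind the patch
    // binds into the collector's bound method handle.
    static final MethodHandle CTOR = MethodHandles.arrayConstructor(String[].class);

    static String[] newStringArray(int length) throws Throwable {
        return (String[]) CTOR.invokeExact(length);
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(newStringArray(5).length);  // 5
    }
}
```

Binding the constructor handle to the lambda form (rather than burning the
array class into the bytecode) is what allows one cached `LF_COLLECTOR` form
to serve every reference array type.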
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
On Fri, 2 Apr 2021 14:41:06 GMT, Jorn Vernee wrote:

>> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 874:
>>
>>> 872:     case NEW_ARRAY:
>>> 873:         Class rtype = name.function.methodType().returnType();
>>> 874:         if (isStaticallyNameable(rtype)) {
>>
>> Why do you remove the `isStaticallyNameable(rtype)` check?
>>
>> It is there for a reason: only a very limited set of classes can be directly
>> referenced from LF bytecode.
>
> I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in
> the latest commit. I didn't add the check since I was not aware of the
> restriction (looks like some of the tests failed as well).
>
> Good to know, will add a check.

Also, please add a test case for such a scenario. It should be trivial: just
use a class defined by the test (loaded by `AppClassLoader`).

>> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
>>
>>> 1895:     // field of the receiver handle.
>>> 1896:     // This is left as a followup enhancement, as it needs to be investigated if this causes
>>> 1897:     // profile pollution.
>>
>> Profile pollution shouldn't be a problem here: both types (element type +
>> array type) will be constants attached to the "receiver" BMH instance and
>> during inlining will be fetched from there.
>>
>> I'm fine with handling it as a separate RFE.
>
> That's what I thought as well (but not 100% sure). I can partially roll back
> the last commit so the code still uses an injected array constructor handle,
> and then it should be easy to add caching in the cases where the component
> type is a reference type.

Whatever you prefer. I'm fine with it either way.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
On Fri, 2 Apr 2021 13:56:31 GMT, Vladimir Ivanov wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional
>> commit since the last revision:
>>
>>   - Address review comments
>>   - Use cached version of store func getter
>>   - Use ARRAY_STORE intrinsic for array stores
>>   - Generate direct call to Array.newInstance instead of using an array constructor handle
>>   - Intrinsify call to Array.newInstance if the component type is constant
>
> src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 874:
>
>> 872:     case NEW_ARRAY:
>> 873:         Class rtype = name.function.methodType().returnType();
>> 874:         if (isStaticallyNameable(rtype)) {
>
> Why do you remove the `isStaticallyNameable(rtype)` check?
>
> It is there for a reason: only a very limited set of classes can be directly
> referenced from LF bytecode.

I removed the `NEW_ARRAY` intrinsic altogether at first, but added it back in
the latest commit. I didn't add the check since I was not aware of the
restriction.

Good to know, will add a check.

> src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:
>
>> 1895:     // field of the receiver handle.
>> 1896:     // This is left as a followup enhancement, as it needs to be investigated if this causes
>> 1897:     // profile pollution.
>
> Profile pollution shouldn't be a problem here: both types (element type +
> array type) will be constants attached to the "receiver" BMH instance and
> during inlining will be fetched from there.
>
> I'm fine with handling it as a separate RFE.

That's what I thought as well (but not 100% sure). I can partially roll back
the last commit so the code still uses an injected array constructor handle,
and then it should be easy to add caching in the cases where the component
type is a reference type.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
On Fri, 2 Apr 2021 13:17:55 GMT, Jorn Vernee wrote: >> This patch speeds up MethodHandle.asCollector handles where the array type >> is not Object[], as well as speeding up all collectors where the arity is >> greater than 10. >> >> The old code is creating a collector handle by combining a set of hard coded >> methods for collecting arguments into an Object[], up to a maximum of ten >> elements, and then copying this intermediate array into a final array. >> >> In principle, it shouldn't matter how slow (or fast) this handle is, because >> it should be replaced by existing bytecode intrinsification which does the >> right thing. But, through investigation it turns out that the >> intrinsification is only being applied in a very limited amount of cases: >> Object[] with max 10 elements only, only for the intermediate collector >> handles. Every other collector shape incurs overhead because it essentially >> ends up calling the ineffecient fallback handle. >> >> Rather than sticking on a band aid (I tried this, but it turned out to be >> very tricky to untangle the existing code), the new code replaces the >> existing implementation with a collector handle implemented using a >> LambdaForm, which removes the need for intrinsification, and also greatly >> reduces code-complexity of the implementation. (I plan to expose this >> collector using a public API in the future as well, so users don't have to >> go through MHs::identity to make a collector). >> >> The old code was also using a special lambda form transform for collecting >> arguments into an array. I believe this was done to take advantage of the >> existing-but-limited bytecode intrinsification, at least for Object[] with >> less than 10 elements. >> >> The new code just uses the existing collect arguments transform with the >> newly added collector handle as filter, and this works just as well for the >> existing case, but as a bonus is also much simpler, since no separate >> transform is needed. 
Using the collect arguments transform should also >> improve sharing. >> >> As a result of these changes a lot of code was unused and has been removed >> in this patch. >> >> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), >> as well as another variant of the benchmark that used a declared static >> identity method instead of MHs::identity (not included). Before/after >> comparison of MethodHandleAs* benchmarks (no differences there). >> >> Here are some numbers from the added benchmark: >> >> Before: >> BenchmarkMode CntScoreError >> Units >> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 >> ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 >> ns/op >> TypedAsCollector.testObjectCollect avgt 307.092 � 0.042 >> ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 >> ns/op >> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 >> ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 >> ns/op >> (as you can see, just the Object[] with arity less than 10 case is fast here) >> After: >> BenchmarkMode Cnt Score Error Units >> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op >> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op >> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op >> >> Thanks, >> Jorn > > Jorn Vernee has updated the pull request incrementally with one additional > commit since the last revision: > > - Address review comments > - Use cached version of store func getter > - Use ARRAY_STORE intrinsic for array stores > - Generate direct call to Array.newInstance instead of using an array > constructor handle > - Intrinsify call to Array.newInstance if the component type is constant 
src/java.base/share/classes/java/lang/invoke/InvokerBytecodeGenerator.java line 874:

> 872: case NEW_ARRAY:
> 873:     Class<?> rtype = name.function.methodType().returnType();
> 874:     if (isStaticallyNameable(rtype)) {

Why do you remove the `isStaticallyNameable(rtype)` check? It is there for a reason: only a very limited set of classes can be directly referenced from LF bytecode.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1897:

> 1895: // field of the receiver handle.
> 1896: // This is left as a followup enhancement, as it needs to be investigated if this causes
> 1897: // profile pollution. Profile pollution should
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
On Thu, 1 Apr 2021 19:19:10 GMT, Paul Sandoz wrote:

>> Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:
>>
>> - Address review comments
>> - Use cached version of store func getter
>> - Use ARRAY_STORE intrinsic for array stores
>> - Generate direct call to Array.newInstance instead of using an array constructor handle
>> - Intrinsify call to Array.newInstance if the component type is constant
>
> That's an elegant solution.
>
> At first I thought it might unduly perturb lambda form generation and caching, but you slotted a different lambda form implementation underneath the varargs implementation.

I've addressed the review comments, plus some other things:

- I realized that I was calling the uncached version of the store function factory. Fixed that.
- I also realized that there's already an `ARRAY_STORE` intrinsic, which I'm now using to avoid generating a call.
- I also realized that since we only have 1 array creation handle per lambda form, we can instead generate a direct call to `Array::newInstance` instead of going through the array constructor handle (which also avoids having to use a BoundMethodHandle).
- Finally, I added an intrinsic, under the old `NEW_ARRAY` name, that intrinsifies a call to `Array::newInstance` if the component type argument is constant (which it is in this case).
As a result, the lambda form is now fully intrinsified (no more calls in the generated bytecode), e.g.:

    static java.lang.Object collector001__L(java.lang.Object, java.lang.Object, java.lang.Object, java.lang.Object);
      Code:
         0: iconst_3
         1: anewarray     #12   // class java/lang/String
         4: astore        4
         6: aload         4
         8: checkcast     #14   // class "[Ljava/lang/String;"
        11: dup
        12: astore        4
        14: iconst_0
        15: aload_1
        16: checkcast     #12   // class java/lang/String
        19: aastore
        20: aload         4
        22: iconst_1
        23: aload_2
        24: checkcast     #12   // class java/lang/String
        27: aastore
        28: aload         4
        30: iconst_2
        31: aload_3
        32: checkcast     #12   // class java/lang/String
        35: aastore
        36: aload         4
        38: areturn

Thanks,
Jorn

- PR: https://git.openjdk.java.net/jdk/pull/3306
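For readers less familiar with this corner of `java.lang.invoke`: the collector whose generated bytecode is disassembled above is the kind of handle the public API produces via `MethodHandles.identity(...).asCollector(...)`. A minimal sketch of that shape (class and variable names here are illustrative, not from the patch):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.Arrays;

public class CollectorShape {
    public static void main(String[] args) throws Throwable {
        // A 3-argument String[] collector: (String, String, String) -> String[].
        // This matches the collector001__L shape disassembled above.
        MethodHandle collector = MethodHandles.identity(String[].class)
                .asCollector(String[].class, 3);

        // invokeExact requires the exact symbolic type (String, String, String)String[]
        String[] array = (String[]) collector.invokeExact("a", "b", "c");
        System.out.println(Arrays.toString(array)); // [a, b, c]
    }
}
```

Invoking such a handle collects the trailing arguments into a freshly allocated `String[]`; with the patch, the array allocation and element stores compile down to the plain `anewarray`/`aastore` instructions shown in the listing.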
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector [v2]
> This patch speeds up MethodHandle.asCollector handles where the array type is > not Object[], as well as speeding up all collectors where the arity is > greater than 10. > > The old code is creating a collector handle by combining a set of hard coded > methods for collecting arguments into an Object[], up to a maximum of ten > elements, and then copying this intermediate array into a final array. > > In principle, it shouldn't matter how slow (or fast) this handle is, because > it should be replaced by existing bytecode intrinsification which does the > right thing. But, through investigation it turns out that the > intrinsification is only being applied in a very limited amount of cases: > Object[] with max 10 elements only, only for the intermediate collector > handles. Every other collector shape incurs overhead because it essentially > ends up calling the ineffecient fallback handle. > > Rather than sticking on a band aid (I tried this, but it turned out to be > very tricky to untangle the existing code), the new code replaces the > existing implementation with a collector handle implemented using a > LambdaForm, which removes the need for intrinsification, and also greatly > reduces code-complexity of the implementation. (I plan to expose this > collector using a public API in the future as well, so users don't have to go > through MHs::identity to make a collector). > > The old code was also using a special lambda form transform for collecting > arguments into an array. I believe this was done to take advantage of the > existing-but-limited bytecode intrinsification, at least for Object[] with > less than 10 elements. > > The new code just uses the existing collect arguments transform with the > newly added collector handle as filter, and this works just as well for the > existing case, but as a bonus is also much simpler, since no separate > transform is needed. Using the collect arguments transform should also > improve sharing. 
>
> As a result of these changes a lot of code was unused and has been removed in this patch.
>
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as well as another variant of the benchmark that used a declared static identity method instead of MHs::identity (not included). Before/after comparison of MethodHandleAs* benchmarks (no differences there).
>
> Here are some numbers from the added benchmark:
>
> Before:
>
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect            avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>
> (as you can see, just the Object[] with arity less than 10 case is fast here)
>
> After:
>
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>
> Thanks,
> Jorn

Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision:

- Address review comments
- Use cached version of store func getter
- Use ARRAY_STORE intrinsic for array stores
- Generate direct call to Array.newInstance instead of using an array constructor handle
- Intrinsify call to Array.newInstance if the component type is constant

- Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3306/files
  - new: 
https://git.openjdk.java.net/jdk/pull/3306/files/46bc5d19..313ffaaf

Webrevs:
- full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=01
- incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=00-01

Stats: 57 lines in 2 files changed: 27 ins; 19 del; 11 mod
Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector
On Thu, 1 Apr 2021 19:09:53 GMT, John R Rose wrote: >> This patch speeds up MethodHandle.asCollector handles where the array type >> is not Object[], as well as speeding up all collectors where the arity is >> greater than 10. >> >> The old code is creating a collector handle by combining a set of hard coded >> methods for collecting arguments into an Object[], up to a maximum of ten >> elements, and then copying this intermediate array into a final array. >> >> In principle, it shouldn't matter how slow (or fast) this handle is, because >> it should be replaced by existing bytecode intrinsification which does the >> right thing. But, through investigation it turns out that the >> intrinsification is only being applied in a very limited amount of cases: >> Object[] with max 10 elements only, only for the intermediate collector >> handles. Every other collector shape incurs overhead because it essentially >> ends up calling the ineffecient fallback handle. >> >> Rather than sticking on a band aid (I tried this, but it turned out to be >> very tricky to untangle the existing code), the new code replaces the >> existing implementation with a collector handle implemented using a >> LambdaForm, which removes the need for intrinsification, and also greatly >> reduces code-complexity of the implementation. (I plan to expose this >> collector using a public API in the future as well, so users don't have to >> go through MHs::identity to make a collector). >> >> The old code was also using a special lambda form transform for collecting >> arguments into an array. I believe this was done to take advantage of the >> existing-but-limited bytecode intrinsification, at least for Object[] with >> less than 10 elements. >> >> The new code just uses the existing collect arguments transform with the >> newly added collector handle as filter, and this works just as well for the >> existing case, but as a bonus is also much simpler, since no separate >> transform is needed. 
Using the collect arguments transform should also >> improve sharing. >> >> As a result of these changes a lot of code was unused and has been removed >> in this patch. >> >> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), >> as well as another variant of the benchmark that used a declared static >> identity method instead of MHs::identity (not included). Before/after >> comparison of MethodHandleAs* benchmarks (no differences there). >> >> Here are some numbers from the added benchmark: >> >> Before: >> BenchmarkMode CntScoreError >> Units >> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 >> ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 >> ns/op >> TypedAsCollector.testObjectCollect avgt 307.092 � 0.042 >> ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 >> ns/op >> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 >> ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 >> ns/op >> (as you can see, just the Object[] with arity less than 10 case is fast here) >> After: >> BenchmarkMode Cnt Score Error Units >> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op >> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op >> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op >> >> Thanks, >> Jorn > > src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1903: > >> 1901: } >> 1902: >> 1903: private static LambdaForm makeCollectorForm(MethodType basicType, >> MethodHandle storeFunc) { > > This reads very well. It is the right way to do this job with LFs. > There is a slight risk that the JIT will fail to inline the tiny MHs > that (a) construct arrays and (b) poke elements into them. 
> If that happens, then perhaps the bytecode generator can treat them as intrinsics. Maybe that's worth a comment in the code, to help later readers?

Good idea, I'll leave some notes.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector
On Thu, 1 Apr 2021 19:09:14 GMT, John R Rose wrote: >> This patch speeds up MethodHandle.asCollector handles where the array type >> is not Object[], as well as speeding up all collectors where the arity is >> greater than 10. >> >> The old code is creating a collector handle by combining a set of hard coded >> methods for collecting arguments into an Object[], up to a maximum of ten >> elements, and then copying this intermediate array into a final array. >> >> In principle, it shouldn't matter how slow (or fast) this handle is, because >> it should be replaced by existing bytecode intrinsification which does the >> right thing. But, through investigation it turns out that the >> intrinsification is only being applied in a very limited amount of cases: >> Object[] with max 10 elements only, only for the intermediate collector >> handles. Every other collector shape incurs overhead because it essentially >> ends up calling the ineffecient fallback handle. >> >> Rather than sticking on a band aid (I tried this, but it turned out to be >> very tricky to untangle the existing code), the new code replaces the >> existing implementation with a collector handle implemented using a >> LambdaForm, which removes the need for intrinsification, and also greatly >> reduces code-complexity of the implementation. (I plan to expose this >> collector using a public API in the future as well, so users don't have to >> go through MHs::identity to make a collector). >> >> The old code was also using a special lambda form transform for collecting >> arguments into an array. I believe this was done to take advantage of the >> existing-but-limited bytecode intrinsification, at least for Object[] with >> less than 10 elements. >> >> The new code just uses the existing collect arguments transform with the >> newly added collector handle as filter, and this works just as well for the >> existing case, but as a bonus is also much simpler, since no separate >> transform is needed. 
Using the collect arguments transform should also >> improve sharing. >> >> As a result of these changes a lot of code was unused and has been removed >> in this patch. >> >> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), >> as well as another variant of the benchmark that used a declared static >> identity method instead of MHs::identity (not included). Before/after >> comparison of MethodHandleAs* benchmarks (no differences there). >> >> Here are some numbers from the added benchmark: >> >> Before: >> BenchmarkMode CntScoreError >> Units >> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 >> ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 >> ns/op >> TypedAsCollector.testObjectCollect avgt 307.092 � 0.042 >> ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 >> ns/op >> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 >> ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 >> ns/op >> (as you can see, just the Object[] with arity less than 10 case is fast here) >> After: >> BenchmarkMode Cnt Score Error Units >> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op >> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op >> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op >> >> Thanks, >> Jorn > > src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1342: > >> 1340: } >> 1341: >> 1342: // Array identity function (used as >> getConstantHandle(MH_arrayIdentity)). > > If this is no longer used, remove it. Yes, it looks like this can be removed. Good catch! - PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector
On Thu, 1 Apr 2021 19:12:42 GMT, Paul Sandoz wrote: >> This patch speeds up MethodHandle.asCollector handles where the array type >> is not Object[], as well as speeding up all collectors where the arity is >> greater than 10. >> >> The old code is creating a collector handle by combining a set of hard coded >> methods for collecting arguments into an Object[], up to a maximum of ten >> elements, and then copying this intermediate array into a final array. >> >> In principle, it shouldn't matter how slow (or fast) this handle is, because >> it should be replaced by existing bytecode intrinsification which does the >> right thing. But, through investigation it turns out that the >> intrinsification is only being applied in a very limited amount of cases: >> Object[] with max 10 elements only, only for the intermediate collector >> handles. Every other collector shape incurs overhead because it essentially >> ends up calling the ineffecient fallback handle. >> >> Rather than sticking on a band aid (I tried this, but it turned out to be >> very tricky to untangle the existing code), the new code replaces the >> existing implementation with a collector handle implemented using a >> LambdaForm, which removes the need for intrinsification, and also greatly >> reduces code-complexity of the implementation. (I plan to expose this >> collector using a public API in the future as well, so users don't have to >> go through MHs::identity to make a collector). >> >> The old code was also using a special lambda form transform for collecting >> arguments into an array. I believe this was done to take advantage of the >> existing-but-limited bytecode intrinsification, at least for Object[] with >> less than 10 elements. >> >> The new code just uses the existing collect arguments transform with the >> newly added collector handle as filter, and this works just as well for the >> existing case, but as a bonus is also much simpler, since no separate >> transform is needed. 
Using the collect arguments transform should also >> improve sharing. >> >> As a result of these changes a lot of code was unused and has been removed >> in this patch. >> >> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), >> as well as another variant of the benchmark that used a declared static >> identity method instead of MHs::identity (not included). Before/after >> comparison of MethodHandleAs* benchmarks (no differences there). >> >> Here are some numbers from the added benchmark: >> >> Before: >> BenchmarkMode CntScoreError >> Units >> TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 >> ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 >> ns/op >> TypedAsCollector.testObjectCollect avgt 307.092 � 0.042 >> ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 >> ns/op >> TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 >> ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 >> ns/op >> (as you can see, just the Object[] with arity less than 10 case is fast here) >> After: >> BenchmarkMode Cnt Score Error Units >> TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op >> TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op >> TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op >> TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op >> TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op >> TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op >> >> Thanks, >> Jorn > > src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1277: > >> 1275: if (intrinsicName == Intrinsic.IDENTITY) { >> 1276: MethodType resultType = >> type().asCollectorType(arrayType, type().parameterCount() - 1, arrayLength); >> 1277: MethodHandle collector = >> MethodHandleImpl.makeCollector(arrayType, arrayLength); > > Should `MethodHandleImpl.varargsArray` still be used here since it 
performs arity checking and caching?
>
> Maybe the arity checks are performed by `asCollectorType`, but that would still leave the caching.

Ah right, good catch! This is a leftover from moving things around and should indeed use `varargsArray`.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector
On Thu, 1 Apr 2021 13:25:05 GMT, Jorn Vernee wrote: > This patch speeds up MethodHandle.asCollector handles where the array type is > not Object[], as well as speeding up all collectors where the arity is > greater than 10. > > The old code is creating a collector handle by combining a set of hard coded > methods for collecting arguments into an Object[], up to a maximum of ten > elements, and then copying this intermediate array into a final array. > > In principle, it shouldn't matter how slow (or fast) this handle is, because > it should be replaced by existing bytecode intrinsification which does the > right thing. But, through investigation it turns out that the > intrinsification is only being applied in a very limited amount of cases: > Object[] with max 10 elements only, only for the intermediate collector > handles. Every other collector shape incurs overhead because it essentially > ends up calling the ineffecient fallback handle. > > Rather than sticking on a band aid (I tried this, but it turned out to be > very tricky to untangle the existing code), the new code replaces the > existing implementation with a collector handle implemented using a > LambdaForm, which removes the need for intrinsification, and also greatly > reduces code-complexity of the implementation. (I plan to expose this > collector using a public API in the future as well, so users don't have to go > through MHs::identity to make a collector). > > The old code was also using a special lambda form transform for collecting > arguments into an array. I believe this was done to take advantage of the > existing-but-limited bytecode intrinsification, at least for Object[] with > less than 10 elements. > > The new code just uses the existing collect arguments transform with the > newly added collector handle as filter, and this works just as well for the > existing case, but as a bonus is also much simpler, since no separate > transform is needed. 
> Using the collect arguments transform should also improve sharing.
>
> As a result of these changes a lot of code was unused and has been removed in this patch.
>
> Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as well as another variant of the benchmark that used a declared static identity method instead of MHs::identity (not included). Before/after comparison of MethodHandleAs* benchmarks (no differences there).
>
> Here are some numbers from the added benchmark:
>
> Before:
>
> Benchmark                                    Mode  Cnt    Score    Error  Units
> TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
> TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
> TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op
>
> (as you can see, just the Object[] with arity less than 10 case is fast here)
>
> After:
>
> Benchmark                                    Mode  Cnt  Score   Error  Units
> TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
> TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
> TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
> TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
> TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
> TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op
>
> Thanks,
> Jorn

That's an elegant solution.

At first I thought it might unduly perturb lambda form generation and caching, but you slotted a different lambda form implementation underneath the varargs implementation.
src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1277:

> 1275: if (intrinsicName == Intrinsic.IDENTITY) {
> 1276:     MethodType resultType = type().asCollectorType(arrayType, type().parameterCount() - 1, arrayLength);
> 1277:     MethodHandle collector = MethodHandleImpl.makeCollector(arrayType, arrayLength);

Should `MethodHandleImpl.varargsArray` still be used here since it performs arity checking and caching?

Maybe the arity checks are performed by `asCollectorType`, but that would still leave the caching.

- PR: https://git.openjdk.java.net/jdk/pull/3306
Re: RFR: 8264288: Performance issue with MethodHandle.asCollector
On Thu, 1 Apr 2021 13:25:05 GMT, Jorn Vernee wrote: > This patch speeds up MethodHandle.asCollector handles where the array type is > not Object[], as well as speeding up all collectors where the arity is > greater than 10. > > The old code is creating a collector handle by combining a set of hard coded > methods for collecting arguments into an Object[], up to a maximum of ten > elements, and then copying this intermediate array into a final array. > > In principle, it shouldn't matter how slow (or fast) this handle is, because > it should be replaced by existing bytecode intrinsification which does the > right thing. But, through investigation it turns out that the > intrinsification is only being applied in a very limited amount of cases: > Object[] with max 10 elements only, only for the intermediate collector > handles. Every other collector shape incurs overhead because it essentially > ends up calling the ineffecient fallback handle. > > Rather than sticking on a band aid (I tried this, but it turned out to be > very tricky to untangle the existing code), the new code replaces the > existing implementation with a collector handle implemented using a > LambdaForm, which removes the need for intrinsification, and also greatly > reduces code-complexity of the implementation. (I plan to expose this > collector using a public API in the future as well, so users don't have to go > through MHs::identity to make a collector). > > The old code was also using a special lambda form transform for collecting > arguments into an array. I believe this was done to take advantage of the > existing-but-limited bytecode intrinsification, at least for Object[] with > less than 10 elements. > > The new code just uses the existing collect arguments transform with the > newly added collector handle as filter, and this works just as well for the > existing case, but as a bonus is also much simpler, since no separate > transform is needed. 
Using the collect arguments transform should also > improve sharing. > > As a result of these changes a lot of code was unused and has been removed in > this patch. > > Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), > as well as another variant of the benchmark that used a declared static > identity method instead of MHs::identity (not included). Before/after > comparison of MethodHandleAs* benchmarks (no differences there). > > Here are some numbers from the added benchmark: > > Before: > BenchmarkMode CntScoreError > Units > TypedAsCollector.testIntCollect avgt 30 189.156 � 1.796 > ns/op > TypedAsCollector.testIntCollectHighArity avgt 30 660.549 � 10.159 > ns/op > TypedAsCollector.testObjectCollect avgt 307.092 � 0.042 > ns/op > TypedAsCollector.testObjectCollectHighArity avgt 30 65.225 � 0.546 > ns/op > TypedAsCollector.testStringCollect avgt 30 28.511 � 0.243 > ns/op > TypedAsCollector.testStringCollectHighArity avgt 30 57.054 � 0.635 > ns/op > (as you can see, just the Object[] with arity less than 10 case is fast here) > After: > BenchmarkMode Cnt Score Error Units > TypedAsCollector.testIntCollect avgt 30 6.569 � 0.131 ns/op > TypedAsCollector.testIntCollectHighArity avgt 30 8.923 � 0.066 ns/op > TypedAsCollector.testObjectCollect avgt 30 6.813 � 0.035 ns/op > TypedAsCollector.testObjectCollectHighArity avgt 30 9.718 � 0.066 ns/op > TypedAsCollector.testStringCollect avgt 30 6.737 � 0.016 ns/op > TypedAsCollector.testStringCollectHighArity avgt 30 9.618 � 0.052 ns/op > > Thanks, > Jorn So many red deletion lines; I'm looking at a beautiful sunset! It's a sunset for some very old code, some of the first code I wrote for method handles, long before Lambda Forms made this sort of task easier. Thanks very much for cleaning this out. See a few minor comments on some diff lines. 
src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java line 651:

> 649: }
> 650:
> 651: LambdaForm collectArgumentArrayForm(int pos, MethodHandle arrayCollector) {

It's counter-intuitive that removing a LFE tactic would be harmless. Each LFE tactic is a point where LFs can be shared, reducing footprint. But in this case `collectArgumentArrayForm` is always paired with `collectArgumentsForm`, so the latter takes care of sharing. The actual code which makes the arrays is also shared, via `Makers.TYPED_COLLECTORS` (unchanged).

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1342:

> 1340: }
> 1341:
> 1342: // Array identity function (used as getConstantHandle(MH_arrayIdentity)).

If this is no longer used, remove it.

src/java.base/share/classes/java/lang/invoke/MethodHandleImpl.java line 1903:

> 1901: }
> 1902:
> 1903: private static LambdaForm m
RFR: 8264288: Performance issue with MethodHandle.asCollector
This patch speeds up MethodHandle.asCollector handles where the array type is not Object[], as well as speeding up all collectors where the arity is greater than 10.

The old code is creating a collector handle by combining a set of hard coded methods for collecting arguments into an Object[], up to a maximum of ten elements, and then copying this intermediate array into a final array.

In principle, it shouldn't matter how slow (or fast) this handle is, because it should be replaced by existing bytecode intrinsification which does the right thing. But, through investigation it turns out that the intrinsification is only being applied in a very limited number of cases: Object[] with max 10 elements only, and only for the intermediate collector handles. Every other collector shape incurs overhead because it essentially ends up calling the inefficient fallback handle.

Rather than sticking on a band aid, the new code replaces the existing implementation with a collector handle implemented using a LambdaForm, which removes the need for intrinsification, and also greatly reduces the code complexity of the implementation. (I plan to expose this collector using a public API in the future as well, so users don't have to go through MHs::identity to make a collector.)

The old code was also using a special lambda form transform for collecting arguments into an array. I believe this was done to take advantage of the existing-but-limited bytecode intrinsification, at least for Object[] with less than 10 elements.

The new code just uses the existing collect arguments transform with the newly added collector handle as filter, and this works just as well for the existing case, but as a bonus is also much simpler, since no separate transform is needed. Using the collect arguments transform should also improve sharing.

As a result of these changes a lot of code was unused and has been removed in this patch.
Testing: tier 1-3, benchmarking using TypedAsCollector (part of the patch), as well as another variant of the benchmark that used a declared static identity method instead of MHs::identity (not included). Before/after comparison of MethodHandleAs* benchmarks (no differences there).

Here are some numbers from the added benchmark:

Before:

Benchmark                                    Mode  Cnt    Score    Error  Units
TypedAsCollector.testIntCollect              avgt   30  189.156 ±  1.796  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  660.549 ± 10.159  ns/op
TypedAsCollector.testObjectCollect           avgt   30    7.092 ±  0.042  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30   65.225 ±  0.546  ns/op
TypedAsCollector.testStringCollect           avgt   30   28.511 ±  0.243  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30   57.054 ±  0.635  ns/op

(as you can see, just the Object[] with arity less than 10 case is fast here)

After:

Benchmark                                    Mode  Cnt  Score   Error  Units
TypedAsCollector.testIntCollect              avgt   30  6.569 ± 0.131  ns/op
TypedAsCollector.testIntCollectHighArity     avgt   30  8.923 ± 0.066  ns/op
TypedAsCollector.testObjectCollect           avgt   30  6.813 ± 0.035  ns/op
TypedAsCollector.testObjectCollectHighArity  avgt   30  9.718 ± 0.066  ns/op
TypedAsCollector.testStringCollect           avgt   30  6.737 ± 0.016  ns/op
TypedAsCollector.testStringCollectHighArity  avgt   30  9.618 ± 0.052  ns/op

Thanks,
Jorn

- Commit messages:
  - WIP - clean up
  - Completely implement collect in lambdaform
  - WIP - fix MH collector perf, with jumbo arity bandaid

Changes: https://git.openjdk.java.net/jdk/pull/3306/files
Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3306&range=00
Issue: https://bugs.openjdk.java.net/browse/JDK-8264288
Stats: 513 lines in 8 files changed: 154 ins; 337 del; 22 mod
Patch: https://git.openjdk.java.net/jdk/pull/3306.diff
Fetch: git fetch https://git.openjdk.java.net/jdk pull/3306/head:pull/3306

PR: https://git.openjdk.java.net/jdk/pull/3306
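The two shapes the patch targets — a typed (non-Object[]) array and an arity above 10 — can both be reproduced with the public `asCollector` API. A hedged sketch in the spirit of the benchmark above (class and variable names are illustrative; this is not the TypedAsCollector source):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class CollectorShapes {
    public static void main(String[] args) throws Throwable {
        // Typed collector: the array type is int[], not Object[] -- one of
        // the previously slow shapes.
        MethodHandle intCollector = MethodHandles.identity(int[].class)
                .asCollector(int[].class, 3);
        int[] ints = (int[]) intCollector.invokeExact(1, 2, 3);

        // High-arity collector: more than 10 arguments -- the other slow
        // shape, which previously missed the bytecode intrinsification even
        // for Object[].
        MethodHandle highArity = MethodHandles.identity(Object[].class)
                .asCollector(Object[].class, 11);
        Object[] collected = (Object[]) highArity.invokeWithArguments(new Object[11]);

        System.out.println(ints.length + " " + collected.length); // 3 11
    }
}
```

Both handles funnel their arguments into a freshly allocated array; before the patch, only `Object[]` collectors of arity 10 or less hit the fast intrinsified path.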