On Thu, 17 Oct 2024 10:10:56 GMT, Galder ZamarreƱo <[email protected]> wrote:
>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in
>> order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of
>> these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>> return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for
>> MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can
>> carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt
>> strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]]
>> inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193]
>> !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in
>> https://github.com/openjdk/jdk/pull/13056, we can compare the results before
>> and after. Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add an platform-specific backend implementations for the
>> MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into
>> CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these
>> results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder ZamarreƱo has updated the pull request with a new target base due to a
> merge or a rebase. The incremental webrev excludes the unrelated changes
> brought in by the merge/rebase. The pull request contains 30 additional
> commits since the last revision:
>
> - Use same default size as in other vector reduction benchmarks
> - Renamed benchmark class
> - Double/Float tests only when avx enabled
> - Make state class non-final
> - Restore previous benchmark iterations and default param size
> - Add clipping range benchmark that uses min/max
> - Encapsulate benchmark state within an inner class
> - Avoid creating result array in benchmark method
> - Merge branch 'master' into topic.intrinsify-max-min-long
> - Revert "Implement cmovL as a jump+mov branch"
>
> This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
> - ... and 20 more: https://git.openjdk.org/jdk/compare/b7ec22c4...0a8718e1
Looks good to me.
-------------
Marked as reviewed by roland (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2412925659