On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño <gal...@openjdk.org> wrote:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in 
> order to help improve vectorization performance.
> 
> Currently vectorization does not kick in for loops containing either of these 
> calls because of the following error:
> 
> 
> VLoop::check_preconditions: failed: control flow in loop not allowed
> 
> 
> The control flow is due to the java implementation for these methods, e.g.
> 
> 
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
> 
> 
> This patch intrinsifies the calls to replace the CmpL + Bool nodes for 
> MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can 
> carry out the vectorization.
> E.g.
> 
> 
> SuperWord::transform_loop:
>     Loop: N518/N126  counted [int,int),+4 (1025 iters)  main has_sfpt 
> strip_mined
>  518  CountedLoop  === 518 246 126  [[ 513 517 518 242 521 522 422 210 ]] 
> inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: 
> Test::test @ bci:14 (line 21)
> 
> 
> Applying the same changes to `ReductionPerf` as in 
> https://github.com/openjdk/jdk/pull/13056, we can compare the results before 
> and after. Before the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1155
> long max   1173
> 
> 
> After the patch, on darwin/aarch64 (M1):
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>                                                          1     1     0     0
> ==============================
> TEST SUCCESS
> 
> long min   1042
> long max   1042
> 
> 
> This patch does not add an platform-specific backend implementations for the 
> MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into 
> CMoveL.
> 
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these 
> results:
> 
> 
> ==============================
> Test summary
> ==============================
>    TEST                                              TOTAL  PASS  FAIL ERROR
>    jtreg:test/hotspot/jtreg:tier1                     2500  2500     0     0
>>> jtreg:test/jdk:tier1                     ...

The C2 changes look nice! I just added one comment here about style. It would 
also be good to add some IR tests checking that the intrinsic is creating 
`MaxL`/`MinL` nodes before macro expansion, and a microbenchmark to compare 
results.

src/hotspot/share/opto/library_call.cpp line 8244:

> 8242: bool LibraryCallKit::inline_long_min_max(vmIntrinsics::ID id) {
> 8243:   assert(callee()->signature()->size() == 4, "minL/maxL has 2 
> parameters of size 2 each.");
> 8244:   Node *a = argument(0);

Suggestion:

  Node* a = argument(0);

And the same for `b` and `n` as well.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2169250610
PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1672350809

Reply via email to