On Tue, 3 Jun 2025 14:54:16 GMT, Jatin Bhateja <[email protected]> wrote:

> Hi all,
> 
> This patch optimizes SIMD kernels making heavy use of broadcasted inputs 
> through following reassociating ideal transformations.
> 
> 
>  VectorOperation (VectorBroadcast INP1,  VectorBroadcast INP2) => 
>                             VectorBroadcast (ScalarOpration INP1, INP2)
> 
>  VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast 
> INP2) INP3) => 
>                              VectorOperation INP3 (VectorOperation 
> (VectorBroadcast INP1) (VectorOperation INP2))
> 
> 
> The idea is to push broadcasts across the vector operation and replace the 
> vector with an equivalent, cheaper scalar variant.  Currently, patch handles 
> most common vector operations.
> 
> Following are the performance number of benchmark included with this patch on 
> latest generation x86 targets:- 
> 
> **AMD Turin (2.1GHz)**
> <img width="1122" height="355" alt="image" 
> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54";
>  />
> 
> **Intel Granite Rapids (2.1GHz)**
> <img width="1105" height="325" alt="image" 
> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63";
>  />
> 
> 
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

Good work @jatin-bhateja ! I left some comments below. Thanks!

src/hotspot/share/opto/vectornode.cpp line 390:

> 388:         case T_BOOLEAN:
> 389:         case T_CHAR:
> 390:           assert(!enable_assertions, "boolean and char are signed, not 
> implemented for Min");

Suggestion:

          assert(!enable_assertions, "boolean and char are unsigned, not 
implemented for Min");

src/hotspot/share/opto/vectornode.cpp line 411:

> 409:         case T_BOOLEAN:
> 410:         case T_CHAR:
> 411:           assert(!enable_assertions, "boolean and char are signed, not 
> implemented for Max");

ditto

src/hotspot/share/opto/vectornode.cpp line 1317:

> 1315:   }
> 1316:   return nullptr;
> 1317: }

Reassociation can be a separate optimization for vector nodes to me. Do you 
think it's better to split it as a separate change following the broadcast 
optimization? We can add specific tests for it.

src/hotspot/share/opto/vectornode.cpp line 1358:

> 1356:       return new ReplicateNode(sop, vect_type());
> 1357:     }
> 1358:   }

Is it better to wrap this transformation as a method?

test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java 
line 136:

> 134:      * ======================= */
> 135: 
> 136:     static final VectorSpecies<Long> LSP = LongVector.SPECIES_256;

Why not using the `SPECIES_PREFERRED` instead like the int species? The max 
vector size for AArch64 NEON (asimd) is 128-bit. We have to add another 
condition `MaxVectorSize >= 32` for following IR tests if using `SPECIES_256`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-3776329907
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785411161
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785421927
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785464918
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785469004
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785485050

Reply via email to