On Tue, 3 Jun 2025 14:54:16 GMT, Jatin Bhateja <[email protected]> wrote:
> Hi all, > > This patch optimizes SIMD kernels making heavy use of broadcasted inputs > through following reassociating ideal transformations. > > > VectorOperation (VectorBroadcast INP1, VectorBroadcast INP2) => > VectorBroadcast (ScalarOpration INP1, INP2) > > VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast > INP2) INP3) => > VectorOperation INP3 (VectorOperation > (VectorBroadcast INP1) (VectorOperation INP2)) > > > The idea is to push broadcasts across the vector operation and replace the > vector with an equivalent, cheaper scalar variant. Currently, patch handles > most common vector operations. > > Following are the performance number of benchmark included with this patch on > latest generation x86 targets:- > > **AMD Turin (2.1GHz)** > <img width="1122" height="355" alt="image" > src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54" > /> > > **Intel Granite Rapids (2.1GHz)** > <img width="1105" height="325" alt="image" > src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63" > /> > > > > Kindly review and share your feedback. > > Best Regards, > Jatin Good work @jatin-bhateja ! I left some comments below. Thanks! src/hotspot/share/opto/vectornode.cpp line 390: > 388: case T_BOOLEAN: > 389: case T_CHAR: > 390: assert(!enable_assertions, "boolean and char are signed, not > implemented for Min"); Suggestion: assert(!enable_assertions, "boolean and char are unsigned, not implemented for Min"); src/hotspot/share/opto/vectornode.cpp line 411: > 409: case T_BOOLEAN: > 410: case T_CHAR: > 411: assert(!enable_assertions, "boolean and char are signed, not > implemented for Max"); ditto src/hotspot/share/opto/vectornode.cpp line 1317: > 1315: } > 1316: return nullptr; > 1317: } Reassociation can be a separate optimization for vector nodes to me. Do you think it's better to split it as a separate change following the broadcast optimization? We can add specific tests for it. src/hotspot/share/opto/vectornode.cpp line 1358: > 1356: return new ReplicateNode(sop, vect_type()); > 1357: } > 1358: } Is it better to wrap this transformation as a method? test/hotspot/jtreg/compiler/vectorapi/TestVectorBroadcastReassociations.java line 136: > 134: * ======================= */ > 135: > 136: static final VectorSpecies<Long> LSP = LongVector.SPECIES_256; Why not using the `SPECIES_PREFERRED` instead like the int species? The max vector size for AArch64 NEON (asimd) is 128-bit. We have to add another condition `MaxVectorSize >= 32` for following IR tests if using `SPECIES_256`. ------------- PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-3776329907 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785411161 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785421927 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785464918 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785469004 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r2785485050
