On Fri, 24 Apr 2026 08:09:23 GMT, Jatin Bhateja <[email protected]> wrote:
>> Hi all, >> >> This patch optimizes SIMD kernels making heavy use of broadcasted inputs >> through following reassociating ideal transformations. >> >> >> VectorOperation (VectorBroadcast INP1, VectorBroadcast INP2) => >> VectorBroadcast (ScalarOpration INP1, INP2) >> >> VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast >> INP2) INP3) => >> VectorOperation INP3 (VectorOperation >> (VectorBroadcast INP1) (VectorBroadcast INP2)) >> >> >> The idea is to push broadcasts across the vector operation and replace the >> vector with an equivalent, cheaper scalar variant. Currently, patch handles >> most common vector operations. >> >> Following are the performance number of benchmark included with this patch >> on latest generation x86 targets:- >> >> **AMD Turin (2.1GHz)** >> <img width="1122" height="355" alt="image" >> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54" >> /> >> >> **Intel Granite Rapids (2.1GHz)** >> <img width="1105" height="325" alt="image" >> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63" >> /> >> >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> --------- >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Jatin Bhateja has updated the pull request incrementally with one additional > commit since the last revision: > > Remving duplicate IRNode definitions after latest mainline merge @jatin-bhateja Looks like this is nicely progressing :) I have some questions / suggestions below. src/hotspot/share/opto/vectornode.cpp line 38: > 36: > 37: // Return the vector operator for the specified scalar operation > 38: // and basic type. Why did you delete this line? src/hotspot/share/opto/vectornode.cpp line 300: > 298: // Return the scalar opcode for the specified vector opcode and basic > type. > 299: // Returns 0 if not handled. > 300: int VectorNode::scalar_opcode(int vopc, BasicType bt) { Is this not the inverse of some other `VectorNode` function? Could we have some kind of assert/verification to check for consistency? src/hotspot/share/opto/vectornode.cpp line 1127: > 1125: } > 1126: > 1127: static bool is_commutative_vector_operation(int opcode) { This is assuming that the operation is not masked, right? You may want to put this caveat as a code comment. src/hotspot/share/opto/vectornode.cpp line 1186: > 1184: // Check whether we can push this vector op through replicate (all > inputs are Replicate). > 1185: bool VectorNode::can_push_through_replicate(BasicType bt) { > 1186: if (!scalar_opcode(Opcode(), bt)) { Looks like an implicit zero check. Not allowed by guidelines. Suggestion: if (scalar_opcode(Opcode(), bt) == 0) { src/hotspot/share/opto/vectornode.cpp line 1207: > 1205: int sopc = scalar_opcode(vopc, bt); > 1206: assert(sopc != 0, "unhandled vector opcode %s", NodeClassNames[vopc]); > 1207: assert(opcode(sopc, bt) == vopc, "scalar_opcode and opcode must agree > for %s", NodeClassNames[vopc]); Ah nice. This is some kind of round-trip verification. Good. src/hotspot/share/opto/vectornode.cpp line 1281: > 1279: Node* VectorNode::create_reassociated_node(Node* parent, Node* child, > Node* cinput1, Node* cinput2, > 1280: Node* pinput2, PhaseGVN* > phase) { > 1281: // Transformation: parent(child(cinput1, cinput2), pinput2) with > child's inputs set to cinput1, cinput2. A transformation usually specifies the input and output pattern. I struggle a bit to read what this line means. Can you please clarify? src/hotspot/share/opto/vectornode.cpp line 1292: > 1290: } > 1291: > 1292: // Try to reassociate commutative vector operations using following > ideal transformation, `using following` repeated verb src/hotspot/share/opto/vectornode.cpp line 1385: > 1383: sop = phase->transform(new LShiftINode(sop, con_shift)); > 1384: sop = phase->transform(new RShiftINode(sop, con_shift)); > 1385: } I think we have the default the wrong way around here. We should assume that by default we always have to do truncation, and should only avoid truncation if we are sure it is safe. Otherwise we might be setting ourselves up for future bugs here, and it will be difficult to catch them, until some fuzzer finds it. @iwanowww do you agree? See also `superword.cpp` -> `can_subword_truncate` ------------- PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-4186249156 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151917009 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151929266 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151935147 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151946316 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151956143 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151966534 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151976795 PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3152013336
