Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v10]

Emanuel Peter Mon, 27 Apr 2026 23:33:21 -0700

On Fri, 24 Apr 2026 08:09:23 GMT, Jatin Bhateja <[email protected]> wrote:


>> Hi all,
>> 
>> This patch optimizes SIMD kernels making heavy use of broadcasted inputs 
>> through following reassociating ideal transformations.
>> 
>> 
>>  VectorOperation (VectorBroadcast INP1,  VectorBroadcast INP2) => 
>>                             VectorBroadcast (ScalarOpration INP1, INP2)
>> 
>>  VectorOperation (VectorBroadcast INP1) (VectorOperation (VectorBroadcast 
>> INP2) INP3) => 
>>                              VectorOperation INP3 (VectorOperation 
>> (VectorBroadcast INP1) (VectorBroadcast INP2))
>> 
>> 
>> The idea is to push broadcasts across the vector operation and replace the 
>> vector with an equivalent, cheaper scalar variant.  Currently, patch handles 
>> most common vector operations.
>> 
>> Following are the performance number of benchmark included with this patch 
>> on latest generation x86 targets:- 
>> 
>> **AMD Turin (2.1GHz)**
>> <img width="1122" height="355" alt="image" 
>> src="https://github.com/user-attachments/assets/3f5087bf-0e14-4c56-b0c2-3d23253bad54";
>>  />
>> 
>> **Intel Granite Rapids (2.1GHz)**
>> <img width="1105" height="325" alt="image" 
>> src="https://github.com/user-attachments/assets/c8481f86-4db2-4c4e-bd65-51542c59fe63";
>>  />
>> 
>> 
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Remving duplicate IRNode definitions after latest mainline merge

@jatin-bhateja Looks like this is nicely progressing :)

I have some questions / suggestions below.

src/hotspot/share/opto/vectornode.cpp line 38:

> 36: 
> 37: // Return the vector operator for the specified scalar operation
> 38: // and basic type.

Why did you delete this line?

src/hotspot/share/opto/vectornode.cpp line 300:

> 298: // Return the scalar opcode for the specified vector opcode and basic 
> type.
> 299: // Returns 0 if not handled.
> 300: int VectorNode::scalar_opcode(int vopc, BasicType bt) {

Is this not the inverse of some other `VectorNode` function? Could we have some 
kind of assert/verification to check for consistency?

src/hotspot/share/opto/vectornode.cpp line 1127:

> 1125: }
> 1126: 
> 1127: static bool is_commutative_vector_operation(int opcode) {

This is assuming that the operation is not masked, right? You may want to put 
this caveat as a code comment.

src/hotspot/share/opto/vectornode.cpp line 1186:

> 1184: // Check whether we can push this vector op through replicate (all 
> inputs are Replicate).
> 1185: bool VectorNode::can_push_through_replicate(BasicType bt) {
> 1186:   if (!scalar_opcode(Opcode(), bt)) {

Looks like an implicit zero check. Not allowed by guidelines.

Suggestion:

  if (scalar_opcode(Opcode(), bt) == 0) {

src/hotspot/share/opto/vectornode.cpp line 1207:

> 1205:   int sopc = scalar_opcode(vopc, bt);
> 1206:   assert(sopc != 0, "unhandled vector opcode %s", NodeClassNames[vopc]);
> 1207:   assert(opcode(sopc, bt) == vopc, "scalar_opcode and opcode must agree 
> for %s", NodeClassNames[vopc]);

Ah nice. This is some kind of round-trip verification. Good.

src/hotspot/share/opto/vectornode.cpp line 1281:

> 1279: Node* VectorNode::create_reassociated_node(Node* parent, Node* child, 
> Node* cinput1, Node* cinput2,
> 1280:                                            Node* pinput2, PhaseGVN* 
> phase) {
> 1281:   // Transformation: parent(child(cinput1, cinput2), pinput2) with 
> child's inputs set to cinput1, cinput2.

A transformation usually specifies the input and output pattern. I struggle a 
bit to read what this line means. Can you please clarify?

src/hotspot/share/opto/vectornode.cpp line 1292:

> 1290: }
> 1291: 
> 1292: // Try to reassociate commutative vector operations using following 
> ideal transformation,

`using following` repeated verb

src/hotspot/share/opto/vectornode.cpp line 1385:

> 1383:     sop = phase->transform(new LShiftINode(sop, con_shift));
> 1384:     sop = phase->transform(new RShiftINode(sop, con_shift));
> 1385:   }

I think we have the default the wrong way around here.
We should assume that by default we always have to do truncation, and should 
only avoid truncation if we are sure it is safe.

Otherwise we might be setting ourselves up for future bugs here, and it will be 
difficult to catch them, until some fuzzer finds it.

@iwanowww do you agree?

See also `superword.cpp` -> `can_subword_truncate`

-------------

PR Review: https://git.openjdk.org/jdk/pull/25617#pullrequestreview-4186249156
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151917009
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151929266
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151935147
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151946316
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151956143
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151966534
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3151976795
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3152013336

Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v10]

Reply via email to