Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v10]

Vladimir Ivanov Tue, 28 Apr 2026 14:40:06 -0700

On Tue, 28 Apr 2026 21:32:10 GMT, Vladimir Ivanov <[email protected]> wrote:


>> I had originally banked on the truncation semantics of the broadcast 
>> instruction (the Replicate node), but I introduced the truncation IR to be 
>> consistent with the Ideal graph generated when this transformation is 
>> hand-tuned in Java code. 
>>  
>> 
>>   public static void micro1(byte[] dst, byte[] src1, byte[] src2, int idx) {
>>       ByteVector.broadcast(BSP, n1)
>>                 .lanewise(VectorOperators.ADD, ByteVector.broadcast(BSP, n2))
>>                 .intoArray(dst, idx);
>>   }
>> 
>>   public static void micro2(byte[] dst, byte[] src1, byte[] src2, int idx) {
>>       ByteVector.broadcast(BSP, n1 + n2)
>>                 .intoArray(dst, idx);
>>   }
>> 
>> 
>> Following is the Ideal graph generated for micro2
>> 
>> 
>>   37  AddI  === _ 31 36  [[ 88 ]]  !jvms: test_subword::micro2 @ bci:9 (line 19)
>>   87  ConI  === 0  [[ 88 89 ]]  #int:24
>>   88  LShiftI  === _ 37 87  [[ 89 ]]  !jvms: ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>>   89  RShiftI  === _ 88 87  [[ 101 90 506 ]]  !jvms: ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>>  506  Replicate  === _ 89  [[ 348 533 362 ]]  #vectorz<B,64> !jvms: ByteVector$ByteSpecies::broadcastBits @ bci:20 (line 4305) ByteVector$ByteSpecies::broadcast @ bci:6 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>> 
>> 
>> Let's wait to hear back from @iwanowww
>
> Well, IMO truncation is redundant here. As it is shaped now, `Replicate` 
> implicitly performs truncation. Subword types are erased to ints in 
> bytecodes, but `Replicate` consumes raw subword values and is expected to 
> truncate them when populating a vector. So, when it comes to the 
> `Replicate` node elimination scenario, I don't see a difference between 
> `Replicate INP1`/`Replicate INP2` and `Replicate (ScalarOp INP1 INP2)`. 
> In both cases the scalar is not guaranteed to abide by subword bounds. If 
> there's no explicit truncation in the former case, what's the point in 
> explicitly truncating the result of the scalar operation?

If there's a scenario where `Replicate` is bypassed, that transformation has to 
take into account the effects of the truncation the node implicitly performs. 
I'd prefer to see truncation made explicit, with `Replicate` requiring its 
input to be within bounds, and then match the corresponding IR tree. But that's 
something for a separate RFE.
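
For readers following along, here is a minimal scalar sketch of the truncation 
being discussed. It is not code from the PR: the class and method names are 
hypothetical, and it models a byte lane with plain Java arithmetic instead of 
the Vector API. `truncateToByte` mirrors the `LShiftI`/`RShiftI` pair with 
shift count 24 from the Ideal graph above. The sketch covers addition only.

```java
public class SubwordTruncationSketch {

    // Mirrors the LShiftI/RShiftI pair (shift count 24) in the Ideal graph:
    // keeps the low 8 bits of v and sign-extends them back to an int.
    static int truncateToByte(int v) {
        return (v << 24) >> 24;
    }

    // Models the micro1 shape: each scalar is narrowed to a byte lane on
    // broadcast, then the lanes are added with byte wrap-around.
    static byte broadcastThenAdd(int n1, int n2) {
        return (byte) (truncateToByte(n1) + truncateToByte(n2));
    }

    // Models the micro2 shape: the scalars are added as ints first and the
    // sum is narrowed once when it is broadcast.
    static byte addThenBroadcast(int n1, int n2) {
        return (byte) (n1 + n2);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 127, 128, 255, 300, -129,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int n1 : samples) {
            for (int n2 : samples) {
                // The low 8 bits of a sum depend only on the low 8 bits of
                // its operands, so both shapes must produce the same lane.
                if (broadcastThenAdd(n1, n2) != addThenBroadcast(n1, n2)) {
                    throw new AssertionError(n1 + " + " + n2);
                }
            }
        }
        System.out.println("all sample pairs agree");
    }
}
```

In this model the explicit shift pair and the narrowing cast are 
interchangeable, which is why an implicit truncation in `Replicate` and an 
explicit one in the IR yield the same lane value for `ADD`.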

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3157373666
