Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v10]

Vladimir Ivanov Tue, 28 Apr 2026 14:36:46 -0700

On Tue, 28 Apr 2026 09:40:53 GMT, Jatin Bhateja <[email protected]> wrote:


>> src/hotspot/share/opto/vectornode.cpp line 1385:
>> 
>>> 1383:     sop = phase->transform(new LShiftINode(sop, con_shift));
>>> 1384:     sop = phase->transform(new RShiftINode(sop, con_shift));
>>> 1385:   }
>> 
>> I think we have the default the wrong way around here.
>> We should assume that by default we always have to do truncation, and should 
>> only avoid truncation if we are sure it is safe.
>> 
>> Otherwise we might be setting ourselves up for future bugs here, and it will 
>> be difficult to catch them, until some fuzzer finds it.
>> 
>> @iwanowww do you agree?
>> 
>> See also `superword.cpp` -> `can_subword_truncate`
>
> I had originally banked on the truncation semantics of broadcast instruction 
> (replicate node), but to be consistent with the Ideal graph generated if we 
> hand tune this transformation in java code, introduced the truncation IR. 
>  
> 
>  public static void micro1(byte [] dst, byte [] src1, byte [] src2, int idx) {
>       ByteVector.broadcast(BSP, n1)
>                 .lanewise(VectorOperators.ADD, ByteVector.broadcast(BSP, n2))
>                 .intoArray(dst, idx);
>   }
> 
>   public static void micro2(byte [] dst, byte [] src1, byte [] src2, int idx) 
> {
>       ByteVector.broadcast(BSP, n1 + n2)
>                 .intoArray(dst, idx);
>   }
> 
> 
> Following is the Ideal graph generated for micro2
> 
> 
>   37  AddI  === _ 31 36  [[ 88 ]]  !jvms: test_subword::micro2 @ bci:9 (line 
> 19)
>   87  ConI  === 0  [[ 88 89 ]]  #int:24
>   88  LShiftI  === _ 37 87  [[ 89 ]]  !jvms: 
> ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) 
> ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ 
> bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>   89  RShiftI  === _ 88 87  [[ 101 90 506 ]]  !jvms: 
> ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) 
> ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ 
> bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>  506  Replicate  === _ 89  [[ 348 533 362 ]]  #vectorz<B,64> !jvms: 
> ByteVector$ByteSpecies::broadcastBits @ bci:20 (line 4305) 
> ByteVector$ByteSpecies::broadcast @ bci:6 (line 4320) ByteVector::broadcast @ 
> bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
> 
> 
> Lets wait to hear back form @iwanowww

Well, IMO truncation is redundant here. As it is shaped now, `Replicate` 
implicitly performs truncation. Subword types are erased to ints in bytecodes, 
but `Replicate` consumes raw subword values and is expected to truncate it when 
populating a vector. So, when it comes to `Replicate` node elimination 
scenario, I don't see a difference between `Replicate INP1`/`Replicate INP2` 
and `Replicate (ScalarOp INP1 INP2)`. In both cases the scalar is not 
guaranteed to abide to subword bounds. If there's no explicit truncation in the 
former case,  what's the point in explicitly truncating the result of scalar 
operation?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3157346214

Re: RFR: 8358521: Optimize vector operations by reassociating broadcasted inputs [v10]

Reply via email to