On Tue, 28 Apr 2026 21:32:10 GMT, Vladimir Ivanov <[email protected]> wrote:
>> I had originally banked on the truncation semantics of the broadcast
>> instruction (`Replicate` node), but to be consistent with the Ideal graph
>> generated when this transformation is hand-tuned in Java code, I introduced
>> the truncation IR.
>>
>>
>> public static void micro1(byte[] dst, byte[] src1, byte[] src2, int idx) {
>>     ByteVector.broadcast(BSP, n1)
>>               .lanewise(VectorOperators.ADD, ByteVector.broadcast(BSP, n2))
>>               .intoArray(dst, idx);
>> }
>>
>> public static void micro2(byte[] dst, byte[] src1, byte[] src2, int idx) {
>>     ByteVector.broadcast(BSP, n1 + n2)
>>               .intoArray(dst, idx);
>> }
>>
>>
>> The following is the Ideal graph generated for micro2:
>>
>>
>> 37 AddI === _ 31 36 [[ 88 ]] !jvms: test_subword::micro2 @ bci:9 (line 19)
>> 87 ConI === 0 [[ 88 89 ]] #int:24
>> 88 LShiftI === _ 37 87 [[ 89 ]] !jvms: ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>> 89 RShiftI === _ 88 87 [[ 101 90 506 ]] !jvms: ByteVector$ByteSpecies::longToElementBits @ bci:2 (line 4328) ByteVector$ByteSpecies::broadcast @ bci:3 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>> 506 Replicate === _ 89 [[ 348 533 362 ]] #vectorz<B,64> !jvms: ByteVector$ByteSpecies::broadcastBits @ bci:20 (line 4305) ByteVector$ByteSpecies::broadcast @ bci:6 (line 4320) ByteVector::broadcast @ bci:7 (line 673) test_subword::micro2 @ bci:11 (line 19)
>>
>>
>> Let's wait to hear back from @iwanowww
>
> Well, IMO truncation is redundant here. As it is shaped now, `Replicate`
> implicitly performs truncation. Subword types are erased to ints in
> bytecodes, but `Replicate` consumes raw subword values and is expected to
> truncate them when populating a vector. So, when it comes to the `Replicate`
> node elimination scenario, I don't see a difference between `Replicate
> INP1`/`Replicate INP2` and `Replicate (ScalarOp INP1 INP2)`. In both cases
> the scalar is not guaranteed to abide by subword bounds. If there's no
> explicit truncation in the former case, what's the point in explicitly
> truncating the result of scalar operation?

If there's a scenario where `Replicate` is bypassed, the transformation has to
take into account the effects of the truncation the node implicitly performs.
I'd prefer to see truncation made explicit and `Replicate` require its input
to be within bounds, and then match the corresponding IR tree. But that's
something for a separate RFE.
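
A minimal scalar sketch of the two shapes discussed above, assuming lane-wise
byte ADD wraps modulo 2^8 and that `Replicate` keeps only the low 8 bits of its
int input. The class and method names are hypothetical and exist only for
illustration; the shift pair mirrors the `LShiftI`/`RShiftI` by 24 in the
quoted graph.

```java
public class SubwordTruncation {
    // Explicit truncation as it appears in the Ideal graph:
    // LShiftI by 24 followed by RShiftI by 24 sign-extends the low byte.
    static int truncateToByte(int x) {
        return (x << 24) >> 24;
    }

    // Models micro1: each scalar is truncated when broadcast (assumed
    // Replicate behavior), then the byte lanes are added with wraparound.
    static byte broadcastThenAdd(int n1, int n2) {
        return (byte) ((byte) n1 + (byte) n2);
    }

    // Models micro2: the scalars are added as ints first, and the sum is
    // truncated once when it is broadcast.
    static byte addThenBroadcast(int n1, int n2) {
        return (byte) (n1 + n2);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 127, 128, 255, 256, 1000, -129,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int a : samples) {
            // The shift pair and the (byte) cast agree on every input.
            if (truncateToByte(a) != (byte) a) {
                throw new AssertionError("shift pair differs for " + a);
            }
            for (int b : samples) {
                // Both shapes yield the same lane value for ADD.
                if (broadcastThenAdd(a, b) != addThenBroadcast(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("all samples agree");
    }
}
```

The equivalence holds because the low 8 bits of a sum depend only on the low 8
bits of the operands. It would not carry over to operations such as division or
right shift, where the upper bits of the scalar inputs affect the result.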
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/25617#discussion_r3157373666