On Tue, 8 Jul 2025 10:33:50 GMT, Fei Gao <f...@openjdk.org> wrote:

>>> > > > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>>> > > > 
>>> > > > 
>>> > > > Actually I didn't change the min vector size for `char` vectors in 
>>> > > > this patch. Relaxing `short` vectors to 32-bit is to support the 
>>> > > > vector cast for Vector API, and there is no `char` species in it. Do 
>>> > > > you think it's better to do the same change for `char` as well? This 
>>> > > > will just benefit auto-vectorization.
>>> > > 
>>> > > 
>>> > > Hi @XiaohongGong thanks for asking. In many auto-vectorization cases 
>>> > > involving `char`, the vector elements are represented using `T_SHORT` 
>>> > > as the `BasicType`, rather than `T_CHAR`.
>>> > > This is because, in Java, operands of subword types are always promoted 
>>> > > to `int` before any arithmetic operation. As a result, when handling a 
>>> > > node like `ConvD2I`, we don’t initially know its actual subword type. 
>>> > > Later, the SuperWord phase propagates a narrowed integer type backward 
>>> > > to help determine the correct subword type. See:
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2551-L2558
>>> > > 
>>> > > Since SuperWord assigns `T_SHORT` to `StoreC` early on
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2646-L2650
>>> > > 
>>> > > the entire propagation chain tends to use `T_SHORT` as well.
>>> > > This applies to most operations, with the exception of a few like 
>>> > > `RShiftI`, `Abs`, and `ReverseBytesI`, which are handled separately.
>>> > > So your change already benefits many char-related vectorization cases 
>>> > > like `convertDoubleToChar` above. That’s why we can safely relax the IR 
>>> > > condition mentioned earlier.
>>> > 
>>> > 
>>> > Thanks for your input! It's really helpful to me. Does this mean SLP 
>>> > always uses `T_SHORT` for char vectors? If so, is it safe to skip 
>>> > `T_CHAR` when handling vector IRs in the backend?
>>> 
>>> No, we don't always use `T_SHORT` for char vectors. As mentioned earlier, 
>>> for operations like `RShiftI`, `Abs`, and `ReverseBytesI`, the compiler 
>>> needs to preserve the higher-order bits of the first operand. Therefore, 
>>> SuperWord still needs to assign them precise subword types. See:
>>> 
>>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2583-L2589
>> 
>> Yes, I see. Thanks! What I mean is for cases that SLP will use the sub...
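
To make the point about `RShiftI` concrete, here is a minimal plain-Java sketch (not HotSpot code; the class and method names are hypothetical and chosen only for illustration). It assumes the same 16-bit pattern is loaded once as `short` and once as `char`. An add produces the same low 16 bits either way, which is why `T_SHORT` can stand in for `char` there, while a right shift does not, because the promoted `int` differs in its upper bits:

```java
final class SubwordShiftDemo {
    // Low 16 bits of (x >> 1) when x is loaded as a signed short (sign-extended to int).
    static int shiftAsShort(int bits16) {
        short s = (short) bits16;
        return (s >> 1) & 0xFFFF;
    }

    // Low 16 bits of (x >> 1) when x is loaded as an unsigned char (zero-extended to int).
    static int shiftAsChar(int bits16) {
        char c = (char) bits16;
        return (c >> 1) & 0xFFFF;
    }

    // Low 16 bits of (x + 1) for the signed interpretation.
    static int incAsShort(int bits16) {
        return ((short) bits16 + 1) & 0xFFFF;
    }

    // Low 16 bits of (x + 1) for the unsigned interpretation.
    static int incAsChar(int bits16) {
        return ((char) bits16 + 1) & 0xFFFF;
    }

    public static void main(String[] args) {
        int bits = 0x8000; // top bit of the 16-bit lane is set
        System.out.println(Integer.toHexString(incAsShort(bits)));   // 8001
        System.out.println(Integer.toHexString(incAsChar(bits)));    // 8001
        System.out.println(Integer.toHexString(shiftAsShort(bits))); // c000
        System.out.println(Integer.toHexString(shiftAsChar(bits)));  // 4000
    }
}
```

The add only depends on the low 16 bits of its inputs, whereas the shift moves bit 16 of the promoted value into the stored result, so the signedness of the lane type matters.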

@fg1417, there is a performance regression for `D -> S` on NEON with SLP, so 
I've disabled that case in the latest change. Here is the performance data 
for the JMH benchmark `TypeVectorOperations` on Grace (the 128-bit SVE 
machine) and N1 (NEON), respectively:

Grace:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  155.667433  123.222497  1.26
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  622.262384  489.336020  1.27
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  93.173939   63.557134   1.46
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  365.287938  239.726941  1.52
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  157.096344  147.560047  1.06
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  627.039963  614.748559  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  111.752970  108.629240  1.02
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  441.312737  441.088523  1.00

N1:

Benchmark                                 COUNT  Mode  Unit   Before      After       Ratio
TypeVectorOperationsSuperWord.convertD2S  512    avgt  ns/op  215.353528  214.769884  1.00
TypeVectorOperationsSuperWord.convertD2S  2048   avgt  ns/op  958.428871  952.922855  1.00
TypeVectorOperationsSuperWord.convertL2S  512    avgt  ns/op  158.000190  142.647209  1.10
TypeVectorOperationsSuperWord.convertL2S  2048   avgt  ns/op  612.525835  532.023419  1.15
TypeVectorOperationsSuperWord.convertS2D  512    avgt  ns/op  209.993363  210.466401  0.99
TypeVectorOperationsSuperWord.convertS2D  2048   avgt  ns/op  819.181052  803.601170  1.01
TypeVectorOperationsSuperWord.convertS2L  512    avgt  ns/op  217.848273  182.680450  1.19
TypeVectorOperationsSuperWord.convertS2L  2048   avgt  ns/op  858.031089  695.502377  1.23
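
For readers unfamiliar with these cases, the following is a rough sketch of the kind of loop a `D -> S` case exercises and of how the Ratio column can be read. It is an assumption for illustration, not the actual benchmark source: the class name, method names, and sample inputs are hypothetical. The ratio helper assumes Ratio = Before / After, which matches the rows above (for example 155.667433 / 123.222497 is about 1.26), so with `avgt` in ns/op a value above 1.00 means the patched build is faster.

```java
import java.util.Arrays;

final class ConvertD2SSketch {
    // Hypothetical kernel: narrow each double to short. In Java this goes
    // double -> int -> short, the ConvD2I-then-narrow shape SuperWord sees.
    static void convertD2S(double[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // Assumed definition of the Ratio column: time before divided by time after.
    static double ratio(double beforeNsPerOp, double afterNsPerOp) {
        return beforeNsPerOp / afterNsPerOp;
    }

    public static void main(String[] args) {
        double[] src = {1.9, -1.9, 70000.0};
        short[] dst = new short[src.length];
        convertD2S(src, dst);
        System.out.println(Arrays.toString(dst)); // [1, -1, 4464]
        System.out.println(ratio(155.667433, 123.222497));
    }
}
```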

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3050738693
