On Tue, 8 Jul 2025 10:33:50 GMT, Fei Gao <f...@openjdk.org> wrote:

>>> > > > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>>> > > >
>>> > > > Actually I didn't change the min vector size for `char` vectors in
>>> > > > this patch. Relaxing `short` vectors to 32-bit is to support the
>>> > > > vector cast for Vector API, and there is no `char` species in it. Do
>>> > > > you think it's better to do the same change for `char` as well? This
>>> > > > will just benefit auto-vectorization.
>>> > >
>>> > > Hi @XiaohongGong thanks for asking. In many auto-vectorization cases
>>> > > involving `char`, the vector elements are represented using `T_SHORT`
>>> > > as the `BasicType`, rather than `T_CHAR`.
>>> > > This is because, in Java, operands of subword types are always promoted
>>> > > to `int` before any arithmetic operation. As a result, when handling a
>>> > > node like `ConvD2I`, we don’t initially know its actual subword type.
>>> > > Later, the SuperWord phase propagates a narrowed integer type backward
>>> > > to help determine the correct subword type. See:
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2551-L2558
>>> > >
>>> > > Since SuperWord assigns `T_SHORT` to `StoreC` early on
>>> > > https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2646-L2650
>>> > >
>>> > > the entire propagation chain tends to use `T_SHORT` as well.
>>> > > This applies to most operations, with the exception of a few like
>>> > > `RShiftI`, `Abs`, and `ReverseBytesI`, which are handled separately.
>>> > > So your change already benefits many char-related vectorization cases
>>> > > like `convertDoubleToChar` above. That’s why we can safely relax the IR
>>> > > condition mentioned earlier.
>>> >
>>> > Thanks for your input! It's really helpful to me. Does this mean it
>>> > always use `T_SHORT` for char vectors in SLP? If so, it's safe that we do
>>> > not need to consider `T_CHAR` in vector IRs in backend?
>>>
>>> No, we don't always use `T_SHORT` for char vectors. As mentioned earlier,
>>> for operations like `RShiftI`, `Abs`, and `ReverseBytesI`, the compiler
>>> needs to preserve the higher-order bits of the first operand. Therefore,
>>> SuperWord still needs to assign them precise subword types. See:
>>>
>>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/share/opto/superword.cpp#L2583-L2589
>>
>> Yes, I see. Thanks! What I mean is for cases that SLP will use the sub...
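To make the quoted point about subword types concrete, here is a small Java-level sketch (not the SuperWord code itself). It assumes only standard Java semantics: `short` and `char` operands are promoted to `int` before arithmetic, with sign extension for `short` and zero extension for `char`. The class and method names are hypothetical and exist only for illustration. The addition case shows why a `char` lane can be treated as `T_SHORT` (the low 16 bits agree), while the `>>` case, which C2 represents as `RShiftI`, shows why the higher-order bits of the first operand matter.

```java
public class SubwordPromotionDemo {
    // Interpret the low 16 bits as a short, add 1 in int arithmetic,
    // narrow back to short, and return the resulting 16-bit pattern.
    static int addViaShort(int bits) {
        short s = (short) bits;          // sign-extended when promoted to int
        return (short) (s + 1) & 0xFFFF;
    }

    // Same computation, but the lane is interpreted as a char.
    static int addViaChar(int bits) {
        char c = (char) bits;            // zero-extended when promoted to int
        return (char) (c + 1);
    }

    // Arithmetic right shift on a short lane: the promoted int carries
    // copies of the sign bit in its upper half, and they shift into bit 15.
    static int shiftViaShort(int bits) {
        short s = (short) bits;
        return (short) (s >> 1) & 0xFFFF;
    }

    // Arithmetic right shift on a char lane: the upper half of the
    // promoted int is zero, so a zero shifts into bit 15.
    static int shiftViaChar(int bits) {
        char c = (char) bits;
        return (char) (c >> 1);
    }

    public static void main(String[] args) {
        int bits = 0x8000; // top bit of the 16-bit lane is set
        System.out.println(Integer.toHexString(addViaShort(bits)));   // 8001
        System.out.println(Integer.toHexString(addViaChar(bits)));    // 8001
        System.out.println(Integer.toHexString(shiftViaShort(bits))); // c000
        System.out.println(Integer.toHexString(shiftViaChar(bits)));  // 4000
    }
}
```

For addition the narrowed results are bit-identical, so the signedness of the lane type does not matter. For the right shift they differ, which is consistent with the statement above that such operations need a precise subword type.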
@fg1417, there is a performance regression for `D -> S` on NEON with SLP, so I've disabled that case in the latest change. Here is the performance data for the JMH benchmark `TypeVectorOperations` on Grace (the 128-bit SVE machine) and N1 (NEON), respectively:

Grace:

Benchmark                                 COUNT  Mode  Unit       Before       After  Ratio
TypeVectorOperationsSuperWord.convertD2S    512  avgt  ns/op  155.667433  123.222497   1.26
TypeVectorOperationsSuperWord.convertD2S   2048  avgt  ns/op  622.262384  489.336020   1.27
TypeVectorOperationsSuperWord.convertL2S    512  avgt  ns/op   93.173939   63.557134   1.46
TypeVectorOperationsSuperWord.convertL2S   2048  avgt  ns/op  365.287938  239.726941   1.52
TypeVectorOperationsSuperWord.convertS2D    512  avgt  ns/op  157.096344  147.560047   1.06
TypeVectorOperationsSuperWord.convertS2D   2048  avgt  ns/op  627.039963  614.748559   1.01
TypeVectorOperationsSuperWord.convertS2L    512  avgt  ns/op  111.752970  108.629240   1.02
TypeVectorOperationsSuperWord.convertS2L   2048  avgt  ns/op  441.312737  441.088523   1.00

N1:

Benchmark                                 COUNT  Mode  Unit       Before       After  Ratio
TypeVectorOperationsSuperWord.convertD2S    512  avgt  ns/op  215.353528  214.769884   1.00
TypeVectorOperationsSuperWord.convertD2S   2048  avgt  ns/op  958.428871  952.922855   1.00
TypeVectorOperationsSuperWord.convertL2S    512  avgt  ns/op  158.000190  142.647209   1.10
TypeVectorOperationsSuperWord.convertL2S   2048  avgt  ns/op  612.525835  532.023419   1.15
TypeVectorOperationsSuperWord.convertS2D    512  avgt  ns/op  209.993363  210.466401   0.99
TypeVectorOperationsSuperWord.convertS2D   2048  avgt  ns/op  819.181052  803.601170   1.01
TypeVectorOperationsSuperWord.convertS2L    512  avgt  ns/op  217.848273  182.680450   1.19
TypeVectorOperationsSuperWord.convertS2L   2048  avgt  ns/op  858.031089  695.502377   1.23

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3050738693