On Mon, 7 Jul 2025 06:59:20 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Have you measured the performance of this micro-benchmark on NEON machine?
>>
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/micro/org/openjdk/bench/vm/compiler/TypeVectorOperations.java#L251-L256
>>
>> We added a limitation only for `int` before:
>>
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L131-L134
>>
>> Perhaps we also need to impose a similar limitation on `short` if the same
>> regression occurs.
>
> Good catch, and thanks so much for your input @fg1417 ! I will test the
> performance and disable auto-vectorization for double to short casting if the
> performance has regression.
>
>> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java#L388-L392
>
> Actually I didn't change the min vector size for `char` vectors in this
> patch. Relaxing `short` vectors to 32-bit is to support the vector cast for
> Vector API, and there is no `char` species in it. Do you think it's better to
> do the same change for `char` as well? This will just benefit
> auto-vectorization.

> Hi @XiaohongGong, is there any way we can implement 2HF -> 2S and 2S -> 2HF
> in these match rules ?
>
> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4697
>
> https://github.com/openjdk/jdk/blob/f2d2eef988c57cc9f6194a8fd5b2b422035ee68f/src/hotspot/cpu/aarch64/aarch64_vector.ad#L4679
>
> The `fcvtn` and `fcvtl` instructions do not support these arrangements. I was
> wondering if there is any other way we can implement these by any chance?

Do you mean `2HF -> 2F` and `2F -> 2HF`? Yes, `fcvtn` and `fcvtl` do not support the 32-bit arrangements. Vector conversion is a lanewise vector operation. For such cases, we usually handle a 32-bit vector with the same arrangements as the 64-bit vector size. That means we can reuse `T4H` and `T4S` to implement it. Hence, the current match rules already cover the conversions between `2HF` and `2F`. Since there are no such conversion cases in the Vector API, I didn't change the comments in the match rules. I think this may benefit auto-vectorization. Do we currently have cases that can match these rules with SLP?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/26057#issuecomment-3047091009
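
On the SLP question above, here is a minimal sketch of the kind of loop that could be a candidate for the half-float conversion rules. It assumes JDK 20 or later, where `Float.floatToFloat16` and `Float.float16ToFloat` are available. The class and method names are hypothetical and not taken from the patch or the existing tests. Whether C2 actually picks 32-bit (2-lane) vectors for such loops is not verified here; that likely depends on the vector size constraints in effect.

```java
import java.util.Arrays;

// Hypothetical example class; not part of the JDK sources or tests.
public class HalfFloatLoops {

    // float -> half-float (binary16 bits stored in a short), one element per iteration.
    // A simple counted loop like this is the usual shape considered for auto-vectorization.
    static void floatToHalf(float[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.floatToFloat16(src[i]);
        }
    }

    // half-float (binary16 bits in a short) -> float, one element per iteration.
    static void halfToFloat(short[] src, float[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Float.float16ToFloat(src[i]);
        }
    }

    public static void main(String[] args) {
        // All inputs are exactly representable in binary16, so the round trip is lossless.
        float[] in = {1.0f, -2.0f, 0.5f, 65504.0f};
        short[] half = new short[in.length];
        float[] out = new float[in.length];
        floatToHalf(in, half);
        halfToFloat(half, out);
        System.out.println(Arrays.toString(out));
    }
}
```

Checking the generated code for loops like these (for example with a reduced `MaxVectorSize`) would be one way to see whether the `2HF`/`2F` forms are ever selected.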