https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124903

            Bug ID: 124903
           Summary: [LoopVectorize][AArch64] GCC fails to vectorize
                    strided reduction with type conversion that Clang
                    vectorizes with NEON
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bug_hunters at yeah dot net
  Target Milestone: ---

**Description:**
GCC fails to auto-vectorize a reduction loop containing a non-unit stride and a
type conversion (float to long) on AArch64, while Clang successfully vectorizes
it using NEON with the same `-O3 -march=armv9-a+sve` options.

The loop computes the product of `float` array elements accessed with a stride
of 4 (the index is `i * 2` and `i` advances by 2), after casting each element
to `long`. GCC's vectorizer reports "unsupported SLP instances" and gives up on
the loop, generating scalar code. Clang's loop vectorizer initially rejects
scalable (SVE) vectorization for this reduction pattern but then falls back to
fixed-width NEON vectorization, with a vectorization width of 4 and an
interleave count of 2.

**Test case:**
```c
#include <stdint.h>
#include <stddef.h>

short foo(
    const float * __restrict__ a,
    int n
) {
    long product = 1;
    for (int i = 0; i < n; i += 2)
    {
        product *= (long)a[(i * 2)];
    }
    return (short)product;
}
```
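
For illustration only, the reduction can be split by hand into four independent
partial products, which is roughly the shape a width-4 vectorization of this
loop has to take. This is a minimal sketch, not what either compiler emits:
`foo_scalar` and `foo_lanes` are hypothetical names, and the partial products
use `unsigned long` so that reassociating the multiplications cannot introduce
signed-overflow undefined behaviour.

```c
/* Scalar reference: the same loop as the test case. */
short foo_scalar(const float * __restrict__ a, int n)
{
    long product = 1;
    for (int i = 0; i < n; i += 2)
        product *= (long)a[i * 2];
    return (short)product;
}

/* Hypothetical hand-split version with four independent "lanes".
 * Each trip of the main loop covers four iterations of the original
 * loop (i, i+2, i+4, i+6), i.e. elements 4 floats apart in memory. */
short foo_lanes(const float * __restrict__ a, int n)
{
    unsigned long lane[4] = {1, 1, 1, 1};
    int i = 0;

    /* Main loop: runs only while all four iterations are in range. */
    for (; i + 6 < n; i += 8)
        for (int k = 0; k < 4; k++)
            lane[k] *= (unsigned long)(long)a[(i + 2 * k) * 2];

    /* Horizontal reduction of the four partial products. */
    unsigned long product = lane[0] * lane[1] * lane[2] * lane[3];

    /* Scalar epilogue for the remaining iterations. */
    for (; i < n; i += 2)
        product *= (unsigned long)(long)a[i * 2];

    /* Out-of-range conversion to short is implementation-defined;
     * GCC and Clang reduce modulo 2^16, matching the scalar version
     * as long as the scalar product itself does not overflow. */
    return (short)product;
}
```

Both functions return the same value whenever the scalar product stays within
the range of `long`; the split version only changes the order in which the
factors are multiplied.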

**gcc version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260413 (experimental)
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

**Compilation options:**
```
-O3 -S -march=armv9-a+sve -ftree-vectorize -fopt-info-vec-all
```

**The compiler outputs:**
```
<source>:9:23: missed: couldn't vectorize loop
<source>:9:23: missed: unsupported SLP instances
<source>:4:7: note: vectorized 0 loops in function.
<source>:13:12: note: ***** Analysis failed with vector mode VNx4SF
<source>:13:12: note: ***** Skipping vector mode VNx16QI, which would repeat
the analysis for VNx4SF
```

Also reproducible on Godbolt: https://godbolt.org/z/qdazr3vxf

**However, Clang 21.1.1 vectorizes it.**

**clang version:**
```
clang version 21.1.1
Target: unknown
Thread model: posix
Build config: +unoptimized, +assertions
```

**Clang options:**
```
-target aarch64-linux-gnu -march=armv9-a+sve -S -O3 -ftree-vectorize -ftree-slp-vectorize -Rpass=.*vectorize.* -Rpass-missed=.*vectorize.* -Rpass-analysis=.*vectorize.*
```

**The result of Clang:**
```
<source>:9:5: remark: Scalable vectorization not supported for the reduction
operations found in this loop. [-Rpass-analysis=loop-vectorize]
    9 |     for (int i = 0; i < n; i += 2)
      |     ^
<source>:9:5: remark: vectorized loop (vectorization width: 4, interleaved
count: 2) [-Rpass=loop-vectorize]
<source>:4:7: remark: Cannot SLP vectorize list: vectorization was impossible
with available vectorization factors [-Rpass-missed=slp-vectorizer]
    4 | short foo(
      |       ^
```

Also reproducible on Godbolt: https://godbolt.org/z/zYP7j17Td
