https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124902

            Bug ID: 124902
           Summary: [LoopVectorize][AArch64] GCC fails to vectorize
                    conditional reduction with `if` statement that Clang
                    vectorizes with SVE
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bug_hunters at yeah dot net
  Target Milestone: ---

**Description:**
GCC fails to auto-vectorize a simple conditional reduction loop on AArch64,
while Clang vectorizes it using SVE with the same architecture flag
(`-march=armv9-a+sve`). The loop counts down from `n - 1` and adds `a[idx]`,
`b[idx]`, and `c[idx]` to `sum` only when `a[idx] != 0`.

GCC's vectorizer reports "not vectorized: unsupported use in stmt", fails
analysis for every vector mode it tries (VNx4SI, VNx2QI, V16QI, V2SI), and
emits fully scalar code. In contrast, Clang vectorizes the loop with SVE
instructions, using a vectorization width of `vscale x 4` and an interleave
count of 2.

**Test case:**
```c
#include <stdint.h>
#include <stddef.h>

unsigned int foo(
    const int * __restrict__ a,
    const int * __restrict__ b,
    const int * __restrict__ c,
    int n
) {
    int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1)
    {
        int idx = i;
        if (a[idx] != 0) {
            sum += a[idx];
            sum += b[idx];
            sum += c[idx];

        }
    }
    return sum;
}
```
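
For triage, it may help to compare the loop against a hand if-converted form, in which the branch becomes a select and the reduction update is unconditional. The sketch below is not part of the original report: `foo_branchy` and `foo_select` are hypothetical names, the select variant loads `b[i]` and `c[i]` unconditionally (which assumes all three arrays are readable for every index in `[0, n)`), and it accumulates in `unsigned int` so that the extra additions cannot introduce signed overflow. Whether GCC vectorizes the select form has not been checked here.

```c
#include <stddef.h>

/* Same control flow as the reported test case: the three loads and
   additions happen only when a[i] is non-zero. */
unsigned int foo_branchy(const int *restrict a, const int *restrict b,
                         const int *restrict c, int n)
{
    int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1) {
        if (a[i] != 0) {
            sum += a[i];
            sum += b[i];
            sum += c[i];
        }
    }
    return sum;
}

/* Hand if-converted variant (an assumption, not the reporter's code).
   b[i] and c[i] are loaded on every iteration, and the condition only
   selects between the computed contribution and zero. Arithmetic is done
   in unsigned int, so it wraps instead of overflowing. */
unsigned int foo_select(const int *restrict a, const int *restrict b,
                        const int *restrict c, int n)
{
    unsigned int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1) {
        unsigned int contrib =
            (unsigned int)a[i] + (unsigned int)b[i] + (unsigned int)c[i];
        sum += (a[i] != 0) ? contrib : 0u;
    }
    return sum;
}
```

Both functions return the same value whenever the original loop does not overflow `int`, so the second one can be fed to the same compiler invocation to see whether the conditional loads or the conditional reduction update is what the vectorizer rejects.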

**gcc version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260413 (experimental)
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

**Compilation options:**
```
-O3 -S -march=armv9-a+sve -ftree-vectorize -fopt-info-vec-all
```

**GCC output:**
```
<source>:11:27: missed: couldn't vectorize loop
<source>:4:14: missed: not vectorized: unsupported use in stmt.
<source>:4:14: note: vectorized 0 loops in function.
<source>:17:17: note: ***** Analysis failed with vector mode VNx4SI
<source>:17:17: note: ***** The result for vector mode VNx16QI would be the same
<source>:17:17: note: ***** The result for vector mode VNx8QI would be the same
<source>:17:17: note: ***** The result for vector mode VNx4QI would be the same
<source>:17:17: note: ***** Re-trying analysis with vector mode VNx2QI
<source>:17:17: note: ***** Analysis failed with vector mode VNx2QI
<source>:17:17: note: ***** Re-trying analysis with vector mode V16QI
<source>:17:17: note: ***** Analysis failed with vector mode V16QI
<source>:17:17: note: ***** The result for vector mode V8QI would be the same
<source>:17:17: note: ***** The result for vector mode V4HI would be the same
<source>:17:17: note: ***** Re-trying analysis with vector mode V2SI
<source>:17:17: note: ***** Analysis failed with vector mode V2SI
```

Also reproducible on Godbolt: https://godbolt.org/z/87z4frG5d

**However, Clang 21.1.1 vectorizes the same loop.**

**clang version:**
```
clang version 21.1.1
Target: unknown
Thread model: posix
Build config: +unoptimized, +assertions
```

**Clang options:**
```
-target aarch64-linux-gnu -march=armv9-a+sve -S -O3 -ftree-vectorize
-ftree-slp-vectorize -Rpass=.*vectorize.* -Rpass-missed=.*vectorize.*
-Rpass-analysis=.*vectorize.*
```

**The result of Clang:**
```
<source>:14:13: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   14 |         if (a[idx] != 0) {
      |             ^
<source>:15:17: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   15 |             sum += a[idx];
      |                 ^
<source>:16:20: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   16 |             sum += b[idx];
      |                    ^
<source>:16:17: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   16 |             sum += b[idx];
      |                 ^
<source>:17:20: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   17 |             sum += c[idx];
      |                    ^
<source>:17:17: remark: Recipe with invalid costs prevented vectorization at
VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   17 |             sum += c[idx];
      |                 ^
<source>:11:5: remark: vectorized loop (vectorization width: vscale x 4,
interleaved count: 2) [-Rpass=loop-vectorize]
   11 |     for (int i = n - 1; i >= 0; i -= 1)
      |     ^
```

Also reproducible on Godbolt: https://godbolt.org/z/fnr6MoW79
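
To put the Clang remark in concrete terms: on AArch64, `vscale` is the SVE vector length divided by 128 bits, so `vscale x 4` means four 32-bit lanes per 128-bit granule, and an interleave count of 2 means two such vectors per loop iteration. The helper below is only an illustrative sketch; `elems_per_iteration` and its parameters are hypothetical names, and it gives the full-vector count, ignoring any predicated tail iteration.

```c
/* Number of elements handled by one iteration of a scalable vector loop.
   sve_bits:   hardware SVE vector length in bits (a multiple of 128)
   min_lanes:  the N in "vscale x N" from the vectorizer remark
   interleave: the interleave count from the vectorizer remark */
unsigned int elems_per_iteration(unsigned int sve_bits,
                                 unsigned int min_lanes,
                                 unsigned int interleave)
{
    unsigned int vscale = sve_bits / 128u;
    return vscale * min_lanes * interleave;
}
```

For the remark above, `elems_per_iteration(256, 4, 2)` evaluates to 16, meaning sixteen `int` elements per iteration on a 256-bit SVE implementation.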
