https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124902
Bug ID: 124902
Summary: [LoopVectorize][AArch64] GCC fails to vectorize
conditional reduction with `if` statement that Clang
vectorizes with SVE
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bug_hunters at yeah dot net
Target Milestone: ---
**Description:**
GCC fails to auto-vectorize a simple conditional reduction loop on AArch64,
while Clang vectorizes the same loop using SVE with the same architecture
flag (`-march=armv9-a+sve`). The loop iterates backwards over three `int`
arrays and, whenever `a[idx] != 0`, adds `a[idx]`, `b[idx]`, and `c[idx]` to a
running sum.
GCC's vectorizer reports "unsupported use in stmt" and aborts vectorization
entirely, generating fully scalar code. In contrast, Clang successfully
vectorizes the loop with a vectorization width of `vscale x 4` and an
interleaved count of 2, using SVE instructions.
**Test case:**
```c
#include <stdint.h>
#include <stddef.h>

unsigned int foo(
    const int * __restrict__ a,
    const int * __restrict__ b,
    const int * __restrict__ c,
    int n
) {
    int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1)
    {
        int idx = i;
        if (a[idx] != 0) {
            sum += a[idx];
            sum += b[idx];
            sum += c[idx];
        }
    }
    return sum;
}
```
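To help narrow down which part of the test case trips the vectorizer, the sketch below puts two variants next to a copy of the original loop. It is only a triage aid under stated assumptions, not a claim about what GCC does with either variant. The names `foo_ref`, `foo_select`, `foo_forward`, and `check_variants` are hypothetical and not part of the report. `foo_select` keeps the reverse iteration but replaces the branch with a select, which assumes that all `n` elements of `b` and `c` are readable and that the sums do not overflow. `foo_forward` keeps the branch but iterates forwards.

```c
#include <assert.h>

/* Copy of the loop from the test case, used as the reference. */
static unsigned int foo_ref(const int *restrict a, const int *restrict b,
                            const int *restrict c, int n)
{
    int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1) {
        if (a[i] != 0) {
            sum += a[i];
            sum += b[i];
            sum += c[i];
        }
    }
    return sum;
}

/* Hypothetical variant 1: reverse iteration, branch replaced by a select.
   b[i] and c[i] are now loaded unconditionally, so this is only valid when
   every element in [0, n) is readable. Assumes no signed overflow. */
static unsigned int foo_select(const int *restrict a, const int *restrict b,
                               const int *restrict c, int n)
{
    int sum = 0;
    for (int i = n - 1; i >= 0; i -= 1) {
        int t = a[i] + b[i] + c[i];
        sum += (a[i] != 0) ? t : 0;
    }
    return sum;
}

/* Hypothetical variant 2: same branch as the test case, forward iteration. */
static unsigned int foo_forward(const int *restrict a, const int *restrict b,
                                const int *restrict c, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i += 1) {
        if (a[i] != 0) {
            sum += a[i];
            sum += b[i];
            sum += c[i];
        }
    }
    return sum;
}

/* Checks that both variants agree with the reference on a small input.
   Returns 1 on success; the asserts abort otherwise. */
static int check_variants(void)
{
    const int a[] = {3, 0, -2, 0, 7, 1, 0, 5};
    const int b[] = {1, 9, 4, 8, -6, 2, 7, 0};
    const int c[] = {2, 9, 1, 8, 3, -4, 7, 6};
    const int n = 8;

    unsigned int expect = foo_ref(a, b, c, n);
    assert(foo_select(a, b, c, n) == expect);
    assert(foo_forward(a, b, c, n) == expect);
    return 1;
}
```

Calling `check_variants()` from a small `main` and compiling with the options listed in this report would show, per variant, what `-fopt-info-vec-all` says about each loop.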
**gcc version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260413 (experimental)
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
**Compilation options:**
```
-O3 -S -march=armv9-a+sve -ftree-vectorize -fopt-info-vec-all
```
**GCC output:**
```
<source>:11:27: missed: couldn't vectorize loop
<source>:4:14: missed: not vectorized: unsupported use in stmt.
<source>:4:14: note: vectorized 0 loops in function.
<source>:17:17: note: ***** Analysis failed with vector mode VNx4SI
<source>:17:17: note: ***** The result for vector mode VNx16QI would be the same
<source>:17:17: note: ***** The result for vector mode VNx8QI would be the same
<source>:17:17: note: ***** The result for vector mode VNx4QI would be the same
<source>:17:17: note: ***** Re-trying analysis with vector mode VNx2QI
<source>:17:17: note: ***** Analysis failed with vector mode VNx2QI
<source>:17:17: note: ***** Re-trying analysis with vector mode V16QI
<source>:17:17: note: ***** Analysis failed with vector mode V16QI
<source>:17:17: note: ***** The result for vector mode V8QI would be the same
<source>:17:17: note: ***** The result for vector mode V4HI would be the same
<source>:17:17: note: ***** Re-trying analysis with vector mode V2SI
<source>:17:17: note: ***** Analysis failed with vector mode V2SI
```
Also reproducible on Godbolt:
https://godbolt.org/z/87z4frG5d
**However, Clang 21.1.1 vectorizes it.**
**clang version:**
```
clang version 21.1.1
Target: unknown
Thread model: posix
Build config: +unoptimized, +assertions
```
**Clang options:**
```
-target aarch64-linux-gnu -march=armv9-a+sve -S -O3 -ftree-vectorize
-ftree-slp-vectorize -Rpass=.*vectorize.* -Rpass-missed=.*vectorize.*
-Rpass-analysis=.*vectorize.*
```
**Clang output:**
```
<source>:14:13: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   14 |         if (a[idx] != 0) {
      |             ^
<source>:15:17: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   15 |             sum += a[idx];
      |                 ^
<source>:16:20: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   16 |             sum += b[idx];
      |                    ^
<source>:16:17: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   16 |             sum += b[idx];
      |                 ^
<source>:17:20: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   17 |             sum += c[idx];
      |                    ^
<source>:17:17: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): add [-Rpass-analysis=loop-vectorize]
   17 |             sum += c[idx];
      |                 ^
<source>:11:5: remark: vectorized loop (vectorization width: vscale x 4, interleaved count: 2) [-Rpass=loop-vectorize]
   11 |     for (int i = n - 1; i >= 0; i -= 1)
      |     ^
```
Also reproducible on Godbolt: https://godbolt.org/z/fnr6MoW79