https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124904
Bug ID: 124904
Summary: [LoopVectorize][AArch64] GCC fails to vectorize conditional store loop with offset access that Clang vectorizes with SVE
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bug_hunters at yeah dot net
Target Milestone: ---
**Description:**
GCC fails to auto-vectorize a loop containing a conditional store with an
offset access pattern on AArch64, while Clang successfully vectorizes it using
SVE under the same architecture flags (`-march=armv9-a+sve`).
The loop performs a simple operation: when `a[i]` is non-zero, it stores the
value of `a[i+9]` (cast to double) into `out[i]`. GCC's vectorizer reports
"unsupported control flow in loop" and aborts vectorization entirely,
generating fully scalar code. In contrast, Clang successfully vectorizes the
loop with a vectorization width of `vscale x 2` and an interleaved count of 2,
using SVE instructions.
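The following hand-written sketch shows what the Clang result means in practice. It is not Clang's actual output, which also interleaves by 2. It writes a predicated SVE loop with the same semantics, using ACLE intrinsics from `<arm_sve.h>`. `foo_sve_sketch` is a hypothetical name. Each vector holds `svcntd()` 64-bit lanes, which is what a width of `vscale x 2` corresponds to (two `long`/`double` lanes per 128-bit granule). The `a[i] != 0` test becomes a predicate that masks both the load of `a[i + 9]` and the store to `out[i]`, so no branch remains inside the loop body. On builds without SVE, the function falls back to the original scalar loop.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper: a hand-written, predicated version of foo()'s loop.
   Like the scalar loop, it assumes a[i + 9] is valid whenever a[i] != 0. */
void foo_sve_sketch(const long * __restrict__ a, double * __restrict__ out, int n)
{
#if defined(__ARM_FEATURE_SVE)
    const int64_t *a64 = (const int64_t *)a; /* long is 64-bit on AArch64 LP64 */
    for (int64_t i = 0; i < n; i += (int64_t)svcntd()) {
        /* Governing predicate: lanes whose index is still < n (handles the tail). */
        svbool_t pg = svwhilelt_b64_s64(i, (int64_t)n);
        /* Load a[i..] under pg and build the condition mask a[i] != 0. */
        svint64_t va = svld1_s64(pg, a64 + i);
        svbool_t pm = svcmpne_n_s64(pg, va, 0);
        /* Masked load of a[i + 9]: inactive lanes are not accessed, so only
           the elements the scalar loop would read are touched. */
        svint64_t v9 = svld1_s64(pm, a64 + i + 9);
        /* Convert long -> double; results in inactive lanes are never stored. */
        svfloat64_t vd = svcvt_f64_s64_x(pm, v9);
        /* Masked store: out[i] is written only where a[i] != 0. */
        svst1_f64(pm, out + i, vd);
    }
#else
    /* Non-SVE build: fall back to the original scalar loop. */
    for (int i = 0; i < n; i++) {
        if (a[i] != 0) {
            out[i] = (double)a[i + 9];
        }
    }
#endif
}
```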
**Test case:**
```c
#include <stdint.h>
#include <stddef.h>

float foo(
    const long * __restrict__ a,
    double * __restrict__ out,
    int n
) {
    for (int i = 0; i < n; i += 1)
    {
        if ((a[i] != 0)) {
            out[i] = (((double)a[(i + 9)]));
        }
    }
    return (float)0;
}
```
**gcc version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260413 (experimental)
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
**Compilation options:**
```
-O3 -S -march=armv9-a+sve -ftree-vectorize -fopt-info-vec-all
```
**GCC output:**
```
<source>:9:23: missed: couldn't vectorize loop
<source>:9:23: missed: not vectorized: unsupported control flow in loop.
<source>:4:7: note: vectorized 0 loops in function.
<source>:16:12: note: ***** Analysis failed with vector mode VNx2DI
<source>:16:12: note: ***** Skipping vector mode VNx16QI, which would repeat the analysis for VNx2DI
```
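The sketch below shows what "control flow" refers to here by removing the `if` by hand. This is if-conversion into a select, the kind of branch-free form a vectorizer needs when it cannot use masked memory operations. It illustrates the idea under stronger assumptions and is not a proposed fix. `a[i + 9]` is now loaded for every `i < n`, and `out[i]` is read and rewritten even when `a[i] == 0`. The caller must therefore guarantee that all of those accesses are valid and that nothing else writes `out` concurrently. A compiler cannot assume this for the original loop without proof. `foo_ifconverted` is a hypothetical name, and whether GCC 16 vectorizes this form has not been checked here.

```c
/* Hypothetical rewrite of foo()'s loop with the branch replaced by a select.
   Assumption (stronger than the original): a[i + 9] must be readable for
   EVERY i in [0, n), and out[i] is read and written on every iteration. */
void foo_ifconverted(const long * __restrict__ a, double * __restrict__ out, int n)
{
    for (int i = 0; i < n; i += 1) {
        double old = out[i];               /* unconditional read of out[i] */
        double val = (double)a[i + 9];     /* unconditional (speculative) load */
        out[i] = (a[i] != 0) ? val : old;  /* select instead of a branch */
    }
}
```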
Also reproducible on Godbolt: https://godbolt.org/z/orPqv3vov
**However, Clang 21.1.1 vectorizes it.**
**clang version:**
```
clang version 21.1.1
Target: unknown
Thread model: posix
Build config: +unoptimized, +assertions
```
**Clang options:**
```
-target aarch64-linux-gnu -march=armv9-a+sve -S -O3 -ftree-vectorize -ftree-slp-vectorize -Rpass=.*vectorize.* -Rpass-missed=.*vectorize.* -Rpass-analysis=.*vectorize.*
```
**Clang output:**
```
<source>:11:14: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   11 |         if ((a[i] != 0)) {
      |              ^
<source>:12:32: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   12 |             out[i] = (((double)a[(i + 9)]));
      |                                ^
<source>:12:20: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store [-Rpass-analysis=loop-vectorize]
   12 |             out[i] = (((double)a[(i + 9)]));
      |                    ^
<source>:9:5: remark: vectorized loop (vectorization width: vscale x 2, interleaved count: 2) [-Rpass=loop-vectorize]
    9 |     for (int i = 0; i < n; i += 1)
      |     ^
```
Also reproducible on Godbolt: https://godbolt.org/z/35h6cG7x7