https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124904

            Bug ID: 124904
           Summary: [LoopVectorize][AArch64] GCC fails to vectorize
                    conditional store loop with offset access that Clang
                    vectorizes with SVE
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bug_hunters at yeah dot net
  Target Milestone: ---

**Description:**
GCC fails to auto-vectorize a loop containing a conditional store with an
offset access pattern on AArch64, while Clang successfully vectorizes it using
SVE under the same architecture flags (`-march=armv9-a+sve`).

The loop performs a simple operation: when `a[i]` is non-zero, it loads
`a[i+9]`, converts it to `double`, and stores the result into `out[i]`. Both
the load of `a[i+9]` and the store to `out[i]` happen only under that
condition. GCC's vectorizer reports "unsupported control flow in loop", gives
up on the loop entirely, and emits scalar code. In contrast, Clang vectorizes
the loop with SVE, using a vectorization width of `vscale x 2` and an
interleaved count of 2.

**Test case:**
```c
#include <stdint.h>
#include <stddef.h>

float foo(
    const long * __restrict__ a,
    double * __restrict__ out,
    int n
) {
    for (int i = 0; i < n; i += 1)
    {
        if ((a[i] != 0)) {
            out[i] = (((double)a[(i + 9)]));

        }
    } 
    return (float)0;
}
```
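To make the expected transformation concrete, the following portable C sketch emulates, lane by lane, what a predicated (masked) vectorization of the loop in `foo` has to do. `foo_masked_sketch` and the fixed `LANES` width are hypothetical names chosen for illustration. Real SVE code uses a runtime vector length and hardware predicates, and this is not the code Clang actually emits.

```c
#include <stdbool.h>

/* Hypothetical fixed "vector width" for illustration only; SVE code would
   use a runtime length such as vscale x 2 doubles. */
#define LANES 4

/* Emulates masked vectorization of foo()'s loop: each outer iteration
   handles LANES elements, builds a predicate from the loop bound (like
   SVE's whilelt) and the condition a[i] != 0, and performs the load of
   a[i + 9] and the store to out[i] only in active lanes. */
void foo_masked_sketch(const long *restrict a, double *restrict out, int n)
{
    for (int i = 0; i < n; i += LANES) {
        bool active[LANES];
        double vals[LANES];

        /* Predicate: lane is inside the loop bound AND a[i + lane] != 0.
           The && short-circuits, so no read past a[n - 1] happens here. */
        for (int lane = 0; lane < LANES; lane++)
            active[lane] = (i + lane < n) && (a[i + lane] != 0);

        /* Masked load + convert: inactive lanes never read a[i + lane + 9]. */
        for (int lane = 0; lane < LANES; lane++)
            vals[lane] = active[lane] ? (double)a[i + lane + 9] : 0.0;

        /* Masked store: out[i + lane] is left untouched in inactive lanes. */
        for (int lane = 0; lane < LANES; lane++)
            if (active[lane])
                out[i + lane] = vals[lane];
    }
}
```

The predicate matters for both memory accesses: the source only reads `a[i+9]` and writes `out[i]` when `a[i] != 0`, so a vectorized version must not perform either access unconditionally for inactive lanes.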

**GCC version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260413 (experimental)
Copyright (C) 2026 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

**Compilation options:**
```
-O3 -S -march=armv9-a+sve -ftree-vectorize -fopt-info-vec-all
```

**GCC output:**
```
<source>:9:23: missed: couldn't vectorize loop
<source>:9:23: missed: not vectorized: unsupported control flow in loop.
<source>:4:7: note: vectorized 0 loops in function.
<source>:16:12: note: ***** Analysis failed with vector mode VNx2DI
<source>:16:12: note: ***** Skipping vector mode VNx16QI, which would repeat the analysis for VNx2DI
```

Also reproducible on Godbolt: https://godbolt.org/z/orPqv3vov

**However, Clang 21.1.1 vectorizes it.**

**Clang version:**
```
clang version 21.1.1
Target: unknown
Thread model: posix
Build config: +unoptimized, +assertions
```

**Clang options:**
```
-target aarch64-linux-gnu -march=armv9-a+sve -S -O3 -ftree-vectorize -ftree-slp-vectorize -Rpass=.*vectorize.* -Rpass-missed=.*vectorize.* -Rpass-analysis=.*vectorize.*
```

**Clang output:**
```
<source>:11:14: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   11 |         if ((a[i] != 0)) {
      |              ^
<source>:12:32: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): load [-Rpass-analysis=loop-vectorize]
   12 |             out[i] = (((double)a[(i + 9)]));
      |                                ^
<source>:12:20: remark: Recipe with invalid costs prevented vectorization at VF=(vscale x 1): store [-Rpass-analysis=loop-vectorize]
   12 |             out[i] = (((double)a[(i + 9)]));
      |                    ^
<source>:9:5: remark: vectorized loop (vectorization width: vscale x 2, interleaved count: 2) [-Rpass=loop-vectorize]
    9 |     for (int i = 0; i < n; i += 1)
      |     ^
```

Also reproducible on Godbolt: https://godbolt.org/z/35h6cG7x7
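The Clang remark states the vectorization width in units of `vscale`. As a rough worked example, assuming LLVM's AArch64 convention that `vscale` equals the SVE vector length in bits divided by 128, one iteration of Clang's vector loop covers `vscale * 2` double lanes times an interleaved count of 2. The helper name below is hypothetical and only performs this arithmetic.

```c
/* Number of iterations of foo()'s scalar loop covered by one iteration of
   Clang's vector loop, given the SVE vector length in bits.
   Assumption: vscale = VL / 128 (SVE vector lengths are multiples of 128). */
unsigned elems_per_vector_iteration(unsigned sve_vl_bits)
{
    unsigned vscale = sve_vl_bits / 128;
    unsigned lanes = vscale * 2;   /* VF = vscale x 2 (64-bit double lanes) */
    unsigned interleave = 2;       /* "interleaved count: 2" from the remark */
    return lanes * interleave;
}
```

For example, a 128-bit SVE implementation would process 4 elements per vector iteration, and a 256-bit implementation 8.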
