https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125219

            Bug ID: 125219
           Summary: [15/16 Regression] AArch64 SVE: conditional store with
                    mixed-type comparison (double vs int) fails to
                    vectorize on trunk, regression from GCC 15.2
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: bug_hunters at yeah dot net
  Target Milestone: ---

**Description:**
GCC trunk fails to vectorize a loop containing a conditional store where the
condition involves a mixed-type comparison between `double` and `int` (`a[idx]
> b[idx]` where `a` is `double*`, `b` is `int*`). The vectorizer reports
"unsupported control flow in loop" and generates fully scalar code.

GCC 15.2.0 successfully vectorizes this same loop using SVE, handling the
mixed-type comparison by promoting `int` to `double` via `scvtf`, comparing
with `fcmgt`, and using predicated stores (`st1w`) for the conditional write to
`out`. Trunk rejects the loop entirely.
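
As a rough illustration of the predication described above, the following scalar C sketch models what each vector lane computes. The names `lane_active` and `foo_masked` are hypothetical and are not taken from GCC's output; the mapping to `scvtf`, `fcmgt`, `cmpne` and `st1w` in the comments is an assumption based on the GCC 15.2 assembly quoted in this report.

```c
/* Scalar model of one lane's predicate (hypothetical helper).
   Returns 1 when the conditional store would be enabled. */
static int lane_active(double a, int b)
{
    int gt = a > (double)b;  /* int -> double, then compare (cf. scvtf + fcmgt) */
    int ne = b != 1;         /* integer compare (cf. cmpne) */
    return gt & ne;          /* combine the two predicates (cf. and) */
}

/* Same semantics as the loop in the test case, with the mask made explicit. */
void foo_masked(const double * __restrict__ a,
                const int * __restrict__ b,
                int * __restrict__ out,
                int n)
{
    for (int i = 0; i < n; i += 1) {
        if (lane_active(a[i], b[i])) {  /* store only in active lanes (cf. st1w) */
            int prev = i >= 4 ? out[i - 4] : 0;
            out[i] = prev + (int)a[i] + b[i];
        }
    }
}
```

Both sides of the `&&` are free of side effects, so evaluating them unconditionally and combining the results should not change the behaviour of the loop.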

**Test case:**
```c
double foo(
    const double * __restrict__ a,
    const int * __restrict__ b,
    int * __restrict__ out,
    int n
) {
    for (int i = 0; i < n; i += 1)
    {
        int idx = i;
        if ((a[idx] > b[idx] && b[idx] != 1)) {
            out[idx] = ((idx) >= 4 ? out[(idx) - 4] : (int)0) + (((int)a[idx]) + ((int)b[idx]));
        }
    }
    return (double)0;
}
```

**GCC version:**
```
aarch64-unknown-linux-gnu-gcc (GCC) 16.0.1 20260429 (experimental) [trunk]
```

**Compilation options:**
```
-march=armv9-a+sve -ftree-vectorize -O3 -fopt-info-vec-all
```

**GCC trunk output:**
```
<source>:7:23: missed: couldn't vectorize loop
<source>:7:23: missed: not vectorized: unsupported control flow in loop.
<source>:7:23: missed: couldn't vectorize loop
<source>:7:23: missed: not vectorized: unsupported control flow in loop.
<source>:1:8: note: vectorized 0 loops in function.
<source>:11:63: note: ***** Analysis failed with vector mode VNx2DF
<source>:11:63: note: ***** The result for vector mode VNx16QI would be the same
<source>:11:63: note: ***** The result for vector mode VNx8QI would be the same
<source>:11:63: note: ***** The result for vector mode VNx4QI would be the same
<source>:11:63: note: ***** Re-trying analysis with vector mode VNx2QI
<source>:11:63: note: ***** Analysis failed with vector mode VNx2QI
<source>:11:63: note: ***** Re-trying analysis with vector mode V16QI
...
```

Generated assembly (fully scalar, no SVE instructions used, truncated for
brevity):
```assembly
foo:
        cmp     w3, 0
        ble     .L2
        ldr     w4, [x1]
        ldr     d31, [x0]
        scvtf   d30, w4
        fcmpe   d31, d30
        ccmp    w4, 1, 4, gt
        bne     .L37
.L3:
        ...
.L8:
        ldr     w5, [x1, x4, lsl 2]
        ldr     d23, [x0, x4, lsl 3]
        scvtf   d22, w5
        fcmpe   d22, d23
        ccmp    w5, 1, 4, mi
        bne     .L41
.L7:
        add     x4, x4, 1
        cmp     w3, w4
        bgt     .L8
.L2:
        movi    d0, #0
        ret
```

Also reproducible on Godbolt: https://godbolt.org/z/TGcnGE4ef

**GCC 15.2.0 (for comparison):**
```
<source>:7:23: missed: couldn't vectorize loop
<source>:7:23: missed: may need non-SLP handling
<source>:7:23: optimized: loop vectorized using variable length vectors
<source>:1:8: note: vectorized 1 loops in function.
```

Key vectorized portion (showing SVE mixed-type comparison + predicated store):
```assembly
        ld1w    z27.s, p6/z, [x1]          ; load b (int)
        ld1d    z29.d, p5/z, [x0]          ; load a (double)
        scvtf   z0.d, p7/m, z0.s           ; int -> double
        fcmgt   p15.d, p7/z, z29.d, z0.d   ; a > (double)b
        cmpne   p6.s, p6/z, z27.s, #1      ; b != 1
        and     p7.b, p7/z, p6.b, p6.b     ; combine predicates
        st1w    z25.s, p7, [x2]            ; predicated store
```

Godbolt link for the GCC 15.2.0 output: https://godbolt.org/z/seMdcvs5z

**Additional notes:**

1. This is a regression from GCC 15.2, which successfully vectorized the loop
using SVE predicated operations including mixed-type comparison (`double` vs
`int`), type conversions (`fcvtzs`/`scvtf`), and predicated stores (`st1w`).

2. `-fvect-cost-model=unlimited` has no effect on trunk, indicating a
capability failure rather than a cost-model decision.

3. The failure message "unsupported control flow in loop" suggests trunk's
vectorizer is rejecting the if-statement containing both the mixed-type
comparison and the conditional store, which GCC 15.2 was able to handle with
predication.
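
To help check the hypothesis in note 3, reduced variants along the following lines could be compiled with the same options. This is only a sketch: `foo_same_type` and `foo_no_carry` are hypothetical names, and neither variant has been tried against trunk or GCC 15.2, so no claim is made here about whether either one vectorizes.

```c
/* Hypothetical variant A: same control flow and out[i - 4] read,
   but the comparison is int vs int instead of double vs int. */
void foo_same_type(const int * __restrict__ a,
                   const int * __restrict__ b,
                   int * __restrict__ out,
                   int n)
{
    for (int i = 0; i < n; i += 1)
        if (a[i] > b[i] && b[i] != 1)
            out[i] = (i >= 4 ? out[i - 4] : 0) + a[i] + b[i];
}

/* Hypothetical variant B: keeps the double vs int comparison,
   but drops the read of out[i - 4] from the stored value. */
void foo_no_carry(const double * __restrict__ a,
                  const int * __restrict__ b,
                  int * __restrict__ out,
                  int n)
{
    for (int i = 0; i < n; i += 1)
        if (a[i] > b[i] && b[i] != 1)
            out[i] = (int)a[i] + b[i];
}
```

If variant A vectorizes and variant B does not, that would point at the mixed-type comparison; the opposite result would point at the `out[i - 4]` read in the conditional store.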
