https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122103

--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:974c04dc2cb7f44705a9fd62b3b9592d7c6faca3

commit r16-6511-g974c04dc2cb7f44705a9fd62b3b9592d7c6faca3
Author: Tamar Christina <[email protected]>
Date:   Mon Jan 5 20:56:03 2026 +0000

    vect: teach vectorizable_call to predicate calls when they can trap [PR122103]

    The following example

    void f (float *__restrict c, int *__restrict d, int n)
    {
        for (int i = 0; i < n; i++)
        {
          c[i] = __builtin_sqrtf (c[i]);
        }
    }

    compiled with -O3 -march=armv9-a -fno-math-errno -ftrapping-math needs to be
    predicated on the conditional.  It's invalid to execute the branch and use a
    select to extract it later unless using -fno-trapping-math.

    We currently generate:

    f:
            cmp     w2, 0
            ble     .L1
            mov     x1, 0
            whilelo p7.s, wzr, w2
            ptrue   p6.b, all
    .L3:
            ld1w    z31.s, p7/z, [x0, x1, lsl 2]
            fsqrt   z31.s, p6/m, z31.s
            st1w    z31.s, p7, [x0, x1, lsl 2]
            incw    x1
            whilelo p7.s, w1, w2
            b.any   .L3
    .L1:
            ret

    This means the inactive lanes of the operation can raise a floating-point
    exception.  With this change we now generate:

    f:
            cmp     w2, 0
            ble     .L1
            mov     x1, 0
            whilelo p7.s, wzr, w2
            .p2align 5,,15
    .L3:
            ld1w    z31.s, p7/z, [x0, x1, lsl 2]
            fsqrt   z31.s, p7/m, z31.s
            st1w    z31.s, p7, [x0, x1, lsl 2]
            incw    x1
            whilelo p7.s, w1, w2
            b.any   .L3
    .L1:
            ret
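
The difference between the two listings is only the governing predicate of fsqrt: p6 (ptrue, all lanes) before the change, p7 (the whilelo loop mask) after it. The following is a minimal scalar sketch of that difference, not GCC or SVE code. VL, model_ptrue, model_whilelo, count_invalid_lanes and invalid_lanes_in_tail are hypothetical names made up for illustration, and the sketch assumes the lanes beyond n hold stale negative values. That assumption does not hold for the zeroing ld1w in the listing above, where inactive lanes are zero; it stands in for the general case of a trapping call on unspecified inactive lanes.

```c
#include <stdbool.h>

#define VL 4 /* hypothetical number of 32-bit lanes per vector */

/* Model of WHILELO: lane i is active iff base + i < n. */
static void model_whilelo(bool pred[VL], int base, int n)
{
    for (int i = 0; i < VL; i++)
        pred[i] = (base + i) < n;
}

/* Model of PTRUE: every lane is active. */
static void model_ptrue(bool pred[VL])
{
    for (int i = 0; i < VL; i++)
        pred[i] = true;
}

/* Count the lanes on which a square root governed by PRED would raise
   an invalid-operation exception: active lanes with a negative input. */
static int count_invalid_lanes(const float reg[VL], const bool pred[VL])
{
    int invalid = 0;
    for (int i = 0; i < VL; i++)
        if (pred[i] && reg[i] < 0.0f)
            invalid++;
    return invalid;
}

/* Final loop iteration with N valid elements left in the vector.
   Assumption: the lanes at index >= N hold stale negative values. */
static int invalid_lanes_in_tail(int n, bool use_loop_mask)
{
    float reg[VL];
    bool pred[VL];

    for (int i = 0; i < VL; i++)
        reg[i] = (i < n) ? 4.0f : -1.0f;

    if (use_loop_mask)
        model_whilelo(pred, 0, n); /* after the change: fsqrt under p7 */
    else
        model_ptrue(pred);         /* before the change: fsqrt under p6 */

    return count_invalid_lanes(reg, pred);
}
```

Under these assumptions, invalid_lanes_in_tail(2, false) reports two offending lanes and invalid_lanes_in_tail(2, true) reports none, which is the property the predicated fsqrt is meant to guarantee.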

    However, as discussed in PR96373, while we probably shouldn't vectorize for
    the cases where we can trap but don't support the conditional operation,
    there doesn't seem to be a clear consensus on how GCC should handle trapping
    math.

    As such, similar to PR96373, I don't stop vectorization if trapping math is
    enabled and the conditional operation isn't supported.

    gcc/ChangeLog:

            PR tree-optimization/122103
            * tree-vect-stmts.cc (vectorizable_call): Handle trapping math.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/122103
            * gcc.target/aarch64/sve/pr122103_4.c: New test.
            * gcc.target/aarch64/sve/pr122103_5.c: New test.
            * gcc.target/aarch64/sve/pr122103_6.c: New test.
