https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:2f7d90ef65c6f09106c18b99a9590b8f81933115

commit r16-5587-g2f7d90ef65c6f09106c18b99a9590b8f81933115
Author: Tamar Christina <[email protected]>
Date:   Tue Nov 25 12:52:56 2025 +0000

    AArch64: Implement {cond_}vec_cbranch_{any|all} [PR118974]

    The following example:

    #define N 640
    int a[N] = {};
    int b[N] = {};
    int c[N] = {};

    void f1 (int d)
    {
      for (int i = 0; i < N; i++)
        {
          b[i] += a[i];
          if (a[i] != d)
            break;
        }
    }

    today generates with
    -Ofast -march=armv8-a+sve --param aarch64-autovec-preference=asimd-only

    .L6:
            ldr     q30, [x3, x1]
            cmeq    v31.4s, v30.4s, v27.4s
            not     v31.16b, v31.16b
            umaxp   v31.4s, v31.4s, v31.4s
            fmov    x4, d31
            cbz     x4, .L2

    where we use an Adv. SIMD compare and a reduction sequence to implement
    the early break.  This patch implements the new optabs vec_cbranch_any
    and vec_cbranch_all in order to replace the Adv. SIMD compare and
    reduction with an SVE flag-setting compare.

    With this patch the above generates:

            ptrue   p7.b, vl16
    .L6:
            ldr     q30, [x3, x1]
            cmpne   p15.s, p7/z, z30.s, z27.s
            b.none  .L2
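    Semantically, the new optab is an "any lane matches" test on the vector
    compare.  A scalar C sketch of what vec_cbranch_any computes for the
    compare above (illustrative only; the function name is hypothetical, and
    the real expansion is the flag-setting CMPNE shown above):

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_cbranch_any for a != compare:
   take the branch if ANY lane of the element-wise compare is true.
   The actual expansion is a single SVE CMPNE that sets the flags.  */
static int
vec_cbranch_any_ne (const int32_t *lanes, int n, int32_t d)
{
  for (int i = 0; i < n; i++)
    if (lanes[i] != d)
      return 1;   /* at least one lane differs -> branch taken */
  return 0;       /* all lanes equal d -> fall through (b.none) */
}
```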

    This optab could also be used for optimizing the Adv. SIMD sequence when
    SVE is not available.  I have a separate patch for that and will send it
    depending on whether this approach is accepted or not.

    Note that for floating-point we still need the ptest, as floating-point
    SVE compares don't set the flags.  In addition, because SVE doesn't have
    a CMTST-equivalent instruction, we have to do an explicit AND before the
    compares.

    These two cases don't have a speed advantage, but do have a code-size
    one, so I've left them enabled.
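    Per lane, the CMTST emulation mentioned above amounts to the following
    scalar C sketch (function names are illustrative, not from the patch):

```c
#include <stdint.h>

/* Adv. SIMD CMTST sets a lane to all-ones when (a & b) != 0.  */
static int32_t
cmtst_lane (int32_t a, int32_t b)
{
  return (a & b) != 0 ? -1 : 0;   /* one CMTST lane */
}

/* SVE has no CMTST equivalent, so the patch emits an explicit AND
   followed by a compare-not-equal against zero instead.  */
static int32_t
sve_and_cmpne_lane (int32_t a, int32_t b)
{
  int32_t t = a & b;              /* explicit AND */
  return t != 0 ? -1 : 0;         /* CMPNE against zero */
}
```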

    This patch also eliminates the PTEST on a normal SVE compare-and-branch
    through the introduction of the new optabs cond_vec_cbranch_any and
    cond_vec_cbranch_all.

    In the example

    void f1 ()
    {
      for (int i = 0; i < N; i++)
        {
          b[i] += a[i];
          if (a[i] > 0)
            break;
        }
    }

    when compiled for SVE we generate:

            ld1w    z28.s, p7/z, [x4, x0, lsl 2]
            cmpgt   p14.s, p7/z, z28.s, #0
            ptest   p15, p14.b
            b.none  .L3

    Where the ptest isn't needed since the branch only cares about the Z and NZ
    flags.

    GCC today supports eliding this through the pattern
    *cmp<cmp_op><mode>_ptest; however, this pattern only supports the
    removal when the outermost context is a CMP where the predicate is
    inside the condition itself.

    This typically only happens for an unpredicated CMP, as a ptrue will be
    generated during expand.

    In the case above, at the GIMPLE level we have

      mask_patt_14.15_57 = vect__2.11_52 > { 0, ... };
      vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57;
      if (vec_mask_and_58 != { 0, ... })
        goto <bb 5>; [5.50%]
      else
        goto <bb 6>; [94.50%]

    where the loop mask is applied to the compare as an AND.
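    A scalar C sketch of this masked compare (names are illustrative, not
    taken from the generated GIMPLE):

```c
#include <stdint.h>

/* Hypothetical scalar model of the GIMPLE above: the loop mask is
   ANDed with the compare result lane by lane, and the branch is taken
   if the combined mask has any active lane.  */
static int
masked_any_gt_zero (const int32_t *a, const uint8_t *loop_mask, int n)
{
  for (int i = 0; i < n; i++)
    if (loop_mask[i] && a[i] > 0)   /* loop_mask & (a > { 0, ... }) */
      return 1;                     /* vec_mask_and != { 0, ... } */
  return 0;
}
```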

    The loop mask is moved into the compare by the pattern
    *cmp<cmp_op><mode>_and, which moves the mask inside if the current mask
    is a ptrue, since p && true -> p.

    However, this happens after combine, and so we can't both move the
    predicate inside and eliminate the ptest.

    To fix this the middle-end will now rewrite the mask into the compare optab
    and indicate that only the CC flags are required.  This allows us to simply
    not generate the ptest at all, rather than trying to eliminate it later on.

    After this patch we generate

            ld1w    z28.s, p7/z, [x4, x0, lsl 2]
            cmpgt   p14.s, p7/z, z28.s, #0
            b.none  .L3

    gcc/ChangeLog:

            PR target/118974
            * config/aarch64/aarch64-simd.md (xor<mode>3<vczle><vczbe>):
            Rename to ...
            (@xor<mode>3<vczle><vczbe>): ... this.
            (cbranch<mode>4): Update comments.
            (<optab><mode>): New.
            * config/aarch64/aarch64-sve.md (cbranch<mode>4): Update comment.
            (<optab><mode>): New.
            (aarch64_ptest<mode>): Rename to ...
            (@aarch64_ptest<mode>): ... this.
            * config/aarch64/iterators.md (UNSPEC_CMP_ALL, UNSPEC_CMP_ANY,
            UNSPEC_COND_CMP_ALL, UNSPEC_COND_CMP_ANY): New.
            (optabs): Add them.
            (CBRANCH_CMP, COND_CBRANCH_CMP, cbranch_op): New.
            * config/aarch64/predicates.md
            (aarch64_cbranch_compare_operation): New.

    gcc/testsuite/ChangeLog:

            PR target/118974
            * gcc.target/aarch64/sve/pr119351.c: Update codegen.
            * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise.
            * gcc.target/aarch64/vect-early-break-cbranch.c: Likewise.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_3.c: New test.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_4.c: New test.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_5.c: New test.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_7.c: New test.
            * gcc.target/aarch64/sve/vect-early-break-cbranch_8.c: New test.
            * gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.
            * gcc.target/aarch64/vect-early-break-cbranch_3.c: New test.
