https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118974
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <[email protected]>:

https://gcc.gnu.org/g:2f7d90ef65c6f09106c18b99a9590b8f81933115

commit r16-5587-g2f7d90ef65c6f09106c18b99a9590b8f81933115
Author: Tamar Christina <[email protected]>
Date:   Tue Nov 25 12:52:56 2025 +0000

AArch64: Implement {cond_}vec_cbranch_{any|all} [PR118974]

The following example:

#define N 640
int a[N] = {};
int b[N] = {};
int c[N] = {};

void f1 (int d)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] != d)
        break;
    }
}

today generates with -Ofast -march=armv8-a+sve
--param aarch64-autovec-preference=asimd-only:

.L6:
        ldr     q30, [x3, x1]
        cmeq    v31.4s, v30.4s, v27.4s
        not     v31.16b, v31.16b
        umaxp   v31.4s, v31.4s, v31.4s
        fmov    x4, d31
        cbz     x4, .L2

where we use an Adv. SIMD compare and a reduction sequence to implement the
early break.

This patch implements the new optabs vec_cbranch_any and vec_cbranch_all in
order to replace the Adv. SIMD compare and reduction with an SVE flag-setting
compare.  With this patch the above generates:

        ptrue   p7.b, vl16
.L6:
        ldr     q30, [x3, x1]
        cmpne   p15.s, p7/z, z30.s, z27.s
        b.none  .L2

These optabs could also be used to optimize the Adv. SIMD sequence when SVE
is not available.  I have a separate patch for that and will send it
depending on whether this approach is accepted.

Note that for floating point we still need the ptest, as SVE floating-point
compares don't set the flags.  In addition, because SVE doesn't have a
CMTST-equivalent instruction, we have to do an explicit AND before the
compare.  These two cases don't have a speed advantage, but they do have a
code-size one, so I've left them enabled.  (Illustrative sketches of both
cases are shown after the final example below.)

This patch also eliminates the PTEST on a normal SVE compare and branch
through the introduction of the new optabs cond_vec_cbranch_any and
cond_vec_cbranch_all.  In the example

void f1 ()
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] > 0)
        break;
    }
}

when compiled for SVE we generate:

        ld1w    z28.s, p7/z, [x4, x0, lsl 2]
        cmpgt   p14.s, p7/z, z28.s, #0
        ptest   p15, p14.b
        b.none  .L3

where the ptest isn't needed, since the branch only cares about whether the
result is zero or non-zero.  GCC today supports eliding this through the
pattern *cmp<cmp_op><mode>_ptest; however, this pattern only supports the
removal when the outermost context is a CMP where the predicate is inside
the condition itself.  This typically only happens for an unpredicated CMP,
as a ptrue will be generated during expand.

In the case above, at the GIMPLE level we have:

  mask_patt_14.15_57 = vect__2.11_52 > { 0, ... };
  vec_mask_and_58 = loop_mask_48 & mask_patt_14.15_57;
  if (vec_mask_and_58 != { 0, ... })
    goto <bb 5>; [5.50%]
  else
    goto <bb 6>; [94.50%]

where the loop mask is applied to the compare as an AND.  The loop mask is
moved into the compare by the pattern *cmp<cmp_op><mode>_and, which moves
the mask inside if the current mask is a ptrue, since p && true -> p.
However, this happens after combine, and so we can't both move the predicate
inside the compare and eliminate the ptest.

To fix this, the middle end will now rewrite the mask into the compare optab
and indicate that only the CC flags are required.  This allows us to simply
not generate the ptest at all, rather than trying to eliminate it later on.

After this patch we generate:

        ld1w    z28.s, p7/z, [x4, x0, lsl 2]
        cmpgt   p14.s, p7/z, z28.s, #0
        b.none  .L3
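[Editorial note: the following is a minimal sketch of the floating-point case
mentioned above, not taken from the patch or its testsuite; the function name
f_fp and the arrays fa/fb are made up for illustration.  The point it shows is
the one the commit states: the SVE FP compare produces a predicate but does
not set the flags, so a ptest is still expected before the branch and the
benefit is code size only.]

float fa[N] = {};
float fb[N] = {};

/* Hypothetical floating-point early-break loop: the SVE FP compare
   yields a predicate but does not set the NZCV flags, so a ptest is
   still needed before the conditional branch.  */
void f_fp (float d)
{
  for (int i = 0; i < N; i++)
    {
      fb[i] += fa[i];
      if (fa[i] != d)
        break;
    }
}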
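[Editorial note: a similarly hypothetical sketch of the bit-test case, again
not from the patch; the function name f_tst is made up.  Adv. SIMD can
implement (a[i] & d) != 0 directly with CMTST, while SVE has no equivalent
instruction, so an explicit AND precedes the compare, as the commit notes.]

/* Hypothetical bit-test early-break loop: Adv. SIMD can use CMTST for
   (a[i] & d) != 0, but SVE must emit an explicit AND before the
   compare; again the win is code size rather than speed.  */
void f_tst (int d)
{
  for (int i = 0; i < N; i++)
    {
      b[i] += a[i];
      if (a[i] & d)
        break;
    }
}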
gcc/ChangeLog:

        PR target/118974
        * config/aarch64/aarch64-simd.md (xor<mode>3<vczle><vczbe>): Rename ...
        (@xor<mode>3<vczle><vczbe>): ... to this.
        (cbranch<mode>4): Update comments.
        (<optab><mode>): New.
        * config/aarch64/aarch64-sve.md (cbranch<mode>4): Update comment.
        (<optab><mode>): New.
        (aarch64_ptest<mode>): Rename to ...
        (@aarch64_ptest<mode>): ... this.
        * config/aarch64/iterators.md (UNSPEC_CMP_ALL, UNSPEC_CMP_ANY,
        UNSPEC_COND_CMP_ALL, UNSPEC_COND_CMP_ANY): New.
        (optabs): Add them.
        (CBRANCH_CMP, COND_CBRANCH_CMP, cbranch_op): New.
        * config/aarch64/predicates.md (aarch64_cbranch_compare_operation): New.

gcc/testsuite/ChangeLog:

        PR target/118974
        * gcc.target/aarch64/sve/pr119351.c: Update codegen.
        * gcc.target/aarch64/sve/vect-early-break-cbranch.c: Likewise.
        * gcc.target/aarch64/vect-early-break-cbranch.c: Likewise.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_2.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_3.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_4.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_5.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_7.c: New test.
        * gcc.target/aarch64/sve/vect-early-break-cbranch_8.c: New test.
        * gcc.target/aarch64/vect-early-break-cbranch_2.c: New test.
        * gcc.target/aarch64/vect-early-break-cbranch_3.c: New test.
