Jennifer Schmitz <jschm...@nvidia.com> writes:
> This patch implements the optabs reduc_and_scal_<mode>,
> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for Advanced SIMD
> integers for TARGET_SVE in order to use the SVE instructions ANDV,
> ORV, and EORV for fixed-width bitwise reductions.
> For example, the test case
>
> int32_t foo (int32_t *a)
> {
>   int32_t b = -1;
>   for (int i = 0; i < 4; ++i)
>     b &= a[i];
>   return b;
> }
>
> was previously compiled to
> (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
> foo:
>         ldp     w2, w1, [x0]
>         ldp     w3, w0, [x0, 8]
>         and     w1, w1, w3
>         and     w0, w0, w2
>         and     w0, w1, w0
>         ret
>
> With the patch, it is compiled to:
> foo:
>         ldr     q31, [x0]
>         ptrue   p7.b, all
>         andv    s31, p7, z31.s
>         fmov    w0, s31
>         ret
>
> Test cases were added to check the produced assembly for use of SVE
> instructions.
I would imagine that in this particular case, the scalar version is
better, but I agree it's a useful feature for other cases.  A natural
follow-on would be to use SMAXV and UMAXV for V2DI as well.  Taking
that into account:

> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index bfa28849adf..0d9e5cebef0 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8927,6 +8927,28 @@
>    "<su>addv\t%d0, %1, %2.<Vetype>"
>  )
>
> +;; Unpredicated logical integer reductions for Advanced SIMD modes.
> +(define_expand "reduc_<optab>_scal_<mode>"
> +  [(set (match_operand:<VEL> 0 "register_operand")
> +        (unspec:<VEL> [(match_dup 2)
> +                       (match_operand:VQ_I 1 "register_operand")]
> +                      SVE_INT_REDUCTION_LOGICAL))]
> +  "TARGET_SVE"
> +  {
> +    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
> +  }
> +)
> +
> +;; Predicated logical integer reductions for Advanced SIMD modes.
> +(define_insn "*aarch64_pred_reduc_<optab>_<mode>"
> +  [(set (match_operand:<VEL> 0 "register_operand" "=w")
> +        (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +                       (match_operand:VQ_I 2 "register_operand" "w")]
> +                      SVE_INT_REDUCTION_LOGICAL))]
> +  "TARGET_SVE"
> +  "<sve_int_op>\t%<Vetype>0, %1, %Z2.<Vetype>"
> +)
> +

...I think we should avoid adding more patterns, and instead extend the
existing:

(define_expand "reduc_<optab>_scal_<mode>"
  [(set (match_operand:<VEL> 0 "register_operand")
        (unspec:<VEL> [(match_dup 2)
                       (match_operand:SVE_FULL_I 1 "register_operand")]
                      SVE_INT_REDUCTION))]
  "TARGET_SVE"
  {
    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

to all Advanced SIMD integer modes except V1DI.  This would involve
adding a new mode iterator along the same lines as SVE_FULL_SDI_SIMD
(SVE_FULL_I_SIMD?); there's a rough sketch in the postscript below.
Then we could change the name of:

(define_expand "reduc_<optab>_scal_<mode>"
  [(match_operand:<VEL> 0 "register_operand")
   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")]
                    MAXMINV)]
  "TARGET_SIMD"
  {
    rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
    rtx scratch = gen_reg_rtx (<MODE>mode);
    emit_insn (gen_aarch64_reduc_<optab>_internal<mode> (scratch,
                                                         operands[1]));
    emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
    DONE;
  }
)

to something like:

(define_expand "@aarch64_advsimd_<optab><mode>"

The expander above would then be something like:

(define_expand "reduc_<optab>_scal_<mode>"
  [(set (match_operand:<VEL> 0 "register_operand")
        (unspec:<VEL> [(match_dup 2)
                       (match_operand:SVE_FULL_I_SIMD 1 "register_operand")]
                      SVE_INT_REDUCTION))]
  "TARGET_SVE
   || (maybe_code_for_aarch64_advsimd (<SVE_INT_REDUCTION>, <MODE>mode)
       != CODE_FOR_nothing)"
  {
    if (rtx pat = maybe_gen_aarch64_advsimd (<SVE_INT_REDUCTION>, <MODE>mode,
                                             operands[0], operands[1]))
      {
        emit_insn (pat);
        DONE;
      }
    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

which acts as a kind of union pattern for SVE and Advanced SIMD; the
postscript also has a note on the maybe_* helpers that the "@" name
generates.  This would handle the SMAXV and UMAXV cases as well as the
ANDV, ORV and EORV cases.

Not tested, so let me know if it doesn't work :)

Thanks,
Richard
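P.S. For concreteness, here's a rough, untested sketch of the new
iterator.  The exact mode list is my guess (all the Advanced SIMD
integer modes except V1DI), modelled on SVE_FULL_SDI_SIMD:

;; Fully-packed SVE integer vector modes, plus the Advanced SIMD
;; integer modes other than V1DI.
(define_mode_iterator SVE_FULL_I_SIMD
  [SVE_FULL_I V8QI V16QI V4HI V8HI V2SI V4SI V2DI])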
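The maybe_code_for_aarch64_advsimd/maybe_gen_aarch64_advsimd calls
rely on the parameterized-names machinery: giving a pattern a name
that starts with "@" makes the gen* tools emit helpers that take the
iterator values as arguments.  Going by the "Parameterized Names"
section of the internals manual, the generated declarations should
look roughly like this (the exact parameter types depend on the
iterators, so treat these as a sketch):

/* CODE_FOR_nothing if no pattern exists for this unspec/mode pair.  */
insn_code maybe_code_for_aarch64_advsimd (int, machine_mode);

/* Null if no pattern exists; otherwise the instantiated pattern
   for the given operands.  */
rtx maybe_gen_aarch64_advsimd (int, machine_mode, rtx, rtx);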