Jennifer Schmitz <jschm...@nvidia.com> writes:
> This patch implements the optabs reduc_and_scal_<mode>,
> reduc_ior_scal_<mode>, and reduc_xor_scal_<mode> for Advanced SIMD
> integers for TARGET_SVE in order to use the SVE instructions ANDV,
> ORV, and EORV for fixed-width bitwise reductions.
> For example, the test case
>
> int32_t foo (int32_t *a)
> {
>   int32_t b = -1;
>   for (int i = 0; i < 4; ++i)
>     b &= a[i];
>   return b;
> }
>
> was previously compiled to
> (-O2 -ftree-vectorize --param aarch64-autovec-preference=asimd-only):
> foo:
>         ldp     w2, w1, [x0]
>         ldp     w3, w0, [x0, 8]
>         and     w1, w1, w3
>         and     w0, w0, w2
>         and     w0, w1, w0
>         ret
>
> With the patch, it is compiled to:
> foo:
>         ldr     q31, [x0]
>         ptrue   p7.b, all
>         andv    s31, p7, z31.s
>         fmov    w0, s31
>         ret
>
> Test cases were added to check the produced assembly for use of SVE
> instructions.
I would imagine that in this particular case, the scalar version is
better, but I agree it's a useful feature for other cases.  A natural
follow-on would be to use SMAXV and UMAXV for V2DI as well.  Taking
that into account:

> [...]
> diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
> index bfa28849adf..0d9e5cebef0 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -8927,6 +8927,28 @@
>    "<su>addv\t%d0, %1, %2.<Vetype>"
>  )
>
> +;; Unpredicated logical integer reductions for Advanced SIMD modes.
> +(define_expand "reduc_<optab>_scal_<mode>"
> +  [(set (match_operand:<VEL> 0 "register_operand")
> +        (unspec:<VEL> [(match_dup 2)
> +                       (match_operand:VQ_I 1 "register_operand")]
> +                      SVE_INT_REDUCTION_LOGICAL))]
> +  "TARGET_SVE"
> +  {
> +    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
> +  }
> +)
> +
> +;; Predicated logical integer reductions for Advanced SIMD modes.
> +(define_insn "*aarch64_pred_reduc_<optab>_<mode>"
> +  [(set (match_operand:<VEL> 0 "register_operand" "=w")
> +        (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +                       (match_operand:VQ_I 2 "register_operand" "w")]
> +                      SVE_INT_REDUCTION_LOGICAL))]
> +  "TARGET_SVE"
> +  "<sve_int_op>\t%<Vetype>0, %1, %Z2.<Vetype>"
> +)
> +

...I think we should avoid adding more patterns, and instead extend the
existing:

(define_expand "reduc_<optab>_scal_<mode>"
  [(set (match_operand:<VEL> 0 "register_operand")
        (unspec:<VEL> [(match_dup 2)
                       (match_operand:SVE_FULL_I 1 "register_operand")]
                      SVE_INT_REDUCTION))]
  "TARGET_SVE"
  {
    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

to all Advanced SIMD integer modes except V1DI.  This would involve
adding a new mode iterator along the same lines as SVE_FULL_SDI_SIMD
(SVE_FULL_I_SIMD?); there's a rough sketch in the postscript below.
Then we could change the name of:

(define_expand "reduc_<optab>_scal_<mode>"
  [(match_operand:<VEL> 0 "register_operand")
   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")]
                    MAXMINV)]
  "TARGET_SIMD"
  {
    rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
    rtx scratch = gen_reg_rtx (<MODE>mode);
    emit_insn (gen_aarch64_reduc_<optab>_internal<mode> (scratch,
                                                         operands[1]));
    emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
    DONE;
  }
)

to something like:

(define_expand "@aarch64_advsimd_<optab><mode>"

The expander above would then be something like:

(define_expand "reduc_<optab>_scal_<mode>"
  [(set (match_operand:<VEL> 0 "register_operand")
        (unspec:<VEL> [(match_dup 2)
                       (match_operand:SVE_FULL_I_SIMD 1 "register_operand")]
                      SVE_INT_REDUCTION))]
  "TARGET_SVE
   || (maybe_code_for_aarch64_advsimd (<SVE_INT_REDUCTION>, <MODE>mode)
       != CODE_FOR_nothing)"
  {
    if (rtx pat = maybe_gen_aarch64_advsimd (<SVE_INT_REDUCTION>, <MODE>mode,
                                             operands[0], operands[1]))
      {
        emit_insn (pat);
        DONE;
      }
    operands[2] = aarch64_ptrue_reg (<VPRED>mode);
  }
)

which acts as a kind of union pattern for SVE and Advanced SIMD; the
postscript also has a note on the maybe_* helpers that the "@" name
generates.  This would handle the SMAXV and UMAXV cases as well as the
ANDV, ORV and EORV cases.

Not tested, so let me know if it doesn't work :)

Thanks,
Richard
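P.S. For concreteness, here's a rough, untested sketch of the new
iterator.  The exact mode list is my guess (all the Advanced SIMD
integer modes except V1DI), modelled on SVE_FULL_SDI_SIMD:

;; Fully-packed SVE integer vector modes, plus the Advanced SIMD
;; integer modes other than V1DI.
(define_mode_iterator SVE_FULL_I_SIMD
  [SVE_FULL_I V8QI V16QI V4HI V8HI V2SI V4SI V2DI])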
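The maybe_code_for_aarch64_advsimd/maybe_gen_aarch64_advsimd calls
rely on the parameterized-names machinery: giving a pattern a name
that starts with "@" makes the gen* tools emit helpers that take the
iterator values as arguments.  Going by the "Parameterized Names"
section of the internals manual, the generated declarations should
look roughly like this (the exact parameter types depend on the
iterators, so treat these as a sketch):

/* CODE_FOR_nothing if no pattern exists for this unspec/mode pair.  */
insn_code maybe_code_for_aarch64_advsimd (int, machine_mode);

/* Null if no pattern exists; otherwise the instantiated pattern
   for the given operands.  */
rtx maybe_gen_aarch64_advsimd (int, machine_mode, rtx, rtx);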