[Bug tree-optimization/51492] vectorizer does not support saturated arithmetic patterns

cvs-commit at gcc dot gnu.org via Gcc-bugs Thu, 16 May 2024 05:09:58 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51492


--- Comment #20 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <pa...@gcc.gnu.org>:

https://gcc.gnu.org/g:d4dee347b3fe1982bab26485ff31cd039c9df010

commit r15-577-gd4dee347b3fe1982bab26485ff31cd039c9df010
Author: Pan Li <pan2...@intel.com>
Date:   Wed May 15 10:14:06 2024 +0800

    Vect: Support new IFN SAT_ADD for unsigned vector int

    For vectorization, we leverage the existing vect pattern recognition
    to find the same pattern as in the scalar case and let the vectorizer
    handle the rest for the standard name usadd<mode>3 in vector mode.
    The riscv vector backend has the insn "Vector Single-Width Saturating
    Add and Subtract", which can be leveraged when expanding usadd<mode>3
    in vector mode.  For example:

    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      unsigned i;

      for (i = 0; i < n; i++)
        out[i] = (x[i] + y[i]) | (- (uint64_t)((uint64_t)(x[i] + y[i]) < x[i]));
    }
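
The one-line expression in the loop body is easier to follow when split into steps. The following is a minimal sketch of a single element's computation; the helper name sat_add_u64_scalar and the intermediate variable names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Hypothetical helper: one element of the loop body above, step by step.  */
static uint64_t sat_add_u64_scalar (uint64_t x, uint64_t y)
{
  uint64_t sum = x + y;                       /* wraps modulo 2^64 on overflow */
  uint64_t overflow = (uint64_t) (sum < x);   /* 1 if the add wrapped, else 0 */
  uint64_t mask = -overflow;                  /* all ones if wrapped, else 0 */
  return sum | mask;                          /* UINT64_MAX if wrapped, else sum */
}
```

Unsigned wraparound is well defined in C, so the comparison sum < x is a reliable overflow test here, and the OR with the mask forces the result to the type maximum without a branch.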

    Before this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _80 = .SELECT_VL (ivtmp_78, POLY_INT_CST [2, 2]);
      ivtmp_58 = _80 * 8;
      vect__4.7_61 = .MASK_LEN_LOAD (vectp_x.5_59, 64B, { -1, ... }, _80, 0);
      vect__6.10_65 = .MASK_LEN_LOAD (vectp_y.8_63, 64B, { -1, ... }, _80, 0);
      vect__7.11_66 = vect__4.7_61 + vect__6.10_65;
      mask__8.12_67 = vect__4.7_61 > vect__7.11_66;
      vect__12.15_72 = .VCOND_MASK (mask__8.12_67, { 18446744073709551615, ... }, vect__7.11_66);
      .MASK_LEN_STORE (vectp_out.16_74, 64B, { -1, ... }, _80, 0, vect__12.15_72);
      vectp_x.5_60 = vectp_x.5_59 + ivtmp_58;
      vectp_y.8_64 = vectp_y.8_63 + ivtmp_58;
      vectp_out.16_75 = vectp_out.16_74 + ivtmp_58;
      ivtmp_79 = ivtmp_78 - _80;
      ...
    }

    After this patch:
    void vec_sat_add_u64 (uint64_t *out, uint64_t *x, uint64_t *y, unsigned n)
    {
      ...
      _62 = .SELECT_VL (ivtmp_60, POLY_INT_CST [2, 2]);
      ivtmp_46 = _62 * 8;
      vect__4.7_49 = .MASK_LEN_LOAD (vectp_x.5_47, 64B, { -1, ... }, _62, 0);
      vect__6.10_53 = .MASK_LEN_LOAD (vectp_y.8_51, 64B, { -1, ... }, _62, 0);
      vect__12.11_54 = .SAT_ADD (vect__4.7_49, vect__6.10_53);
      .MASK_LEN_STORE (vectp_out.12_56, 64B, { -1, ... }, _62, 0, vect__12.11_54);
      ...
    }
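
For readers comparing the two dumps, the add, compare and select sequence in the "before" GIMPLE can be read element by element as the loop below. This is only a sketch for illustration: the function name vec_sat_add_u64_select is hypothetical, and it assumes that .SAT_ADD on unsigned elements yields the sum clamped to the type maximum, which is what the three separate statements compute.

```c
#include <stdint.h>

/* Hypothetical reference loop mirroring the pre-patch GIMPLE:
   an add, a compare (x > sum) and a select of all ones or the sum.  */
static void vec_sat_add_u64_select (uint64_t *out, const uint64_t *x,
                                    const uint64_t *y, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    {
      uint64_t sum = x[i] + y[i];            /* like vect__7 = x + y */
      int wrapped = x[i] > sum;              /* like mask__8 = x > sum */
      out[i] = wrapped ? UINT64_MAX : sum;   /* like .VCOND_MASK */
    }
}
```

With the patch, the vectorizer recognizes this whole sequence as a single .SAT_ADD call, which the backend can then expand through usadd<mode>3.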

    The following test suites pass with this patch:
    * The riscv full regression tests.
    * The x86 bootstrap tests.
    * The x86 full regression tests.

            PR target/51492
            PR target/112600

    gcc/ChangeLog:

            * tree-vect-patterns.cc (gimple_unsigned_integer_sat_add): New
            func decl generated by match.pd match.
            (vect_recog_sat_add_pattern): New func impl to recog the pattern
            for unsigned SAT_ADD.

    Signed-off-by: Pan Li <pan2...@intel.com>
