https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96373

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:7486fe153adaa868f36248b72f3e78d18b1b3ba1

commit r13-5458-g7486fe153adaa868f36248b72f3e78d18b1b3ba1
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Fri Jan 27 17:03:51 2023 +0000

    Add support for conditional xorsign [PR96373]

    This patch is an optimisation, but it's also a prerequisite for
    fixing PR96373 without regressing vect-xorsign_exec.c.

    Currently the vectoriser vectorises:

      for (i = 0; i < N; i++)
        r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);

    as two unconditional operations (copysign and mult).
    tree-ssa-math-opts.cc later combines them into an "xorsign" function.
    This works for both Advanced SIMD and SVE.
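
For readers unfamiliar with the term, the scalar sketch below illustrates what "xorsign" means: multiplying by copysign (1.0f, b) only flips the sign of a when b is negative, so it can be modelled as XORing a's bits with b's sign bit. The helper names (float_bits, bits_float, mul_copysign, xorsign) are made up for this illustration and are not GCC internals; the sketch assumes 32-bit IEEE floats and makes no claim about how NaN sign bits compare between the two forms.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a float as its 32-bit pattern and back,
   using memcpy to stay within the aliasing rules.  */
static uint32_t float_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static float bits_float (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* The loop body from the commit message, as written in the source.  */
static float mul_copysign (float a, float b)
{
  return a * __builtin_copysignf (1.0f, b);
}

/* Illustrative xorsign: keep only the sign bit of b and XOR it into a.  */
static float xorsign (float a, float b)
{
  const uint32_t sign_mask = UINT32_C (0x80000000);
  return bits_float (float_bits (a) ^ (float_bits (b) & sign_mask));
}
```

For ordinary finite inputs the two functions return the same value, for example both give -2.0f for (2.0f, -3.0f).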

    However, with the fix for PR96373, the vectoriser will instead
    generate a conditional multiplication (IFN_COND_MUL).  Something then
    needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
    xorsign.  Three obvious options were:

    (1) Extend tree-ssa-math-opts.cc.
    (2) Do the fold in match.pd.
    (3) Leave it to rtl combine.

    I'm against (3), because this isn't a target-specific optimisation.
    (1) would be possible, but would involve open-coding a lot of what
    match.pd does for us.  And, in contrast to doing the current
    tree-ssa-math-opts.cc optimisation in match.pd, there should be
    no danger of (2) happening too early.  If we have an IFN_COND_MUL
    then we're already past the stage of simplifying the original
    source code.

    There was also a choice between adding a conditional xorsign ifn
    and simply open-coding the xorsign.  The latter seems simpler,
    and means less boiler-plate for target-specific code.
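
As a rough lane-by-lane model of the open-coded form (a conditional XOR fed by an AND with the sign mask), the sketch below compares it with a conditional multiply by copysign. The function names, the pred array standing in for the predicate, and the fallback array standing in for the value taken by inactive lanes are all assumptions made for illustration; this is not the match.pd pattern itself, and it assumes 32-bit IEEE floats.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-cast helpers (memcpy avoids aliasing problems).  */
static uint32_t to_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  return u;
}

static float from_bits (uint32_t u)
{
  float x;
  memcpy (&x, &u, sizeof x);
  return x;
}

/* Model of "conditional multiply + copysign": active lanes compute
   a * copysign (1, b); inactive lanes take the fallback value.  */
static void cond_mul_copysign (const bool *pred, const float *a,
                               const float *b, const float *fallback,
                               float *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? a[i] * __builtin_copysignf (1.0f, b[i]) : fallback[i];
}

/* Model of "conditional XOR + AND": mask b down to its sign bit, then
   XOR it into a on active lanes only.  */
static void cond_xorsign (const bool *pred, const float *a,
                          const float *b, const float *fallback,
                          float *r, size_t n)
{
  const uint32_t sign_mask = UINT32_C (0x80000000);
  for (size_t i = 0; i < n; i++)
    {
      uint32_t sign = to_bits (b[i]) & sign_mask;
      r[i] = pred[i] ? from_bits (to_bits (a[i]) ^ sign) : fallback[i];
    }
}
```

Open-coding the operation this way needs only integer AND and XOR on the bit patterns, which is consistent with the commit's point that no dedicated conditional xorsign function is required.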

    The signed_or_unsigned_type_for change is needed to make sure
    that we stay in "SVE space" when doing the optimisation on 128-bit
    fixed-length SVE.

    gcc/
            PR tree-optimization/96373
            * tree.h (sign_mask_for): Declare.
            * tree.cc (sign_mask_for): New function.
            (signed_or_unsigned_type_for): For vector types, try to use the
            related_int_vector_mode.
            * genmatch.cc (commutative_op): Handle conditional internal
            functions.
            * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.

    gcc/testsuite/
            PR tree-optimization/96373
            * gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
            * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
