On Mon, Jul 10, 2017 at 04:49:13PM +0100, Tamar Christina wrote:
> Hi All,
>
> As the mid-end patch has been respun I've had to respin this one as well.
> So this is a new version and a ping as well.
>
> The patch provides AArch64 optabs for XORSIGN, both vectorized and scalar.
>
> This patch is a revival of a previous patch
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>
> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
> Regression tested on aarch64-none-linux-gnu with no regressions.
>
> AArch64 now generates in GCC:
>
>   movi    v2.2s, 0x80, lsl 24
>   and     v1.8b, v1.8b, v2.8b
>   eor     v0.8b, v0.8b, v1.8b
>
> as opposed to before:
>
>   fmov    s2, 1.0e+0
>   mov     x0, 2147483648
>   fmov    d3, x0
>   bsl     v3.8b, v1.8b, v2.8b
>   fmul    s0, s0, s3
>
> Ok for trunk?
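
For context, the transformation being implemented is
xorsign (x, y) = x * copysign (1.0, y).  A minimal C function the new
optabs are expected to fire on (an illustrative sketch, not the actual
test case from the patch) would be:

  /* With the mid-end patch (1/2) applied, the multiply by
     copysign (1.0, y) is recognised as XORSIGN and expanded through the
     optabs below, replacing the fmov/fmul sequence shown above with
     AND + EOR on the sign bit.  */
  float
  f (float x, float y)
  {
    return x * __builtin_copysignf (1.0f, y);
  }

The same expression inside a loop over arrays is what the vectorized
expander in aarch64-simd.md targets.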
I have a question inline below, and your ChangeLog is out of date, but
otherwise this looks good to me once the prerequisite makes it through
review.

> gcc/
> 2017-07-10  Tamar Christina  <tamar.christ...@arm.com>
>
>         PR middle-end/19706
>         * config/aarch64/aarch64.md (xorsign<mode>3): New optabs.
>         * config/aarch64/aarch64-builtins.c
>         (aarch64_builtin_vectorized_function): Added CASE_CFN_XORSIGN.
>         * config/aarch64/aarch64-simd-builtins.def: Added xorsign BINOP.

These changes are no longer in the patch?

>         * config/aarch64/aarch64-simd.md: Added xorsign<mode>3.
>
> gcc/testsuite/
> 2017-07-10  Tamar Christina  <tamar.christ...@arm.com>
>
>         * gcc.target/aarch64/xorsign.c: New.
>         * gcc.target/aarch64/xorsign_exec.c: New.
>         * gcc.target/aarch64/vect-xorsign_exec.c: New.
> ________________________________________
> From: gcc-patches-ow...@gcc.gnu.org <gcc-patches-ow...@gcc.gnu.org> on behalf
> of Tamar Christina <tamar.christ...@arm.com>
> Sent: Monday, June 12, 2017 8:56:58 AM
> To: GCC Patches
> Cc: nd; James Greenhalgh; Richard Earnshaw; Marcus Shawcroft
> Subject: [GCC][PATCH][AArch64] Optimize x * copysign (1.0, y) [Patch (2/2)]

Please don't top-post your replies like this; it makes the thread very
confusing to read.

<snip old email>

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 1cb6eeb318716aadacb84a44aa2062d486e0186b..db6a882eb42819569a127bc4526d73e94771c970 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -351,6 +351,35 @@
>  }
>  )
>
> +(define_expand "xorsign<mode>3"
> +  [(match_operand:VHSDF 0 "register_operand")
> +   (match_operand:VHSDF 1 "register_operand")
> +   (match_operand:VHSDF 2 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +
> +  machine_mode imode = <V_cmp_result>mode;
> +  rtx v_bitmask = gen_reg_rtx (imode);
> +  rtx op1x = gen_reg_rtx (imode);
> +  rtx op2x = gen_reg_rtx (imode);
> +
> +  rtx arg1 = lowpart_subreg (imode, operands[1], <MODE>mode);
> +  rtx arg2 = lowpart_subreg (imode, operands[2], <MODE>mode);
> +
> +  int bits = GET_MODE_UNIT_BITSIZE (<MODE>mode) - 1;
> +
> +  emit_move_insn (v_bitmask,
> +                  aarch64_simd_gen_const_vector_dup (<V_cmp_result>mode,
> +                                                     HOST_WIDE_INT_M1U << bits));
> +
> +  emit_insn (gen_and<v_cmp_result>3 (op2x, v_bitmask, arg2));
> +  emit_insn (gen_xor<v_cmp_result>3 (op1x, arg1, op2x));
> +  emit_move_insn (operands[0],
> +                  lowpart_subreg (<MODE>mode, op1x, imode));
> +  DONE;
> +}
> +)
> +
>  (define_expand "copysign<mode>3"
>    [(match_operand:VHSDF 0 "register_operand")
>     (match_operand:VHSDF 1 "register_operand")
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6bdbf650d9281f95fc7fa49b38e1a6da538cdd27..583bb2af4026bec68ecd129988b9aee6918b814c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5000,6 +5000,42 @@
>  }
>  )
>
> +;; For xorsign (x, y), we want to generate:
> +;;
> +;;   LDR d2, #1<<63
> +;;   AND v3.8B, v1.8B, v2.8B
> +;;   EOR v0.8B, v0.8B, v3.8B
> +;;
> +
> +(define_expand "xorsign<mode>3"
> +  [(match_operand:GPF 0 "register_operand")
> +   (match_operand:GPF 1 "register_operand")
> +   (match_operand:GPF 2 "register_operand")]
> +  "TARGET_FLOAT && TARGET_SIMD"
> +{
> +
> +  machine_mode imode = <V_cmp_result>mode;
> +  rtx mask = gen_reg_rtx (imode);
> +  rtx op1x = gen_reg_rtx (imode);
> +  rtx op2x = gen_reg_rtx (imode);
> +
> +  int bits = GET_MODE_BITSIZE (<MODE>mode) - 1;
> +  emit_move_insn (mask, GEN_INT (trunc_int_for_mode (HOST_WIDE_INT_M1U << bits,
> +                                                     imode)));

If you need a trunc_int_for_mode here, why don't you also need it in the
vector version above?

> +  emit_insn (gen_and<v_cmp_result>3 (op2x, mask,
> +                                     lowpart_subreg (imode, operands[2],
> +                                                     <MODE>mode)));
> +  emit_insn (gen_xor<v_cmp_result>3 (op1x,
> +                                     lowpart_subreg (imode, operands[1],
> +                                                     <MODE>mode),
> +                                     op2x));
> +  emit_move_insn (operands[0],
> +                  lowpart_subreg (<MODE>mode, op1x, imode));
> +  DONE;
> +}
> +)
> +
>  ;; -------------------------------------------------------------------
>  ;; Reload support
>  ;; -------------------------------------------------------------------

Thanks,
James
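
For completeness, the bit-level operation both expanders implement is
equivalent to the following plain-C sketch (illustrative only: the
function name is made up, and the real expanders keep the values in SIMD
registers so no integer/FP register moves are emitted):

  #include <stdint.h>
  #include <string.h>

  /* xorsign (x, y) == x * copysign (1.0, y): flip x's sign bit when y is
     negative, using AND + XOR instead of a floating-point multiply.  */
  static double
  xorsign_double (double x, double y)
  {
    uint64_t xi, yi;
    memcpy (&xi, &x, sizeof (xi));      /* Bit-cast double -> uint64_t.  */
    memcpy (&yi, &y, sizeof (yi));
    xi ^= yi & (UINT64_C (1) << 63);    /* Isolate y's sign bit, XOR it in.  */
    memcpy (&x, &xi, sizeof (xi));
    return x;
  }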