Richard Biener <richard.guent...@gmail.com> writes: >> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, >> } > >> if (negate_p) >> - mulop1 = force_gimple_operand_gsi (&gsi, >> - build1 (NEGATE_EXPR, >> - type, mulop1), >> - true, NULL_TREE, true, >> - GSI_SAME_STMT); >> + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > >> - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), >> - FMA_EXPR, mulop1, op2, addop); >> + if (seq) >> + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); >> + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, > addop); >> + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); >> + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal > (use_stmt)); >> + gsi_replace (&gsi, fma_stmt, true); >> + /* Valueize aggressively so that we generate FMS, FNMA and FNMS >> + regardless of where the negation occurs. */ >> + if (fold_stmt (&gsi, aggressive_valueize)) >> + update_stmt (gsi_stmt (gsi)); > > I think it would be nice to be able to use gimple_build () with IFNs so you > can > gimple_build () the IFN and then use gsi_replace_with_seq () on it. You > only need to fold with generated negates, not with negates already in the > IL? > The the folding implied with gimple_build will take care of it.
The idea was to pick up existing negates that feed the multiplication as well as any added by the pass itself. On IRC yesterday we talked about how this should handle the ECF_NOTHROW flag, and whether things like IFN_SQRT and IFN_FMA should always be nothrow (like the built-in functions are). But in the end I thought it'd be better to keep things as they are. We already handle -fnon-call-exceptions for unfused a * b + c and before the patch also handled it for FMA_EXPR. It'd seem like a step backwards if the new internal functions didn't handle it too. If anything it seems like the built-in functions should change to be closer to the tree_code and internal_fn way of doing things, if we want to support -fnon-call-exceptions properly. This also surprised me when doing the if-conversion patch I sent yesterday. We're happy to vectorise: for (int i = 0; i < 100; ++i) x[i] = ... ? sqrt (x[i]) : 0; by doing the sqrt unconditionally and selecting on the result, even with the default maths flags, but refuse to vectorise the simpler: for (int i = 0; i < 100; ++i) x[i] = ... ? x[i] + 1 : 0; in the same way. > Otherwise can you please move aggressive_valueize to gimple-fold.[ch] > alongside no_follow_ssa_edges / follow_single_use_edges and maybe > rename it as follow_all_ssa_edges? Ah, yeah, that's definitely a better name. I also renamed all_scalar_fma to scalar_all_fma, since I realised after Andrew's reply that the old name made it sound like it was "all scalars", whereas it meant to mean "all fmas". Tested as before. Thanks, Richard 2018-05-17 Richard Sandiford <richard.sandif...@linaro.org> gcc/ * doc/sourcebuild.texi (scalar_all_fma): Document. * tree.def (FMA_EXPR): Delete. * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. * internal-fn.c (ternary_direct): New macro. (expand_ternary_optab_fn): Likewise. (direct_ternary_optab_supported_p): Likewise. * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. * builtins.c (fold_builtin_fma): Delete. (fold_builtin_3): Don't call it. * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (operand_equal_p): Likewise. (fold_ternary_loc): Likewise. * gimple-pretty-print.c (dump_ternary_rhs): Likewise. * gimple.c (DEFTREECODE): Likewise. * gimplify.c (gimplify_expr): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. * tree-eh.c (operation_could_trap_p): Likewise. (stmt_could_throw_1_p): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. * tree-ssa-loop-im.c (stmt_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree.c (commutative_ternary_tree_code, add_expr): Likewise. * fold-const-call.h (fold_fma): Delete. * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (fold_fma): Delete. * genmatch.c (combined_fn): New enum. (commutative_ternary_tree_code): Remove FMA_EXPR handling. (commutative_op): New function. (commutate): Use it. Handle more than 2 operands. (dt_operand::gen_gimple_expr): Use commutative_op. (parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known. * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (backprop::process_assign_use): Remove FMA_EXPR handling. * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. (gen_hsa_fma): New function. (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS. * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. * gimple-fold.h (follow_all_ssa_edges): Declare. * gimple-fold.c (follow_all_ssa_edges): New function. * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the gimple_build interface and use follow_all_ssa_edges to fold the result. (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p instead of checking for optabs directly. * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs. * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR. gcc/brig/ * brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA. (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call. gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handlng. gcc/cp/ * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. (potential_constant_expression_1): Likewise. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_scalar_all_fma): New proc. * gcc.dg/fma-1.c: New test. * gcc.dg/fma-2.c: Likewise. * gcc.dg/fma-3.c: Likewise. * gcc.dg/fma-4.c: Likewise. * gcc.dg/fma-5.c: Likewise. * gcc.dg/fma-6.c: Likewise. * gcc.dg/fma-7.c: Likewise. * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require scalar_all_fma. * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. * gfortran.dg/reassoc_8.f: Likewise. * gfortran.dg/reassoc_9.f: Likewise. * gfortran.dg/reassoc_10.f: Likewise. Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2018-05-16 12:48:59.410941892 +0100 +++ gcc/doc/sourcebuild.texi 2018-05-17 09:18:19.954942948 +0100 @@ -2251,6 +2251,11 @@ Target supports @option{-pie}, @option{- @item rdynamic Target supports @option{-rdynamic}. +@item scalar_all_fma +Target supports all four fused multiply-add optabs for both @code{float} +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, +@code{fnma_optab} and @code{fnms_optab}. + @item section_anchors Target supports section anchors. Index: gcc/tree.def =================================================================== --- gcc/tree.def 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree.def 2018-05-17 09:18:19.987942174 +0100 @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid by the second argument. */ DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) -/* Fused multiply-add. - All operands and the result are of the same type. No intermediate - rounding is performed after multiplying operand one with operand two - before adding operand three. */ -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) - /* Widening vector multiplication. The two operands are vectors with N elements of size S. Multiplying the elements of the two vectors will result in N products of size 2*S. Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-05-16 12:48:59.410941892 +0100 +++ gcc/internal-fn.def 2018-05-17 09:18:19.974942479 +0100 @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. - unary: a normal unary optab, such as vec_reverse_<mode> - binary: a normal binary optab, such as vec_interleave_lo_<mode> + - ternary: a normal ternary optab, such as fma<mode>4 - cond_binary: a conditional binary optab, such as add<mode>cc @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, vec_shl_insert, binary) +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) + DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) +/* Ternary math functions. */ +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) + /* Unary integer ops. */ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-05-17 09:17:58.757608747 +0100 +++ gcc/internal-fn.c 2018-05-17 09:18:19.974942479 +0100 @@ -110,6 +110,7 @@ #define mask_store_lanes_direct { 0, 0, #define scatter_store_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define binary_direct { 0, 0, true } +#define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } #define while_direct { 0, 2, false } @@ -2982,6 +2983,9 @@ #define expand_unary_optab_fn(FN, STMT, #define expand_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 3) + #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) @@ -3067,6 +3071,7 @@ multi_vector_optab_supported_p (convert_ #define direct_unary_optab_supported_p direct_optab_supported_p #define direct_binary_optab_supported_p direct_optab_supported_p +#define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p #define direct_cond_binary_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2018-05-16 12:49:10.495216143 +0100 +++ gcc/Makefile.in 2018-05-17 09:18:19.936943371 +0100 @@ -2787,7 +2787,7 @@ build/genmddump.o : genmddump.c $(RTL_BA $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ - tree.def builtins.def internal-fn.def + tree.def builtins.def internal-fn.def case-cfn-macros.h build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ internal-fn.def Index: gcc/builtins.c =================================================================== --- gcc/builtins.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/builtins.c 2018-05-17 09:18:19.939943300 +0100 @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a return fold_build1_loc (loc, ABS_EXPR, type, arg); } -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ - -static tree -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) -{ - /* ??? Only expand to FMA_EXPR if it's directly supported. */ - if (validate_arg (arg0, REAL_TYPE) - && validate_arg (arg1, REAL_TYPE) - && validate_arg (arg2, REAL_TYPE) - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); - - return NULL_TREE; -} - /* Fold a call to builtin carg(a+bi) -> atan2(b,a). */ static tree @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd CASE_FLT_FN (BUILT_IN_SINCOS): return fold_builtin_sincos (loc, arg0, arg1, arg2); - CASE_FLT_FN (BUILT_IN_FMA): - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): - return fold_builtin_fma (loc, arg0, arg1, arg2, type); - CASE_FLT_FN (BUILT_IN_REMQUO): if (validate_arg (arg0, REAL_TYPE) && validate_arg (arg1, REAL_TYPE) Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/cfgexpand.c 2018-05-17 09:18:19.941943253 +0100 @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: goto ternary; case TRUTH_ANDIF_EXPR: @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) } return NULL; - case FMA_EXPR: - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); - default: flag_unsupported: if (flag_checking) Index: gcc/expr.c =================================================================== --- gcc/expr.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/expr.c 2018-05-17 09:18:19.957942878 +0100 @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); - case FMA_EXPR: - { - optab opt = fma_optab; - gimple *def0, *def2; - - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} - call. */ - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) - { - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); - tree call_expr; - - gcc_assert (fn != NULL_TREE); - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); - return expand_builtin (call_expr, target, subtarget, mode, false); - } - - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); - /* The multiplication is commutative - look at its 2nd operand - if the first isn't fed by a negate. */ - if (!def0) - { - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); - /* Swap operands if the 2nd operand is fed by a negate. */ - if (def0) - std::swap (treeop0, treeop1); - } - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); - - op0 = op2 = NULL; - - if (def0 && def2 - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) - { - opt = fnms_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - else if (def0 - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) - { - opt = fnma_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - } - else if (def2 - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) - { - opt = fms_optab; - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - - if (op0 == NULL) - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); - if (op2 == NULL) - op2 = expand_normal (treeop2); - op1 = expand_normal (treeop1); - - return expand_ternary_op (TYPE_MODE (type), opt, - op0, op1, op2, target, 0); - } - case MULT_EXPR: /* If this is a fixed-point operation, then we cannot use the code below because "expand_mult" doesn't support sat/no-sat fixed-point Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const.c 2018-05-17 09:18:19.959942831 +0100 @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ case TRUTH_ORIF_EXPR: return OP_SAME (0) && OP_SAME (1); - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: if (!OP_SAME (2)) @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t return NULL_TREE; - case FMA_EXPR: - /* For integers we can decompose the FMA if possible. */ - if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) - return fold_build2_loc (loc, PLUS_EXPR, type, - const_binop (MULT_EXPR, arg0, arg1), arg2); - if (integer_zerop (arg2)) - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); - - return fold_fma (loc, type, arg0, arg1, arg2); - case VEC_PERM_EXPR: if (TREE_CODE (arg2) == VECTOR_CST) { Index: gcc/gimple-pretty-print.c =================================================================== --- gcc/gimple-pretty-print.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/gimple-pretty-print.c 2018-05-17 09:18:19.963942737 +0100 @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer pp_greater (buffer); break; - case FMA_EXPR: - if (flags & TDF_GIMPLE) - { - pp_string (buffer, "__FMA ("); - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - pp_right_paren (buffer); - } - else - { - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_string (buffer, " * "); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_string (buffer, " + "); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - } - break; - case DOT_PROD_EXPR: pp_string (buffer, "DOT_PROD_EXPR <"); dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); Index: gcc/gimple.c =================================================================== --- gcc/gimple.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/gimple.c 2018-05-17 09:18:19.966942667 +0100 @@ -2150,8 +2150,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N || (SYM) == REALIGN_LOAD_EXPR \ || (SYM) == VEC_COND_EXPR \ || (SYM) == VEC_PERM_EXPR \ - || (SYM) == BIT_INSERT_EXPR \ - || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS \ + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ : ((SYM) == CONSTRUCTOR \ || (SYM) == OBJ_TYPE_REF \ || (SYM) == ASSERT_EXPR \ Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/gimplify.c 2018-05-17 09:18:19.972942526 +0100 @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq } break; - case FMA_EXPR: case VEC_PERM_EXPR: /* Classified as tcc_expression. */ goto expr_3; Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/optabs-tree.c 2018-05-17 09:18:19.975942456 +0100 @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code : (TYPE_SATURATING (type) ? ssmsub_widen_optab : smsub_widen_optab)); - case FMA_EXPR: - return fma_optab; - case VEC_WIDEN_MULT_HI_EXPR: return TYPE_UNSIGNED (type) ? vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2018-05-16 12:49:10.497382733 +0100 +++ gcc/tree-cfg.c 2018-05-17 09:18:19.979942362 +0100 @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s } break; - case FMA_EXPR: - if (!useless_type_conversion_p (lhs_type, rhs1_type) - || !useless_type_conversion_p (lhs_type, rhs2_type) - || !useless_type_conversion_p (lhs_type, rhs3_type)) - { - error ("type mismatch in fused multiply-add expression"); - debug_generic_expr (lhs_type); - debug_generic_expr (rhs1_type); - debug_generic_expr (rhs2_type); - debug_generic_expr (rhs3_type); - return true; - } - break; - case VEC_COND_EXPR: if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), Index: gcc/tree-eh.c =================================================================== --- gcc/tree-eh.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-eh.c 2018-05-17 09:18:19.980942338 +0100 @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o if (TREE_CODE_CLASS (op) != tcc_comparison && TREE_CODE_CLASS (op) != tcc_unary - && TREE_CODE_CLASS (op) != tcc_binary - && op != FMA_EXPR) + && TREE_CODE_CLASS (op) != tcc_binary) return false; return operation_could_trap_helper_p (op, fp_operation, honor_trapv, @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) if (TREE_CODE_CLASS (code) == tcc_comparison || TREE_CODE_CLASS (code) == tcc_unary - || TREE_CODE_CLASS (code) == tcc_binary - || code == FMA_EXPR) + || TREE_CODE_CLASS (code) == tcc_binary) { if (TREE_CODE_CLASS (code) == tcc_comparison) t = TREE_TYPE (gimple_assign_rhs1 (stmt)); Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-inline.c 2018-05-17 09:18:19.980942338 +0100 @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c case MINUS_EXPR: case MULT_EXPR: case MULT_HIGHPART_EXPR: - case FMA_EXPR: case ADDR_SPACE_CONVERT_EXPR: case FIXED_CONVERT_EXPR: Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2018-05-17 09:17:58.758608713 +0100 +++ gcc/tree-pretty-print.c 2018-05-17 09:18:19.981942315 +0100 @@ -2904,16 +2904,6 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; - case FMA_EXPR: - pp_string (pp, " FMA_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); - pp_string (pp, " > "); - break; - case OACC_PARALLEL: pp_string (pp, "#pragma acc parallel"); goto dump_omp_clauses_body; @@ -3552,7 +3542,6 @@ op_code_prio (enum tree_code code) case CEIL_MOD_EXPR: case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR: - case FMA_EXPR: return 13; case TRUTH_NOT_EXPR: Index: gcc/tree-ssa-loop-im.c =================================================================== --- gcc/tree-ssa-loop-im.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-loop-im.c 2018-05-17 09:18:19.981942315 +0100 @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: Index: gcc/tree-ssa-operands.c =================================================================== --- gcc/tree-ssa-operands.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-operands.c 2018-05-17 09:18:19.982942291 +0100 @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, case REALIGN_LOAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: { get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); Index: gcc/tree.c =================================================================== --- gcc/tree.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree.c 2018-05-17 09:18:19.986942197 +0100 @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h flags &= ~OEP_ADDRESS_OF; break; - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: { Index: gcc/fold-const-call.h =================================================================== --- gcc/fold-const-call.h 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const-call.h 2018-05-17 09:18:19.957942878 +0100 @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H tree fold_const_call (combined_fn, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree, tree); -tree fold_fma (location_t, tree, tree, tree, tree); tree build_cmp_result (tree type, int res); #endif Index: gcc/fold-const-call.c =================================================================== --- gcc/fold-const-call.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const-call.c 2018-05-17 09:18:19.957942878 +0100 @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result CASE_CFN_FMA_FN: return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); + case CFN_FMS: + { + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); + } + + case CFN_FNMA: + { + real_value new_arg0 = real_value_negate (arg0); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); + } + + case CFN_FNMS: + { + real_value new_arg0 = real_value_negate (arg0); + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, + &new_arg2, format); + } + default: return false; } @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty return fold_const_call_1 (fn, type, arg0, arg1, arg2); } } - -/* Fold a fma operation with arguments ARG[012]. */ - -tree -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) -{ - REAL_VALUE_TYPE result; - if (real_cst_p (arg0) - && real_cst_p (arg1) - && real_cst_p (arg2) - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), - REAL_MODE_FORMAT (TYPE_MODE (type)))) - return build_real (type, result); - - return NULL_TREE; -} Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/genmatch.c 2018-05-17 09:18:19.960942808 +0100 @@ -241,6 +241,20 @@ enum internal_fn { IFN_LAST }; +enum combined_fn { +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ + CFN_##ENUM = int (ENUM), +#include "builtins.def" + +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), +#include "internal-fn.def" + + CFN_LAST +}; + +#include "case-cfn-macros.h" + /* Return true if CODE represents a commutative tree code. Otherwise return false. */ bool @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * return id->kind == id_base::USER; } +/* If ID has a pair of consecutive, commutative operands, return the + index of the first, otherwise return -1. */ + +static int +commutative_op (id_base *id) +{ + if (operator_id *code = dyn_cast <operator_id *> (id)) + { + if (commutative_tree_code (code->code) + || commutative_ternary_tree_code (code->code)) + return 0; + return -1; + } + if (fn_id *fn = dyn_cast <fn_id *> (id)) + switch (fn->fn) + { + CASE_CFN_FMA: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: + return 0; + + default: + return -1; + } + if (user_id *uid = dyn_cast<user_id *> (id)) + { + int res = commutative_op (uid->substitutes[0]); + if (res < 0) + return 0; + for (unsigned i = 1; i < uid->substitutes.length (); ++i) + if (res != commutative_op (uid->substitutes[i])) + return -1; + return res; + } + return -1; +} + /* Add a predicate identifier to the hash. */ static predicate_id * @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id if (!e->is_commutative) return ret; + /* The operation is always binary if it isn't inherently commutative. */ + int natural_opno = commutative_op (e->operation); + unsigned int opno = natural_opno >= 0 ? natural_opno : 0; for (unsigned i = 0; i < result.length (); ++i) { expr *ne = new expr (e); @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id } } ne->is_commutative = false; - // result[i].length () is 2 since e->operation is binary - for (unsigned j = result[i].length (); j; --j) - ne->append_op (result[i][j-1]); + for (unsigned j = 0; j < result[i].length (); ++j) + { + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); + ne->append_op (result[i][old_j]); + } ret.safe_push (ne); } @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in /* While the toplevel operands are canonicalized by the caller after valueizing operands of sub-expressions we have to re-canonicalize operand order. */ - if (operator_id *code = dyn_cast <operator_id *> (id)) + int opno = commutative_op (id); + if (opno >= 0) { - /* ??? We can't canonicalize tcc_comparison operands here - because that requires changing the comparison code which - we already matched... */ - if (commutative_tree_code (code->code) - || commutative_ternary_tree_code (code->code)) - { - char child_opname0[20], child_opname1[20]; - gen_opname (child_opname0, 0); - gen_opname (child_opname1, 1); - fprintf_indent (f, indent, - "if (tree_swap_operands_p (%s, %s))\n", - child_opname0, child_opname1); - fprintf_indent (f, indent, - " std::swap (%s, %s);\n", - child_opname0, child_opname1); - } + char child_opname0[20], child_opname1[20]; + gen_opname (child_opname0, opno); + gen_opname (child_opname1, opno + 1); + fprintf_indent (f, indent, + "if (tree_swap_operands_p (%s, %s))\n", + child_opname0, child_opname1); + fprintf_indent (f, indent, + " std::swap (%s, %s);\n", + child_opname0, child_opname1); } return n_braces; @@ -4217,11 +4267,14 @@ parser::parse_expr () e->operation->id, e->operation->nargs, e->ops.length ()); if (is_commutative) { - if (e->ops.length () == 2) + if (e->ops.length () == 2 + || commutative_op (e->operation) >= 0) e->is_commutative = true; else - fatal_at (token, "only binary operators or function with " - "two arguments can be marked commutative"); + fatal_at (token, "only binary operators or functions with " + "two arguments can be marked commutative, " + "unless the operation is known to be inherently " + "commutative"); } e->expr_type = expr_type; return op; Index: gcc/gimple-ssa-backprop.c =================================================================== --- gcc/gimple-ssa-backprop.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/gimple-ssa-backprop.c 2018-05-17 09:18:19.963942737 +0100 @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal CASE_CFN_FMA: CASE_CFN_FMA_FN: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: /* In X * X + Y, where Y is distinct from X, the sign of X doesn't matter. */ if (gimple_call_arg (call, 0) == rhs @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a } break; - case FMA_EXPR: - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't - matter. */ - if (gimple_assign_rhs1 (assign) == rhs - && gimple_assign_rhs2 (assign) == rhs - && gimple_assign_rhs3 (assign) != rhs) - info->flags.ignore_sign = true; - break; - case MULT_EXPR: /* In X * X, the sign of X doesn't matter. */ if (gimple_assign_rhs1 (assign) == rhs Index: gcc/hsa-gen.c =================================================================== --- gcc/hsa-gen.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/hsa-gen.c 2018-05-17 09:18:19.973942503 +0100 @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( case NEGATE_EXPR: opcode = BRIG_OPCODE_NEG; break; - case FMA_EXPR: - /* There is a native HSA instruction for scalar FMAs but not for vector - ones. */ - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) - { - hsa_op_reg *dest - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); - return; - } - opcode = BRIG_OPCODE_MAD; - break; case MIN_EXPR: opcode = BRIG_OPCODE_MIN; break; @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb insn->set_output_in_type (dest, 0, hbb); } +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. + Instructions are appended to basic block HBB. NEGATE1 is true for + FNMA and FNMS. NEGATE3 is true for FMS and FNMS. */ + +static void +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) +{ + tree lhs = gimple_call_lhs (call); + if (lhs == NULL_TREE) + return; + + tree rhs1 = gimple_call_arg (call, 0); + tree rhs2 = gimple_call_arg (call, 1); + tree rhs3 = gimple_call_arg (call, 2); + + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + + if (negate1) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); + op1 = tmp; + } + + /* There is a native HSA instruction for scalar FMAs but not for vector + ones. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, + dest, tmp, op3, hbb); + } + else + { + if (negate3) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); + op3 = tmp; + } + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, + dest->m_type, dest, + op1, op2, op3); + hbb->append_insn (insn); + } +} + /* Set VALUE to a shadow kernel debug argument and append a new instruction to HBB basic block. */ @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); break; + case IFN_FMA: + gen_hsa_fma (stmt, hbb, false, false); + break; + + case IFN_FMS: + gen_hsa_fma (stmt, hbb, false, true); + break; + + case IFN_FNMA: + gen_hsa_fma (stmt, hbb, true, false); + break; + + case IFN_FNMS: + gen_hsa_fma (stmt, hbb, true, true); + break; + default: HSA_SORRY_ATV (gimple_location (stmt), "support for HSA does not implement internal function: %s", Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-05-16 12:48:59.410941892 +0100 +++ gcc/match.pd 2018-05-17 09:18:19.974942479 +0100 @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || wi::geu_p (wi::to_wide (@rpos), wi::to_wide (@ipos) + isize)) (BIT_FIELD_REF @0 @rsize @rpos))))) + +(for fmas (FMA) + (simplify + (fmas:c (negate @0) @1 @2) + (IFN_FNMA @0 @1 @2)) + (simplify + (fmas @0 @1 (negate @2)) + (IFN_FMS @0 @1 @2)) + (simplify + (fmas:c (negate @0) @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) + (simplify + (negate (fmas@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMS @0 @1 @2)))) + +(simplify + (IFN_FMS:c (negate @0) @1 @2) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FMS @0 @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FMS:c (negate @0) @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (negate (IFN_FMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMA @0 @1 @2))) + +(simplify + (IFN_FNMA:c (negate @0) @1 @2) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FNMA @0 @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FNMA:c (negate @0) @1 (negate @2)) + (IFN_FMS @0 @1 @2)) +(simplify + (negate (IFN_FNMA@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMS @0 @1 @2))) + +(simplify + (IFN_FNMS:c (negate @0) @1 @2) + (IFN_FMS @0 @1 @2)) +(simplify + (IFN_FNMS @0 @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (IFN_FNMS:c (negate @0) @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (negate (IFN_FNMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMA @0 @1 @2))) Index: gcc/gimple-fold.h =================================================================== --- gcc/gimple-fold.h 2018-05-17 09:17:32.876478942 +0100 +++ gcc/gimple-fold.h 2018-05-17 09:18:19.962942761 +0100 @@ -41,6 +41,7 @@ extern bool arith_overflowed_p (enum tre const_tree); extern tree no_follow_ssa_edges (tree); extern tree follow_single_use_edges (tree); +extern tree follow_all_ssa_edges (tree); extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree), tree (*) (tree) = no_follow_ssa_edges); extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree)); Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c 2018-05-17 09:17:32.876478942 +0100 +++ gcc/gimple-fold.c 2018-05-17 09:18:19.961942784 +0100 @@ -4967,6 +4967,14 @@ follow_single_use_edges (tree val) return val; } +/* Valueization callback that follows all SSA edges. */ + +tree +follow_all_ssa_edges (tree val) +{ + return val; +} + /* Fold the statement pointed to by GSI. In some cases, this function may replace the whole statement with a new one. Returns true iff folding makes any changes. Index: gcc/tree-ssa-math-opts.c =================================================================== --- gcc/tree-ssa-math-opts.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-math-opts.c 2018-05-17 09:18:19.982942291 +0100 @@ -2650,7 +2650,7 @@ convert_mult_to_fma_1 (tree mul_result, tree type = TREE_TYPE (mul_result); gimple *use_stmt; imm_use_iterator imm_iter; - gassign *fma_stmt; + gcall *fma_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) { @@ -2658,6 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, enum tree_code use_code; tree addop, mulop1 = op1, result = mul_result; bool negate_p = false; + gimple_seq seq = NULL; if (is_gimple_debug (use_stmt)) continue; @@ -2683,11 +2684,7 @@ convert_mult_to_fma_1 (tree mul_result, addop = gimple_assign_rhs2 (use_stmt); /* a * b - c -> a * b + (-c) */ if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) - addop = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, addop), - true, NULL_TREE, true, - GSI_SAME_STMT); + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); } else { @@ -2698,23 +2695,26 @@ convert_mult_to_fma_1 (tree mul_result, } if (negate_p) - mulop1 = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, mulop1), - true, NULL_TREE, true, - GSI_SAME_STMT); + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), - FMA_EXPR, mulop1, op2, addop); + if (seq) + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); + gsi_replace (&gsi, fma_stmt, true); + /* Follow all SSA edges so that we generate FMS, FNMA and FNMS + regardless of where the negation occurs. */ + if (fold_stmt (&gsi, follow_all_ssa_edges)) + update_stmt (gsi_stmt (gsi)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Generated FMA "); - print_gimple_stmt (dump_file, fma_stmt, 0, 0); + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); fprintf (dump_file, "\n"); } - gsi_replace (&gsi, fma_stmt, true); widen_mul_stats.fmas_inserted++; } } @@ -2862,7 +2862,8 @@ convert_mult_to_fma (gimple *mul_stmt, t /* If the target doesn't support it, don't generate it. We assume that if fma isn't available then fms, fnma or fnms are not either. */ - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) return false; /* If the multiplication has zero uses, it is kept around probably because @@ -2958,8 +2959,8 @@ convert_mult_to_fma (gimple *mul_stmt, t that a mult / subtract pair. */ if (use_code == MINUS_EXPR && !negate_p && gimple_assign_rhs1 (use_stmt) == result - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) { tree rhs2 = gimple_assign_rhs2 (use_stmt); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c 2018-05-16 12:49:10.501715913 +0100 +++ gcc/config/i386/i386.c 2018-05-17 09:18:19.949943066 +0100 @@ -50614,19 +50614,20 @@ ix86_add_stmt_cost (void *data, int coun tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; int stmt_cost = - 1; + bool fp = false; + machine_mode mode = TImode; + + if (vectype != NULL) + { + fp = FLOAT_TYPE_P (vectype); + mode = TYPE_MODE (vectype); + } + if ((kind == vector_stmt || kind == scalar_stmt) && stmt_info && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) { tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); - bool fp = false; - machine_mode mode = TImode; - - if (vectype != NULL) - { - fp = FLOAT_TYPE_P (vectype); - mode = TYPE_MODE (vectype); - } /*machine_mode inner_mode = mode; if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode);*/ @@ -50657,12 +50658,6 @@ ix86_add_stmt_cost (void *data, int coun case MULT_HIGHPART_EXPR: stmt_cost = ix86_multiplication_cost (ix86_cost, mode); break; - case FMA_EXPR: - stmt_cost = ix86_vec_cost (mode, - mode == SFmode ? ix86_cost->fmass - : ix86_cost->fmasd, - true); - break; case NEGATE_EXPR: if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) stmt_cost = ix86_cost->sse_op; @@ -50725,6 +50720,24 @@ ix86_add_stmt_cost (void *data, int coun break; } } + + combined_fn cfn; + if ((kind == vector_stmt || kind == scalar_stmt) + && stmt_info + && stmt_info->stmt + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_FMA: + stmt_cost = ix86_vec_cost (mode, + mode == SFmode ? ix86_cost->fmass + : ix86_cost->fmasd, + true); + break; + default: + break; + } + /* If we do elementwise loads into a vector then we are bound by latency and execution resources for the many scalar loads (AGU and load ports). Try to account for this by scaling the Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/config/rs6000/rs6000.c 2018-05-17 09:18:19.953942972 +0100 @@ -15880,7 +15880,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ arg1 = gimple_call_arg (stmt, 1); tree arg2 = gimple_call_arg (stmt, 2); lhs = gimple_call_lhs (stmt); - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); + gimple_call_set_lhs (g, lhs); + gimple_call_set_nothrow (g, true); gimple_set_location (g, gimple_location (stmt)); gsi_replace (gsi, g, true); return true; Index: gcc/brig/brigfrontend/brig-function.cc =================================================================== --- gcc/brig/brigfrontend/brig-function.cc 2018-05-16 12:48:59.410941892 +0100 +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-17 09:18:19.936943371 +0100 @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod case BRIG_OPCODE_NEXP2: builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); break; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_NFMA: builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); break; @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc return CALL_EXPR; else return MAX_EXPR; - case BRIG_OPCODE_FMA: - return FMA_EXPR; case BRIG_OPCODE_ABS: return ABS_EXPR; case BRIG_OPCODE_SHL: @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc /* Implement as 1/f (x). gcc should pattern detect that and use a native instruction, if available, for it. */ return TREE_LIST; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_FLOOR: case BRIG_OPCODE_CEIL: case BRIG_OPCODE_SQRT: Index: gcc/c/gimple-parser.c =================================================================== --- gcc/c/gimple-parser.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/c/gimple-parser.c 2018-05-17 09:18:19.939943300 +0100 @@ -952,27 +952,6 @@ c_parser_gimple_postfix_expression (c_pa expr.value = fold_convert (type, val); return expr; } - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) - { - c_parser_consume_token (parser); - auto_vec<tree> args; - - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) - { - c_parser_gimple_expr_list (parser, &args); - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); - } - if (args.length () != 3) - { - error_at (loc, "invalid number of operands to __FMA"); - expr.value = error_mark_node; - return expr; - } - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), - args[0], args[1], args[2]); - return expr; - } /* SSA name. */ unsigned version, ver_offset; Index: gcc/cp/constexpr.c =================================================================== --- gcc/cp/constexpr.c 2018-05-16 12:49:10.203809793 +0100 +++ gcc/cp/constexpr.c 2018-05-17 09:18:19.954942948 +0100 @@ -4574,7 +4574,6 @@ cxx_eval_constant_expression (const cons non_constant_p, overflow_p); break; - case FMA_EXPR: case VEC_PERM_EXPR: r = cxx_eval_trinary_expression (ctx, t, lval, non_constant_p, overflow_p); @@ -5999,7 +5998,6 @@ #define RECUR(T,RV) \ return false; return true; - case FMA_EXPR: case VEC_PERM_EXPR: for (i = 0; i < 3; ++i) if (!RECUR (TREE_OPERAND (t, i), true)) Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/lib/target-supports.exp 2018-05-17 09:18:19.978942385 +0100 @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo return 1 } +# Return 1 if the target supports all four forms of fused multiply-add +# (fma, fms, fnma, and fnms) for both float and double. + +proc check_effective_target_scalar_all_fma { } { + return [istarget aarch64*-*-*] +} + # Return 1 if the target supports compiling fixed-point, # 0 otherwise. Index: gcc/testsuite/gcc.dg/fma-1.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b + c; +} + +double +f2 (double a, double b, double c) +{ + return a * b + c; +} + +/* { dg-final { scan-tree-dump-times { = \.FMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-2.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b - c; +} + +double +f2 (double a, double b, double c) +{ + return a * b - c; +} + +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-3.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return c - a * b; +} + +double +f2 (double a, double b, double c) +{ + return c - a * b; +} + +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-4.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return -(a * b) - c; +} + +double +f2 (double a, double b, double c) +{ + return -(a * b) - c; +} + +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-5.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,53 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (a, b, -c); +} + +void +f3 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, b, -e); + res[1] = __builtin_fmaf (c, d, -e); +} + +void +f4 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, b, -e); + res[1] = __builtin_fma (c, d, -e); +} + +float +f5 (float a, float b, float c) +{ + return -__builtin_fmaf (-a, b, c); +} + +double +f6 (double a, double b, double c) +{ + return -__builtin_fma (-a, b, c); +} + +float +f7 (float a, float b, float c) +{ + return -__builtin_fmaf (a, -b, c); +} + +double +f8 (double a, double b, double c) +{ + return -__builtin_fma (a, -b, c); +} + +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 10 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-6.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, c); +} + +void +f5 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (-a, b, c); + res[1] = __builtin_fmaf (-a, d, e); +} + +void +f6 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (-a, b, c); + res[1] = __builtin_fma (-a, d, e); +} + +void +f7 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, -b, c); + res[1] = __builtin_fmaf (d, -b, e); +} + +void +f8 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, -b, c); + res[1] = __builtin_fma (d, -b, e); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, -c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, -c); +} + +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-7.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, -c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, -c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, -c); +} + +void +f5 (float a, float b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (-a, b, -c); + res[1] = __builtin_fmaf (-a, d, -c); +} + +void +f6 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (-a, b, -c); + res[1] = __builtin_fma (-a, d, -c); +} + +void +f7 (float a, float b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (a, -b, -c); + res[1] = __builtin_fmaf (d, -b, -c); +} + +void +f8 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (a, -b, -c); + res[1] = __builtin_fma (d, -b, -c); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, c); +} + +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/gimplefe-26.c =================================================================== --- gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-17 09:18:19.976942432 +0100 @@ -1,16 +1,15 @@ -/* { dg-do compile { target c99_runtime } } */ +/* { dg-do compile { target scalar_all_fma } } */ /* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ #define foo(type, num) \ type __GIMPLE () foo_##num (type a, type b, type c) \ { \ type t0; \ - t0_1 = __FMA (a, b, c); \ + t0_1 = .FMA (a, b, c); \ return t0_1; \ } foo(float, 1) foo(double, 2) -foo(long double, 3) -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ +/* { dg-final { scan-tree-dump-times {\.FMA} 2 "ssa" } } */ Index: gcc/testsuite/gfortran.dg/reassoc_7.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-17 09:18:19.976942432 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_8.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-17 09:18:19.977942409 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_9.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-17 09:18:19.977942409 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_10.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-17 09:18:19.976942432 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Q,Dvdph) implicit none