On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote: > On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote: > >> On 2 November 2015 at 18:31, Richard Biener <rguent...@suse.de> wrote: >> > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote: >> > >> >> On 2 November 2015 at 13:20, Prathamesh Kulkarni >> >> <prathamesh.kulka...@linaro.org> wrote: >> >> > On 30 October 2015 at 15:57, Richard Biener >> >> > <richard.guent...@gmail.com> wrote: >> >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni >> >> >> <prathamesh.kulka...@linaro.org> wrote: >> >> >>> Hi, >> >> >>> I have attached revamped version of Kugan's patch >> >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721. >> >> >>> Test-case: http://pastebin.com/QMfpXLD9 >> >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp >> >> >>> Assembly: http://pastebin.com/kk2HZpvA >> >> >>> >> >> >>> The approach I took is similar to sincos pass, which involves two >> >> >>> parts: >> >> >>> a) divmod pass that transforms: >> >> >>> t1 = a / b; >> >> >>> t2 = a % b; >> >> >>> to the following sequence: >> >> >>> complex_tmp = DIVMOD (a, b); >> >> >>> t1 = REALPART_EXPR(a); >> >> >>> t2 = IMAGPART_EXPR(b); >> >> >>> >> >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by >> >> >>> expand_DIVMOD(). >> >> >>> I am not sure if I have done this correctly. I was somehow looking to >> >> >>> reuse expand_divmod() but am not able to figure out how to do that >> >> >>> (AFAIU expand_divmod() strictly returns either the quotient or >> >> >>> remainder but never both which is what I want), so ended up with >> >> >>> manually expanding to rtx. >> >> >>> >> >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't >> >> >>> understand the following: >> >> >>> if (gimple_purge_dead_eh_edges (gimple_bb (stmt))) >> >> >>> cfg_changed = true; >> >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here? >> >> >> >> >> >> The call replacement might no longer throw. >> >> >> >> >> >>> A silly question, what options to pass to gcc to print statistics ? I >> >> >>> added statistics_counter_event to keep track of number of calls >> >> >>> inserted/used but not sure how to print them :/ >> >> >> >> >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats >> >> > Thanks, I can see it now -;) >> >> >> >> >> >>> I would be grateful for suggestions for improving the patch. >> >> >> >> >> >> First of all new functions need comments (expand_DIVMOD). >> >> >> >> >> >> Second - what's the reasoning for the pass placement seen? I think >> >> >> the transform is a lowering one, improving RTL expansion. The >> >> >> DIVMOD representation is harmful for GIMPLE optimizers. >> >> >> >> >> >> Third - I'd have integrated this with an existing pass - we have >> >> >> another >> >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul. >> >> >> Please piggy-back on it. >> >> >> >> >> >> You seem to do the transform unconditionally even if the target has >> >> >> instructions for division and modulus but no instruction for divmod >> >> >> in which case you'll end up emitting a library call in the end instead >> >> >> of a div and mod instruction. I think the transform should be gated on >> >> >> missing div/mod instructions or the availability of a divmod one. >> >> >> >> >> >> + if (is_gimple_assign (stmt) >> >> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == >> >> >> tcc_binary) >> >> >> + { >> >> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR) >> >> >> + cfg_changed |= execute_divmod_1 (stmt); >> >> >> >> >> >> you can directly check gimple_assign_rhs_code. I'd check for >> >> >> TRUNC_MOD_EXPR >> >> >> which seems to be less common and thus should cut down compile-time. >> >> >> >> >> >> + /* Case 2: When both are in top_bb */ >> >> >> + else if (op1_def_in_bb && op2_def_in_bb) >> >> >> + { >> >> >> + /* FIXME: better way to compare iterators?. */ >> >> >> + gimple_stmt_iterator gsi = get_later_stmt (top_bb, >> >> >> def_stmt_op1, def_stmt_op2); >> >> >> >> >> >> You can't use get_later_stmt, it's a vectorizer speciality. To do >> >> >> this test efficiently >> >> >> you need stmt UIDs computed. >> >> >> >> >> >> I miss a testcase (or two). >> >> > I have tried to address your comments in the attached patch. >> >> oops, unsigned uid_no = 0; should be outside the loop in >> >> pass_optimize_widening_mul::execute. >> >> Done in this version of the patch. >> >> >> >> Thanks, >> >> Prathamesh >> >> > Could you please review it for me ? >> > >> > For the testcases you need a new target check for divmod. >> Sorry I didn't understand. >> Should I add sth similar to /* { dg-require-effective-target divmod } */ ? > > Yes. > >> > >> >> > I have a few questions: >> >> > >> >> > a) Is the check for availability of divmod correct ? >> > >> > You simply want optab_handler (tab, mode) != CODE_FOR_nothing >> > >> > The libfunc is always available. >> Um I am probably missing something, I thought libfuncs are initialized >> by init_optabs() >> but the target can override/delete that with the target hook >> targetm.init_libfuncs () ? >> On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX. > > Hum, I see. I thought we always have the libfunc if we don't have > the instruction. > >> > >> > When looking at TRUNC_DIV_EXPR you should also exclude >> > the case where TYPE_OVERFLOW_TRAPS (type) as that should >> > expand using the [su]divv optabs (no trapping overflow >> > divmod optab exists). >> > >> >> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1. >> >> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 >> >> > == bb1). >> >> > However I wonder if we should also check that if bb0 dominates bb1 >> >> > then bb1 should >> >> > post-dominate bb0 ? >> > >> > Depends on how expensive divmod is compared to div or mod. An alternative >> > transform would be to look whether [us]mod optabs are available and if not >> > implement modulo via div and subtract, CSEing the division itself. Not >> > sure if any target that provides div patterns does not also provide >> > mod ones. >> > >> >> > For instance the transform gets applied to test-case in pr43721-2.c. >> >> > bb containing rem = x % y doesn't post-dominate bb containing quot = x >> >> > / y; >> >> > I wonder if we want to perform the transform for this case since >> >> > control flow >> >> > may not always reach from quot = x / y to rem = x % y ? >> > >> > See above, it's a cost question. We do for sincos as computing sin or >> > cos if the other is already computed is cheap. I would suggest the >> > same is true for div/mod. >> > >> >> > c) A silly queston, I wonder if vec<gimple *> stmts in >> >> > convert_to_divmod() will >> >> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in >> >> > the vector) ? >> >> > I assume that cse will take place before widening_mul pass executes >> >> > (since pass_fre/pass_fre executes earlier), and there will be no >> >> > duplicate gimple stmts ? >> > >> > There are passes inbetween widen-mult and the last CSE that (while >> > not very likely) makes it possible (VRP jump threading). >> > >> >> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt >> >> > with subcode TRUNC_MOD_EXPR >> >> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands. >> >> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with >> >> > same operands ? >> > >> > I wouldn't say so, see above. >> Thanks for the explanation. >> > >> >> > d) I am not sure if I have correctly done expansion of divmod to rtx >> >> > in expand_DIVMOD() (I fear I may be missing >> >> > many checks that are in expmed.c:expand_divmod). >> > >> > I'm not sure either, esp. about the libcall path and you choosing >> > a wider mode eventually (is this necessary?). Did you make sure >> > this path works? >> Removed choosing widening modes in this patch. >> During the widening_mul pass it checks is divmod is available for the given >> mode >> and only then does the transform. It probably doesn't make sense to check >> for a wider mode during expand_DIVMOD, since optab_libfunc should be >> available for divmod with the given mode ? > > Yes. > >> > >> > + /* Compute stmt uids. */ >> > + unsigned uid_no = 0; >> > + FOR_EACH_BB_FN (bb, fun) >> > + { >> > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p >> > (gsi); gsi_next (&gsi)) >> > + { >> > + gimple *stmt = gsi_stmt (gsi); >> > + gimple_set_uid (stmt, uid_no++); >> > + } >> > + } >> > >> > renumber_gimple_stmt_uids () >> Thanks. >> >> The attached patch has following additional changes: >> a) Fixes ICE when one of the operands was constant (unconditionally >> used SSA_NAME_DEF_STMT/ FOR_EACH_IMM_USE_FAST on constant tree >> operands). >> b) Adds use_stmt to vector_stmts only if the operands match and in the >> same order. >> I was incorrectly applying the transform to the following case: t1 = x >> / y; t2 = y % x. > > +/* Expand DIVMOD() using: > + a) optab handler for udivmod/sdivmod if it is available > + b) If optab_handler doesn't exist, Generate call to optab_libfunc for > udivmod/sdivmod. */ > > long line > > + expand_twoval_binop (tab, op0, op1, quotient, remainder, > TYPE_UNSIGNED (type)); > + > + > > likewise > > + machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE > (mode), MODE_INT); > > likewise (please verify elsewhere yourself). Our column limit is 80. > > Testcases still need adjustment with an effective tagret. > > Btw, did you investigate code gen differences on x86_64/i586? That > target expands all divisions/modulo ops via divmod, relying on CSE > solely as the HW always computes both div and mod (IIRC). x86_64 has optab_handler for divmod defined, so the transform won't take place on x86. > > + if ((optab_handler (tab, mode) == CODE_FOR_nothing) && !optab_libfunc > (tab, mode)) > + return false; > > As originally said we shouldn't generate divmod library calls if > the target has non-library div or mod implementations. > Thus the proper check would be to > > && (!optab_libfunc (tab ...) > || optab_handler (div) || optab_handler (mod)))) > > + /* FIXME: Is it safe to assume both operands cannot be constant > + since constant folding would have taken place ? */ > + gcc_assert (! (TREE_CONSTANT (op1) && TREE_CONSTANT (op2))); > > On trapping ops 1/0 might prevail for example. So I'd rather > have a runtime check if op is an SSA_NAME > > I'm not sure I would even look at a / 3; a % 3 or 3 / a; 3 % a > though (they tend to have special expansions). > > + FOR_EACH_IMM_USE_FAST (use_p, use_iter, op) > > use FOR_EACH_IMM_USE_STMT to only visit stmts once. > > + { > + gimple *use_stmt = USE_STMT (use_p); > + if (is_gimple_assign (use_stmt) > + && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR > + && !TYPE_OVERFLOW_TRAPS (TREE_TYPE (gimple_assign_lhs > (use_stmt)))) > > Please check this upfront - you know the type from the stmt you are > starting the visiting from. > > + { > + tree u_op1 = gimple_assign_rhs1 (use_stmt); > + tree u_op2 = gimple_assign_rhs2 (use_stmt); > + > + if ((operand_equal_p (op1, u_op1, 0)) && operand_equal_p (op2, > u_op2, 0)) > + maybe_record_divmod (stmts, top_bb, use_stmt); > > + /* If either operand is a constant, treat it's "definition" as > + outside of top_bb. */ > + > + if (!TREE_CONSTANT (op1)) > + { > > check TREE_CODE (op1) == SSA_NAME here and in related places. > > You seem to insert the call at the place of the definitions of > the division/modulo operands rather than either at the point > of the division or the modulo. Don't do that. Insert at > the topmost division/modulo stmt. > > + for (unsigned i = 0; i < stmts.length (); ++i) > + { > + tree rhs; > + use_stmt = stmts[i]; > + > + switch (gimple_assign_rhs_code (use_stmt)) > + { > + case TRUNC_DIV_EXPR: > + rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); > + break; > + > + case TRUNC_MOD_EXPR: > + rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res); > + break; > + > + default: > + gcc_unreachable (); > + } > + > + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs > (use_stmt), rhs); > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); > > Ick. Please use > > gimple_set_rhs_from_tree (use_stmt, res); Um there doesn't seem to be gimple_set_rhs_from_tree. I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt. Is that OK ? > update_stmt (use_stmt); > if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) > cfg_changed = true; > > + free_dominance_info (CDI_DOMINATORS); > > do not free dominators.
I have done the suggested changes in the attached patch. I have a few questions: a) Does the change to insert DIVMOD call before topmost div or mod stmt with matching operands look correct ? b) Handling constants - I dropped handling constants in the attached patch. IIUC we don't want to enable this transform if there's a specialized expansion for some constants for div or mod ? I suppose this would also be target dependent and require a target hook ? For instance arm defines modsi3 pattern to expand mod when 2nd operand is constant and <= 0 or power of 2, while for other cases goes the expand_divmod() route to generate call to __aeabi_idivmod libcall. c) Gating the divmod transform - I tried gating it on checks for optab_handlers on div and mod, however this doesn't enable transform for arm cortex-a9 anymore (cortex-a9 doesn't have hardware instructions for integer div and mod). IIUC for cortex-a9, optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because HAVE_divsi3 is 0. However optab_handler (smod_optab, SImode) matches since optab_handler only checks for existence of pattern (and not whether the pattern gets matched). I suppose we should enable the transform only if the divmod, div, and mod pattern do not match rather than checking if the patterns exist via optab_handler ? For a general x % y, modsi3 would fail to match but optab_handler(smod_optab, SImode ) still says it's matched. Should we define a new target hook combine_divmod, which returns true if transforming to divmod is desirable for that target ? The default definition could be: bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2) { // check for optab_handlers for div/mod/divmod and libfunc for divmod } And for arm, it could be over-ridden to return false if op2 is constant and <= 0 or power of 2. I am not really sure if this is a good idea since I am replicating information from modsi3 pattern. Any change to the pattern may require corresponding change to the hook :/ d) Adding effective-target-check for divmod: I just enabled it for arm*-*-* for now. I could additionally append more targets, not sure if this is the right approach. Thanks, Prathamesh > > Richard. > >> Thanks, >> Prathamesh >> > >> > Richard. >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c index a7da373..c0c84d8 100644 --- a/gcc/internal-fn.c +++ b/gcc/internal-fn.c @@ -2045,6 +2045,47 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED) gcc_unreachable (); } +/* Expand DIVMOD() using: + a) optab handler for udivmod/sdivmod if it is available. + b) If optab_handler doesn't exist, Generate call to + optab_libfunc for udivmod/sdivmod. */ + +static void +expand_DIVMOD (gcall *stmt) +{ + tree lhs = gimple_call_lhs (stmt); + tree arg0 = gimple_call_arg (stmt, 0); + tree arg1 = gimple_call_arg (stmt, 1); + + gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); + tree type = TREE_TYPE (TREE_TYPE (lhs)); + machine_mode mode = TYPE_MODE (type); + optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab; + + rtx op0 = expand_normal (arg0); + rtx op1 = expand_normal (arg1); + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE); + + rtx libfunc = optab_libfunc (tab, mode); + gcc_assert (libfunc != NULL_RTX); + + machine_mode libval_mode = + smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT); + + rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST, + libval_mode, 2, op0, mode, op1, mode); + + rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0); + rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, + GET_MODE_SIZE (mode)); + + /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */ + expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs), + make_tree (TREE_TYPE (arg0), quotient), + make_tree (TREE_TYPE (arg1), remainder)), + target, VOIDmode, EXPAND_NORMAL); +} + /* Routines to expand each internal function, indexed by function number. Each routine has the prototype: diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 78266d9..f28579b 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | ECF_LEAF, ".") /* OpenACC looping abstraction. See internal-fn.h for usage. */ DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL) + +DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL) diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c new file mode 100644 index 0000000..2ec1118 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-1.c @@ -0,0 +1,11 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + int quotient = x / y; + int remainder = x % y; + return quotient + remainder; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c new file mode 100644 index 0000000..974099c --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-2.c @@ -0,0 +1,17 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + extern int early_exit; + + int quot = x / y; + + if (early_exit) + return 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c new file mode 100644 index 0000000..41cda2e --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-3.c @@ -0,0 +1,18 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + extern int flag; + int quot; + + if (flag) + quot = x / y; + else + quot = 0; + + int rem = x % y; + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c new file mode 100644 index 0000000..b4f69b0 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr43721-4.c @@ -0,0 +1,19 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ +/* { dg-require-effective-target divmod } */ + +int f(int x, int y) +{ + int quot = 0; + int rem = 0; + + extern int flag; + + if (flag) + quot = x / y; + else + rem = x % y; + + return quot + rem; +} + +/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index b543519..c3895a7 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -6494,3 +6494,12 @@ proc check_effective_target_vect_max_reduc { } { } return 0 } + +# Return 1 if the target supports divmod + +proc check_effective_target_divmod { } { + if { [istarget arm*-*-*] } { + return 1 + } + return 0 +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 41fcabf..90c2da8 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -110,6 +110,8 @@ along with GCC; see the file COPYING3. If not see #include "tree-ssa.h" #include "builtins.h" #include "params.h" +#include "optabs-libfuncs.h" +#include "tree-eh.h" /* This structure represents one basic block that either computes a division, or is a common dominator for basic block that compute a @@ -182,6 +184,9 @@ static struct /* Number of fp fused multiply-add ops inserted. */ int fmas_inserted; + + /* Number of divmod calls inserted. */ + int divmod_calls_inserted; } widen_mul_stats; /* The instance of "struct occurrence" representing the highest @@ -3493,6 +3498,151 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree op2) return true; } +/* Add use_stmt to stmts if either top_bb or gimple_bb(use_stmt) dominate + each other. If gimple_bb (use_stmt) dominates top_bb, then set top_bb + to gimple_bb (use_stmt). */ + +static bool +maybe_record_divmod (vec<gimple *>& stmts, basic_block& top_bb, + gimple *use_stmt) +{ + basic_block bb = gimple_bb (use_stmt); + + if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) + ; + else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb)) + top_bb = bb; + else + return false; + + stmts.safe_push (use_stmt); + return true; +} + +/* This function looks for: + t1 = a TRUNC_DIV_EXPR b; + t2 = a TRUNC_MOD_EXPR b; + and transforms it to the following sequence: + complex_tmp = DIVMOD (a, b); + t1 = REALPART_EXPR(a); + t2 = IMAGPART_EXPR(b); + This change is done only if the target has support for divmod. + + The pass works in two phases: + 1) Walk through all immediate uses of stmt's operand and find a + TRUNC_DIV_EXPR with matching operands and if such a stmt is found add + it to stmts vector. + 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that + dominates other div/mod stmts with same operands) and update entries in + stmts vector to use return value of DIMOVD (REALEXPR_PART for div, + IMAGPART_EXPR for mod). */ + +static bool +convert_to_divmod (gassign *stmt) +{ + enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt))); + const_tree type = TREE_TYPE (gimple_assign_lhs (stmt)); + optab divmod_optab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab; + + /* FIXME: Check for divmod optab handler or libfunc. */ + + if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing + || (optab_libfunc (divmod_optab, mode) == NULL_RTX)) + return false; + + tree op1 = gimple_assign_rhs1 (stmt); + tree op2 = gimple_assign_rhs2 (stmt); + + /* FIXME: Drop handling constants for now. */ + if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2)) + return false; + + if (TYPE_OVERFLOW_TRAPS (type)) + return false; + + vec<gimple *> stmts = vNULL; + stmts.safe_push (stmt); + basic_block top_bb = gimple_bb (stmt); + + imm_use_iterator use_iter; + gimple *use_stmt; + + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1) + { + if (is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR + && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0) + && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0)) + maybe_record_divmod (stmts, top_bb, use_stmt); + } + + if (stmts.length () == 1) + return false; + + /* Create the library call. */ + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2); + tree res = make_temp_ssa_name ( + build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))), + call_stmt, "divmod_tmp"); + + gimple_call_set_lhs (call_stmt, res); + + widen_mul_stats.divmod_calls_inserted++; + + /* Insert call-stmt just before the topmost div/mod stmt. + top_bb dominates all other basic blocks containing div/mod stms + so, the topmost stmt would be the first div/mod stmt with matching operands + in top_bb. */ + + gcc_assert (top_bb != 0); + gimple_stmt_iterator gsi; + for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next (&gsi)) + { + gimple *g = gsi_stmt (gsi); + if (is_gimple_assign (g) + && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR + || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR) + && operand_equal_p (op1, gimple_assign_rhs1 (g), 0) + && operand_equal_p (op2, gimple_assign_rhs2 (g), 0)) + break; + } + + gcc_assert (!gsi_end_p (gsi)); + gsi_insert_before (&gsi, call_stmt, GSI_SAME_STMT); + + /* Update stmts. */ + bool cfg_changed = false; + for (unsigned i = 0; i < stmts.length (); ++i) + { + tree rhs; + use_stmt = stmts[i]; + + switch (gimple_assign_rhs_code (use_stmt)) + { + case TRUNC_DIV_EXPR: + rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); + break; + + case TRUNC_MOD_EXPR: + rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res); + break; + + default: + gcc_unreachable (); + } + + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); + gimple_assign_set_rhs_from_tree (&gsi, rhs); + update_stmt (use_stmt); + + if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) + cfg_changed = true; + } + + stmts.release (); + return cfg_changed; +} + /* Find integer multiplications where the operands are extended from smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR where appropriate. */ @@ -3535,7 +3685,10 @@ pass_optimize_widening_mul::execute (function *fun) basic_block bb; bool cfg_changed = false; + calculate_dominance_info (CDI_DOMINATORS); + memset (&widen_mul_stats, 0, sizeof (widen_mul_stats)); + renumber_gimple_stmt_uids (); FOR_EACH_BB_FN (bb, fun) { @@ -3563,6 +3716,10 @@ pass_optimize_widening_mul::execute (function *fun) } break; + case TRUNC_MOD_EXPR: + convert_to_divmod (as_a<gassign *> (stmt)); + break; + case PLUS_EXPR: case MINUS_EXPR: convert_plusminus_to_widen (&gsi, stmt, code); @@ -3614,6 +3771,8 @@ pass_optimize_widening_mul::execute (function *fun) widen_mul_stats.maccs_inserted); statistics_counter_event (fun, "fused multiply-adds inserted", widen_mul_stats.fmas_inserted); + statistics_counter_event (fun, "divmod calls inserted", + widen_mul_stats.divmod_calls_inserted); return cfg_changed ? TODO_cleanup_cfg : 0; }
2015-11-02 Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> * internal-fn.def: Add entry for expand_DIVMOD. * internal-fn.c (expand_DIVMOD): New function. * tree-ssa-math-opts.c: (maybe_record_divmod): Likewise. * tree-ssa-math-opts.c: (convert_to_divmod): Likewise. * tree-ssa-math-opts.c: Include optabs-libfuncs.h. * tree-ssa-math-opts.c: (widen_mul_stats): New member divmod_calls_inserted. * tree-ssa-math-opts.c: (pass_optimize_widening_mul::execute): Call caluclate_dominace_info, call renumber_gimple_stmt_uids, and add check for TRUNC_MOD_EXPR. testsuite/ * gcc.dg/pr43721-1.c: New test. * gcc.dg/pr43721-2.c: Likewise. * gcc.dg/pr43721-3.c: Likewise. * gcc.dg/pr43721-4.c: Likewise. * lib/target-supports.exp: Add check for divmod.