On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote:
> On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 2 November 2015 at 18:31, Richard Biener <rguent...@suse.de> wrote:
>> > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 2 November 2015 at 13:20, Prathamesh Kulkarni
>> >> <prathamesh.kulka...@linaro.org> wrote:
>> >> > On 30 October 2015 at 15:57, Richard Biener 
>> >> > <richard.guent...@gmail.com> wrote:
>> >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>> >> >> <prathamesh.kulka...@linaro.org> wrote:
>> >> >>> Hi,
>> >> >>> I have attached revamped version of Kugan's patch
>> >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>> >> >>> Test-case: http://pastebin.com/QMfpXLD9
>> >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp
>> >> >>> Assembly: http://pastebin.com/kk2HZpvA
>> >> >>>
>> >> >>> The approach I took is similar to sincos pass, which involves two 
>> >> >>> parts:
>> >> >>> a) divmod pass that transforms:
>> >> >>> t1 = a / b;
>> >> >>> t2 = a % b;
>> >> >>> to the following sequence:
>> >> >>> complex_tmp = DIVMOD (a, b);
>> >> >>> t1 = REALPART_EXPR(a);
>> >> >>> t2 = IMAGPART_EXPR(b);
>> >> >>>
>> >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>> >> >>> expand_DIVMOD().
>> >> >>> I am not sure if I have done this correctly. I was somehow looking to
>> >> >>> reuse expand_divmod() but am not able to figure out how to do that
>> >> >>> (AFAIU expand_divmod() strictly returns either the quotient or
>> >> >>> remainder but never both which is what I want), so ended up with
>> >> >>> manually expanding to rtx.
>> >> >>>
>> >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>> >> >>> understand the following:
>> >> >>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>> >> >>>           cfg_changed = true;
>> >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>> >> >>
>> >> >> The call replacement might no longer throw.
>> >> >>
>> >> >>> A silly question, what options to pass to gcc to print statistics ? I
>> >> >>> added statistics_counter_event to keep track of number of calls
>> >> >>> inserted/used but not sure how to print them :/
>> >> >>
>> >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats
>> >> > Thanks, I can see it now -;)
>> >> >>
>> >> >>> I would be grateful for suggestions for improving the patch.
>> >> >>
>> >> >> First of all new functions need comments (expand_DIVMOD).
>> >> >>
>> >> >> Second - what's the reasoning for the pass placement seen?  I think
>> >> >> the transform is a lowering one, improving RTL expansion.  The
>> >> >> DIVMOD representation is harmful for GIMPLE optimizers.
>> >> >>
>> >> >> Third - I'd have integrated this with an existing pass - we have 
>> >> >> another
>> >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> >> >> Please piggy-back on it.
>> >> >>
>> >> >> You seem to do the transform unconditionally even if the target has
>> >> >> instructions for division and modulus but no instruction for divmod
>> >> >> in which case you'll end up emitting a library call in the end instead
>> >> >> of a div and mod instruction.  I think the transform should be gated on
>> >> >> missing div/mod instructions or the availability of a divmod one.
>> >> >>
>> >> >> +         if (is_gimple_assign (stmt)
>> >> >> +             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
>> >> >> tcc_binary)
>> >> >> +           {
>> >> >> +             if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> >> >> +               cfg_changed |= execute_divmod_1 (stmt);
>> >> >>
>> >> >> you can directly check gimple_assign_rhs_code.  I'd check for 
>> >> >> TRUNC_MOD_EXPR
>> >> >> which seems to be less common and thus should cut down compile-time.
>> >> >>
>> >> >> +  /* Case 2: When both are in top_bb */
>> >> >> +  else if (op1_def_in_bb && op2_def_in_bb)
>> >> >> +    {
>> >> >> +      /* FIXME: better way to compare iterators?.  */
>> >> >> +      gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> >> >> def_stmt_op1, def_stmt_op2);
>> >> >>
>> >> >> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> >> >> this test efficiently
>> >> >> you need stmt UIDs computed.
>> >> >>
>> >> >> I miss a testcase (or two).
>> >> > I have tried to address your comments in the attached patch.
>> >> oops, unsigned uid_no = 0; should be outside the loop in
>> >> pass_optimize_widening_mul::execute.
>> >> Done in this version of the patch.
>> >>
>> >> Thanks,
>> >> Prathamesh
>> >> > Could you please review it for me ?
>> >
>> > For the testcases you need a new target check for divmod.
>> Sorry I didn't understand.
>> Should I add sth similar to /* {  dg-require-effective-target divmod } */ ?
>
> Yes.
>
>> >
>> >> > I have a few questions:
>> >> >
>> >> > a) Is the check for availability of divmod correct ?
>> >
>> > You simply want optab_handler (tab, mode) != CODE_FOR_nothing
>> >
>> > The libfunc is always available.
>> Um I am probably missing something, I thought libfuncs are initialized
>> by init_optabs()
>> but the target can override/delete that with the target hook
>> targetm.init_libfuncs () ?
>> On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX.
>
> Hum, I see.  I thought we always have the libfunc if we don't have
> the instruction.
>
>> >
>> > When looking at TRUNC_DIV_EXPR you should also exclude
>> > the case where TYPE_OVERFLOW_TRAPS (type) as that should
>> > expand using the [su]divv optabs (no trapping overflow
>> > divmod optab exists).
>> >
>> >> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
>> >> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 
>> >> > == bb1).
>> >> > However I wonder if we should also check that if bb0 dominates bb1
>> >> > then bb1 should
>> >> > post-dominate bb0 ?
>> >
>> > Depends on how expensive divmod is compared to div or mod.  An alternative
>> > transform would be to look whether [us]mod optabs are available and if not
>> > implement modulo via div and subtract, CSEing the division itself.  Not
>> > sure if any target that provides div patterns does not also provide
>> > mod ones.
>> >
>> >> > For instance the transform gets applied to test-case in pr43721-2.c.
>> >> > bb containing rem = x % y doesn't post-dominate bb containing quot = x 
>> >> > / y;
>> >> > I wonder if we want to perform the transform for this case since 
>> >> > control flow
>> >> > may not always reach from quot = x / y to rem = x % y ?
>> >
>> > See above, it's a cost question.  We do for sincos as computing sin or
>> > cos if the other is already computed is cheap.  I would suggest the
>> > same is true for div/mod.
>> >
>> >> > c) A silly queston, I wonder if vec<gimple *> stmts  in 
>> >> > convert_to_divmod() will
>> >> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
>> >> > the vector) ?
>> >> > I assume that cse will take place before widening_mul pass executes
>> >> > (since pass_fre/pass_fre executes earlier), and there will be no
>> >> > duplicate gimple stmts ?
>> >
>> > There are passes inbetween widen-mult and the last CSE that (while
>> > not very likely) makes it possible (VRP jump threading).
>> >
>> >> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt
>> >> > with subcode  TRUNC_MOD_EXPR
>> >> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
>> >> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
>> >> > same operands ?
>> >
>> > I wouldn't say so, see above.
>> Thanks for the explanation.
>> >
>> >> > d) I am not sure if I have correctly done expansion of divmod to rtx
>> >> > in expand_DIVMOD() (I fear I may be missing
>> >> > many checks that are in expmed.c:expand_divmod).
>> >
>> > I'm not sure either, esp. about the libcall path and you choosing
>> > a wider mode eventually (is this necessary?).  Did you make sure
>> > this path works?
>> Removed choosing widening modes in this patch.
>> During the widening_mul pass it checks is divmod is available for the given 
>> mode
>> and only then does the transform. It probably doesn't make sense to check
>> for a wider mode during expand_DIVMOD, since optab_libfunc should be
>> available for divmod with the given mode ?
>
> Yes.
>
>> >
>> > +  /* Compute stmt uids.  */
>> > +  unsigned uid_no = 0;
>> > +  FOR_EACH_BB_FN (bb, fun)
>> > +    {
>> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
>> > (gsi); gsi_next (&gsi))
>> > +       {
>> > +         gimple *stmt = gsi_stmt (gsi);
>> > +         gimple_set_uid (stmt, uid_no++);
>> > +       }
>> > +    }
>> >
>> > renumber_gimple_stmt_uids ()
>> Thanks.
>>
>> The attached patch has following additional changes:
>> a) Fixes ICE when one of the operands was constant (unconditionally
>> used SSA_NAME_DEF_STMT/ FOR_EACH_IMM_USE_FAST on constant tree
>> operands).
>> b) Adds use_stmt to vector_stmts only if the operands match and in the
>> same order.
>> I was incorrectly applying the transform to the following case: t1 = x
>> / y; t2 = y % x.
>
> +/* Expand DIVMOD() using:
> + a) optab handler for udivmod/sdivmod if it is available
> + b) If optab_handler doesn't exist, Generate call to optab_libfunc for
> udivmod/sdivmod.  */
>
> long line
>
> +      expand_twoval_binop (tab, op0, op1, quotient, remainder,
> TYPE_UNSIGNED (type));
> +
> +
>
> likewise
>
> +  machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE
> (mode), MODE_INT);
>
> likewise (please verify elsewhere yourself).  Our column limit is 80.
>
> Testcases still need adjustment with an effective tagret.
>
> Btw, did you investigate code gen differences on x86_64/i586?  That
> target expands all divisions/modulo ops via divmod, relying on CSE
> solely as the HW always computes both div and mod (IIRC).
x86_64 has optab_handler for divmod defined, so the transform won't
take place on x86.
>
> +  if ((optab_handler (tab, mode) == CODE_FOR_nothing) && !optab_libfunc
> (tab, mode))
> +    return false;
>
> As originally said we shouldn't generate divmod library calls if
> the target has non-library div or mod implementations.
> Thus the proper check would be to
>
>        && (!optab_libfunc (tab ...)
>            || optab_handler (div) || optab_handler (mod))))
>
> +  /* FIXME: Is it safe to assume both operands cannot be constant
> +     since constant folding would have taken place ?  */
> +  gcc_assert (! (TREE_CONSTANT (op1) && TREE_CONSTANT (op2)));
>
> On trapping ops 1/0 might prevail for example.  So I'd rather
> have a runtime check if op is an SSA_NAME
>
> I'm not sure I would even look at a / 3; a % 3 or 3 / a; 3 % a
> though (they tend to have special expansions).
>
> +  FOR_EACH_IMM_USE_FAST (use_p, use_iter, op)
>
> use FOR_EACH_IMM_USE_STMT to only visit stmts once.
>
> +    {
> +      gimple *use_stmt = USE_STMT (use_p);
> +      if (is_gimple_assign (use_stmt)
> +         && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
> +        && !TYPE_OVERFLOW_TRAPS (TREE_TYPE (gimple_assign_lhs
> (use_stmt))))
>
> Please check this upfront - you know the type from the stmt you are
> starting the visiting from.
>
> +       {
> +         tree u_op1 = gimple_assign_rhs1 (use_stmt);
> +         tree u_op2 = gimple_assign_rhs2 (use_stmt);
> +
> +         if ((operand_equal_p (op1, u_op1, 0)) && operand_equal_p (op2,
> u_op2, 0))
> +          maybe_record_divmod (stmts, top_bb, use_stmt);
>
> +  /* If either operand is a constant, treat it's "definition" as
> +     outside of top_bb.  */
> +
> +  if (!TREE_CONSTANT (op1))
> +    {
>
> check TREE_CODE (op1) == SSA_NAME here and in related places.
>
> You seem to insert the call at the place of the definitions of
> the division/modulo operands rather than either at the point
> of the division or the modulo.  Don't do that.  Insert at
> the topmost division/modulo stmt.
>
> +  for (unsigned i = 0; i < stmts.length (); ++i)
> +    {
> +      tree rhs;
> +      use_stmt = stmts[i];
> +
> +      switch (gimple_assign_rhs_code (use_stmt))
> +       {
> +         case TRUNC_DIV_EXPR:
> +           rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
> +           break;
> +
> +         case TRUNC_MOD_EXPR:
> +           rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
> +           break;
> +
> +         default:
> +           gcc_unreachable ();
> +       }
> +
> +        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs
> (use_stmt), rhs);
> +        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>
> Ick.  Please use
>
>     gimple_set_rhs_from_tree (use_stmt, res);
Um there doesn't seem to be gimple_set_rhs_from_tree.
I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
Is that OK ?
>     update_stmt (use_stmt);
>     if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>       cfg_changed = true;
>
> +  free_dominance_info (CDI_DOMINATORS);
>
> do not free dominators.

I have done the suggested changes in the attached patch.
I have a few questions:

a) Does the change to insert DIVMOD call before topmost div or mod
stmt with matching operands
look correct ?

b) Handling constants - I dropped handling constants in the attached
patch. IIUC we don't want
to enable this transform if there's a specialized expansion for some
constants for div or mod ?
I suppose this would also be target dependent and require a target hook ?
For instance arm defines modsi3 pattern to expand mod when 2nd operand
is constant and <= 0 or power of 2,
while for other cases goes the expand_divmod() route to generate call
to __aeabi_idivmod libcall.

c) Gating the divmod transform -
I tried gating it on checks for optab_handlers on div and mod, however
this doesn't enable transform for arm cortex-a9
anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
IIUC for cortex-a9,
optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
HAVE_divsi3 is 0.
However optab_handler (smod_optab, SImode) matches since optab_handler
only checks for existence of pattern
(and not whether the pattern gets matched).
I suppose we should enable the transform only if the divmod, div, and
mod pattern do not match rather than checking
if the patterns exist via optab_handler ? For a general x % y, modsi3
would fail to match but optab_handler(smod_optab, SImode ) still
says it's matched.

Should we define a new target hook combine_divmod, which returns true
if transforming to divmod is desirable for that
target ?
The default definition could be:
bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
{
  // check for optab_handlers for div/mod/divmod and libfunc for divmod
}

And for arm, it could be over-ridden to return false if op2 is
constant and <= 0 or power of 2.
I am not really sure if this is a good idea since I am replicating
information from modsi3 pattern.
Any change to the pattern may require corresponding change to the hook :/

d) Adding effective-target-check for divmod: I just enabled it for
arm*-*-* for now. I could additionally append more targets,
not sure if this is the right approach.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> > Richard.
>>
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a7da373..c0c84d8 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2045,6 +2045,47 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED)
   gcc_unreachable ();
 }
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void
+expand_DIVMOD (gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+  
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); 
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx libfunc = optab_libfunc (tab, mode);
+  gcc_assert (libfunc != NULL_RTX);
+  
+  machine_mode libval_mode =
+    smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT);
+
+  rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                       libval_mode, 2, op0, mode, op1, mode);
+
+  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
+                                      GET_MODE_SIZE (mode));
+
+  /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+                      make_tree (TREE_TYPE (arg0), quotient),
+                      make_tree (TREE_TYPE (arg1), remainder)),
+             target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 78266d9..f28579b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | 
ECF_LEAF, ".")
 
 /* OpenACC looping abstraction.  See internal-fn.h for usage.  */
 DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
+
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c
new file mode 100644
index 0000000..2ec1118
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-1.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+  int quotient = x / y;
+  int remainder = x % y;
+  return quotient + remainder;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c
new file mode 100644
index 0000000..974099c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-2.c
@@ -0,0 +1,17 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+  extern int early_exit;
+
+  int quot = x / y;
+
+  if (early_exit)
+    return 0;
+
+  int rem = x % y;
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c
new file mode 100644
index 0000000..41cda2e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-3.c
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+  extern int flag;
+  int quot;
+
+  if (flag)
+    quot = x / y;
+  else
+    quot = 0;
+
+  int rem = x % y;
+  return quot + rem; 
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c
new file mode 100644
index 0000000..b4f69b0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-4.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+/* { dg-require-effective-target divmod } */
+
+int f(int x, int y)
+{
+  int quot = 0;
+  int rem = 0;
+
+  extern int flag;
+
+  if (flag)
+    quot = x / y;
+  else
+    rem = x % y;
+
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index b543519..c3895a7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6494,3 +6494,12 @@ proc check_effective_target_vect_max_reduc { } {
     }
     return 0
 }
+
+# Return 1 if the target supports divmod
+
+proc check_effective_target_divmod { } {
+    if { [istarget arm*-*-*] } {
+       return 1
+    }
+    return 0
+}
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 41fcabf..90c2da8 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -110,6 +110,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "builtins.h"
 #include "params.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -182,6 +184,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3493,6 +3498,151 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree 
op2)
   return true;
 }
 
+/* Add use_stmt to stmts if either top_bb or gimple_bb(use_stmt) dominate
+   each other. If gimple_bb (use_stmt) dominates top_bb, then set top_bb
+   to gimple_bb (use_stmt).  */
+
+static bool
+maybe_record_divmod (vec<gimple *>& stmts, basic_block& top_bb,
+                    gimple *use_stmt)
+{
+  basic_block bb = gimple_bb (use_stmt);
+
+  if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+    ;
+  else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+    top_bb = bb;
+  else
+    return false;
+
+  stmts.safe_push (use_stmt);
+  return true;
+}  
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR(a);
+   t2 = IMAGPART_EXPR(b);
+   This change is done only if the target has support for divmod.
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt)));
+  const_tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  optab divmod_optab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab;
+
+  /* FIXME: Check for divmod optab handler or libfunc.  */
+
+  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+      || (optab_libfunc (divmod_optab, mode) == NULL_RTX))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+  
+  /* FIXME: Drop handling constants for now.  */
+  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
+    return false;
+
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  vec<gimple *> stmts = vNULL;
+  stmts.safe_push (stmt);
+  basic_block top_bb = gimple_bb (stmt);
+
+  imm_use_iterator use_iter;
+  gimple *use_stmt; 
+ 
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+          && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+         && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+         && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+       maybe_record_divmod (stmts, top_bb, use_stmt);
+    }
+  
+  if (stmts.length () == 1)
+    return false;
+
+  /* Create the library call.  */
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+               build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))),
+               call_stmt, "divmod_tmp");
+
+  gimple_call_set_lhs (call_stmt, res);
+
+  widen_mul_stats.divmod_calls_inserted++; 
+
+  /* Insert call-stmt just before the topmost div/mod stmt.
+     top_bb dominates all other basic blocks containing div/mod stms
+     so, the topmost stmt would be the first div/mod stmt with matching 
operands
+     in top_bb.  */
+
+  gcc_assert (top_bb != 0);
+  gimple_stmt_iterator gsi;
+  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *g = gsi_stmt (gsi);
+      if (is_gimple_assign (g)
+         && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
+            || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
+         && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
+         && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
+       break;
+    }
+
+  gcc_assert (!gsi_end_p (gsi));
+  gsi_insert_before (&gsi, call_stmt, GSI_SAME_STMT);
+
+  /* Update stmts. */
+  bool cfg_changed = false;
+  for (unsigned i = 0; i < stmts.length (); ++i) 
+    {
+      tree rhs;
+      use_stmt = stmts[i];
+ 
+      switch (gimple_assign_rhs_code (use_stmt))
+       {
+         case TRUNC_DIV_EXPR:
+           rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         case TRUNC_MOD_EXPR:
+           rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         default:
+           gcc_unreachable (); 
+       }
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+       cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}
+
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
    where appropriate.  */
@@ -3535,7 +3685,10 @@ pass_optimize_widening_mul::execute (function *fun)
   basic_block bb;
   bool cfg_changed = false;
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3563,6 +3716,10 @@ pass_optimize_widening_mul::execute (function *fun)
                    }
                  break;
 
+               case TRUNC_MOD_EXPR:
+                 convert_to_divmod (as_a<gassign *> (stmt));
+                 break;
+
                case PLUS_EXPR:
                case MINUS_EXPR:
                  convert_plusminus_to_widen (&gsi, stmt, code);
@@ -3614,6 +3771,8 @@ pass_optimize_widening_mul::execute (function *fun)
                            widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
                            widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+                           widen_mul_stats.divmod_calls_inserted);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
2015-11-02  Prathamesh Kulkarni  <prathamesh.kulka...@linaro.org>
            Kugan Vivekanandarajah  <kugan.vivekanandara...@linaro.org> 

        * internal-fn.def: Add entry for expand_DIVMOD.
        * internal-fn.c (expand_DIVMOD): New function.
        * tree-ssa-math-opts.c: (maybe_record_divmod): Likewise. 
        * tree-ssa-math-opts.c: (convert_to_divmod): Likewise.
        * tree-ssa-math-opts.c: Include optabs-libfuncs.h. 
        * tree-ssa-math-opts.c: (widen_mul_stats): New member 
divmod_calls_inserted.
        * tree-ssa-math-opts.c: (pass_optimize_widening_mul::execute): Call 
caluclate_dominace_info,
        call renumber_gimple_stmt_uids, and add check for TRUNC_MOD_EXPR.

testsuite/
        * gcc.dg/pr43721-1.c: New test.
        * gcc.dg/pr43721-2.c: Likewise.
        * gcc.dg/pr43721-3.c: Likewise.
        * gcc.dg/pr43721-4.c: Likewise.
        * lib/target-supports.exp: Add check for divmod. 

Reply via email to