On 2 November 2015 at 13:20, Prathamesh Kulkarni
<prathamesh.kulka...@linaro.org> wrote:
> On 30 October 2015 at 15:57, Richard Biener <richard.guent...@gmail.com> 
> wrote:
>> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni
>> <prathamesh.kulka...@linaro.org> wrote:
>>> Hi,
>>> I have attached revamped version of Kugan's patch
>>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721.
>>> Test-case: http://pastebin.com/QMfpXLD9
>>> divmod pass dump: http://pastebin.com/yMY1ikCp
>>> Assembly: http://pastebin.com/kk2HZpvA
>>>
>>> The approach I took is similar to sincos pass, which involves two parts:
>>> a) divmod pass that transforms:
>>> t1 = a / b;
>>> t2 = a % b;
>>> to the following sequence:
>>> complex_tmp = DIVMOD (a, b);
>>> t1 = REALPART_EXPR(a);
>>> t2 = IMAGPART_EXPR(b);
>>>
>>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by
>>> expand_DIVMOD().
>>> I am not sure if I have done this correctly. I was somehow looking to
>>> reuse expand_divmod() but am not able to figure out how to do that
>>> (AFAIU expand_divmod() strictly returns either the quotient or
>>> remainder but never both which is what I want), so ended up with
>>> manually expanding to rtx.
>>>
>>> While going through the sincos pass in execute_cse_sincos_1, I didn't
>>> understand the following:
>>>  if (gimple_purge_dead_eh_edges (gimple_bb (stmt)))
>>>           cfg_changed = true;
>>> Um why is the call to gimple_purge_dead_eh_edges necessary here?
>>
>> The call replacement might no longer throw.
>>
>>> A silly question, what options to pass to gcc to print statistics ? I
>>> added statistics_counter_event to keep track of number of calls
>>> inserted/used but not sure how to print them :/
>>
>> -fdump-tree-XXX-stats or -fdump-statistics-stats
> Thanks, I can see it now -;)
>>
>>> I would be grateful for suggestions for improving the patch.
>>
>> First of all new functions need comments (expand_DIVMOD).
>>
>> Second - what's the reasoning for the pass placement seen?  I think
>> the transform is a lowering one, improving RTL expansion.  The
>> DIVMOD representation is harmful for GIMPLE optimizers.
>>
>> Third - I'd have integrated this with an existing pass - we have another
>> lowering / RTL expansion kind pass which is pass_optimize_widening_mul.
>> Please piggy-back on it.
>>
>> You seem to do the transform unconditionally even if the target has
>> instructions for division and modulus but no instruction for divmod
>> in which case you'll end up emitting a library call in the end instead
>> of a div and mod instruction.  I think the transform should be gated on
>> missing div/mod instructions or the availability of a divmod one.
>>
>> +         if (is_gimple_assign (stmt)
>> +             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == 
>> tcc_binary)
>> +           {
>> +             if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR)
>> +               cfg_changed |= execute_divmod_1 (stmt);
>>
>> you can directly check gimple_assign_rhs_code.  I'd check for TRUNC_MOD_EXPR
>> which seems to be less common and thus should cut down compile-time.
>>
>> +  /* Case 2: When both are in top_bb */
>> +  else if (op1_def_in_bb && op2_def_in_bb)
>> +    {
>> +      /* FIXME: better way to compare iterators?.  */
>> +      gimple_stmt_iterator gsi = get_later_stmt (top_bb,
>> def_stmt_op1, def_stmt_op2);
>>
>> You can't use get_later_stmt, it's a vectorizer speciality.  To do
>> this test efficiently
>> you need stmt UIDs computed.
>>
>> I miss a testcase (or two).
> I have tried to address your comments in the attached patch.
oops, unsigned uid_no = 0; should be outside the loop in
pass_optimize_widening_mul::execute.
Done in this version of the patch.

Thanks,
Prathamesh
> Could you please review it for me ?
>
> I have a few questions:
>
> a) Is the check for availability of divmod correct ?
>
> b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1.
> I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 == 
> bb1).
> However I wonder if we should also check that if bb0 dominates bb1
> then bb1 should
> post-dominate bb0 ?
> For instance the transform gets applied to test-case in pr43721-2.c.
> bb containing rem = x % y doesn't post-dominate bb containing quot = x / y;
> I wonder if we want to perform the transform for this case since control flow
> may not always reach from quot = x / y to rem = x % y ?
>
> c) A silly queston, I wonder if vec<gimple *> stmts  in convert_to_divmod() 
> will
> have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in
> the vector) ?
> I assume that cse will take place before widening_mul pass executes
> (since pass_fre/pass_fre executes earlier), and there will be no
> duplicate gimple stmts ?
> So vec<gimple *> stmts would contain at most 2 entries - gimple stmt
> with subcode  TRUNC_MOD_EXPR
> and gimple stmt with subcode TRUNC_DIV_EXPR having same operands.
> There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with
> same operands ?
>
> d) I am not sure if I have correctly done expansion of divmod to rtx
> in expand_DIVMOD() (I fear I may be missing
> many checks that are in expmed.c:expand_divmod).
>
> Thanks,
> Prathamesh
>>
>> Richard.
>>
>>
>>> Thank you,
>>> Prathamesh
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a7da373..1fda966 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2045,6 +2045,75 @@ expand_GOACC_LOOP (gcall *stmt ATTRIBUTE_UNUSED)
   gcc_unreachable ();
 }
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available
+ b) If optab_handler doesn't exist, Generate call to optab_libfunc for 
udivmod/sdivmod.  */ 
+
+static void
+expand_DIVMOD (gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+  
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); 
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  /* Check if optab handler exists for udivmod/sdivmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      rtx quotient = gen_reg_rtx (mode);
+      rtx remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, TYPE_UNSIGNED 
(type));
+  
+      /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
+      expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+                  make_tree (TREE_TYPE (arg0), quotient),
+                  make_tree (TREE_TYPE (arg1), remainder)),
+                  target, VOIDmode, EXPAND_NORMAL);
+
+      return;
+    }
+
+  rtx libfunc = NULL_RTX;
+  machine_mode compute_mode;
+  for (compute_mode = mode;
+       compute_mode != VOIDmode; 
+       compute_mode = GET_MODE_WIDER_MODE (compute_mode))
+    {
+      libfunc = optab_libfunc (tab, compute_mode);
+      if (libfunc != NULL_RTX)
+       break;
+    }
+
+  gcc_assert (libfunc != NULL_RTX);
+  
+  if (compute_mode != mode)
+    {
+      op0 = convert_modes (compute_mode, GET_MODE (op0), op0, tab);
+      op1 = convert_modes (compute_mode, GET_MODE (op1), op1, tab);
+    }
+
+  machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE 
(compute_mode), MODE_INT);
+  rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                       libval_mode, 2, op0, compute_mode, op1, 
compute_mode);   
+
+  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode, 
GET_MODE_SIZE (compute_mode));
+
+  /* Wrap the return value (quotient, remaineder) within COMPLEX_EXPR */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+                      make_tree (TREE_TYPE (arg0), quotient),
+                      make_tree (TREE_TYPE (arg1), remainder)),
+             target, VOIDmode, EXPAND_NORMAL);
+}
+
 /* Routines to expand each internal function, indexed by function number.
    Each routine has the prototype:
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 78266d9..f28579b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -83,3 +83,5 @@ DEF_INTERNAL_FN (GOACC_DIM_POS, ECF_PURE | ECF_NOTHROW | 
ECF_LEAF, ".")
 
 /* OpenACC looping abstraction.  See internal-fn.h for usage.  */
 DEF_INTERNAL_FN (GOACC_LOOP, ECF_PURE | ECF_NOTHROW, NULL)
+
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c
new file mode 100644
index 0000000..8873d9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-1.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  int quotient = x / y;
+  int remainder = x % y;
+  return quotient + remainder;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c
new file mode 100644
index 0000000..62d73af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-2.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  extern int early_exit;
+
+  int quot = x / y;
+
+  if (early_exit)
+    return 0;
+
+  int rem = x % y;
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c
new file mode 100644
index 0000000..74816a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-3.c
@@ -0,0 +1,17 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  extern int flag;
+  int quot;
+
+  if (flag)
+    quot = x / y;
+  else
+    quot = 0;
+
+  int rem = x % y;
+  return quot + rem; 
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c
new file mode 100644
index 0000000..fd82ad8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-4.c
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  int quot = 0;
+  int rem = 0;
+
+  extern int flag;
+
+  if (flag)
+    quot = x / y;
+  else
+    rem = x % y;
+
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index 41fcabf..3dc09ef 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -110,6 +110,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa.h"
 #include "builtins.h"
 #include "params.h"
+#include "optabs-libfuncs.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -182,6 +183,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3493,6 +3497,166 @@ convert_mult_to_fma (gimple *mul_stmt, tree op1, tree 
op2)
   return true;
 }
 
+/* Add use_stmt to stmts if either top_bb or gimple_bb(use_stmt) dominate each 
other.
+   If gimple_bb (use_stmt) dominates top_bb, then set top_bb to gimple_bb 
(use_stmt).  */
+
+static bool
+maybe_record_divmod (vec<gimple *>& stmts, basic_block& top_bb, gimple 
*use_stmt)
+{
+  basic_block bb = gimple_bb (use_stmt);
+
+  if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
+    ;
+  else if (dominated_by_p (CDI_DOMINATORS, bb, top_bb))
+    top_bb = bb;
+  else
+    return false;
+
+  stmts.safe_push (use_stmt);
+  return true;
+}  
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR(a);
+   t2 = IMAGPART_EXPR(b);
+   This change is done only if the target has support for divmod.
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a 
TRUNC_DIV_EXPR with matching operands
+      and if such a stmt is found add it to stmts vector.
+   2) Insert DIVMOD() internal function in the basic block that dominates all 
statements in stmts
+      vector (top_bb) and upates statements in stmts vector to use return 
value of DIVMOD.  */ 
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  /* Check if divmod is available for the target.  */
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt)));
+  const_tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  optab tab = (TYPE_UNSIGNED (type)) ? udivmod_optab : sdivmod_optab;
+
+  if (!(optab_handler (tab, mode) || optab_libfunc (tab, mode)))
+    return false;
+
+  vec<gimple *> stmts = vNULL;
+  stmts.safe_push (stmt);
+  basic_block top_bb = gimple_bb (stmt);
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  use_operand_p use_p;
+  imm_use_iterator use_iter;
+
+  FOR_EACH_IMM_USE_FAST (use_p, use_iter, op1)
+    {
+      gimple *use_stmt = USE_STMT (use_p);
+      if (is_gimple_assign (use_stmt)
+         && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR) 
+       {
+         tree u_op1 = gimple_assign_rhs1 (use_stmt);
+         tree u_op2 = gimple_assign_rhs2 (use_stmt);
+
+         if ((operand_equal_p (op1, u_op1, 0) && operand_equal_p (op2, u_op2, 
0))
+             || (operand_equal_p (op1, u_op2, 0) && operand_equal_p (op2, 
u_op1, 0)))
+             maybe_record_divmod (stmts, top_bb, use_stmt);
+       }
+    }
+
+  if (stmts.length () == 1)
+    return false;
+
+  /* Create the library call.  */
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE 
(gimple_assign_lhs (stmt))), call_stmt, "divmod_tmp"); 
+  gimple_call_set_lhs (call_stmt, res);
+
+  widen_mul_stats.divmod_calls_inserted++; 
+
+  /* Insert the call-stmt.  */ 
+  gimple *def_stmt_op1 = SSA_NAME_DEF_STMT (op1); 
+  bool op1_def_in_bb = 
+    (!SSA_NAME_IS_DEFAULT_DEF (op1)
+    && gimple_code (def_stmt_op1) != GIMPLE_PHI
+    && gimple_bb (def_stmt_op1) == top_bb);
+ 
+  gimple *def_stmt_op2 = SSA_NAME_DEF_STMT (op2); 
+  bool op2_def_in_bb = 
+    (!SSA_NAME_IS_DEFAULT_DEF (op2)
+    && gimple_code (def_stmt_op2) != GIMPLE_PHI
+    && gimple_bb (def_stmt_op2) == top_bb);
+
+  /* 3 cases arise where to insert the call to divmod depending upon where op1 
and op2 are defined wrt top_bb:
+   * Case 1: When both def are outside top_bb, simply insert after labels */
+  if (!op1_def_in_bb && !op2_def_in_bb)
+    {
+      gcc_assert (top_bb != 0);
+      gimple_stmt_iterator gsi = gsi_after_labels (top_bb);
+      gsi_insert_before (&gsi, call_stmt, GSI_SAME_STMT);
+    }
+
+  /* Case 2: When both def are in top_bb.  */
+  else if (op1_def_in_bb && op2_def_in_bb)
+    {
+      gimple_stmt_iterator gsi;
+
+      if (gimple_uid (def_stmt_op1) < gimple_uid (def_stmt_op2))
+       gsi = gsi_for_stmt (def_stmt_op2);
+      else
+       gsi = gsi_for_stmt (def_stmt_op1);
+
+      gsi_insert_after (&gsi, call_stmt, GSI_SAME_STMT);
+    }
+
+  /* Case 3: When one def is inside top_bb and one is outside.  */
+  else
+    {
+      gimple_stmt_iterator gsi;
+      if (op1_def_in_bb)
+       gsi = gsi_for_stmt (def_stmt_op1);
+      else
+       gsi = gsi_for_stmt (def_stmt_op2);
+
+      gsi_insert_after (&gsi, call_stmt, GSI_SAME_STMT);
+    }
+
+  /* Update stmts. */
+  bool cfg_changed = false;
+  gimple *use_stmt;
+  for (unsigned i = 0; i < stmts.length (); ++i) 
+    {
+      tree rhs;
+      use_stmt = stmts[i];
+ 
+      switch (gimple_assign_rhs_code (use_stmt))
+       {
+         case TRUNC_DIV_EXPR:
+           rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         case TRUNC_MOD_EXPR:
+           rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         default:
+           gcc_unreachable (); 
+       }
+
+        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs 
(use_stmt), rhs);
+        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+        gsi_replace (&gsi, assign_stmt, true);
+        if (gimple_purge_dead_eh_edges (gimple_bb (assign_stmt)))
+         cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}
+
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
    where appropriate.  */
@@ -3535,8 +3699,21 @@ pass_optimize_widening_mul::execute (function *fun)
   basic_block bb;
   bool cfg_changed = false;
 
+  calculate_dominance_info (CDI_DOMINATORS);
+
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
 
+  /* Compute stmt uids.  */
+  unsigned uid_no = 0;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); 
gsi_next (&gsi))
+       {
+         gimple *stmt = gsi_stmt (gsi);
+         gimple_set_uid (stmt, uid_no++);
+       }
+    } 
+
   FOR_EACH_BB_FN (bb, fun)
     {
       gimple_stmt_iterator gsi;
@@ -3563,6 +3740,10 @@ pass_optimize_widening_mul::execute (function *fun)
                    }
                  break;
 
+               case TRUNC_MOD_EXPR:
+                 convert_to_divmod (as_a<gassign *> (stmt));
+                 break;
+
                case PLUS_EXPR:
                case MINUS_EXPR:
                  convert_plusminus_to_widen (&gsi, stmt, code);
@@ -3614,6 +3795,10 @@ pass_optimize_widening_mul::execute (function *fun)
                            widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
                            widen_mul_stats.fmas_inserted);
+  statistics_counter_event (fun, "divmod calls inserted",
+                           widen_mul_stats.divmod_calls_inserted);
+
+  free_dominance_info (CDI_DOMINATORS);
 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
2015-11-02  Prathamesh Kulkarni  <prathamesh.kulka...@linaro.org>
            Kugan Vivekanandarajah  <kugan.vivekanandara...@linaro.org> 

        * internal-fn.def: Add entry for expand_DIVMOD.
        * internal-fn.c (expand_DIVMOD): New function.
        * tree-ssa-math-opts.c: (maybe_record_divmod): Likewise. 
        * tree-ssa-math-opts.c: (convert_to_divmod): Likewise.
        * tree-ssa-math-opts.c: Include optabs-libfuncs.h. 
        * tree-ssa-math-opts.c: (widen_mul_stats): New member 
divmod_calls_inserted.
        * tree-ssa-math-opts.c: (pass_optimize_widening_mul::execute): Call 
caluclate_dominace_info,
        compute stmt uids and add check for TRUNC_MOD_EXPR.

testsuite/
        * gcc.dg/pr43721-1.c: New test.
        * gcc.dg/pr43721-2.c: Likewise.
        * gcc.dg/pr43721-3.c: Likewise.
        * gcc.dg/pr43721-4.c: Likewise.

Reply via email to