On 11 November 2015 at 19:04, Richard Biener <rguent...@suse.de> wrote:
> On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>
>> On 11 November 2015 at 16:03, Richard Biener <rguent...@suse.de> wrote:
>> > On Wed, 11 Nov 2015, Prathamesh Kulkarni wrote:
>> >
>> >> On 10 November 2015 at 20:11, Richard Biener <rguent...@suse.de> wrote:
>> >> > On Mon, 9 Nov 2015, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 4 November 2015 at 20:35, Richard Biener <rguent...@suse.de> wrote:
>> >> >> >
>> >> >> > Btw, did you investigate code gen differences on x86_64/i586?  That
>> >> >> > target expands all divisions/modulo ops via divmod, relying on CSE
>> >> >> > solely as the HW always computes both div and mod (IIRC).
>> >> >> x86_64 has optab_handler for divmod defined, so the transform won't
>> >> >> take place on x86.
>> >> >
>> >> > Ok.
>> >> >
>> >> >> > +
>> >> >> > +        gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), rhs);
>> >> >> > +        gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> >> >> >
>> >> >> > Ick.  Please use
>> >> >> >
>> >> >> >     gimple_set_rhs_from_tree (use_stmt, res);
>> >> >> Um there doesn't seem to be gimple_set_rhs_from_tree.
>> >> >> I used gimple_assign_set_rhs_from_tree which requires gsi for use_stmt.
>> >> >> Is that OK ?
>> >> >
>> >> > Yes.
>> >> >
>> >> >> >     update_stmt (use_stmt);
>> >> >> >     if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> >> >> >       cfg_changed = true;
>> >> >> >
>> >> >> > +  free_dominance_info (CDI_DOMINATORS);
>> >> >> >
>> >> >> > do not free dominators.
>> >> >>
>> >> >> I have done the suggested changes in the attached patch.
>> >> >> I have a few questions:
>> >> >>
>> >> >> a) Does the change to insert DIVMOD call before topmost div or mod
>> >> >> stmt with matching operands
>> >> >> look correct ?
>> >> >
>> >> > +  /* Insert call-stmt just before the topmost div/mod stmt.
>> >> > +     top_bb dominates all other basic blocks containing div/mod stms
>> >> > +     so, the topmost stmt would be the first div/mod stmt with matching operands
>> >> > +     in top_bb.  */
>> >> > +
>> >> > +  gcc_assert (top_bb != 0);
>> >> > +  gimple_stmt_iterator gsi;
>> >> > +  for (gsi = gsi_after_labels (top_bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> >> > +    {
>> >> > +      gimple *g = gsi_stmt (gsi);
>> >> > +      if (is_gimple_assign (g)
>> >> > +         && (gimple_assign_rhs_code (g) == TRUNC_DIV_EXPR
>> >> > +            || gimple_assign_rhs_code (g) == TRUNC_MOD_EXPR)
>> >> > +         && operand_equal_p (op1, gimple_assign_rhs1 (g), 0)
>> >> > +         && operand_equal_p (op2, gimple_assign_rhs2 (g), 0))
>> >> > +       break;
>> >> >
>> >> > Looks overly complicated to me.  Just remember "topmost" use_stmt
>> >> > alongside top_bb (looks like you'll no longer need top_bb if you
>> >> > retain top_stmt).  And then do
>> >> >
>> >> >    gsi = gsi_for_stmt (top_stmt);
>> >> >
>> >> > and insert before that.
>> >> Thanks, done in this patch. Does it look OK ?
>> >> IIUC gimple_uid (stmt1) < gimple_uid (stmt2) can be used to check if
>> >> stmt1 occurs before stmt2
>> >> only if stmt1 and stmt2 are in the same basic block ?
>> >> >
>> >> >> b) Handling constants - I dropped handling constants in the attached
>> >> >> patch. IIUC we don't want
>> >> >> to enable this transform if there's a specialized expansion for some
>> >> >> constants for div or mod ?
>> >> >
>> >> > See expand_divmod which has lots of special cases for constant operands
>> >> > not requiring target support for div or mod.
>> >> Thanks, would it be OK if I do this in follow up patch ?
>> >
>> > Well, just not handle them like in your patch is fine.
>> >
>> >> >
>> >> >> I suppose this would also be target dependent and require a target hook ?
>> >> >> For instance arm defines modsi3 pattern to expand mod when 2nd operand
>> >> >> is constant and <= 0 or power of 2,
>> >> >> while for other cases goes the expand_divmod() route to generate call
>> >> >> to __aeabi_idivmod libcall.
>> >> >
>> >> > Ok, so it lacks a signed mod instruction.
>> >> >
>> >> >> c) Gating the divmod transform -
>> >> >> I tried gating it on checks for optab_handlers on div and mod, however
>> >> >> this doesn't enable transform for arm cortex-a9
>> >> >> anymore (cortex-a9 doesn't have hardware instructions for integer div and mod).
>> >> >> IIUC for cortex-a9,
>> >> >> optab_handler (sdivmod_optab, SImode) returns CODE_FOR_nothing because
>> >> >> HAVE_divsi3 is 0.
>> >> >> However optab_handler (smod_optab, SImode) matches since optab_handler
>> >> >> only checks for existence of pattern
>> >> >> (and not whether the pattern gets matched).
>> >> >> I suppose we should enable the transform only if the divmod, div, and
>> >> >> mod pattern do not match rather than checking
>> >> >> if the patterns exist via optab_handler ? For a general x % y, modsi3
>> >> >> would fail to match but optab_handler(smod_optab, SImode ) still
>> >> >> says it's matched.
>> >> >
>> >> > Ah, of course.  Querying for an optab handler is just a cheap
>> >> > guesstimate...  Not sure how to circumvent this best (sub-target
>> >> > enablement of patterns).  RTL expansion just goes ahead (of course)
>> >> > and sees if expansion eventually fails.  Richard?
>> >> >
>> >> >> Should we define a new target hook combine_divmod, which returns true
>> >> >> if transforming to divmod is desirable for that
>> >> >> target ?
>> >> >> The default definition could be:
>> >> >> bool default_combine_divmod (enum machine_mode mode, tree op1, tree op2)
>> >> >> {
>> >> >>   // check for optab_handlers for div/mod/divmod and libfunc for divmod
>> >> >> }
>> >> >>
>> >> >> And for arm, it could be over-ridden to return false if op2 is
>> >> >> constant and <= 0 or power of 2.
>> >> >> I am not really sure if this is a good idea since I am replicating
>> >> >> information from modsi3 pattern.
>> >> >> Any change to the pattern may require corresponding change to the hook :/
>> >> >
>> >> > Yeah, I don't think that is desirable.  Ideally we'd have a way
>> >> > to query HAVE_* for CODE_FOR_* which would mean target-insns.def
>> >> > support for all div/mod/divmod patterns(?) and queries...
>> >> >
>> >> > Not sure if what would be enough though.
>> >> >
>> >> > Note that the divmod check is equally flawed.
>> >> >
>> >> > I think with the above I'd enable the transform when
>> >> >
>> >> > +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >> > +      || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>> >> >            && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing))
>> >> > +    return false;
>> >> Um this fails for the arm backend (for cortex-a9) because
>> >> optab_handler (divmod_optab, mode) != CODE_FOR_nothing is false
>> >> optab_libfunc (divmod_optab, mode) != NULL_RTX is true.
>> >> optab_handler (div_optab, mode) == CODE_FOR_nothing is true.
>> >> which comes down to false || (true && true) which is true and we hit
>> >> return false.
>> >
>> > Oh, sorry to mess up the test - it was supposed to be inverted.
>> >
>> >> AFAIU, we want the transform to be disabled if:
>> >> a) optab_handler exists for divmod.
>> >> b) optab_handler exists for div.
>> >> c) optab_libfunc does not exist for divmod.  */
>> >>
>> >> +  if (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >> +      || optab_handler (div_optab, mode) != CODE_FOR_nothing
>> >> +      || optab_libfunc (divmod_optab, mode) == NULL_RTX)
>> >> +    return false;
>> >> Does that look correct ?
>> >
>> > No.  That will disable if we have a divmod optab.  Instead try
>> >
>> >  if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
>> >         || (optab_libfunc (divmod_optab, mode) != NULL_RTX
>> >             && optab_handler ([su]div_optab, mode) == CODE_FOR_nothing)))
>> >    return false;
>> >
>> > which is what I intended.  If we have a divmod optab go ahead.
>> > If we have a libfunc and not a div optab then as well.
>> Oops, I assumed that we only wanted this transform if optab_libfunc existed.
>> Modified the test in the attached patch.
>> Well this does affect x86_64 and i?86 -;)
>>
>> I added the following hunk back to expand_DIVMOD, since if optab_handler exists
>> we want to use it for expansion. Does it look OK ?
>>
>> +  /* Check if optab handler exists for udivmod/sdivmod.  */
>> +  if (optab_handler (tab, mode) != CODE_FOR_nothing)
>> +    {
>> +      rtx quotient = gen_reg_rtx (mode);
>> +      rtx remainder = gen_reg_rtx (mode);
>> +      expand_twoval_binop (tab, op0, op1, quotient, remainder, TYPE_UNSIGNED (type));
>> +
>> +      /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR */
>> +      expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
>> +                  make_tree (TREE_TYPE (arg0), quotient),
>> +                  make_tree (TREE_TYPE (arg1), remainder)),
>> +                  target, VOIDmode, EXPAND_NORMAL);
>> +
>> +      return;
>> +    }
>
> Ah, sure.
>
>> I verified the code generated for x86_64 and i?86 and it's same for my
>> test-cases.
>> However during a clean build of gcc for x86_64, I am getting segfault
>> in bid64_div.c:
>> In file included from
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_internal.h:27:0,
>>                  from
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:56:
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:
>> In function ‘__bid64_div’:
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid_conf.h:36:19:
>> internal compiler error: in expand_DIVMOD, at internal-fn.c:2099
>>  #define bid64_div __bid64_div
>>                    ^
>> /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/libgcc/config/libbid/bid64_div.c:80:1:
>> note: in expansion of macro ‘bid64_div’
>>  bid64_div (UINT64 x,
>>  ^
>> 0x8e101f expand_DIVMOD
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/internal-fn.c:2099
>> 0x705927 expand_call_stmt
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:2549
>> 0x705927 expand_gimple_stmt_1
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3509
>> 0x705927 expand_gimple_stmt
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:3672
>> 0x708d65 expand_gimple_basic_block
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:5676
>> 0x70e696 execute
>>         /home/bilbo/gnu-toolchain/src/gcc.git~tcwg-72/gcc/cfgexpand.c:6288
>>
>> It looks like in the following code in expand_DIVMOD:
>>
>> +  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
>> +  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
>> +                                      GET_MODE_SIZE (mode));
>>
>> remainder is (nil) and hence the segfault in make_tree (I added the
>> asserts for quotient and remainder later).
>> I am not sure why it's happening  though, investigating it.
>
> No idea - as said, RTL expansion isn't my area of expertise.  I would
> expect that offsetted subregs shouldn't be used for complex components.
> Maybe gen_lowpart / gen_highpart will work better which abstracts
> the subregging.
Hi Richard,
This is a revamped version of my previous patch for the divmod transform.
The segfault with that patch can be reproduced with the following
test-case on x86_64 with -m32:
typedef unsigned int DItype __attribute__((mode(DI)));

DItype f (DItype x, DItype y)
{
  DItype quot, rem;

  quot = x / y;
  rem = x % y;
  return quot + rem;
}

Jim pointed out to me that this happens because target-specific divmod
libfuncs have different calling conventions; there is no "standard"
calling convention for divmod in libgcc.
The divmod libfunc in this case is libgcc2.c:__udivmoddi4(), which has
a different calling convention from arm's __aeabi_idivmod(), and I was
assuming that all divmod libfuncs follow arm's calling convention.
The arm version expects op0 and op1 to be passed as arguments and
returns both quotient and remainder in a return value whose mode is
twice as wide as that of its arguments,
whereas libgcc2.c:__udivmoddi4() takes 3 args: op0 and op1 of unsigned
DImode, plus a pointer used for storing the remainder, while the
return value contains the quotient.
Similarly, spu's divmod libfuncs follow libgcc2.c:__udivmoddi4()'s convention.
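
As a rough illustration of the two conventions, here is a plain C model
(not GCC code). The toy_* names are made up for this sketch, and putting
the quotient in the low half of the packed value is an assumption made
only for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Convention A (the shape described above for the arm libfunc): both
   results come back in a single value twice as wide as the operands.
   Which half holds the quotient is an assumption of this sketch.  */
uint64_t
toy_divmod_packed (uint32_t num, uint32_t den)
{
  assert (den != 0);
  return ((uint64_t) (num % den) << 32) | (uint64_t) (num / den);
}

/* Caller-side unpacking for convention A.  */
void
toy_unpack (uint64_t packed, uint32_t *quot, uint32_t *rem)
{
  *quot = (uint32_t) packed;          /* low half  */
  *rem = (uint32_t) (packed >> 32);   /* high half */
}

/* Convention B (the shape of libgcc2.c:__udivmoddi4): the quotient is
   the return value and the remainder is stored through a pointer.  */
uint64_t
toy_divmod_by_pointer (uint64_t num, uint64_t den, uint64_t *rem)
{
  assert (den != 0);
  if (rem)
    *rem = num % den;
  return num / den;
}
```

A caller written for convention A cannot consume convention B's result
(and vice versa), which is why expansion needs to know which shape the
target's libfunc has.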

To work around this, I defined a new hook expand_divmod_libfunc, which
targets must override to expand a call to their target-specific divmod.
The "default" hook default_expand_divmod_libfunc() expands a call to
libgcc2.c:__udivmoddi4(), since that's the only "generic" divmod
available.
Is this a reasonable approach ?

Expansion proceeds as follows:
expand_DIVMOD checks if optab_handler for udivmod/sdivmod exists and,
if it does, uses expand_twoval_binop() for expansion.
Otherwise it calls the target hook to generate a call to the divmod libfunc.
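
The two-tier dispatch can be modelled in plain C roughly as below. This
is only a sketch of the control flow, not the code in the patch:
struct toy_target, toy_default_libfunc and toy_expand_divmod are
hypothetical names, and the boolean stands in for the
optab_handler (...) != CODE_FOR_nothing test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a target description; not a GCC type.  */
struct toy_target
{
  bool has_divmod_insn;  /* models optab_handler != CODE_FOR_nothing */
  void (*expand_divmod_libfunc) (unsigned, unsigned,
                                 unsigned *, unsigned *);
};

/* Models a hook implementation: a library routine yields both results.  */
void
toy_default_libfunc (unsigned a, unsigned b, unsigned *quot, unsigned *rem)
{
  *quot = a / b;
  *rem = a % b;
}

/* Models the dispatch: prefer the instruction, else defer to the hook.
   Returns true when the instruction path was taken.  */
bool
toy_expand_divmod (const struct toy_target *t, unsigned a, unsigned b,
                   unsigned *quot, unsigned *rem)
{
  assert (b != 0);
  if (t->has_divmod_insn)
    {
      *quot = a / b;
      *rem = a % b;
      return true;
    }
  assert (t->expand_divmod_libfunc != NULL);
  t->expand_divmod_libfunc (a, b, quot, rem);
  return false;
}
```

Either path hands back the same (quotient, remainder) pair, so the code
that wraps the result does not need to know which one ran.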

The divmod transform takes place if:
a) optab_handler exists for divmod, or
b) optab_libfunc (divmod) exists and optab_handler (div) doesn't,
   and either the target overrides expand_divmod_libfunc or, for the
   default hook, mode is DImode and TYPE_UNSIGNED (type) is true.
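
Spelled out as a standalone predicate, the gating condition looks
roughly like this. It is a toy model in plain C, not the patch itself:
struct toy_caps and toy_divmod_enabled are hypothetical names, and each
field stands in for the corresponding optab or hook query.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the target offers for one mode.  */
struct toy_caps
{
  bool divmod_insn;      /* optab_handler (divmod) != CODE_FOR_nothing */
  bool divmod_libfunc;   /* optab_libfunc (divmod) != NULL_RTX */
  bool div_insn;         /* optab_handler (div) != CODE_FOR_nothing */
  bool hook_overridden;  /* target provides its own expand_divmod_libfunc */
  bool unsigned_dimode;  /* TYPE_UNSIGNED (type) && mode == DImode */
};

/* Returns true when the divmod transform would be enabled.  */
bool
toy_divmod_enabled (const struct toy_caps *c)
{
  /* a) A divmod instruction is always good enough.  */
  if (c->divmod_insn)
    return true;

  /* b) Otherwise we need a divmod libfunc and no div instruction...  */
  if (!c->divmod_libfunc || c->div_insn)
    return false;

  /* ...and a hook that can actually expand the libfunc call.  */
  return c->hook_overridden || c->unsigned_dimode;
}
```

For instance, a target with a divmod libfunc, no div instruction and its
own hook passes the test, while the same target relying on the default
hook passes only for unsigned DImode.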

Regarding test-cases, I added an effective-target check for divmod only
for arm.
However, DIVMOD transforms will also take place for targets having
hardware div instructions.
Do I need to manually add each configuration that has hardware div to
check_effective_target_divmod(), or is there a better way to check that ?

The patch passes bootstrap and testing on x86_64-unknown-linux-gnu
and was cross-tested on arm-linux-gnueabihf, arm-none-eabi and
armeb-none-linux-gnueabihf.

Thanks,
Prathamesh
>
> Richard.
>
>> Thanks,
>> Prathamesh
>> >
>> >> >
>> >> > so we either will have a divmod instruction (hopefully not sub-target
>> >> > disabled for us) or a libfunc for divmod and for sure no HW divide
>> >> > instruction (HW mod can be emulated by HW divide but not the other
>> >> > way around).
>> >> >
>> >> >> d) Adding effective-target-check for divmod: I just enabled it for
>> >> >> arm*-*-* for now. I could additionally append more targets,
>> >> >> not sure if this is the right approach.
>> >> >
>> >> > Looks good to me.
>> >> Is this version OK if bootstrap/testing passes ?
>> >
>> > Ok with adjusting the optab check like above.
>> >
>> > Thanks,
>> > Richard.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9b03b05..33d2a44 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -300,6 +300,7 @@ static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 static unsigned HOST_WIDE_INT arm_asan_shadow_offset (void);
 
 static void arm_sched_fusion_priority (rtx_insn *, int, int *, int*);
+static void arm_expand_divmod_libfunc (bool, machine_mode, rtx, rtx, rtx *, rtx *);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -735,6 +736,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_SCHED_FUSION_PRIORITY
 #define TARGET_SCHED_FUSION_PRIORITY arm_sched_fusion_priority
 
+#undef TARGET_EXPAND_DIVMOD_LIBFUNC
+#define TARGET_EXPAND_DIVMOD_LIBFUNC arm_expand_divmod_libfunc
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Obstack for minipool constant handling.  */
@@ -30192,4 +30196,30 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
   return;
 }
 
+/* Expand call to __aeabi_idivmod (op0, op1).  */
+static void
+arm_expand_divmod_libfunc (bool unsignedp, machine_mode mode, rtx op0, 
+                          rtx op1, rtx *quot_p, rtx *rem_p)
+{
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+  rtx libfunc = optab_libfunc (tab, mode);
+  gcc_assert (libfunc != NULL_RTX);
+  
+  machine_mode libval_mode =
+    smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT);
+
+  rtx libval = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                       libval_mode, 2, op0, GET_MODE (op0), op1, GET_MODE (op1));
+
+  rtx quotient = simplify_gen_subreg (mode, libval, libval_mode, 0);
+  rtx remainder = simplify_gen_subreg (mode, libval, libval_mode,
+                                      GET_MODE_SIZE (mode));
+
+  gcc_assert (quotient != NULL_RTX);
+  gcc_assert (remainder != NULL_RTX);
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-arm.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index aae09bf..0212f81 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6962,6 +6962,10 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn
 
+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (bool @var{unsignedp}, machine_mode @var{mode}, @var{rtx}, @var{rtx}, rtx *@var{quot}, rtx *@var{rem})
+Expand divmod libfunc
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f31c763..b25dcf9 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4848,6 +4848,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_FUSION_PRIORITY
 
+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index c07b538..135a5fc 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2305,6 +2305,49 @@ set_edom_supported_p (void)
 #endif
 }
 
+/* Expand DIVMOD() using:
+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, Generate call to
+    optab_libfunc for udivmod/sdivmod.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+  tree arg0 = gimple_call_arg (stmt, 0);
+  tree arg1 = gimple_call_arg (stmt, 1);
+  
+  gcc_assert (TREE_CODE (TREE_TYPE (lhs)) == COMPLEX_TYPE); 
+  tree type = TREE_TYPE (TREE_TYPE (lhs));
+  machine_mode mode = TYPE_MODE (type);
+  bool unsignedp = TYPE_UNSIGNED (type);
+  optab tab = (unsignedp) ? udivmod_optab : sdivmod_optab;
+
+  rtx op0 = expand_normal (arg0);
+  rtx op1 = expand_normal (arg1);
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+
+  rtx quotient, remainder;
+
+  /* Check if optab handler exists for udivmod/sdivmod.  */
+  if (optab_handler (tab, mode) != CODE_FOR_nothing)
+    {
+      quotient = gen_reg_rtx (mode);
+      remainder = gen_reg_rtx (mode);
+      expand_twoval_binop (tab, op0, op1, quotient, remainder, unsignedp); 
+    }
+  else
+    targetm.expand_divmod_libfunc (unsignedp, mode, op0,
+                                  op1, &quotient, &remainder);
+
+  /* Wrap the return value (quotient, remainder) within COMPLEX_EXPR.  */
+  expand_expr (build2 (COMPLEX_EXPR, TREE_TYPE (lhs),
+                      make_tree (TREE_TYPE (arg0), quotient),
+                      make_tree (TREE_TYPE (arg1), remainder)),
+              target, VOIDmode, EXPAND_NORMAL);
+}
+
+
 #define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) \
   static void                                          \
   expand_##CODE (internal_fn fn, gcall *stmt)          \
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a62f3e8..b969ee5 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -189,6 +189,8 @@ DEF_INTERNAL_FN (GOACC_REDUCTION, ECF_NOTHROW | ECF_LEAF, NULL)
    current target.  */
 DEF_INTERNAL_FN (SET_EDOM, ECF_LEAF | ECF_NOTHROW, NULL)
 
+DEF_INTERNAL_FN (DIVMOD, ECF_CONST | ECF_LEAF, NULL)
+
 #undef DEF_INTERNAL_INT_FN
 #undef DEF_INTERNAL_FLT_FN
 #undef DEF_INTERNAL_OPTAB_FN
diff --git a/gcc/target.def b/gcc/target.def
index d60319e..9663aab 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4969,6 +4969,13 @@ Normally, this is not needed.",
  bool, (const_tree field, machine_mode mode),
  default_member_type_forces_blk)
 
+/* Hook for generating call to divmod libfunc.  */
+DEFHOOK
+(expand_divmod_libfunc,
+  "Expand divmod libfunc",
+  void, (bool unsignedp, machine_mode mode, rtx, rtx, rtx *quot, rtx *rem),
+  default_expand_divmod_libfunc)
+
 /* Return the class for a secondary reload, and fill in extra information.  */
 DEFHOOK
 (secondary_reload,
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 8a162a1..4906b1c 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1961,4 +1961,31 @@ default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+void
+default_expand_divmod_libfunc (bool unsignedp, machine_mode mode, rtx op0,
+                              rtx op1, rtx *quot_p, rtx *rem_p)
+{
+  gcc_assert (mode == DImode);
+  gcc_assert (unsignedp);
+
+  /* Generate call to
+     DImode __udivmoddi4 (DImode op0, DImode op1, DImode *rem).
+   */
+
+  rtx libfunc = optab_libfunc (udivmod_optab, DImode);
+  gcc_assert (libfunc);
+
+  rtx remainder = assign_stack_temp (DImode, GET_MODE_SIZE (DImode));
+  rtx address = XEXP (remainder, 0);
+
+  rtx quotient = emit_library_call_value (libfunc, NULL_RTX, LCT_CONST,
+                                       DImode, 3,
+                                       op0, GET_MODE (op0),
+                                       op1, GET_MODE (op1),
+                                       address, GET_MODE (address));   
+
+  *quot_p = quotient;
+  *rem_p = remainder;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 7ab647f..5df9a7f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -253,4 +253,7 @@ extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
                                       optimization_type);
 
+extern void default_expand_divmod_libfunc (bool, machine_mode, rtx,
+                                          rtx, rtx *, rtx *);
+
 #endif /* GCC_TARGHOOKS_H */
diff --git a/gcc/testsuite/gcc.dg/pr43721-1.c b/gcc/testsuite/gcc.dg/pr43721-1.c
new file mode 100644
index 0000000..8873d9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-1.c
@@ -0,0 +1,10 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  int quotient = x / y;
+  int remainder = x % y;
+  return quotient + remainder;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-2.c b/gcc/testsuite/gcc.dg/pr43721-2.c
new file mode 100644
index 0000000..62d73af
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-2.c
@@ -0,0 +1,16 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  extern int early_exit;
+
+  int quot = x / y;
+
+  if (early_exit)
+    return 0;
+
+  int rem = x % y;
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 1 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-3.c b/gcc/testsuite/gcc.dg/pr43721-3.c
new file mode 100644
index 0000000..74816a0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-3.c
@@ -0,0 +1,17 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  extern int flag;
+  int quot;
+
+  if (flag)
+    quot = x / y;
+  else
+    quot = 0;
+
+  int rem = x % y;
+  return quot + rem; 
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-4.c b/gcc/testsuite/gcc.dg/pr43721-4.c
new file mode 100644
index 0000000..fd82ad8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-4.c
@@ -0,0 +1,18 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+int f(int x, int y)
+{
+  int quot = 0;
+  int rem = 0;
+
+  extern int flag;
+
+  if (flag)
+    quot = x / y;
+  else
+    rem = x % y;
+
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/gcc.dg/pr43721-5.c b/gcc/testsuite/gcc.dg/pr43721-5.c
new file mode 100644
index 0000000..2e7ac42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr43721-5.c
@@ -0,0 +1,12 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+/* Do not enable the transform if either operand is constant.  */
+int f1(int x)
+{
+  int quot = x / 3; 
+  int rem = x % 3;
+
+  return quot + rem;
+}
+
+/* { dg-final { scan-tree-dump-times "DIVMOD" 0 "widening_mul" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 645981a..2a7dd40 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6918,3 +6918,13 @@ proc check_effective_target_offload_nvptx { } {
        int main () {return 0;}
     } "-foffload=nvptx-none" ]
 }
+
+# Return 1 if the target supports divmod
+
+proc check_effective_target_divmod { } {
+    if { [istarget arm*-*-*] } {
+       return 1
+    }
+    return 0
+}
+
diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
index abd77e7..1375e69 100644
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -112,6 +112,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "internal-fn.h"
 #include "case-cfn-macros.h"
+#include "optabs-libfuncs.h"
+#include "tree-eh.h"
 
 /* This structure represents one basic block that either computes a
    division, or is a common dominator for basic block that compute a
@@ -184,6 +186,9 @@ static struct
 
   /* Number of fp fused multiply-add ops inserted.  */
   int fmas_inserted;
+
+  /* Number of divmod calls inserted.  */
+  int divmod_calls_inserted;
 } widen_mul_stats;
 
 /* The instance of "struct occurrence" representing the highest
@@ -3752,6 +3757,166 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
   return true;
 }
 
+/* Set top_stmt to topmost stmt between top_stmt and use_stmt, and add
+   use_stmt to vector stmts provided basic block containing top_stmt
+   dominates use_stmt or vice versa.  */
+   
+static void 
+maybe_record_divmod (vec<gimple *>& stmts, gimple *&top_stmt, gimple *use_stmt)
+{
+  basic_block bb = gimple_bb (use_stmt);
+  basic_block top_bb = gimple_bb (top_stmt);
+
+  if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) 
+    {
+      if (bb != top_bb
+         || gimple_uid (use_stmt) < gimple_uid (top_stmt))
+       top_stmt = use_stmt;
+    }  
+  else if (!dominated_by_p (CDI_DOMINATORS, bb, top_bb)) 
+    return;
+
+  stmts.safe_push (use_stmt);
+}  
+
+/* Check if the stmt is a candidate for divmod transform.  */
+
+static bool
+divmod_candidate_p (gassign *stmt)
+{
+  enum machine_mode mode = TYPE_MODE (TREE_TYPE (gimple_assign_lhs (stmt)));
+  const_tree type = TREE_TYPE (gimple_assign_lhs (stmt));
+  optab divmod_optab, div_optab;
+  void default_expand_divmod_libfunc (bool, machine_mode, rtx, rtx, rtx *, rtx *);
+
+  if (TYPE_UNSIGNED (type))
+    {
+      divmod_optab = udivmod_optab;
+      div_optab = udiv_optab;
+    }
+  else
+    {
+      divmod_optab = sdivmod_optab;
+      div_optab = sdiv_optab;
+    }
+
+  /* Enable the transform if:
+     a) optab_handler exists for divmod or
+     b) optab_libfunc (divmod) exists and optab_handler(div) doesn't.
+        and target overrides expand_divmod_libfunc or for the default function,
+        mode is DImode and TYPE_UNSIGNED (type) is true.  */
+
+  if (! (optab_handler (divmod_optab, mode) != CODE_FOR_nothing
+        || ((optab_libfunc (divmod_optab, mode) != NULL_RTX
+            && optab_handler (div_optab, mode) == CODE_FOR_nothing)
+           && (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc
+               || (TYPE_UNSIGNED (type) && mode == DImode)))))
+   return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  /* Disable the transform if either is a constant, since division-by-constant
+     may have specialized expansion.  */
+  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))
+    return false;
+
+  if (TYPE_OVERFLOW_TRAPS (type))
+    return false;
+
+  return true;
+} 
+
+/* This function looks for:
+   t1 = a TRUNC_DIV_EXPR b;
+   t2 = a TRUNC_MOD_EXPR b;
+   and transforms it to the following sequence:
+   complex_tmp = DIVMOD (a, b);
+   t1 = REALPART_EXPR (complex_tmp);
+   t2 = IMAGPART_EXPR (complex_tmp);
+   This change is done only if the target has support for divmod.
+
+   The pass works in two phases:
+   1) Walk through all immediate uses of stmt's operand and find a
+      TRUNC_DIV_EXPR with matching operands and if such a stmt is found add
+      it to stmts vector.
+   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block that
+      dominates other div/mod stmts with same operands) and update entries in
+      stmts vector to use return value of DIVMOD (REALPART_EXPR for div,
+      IMAGPART_EXPR for mod).  */
+
+static bool
+convert_to_divmod (gassign *stmt)
+{
+  if (!divmod_candidate_p (stmt))
+    return false;
+
+  tree op1 = gimple_assign_rhs1 (stmt);
+  tree op2 = gimple_assign_rhs2 (stmt);
+
+  vec<gimple *> stmts = vNULL;
+  stmts.safe_push (stmt);
+
+  imm_use_iterator use_iter;
+  gimple *use_stmt; 
+  gimple *top_stmt = stmt;
+ 
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+    {
+      if (is_gimple_assign (use_stmt)
+          && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+         && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+         && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
+       maybe_record_divmod (stmts, top_stmt, use_stmt);
+    }
+  
+  if (stmts.length () == 1)
+    return false;
+
+  /* Create the library call and insert the call stmt before top_stmt.  */
+  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
+  tree res = make_temp_ssa_name (
+               build_complex_type (TREE_TYPE (gimple_assign_lhs (stmt))),
+               call_stmt, "divmod_tmp");
+
+  gimple_call_set_lhs (call_stmt, res);
+  gimple_stmt_iterator top_stmt_gsi = gsi_for_stmt (top_stmt);
+  gsi_insert_before (&top_stmt_gsi, call_stmt, GSI_SAME_STMT);
+
+  widen_mul_stats.divmod_calls_inserted++; 
+
+  /* Update stmts. */
+  bool cfg_changed = false;
+  for (unsigned i = 0; i < stmts.length (); ++i) 
+    {
+      tree rhs;
+      use_stmt = stmts[i];
+ 
+      switch (gimple_assign_rhs_code (use_stmt))
+       {
+         case TRUNC_DIV_EXPR:
+           rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         case TRUNC_MOD_EXPR:
+           rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res);
+           break;
+
+         default:
+           gcc_unreachable (); 
+       }
+
+      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+      gimple_assign_set_rhs_from_tree (&gsi, rhs);
+      update_stmt (use_stmt);
+
+      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+       cfg_changed = true;
+    }
+
+  stmts.release ();
+  return cfg_changed;
+}
 
 /* Find integer multiplications where the operands are extended from
    smaller types, and replace the MULT_EXPR with a WIDEN_MULT_EXPR
@@ -3796,6 +3961,8 @@ pass_optimize_widening_mul::execute (function *fun)
   bool cfg_changed = false;
 
   memset (&widen_mul_stats, 0, sizeof (widen_mul_stats));
+  calculate_dominance_info (CDI_DOMINATORS);
+  renumber_gimple_stmt_uids ();
 
   FOR_EACH_BB_FN (bb, fun)
     {
@@ -3829,6 +3996,10 @@ pass_optimize_widening_mul::execute (function *fun)
                    match_uaddsub_overflow (&gsi, stmt, code);
                  break;
 
+               case TRUNC_MOD_EXPR:
+                 convert_to_divmod (as_a<gassign *> (stmt));
+                 break;
+
                default:;
                }
            }
@@ -3875,7 +4046,9 @@ pass_optimize_widening_mul::execute (function *fun)
                            widen_mul_stats.maccs_inserted);
   statistics_counter_event (fun, "fused multiply-adds inserted",
                            widen_mul_stats.fmas_inserted);
-
+  statistics_counter_event (fun, "divmod calls inserted",
+                           widen_mul_stats.divmod_calls_inserted);
+ 
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
