[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

2018-06-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116

--- Comment #5 from Richard Biener  ---
So what is missing is for us to see that for the following __first_21 is
__first_20 + 8B and so we can replace either with the other.  IVOPTs figures
this out but nothing before.


# __first_20 = PHI <_M_elems(2), __first_21(7)>
# __first_21 = PHI <[(void *) + 8B](2), __first_7(7)>
...
_8 = MEM[(const double *)__first_20 + 8B];

 [local count: 955630224]:
# __result_15 = PHI <__first_21(5), __result_22(8)>
__first_7 = __first_21 + 8;
if (__first_7 != [(void *) + 8000B])
  goto ; [89.00%]
else
  goto ; [11.00%]

 [local count: 850510900]:

[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

2018-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116

--- Comment #4 from Richard Biener  ---
So loop-header copying now DTRT but then PRE is faced with

   [local count: 955630223]:
  # __first_20 = PHI <_M_elems(2), __first_21(7)>
  # __first_21 = PHI <[(void *) + 8B](2), __first_7(7)>
  # __result_22 = PHI <_M_elems(2), __result_15(7)>
  _8 = MEM[(const double *)__first_20 + 8B];
  _10 = *__result_22;
  _11 = _8 - offset_2(D);
  _12 = ABS_EXPR <_11>;
  _13 = _10 - offset_2(D);
  _14 = ABS_EXPR <_13>;
  if (_12 < _14)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 477815111]:
  goto ; [100.00%]

   [local count: 477815112]:

   [local count: 955630224]:
  # __result_15 = PHI <__first_21(5), __result_22(8)>
  __first_7 = __first_21 + 8;
  if (__first_7 != [(void *) + 8000B])
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 850510900]:
  goto ;

where the situation is too confusing to be recognized.  As said, loop-header
copying doing a SSA update introduces the extra IV somehow.  That needs to be
investigated.

[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

2018-04-26 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116

--- Comment #3 from Richard Biener  ---
Author: rguenth
Date: Thu Apr 26 12:18:58 2018
New Revision: 259672

URL: https://gcc.gnu.org/viewcvs?rev=259672=gcc=rev
Log:
2018-04-26  Richard Biener  

PR tree-optimization/85116
* tree-ssa-loop-ch.c (do_while_loop_p): A do-while loop should
have a loop exit from the single latch predecessor.  Remove
case of header with just condition.
(ch_base::copy_headers): Exclude infinite loops from any
processing.
(pass_ch::execute): Record exits.

* gcc.dg/tree-ssa/copy-headers-2.c: New testcase.
* gcc.dg/tree-ssa/copy-headers-3.c: Likewise.
* gcc.dg/tree-ssa/copy-headers-4.c: Likewise.
* gcc.dg/tree-ssa/loadpre6.c: Adjust.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-3.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/copy-headers-4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/loadpre6.c
trunk/gcc/tree-ssa-loop-ch.c

[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

2018-04-09 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org

--- Comment #2 from Richard Biener  ---
Mine.

[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

2018-03-29 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-03-29
 CC||rguenth at gcc dot gnu.org
  Component|c++ |libstdc++
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Eventually std::min_element can be improved directly but this is a missed
optimization as well.  A similar C testcase is optimized by PRE:

int getMinIdx (double *a, int n, double off)
{
  double *min = a;
  for (double *first = a; first < a + n; first++)
{
  if (__builtin_fabs (*min - off) > __builtin_fabs (*first - off))
min = first;
}
  return min - a;
}

while the C++ testcase ends up in PRE like the following where likely
the extra PHI for __first_2[23] confuse things.  DOM introduces that
by threading the first iteration comparison, avoiding that doesn't seem
to help though.  Ah, it's because the loop isn't header-copied, it says
"Loop 1 is do-while loop" but given the load from __result happens after
the exit test it isn't partially redundant across the backedge.

   [15.00%]:
  goto ; [100.00%]

   [42.50%]:

   [85.00%]:
  # __result_17 = PHI <__result_24(8), __first_23(5)>
  __first_9 = __first_23 + 8;
  if (__first_9 != [(void *) + 8000B])
goto ; [82.35%]
  else
goto ; [17.65%]

   [70.00%]:

   [85.00%]:
  # __first_22 = PHI <__first_23(7), _M_elems(2)>
  # __first_23 = PHI <__first_9(7), [(void *) + 8B](2)>
  # __result_24 = PHI <__result_17(7), _M_elems(2)>
  _10 = MEM[(const double *)__first_22 + 8B];
  _12 = *__result_24;
  _13 = _10 - offset_2(D);
  _14 = ABS_EXPR <_13>;
  _15 = _12 - offset_2(D);
  _16 = ABS_EXPR <_15>;
  if (_14 < _16)
goto ; [50.00%]
  else
goto ; [50.00%]

   [42.50%]:
  goto ; [100.00%]

   [15.00%]:
  __last.3_6 = (long int) __result_17;
  __first.4_5 = (long int) _M_elems;
  _4 = __last.3_6 - __first.4_5;
  _3 = _4 /[ex] 8;
  _7 = (int) _3;
  return _7;


So improving libstd++ would be nice here, resp.

  template
_GLIBCXX14_CONSTEXPR
_ForwardIterator
__min_element(_ForwardIterator __first, _ForwardIterator __last,
  _Compare __comp)
{
  if (__first == __last)
return __first;
  _ForwardIterator __result = __first;
  while (++__first != __last)
if (__comp(__first, __result))
  __result = __first;
  return __result;
}

change that to

  if (__first == __last)
return __first;
  ++_first;
  if (__first == __last)
return __first;
  _ForwardIterator __result = __first;
  do
{
  if (__comp(__first, __result))
__result = __first;
}
  whule (++_first != _last);
  return __result;

do-while style loops are always better for optimization.

Of course the issue is that loop-header copying doesn't do this transform,
it's do-while loop detection is a bit too simplistic...

Fixing that leads CH itself introduce that weird third PHI node :/

Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c  (revision 258915)
+++ gcc/tree-ssa-loop-ch.c  (working copy)
@@ -176,6 +176,22 @@ do_while_loop_p (struct loop *loop)
 "header contains just condition.\n", loop->num);
   return false;
 }
+
+  if (edge e = single_exit (loop))
+{
+  if (e->src == loop->header
+ && (!single_pred_p (loop->latch)
+ || single_pred (loop->latch) != loop->header))
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file,
+"Loop %i is not do-while loop: "
+"header contains single exit test but isn't latch.\n",
+loop->num);
+ return false;
+   }
+}
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "Loop %i is do-while loop\n", loop->num);

@@ -409,7 +425,8 @@ unsigned int
 pass_ch::execute (function *fun)
 {
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-  | LOOPS_HAVE_SIMPLE_LATCHES);
+  | LOOPS_HAVE_SIMPLE_LATCHES
+  | LOOPS_HAVE_RECORDED_EXITS);

   unsigned int res = copy_headers (fun);