[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

rguenth at gcc dot gnu.org Thu, 29 Mar 2018 01:27:45 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-03-29
                 CC|                            |rguenth at gcc dot gnu.org
          Component|c++                         |libstdc++
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Eventually std::min_element can be improved directly but this is a missed
optimization as well.  A similar C testcase is optimized by PRE:

int getMinIdx (double *a, int n, double off)
{
  double *min = a;
  for (double *first = a; first < a + n; first++)
    {
      if (__builtin_fabs (*min - off) > __builtin_fabs (*first - off))
        min = first;
    }
  return min - a;
}

while the C++ testcase ends up in PRE like the following where likely
the extra PHI for __first_2[23] confuse things.  DOM introduces that
by threading the first iteration comparison, avoiding that doesn't seem
to help though.  Ah, it's because the loop isn't header-copied, it says
"Loop 1 is do-while loop" but given the load from __result happens after
the exit test it isn't partially redundant across the backedge.

  <bb 2> [15.00%]:
  goto <bb 4>; [100.00%]

  <bb 8> [42.50%]:

  <bb 3> [85.00%]:
  # __result_17 = PHI <__result_24(8), __first_23(5)>
  __first_9 = __first_23 + 8;
  if (__first_9 != &MEM[(void *)&testArray + 8000B])
    goto <bb 7>; [82.35%]
  else
    goto <bb 6>; [17.65%]

  <bb 7> [70.00%]:

  <bb 4> [85.00%]:
  # __first_22 = PHI <__first_23(7), &testArray._M_elems(2)>
  # __first_23 = PHI <__first_9(7), &MEM[(void *)&testArray + 8B](2)>
  # __result_24 = PHI <__result_17(7), &testArray._M_elems(2)>
  _10 = MEM[(const double *)__first_22 + 8B];
  _12 = *__result_24;
  _13 = _10 - offset_2(D);
  _14 = ABS_EXPR <_13>;
  _15 = _12 - offset_2(D);
  _16 = ABS_EXPR <_15>;
  if (_14 < _16)
    goto <bb 5>; [50.00%]
  else
    goto <bb 8>; [50.00%]

  <bb 5> [42.50%]:
  goto <bb 3>; [100.00%]

  <bb 6> [15.00%]:
  __last.3_6 = (long int) __result_17;
  __first.4_5 = (long int) &testArray._M_elems;
  _4 = __last.3_6 - __first.4_5;
  _3 = _4 /[ex] 8;
  _7 = (int) _3;
  return _7;


So improving libstd++ would be nice here, resp.

  template<typename _ForwardIterator, typename _Compare>
    _GLIBCXX14_CONSTEXPR
    _ForwardIterator
    __min_element(_ForwardIterator __first, _ForwardIterator __last,
                  _Compare __comp)
    {
      if (__first == __last)
        return __first;
      _ForwardIterator __result = __first;
      while (++__first != __last)
        if (__comp(__first, __result))
          __result = __first;
      return __result;
    }

change that to

      if (__first == __last)
        return __first;
      ++_first;
      if (__first == __last)
        return __first;
      _ForwardIterator __result = __first;
      do
        {
          if (__comp(__first, __result))
            __result = __first;
        }
      whule (++_first != _last);
      return __result;

do-while style loops are always better for optimization.

Of course the issue is that loop-header copying doesn't do this transform,
it's do-while loop detection is a bit too simplistic...

Fixing that leads CH itself introduce that weird third PHI node :/

Index: gcc/tree-ssa-loop-ch.c
===================================================================
--- gcc/tree-ssa-loop-ch.c      (revision 258915)
+++ gcc/tree-ssa-loop-ch.c      (working copy)
@@ -176,6 +176,22 @@ do_while_loop_p (struct loop *loop)
                 "header contains just condition.\n", loop->num);
       return false;
     }
+
+  if (edge e = single_exit (loop))
+    {
+      if (e->src == loop->header
+         && (!single_pred_p (loop->latch)
+             || single_pred (loop->latch) != loop->header))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file,
+                    "Loop %i is not do-while loop: "
+                    "header contains single exit test but isn't latch.\n",
+                    loop->num);
+         return false;
+       }
+    }
+
   if (dump_file && (dump_flags & TDF_DETAILS))
     fprintf (dump_file, "Loop %i is do-while loop\n", loop->num);

@@ -409,7 +425,8 @@ unsigned int
 pass_ch::execute (function *fun)
 {
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-                      | LOOPS_HAVE_SIMPLE_LATCHES);
+                      | LOOPS_HAVE_SIMPLE_LATCHES
+                      | LOOPS_HAVE_RECORDED_EXITS);

   unsigned int res = copy_headers (fun);

[Bug libstdc++/85116] std::min_element does not optimize well with inlined predicate

Reply via email to