https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116
Richard Biener changed:
What|Removed |Added
Keywords||missed-optimization
Status|UNCONFIRMED |NEW
Last reconfirmed||2018-03-29
CC||rguenth at gcc dot gnu.org
Component|c++ |libstdc++
Ever confirmed|0 |1
--- Comment #1 from Richard Biener ---
Eventually std::min_element can be improved directly but this is a missed
optimization as well. A similar C testcase is optimized by PRE:
int getMinIdx (double *a, int n, double off)
{
double *min = a;
for (double *first = a; first < a + n; first++)
{
if (__builtin_fabs (*min - off) > __builtin_fabs (*first - off))
min = first;
}
return min - a;
}
while the C++ testcase ends up in PRE like the following where likely
the extra PHI for __first_2[23] confuse things. DOM introduces that
by threading the first iteration comparison, avoiding that doesn't seem
to help though. Ah, it's because the loop isn't header-copied, it says
"Loop 1 is do-while loop" but given the load from __result happens after
the exit test it isn't partially redundant across the backedge.
[15.00%]:
goto ; [100.00%]
[42.50%]:
[85.00%]:
# __result_17 = PHI <__result_24(8), __first_23(5)>
__first_9 = __first_23 + 8;
if (__first_9 != [(void *) + 8000B])
goto ; [82.35%]
else
goto ; [17.65%]
[70.00%]:
[85.00%]:
# __first_22 = PHI <__first_23(7), _M_elems(2)>
# __first_23 = PHI <__first_9(7), [(void *) + 8B](2)>
# __result_24 = PHI <__result_17(7), _M_elems(2)>
_10 = MEM[(const double *)__first_22 + 8B];
_12 = *__result_24;
_13 = _10 - offset_2(D);
_14 = ABS_EXPR <_13>;
_15 = _12 - offset_2(D);
_16 = ABS_EXPR <_15>;
if (_14 < _16)
goto ; [50.00%]
else
goto ; [50.00%]
[42.50%]:
goto ; [100.00%]
[15.00%]:
__last.3_6 = (long int) __result_17;
__first.4_5 = (long int) _M_elems;
_4 = __last.3_6 - __first.4_5;
_3 = _4 /[ex] 8;
_7 = (int) _3;
return _7;
So improving libstd++ would be nice here, resp.
template
_GLIBCXX14_CONSTEXPR
_ForwardIterator
__min_element(_ForwardIterator __first, _ForwardIterator __last,
_Compare __comp)
{
if (__first == __last)
return __first;
_ForwardIterator __result = __first;
while (++__first != __last)
if (__comp(__first, __result))
__result = __first;
return __result;
}
change that to
if (__first == __last)
return __first;
++_first;
if (__first == __last)
return __first;
_ForwardIterator __result = __first;
do
{
if (__comp(__first, __result))
__result = __first;
}
whule (++_first != _last);
return __result;
do-while style loops are always better for optimization.
Of course the issue is that loop-header copying doesn't do this transform,
it's do-while loop detection is a bit too simplistic...
Fixing that leads CH itself introduce that weird third PHI node :/
Index: gcc/tree-ssa-loop-ch.c
===
--- gcc/tree-ssa-loop-ch.c (revision 258915)
+++ gcc/tree-ssa-loop-ch.c (working copy)
@@ -176,6 +176,22 @@ do_while_loop_p (struct loop *loop)
"header contains just condition.\n", loop->num);
return false;
}
+
+ if (edge e = single_exit (loop))
+{
+ if (e->src == loop->header
+ && (!single_pred_p (loop->latch)
+ || single_pred (loop->latch) != loop->header))
+ {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file,
+"Loop %i is not do-while loop: "
+"header contains single exit test but isn't latch.\n",
+loop->num);
+ return false;
+ }
+}
+
if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Loop %i is do-while loop\n", loop->num);
@@ -409,7 +425,8 @@ unsigned int
pass_ch::execute (function *fun)
{
loop_optimizer_init (LOOPS_HAVE_PREHEADERS
- | LOOPS_HAVE_SIMPLE_LATCHES);
+ | LOOPS_HAVE_SIMPLE_LATCHES
+ | LOOPS_HAVE_RECORDED_EXITS);
unsigned int res = copy_headers (fun);