https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85116
Richard Biener <rguenth at gcc dot gnu.org> changed: What |Removed |Added ---------------------------------------------------------------------------- Keywords| |missed-optimization Status|UNCONFIRMED |NEW Last reconfirmed| |2018-03-29 CC| |rguenth at gcc dot gnu.org Component|c++ |libstdc++ Ever confirmed|0 |1 --- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> --- Eventually std::min_element can be improved directly but this is a missed optimization as well. A similar C testcase is optimized by PRE: int getMinIdx (double *a, int n, double off) { double *min = a; for (double *first = a; first < a + n; first++) { if (__builtin_fabs (*min - off) > __builtin_fabs (*first - off)) min = first; } return min - a; } while the C++ testcase ends up in PRE like the following where likely the extra PHI for __first_2[23] confuse things. DOM introduces that by threading the first iteration comparison, avoiding that doesn't seem to help though. Ah, it's because the loop isn't header-copied, it says "Loop 1 is do-while loop" but given the load from __result happens after the exit test it isn't partially redundant across the backedge. <bb 2> [15.00%]: goto <bb 4>; [100.00%] <bb 8> [42.50%]: <bb 3> [85.00%]: # __result_17 = PHI <__result_24(8), __first_23(5)> __first_9 = __first_23 + 8; if (__first_9 != &MEM[(void *)&testArray + 8000B]) goto <bb 7>; [82.35%] else goto <bb 6>; [17.65%] <bb 7> [70.00%]: <bb 4> [85.00%]: # __first_22 = PHI <__first_23(7), &testArray._M_elems(2)> # __first_23 = PHI <__first_9(7), &MEM[(void *)&testArray + 8B](2)> # __result_24 = PHI <__result_17(7), &testArray._M_elems(2)> _10 = MEM[(const double *)__first_22 + 8B]; _12 = *__result_24; _13 = _10 - offset_2(D); _14 = ABS_EXPR <_13>; _15 = _12 - offset_2(D); _16 = ABS_EXPR <_15>; if (_14 < _16) goto <bb 5>; [50.00%] else goto <bb 8>; [50.00%] <bb 5> [42.50%]: goto <bb 3>; [100.00%] <bb 6> [15.00%]: __last.3_6 = (long int) __result_17; __first.4_5 = (long int) &testArray._M_elems; _4 = __last.3_6 - __first.4_5; _3 = _4 /[ex] 8; _7 = (int) _3; return _7; So improving libstd++ would be nice here, resp. template<typename _ForwardIterator, typename _Compare> _GLIBCXX14_CONSTEXPR _ForwardIterator __min_element(_ForwardIterator __first, _ForwardIterator __last, _Compare __comp) { if (__first == __last) return __first; _ForwardIterator __result = __first; while (++__first != __last) if (__comp(__first, __result)) __result = __first; return __result; } change that to if (__first == __last) return __first; ++_first; if (__first == __last) return __first; _ForwardIterator __result = __first; do { if (__comp(__first, __result)) __result = __first; } whule (++_first != _last); return __result; do-while style loops are always better for optimization. Of course the issue is that loop-header copying doesn't do this transform, it's do-while loop detection is a bit too simplistic... Fixing that leads CH itself introduce that weird third PHI node :/ Index: gcc/tree-ssa-loop-ch.c =================================================================== --- gcc/tree-ssa-loop-ch.c (revision 258915) +++ gcc/tree-ssa-loop-ch.c (working copy) @@ -176,6 +176,22 @@ do_while_loop_p (struct loop *loop) "header contains just condition.\n", loop->num); return false; } + + if (edge e = single_exit (loop)) + { + if (e->src == loop->header + && (!single_pred_p (loop->latch) + || single_pred (loop->latch) != loop->header)) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Loop %i is not do-while loop: " + "header contains single exit test but isn't latch.\n", + loop->num); + return false; + } + } + if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, "Loop %i is do-while loop\n", loop->num); @@ -409,7 +425,8 @@ unsigned int pass_ch::execute (function *fun) { loop_optimizer_init (LOOPS_HAVE_PREHEADERS - | LOOPS_HAVE_SIMPLE_LATCHES); + | LOOPS_HAVE_SIMPLE_LATCHES + | LOOPS_HAVE_RECORDED_EXITS); unsigned int res = copy_headers (fun);