https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #62 from Richard Biener <rguenth at gcc dot gnu.org> --- Created attachment 34238 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34238&action=edit frontend patch So the issue is that the division is executed conditionally because it is placed after the loop entry check. The frontend generates if (step > 0) { if (to < from) goto exit_label; countm1 = (to - from) / step; } else { if (to > from) goto exit_label; countm1 = (from - to) / -step; } which ends up optimized to (as step is known to be positive here): if (to < from) { countm1 = (to - from) / step; ...loop... if we instead generate if (step > 0) { countm1 = (to - from) / step; if (to < from) goto exit_label; } else { countm1 = (from - to) / -step; if (to > from) goto exit_label; } the division can be hoisted out of the outer loop (at the cost of computing countm1 even when the loop is not entered -- but that would have happened if the desired transform happened anyway). Note that if step > 0 cannot be optimized we still have the same issue, so in theory we could generate if (step > 0) { tem1 = (to - from); tem2 = step; } else { tem1 = (from - to); tem2 = -step; } countm1 = tem1 / tem2; if (step > 0) { if (to < from) goto exit_label; } else { if (to > from) goto exit_label; } but hoisting the division with conditional computed operands is a lot more expensive (but possible for exactly two ops, and likely done). The runtime overhead from the above is one extra branch. If we do that then unfortunately jump threading undoes that change creating the first variant again.