https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10520
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is the current IR at optimized:

  <bb 3> [local count: 958878296]:
  # n_in_42 = PHI <n_in_31(4), 0(2)>
  # n_out_43 = PHI <n_out_32(4), 0(2)>
  # n_in1_44 = PHI <n_in1_33(4), 1(2)>
  # n_out1_45 = PHI <n_out1_34(4), 1(2)>
  n_in.0_1 = (int) n_in_42;
  _3 = n_in.0_1 w* 4;
  _4 = buf_fast_28(D) + _3;
  n_out.1_5 = (int) n_out_43;
  _7 = n_out.1_5 w* 4;
  _8 = buf_fast_28(D) + _7;
  _9 = *_4;
  *_8 = _9;
  n_in1.2_10 = (int) n_in1_44;
  _12 = n_in1.2_10 w* 4;
  _13 = buf_fast_28(D) + _12;
  n_out1.3_14 = (int) n_out1_45;
  _16 = n_out1.3_14 w* 4;
  _17 = buf_fast_28(D) + _16;
  _18 = *_13;
  *_17 = _18;
  n_in_31 = n_in_42 + 4;
  n_out_32 = n_out_43 + 2;
  n_in1_33 = n_in1_44 + 4;
  n_out1_34 = n_out1_45 + 2;
  _24 = MAX_EXPR <n_in_31, n_out_32>;
  if (_24 < _tmp0_27(D))
    goto <bb 4>; [94.50%]
  else
    goto <bb 5>; [5.50%]

  <bb 4> [local count: 906139990]:
  _25 = MAX_EXPR <n_in1_33, n_out1_34>;
  if (_25 < _tmp0_27(D))
    goto <bb 3>; [94.50%]
  else
    goto <bb 5>; [5.50%]

We should figure out that:
  _24 = MAX_EXPR <n_in_31, n_out_32>;
is just (both start at 0, and n_in_31 is incremented by 4 each time through the
loop while n_out_32 only by 2):
  _24 = n_in_31

And:
  _25 = MAX_EXPR <n_in1_33, n_out1_34>;
is just (same logic as above, with both starting at 1):
  _25 = n_in1_33

And then we have:
  if (n_in_31 < _tmp0_27(D))
    goto <bb 4>; [94.50%]
  else
    goto <bb 5>; [5.50%]

  <bb 4> [local count: 906139990]:
  if (n_in1_33 < _tmp0_27(D))
    goto <bb 3>; [94.50%]
  else
    goto <bb 5>; [5.50%]

where n_in1_33 = n_in_31 + 1, so the second test implies the first.

Therefore we should reduce it to just:
  <bb 4> [local count: 906139990]:
  if (n_in1_33 < _tmp0_27(D))
    goto <bb 3>; [94.50%]
  else
    goto <bb 5>; [5.50%]

(hopefully I did this correctly).
Of course this depends on them not overflowing, which we know they won't
because they are used for pointer accesses.
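
As a sanity check on the reasoning above, here is a small Python model of just the four induction variables and the two exit tests. It is only a sketch: the function name `simulate` and the parameter `limit` (standing in for _tmp0_27) are made up for illustration, the loads and stores are left out, and Python integers never wrap, which matches the "no overflow" assumption the simplification relies on.

```python
def simulate(limit):
    """Model the loop's induction variables under both exit-test forms.

    Returns (trips_original, trips_reduced): how many times the loop body
    runs with the two MAX_EXPR tests versus the single reduced test.
    `limit` is a hypothetical stand-in for _tmp0_27.
    """
    def run(reduced):
        # Initial values taken from the PHI nodes in <bb 3>.
        n_in, n_out, n_in1, n_out1 = 0, 0, 1, 1
        trips = 0
        while True:
            trips += 1
            # Loads and stores through buf_fast are omitted here.
            n_in += 4
            n_out += 2
            n_in1 += 4
            n_out1 += 2
            # The facts the simplification depends on.
            assert max(n_in, n_out) == n_in
            assert max(n_in1, n_out1) == n_in1
            assert n_in1 == n_in + 1
            if reduced:
                # Single test, as in the reduced <bb 4>.
                if not (n_in1 < limit):
                    break
            else:
                # Two tests, as in the current <bb 3> and <bb 4>.
                if not (max(n_in, n_out) < limit):
                    break
                if not (max(n_in1, n_out1) < limit):
                    break
        return trips

    return run(reduced=False), run(reduced=True)
```

Calling `simulate` over a range of limits and comparing the two trip counts checks that the reduced test leaves the loop at the same iteration as the original pair of tests. It says nothing about the wrap-around case, which is exactly the condition noted above.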