https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96031
--- Comment #2 from bin cheng <amker at gcc dot gnu.org> ---
Interesting case. I see two issues in the generated asm: one is the
unnecessary bitwise AND, the other is that different registers are allocated
for the induction variable and the base address. However, it looks like
neither issue is caused by ivopts. Check the dump:

  <bb 4> [local count: 105119324]:
  _12 = (short unsigned int) step_8(D);
  ivtmp.10_11 = (unsigned long) &array;
  _18 = len_7(D) + 4294967294;
  _19 = (unsigned long) _18;
  _20 = _19 * 2;
  _21 = (unsigned long) &array;
  _22 = _21 + 2;
  _23 = _20 + _22;

  <bb 5> [local count: 955630224]:
  # ivtmp.8_15 = PHI <_12(4), ivtmp.8_5(6)>
  # ivtmp.10_16 = PHI <ivtmp.10_11(4), ivtmp.10_4(6)>
  _3 = ivtmp.8_15;
  _2 = (void *) ivtmp.10_16;
  MEM[base: _2, offset: 2B] = _3;
  ivtmp.8_5 = ivtmp.8_15 + _12;
  ivtmp.10_4 = ivtmp.10_16 + 2;
  if (ivtmp.10_4 != _23)
    goto <bb 6>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 8> [local count: 105119324]:
  goto <bb 3>; [100.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 5>; [100.00%]

As far as I can tell, this is optimal. The register allocation issue is
introduced by RTL PRE: apparently we should not save the "add 2" instruction
in the last iteration by introducing a false dependence, which is more
harmful.

As for ivopts, I can see a minor improvement from replacing the != exit
condition with <=, which saves the "add 2" instruction computing _22 and
happens to "disable" the wrong PRE transformation.

Ah, I see it's already classified as rtl-optimization.

Thanks
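
For readers who want to see the two exit conditions side by side, here is a
rough C model of the loop in <bb 5>. It is reconstructed from the dump only;
the original testcase is not quoted in this comment, so the names fill_ne,
fill_le, same_result and N are hypothetical, and the sketch assumes len >= 2
(which the guard before <bb 4> would have to ensure). The != form needs the
bound _23 = _20 + _22, while the <= form can compare against _21 + _20
directly, so the "+ 2" that computes _22 is not needed.

```c
#include <assert.h>
#include <string.h>

enum { N = 64 };  /* hypothetical array size, not taken from the bug report */

/* Exit test as in the dump: `!=` against _23 = _20 + _22. */
static void fill_ne(unsigned short *array, unsigned int len, int step)
{
    assert(len >= 2 && len <= N);
    unsigned short s = (unsigned short) step;        /* _12 */
    unsigned short val = s;                          /* ivtmp.8 */
    unsigned short *p = array;                       /* ivtmp.10 */
    unsigned short *end = (array + 1) + (len - 2u);  /* _23 = _22 + _20 */
    do {
        p[1] = val;                        /* MEM[base: _2, offset: 2B] = _3 */
        val = (unsigned short) (val + s);  /* ivtmp.8 + _12 */
        p += 1;                            /* ivtmp.10 + 2 bytes */
    } while (p != end);
}

/* Suggested variant: `<=` against _21 + _20, so no "+ 2" for the bound. */
static void fill_le(unsigned short *array, unsigned int len, int step)
{
    assert(len >= 2 && len <= N);
    unsigned short s = (unsigned short) step;
    unsigned short val = s;
    unsigned short *p = array;
    unsigned short *last = array + (len - 2u);       /* _21 + _20 */
    do {
        p[1] = val;
        val = (unsigned short) (val + s);
        p += 1;
    } while (p <= last);
}

/* Returns 1 when both exit conditions store exactly the same values. */
int same_result(unsigned int len, int step)
{
    unsigned short a[N] = {0}, b[N] = {0};
    fill_ne(a, len, step);
    fill_le(b, len, step);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions store to array[1] through array[len - 1]; the model says
nothing about register allocation or the RTL PRE behaviour, which can only be
observed in the generated asm.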