https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106688
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #0)
> It looks as if going out of SSA places in the loop a register copy
> corresponding to a phi node which is outside of the loop. Strangely, RTL
> optimizations do not clean it up either.

No, it is IVOPTs that places the copy inside the loop:

  <bb 5> [local count: 1006632961]:
  # buf_25 = PHI <buf_21(5), buf_22(4)>
  # vs1_28 = PHI <vs1_20(5), { 0, 0, 0, 0, 0, 0, 0, 0 }(4)>
  __asm__("pmovzxbw %1, %0" : "=x" b_17 : "m" MEM[(i8v8 *)buf_25]);
  vs1_18 = b_17 + vs1_28;
  _15 = (unsigned long) buf_25;
  _14 = _15 + 8;
  _2 = (const unsigned char *) _14;
  __asm__("pmovzxbw %1, %0" : "=x" b_19 : "m" MEM[(i8v8 *)_2]);
  vs1_20 = vs1_18 + b_19;
  buf_21 = buf_25 + 16;
  _33 = (const unsigned char *) ivtmp.18_7;
  if (buf_21 != _33)
    goto <bb 5>; [93.75%]
  else
    goto <bb 6>; [6.25%]

Notice that the cast of ivtmp.18_7 is assigned to _33 here, inside the loop.
The cast is loop-invariant. I don't know why LIM4 didn't pull the invariant
out of the loop.
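
To make the point about the invariant cast concrete, here is a rough source-level sketch in C. It is not the testcase from this PR: the function names, the byte-summing body, and the variable end_bits (standing in for ivtmp.18_7) are all hypothetical, and a compiler may well generate identical code for both functions. It only illustrates what "the cast sits inside the loop" versus "the cast is hoisted" would mean.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of the dump: the integer-to-pointer cast of the
   end address (compare _33 = (const unsigned char *) ivtmp.18_7) is
   recomputed on every iteration although its operand never changes. */
unsigned sum_cast_in_loop(const unsigned char *buf, size_t len)
{
    assert(len > 0 && len % 16 == 0);   /* loop consumes 16 bytes per trip */
    uintptr_t end_bits = (uintptr_t)buf + len;
    const unsigned char *end;
    unsigned sum = 0;
    do {
        for (int i = 0; i < 16; i++)
            sum += buf[i];
        buf += 16;
        end = (const unsigned char *)end_bits;  /* invariant, yet inside the loop */
    } while (buf != end);
    return sum;
}

/* Same loop after the transformation one would expect invariant motion
   to perform: the cast is done once, before the loop is entered. */
unsigned sum_cast_hoisted(const unsigned char *buf, size_t len)
{
    assert(len > 0 && len % 16 == 0);
    const unsigned char *end = buf + len;       /* computed once */
    unsigned sum = 0;
    do {
        for (int i = 0; i < 16; i++)
            sum += buf[i];
        buf += 16;
    } while (buf != end);
    return sum;
}
```

Both functions compute the same result; the only difference is where the end pointer is materialized, which is the difference at issue in the GIMPLE above.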