https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65443
--- Comment #4 from vries at gcc dot gnu.org ---
(In reply to vries from comment #3)
> (In reply to vries from comment #2)
> > The problem with this transformation is that '_20 + 1' might overflow,
> > that's what the comment 'This may need some additional preconditioning in
> > case NIT = ~0' refers to.
>
> AFAIU, we might also move 'ivtmp_6 = ivtmp_y + 1' to the end of bb4. That
> way it's not triggered at loop entry, as before the transformation,
> eliminating the need for '_20 + 1'.

One thing I overlooked there:

  _20 = n_4(D) + 4294967295;

If n == 0, we don't reach the loop.

If n == 1, we reach the loop, and _20 == 0. And when we reach the loop
condition from loop entry with ivtmp == 0, ivtmp < _20 will evaluate to false,
and we won't even enter the loop. That's the problem we're trying to solve
using '_20 + 1'. And moving 'ivtmp_6 = ivtmp_y + 1' to the end of bb4 doesn't
fix that.
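
To make the arithmetic above concrete, here is a minimal C sketch of the two entry conditions. It is not the code GCC generates: it assumes a 32-bit unsigned type, and the function names (bound_20, enters_with_bound, enters_with_bound_plus_one, plus_one_wraps) are hypothetical, introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models '_20 = n_4(D) + 4294967295', i.e. n - 1 modulo 2^32.
   The name is hypothetical; only the arithmetic comes from the dump. */
uint32_t bound_20(uint32_t n)
{
    return (uint32_t)(n + UINT32_C(4294967295));
}

/* Loop condition as reached from loop entry with ivtmp == 0,
   comparing directly against _20. */
bool enters_with_bound(uint32_t n)
{
    uint32_t ivtmp = 0;
    return ivtmp < bound_20(n);
}

/* Same entry test, but comparing against '_20 + 1' instead. */
bool enters_with_bound_plus_one(uint32_t n)
{
    uint32_t ivtmp = 0;
    return ivtmp < (uint32_t)(bound_20(n) + 1u);
}

/* True when 'nit + 1' wraps around to 0, which happens only for nit == ~0. */
bool plus_one_wraps(uint32_t nit)
{
    return (uint32_t)(nit + 1u) == 0;
}
```

With n == 1, bound_20 returns 0, so enters_with_bound(1) is false while enters_with_bound_plus_one(1) is true. plus_one_wraps(UINT32_MAX) shows the NIT = ~0 case in which '_20 + 1' overflows.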