https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92712
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Before the first revision mentioned above *.optimized dump contained just t * v,
the second one doesn't change anything in *.optimized and is a RTL costing
matter.
  _4 = (unsigned int) t_1(D);
  _10 = _4 + 4294967295;
  _8 = (int) _10;
  _13 = v_3(D) * _8;
  x_5 = v_3(D) + _13;
and can be seen even on simpler:
int
foo (int t, int v)
{
  t = t - 1U;
  v *= t;
  return v + t;
}
which we don't optimize at GIMPLE level.  We don't optimize even:
int
bar (int t, int v)
{
  t = t - 1;
  v *= t;
  return v + t;
}
Rather than hoping it is optimized during combine (the change there was that
while combining b=a-1 into c=b*d we attempted c=a*d-d we now attempt c=(a-1)*d
and similarly for the 3 insn combination with e=c+d, where we attempted and
succeeded to combine that into e=a*d while now we attempt and fail e=(a-1)*d+d:
-Successfully matched this instruction:
+Failed to match this instruction:
 (parallel [
         (set (reg/v:SI 91 [ <retval> ])
-            (mult:SI (reg/v:SI 92 [ t ])
+            (plus:SI (mult:SI (plus:SI (reg/v:SI 92 [ t ])
+                        (const_int -1 [0xffffffffffffffff]))
+                    (reg/v:SI 93 [ v ]))
                 (reg/v:SI 93 [ v ])))
         (clobber (reg:CC 17 flags))
     ])
), I think it would be useful to optimize this in match.pd, plus maybe teach
simplify-rtx.c to handle this.
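
For reference, a minimal sketch of the algebraic identity such a fold would rely on, (t - 1) * v + v == t * v. This is not code from the PR: the names unfolded, folded and check_identity are hypothetical, and the arithmetic is done in unsigned int so that wraparound is well defined, as in the GIMPLE sequence where the subtraction happens in unsigned int.

```c
#include <limits.h>
#include <stddef.h>

/* Unfolded form, mirroring the shape of the GIMPLE sequence:
   v + v * (t - 1), all in wrapping unsigned arithmetic.  */
static unsigned int
unfolded (unsigned int t, unsigned int v)
{
  unsigned int tm1 = t - 1U;   /* corresponds to _10 = _4 + 4294967295 */
  unsigned int prod = v * tm1; /* corresponds to _13 = v * _8 */
  return v + prod;             /* corresponds to x_5 = v + _13 */
}

/* Folded form: what the *.optimized dump used to contain, t * v.  */
static unsigned int
folded (unsigned int t, unsigned int v)
{
  return t * v;
}

/* Compare both forms on a few boundary values and return the number
   of mismatches found.  */
static int
check_identity (void)
{
  static const unsigned int samples[] = {
    0U, 1U, 2U, 7U, 0x7fffffffU, 0x80000000U, UINT_MAX
  };
  const size_t n = sizeof samples / sizeof samples[0];
  int mismatches = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (unfolded (samples[i], samples[j]) != folded (samples[i], samples[j]))
        mismatches++;

  return mismatches;
}
```

Calling check_identity () from a small main should report no mismatches, since the two forms agree modulo 2^N for any unsigned width N. The sketch says nothing about how the pattern would actually be written in match.pd or simplify-rtx.c.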