http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54717
--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2012-09-27 10:43:00 UTC ---
I can reproduce the slowdown. Code differences appear first in early FRE,
good ones like:

-  _84 = &*a_56(D)[_83];
+  _84 = _75;

which was the intention of the patch (and that is also likely the reason
for the inliner code size/time estimate changes).

It would be nice to get a smaller testcase for the PRE change you quote.
Unfortunately the big slowdown does not reproduce with -fno-inline which
makes it harder to track down.

The real differences do appear in PRE, some of the kind you quote and some
where we perform more PRE like:

@@ -19695,11 +19720,13 @@
   <bb 289>:
   pretmp_ = stride.258_ * _;
   pretmp_ = offset.259_ + pretmp_;
+  pretmp_ = stride.258_ * _;
+  pretmp_ = offset.259_ + pretmp_;

   <bb 123>:
   # i_ = PHI <1(289), i_(292)>
-  _ = stride.258_ * _;
-  _ = _ + offset.259_;
+  _ = pretmp_;
+  _ = pretmp_;

Aside from that the differences you quote result in less if-conversion
applied:

   # ival2_ = PHI <ival2_(39), ival2_(41)>
   # ival2_ = PHI <ival2_(39), ival2_(41)>
-  # prephitmp_ = PHI <pretmp_(39), prephitmp_(41)>
   _ = (integer(kind=8)) ival2_;
   _ = _ + -1;
   _ = *xxtrt_(D)[_];
-  ival2_ = _ < prephitmp_ ? ival2_ : ival2_;
-  prephitmp_ = MIN_EXPR <_, prephitmp_>;
+  _ = (integer(kind=8)) ival2_;
+  _ = _ + -1;
+  _ = *xxtrt_(D)[_];
+  ival2_ = _ < _ ? ival2_ : ival2_;

but that does not result in any extra or missed vectorization.

Btw, dropping to -O2 also fixes the regression.

So, it's not at all clear what we are chasing here (the PRE seems to be
a partial antic expression).