------- Comment #3 from rguenth at gcc dot gnu dot org 2010-06-05 10:56 -------
Ok. Fact is that no pass can move invariant store/load pairs. But that's
pre-existing - the main issue is that the new SRA implementation ends up
rematerializing the stores inside the loop!
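
For readers without the testcase at hand, here is a minimal C sketch of the kind of source that could produce a dump like the one in this comment. It is not the reporter's code: `struct color`, `union vec` and `scale_rows` are hypothetical names chosen to mirror `a->r`, `va.f[]` and `va.v` from the dump, and a GCC vector-extension multiply stands in for the `__builtin_ia32_mulps` call.

```c
#include <stddef.h>

/* Hypothetical types; only the member names follow the dump. */
typedef float v4sf __attribute__((vector_size(16)));  /* GCC vector extension */

struct color { float r, g, b; };

union vec {
    float f[4];   /* written element by element */
    v4sf  v;      /* read back as one aggregate  */
};

/* Multiply each of the n vectors in rows by (r, g, b, 0). */
void scale_rows(const struct color *a, v4sf *rows, size_t n)
{
    union vec va;

    /* Loop-invariant element stores, performed once before the loop. */
    va.f[0] = a->r;
    va.f[1] = a->g;
    va.f[2] = a->b;
    va.f[3] = 0.0f;

    for (size_t y = 0; y < n; y++) {
        /* The only use of va is this whole-aggregate load. If the four
           stores are scalarized and then rematerialized right before
           this load, they execute on every iteration. */
        rows[y] = rows[y] * va.v;
    }
}
```

In this sketch the four element stores are the invariant part and `va.v` is the aggregate use; whether a given GCC version actually scalarizes `va` here is an assumption to check against its own `-fdump-tree-esra` output.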

Diff of pre-esra vs. esra:

 <bb 2>:
   D.4339_3 = a_2(D)->r;
-  va.f[0] = D.4339_3;
+  va$f$0_33 = D.4339_3;
   D.4340_4 = a_2(D)->g;
-  va.f[1] = D.4340_4;
+  va$f$1_32 = D.4340_4;
   D.4341_5 = a_2(D)->b;
-  va.f[2] = D.4341_5;
-  va.f[3] = 0.0;
+  va$f$2_31 = D.4341_5;
+  va$f$3_30 = 0.0;
   y_6 = 0;
   goto <bb 4>;

@@ -504,6 +203,10 @@
   tmpatt_37 = {D.4375_36, D.4375_36, D.4375_36, D.4375_36};
   tmpatt_40 = tmpatt_37;
   tmpatt_15 = tmpatt_40;
+  va.f[0] = va$f$0_33;
+  va.f[1] = va$f$1_32;
+  va.f[2] = va$f$2_31;
+  va.f[3] = va$f$3_30;
   D.4347_16 = va.v;
   tmpatt_38 = __builtin_ia32_mulps (tmpatt_15, D.4347_16);
   tmpatt_41 = tmpatt_38;

that's of course bad (and the scalarization in this particular case looks
useless, too - the only use is an aggregate one, covering all stores).

-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
          Component|regression                  |tree-optimization
            Summary|[4.5/4.6] Massive           |[4.5/4.6 Regression] Massive
                   |performance regression in   |performance regression in
                   |SSE code                    |SSE code due to SRA
   Target Milestone|---                         |4.5.1

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423