------- Comment #3 from rguenth at gcc dot gnu dot org  2010-06-05 10:56 -------
OK.  The fact is that no pass can move loop-invariant store/load pairs out of
a loop.  But that is pre-existing; the main issue is that the new SRA
implementation ends up rematerializing the stores inside the loop!

Diff of pre-esra vs. esra:

 <bb 2>:
   D.4339_3 = a_2(D)->r;
-  va.f[0] = D.4339_3;
+  va$f$0_33 = D.4339_3;
   D.4340_4 = a_2(D)->g;
-  va.f[1] = D.4340_4;
+  va$f$1_32 = D.4340_4;
   D.4341_5 = a_2(D)->b;
-  va.f[2] = D.4341_5;
-  va.f[3] = 0.0;
+  va$f$2_31 = D.4341_5;
+  va$f$3_30 = 0.0;
   y_6 = 0;
   goto <bb 4>;

@@ -504,6 +203,10 @@
   tmpatt_37 = {D.4375_36, D.4375_36, D.4375_36, D.4375_36};
   tmpatt_40 = tmpatt_37;
   tmpatt_15 = tmpatt_40;
+  va.f[0] = va$f$0_33;
+  va.f[1] = va$f$1_32;
+  va.f[2] = va$f$2_31;
+  va.f[3] = va$f$3_30;
   D.4347_16 = va.v;
   tmpatt_38 = __builtin_ia32_mulps (tmpatt_15, D.4347_16);
   tmpatt_41 = tmpatt_38;

That is, of course, bad (and the scalarization in this particular case looks
useless, too: the only use is an aggregate one, covering all of the stores).


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu dot
                   |                            |org
          Component|regression                  |tree-optimization
            Summary|[4.5/4.6] Massive           |[4.5/4.6 Regression] Massive
                   |performance regression in   |performance regression in
                   |SSE code                    |SSE code due to SRA
   Target Milestone|---                         |4.5.1


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44423
