http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48329
Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Keywords|openmp                      |
   Last reconfirmed|                            |2011.03.29 10:31:56
          Component|middle-end                  |tree-optimization
                 CC|                            |rguenth at gcc dot gnu.org
     Ever Confirmed|0                           |1
            Summary|Program takes twice as long |Missed vectorization of
                   |*without* -fopenmp than     |reduction due to PRE
                   |with 1 OpenMP thread        |

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-29 10:31:56 UTC ---
We vectorize the reduction if the function is outlined.  I suppose something
confuses the vectorizer in the non-OMP path.

Yep, it's PRE, so try -fno-tree-pre:

<bb 3>:
  # i_1 = PHI <1(2), i_22(4)>
  # sum_2 = PHI <0.0(2), sum_20(4)>
  # prephitmp.9_50 = PHI <5.66893424036281234980410020432668056299176519904892395524e-20(2), D.1586_48(4)>
  # ivtmp.12_10 = PHI <2100000000(2), ivtmp.12_11(4)>
  D.1574_17 = prephitmp.9_50 + 1.0e+0;
  D.1575_18 = ((D.1574_17));
  D.1576_19 = 4.0e+0 / D.1575_18;
  sum_20 = D.1576_19 + sum_2;
  ivtmp.12_11 = ivtmp.12_10 - 1;
  if (ivtmp.12_11 == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

<bb 4>:
  i_22 = i_1 + 1;
  pretmp.8_44 = (real(kind=8)) i_22;
  pretmp.8_45 = pretmp.8_44 - 5.0e-1;
  pretmp.8_46 = ((pretmp.8_45));
  pretmp.8_47 = pretmp.8_46 * 4.76190476190476200439314681013558416822206709184683859348e-10;
  D.1586_48 = __builtin_pow (pretmp.8_47, 2.0e+0);
  goto <bb 3>;

is not detected as a reduction.  Probably not only, but at least partly,
because the latch block is not empty.
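
For readers who want to follow the dump, here is a rough C sketch of the kind of loop it appears to come from. It is an assumption reconstructed from the GIMPLE above, not the reporter's test case, which is Fortran and is not quoted in this message. The name pi_midpoint, the parameter n, and the final scaling by w are hypothetical. The trip count 2100000000 and the factor 4.7619...e-10 (which appears to be 1/2100000000) are read off the dump.

```c
/* Hypothetical C rendering of the loop behind the dump.  The original
   test case is Fortran; names and the final scaling are assumptions. */
double pi_midpoint(long n)
{
    const double w = 1.0 / (double)n;  /* strip width; 1/2100000000 in the dump */
    double sum = 0.0;

    for (long i = 1; i <= n; i++) {
        double x = w * ((double)i - 0.5);  /* midpoint of strip i */

        /* x * x stands in for Fortran's x**2, which the dump shows as
           __builtin_pow (x, 2.0).  In the dump, PRE has moved this
           computation for the next iteration into the latch block (bb 4)
           and carries it around the loop in prephitmp.9_50. */
        sum += 4.0 / (1.0 + x * x);        /* the reduction on sum */
    }
    return w * sum;  /* assumed final scaling; approximates pi */
}
```

To compare the two cases, one could build such a file with vectorization enabled, once as is and once with -fno-tree-pre added, and check whether the loop gets vectorized. Note that GCC only vectorizes a floating-point sum like this when it is allowed to reassociate the additions, for example under -ffast-math. Whether the C form triggers the same PRE transformation as the Fortran original has not been checked here.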