------- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-16 09:44 -------

The middle-end presents the vectorizer with

<bb 3>:
  # i_13 = PHI <i_7(4), 0(2)>
  # ivtmp.26_8 = PHI <ivtmp.26_16(4), 1024(2)>
  D.1623_3 = xd[i_13];
  sincostmp.21_1 = __builtin_cexpi (D.1623_3);
  D.1624_4 = IMAGPART_EXPR <sincostmp.21_1>;
  sd[i_13] = D.1624_4;
  D.1625_6 = REALPART_EXPR <sincostmp.21_1>;
  cd[i_13] = D.1625_6;
  i_7 = i_13 + 1;
  ivtmp.26_16 = ivtmp.26_8 - 1;
  if (ivtmp.26_16 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

which has first of all complex types (they should be recognized as V2DF
with vectorization factor 1, thus SLP-able).

For the float case:

<bb 3>:
  # i_13 = PHI <i_7(4), 0(2)>
  # ivtmp.6_8 = PHI <ivtmp.6_16(4), 1024(2)>
  D.1610_3 = xf[i_13];
  sincostmp.1_1 = __builtin_cexpif (D.1610_3);
  D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>;
  sf[i_13] = D.1611_4;
  D.1612_6 = REALPART_EXPR <sincostmp.1_1>;
  cf[i_13] = D.1612_6;
  i_7 = i_13 + 1;
  ivtmp.6_16 = ivtmp.6_8 - 1;
  if (ivtmp.6_16 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

they should be V2SF, thus use V4SF and vectorization factor 2.  Still use
SLP probably.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770