https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115709
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't think this works, in the end we have to add even and odd elements to
compute b[i] (real and imag parts).  Yes, the multiplies could happen on
unpermuted data.  But your example assembly accumulates in a wrong way.

GCC produces

  vect__4.10_77 = MEM <vector(4) double> [(double *)a_15(D) + ivtmp.33_113 * 2];
  vect__4.11_79 = MEM <vector(4) double> [(double *)a_15(D) + 32B + ivtmp.33_113 * 2];
  vect_perm_even_80 = VEC_PERM_EXPR <vect__4.10_77, vect__4.11_79, { 0, 2, 4, 6 }>;
  vect_perm_odd_81 = VEC_PERM_EXPR <vect__4.10_77, vect__4.11_79, { 1, 3, 5, 7 }>;
  vect_powmult_7.13_83 = vect_perm_odd_81 * vect_perm_odd_81;
  vect__10.14_84 = .FMA (vect_perm_even_80, vect_perm_even_80, vect_powmult_7.13_83);
  MEM <vector(4) double> [(double *)b_16(D) + ivtmp.33_113 * 1] = vect__10.14_84;


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
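
Reading aid for the GIMPLE in comment #2: the vector statements appear to
correspond to a scalar loop of roughly the following shape. This is a minimal
sketch, not the reporter's test case. It assumes a holds interleaved
(real, imag) doubles and b receives one result per pair; the function name
squared_norms and the parameter n are hypothetical and do not come from the
bug report.

```c
#include <stddef.h>

/* Hypothetical scalar equivalent of the vectorized loop body.
   a: n interleaved (real, imag) pairs, i.e. 2*n doubles.
   b: n outputs, b[i] = real*real + imag*imag.  */
void squared_norms(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double re = a[2 * i];      /* even element, cf. vect_perm_even_80 */
        double im = a[2 * i + 1];  /* odd element, cf. vect_perm_odd_81 */
        /* Each b[i] sums one even and one odd product, which is why the
           even and odd lanes have to be brought together at some point.  */
        b[i] = re * re + im * im;
    }
}
```

In this form each output needs the product of an even element and the product
of an odd element added together, which matches the point made in the comment:
the multiplies could run on unpermuted data, but the final add still has to
combine even and odd lanes.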