https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108677
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|wrong-code                  |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
>gcc vectorize the loop even if a dependency is present...[1]

What dependency? TrigArr cannot point to tp.
Also, it is only doing SLP, and only in basic-block form; the vectorization is
fine here.

IR:
  _2 = jp_31 + 4294967295;
  _3 = (long unsigned int) _2;
  _4 = _3 * 16;
  _5 = TrigArr.0_1 + _4;
  vect__22.19_48 = MEM <vector(2) double> [(double *)_5];
  vect__22.23_43 = VEC_PERM_EXPR <vect__22.19_48, vect__22.19_48, { 1, 0 }>;
  vect__20.15_54 = MEM <const vector(2) double> [(double *)tp_14(D)];
  vect__20.16_53 = VEC_PERM_EXPR <vect__20.15_54, vect__20.15_54, { 0, 0 }>;
  vect__20.27_38 = VEC_PERM_EXPR <vect__20.15_54, vect__20.15_54, { 1, 1 }>;
  vect__27.28_37 = vect__20.27_38 * vect__22.23_43;
  vect__57.29_36 = .VEC_FMADDSUB (vect__20.16_53, vect__22.19_48, vect__27.28_37);
  _7 = (long unsigned int) jp_31;
  _8 = _7 * 16;
  _9 = TrigArr.0_1 + _8;
  MEM <vector(2) double> [(double *)_9] = vect__57.29_36;

_5 is &TrigArr[jp-1], _9 is &TrigArr[jp]

This looks fine to me.

As for SLP not being done without the copy constructor: it seems the IR changes
and the load from tp_14 is forced outside of the loop, which causes SLP not to
do the load there ...
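
To make the IR above easier to check by hand, here is a rough C++ sketch of what
the SLP'd block appears to compute: one complex multiply per iteration,
TrigArr[jp] = (*tp) * TrigArr[jp-1]. The names Cplx, fill_scalar, fill_slp and
slp_matches_scalar are hypothetical and not taken from the reporter's testcase.
The fmaddsub helper assumes the usual lane convention for .VEC_FMADDSUB (even
lane subtracts, odd lane adds). The loop-carried read of [jp-1] is untouched,
because only the two lanes within a single iteration are combined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the reporter's complex type: (re, im), 16 bytes.
struct Cplx { double re, im; };

// Scalar form of the recurrence: arr[jp] = t * arr[jp-1].
void fill_scalar(std::vector<Cplx>& arr, const Cplx& t) {
    for (std::size_t jp = 1; jp < arr.size(); ++jp) {
        const Cplx v = arr[jp - 1];
        arr[jp] = { t.re * v.re - t.im * v.im,
                    t.re * v.im + t.im * v.re };
    }
}

using V2 = std::array<double, 2>;  // models "vector(2) double"

// Assumed two-lane semantics of .VEC_FMADDSUB:
// even lane is a*b - c, odd lane is a*b + c.
V2 fmaddsub(const V2& a, const V2& b, const V2& c) {
    return { a[0] * b[0] - c[0], a[1] * b[1] + c[1] };
}

// Lane-by-lane emulation of the IR; comments name the matching SSA values.
void fill_slp(std::vector<Cplx>& arr, const Cplx& t) {
    for (std::size_t jp = 1; jp < arr.size(); ++jp) {
        const V2 v    = { arr[jp - 1].re, arr[jp - 1].im };  // vect__22.19_48
        const V2 vsw  = { v[1], v[0] };                      // vect__22.23_43, perm {1,0}
        const V2 tv   = { t.re, t.im };                      // vect__20.15_54
        const V2 tre  = { tv[0], tv[0] };                    // vect__20.16_53, perm {0,0}
        const V2 tim  = { tv[1], tv[1] };                    // vect__20.27_38, perm {1,1}
        const V2 prod = { tim[0] * vsw[0], tim[1] * vsw[1] };// vect__27.28_37
        const V2 r    = fmaddsub(tre, v, prod);              // vect__57.29_36
        arr[jp] = { r[0], r[1] };                            // store to &TrigArr[jp]
    }
}

// Runs both forms on small integer-valued inputs (exact in double) and compares.
bool slp_matches_scalar(std::size_t n) {
    const Cplx t = { 1.0, 2.0 };
    std::vector<Cplx> a(n, Cplx{ 0.0, 0.0 }), b(n, Cplx{ 0.0, 0.0 });
    if (n == 0) return true;
    a[0] = b[0] = Cplx{ 1.0, 0.0 };
    fill_scalar(a, t);
    fill_slp(b, t);
    for (std::size_t i = 0; i < n; ++i)
        if (a[i].re != b[i].re || a[i].im != b[i].im) return false;
    return true;
}
```

Calling slp_matches_scalar(8) compares the two forms element by element. The
inputs are small integers so that fused and unfused multiply-add round
identically and an exact comparison is meaningful.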