https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116979
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
vect___r$_M_value$real_8.21_4 = MEM <vector(2) float> [(float *)a_2(D)];
vect___r$_M_value$real_8.22_24 = VEC_PERM_EXPR <vect___r$_M_value$real_8.21_4, vect___r$_M_value$real_8.21_4, { 0, 0 }>;
vect___r$_M_value$real_8.30_35 = VEC_PERM_EXPR <vect___r$_M_value$real_8.21_4, vect___r$_M_value$real_8.21_4, { 1, 1 }>;
vect__10.25_28 = MEM <vector(2) float> [(float *)b_3(D)];
vect__12.26_31 = vect___r$_M_value$real_8.22_24 * vect__10.25_28;
vect__10.34_40 = VEC_PERM_EXPR <vect__10.25_28, vect__10.25_28, { 1, 0 }>;
vect__13.35_43 = vect___r$_M_value$real_8.30_35 * vect__10.34_40;
vect__6.36_44 = .VEC_ADDSUB (vect__12.26_31, vect__13.35_43);
_46 = BIT_FIELD_REF <vect__6.36_44, 32, 32>;
_45 = BIT_FIELD_REF <vect__6.36_44, 32, 0>;
Yes, this looks like a cost issue.
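
For readers following the dump, here is a lane-by-lane sketch of what the vectorized sequence computes. It assumes the testcase is a std::complex<float> multiply (the `_M_value$real` names suggest this, but the source is not shown in this comment), and it assumes .VEC_ADDSUB subtracts in even lanes and adds in odd lanes. The names V2, perm, mul, addsub and complex_mul_lanes are hypothetical helpers for illustration, not anything from GCC.

```cpp
#include <array>
#include <cassert>

// Two-lane float vector standing in for GIMPLE's vector(2) float.
using V2 = std::array<float, 2>;

// Mirrors VEC_PERM_EXPR <v, v, { i, j }> for a single input vector.
static V2 perm(const V2& v, int i, int j) { return {v[i], v[j]}; }

// Lane-wise multiply.
static V2 mul(const V2& x, const V2& y) { return {x[0] * y[0], x[1] * y[1]}; }

// Assumed .VEC_ADDSUB semantics: even lane subtracts, odd lane adds.
static V2 addsub(const V2& x, const V2& y) { return {x[0] - y[0], x[1] + y[1]}; }

// a = {a.re, a.im}, b = {b.re, b.im}; follows the dump statement by statement.
V2 complex_mul_lanes(const V2& a, const V2& b) {
    V2 a_re = perm(a, 0, 0);    // { a.re, a.re }
    V2 a_im = perm(a, 1, 1);    // { a.im, a.im }
    V2 t1   = mul(a_re, b);     // { a.re*b.re, a.re*b.im }
    V2 b_sw = perm(b, 1, 0);    // { b.im, b.re }
    V2 t2   = mul(a_im, b_sw);  // { a.im*b.im, a.im*b.re }
    return addsub(t1, t2);      // { re part, im part } of a*b
}
```

Under those assumptions the result lanes are the real and imaginary parts of a*b, which the two BIT_FIELD_REFs then extract as scalars. Counting from the dump, that is three permutes and two lane extracts around two multiplies and one addsub, which is the overhead the cost model would need to weigh.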