https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100257
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Looks like a few missed optimizations at the tree level (and a target issue
with the store):

  memcpy (&pixel, src_33, 6);
  _1 = pixel.b;
  _2 = pixel.g;
  _3 = pixel.r;
  val_2.0_21 = (short int) _1;
  val_1.1_22 = (short int) _2;
  val_0.2_23 = (short int) _3;
  _24 = {val_0.2_23, val_1.1_22, val_2.0_21, 0, 0, 0, 0, 0};
  _25 = __builtin_ia32_vcvtph2ps (_24);
  _14 = BIT_FIELD_REF <_25, 64, 0>;
  _28 = BIT_FIELD_REF <_25, 32, 64>;
  MEM <vector(2) float> [(float *)dst_34] = _14;
  MEM[(float *)dst_34 + 8B] = _28;
  MEM[(float *)dst_34 + 12B] = 1.0e+0;

The store issue is now PR 100258. This bug is more about the missed
optimization in the first part, the conversion.
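For reference, here is a hypothetical, portable C sketch of the kind of source the dump above appears to come from (it is not the PR's actual testcase): a 6-byte memcpy into a struct of three binary16 channels, a conversion of each channel to float, and a constant 1.0f alpha store. The names struct rgb16, half_to_float and convert_pixel, and the r/g/b field order, are assumptions. Instead of the F16C vcvtph2ps builtin, half_to_float performs the same per-lane IEEE binary16 to binary32 conversion in scalar code, so this shows the semantics, not the code GCC is being asked to optimize.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three IEEE binary16 channels, 6 bytes in total. */
struct rgb16 { uint16_t r, g, b; };

/* Scalar binary16 -> binary32 conversion: the per-lane work that
   vcvtph2ps does in hardware. Exact for every input, including
   subnormals, infinities and NaNs. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Inf or NaN: all-ones exponent, keep the payload bits. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift until the implicit bit appears,
           adjusting the float exponent for each shift. */
        exp = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x3FFu;
        bits = sign | (exp << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* type-pun without aliasing issues */
    return f;
}

/* Mirrors the dump: 6-byte memcpy, convert r, g, b, then alpha = 1.0f. */
void convert_pixel(float dst[4], const void *src)
{
    struct rgb16 pixel;
    memcpy(&pixel, src, sizeof pixel);  /* 6 bytes on common ABIs */
    dst[0] = half_to_float(pixel.r);
    dst[1] = half_to_float(pixel.g);
    dst[2] = half_to_float(pixel.b);
    dst[3] = 1.0f;
}
```

The assumed field order matches the dump, where r, g and b land in lanes 0, 1 and 2 of the vector passed to __builtin_ia32_vcvtph2ps, and lanes 0-1 and lane 2 of the result go to dst[0..1] and dst[2].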