https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED |NEW
   Last reconfirmed|            |2018-11-30
     Ever confirmed|0           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess

#include <x86intrin.h>

__m128i
foo (__m64 *x)
{
  return _mm_movpi64_epi64 (*x);
}

is what intrinsic users would write for this case, and that is optimized
properly:

(insn 7 6 12 2 (set (reg:V2DI 87)
        (vec_concat:V2DI (mem:DI (reg:DI 89) [0 *x_3(D)+0 S8 A64])
            (const_int 0 [0]))) "include/emmintrin.h":592:24 3956 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 89)
        (nil)))

Similarly e.g.

#include <x86intrin.h>

__m256
foo (__m128 *x)
{
  return _mm256_castps128_ps256 (*x);
}

which is conceptually closest to this case.  Or

#include <x86intrin.h>

__m256i
foo (__m128i *x)
{
  return _mm256_castsi128_si256 (*x);
}

All these use something like:

(insn 7 6 13 2 (set (reg:V8SI 87)
        (unspec:V8SI [
                (mem:V4SI (reg:DI 90) [0 *x_3(D)+0 S16 A128])
            ] UNSPEC_CAST)) "include/avxintrin.h":1484:20 4813 {avx_si256_si}
     (expr_list:REG_DEAD (reg:DI 90)
        (nil)))

Not really sure why UNSPEC_CAST is used here rather than representing it with
something natural like a VEC_CONCAT of a nonimmediate_operand and a
const0_operand.