https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88278

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2018-11-30
     Ever confirmed|0                           |1

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I guess
#include <x86intrin.h>

__m128i
foo (__m64 *x)
{
  return _mm_movpi64_epi64 (*x);
}
is what intrinsic users would write for this case, and that is optimized
properly:
(insn 7 6 12 2 (set (reg:V2DI 87)
        (vec_concat:V2DI (mem:DI (reg:DI 89) [0 *x_3(D)+0 S8 A64])
            (const_int 0 [0]))) "include/emmintrin.h":592:24 3956 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 89)
        (nil)))

The same goes for e.g.
#include <x86intrin.h>

__m256
foo (__m128 *x)
{
  return _mm256_castps128_ps256 (*x);
}
which is conceptually closest to this case.
Or
#include <x86intrin.h>

__m256i
foo (__m128i *x)
{
  return _mm256_castsi128_si256 (*x);
}

All these use something like:
(insn 7 6 13 2 (set (reg:V8SI 87)
        (unspec:V8SI [
                (mem:V4SI (reg:DI 90) [0 *x_3(D)+0 S16 A128])
            ] UNSPEC_CAST)) "include/avxintrin.h":1484:20 4813 {avx_si256_si}
     (expr_list:REG_DEAD (reg:DI 90)
        (nil)))
I'm not really sure why this uses UNSPEC_CAST rather than representing it with
something more natural like a VEC_CONCAT of a nonimmediate_operand and a
const0_operand.
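For comparison, such a VEC_CONCAT form would presumably look roughly like the
vec_concatv2di insn above (this is a hand-written sketch of the suggested
representation, not taken from a real dump; the zero half is assumed to be
CONST0_RTX of the half-width mode, and the insn code/pattern name would of
course differ):
(insn 7 6 13 2 (set (reg:V8SI 87)
        (vec_concat:V8SI (mem:V4SI (reg:DI 90) [0 *x_3(D)+0 S16 A128])
            (const_vector:V4SI [
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                    (const_int 0 [0])
                ])))
     (expr_list:REG_DEAD (reg:DI 90)
        (nil)))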
