On Wednesday, 1 August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
> > __artificial__))
> > _mm_move_sd (__m128d __A, __m128d __B)
> > {
> > - return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > + return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > }
>
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.
I forgot: one of the things that makes using __builtin_shuffle ugly is that
__v4si, which _mm_move_ss would need as the type of the shuffle mask argument,
is declared in emmintrin.h, but _mm_move_ss is in xmmintrin.h.
In general the gcc __builtin_shuffle syntax, with the mask argument being a
vector, is kind of awkward. At least for declaring intrinsics, the clang style,
where the permutation is passed as extra arguments, is easier to deal with:
__builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
vs
__builtin_shuffle(a, b, 4, 0, 1, 2)
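
To make the comparison concrete, here is a minimal GCC-only sketch of the
mask-vector form. The type names v4sf/v4si/v2df/v2di and the functions
move_ss_shuffle/move_sd_shuffle are hypothetical stand-ins that I made up for
illustration, not the real intrinsics or the types from xmmintrin.h and
emmintrin.h. The masks are chosen so that lane 0 comes from the second operand
and the remaining lanes from the first, which is what _mm_move_ss and
_mm_move_sd do.

```c
/* Sketch only: relies on GCC vector extensions and __builtin_shuffle. */
typedef float     v4sf __attribute__((vector_size(16)));
typedef int       v4si __attribute__((vector_size(16)));
typedef double    v2df __attribute__((vector_size(16)));
typedef long long v2di __attribute__((vector_size(16)));

/* Hypothetical stand-in for _mm_move_ss: {b[0], a[1], a[2], a[3]}.
   In the two-input form, indices 0..3 select from a and 4..7 from b.
   The mask must be an integer vector with the same element width,
   which is why a v4si-like type has to be visible here. */
static inline v4sf move_ss_shuffle(v4sf a, v4sf b)
{
    const v4si mask = {4, 1, 2, 3};
    return __builtin_shuffle(a, b, mask);
}

/* Hypothetical stand-in for _mm_move_sd: {b[0], a[1]}.
   Indices 0..1 select from a and 2..3 from b. */
static inline v2df move_sd_shuffle(v2df a, v2df b)
{
    const v2di mask = {2, 1};
    return __builtin_shuffle(a, b, mask);
}

/* The element-wise form used in the patch, for comparison. */
static inline v2df move_sd_elementwise(v2df a, v2df b)
{
    return (v2df){b[0], a[1]};
}
```

In clang the variadic form is spelled __builtin_shufflevector, so the
equivalent of the first function would be
__builtin_shufflevector(a, b, 4, 1, 2, 3), with no mask type needed.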