Re: [PATCH][x86] Match movss and movsd "blend" instructions

Marc Glisse Thu, 02 Aug 2018 14:16:38 -0700

On Thu, 2 Aug 2018, Allan Sandfeld Jensen wrote:

I forgot. One of the things that makes using __builtin_shuffle ugly is that
__v4si, which _mm_move_ss needs as the shuffle argument, is declared in
emmintrin.h, but _mm_move_ss is in xmmintrin.h.

__v4si is some internal detail, I don't see much issue with moving it to xmmintrin.h if you want to use it there.

In general the gcc __builtin_shuffle syntax, with the argument being a vector,
is kind of awkward. At least for declaring intrinsics, the clang style,
where the permutation is given as extra arguments, is easier to deal with:
__builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
vs
__builtin_shuffle(a, b, 4, 0, 1, 2)


__builtin_shufflevector IIRC
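
For reference, a minimal sketch of the two spellings side by side, using generic vector types rather than the real intrinsic headers. The names v4sf, v4si and move_ss_sketch are made up for illustration, and the selector {4, 1, 2, 3} is my assumption for a movss-like blend (lane 0 from b, lanes 1..3 from a), not taken from the patch:

```c
/* Generic vector types via the GCC/clang vector_size extension.
   These typedef names are illustrative, not the ones in xmmintrin.h. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical helper: lane 0 from b, lanes 1..3 from a (movss-like).
   In both builtins, indices 0..3 select from a and 4..7 select from b. */
static inline v4sf move_ss_sketch(v4sf a, v4sf b)
{
#if defined(__clang__)
    /* clang: the indices are trailing constant arguments. */
    return __builtin_shufflevector(a, b, 4, 1, 2, 3);
#else
    /* GCC: the selector is itself an integer vector, hence the need
       for a type like v4si to be visible at this point. */
    return __builtin_shuffle(a, b, (v4si){4, 1, 2, 3});
#endif
}
```

The GCC branch is where the header problem shows up: the selector type has to be declared before the intrinsic that uses it.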

The question is what users expect and get when they use -O0 with intrinsics?

Here is the version with __builtin_shuffle. It might behave more as expected at
-O0, but it is also uglier.


I am not convinced -O0 is very important.

If you start extending your approach to _mm_add_sd and others, while one instruction is easy enough to recognize, if we put several in a row, they will be partially simplified and may become harder to recognize. { x*(y+v[0]-z), v[1] } requires that you notice that the upper part of this vector is v[1], i.e. the upper part of a vector whose lower part appears somewhere in the arbitrarily complex expression for the lower part of the result. And you then have to propagate the fact that you are doing vector operations all the way back to v[0].
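
To make that concrete, here is a rough sketch, assuming the scalar intrinsics were written as plain operations on lane 0. The *_sketch names and the v2df typedef are hypothetical and not the actual emmintrin.h definitions; how far a given GCC version folds the chain is also an assumption:

```c
/* Illustrative two-lane double vector (GCC/clang vector extension). */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical lane-0-only definitions in the spirit of _mm_add_sd etc.:
   operate on the low element, keep the high element of the first operand. */
static inline v2df add_sd_sketch(v2df a, v2df b) { a[0] = a[0] + b[0]; return a; }
static inline v2df sub_sd_sketch(v2df a, v2df b) { a[0] = a[0] - b[0]; return a; }
static inline v2df mul_sd_sketch(v2df a, v2df b) { a[0] = a[0] * b[0]; return a; }

/* Three "intrinsics" in a row, as a user might write them. */
v2df chain_sketch(v2df v, double x, double y, double z)
{
    v2df t = add_sd_sketch(v, (v2df){y, 0.0});  /* { v[0]+y,       v[1] } */
    t = sub_sd_sketch(t, (v2df){z, 0.0});       /* { v[0]+y-z,     v[1] } */
    t = mul_sd_sketch(t, (v2df){x, 0.0});       /* { (v[0]+y-z)*x, v[1] } */
    return t;
}

/* The shape the chain may be simplified to before instruction selection:
   one scalar expression plus an extract of v[1]. */
v2df folded_sketch(v2df v, double x, double y, double z)
{
    return (v2df){ x * (y + v[0] - z), v[1] };
}
```

In folded_sketch nothing says addsd/subsd/mulsd anymore; the only hint that the whole thing could stay in one vector register is that v[1] comes from the same v as the v[0] buried in the lower expression.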


I don't have a strong opinion on what the best approach is.

--
Marc Glisse
