On Thu, 2 Aug 2018, Allan Sandfeld Jensen wrote:

I forgot. One of the things that makes using __builtin_shuffle ugly is that
__v4si, which _mm_move_ss needs as the shuffle argument, is declared
in emmintrin.h, but _mm_move_ss is in xmmintrin.h.

__v4si is some internal detail, I don't see much issue with moving it to xmmintrin.h if you want to use it there.

In general the gcc __builtin_shuffle syntax with the argument being a vector
is kind of awkward. At least for declaring intrinsics, the clang style,
where the permutation is passed as extra arguments, is easier to deal with:
__builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
vs
__builtin_shuffle(a, b, 4, 0, 1, 2)

__builtin_shufflevector IIRC
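
For reference, a minimal sketch of the two spellings side by side. It assumes generic vector-extension typedefs (v4sf, v4si) rather than the real __v4sf/__v4si from the headers, and move_ss_sketch is a hypothetical name, not the actual header code. The selector {4, 1, 2, 3} is the one that matches _mm_move_ss semantics (element 0 from b, the rest from a); indices 0..3 pick from the first operand and 4..7 from the second.

```c
#include <assert.h>

/* Stand-ins for __v4sf/__v4si, so the sketch needs no x86 headers. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical helper with _mm_move_ss(a, b) semantics:
   result = { b[0], a[1], a[2], a[3] }. */
static inline v4sf move_ss_sketch(v4sf a, v4sf b)
{
#if defined(__clang__)
    /* clang: the indices are extra constant arguments. */
    return __builtin_shufflevector(a, b, 4, 1, 2, 3);
#else
    /* gcc: the selector is itself an integer vector. */
    return __builtin_shuffle(a, b, (v4si){4, 1, 2, 3});
#endif
}

/* Small self-check of the sketch. */
static void check_move_ss_sketch(void)
{
    v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf b = {10.0f, 20.0f, 30.0f, 40.0f};
    v4sf r = move_ss_sketch(a, b);
    assert(r[0] == 10.0f && r[1] == 2.0f && r[2] == 3.0f && r[3] == 4.0f);
}
```

The gcc form is the one that drags the integer vector type into the header that declares the intrinsic.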


The question is what users expect and get when they use -O0 with intrinsics?

Here is the version with __builtin_shuffle. It might be closer to what users
expect at -O0, but it is also uglier.

I am not convinced -O0 is very important.

If you start extending your approach to _mm_add_sd and others, one instruction on its own is easy enough to recognize, but if we put several in a row, they will be partially simplified and may become harder to recognize. Matching { x*(y+v[0]-z), v[1] } requires that you notice that the upper part of this vector is v[1], i.e. the upper part of a vector whose lower part appears somewhere in the arbitrarily complex expression for the lower part of the result. And you then have to propagate the fact that you are doing vector operations all the way back to v[0].
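
To make that concrete, here is a rough sketch, assuming the _sd intrinsics were written as plain element operations on a generic two-double vector. The *_sd_sketch names and the v2df typedef are hypothetical stand-ins, not the real header definitions. Chaining three of them gives the same value as the single expression above, which is roughly the form the optimizer would be left looking at once the intermediate vectors are simplified away.

```c
#include <assert.h>

/* Stand-in for __v2df, so the sketch needs no x86 headers. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical element-wise spellings of _mm_add_sd and friends:
   operate on element 0, pass element 1 of the first operand through. */
static inline v2df add_sd_sketch(v2df a, v2df b) { a[0] += b[0]; return a; }
static inline v2df sub_sd_sketch(v2df a, v2df b) { a[0] -= b[0]; return a; }
static inline v2df mul_sd_sketch(v2df a, v2df b) { a[0] *= b[0]; return a; }

/* Three "intrinsics" in a row, as user code would write them. */
static inline v2df chained_sketch(v2df v, v2df x, v2df y, v2df z)
{
    return mul_sd_sketch(sub_sd_sketch(add_sd_sketch(v, y), z), x);
}

/* The same computation written as one expression:
   { x*(y+v[0]-z), v[1] } with x, y, z the low elements. */
static inline v2df simplified_sketch(v2df v, v2df x, v2df y, v2df z)
{
    return (v2df){ x[0] * (y[0] + v[0] - z[0]), v[1] };
}

/* Small self-check: both forms agree on exactly representable inputs. */
static void check_chain_sketch(void)
{
    v2df v = {1.5, 7.0}, x = {2.0, 100.0}, y = {3.0, 200.0}, z = {0.5, 300.0};
    v2df c = chained_sketch(v, x, y, z);
    v2df s = simplified_sketch(v, x, y, z);
    assert(c[0] == 8.0 && c[1] == 7.0);
    assert(s[0] == c[0] && s[1] == c[1]);
}
```

In the second form nothing says that the three operations were vector operations; the only hint is that v[1] and v[0] come from the same vector.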

I don't have a strong opinion on what the best approach is.

--
Marc Glisse
