Hi Paul,

On Tue, Feb 19, 2019 at 10:10:38AM -0600, Paul Clarke wrote:
> Incorrect type for interpreting the result from mfvsrd instruction leads
> to incorrect results.  Also, mfvsrd instruction only works as expected in
> 64-bit mode or for 32-bit quantities in 32-bit mode.  A more general,
> if slower, solution is needed for 32-bit mode.

You cannot use 64-bit registers in 32 bit mode on Linux, yes.

> @@ -1577,6 +1577,7 @@ _m_pminub (__m64 __A, __m64 __B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movemask_pi8 (__m64 __A)
>  {
> +#ifdef __powerpc64__
>    unsigned long long p =
>  #ifdef __LITTLE_ENDIAN__
>                         0x0008101820283038UL; // permute control for sign bits
> @@ -1584,6 +1585,18 @@ _mm_movemask_pi8 (__m64 __A)
>                         0x3830282018100800UL; // permute control for sign bits
>  #endif
>    return __builtin_bpermd (p, __A);
> +#else
> +  vector unsigned char A = (vector unsigned char)
> +    (vector unsigned long long) { 0, __A };
> +  vector unsigned char mask = {
> +    0x38, 0x30, 0x28, 0x20, 0x18, 0x10, 0x08, 0x00,
> +    0x78, 0x70, 0x68, 0x60, 0x58, 0x50, 0x48, 0x40
> +  };
> +  vector unsigned long long r = (vector unsigned long long)
> +    vec_bperm (A, mask);
> +  return r[0];
> +#endif

Wow, how inelegant.  Not that splitting the word into two and doing
two __builtin_bpermd will be much better :-/

Okay for trunk.  Thanks!


Segher
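
For reference, a minimal portable C sketch of what the sign-bit gather
is expected to compute.  It assumes byte 0 is the least significant byte
of the 64-bit value, as with x86 pmovmskb.  The names movemask_pi8_ref,
movemask_4 and movemask_pi8_split are hypothetical and not part of the
patch; the split variant only illustrates the "two halves" idea with
32-bit operations and does not use __builtin_bpermd.

```c
#include <stdint.h>

/* Reference semantics (assumed): bit i of the result is the most
   significant bit of byte i, where byte 0 is the least significant
   byte of the 64-bit input.  */
static int
movemask_pi8_ref (uint64_t a)
{
  int r = 0;
  for (int i = 0; i < 8; i++)
    r |= (int) ((a >> (8 * i + 7)) & 1) << i;
  return r;
}

/* Gather the four byte sign bits of one 32-bit word into bits 0..3.  */
static int
movemask_4 (uint32_t w)
{
  int r = 0;
  for (int i = 0; i < 4; i++)
    r |= (int) ((w >> (8 * i + 7)) & 1) << i;
  return r;
}

/* Split variant: process the low and high 32-bit halves separately,
   so no 64-bit register is needed, then combine the two nibbles.  */
static int
movemask_pi8_split (uint32_t lo, uint32_t hi)
{
  return movemask_4 (lo) | (movemask_4 (hi) << 4);
}
```

A scalar reference like this is mainly useful as an oracle for checking
both the 64-bit and the 32-bit code paths against the same inputs.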