Hi Paul,

On Tue, Feb 19, 2019 at 10:10:38AM -0600, Paul Clarke wrote:
> Incorrect type for interpreting the result from mfvsrd instruction leads
> to incorrect results.  Also, mfvsrd instruction only works as expected in
> 64-bit mode or for 32-bit quantities in 32-bit mode.  A more general,
> if slower, solution is needed for 32-bit mode.

You cannot use 64-bit registers in 32-bit mode on Linux, yes.

> @@ -1577,6 +1577,7 @@ _m_pminub (__m64 __A, __m64 __B)
>  extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _mm_movemask_pi8 (__m64 __A)
>  {
> +#ifdef __powerpc64__
>    unsigned long long p =
>  #ifdef __LITTLE_ENDIAN__
>                           0x0008101820283038UL; // permute control for sign bits
> @@ -1584,6 +1585,18 @@ _mm_movemask_pi8 (__m64 __A)
>                           0x3830282018100800UL; // permute control for sign bits
>  #endif
>    return __builtin_bpermd (p, __A);
> +#else
> +  vector unsigned char A = (vector unsigned char)
> +    (vector unsigned long long) { 0, __A };
> +  vector unsigned char mask = {
> +    0x38, 0x30, 0x28, 0x20, 0x18, 0x10, 0x08, 0x00,
> +    0x78, 0x70, 0x68, 0x60, 0x58, 0x50, 0x48, 0x40
> +  };
> +  vector unsigned long long r = (vector unsigned long long)
> +    vec_bperm (A, mask);
> +  return r[0];
> +#endif

Wow, how inelegant.  Not that splitting the word into two and doing two
__builtin_bpermd will be much better :-/
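
For comparison, here is a minimal scalar sketch of what both paths have to
compute. It is not part of the patch: the name movemask_pi8_ref is
hypothetical, and it assumes byte 0 is the least significant byte of the
64-bit value, as on little-endian. It is only meant as something to check
the bpermd and vec_bperm results against.

```c
#include <stdint.h>

/* Hypothetical reference helper, not from the patch.  Bit i of the
   result is the most significant (sign) bit of byte i of A, where
   byte 0 is assumed to be the least significant byte of A.  */
static int
movemask_pi8_ref (uint64_t a)
{
  int result = 0;

  for (int i = 0; i < 8; i++)
    {
      /* Sign bit of byte i sits at bit position 8*i + 7.  */
      int sign = (int) ((a >> (8 * i + 7)) & 1);
      result |= sign << i;
    }

  return result;
}
```

It only uses shifts and masks on a uint64_t, so it should compile in both
32-bit and 64-bit mode, though it will certainly be slower than either
variant in the patch.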

Okay for trunk.  Thanks!


Segher
