Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Kirill Yukhin
Hello Marc, Uroš,
On 10 Nov 21:33, Uros Bizjak wrote:
> On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
>> Hello,
>>
>> <, > and == for integer vectors of size 128. I was surprised not to find
>> _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
>> 512, there is no corresponding intrinsic, there are only _mask versions that
>> return a mask.
>
> Let's ask Kirill (CC'd) about missing intrinsics.
We have no `_mm_cmplt_epi64' intrinsic because there's no such instruction in
Intel ISA. All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from
AVX-512*.
VPCMP is able to model VPCMPLT by specifying the corresponding immediate, and we
have intrinsics for that (config/i386/avx512fintrin.h):
extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)
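
To make the "returns a mask" point concrete, here is a portable scalar model
of what a less-than compare over sixteen unsigned 32-bit lanes produces. This
is only a sketch: the name cmplt_epu32_mask_model is made up for illustration,
it is not how GCC implements the intrinsic, and it uses no AVX-512
instructions.

```c
#include <stdint.h>

/* Hypothetical scalar model (not GCC's implementation): bit i of the
   result is set when lane i of x is below lane i of y, compared as
   unsigned 32-bit values.  */
static uint16_t
cmplt_epu32_mask_model (const uint32_t x[16], const uint32_t y[16])
{
  uint16_t mask = 0;
  for (int i = 0; i < 16; i++)
    if (x[i] < y[i])
      mask |= (uint16_t) (1u << i);
  return mask;
}
```

The result is one bit per lane, in contrast to the pre-AVX-512 compares, which
fill each lane of a full-width vector with all ones or all zeros.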

--
Thanks, K




Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Marc Glisse

On Tue, 11 Nov 2014, Kirill Yukhin wrote:


> Hello Marc, Uroš,
> On 10 Nov 21:33, Uros Bizjak wrote:
>> On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
>>> Hello,
>>>
>>> <, > and == for integer vectors of size 128. I was surprised not to find
>>> _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
>>> 512, there is no corresponding intrinsic, there are only _mask versions that
>>> return a mask.
>>
>> Let's ask Kirill (CC'd) about missing intrinsics.
> We have no `_mm_cmplt_epi64' intrinsic because there's no such
> instruction in Intel ISA.


We have _mm_cmplt_epi32 without a corresponding instruction though ;-)
(yes, it is useless)
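
For what it is worth, a less-than compare never needs an instruction of its
own: a < b is b > a with the operands swapped, which is how the builtin calls
removed by the patch spell _mm_cmplt_epi8/16/32. A minimal sketch with GCC's
generic vector extensions, where the type v2di_t and the function
cmplt_epi64_sketch are names invented for this example and exist in no header:

```c
/* Two signed 64-bit lanes; v2di_t is a name made up for this sketch.  */
typedef long long v2di_t __attribute__ ((vector_size (16)));

/* Hypothetical helper: lane-wise a < b, written as b > a.
   Each result lane is -1 (all ones) when true and 0 when false.  */
static v2di_t
cmplt_epi64_sketch (v2di_t a, v2di_t b)
{
  return (v2di_t) (b > a);
}
```

The explicit cast mirrors the style of the patch and keeps the code independent
of which signed 64-bit element type the compiler picks for the comparison
result.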


> All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from
> AVX-512*.
> VPCMP is able to model VPCMPLT by specifying the corresponding immediate, and we
> have intrinsics for that (config/i386/avx512fintrin.h):
> extern __inline __mmask16
> __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> _mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)


--
Marc Glisse


Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-11 Thread Kirill Yukhin
On 11 Nov 10:28, Marc Glisse wrote:
> On Tue, 11 Nov 2014, Kirill Yukhin wrote:
>
>> Hello Marc, Uroš,
>> On 10 Nov 21:33, Uros Bizjak wrote:
>>> On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
>>>> Hello,
>>>>
>>>> <, > and == for integer vectors of size 128. I was surprised not to find
>>>> _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
>>>> 512, there is no corresponding intrinsic, there are only _mask versions
>>>> that return a mask.
>>>
>>> Let's ask Kirill (CC'd) about missing intrinsics.
>> We have no `_mm_cmplt_epi64' intrinsic because there's no such
>> instruction in Intel ISA.
>
> We have _mm_cmplt_epi32 without a corresponding instruction though ;-)
> (yes, it is useless)
Right, but not in the official SDM [1]. I believe these extra intrinsics were
added for compatibility w/ ICC, which also features them.
 
> --
> Marc Glisse

[1] - http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

--
Thanks, K


Re: [x86, 6/n] Replace builtins with vector extensions

2014-11-10 Thread Uros Bizjak
On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> Hello,
>
> <, > and == for integer vectors of size 128. I was surprised not to find
> _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not
> 512, there is no corresponding intrinsic, there are only _mask versions that
> return a mask.

Let's ask Kirill (CC'd) about missing intrinsics.

> For gcc-5, we should stop either after 5/n or after 7/n (avx2 version of
> 6/n).
>
> Regtested with 5/n.
>
> 2014-11-10  Marc Glisse  <marc.gli...@inria.fr>
>
> * config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16,
> _mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32,
> _mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector
> extensions instead of builtins.
> * config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64):
> Likewise.

OK.

Thanks,
Uros.


[x86, 6/n] Replace builtins with vector extensions

2014-11-09 Thread Marc Glisse

Hello,

<, > and == for integer vectors of size 128. I was surprised not to find
_mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but 
not 512, there is no corresponding intrinsic, there are only _mask 
versions that return a mask.


For gcc-5, we should stop either after 5/n or after 7/n (avx2 version of 
6/n).


Regtested with 5/n.

2014-11-10  Marc Glisse  <marc.gli...@inria.fr>

	* config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16,
	_mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32,
	_mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector
	extensions instead of builtins.
	* config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64):
	Likewise.
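
The idea can be tried outside the headers. The sketch below assumes GCC's
generic vector extensions (Clang accepts them too); v4si_t and the three
helper names are invented for this example and only mirror what the rewritten
_mm_cmpeq_epi32, _mm_cmplt_epi32 and _mm_cmpgt_epi32 bodies compute, without
the __m128i casts.

```c
/* Four signed 32-bit lanes; v4si_t is a name made up for this sketch.  */
typedef int v4si_t __attribute__ ((vector_size (16)));

/* Lane-wise comparisons: a true lane becomes -1 (all ones), a false lane 0.
   The helper names are hypothetical and not part of emmintrin.h.  */
static v4si_t eq_sketch (v4si_t a, v4si_t b) { return (v4si_t) (a == b); }
static v4si_t lt_sketch (v4si_t a, v4si_t b) { return (v4si_t) (a < b); }
static v4si_t gt_sketch (v4si_t a, v4si_t b) { return (v4si_t) (a > b); }
```

With a = { 1, 2, 3, -4 } and b = { 1, 5, 0, 4 }, eq_sketch gives
{ -1, 0, 0, 0 }, lt_sketch gives { 0, -1, 0, -1 } and gt_sketch gives
{ 0, 0, -1, 0 }.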


--
Marc Glisse
Index: emmintrin.h
===
--- emmintrin.h (revision 217263)
+++ emmintrin.h (working copy)
@@ -1268,69 +1268,69 @@ _mm_or_si128 (__m128i __A, __m128i __B)
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_xor_si128 (__m128i __A, __m128i __B)
 {
   return (__m128i) ((__v2du)__A ^ (__v2du)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpeq_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpeqb128 ((__v16qi)__A, (__v16qi)__B);
+  return (__m128i) ((__v16qi)__A == (__v16qi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpeq_epi16 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpeqw128 ((__v8hi)__A, (__v8hi)__B);
+  return (__m128i) ((__v8hi)__A == (__v8hi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpeq_epi32 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpeqd128 ((__v4si)__A, (__v4si)__B);
+  return (__m128i) ((__v4si)__A == (__v4si)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmplt_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtb128 ((__v16qi)__B, (__v16qi)__A);
+  return (__m128i) ((__v16qi)__A < (__v16qi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmplt_epi16 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtw128 ((__v8hi)__B, (__v8hi)__A);
+  return (__m128i) ((__v8hi)__A < (__v8hi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmplt_epi32 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtd128 ((__v4si)__B, (__v4si)__A);
+  return (__m128i) ((__v4si)__A < (__v4si)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpgt_epi8 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtb128 ((__v16qi)__A, (__v16qi)__B);
+  return (__m128i) ((__v16qi)__A > (__v16qi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpgt_epi16 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtw128 ((__v8hi)__A, (__v8hi)__B);
+  return (__m128i) ((__v8hi)__A > (__v8hi)__B);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpgt_epi32 (__m128i __A, __m128i __B)
 {
-  return (__m128i)__builtin_ia32_pcmpgtd128 ((__v4si)__A, (__v4si)__B);
+  return (__m128i) ((__v4si)__A > (__v4si)__B);
 }
 
 #ifdef __OPTIMIZE__
 extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_extract_epi16 (__m128i const __A, int const __N)
 {
   return (unsigned short) __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, __N);
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
Index: smmintrin.h
===
--- smmintrin.h (revision 217259)
+++ smmintrin.h (working copy)
@@ -260,21 +260,21 @@ _mm_dp_pd (__m128d __X, __m128d __Y, con
 #define _mm_dp_pd(X, Y, M) \
   ((__m128d) __builtin_ia32_dppd ((__v2df)(__m128d)(X), \
                                   (__v2df)(__m128d)(Y), (int)(M)))
 #endif
 
 /* Packed integer 64-bit comparison, zeroing or filling with ones
     corresponding parts of result.  */
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cmpeq_epi64 (__m128i __X, __m128i __Y)
 {
-  return (__m128i) __builtin_ia32_pcmpeqq ((__v2di)__X, (__v2di)__Y);
+  return (__m128i) ((__v2di)__X == (__v2di)__Y);
 }
 
 /*  Min/max packed integer instructions.  */
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_min_epi8 (__m128i __X, __m128i __Y)
 {
   return (__m128i) __builtin_ia32_pminsb128 ((__v16qi)__X, (__v16qi)__Y);