On Mon, Apr 24, 2017 at 03:15:11PM +0200, Allan Sandfeld Jensen wrote:
> Okay, I have tried that, and I also made it more obvious how the intrinsics
> can become non-immediate shifts.
> 

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index b58f5050db0..b9406550fc5 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2017-04-22  Allan Sandfeld Jensen  <sandf...@kde.org>
> +
> +     * config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
> +     Use vector intrinsics instead of builtins.
> +     * config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
> +     Use vector intrinsics instead of builtins.
> +
>  2017-04-21  Uros Bizjak  <ubiz...@gmail.com>
>  
>       * config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv<mode>.
> diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
> index 82f170a3d61..64ba52b244e 100644
> --- a/gcc/config/i386/avx2intrin.h
> +++ b/gcc/config/i386/avx2intrin.h
> @@ -665,13 +665,6 @@ _mm256_slli_si256 (__m256i __A, const int __N)
>  
>  extern __inline __m256i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm256_slli_epi16 (__m256i __A, int __B)
> -{
> -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> -}
> -
> -extern __inline __m256i
> -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm256_sll_epi16 (__m256i __A, __m128i __B)
>  {
>    return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B);
> @@ -679,9 +672,11 @@ _mm256_sll_epi16 (__m256i __A, __m128i __B)
>  
>  extern __inline __m256i
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> -_mm256_slli_epi32 (__m256i __A, int __B)
> +_mm256_slli_epi16 (__m256i __A, int __B)
>  {
> -  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
> +  if (__builtin_constant_p(__B))
> +    return ((unsigned int)__B < 16) ? (__m256i)((__v16hi)__A << __B) : _mm256_setzero_si256();
> +  return _mm256_sll_epi16(__A, _mm_cvtsi32_si128(__B));
>  }

The formatting is wrong: spaces are missing between function names and the
opening (, and some lines are too long.  Also, you've removed some uses of
builtins like __builtin_ia32_psllwi256 above, but haven't removed those
builtins from the compiler (unlike the intrinsics, the builtins are not a
supported user interface and can be removed).  But I guess the primary
question is for Uros: do we want to handle this in the *intrin.h headers,
and thus increase their size (and parsing time etc.), or do we want to
handle this in the target builtin folders (both the tree and the gimple
one), where we'd convert e.g. __builtin_ia32_psllwi256 to the shift if the
shift count is constant?
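
Whichever place ends up doing the conversion (the *intrin.h headers, as in
the patch, or the target builtin folders), it has to keep the PSLLW rule
that a count above 15 clears the lane instead of wrapping.  Below is a
minimal, portable sketch of that rule, assuming the GCC/Clang generic
vector extension; v16hu, slli_epi16_ref, slli_epi16_const, sll_epi16_var
and check_fold are hypothetical names for illustration, not GCC internals
or real intrinsics.  It uses GNU spacing before each opening (.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this models the semantics only and is
   not GCC internals.  v16hu stands in for __v16hi: 16 lanes of 16 bits
   (32 bytes).  Unsigned lanes keep the left shifts well defined.  */
typedef uint16_t v16hu __attribute__ ((vector_size (32)));

/* Scalar reference for one lane: like PSLLW, a count above 15 clears
   the lane instead of wrapping.  */
static uint16_t
slli_epi16_ref (uint16_t lane, unsigned int count)
{
  return count < 16 ? (uint16_t) (lane << count) : 0;
}

/* What a constant-count fold would emit: a plain vector shift for
   counts 0..15 and an all-zero vector otherwise (negative counts
   become huge once cast to unsigned int).  */
static v16hu
slli_epi16_const (v16hu x, int count)
{
  if ((unsigned int) count < 16)
    return x << count;
  return (v16hu) { 0 };
}

/* Model of the non-constant path,
   _mm256_sll_epi16 (x, _mm_cvtsi32_si128 (count)): the 32-bit count is
   zero-extended and PSLLW compares the low 64 bits against 15.  */
static v16hu
sll_epi16_var (v16hu x, int count)
{
  uint64_t wide = (uint32_t) count;  /* zero-extension, as MOVD does */
  v16hu r = { 0 };
  for (int i = 0; i < 16; i++)
    r[i] = wide < 16 ? (uint16_t) (x[i] << wide) : 0;
  return r;
}

/* Abort if the folded form, the variable path and the scalar
   reference disagree on any lane for this count.  */
static void
check_fold (v16hu x, int count)
{
  v16hu folded = slli_epi16_const (x, count);
  v16hu var = sll_epi16_var (x, count);
  for (int i = 0; i < 16; i++)
    {
      uint16_t want = slli_epi16_ref (x[i], (unsigned int) count);
      assert (folded[i] == want);
      assert (var[i] == want);
    }
}
```

With a count of 3, a lane holding 0x8001 becomes 0x0008; counts of 16 or
-1 give all-zero vectors on both paths, which is why the
(unsigned int)__B < 16 test and the _mm_cvtsi32_si128 fallback agree.
The sketch says nothing about whether headers or folders are the better
home for this; it only pins down what either one must produce.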

        Jakub
