[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #2 from Jakub Jelinek  ---
This is another user error.
_mm512_srai_epi32 second argument must be a constant integer literal, n is not
a constant integer literal, nor is there any chance that even with
optimizations you get one, you pass a function argument to that.

So, if you want to pass sometimes a variable, sometimes a constant, use
_mm512_sra_epi32(a.v, _mm_set_epi64x(0, n))
instead of
_mm512_srai_epi32(a.v, n);
or perhaps
__builtin_constant_p (n) ? _mm512_srai_epi32(a.v, n) : _mm512_sra_epi32(a.v,
_mm_set_epi64x(0, n));
The optimizer will optimize the first line exactly as the second one if n is a
constant after optimizations, but it will not error out otherwise.

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-10 Thread m...@sven-woop.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #3 from Sven Woop  ---
Right, this could be considered a user bug. However, we ran into this as we are
successfully using this code sequence in our code:

#include 

#define __forceinline inline __attribute__((always_inline))

struct vint8
{
  __forceinline vint8(const int i) 
: v(_mm256_set1_epi32(i)) {}

  __forceinline vint8(const __m256i& t) 
: v(t) {}

  friend __forceinline const vint8 operator >>( const vint8& a, const int n ) { 
return _mm256_srai_epi32(a.v, n); 
  }

  __m256i v; 
};

vint8 test8(int shift)
{
  const vint8 blocks_add(shift);
  return blocks_add >> shift;
}


Which is essentially the same bug for AVX2. However, this code compiles with
every compiler that supports AVX2, be it GCC, Clang, or MSVC. Also the
corresponding sequence for SSE compiles with every compiler we tried so far.

I would have expected GCC to behave consistent for AVX-256 and AVX-512 for this
code.

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-10 Thread m...@sven-woop.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #4 from Sven Woop  ---
BTW, the AVX-512 version of this "bug" also compiles with ICC and Clang 4.

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-10 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

Jakub Jelinek  changed:

   What|Removed |Added

 Status|RESOLVED|ASSIGNED
   Last reconfirmed||2017-04-10
 Resolution|INVALID |---
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #5 from Jakub Jelinek  ---
Created attachment 41168
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41168&action=edit
gcc7-pr80381.patch

The documentation says that it should be an immediate and only mention the insn
with the immediate.
That said, the reason why GCC accepts it is that it has magic handling for the
count argument of the 2 argument shift builtins, where it basically transforms
the count as needed depending on what it is.  This patch adds the same magic
handling to the count argument of the 4 argument shift builtins (the ones that
have the mask argument and the source for values when mask is not set).

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #6 from Jakub Jelinek  ---
Author: jakub
Date: Tue Apr 11 08:54:54 2017
New Revision: 246835

URL: https://gcc.gnu.org/viewcvs?rev=246835&root=gcc&view=rev
Log:
PR target/80381
* config/i386/i386-builtin-types.def
(V16HI_FTYPE_V16HI_INT_V16HI_UHI_COUNT,
V16HI_FTYPE_V16HI_V8HI_V16HI_UHI_COUNT,
V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT,
V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT,
V2DI_FTYPE_V2DI_INT_V2DI_UQI_COUNT,
V2DI_FTYPE_V2DI_V2DI_V2DI_UQI_COUNT,
V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT,
V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT,
V4DI_FTYPE_V4DI_INT_V4DI_UQI_COUNT,
V4DI_FTYPE_V4DI_V2DI_V4DI_UQI_COUNT,
V4SI_FTYPE_V4SI_INT_V4SI_UQI_COUNT,
V4SI_FTYPE_V4SI_V4SI_V4SI_UQI_COUNT,
V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT,
V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT,
V8HI_FTYPE_V8HI_INT_V8HI_UQI_COUNT,
V8HI_FTYPE_V8HI_V8HI_V8HI_UQI_COUNT,
V8SI_FTYPE_V8SI_INT_V8SI_UQI_COUNT,
V8SI_FTYPE_V8SI_V4SI_V8SI_UQI_COUNT): New function type aliases.
* config/i386/i386-builtin.def (__builtin_ia32_pslld512_mask,
__builtin_ia32_pslldi512_mask, __builtin_ia32_psllq512_mask,
__builtin_ia32_psllqi512_mask, __builtin_ia32_psrad512_mask,
__builtin_ia32_psradi512_mask, __builtin_ia32_psraq512_mask,
__builtin_ia32_psraqi512_mask, __builtin_ia32_psrld512_mask,
__builtin_ia32_psrldi512_mask, __builtin_ia32_psrlq512_mask,
__builtin_ia32_psrlqi512_mask, __builtin_ia32_psllwi128_mask,
__builtin_ia32_pslldi128_mask, __builtin_ia32_psllqi128_mask,
__builtin_ia32_psllw128_mask, __builtin_ia32_pslld128_mask,
__builtin_ia32_psllq128_mask, __builtin_ia32_psllwi256_mask,
__builtin_ia32_psllw256_mask, __builtin_ia32_pslldi256_mask,
__builtin_ia32_pslld256_mask, __builtin_ia32_psllqi256_mask,
__builtin_ia32_psllq256_mask, __builtin_ia32_psradi128_mask,
__builtin_ia32_psrad128_mask, __builtin_ia32_psradi256_mask,
__builtin_ia32_psrad256_mask, __builtin_ia32_psraqi128_mask,
__builtin_ia32_psraq128_mask, __builtin_ia32_psraqi256_mask,
__builtin_ia32_psraq256_mask, __builtin_ia32_psrldi128_mask,
__builtin_ia32_psrld128_mask, __builtin_ia32_psrldi256_mask,
__builtin_ia32_psrld256_mask, __builtin_ia32_psrlqi128_mask,
__builtin_ia32_psrlq128_mask, __builtin_ia32_psrlqi256_mask,
__builtin_ia32_psrlq256_mask, __builtin_ia32_psrawi256_mask,
__builtin_ia32_psraw256_mask, __builtin_ia32_psrawi128_mask,
__builtin_ia32_psraw128_mask, __builtin_ia32_psrlwi256_mask,
__builtin_ia32_psrlw256_mask, __builtin_ia32_psrlwi128_mask,
__builtin_ia32_psrlw128_mask, __builtin_ia32_psllwi512_mask,
__builtin_ia32_psllw512_mask, __builtin_ia32_psrawi512_mask,
__builtin_ia32_psraw512_mask, __builtin_ia32_psrlwi512_mask,
__builtin_ia32_psrlw512_mask): Use _COUNT suffixed function type
aliases.
* config/i386/i386.c (ix86_expand_args_builtin): Rename last_arg_count
flag to second_arg_count, handle 4 argument function type _COUNT
aliases, handle second_arg_count on second argument rather than last.

* gcc.target/i386/pr80381.c: New test.

Added:
trunk/gcc/testsuite/gcc.target/i386/pr80381.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386-builtin-types.def
trunk/gcc/config/i386/i386-builtin.def
trunk/gcc/config/i386/i386.c
trunk/gcc/testsuite/ChangeLog

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-11 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Jakub Jelinek  ---
Fixed on the trunk.

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-11 Thread m...@sven-woop.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #8 from Sven Woop  ---
Thanks a lot. Sven

[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate

2017-04-11 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381

--- Comment #9 from Uroš Bizjak  ---
I was looking at generated code (with -mtune=intel):

vpbroadcastd%edi, %zmm0 # 9 *avx512f_vec_dup_gprv16si/2
[length = 6]
movl%edi, %edi  # 12*zero_extendsidi2/4 [length = 2]
vmovq   %rdi, %xmm1 # 26*movdi_internal/20  [length = 6]
vpsrad  %xmm1, %zmm0, %zmm0 # 17ashrv16si3/1[length = 6]
ret # 29simple_return_internal  [length = 1]

(insn 12) and (insn 26) could be merged to

vmovd   %edx, %xmm0 # 13*zero_extendsidi2/10[length = 6]

Register allocator somehow avoids zero-extension to SSE reg in (insn 12) and
generates input reload (insn 26) for (insn 17):

Inserting insn reload before:
   26: r107:DI=r103:DI
 ...
 Choosing alt 19 in insn 26:  (0) ?*Yi  (1) r {*movdi_internal}

RA could choose the same (?*Yi, r) alternative in the (insn 12).

REE pass also doesn't merge (insn 12) and (insn 26).