[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jakub at gcc dot gnu.org Resolution|--- |INVALID --- Comment #2 from Jakub Jelinek --- This is another user error. _mm512_srai_epi32 second argument must be a constant integer literal, n is not a constant integer literal, nor is there any chance that even with optimizations you get one, you pass a function argument to that. So, if you want to pass sometimes a variable, sometimes a constant, use _mm512_sra_epi32(a.v, _mm_set_epi64x(0, n)) instead of _mm512_srai_epi32(a.v, n); or perhaps __builtin_constant_p (n) ? _mm512_srai_epi32(a.v, n) : _mm512_sra_epi32(a.v, _mm_set_epi64x(0, n)); The optimizer will optimize the first line exactly as the second one if n is a constant after optimizations, but it will not error out otherwise.
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 --- Comment #3 from Sven Woop --- Right, this could be considered a user bug. However, we ran into this as we are successfully using this code sequence in our code: #include #define __forceinline inline __attribute__((always_inline)) struct vint8 { __forceinline vint8(const int i) : v(_mm256_set1_epi32(i)) {} __forceinline vint8(const __m256i& t) : v(t) {} friend __forceinline const vint8 operator >>( const vint8& a, const int n ) { return _mm256_srai_epi32(a.v, n); } __m256i v; }; vint8 test8(int shift) { const vint8 blocks_add(shift); return blocks_add >> shift; } Which is essentially the same bug for AVX2. However, this code compiles with every compiler that supports AVX2, be it GCC, Clang, or MSVC. Also the corresponding sequence for SSE compiles with every compiler we tried so far. I would have expected GCC to behave consistent for AVX-256 and AVX-512 for this code.
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 --- Comment #4 from Sven Woop --- BTW, the AVX-512 version of this "bug" also compiles with ICC and Clang 4.
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 Jakub Jelinek changed: What|Removed |Added Status|RESOLVED|ASSIGNED Last reconfirmed||2017-04-10 Resolution|INVALID |--- Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #5 from Jakub Jelinek --- Created attachment 41168 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41168&action=edit gcc7-pr80381.patch The documentation says that it should be an immediate and only mention the insn with the immediate. That said, the reason why GCC accepts it is that it has magic handling for the count argument of the 2 argument shift builtins, where it basically transforms the count as needed depending on what it is. This patch adds the same magic handling to the count argument of the 4 argument shift builtins (the ones that have the mask argument and the source for values when mask is not set).
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 --- Comment #6 from Jakub Jelinek --- Author: jakub Date: Tue Apr 11 08:54:54 2017 New Revision: 246835 URL: https://gcc.gnu.org/viewcvs?rev=246835&root=gcc&view=rev Log: PR target/80381 * config/i386/i386-builtin-types.def (V16HI_FTYPE_V16HI_INT_V16HI_UHI_COUNT, V16HI_FTYPE_V16HI_V8HI_V16HI_UHI_COUNT, V16SI_FTYPE_V16SI_INT_V16SI_UHI_COUNT, V16SI_FTYPE_V16SI_V4SI_V16SI_UHI_COUNT, V2DI_FTYPE_V2DI_INT_V2DI_UQI_COUNT, V2DI_FTYPE_V2DI_V2DI_V2DI_UQI_COUNT, V32HI_FTYPE_V32HI_INT_V32HI_USI_COUNT, V32HI_FTYPE_V32HI_V8HI_V32HI_USI_COUNT, V4DI_FTYPE_V4DI_INT_V4DI_UQI_COUNT, V4DI_FTYPE_V4DI_V2DI_V4DI_UQI_COUNT, V4SI_FTYPE_V4SI_INT_V4SI_UQI_COUNT, V4SI_FTYPE_V4SI_V4SI_V4SI_UQI_COUNT, V8DI_FTYPE_V8DI_INT_V8DI_UQI_COUNT, V8DI_FTYPE_V8DI_V2DI_V8DI_UQI_COUNT, V8HI_FTYPE_V8HI_INT_V8HI_UQI_COUNT, V8HI_FTYPE_V8HI_V8HI_V8HI_UQI_COUNT, V8SI_FTYPE_V8SI_INT_V8SI_UQI_COUNT, V8SI_FTYPE_V8SI_V4SI_V8SI_UQI_COUNT): New function type aliases. * config/i386/i386-builtin.def (__builtin_ia32_pslld512_mask, __builtin_ia32_pslldi512_mask, __builtin_ia32_psllq512_mask, __builtin_ia32_psllqi512_mask, __builtin_ia32_psrad512_mask, __builtin_ia32_psradi512_mask, __builtin_ia32_psraq512_mask, __builtin_ia32_psraqi512_mask, __builtin_ia32_psrld512_mask, __builtin_ia32_psrldi512_mask, __builtin_ia32_psrlq512_mask, __builtin_ia32_psrlqi512_mask, __builtin_ia32_psllwi128_mask, __builtin_ia32_pslldi128_mask, __builtin_ia32_psllqi128_mask, __builtin_ia32_psllw128_mask, __builtin_ia32_pslld128_mask, __builtin_ia32_psllq128_mask, __builtin_ia32_psllwi256_mask, __builtin_ia32_psllw256_mask, __builtin_ia32_pslldi256_mask, __builtin_ia32_pslld256_mask, __builtin_ia32_psllqi256_mask, __builtin_ia32_psllq256_mask, __builtin_ia32_psradi128_mask, __builtin_ia32_psrad128_mask, __builtin_ia32_psradi256_mask, __builtin_ia32_psrad256_mask, __builtin_ia32_psraqi128_mask, __builtin_ia32_psraq128_mask, __builtin_ia32_psraqi256_mask, __builtin_ia32_psraq256_mask, __builtin_ia32_psrldi128_mask, __builtin_ia32_psrld128_mask, __builtin_ia32_psrldi256_mask, __builtin_ia32_psrld256_mask, __builtin_ia32_psrlqi128_mask, __builtin_ia32_psrlq128_mask, __builtin_ia32_psrlqi256_mask, __builtin_ia32_psrlq256_mask, __builtin_ia32_psrawi256_mask, __builtin_ia32_psraw256_mask, __builtin_ia32_psrawi128_mask, __builtin_ia32_psraw128_mask, __builtin_ia32_psrlwi256_mask, __builtin_ia32_psrlw256_mask, __builtin_ia32_psrlwi128_mask, __builtin_ia32_psrlw128_mask, __builtin_ia32_psllwi512_mask, __builtin_ia32_psllw512_mask, __builtin_ia32_psrawi512_mask, __builtin_ia32_psraw512_mask, __builtin_ia32_psrlwi512_mask, __builtin_ia32_psrlw512_mask): Use _COUNT suffixed function type aliases. * config/i386/i386.c (ix86_expand_args_builtin): Rename last_arg_count flag to second_arg_count, handle 4 argument function type _COUNT aliases, handle second_arg_count on second argument rather than last. * gcc.target/i386/pr80381.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr80381.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/i386/i386-builtin-types.def trunk/gcc/config/i386/i386-builtin.def trunk/gcc/config/i386/i386.c trunk/gcc/testsuite/ChangeLog
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 Jakub Jelinek changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #7 from Jakub Jelinek --- Fixed on the trunk.
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 --- Comment #8 from Sven Woop --- Thanks a lot. Sven
[Bug target/80381] AVX512: -O3, _mm512_srai_epi32, the last argument must be an 8-bit immediate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80381 --- Comment #9 from Uroš Bizjak --- I was looking at generated code (with -mtune=intel): vpbroadcastd%edi, %zmm0 # 9 *avx512f_vec_dup_gprv16si/2 [length = 6] movl%edi, %edi # 12*zero_extendsidi2/4 [length = 2] vmovq %rdi, %xmm1 # 26*movdi_internal/20 [length = 6] vpsrad %xmm1, %zmm0, %zmm0 # 17ashrv16si3/1[length = 6] ret # 29simple_return_internal [length = 1] (insn 12) and (insn 26) could be merged to vmovd %edx, %xmm0 # 13*zero_extendsidi2/10[length = 6] Register allocator somehow avoids zero-extension to SSE reg in (insn 12) and generates input reload (insn 26) for (insn 17): Inserting insn reload before: 26: r107:DI=r103:DI ... Choosing alt 19 in insn 26: (0) ?*Yi (1) r {*movdi_internal} RA could choose the same (?*Yi, r) alternative in the (insn 12). REE pass also doesn't merge (insn 12) and (insn 26).