Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Mon, Nov 21, 2016 at 08:40:37PM +0300, Andrew Senkevich wrote:
> > FWIW, I came across the same error in my own testing and raised
> > bug 78451.
>
> Can we fix it with the following patch? Regtesting in progress.
>
> PR target/78451
> * gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
> _mm512_setzero_ps.
> * gcc/config/i386/avx5124vnniwintrin.h: Ditto.

That is just a workaround, we want to fix the real bug.
I'll have a look tomorrow.

	Jakub
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-21 20:12 GMT+03:00 Martin Sebor:
> On 11/20/2016 11:16 AM, Uros Bizjak wrote:
>> On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak wrote:
>>> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek wrote:
>>>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>>>> compared to before r242569, on i686-linux there is still:
>>>>> +FAIL: gcc.target/i386/pr57756.c (test for errors, line 6)
>>>>> +FAIL: gcc.target/i386/pr57756.c (test for warnings, line 14)
>>>>> compared to pre-r242569 (so some further fix is needed).
>>>>
>>>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>>>> Ok for trunk together with the other 3 patches?
>>>
>>> OK for the whole patch series.
>>
>> Hm, I still see (both, 32bit and 64bit targets):
>>
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,
>>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
>>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function '_mm512_maskz_4fmadd_ps':
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,
>>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
>>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note: called from here
>> compiler exited with status 1
>> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
>> Excess errors:
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch
>
> FWIW, I came across the same error in my own testing and raised
> bug 78451.

Can we fix it with the following patch? Regtesting in progress.

	PR target/78451
	* gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
	_mm512_setzero_ps.
	* gcc/config/i386/avx5124vnniwintrin.h: Ditto.

diff --git a/gcc/config/i386/avx5124fmapsintrin.h b/gcc/config/i386/avx5124fmapsintrin.h
index 6113ee9..dd9a322
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -74,7 +74,9 @@ _mm512_maskz_4fmadd_ps (__mmask16 __U,
 				(__v16sf) __E,
 				(__v16sf) __A,
 				(const __v4sf *) __F,
-				(__v16sf) _mm512_setzero_ps (),
+				(__v16sf) {0, 0, 0, 0,
+				0, 0, 0, 0, 0, 0, 0, 0,
+				0, 0, 0, 0},
 				(__mmask16) __U);
 }

@@ -161,7 +163,9 @@ _mm512_maskz_4fnmadd_ps (__mmask16 __U,
 				(__v16sf) __E,
 				(__v16sf) __A,
 				(const __v4sf *) __F,
-				(__v16sf) _mm512_setzero_ps (),
+				(__v16sf) {0, 0, 0, 0,
+				0, 0, 0, 0, 0, 0, 0, 0,
+				0, 0, 0, 0},
 				(__mmask16) __U);
 }

diff --git a/gcc/config/i386/avx5124vnniwintrin.h b/gcc/config/i386/avx5124vnniwintrin.h
index 392c6a5..a4faa24
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -75,7 +75,9 @@ _mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
 				(__v16si) __E,
 				(__v16si) __A,
 				(const __v4si *) __F,
-				(__v16si) _mm512_setzero_ps (),
+				(__v16si) {0, 0, 0, 0,
+				0, 0, 0, 0, 0, 0, 0, 0,
+				0, 0, 0, 0},
 				(__mmask16) __U);
 }

@@ -120,7 +122,9 @@ _mm512_maskz_4dpwssds_epi32 (__mmask16 __U, __m512i __A, __m512i __B,
 				(__v16si) __E,
 				(__v16si) __A,
 				(const __v4si *) __F,
-				(__v16si) _mm512_setzero_ps (),
+				(__v16si) {0, 0, 0, 0,
+				0, 0, 0, 0, 0, 0, 0, 0,
+				0, 0, 0, 0},
 				(__mmask16) __U);
 }

--
WBR,
Andrew
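For readers following along, the shape of the workaround can be sketched in plain C with the GCC/Clang vector extension. This is only an illustration under assumptions: the 4-lane type `demo_v4sf` and every `demo_*` function are hypothetical stand-ins, not the real 16-lane intrinsics, and the lane loop merely models what a zero-masking operation returns. The point is that the second variant builds its zero operand from a vector literal and so contains no call to another always_inline helper.

```c
#include <assert.h>

/* Hypothetical 4-lane vector type (GCC/Clang vector extension); the real
   headers use the 16-lane __v16sf and __v16si types.  */
typedef float demo_v4sf __attribute__ ((__vector_size__ (16)));

/* Stand-in for _mm512_setzero_ps: a separate helper that the caller
   would have to inline.  */
static inline demo_v4sf
demo_setzero (void)
{
  return (demo_v4sf) { 0.0f, 0.0f, 0.0f, 0.0f };
}

/* Before the workaround: the zero operand comes from a helper call.  */
static demo_v4sf
demo_maskz_with_call (unsigned mask, demo_v4sf v)
{
  demo_v4sf r = demo_setzero ();
  for (int i = 0; i < 4; i++)
    if (mask & (1u << i))
      r[i] = v[i];
  return r;
}

/* After the workaround: the zero operand is a vector literal, so no
   helper has to be inlined here.  */
static demo_v4sf
demo_maskz_with_literal (unsigned mask, demo_v4sf v)
{
  demo_v4sf r = (demo_v4sf) { 0, 0, 0, 0 };
  for (int i = 0; i < 4; i++)
    if (mask & (1u << i))
      r[i] = v[i];
  return r;
}

/* Usage example: both variants produce the same lanes.  */
static void
demo_maskz_check (void)
{
  demo_v4sf v = { 1.0f, 2.0f, 3.0f, 4.0f };
  demo_v4sf a = demo_maskz_with_call (0x5, v);
  demo_v4sf b = demo_maskz_with_literal (0x5, v);
  for (int i = 0; i < 4; i++)
    assert (a[i] == b[i]);
}
```

The two variants compute the same value; only the way the zero operand is formed differs, which is why the change sidesteps the error without addressing its cause.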
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On 11/20/2016 11:16 AM, Uros Bizjak wrote:
> On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak wrote:
>> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek wrote:
>>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>>> compared to before r242569, on i686-linux there is still:
>>>> +FAIL: gcc.target/i386/pr57756.c (test for errors, line 6)
>>>> +FAIL: gcc.target/i386/pr57756.c (test for warnings, line 14)
>>>> compared to pre-r242569 (so some further fix is needed).
>>>
>>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>>> Ok for trunk together with the other 3 patches?
>>
>> OK for the whole patch series.
>
> Hm, I still see (both, 32bit and 64bit targets):
>
> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,
>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function '_mm512_maskz_4fmadd_ps':
> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch
> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,
>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
>                  from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note: called from here
> compiler exited with status 1
> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
> Excess errors:
> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch

FWIW, I came across the same error in my own testing and raised
bug 78451.

Martin
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak wrote:
> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek wrote:
>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>> compared to before r242569, on i686-linux there is still:
>>> +FAIL: gcc.target/i386/pr57756.c (test for errors, line 6)
>>> +FAIL: gcc.target/i386/pr57756.c (test for warnings, line 14)
>>> compared to pre-r242569 (so some further fix is needed).
>>
>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>> Ok for trunk together with the other 3 patches?
>
> OK for the whole patch series.

Hm, I still see (both, 32bit and 64bit targets):

In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,
                 from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
                 from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function '_mm512_maskz_4fmadd_ps':
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch
In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,
                 from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,
                 from /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note: called from here
compiler exited with status 1
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
Excess errors:
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining failed in call to always_inline '_mm512_setzero_ps': target specific option mismatch

Uros.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek wrote:
> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>> compared to before r242569, on i686-linux there is still:
>> +FAIL: gcc.target/i386/pr57756.c (test for errors, line 6)
>> +FAIL: gcc.target/i386/pr57756.c (test for warnings, line 14)
>> compared to pre-r242569 (so some further fix is needed).
>
> And finally here is yet another patch that fixes pr57756 on i686-linux.
> Ok for trunk together with the other 3 patches?

OK for the whole patch series.

Big thanks,
Uros.

> 2016-11-19  Jakub Jelinek
>
> 	* config/i386/i386.c (ix86_can_inline_p): Use || instead of &
> 	when checking if callee's isa flags are subset of caller's isa flags.
> 	Fix comment wording.
>
> --- gcc/config/i386/i386.c.jj	2016-11-19 18:02:56.0 +0100
> +++ gcc/config/i386/i386.c	2016-11-19 18:21:23.649463040 +0100
> @@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
>    struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
>    struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
>
> -  /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
> -     can inline a SSE2 function but a SSE2 function can't inline a SSE4
> -     function.  */
> +  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
> +     function can inline a SSE2 function but a SSE2 function can't inline
> +     a SSE4 function.  */
>    if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> -       != callee_opts->x_ix86_isa_flags) &
> -      ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> -       != callee_opts->x_ix86_isa_flags2))
> +       != callee_opts->x_ix86_isa_flags)
> +      || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> +          != callee_opts->x_ix86_isa_flags2))
>      ret = false;
>
>    /* See if we have the same non-isa options.  */
>
> 	Jakub
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
> compared to before r242569, on i686-linux there is still:
> +FAIL: gcc.target/i386/pr57756.c (test for errors, line 6)
> +FAIL: gcc.target/i386/pr57756.c (test for warnings, line 14)
> compared to pre-r242569 (so some further fix is needed).

And finally here is yet another patch that fixes pr57756 on i686-linux.
Ok for trunk together with the other 3 patches?

2016-11-19  Jakub Jelinek

	* config/i386/i386.c (ix86_can_inline_p): Use || instead of &
	when checking if callee's isa flags are subset of caller's isa flags.
	Fix comment wording.

--- gcc/config/i386/i386.c.jj	2016-11-19 18:02:56.0 +0100
+++ gcc/config/i386/i386.c	2016-11-19 18:21:23.649463040 +0100
@@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
   struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);

-  /* Callee's isa options should a subset of the caller's, i.e. a SSE4 function
-     can inline a SSE2 function but a SSE2 function can't inline a SSE4
-     function.  */
+  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
+     function can inline a SSE2 function but a SSE2 function can't inline
+     a SSE4 function.  */
   if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
-       != callee_opts->x_ix86_isa_flags) &
-      ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
-       != callee_opts->x_ix86_isa_flags2))
+       != callee_opts->x_ix86_isa_flags)
+      || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+          != callee_opts->x_ix86_isa_flags2))
     ret = false;

   /* See if we have the same non-isa options.  */

	Jakub
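To see why combining the two comparisons with & lets a mismatch slip through, here is a minimal standalone sketch. It is not the GCC source: `struct demo_isa` and the `demo_*` functions are hypothetical stand-ins for the two ISA flag words and for the subset test the patch describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the pair of ISA flag words.  */
struct demo_isa
{
  uint64_t flags;
  uint64_t flags2;
};

/* True if every bit set in CALLEE is also set in CALLER.  */
static bool
demo_is_subset (uint64_t callee, uint64_t caller)
{
  return (caller & callee) == callee;
}

/* Pre-patch logic: inlining is refused only when BOTH words fail the
   subset test, because the two "not a subset" results are and-ed.  */
static bool
demo_can_inline_old (struct demo_isa caller, struct demo_isa callee)
{
  bool ret = true;
  if (!demo_is_subset (callee.flags, caller.flags)
      & !demo_is_subset (callee.flags2, caller.flags2))
    ret = false;
  return ret;
}

/* Patched logic: inlining is refused when EITHER word fails.  */
static bool
demo_can_inline_new (struct demo_isa caller, struct demo_isa callee)
{
  bool ret = true;
  if (!demo_is_subset (callee.flags, caller.flags)
      || !demo_is_subset (callee.flags2, caller.flags2))
    ret = false;
  return ret;
}

/* Usage example: the callee needs a bit in the second word that the
   caller lacks, while the first words are compatible.  */
static void
demo_inline_check (void)
{
  struct demo_isa caller = { 0x3, 0x0 };
  struct demo_isa callee = { 0x1, 0x1 };
  assert (demo_can_inline_old (caller, callee));   /* wrongly allowed */
  assert (!demo_can_inline_new (caller, callee));  /* correctly refused */
}
```

With the old form a mismatch confined to one word goes undetected, so a function could be inlined into a caller that lacks one of its ISA bits.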
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-19 13:17 GMT+03:00 Uros Bizjak:
> On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinek wrote:
>> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
>>> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
>>> > I'm seeing lots of ICEs with this.
>>>
>>> Here is untested fix for that, will bootstrap/regtest it soon (after my
>>> current set of bootstraps finishes).
>>>
>>> 2016-11-18  Jakub Jelinek
>>>
>>> 	* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>>> 	don't initialize it, don't use it for the case where it isn't
>>> 	provable %{z} nor using the same argument, instead move merge
>>> 	argument into a new pseudo and use that as target. Formatting fixes.
>>
>> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
>> fixed a couple of FAILs, but not tons of others.
>>
>> Here is another patch I'm going to test which fixes many other FAILs, but
>> still some are left:
>> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
>> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
>> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
>> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
>> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
>> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
>> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
>> FAIL: gcc.target/i386/pr71652-3.c (test for errors, line 5)
>> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
>
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-versioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't go deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality some more.

Completely my bad. Testing went wrong starting from the addition of the
last intrinsics. I will double-check next time to avoid repeating this
in the future.

>> Will debug even those.

Thank you, Jakub.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 11:17:55AM +0100, Uros Bizjak wrote:
> > Here is another patch I'm going to test which fixes many other FAILs, but
> > still some are left:
> > FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> > FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> > FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> > FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> > FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> > FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> > FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> > FAIL: gcc.target/i386/pr71652-3.c (test for errors, line 5)
> > FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
>
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-versioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't go deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality some more.

Dunno, clearly the patch has not been tested at all, at least not in the
form that has been checked in.

I've now bootstrapped/regtested on x86_64-linux and i686-linux all these
3 patches:
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01992.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02026.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02027.html
, e.g.
on x86_64-linux they fix: -FAIL: gcc.target/i386/avx-2.c (internal compiler error) -FAIL: gcc.target/i386/avx-2.c (test for excess errors) -FAIL: gcc.target/i386/avx2-gather-2.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 16 -FAIL: gcc.target/i386/avx2-gather-6.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 1 -FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-ceil-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-ceilf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-floor-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-floorf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-gather-2.c scan-tree-dump-times vect "note: vectorized 1 loops in function" 16 -FAIL: gcc.target/i386/avx512f-gather-5.c scan-assembler-times gather[^\\n]*zmm[0-9]+{%k[1-7]}(?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c 
scan-assembler-times vcvtpd2dq[^\\n]+ymm[0-9](?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c scan-assembler-times vinserti64x4[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-rintf-sfix-vec-2.c scan-assembler-times vcvtps2dq[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times vcvttpd2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 2 -FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-trunc-vec-2.c scan-assembler-times vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/avx512f-truncf-vec-2.c scan-assembler-times vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1 -FAIL: gcc.target/i386/funcspec-8.c (test for errors, line 104) -FAIL: gcc.target/i386/funcspec-8.c
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinekwrote: > On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote: >> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote: >> > I'm seeing lots of ICEs with this. >> >> Here is untested fix for that, will bootstrap/regtest it soon (after my >> current set of bootstraps finishes). >> >> 2016-11-18 Jakub Jelinek >> >> * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable, >> don't initialize it, don't use it for the case where it isn't >> provable %{z} nor using the same argument, instead move merge >> argument into a new pseudo and use that as target. Formatting fixes. > > Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and > fixed a couple of FAILs, but not tons of others. > > Here is another patch I'm going to test which fixes many other FAILs, but > still some are left: > FAIL: gcc.target/i386/funcspec-3.c (internal compiler error) > FAIL: gcc.target/i386/funcspec-3.c (test for excess errors) > FAIL: gcc.target/i386/mvc1.c (internal compiler error) > FAIL: gcc.target/i386/mvc1.c (test for excess errors) > FAIL: gcc.target/i386/mvc6.c (internal compiler error) > FAIL: gcc.target/i386/mvc6.c (test for excess errors) > FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb > FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw > FAIL: gcc.target/i386/mvc8.c (internal compiler error) > FAIL: gcc.target/i386/mvc8.c (test for excess errors) > FAIL: gcc.target/i386/pr67995-2.c (internal compiler error) > FAIL: gcc.target/i386/pr67995-2.c (test for excess errors) > FAIL: gcc.target/i386/pr71652-3.c (internal compiler error) > FAIL: gcc.target/i386/pr71652-3.c (test for errors, line 5) > FAIL: gcc.target/i386/pr71652-3.c (test for excess errors) I wonder why patch submitter didn't get these failures during regtesting. There are plenty of tests (the above multi-vrsioning tests) that depend on correct handling of ISA variables. 
I assumed that these tests passed and consequently didn't went deep into the implementation, but rather requested a couple of additional tests that exercised added functionality.some more. > Will debug even those. Thanks! Uros. > 2016-11-19 Jakub Jelinek > > * config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2, > ix86_add_new_builtins): Formatting fixes. > (ix86_expand_builtin): Use || instead of && for isa vs. isa2. > (ix86_get_builtin): Likewise. > > --- gcc/config/i386/i386.c.jj 2016-11-18 22:30:16.0 +0100 > +++ gcc/config/i386/i386.c 2016-11-19 08:37:45.748175866 +0100 > @@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c > means that *both* cpuid bits must be set for the built-in to be > available. > Handle this here. */ >if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL) > - mask &= ~OPTION_MASK_ISA_AVX512VL; > + mask &= ~OPTION_MASK_ISA_AVX512VL; > >mask &= ~OPTION_MASK_ISA_64BIT; >if (mask == 0 > @@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c > > static inline tree > def_builtin2 (HOST_WIDE_INT mask, const char *name, > -enum ix86_builtin_func_type tcode, > -enum ix86_builtins code) > + enum ix86_builtin_func_type tcode, > + enum ix86_builtins code) > { >tree decl = NULL_TREE; > > @@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const >tree type = ix86_get_builtin_func_type (tcode); >decl = add_builtin_function (name, type, code, BUILT_IN_MD, >NULL, NULL_TREE); > - ix86_builtins[(int) code] = decl; > - ix86_builtins_isa[(int) code].set_and_not_built_p = false; > + ix86_builtins[(int) code] = decl; > + ix86_builtins_isa[(int) code].set_and_not_built_p = false; > } >else > { > @@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const > > static inline tree > def_builtin_const2 (HOST_WIDE_INT mask, const char *name, > - enum ix86_builtin_func_type tcode, enum ix86_builtins code) > + enum ix86_builtin_func_type tcode, enum ix86_builtins > code) > { >tree decl = def_builtin2 (mask, name, tcode, 
code); >if (decl) > @@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask, > static void > ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2) > { > - if (((isa & deferred_isa_values) == 0) > - && ((isa2 & deferred_isa_values2) == 0)) > + if ((isa & deferred_isa_values) == 0 > + && (isa2 & deferred_isa_values2) == 0) > return; > >/* Bits in ISA value can be removed from potential isa values. */ > @@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa > >for (i = 0; i < (int)IX86_BUILTIN_MAX; i++) > { > - if ix86_builtins_isa[i].isa & isa) != 0) || > ((ix86_builtins_isa[i].isa2 & isa2)
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Sat, Nov 19, 2016 at 09:05:05AM +0100, Jakub Jelinek wrote:
> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
> > On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> > > I'm seeing lots of ICEs with this.
> >
> > Here is untested fix for that, will bootstrap/regtest it soon (after my
> > current set of bootstraps finishes).
> >
> > 2016-11-18  Jakub Jelinek
> >
> > 	* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
> > 	don't initialize it, don't use it for the case where it isn't
> > 	provable %{z} nor using the same argument, instead move merge
> > 	argument into a new pseudo and use that as target. Formatting fixes.
>
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
> fixed a couple of FAILs, but not tons of others.
>
> Here is another patch I'm going to test which fixes many other FAILs, but
> still some are left:
> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> FAIL: gcc.target/i386/pr71652-3.c (test for errors, line 5)
> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
> Will debug even those.

The fix for that (still not bootstrapped/regtested) is below.
Will now bootstrap/regtest all 3 patches and hopefully all the 4fma*
introduced regressions will be gone.

2016-11-19  Jakub Jelinek

	* config/i386/i386.c (ix86_valid_target_attribute_tree): Don't
	clear opts->x_ix86_isa_flags, clear opts->x_ix86_isa_flags2 instead
	and using = 0 instead of &= 0.

--- gcc/config/i386/i386.c.jj	2016-11-19 08:54:37.0 +0100
+++ gcc/config/i386/i386.c	2016-11-19 09:20:52.314913008 +0100
@@ -6845,7 +6845,7 @@ ix86_valid_target_attribute_tree (tree a
 				     | OPTION_MASK_ABI_64
 				     | OPTION_MASK_ABI_X32
 				     | OPTION_MASK_CODE16);
-      opts->x_ix86_isa_flags &= 0;
+      opts->x_ix86_isa_flags2 = 0;
     }
   else if (!orig_arch_specified)
     opts->x_ix86_arch_string = NULL;

	Jakub
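The effect of clearing the wrong word can be modelled in a few lines. This is a sketch under assumptions: it supposes, as the hunk context suggests but does not show in full, that the statement just before the changed line masks the first flag word down to a few ABI bits. `struct demo_opts`, `DEMO_KEEP_MASK` and the `demo_*` functions are hypothetical names, not GCC identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two ISA flag words in the options.  */
struct demo_opts
{
  uint64_t isa_flags;
  uint64_t isa_flags2;
};

/* Hypothetical stand-in for the ABI bits that are meant to survive.  */
#define DEMO_KEEP_MASK UINT64_C (0x7)

/* Pre-patch behaviour: the first word is masked and then wiped, and the
   second word is never touched.  */
static void
demo_reset_buggy (struct demo_opts *o)
{
  o->isa_flags &= DEMO_KEEP_MASK;
  o->isa_flags &= 0;
}

/* Patched behaviour: the first word keeps its ABI bits and the second
   word is cleared with a plain assignment.  */
static void
demo_reset_fixed (struct demo_opts *o)
{
  o->isa_flags &= DEMO_KEEP_MASK;
  o->isa_flags2 = 0;
}

/* Usage example.  */
static void
demo_reset_check (void)
{
  struct demo_opts a = { 0xFF, 0xF0 };
  struct demo_opts b = { 0xFF, 0xF0 };
  demo_reset_buggy (&a);
  demo_reset_fixed (&b);
  assert (a.isa_flags == 0 && a.isa_flags2 == 0xF0);
  assert (b.isa_flags == 0x7 && b.isa_flags2 == 0);
}
```

In this model the buggy variant both loses the preserved bits and leaves stale bits in the second word, which is one plausible reading of why the target attribute and multi-versioning tests listed above were affected.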
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote: > On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote: > > I'm seeing lots of ICEs with this. > > Here is untested fix for that, will bootstrap/regtest it soon (after my > current set of bootstraps finishes). > > 2016-11-18 Jakub Jelinek> > * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable, > don't initialize it, don't use it for the case where it isn't > provable %{z} nor using the same argument, instead move merge > argument into a new pseudo and use that as target. Formatting fixes. Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and fixed a couple of FAILs, but not tons of others. Here is another patch I'm going to test which fixes many other FAILs, but still some are left: FAIL: gcc.target/i386/funcspec-3.c (internal compiler error) FAIL: gcc.target/i386/funcspec-3.c (test for excess errors) FAIL: gcc.target/i386/mvc1.c (internal compiler error) FAIL: gcc.target/i386/mvc1.c (test for excess errors) FAIL: gcc.target/i386/mvc6.c (internal compiler error) FAIL: gcc.target/i386/mvc6.c (test for excess errors) FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw FAIL: gcc.target/i386/mvc8.c (internal compiler error) FAIL: gcc.target/i386/mvc8.c (test for excess errors) FAIL: gcc.target/i386/pr67995-2.c (internal compiler error) FAIL: gcc.target/i386/pr67995-2.c (test for excess errors) FAIL: gcc.target/i386/pr71652-3.c (internal compiler error) FAIL: gcc.target/i386/pr71652-3.c (test for errors, line 5) FAIL: gcc.target/i386/pr71652-3.c (test for excess errors) Will debug even those. 2016-11-19 Jakub Jelinek * config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2, ix86_add_new_builtins): Formatting fixes. (ix86_expand_builtin): Use || instead of && for isa vs. isa2. (ix86_get_builtin): Likewise. 
--- gcc/config/i386/i386.c.jj 2016-11-18 22:30:16.0 +0100 +++ gcc/config/i386/i386.c 2016-11-19 08:37:45.748175866 +0100 @@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c means that *both* cpuid bits must be set for the built-in to be available. Handle this here. */ if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL) - mask &= ~OPTION_MASK_ISA_AVX512VL; + mask &= ~OPTION_MASK_ISA_AVX512VL; mask &= ~OPTION_MASK_ISA_64BIT; if (mask == 0 @@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c static inline tree def_builtin2 (HOST_WIDE_INT mask, const char *name, -enum ix86_builtin_func_type tcode, -enum ix86_builtins code) + enum ix86_builtin_func_type tcode, + enum ix86_builtins code) { tree decl = NULL_TREE; @@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const tree type = ix86_get_builtin_func_type (tcode); decl = add_builtin_function (name, type, code, BUILT_IN_MD, NULL, NULL_TREE); - ix86_builtins[(int) code] = decl; - ix86_builtins_isa[(int) code].set_and_not_built_p = false; + ix86_builtins[(int) code] = decl; + ix86_builtins_isa[(int) code].set_and_not_built_p = false; } else { @@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const static inline tree def_builtin_const2 (HOST_WIDE_INT mask, const char *name, - enum ix86_builtin_func_type tcode, enum ix86_builtins code) + enum ix86_builtin_func_type tcode, enum ix86_builtins code) { tree decl = def_builtin2 (mask, name, tcode, code); if (decl) @@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask, static void ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2) { - if (((isa & deferred_isa_values) == 0) - && ((isa2 & deferred_isa_values2) == 0)) + if ((isa & deferred_isa_values) == 0 + && (isa2 & deferred_isa_values2) == 0) return; /* Bits in ISA value can be removed from potential isa values. 
*/ @@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa for (i = 0; i < (int)IX86_BUILTIN_MAX; i++) { - if ix86_builtins_isa[i].isa & isa) != 0) || ((ix86_builtins_isa[i].isa2 & isa2) != 0)) + if (((ix86_builtins_isa[i].isa & isa) != 0 + || (ix86_builtins_isa[i].isa2 & isa2) != 0) && ix86_builtins_isa[i].set_and_not_built_p) { tree decl, type; @@ -36549,7 +36550,7 @@ ix86_expand_builtin (tree exp, rtx targe whether it is supported. */ if ((ix86_builtins_isa[fcode].isa && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags)) - && (ix86_builtins_isa[fcode].isa2 + || (ix86_builtins_isa[fcode].isa2 && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2))) { char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, @@ -38514,7
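The && to || change in the ix86_expand_builtin hunk can be illustrated with a small model. The sketch below is not the GCC code: `struct demo_builtin_isa` and the `demo_*` functions are hypothetical, and it assumes the reading suggested by the hunk, namely that a nonzero requirement word counts as unmet when none of its bits are enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-builtin ISA requirement, one mask per flag word.  */
struct demo_builtin_isa
{
  uint64_t isa;
  uint64_t isa2;
};

/* A word's requirement is unmet if it is nonzero and shares no bit with
   the enabled flags.  */
static bool
demo_word_unmet (uint64_t required, uint64_t enabled)
{
  return required != 0 && (required & enabled) == 0;
}

/* Pre-patch logic: unsupported only if BOTH words are unmet.  */
static bool
demo_unsupported_old (struct demo_builtin_isa b, uint64_t flags,
                      uint64_t flags2)
{
  return demo_word_unmet (b.isa, flags) && demo_word_unmet (b.isa2, flags2);
}

/* Patched logic: unsupported if EITHER word is unmet.  */
static bool
demo_unsupported_new (struct demo_builtin_isa b, uint64_t flags,
                      uint64_t flags2)
{
  return demo_word_unmet (b.isa, flags) || demo_word_unmet (b.isa2, flags2);
}

/* Usage example: a builtin that only has requirements in the first
   word, used with nothing enabled.  */
static void
demo_builtin_check (void)
{
  struct demo_builtin_isa b = { 0x4, 0x0 };
  assert (!demo_unsupported_old (b, 0x0, 0x0));  /* missed */
  assert (demo_unsupported_new (b, 0x0, 0x0));   /* diagnosed */
  assert (!demo_unsupported_new (b, 0x4, 0x0));  /* available */
}
```

Under the old form a builtin with requirements in only one word is never reported as unsupported, since the other word's test is always false.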
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote: > I'm seeing lots of ICEs with this. Here is untested fix for that, will bootstrap/regtest it soon (after my current set of bootstraps finishes). 2016-11-18 Jakub Jelinek* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable, don't initialize it, don't use it for the case where it isn't provable %{z} nor using the same argument, instead move merge argument into a new pseudo and use that as target. Formatting fixes. --- gcc/config/i386/i386.c.jj 2016-11-18 20:04:31.0 +0100 +++ gcc/config/i386/i386.c 2016-11-18 21:21:17.764190127 +0100 @@ -38220,14 +38220,12 @@ rdseed_step: rtx (*fcn) (rtx, rtx, rtx, rtx); rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx); rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx); - rtx (*msk_mov) (rtx, rtx, rtx, rtx); int masked = 1; machine_mode mode, wide_mode, nar_mode; nar_mode = V4SFmode; mode = V16SFmode; wide_mode = V64SFmode; - msk_mov = gen_avx512f_loadv16sf_mask; fcn_mask = gen_avx5124fmaddps_4fmaddps_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz; @@ -38270,7 +38268,6 @@ rdseed_step: wide_mode = V64SImode; fcn_mask = gen_avx5124vnniw_vp4dpwssd_mask; fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz; - msk_mov = gen_avx512f_loadv16si_mask; goto v4fma_expand; case IX86_BUILTIN_4DPWSSDS_MASK: @@ -38279,7 +38276,6 @@ rdseed_step: wide_mode = V64SImode; fcn_mask = gen_avx5124vnniw_vp4dpwssds_mask; fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz; - msk_mov = gen_avx512f_loadv16si_mask; goto v4fma_expand; case IX86_BUILTIN_4FMAPS_MASK: @@ -38295,11 +38291,11 @@ v4fma_expand: wide_reg = gen_reg_rtx (wide_mode); for (i = 0; i < 4; i++) { - args[i] = CALL_EXPR_ARG (exp, i); + args[i] = CALL_EXPR_ARG (exp, i); ops[i] = expand_normal (args[i]); - emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64), - ops[i]); + emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, i * 64), + ops[i]); } accum = expand_normal (CALL_EXPR_ARG (exp, 4)); @@ -38318,7 +38314,7 @@ v4fma_expand: 
emit_insn (fcn (target, accum, wide_reg, mem)); else { - rtx merge, mask; + rtx merge, mask; merge = expand_normal (CALL_EXPR_ARG (exp, 6)); mask = expand_normal (CALL_EXPR_ARG (exp, 7)); @@ -38340,18 +38336,16 @@ v4fma_expand: merge = force_reg (mode, merge); emit_insn (fcn_mask (target, wide_reg, mem, merge, mask)); } - /* Merge with something unknown might happen if we z-mask w/ -O0. */ + /* Merge with something unknown might happen if we z-mask w/ -O0. */ else { - rtx tmp = target; - emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask)); - - target = force_reg (mode, merge); - emit_insn (msk_mov (target, tmp, target, mask)); + target = gen_reg_rtx (mode); + emit_move_insn (target, merge); + emit_insn (fcn_mask (target, wide_reg, mem, target, mask)); } } - return target; - } + return target; + } case IX86_BUILTIN_4FNMASS: fcn = gen_avx5124fmaddps_4fnmaddss; @@ -38366,7 +38360,6 @@ v4fma_expand: case IX86_BUILTIN_4FNMASS_MASK: fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz; - msk_mov = gen_avx512vl_loadv4sf_mask; goto s4fma_expand; case IX86_BUILTIN_4FMASS_MASK: @@ -38380,22 +38373,21 @@ v4fma_expand: fcn_mask = gen_avx5124fmaddps_4fmaddss_mask; fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz; - msk_mov = gen_avx512vl_loadv4sf_mask; s4fma_expand: mode = V4SFmode; wide_reg = gen_reg_rtx (V64SFmode); for (i = 0; i < 4; i++) { -rtx tmp; -args[i] = CALL_EXPR_ARG (exp, i); -ops[i] = expand_normal (args[i]); + rtx tmp; + args[i] = CALL_EXPR_ARG (exp, i); + ops[i] = expand_normal (args[i]); -tmp = gen_reg_rtx (SFmode); -emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0)); + tmp = gen_reg_rtx (SFmode); + emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0)); -emit_move_insn
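The reworked fallback path in the patch above (copy the merge operand into a fresh pseudo, then let the masked instruction write into it) can be pictured with a small scalar model. This is only a sketch of merge-masking semantics under simplifying assumptions: the function names and the plain float arrays are hypothetical, nothing here is GCC internals, and the arithmetic of the 4FMA instructions themselves is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a merge-masked vector operation.
   Lane I of TARGET is overwritten with COMPUTED[I] only when bit I of
   MASK is set; all other lanes keep their previous contents.  */
static void
masked_op (float *target, const float *computed, unsigned mask, size_t lanes)
{
  assert (lanes <= 32);
  for (size_t i = 0; i < lanes; i++)
    if (mask & (1u << i))
      target[i] = computed[i];
}

/* Model of the fallback path: start TARGET as a copy of MERGE, then
   apply the masked operation once, so unselected lanes hold MERGE.  */
static void
merge_masked_result (float *target, const float *merge,
                     const float *computed, unsigned mask, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    target[i] = merge[i];
  masked_op (target, computed, mask, lanes);
}
```

With this shape no separate masked-move step is needed afterwards, which is why the model has no counterpart to msk_mov.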
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi!

On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote:
> > Hi HJ, could you please commit it?
>
> Done.

I'm seeing lots of ICEs with this. E.g. reduced:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef unsigned char __mmask8;
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

static inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}

__m128
foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D,
     __m128 __E, __m128 *__F)
{
  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B, (__v4sf) __C,
                                                (__v4sf) __D, (__v4sf) __E,
                                                (__v4sf) __A,
                                                (const __v4sf *) __F,
                                                (__v4sf) _mm_setzero_ps (),
                                                (__mmask8) __U);
}

ICEs with -mavx5124fmaps -O0, but succeeds with -mavx512vl -mavx5124fmaps -O0
or -mavx5124fmaps -O2.

  fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
  fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
  msk_mov = gen_avx512vl_loadv4sf_mask;

looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't imply
-mavx512vl, so using -mavx512vl insns unconditionally is just wrong.
You need some fallback if avx512vl isn't available, perhaps use avx512f
512-bit masked insns with bits in the mask forced to pick only the ones
you want?

Also, seems there are various formatting issues in the change, e.g.
shortly after s4fma_expand: there is indentation by 3 chars relative to
above { instead of 2,
gen_rtx_SUBREG (V16SFmode, tmp, 0));
has extra 1 char indentation, some lines too long.

	Jakub
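The fallback suggested above (use a 512-bit masked operation and force the mask bits so that only the wanted lanes are touched) could look roughly like the following scalar model. It is a sketch under stated assumptions: the lane counts, helper names and the array representation are hypothetical and only illustrate the mask arithmetic, not any real insn pattern.

```c
#include <assert.h>
#include <stdint.h>

#define WIDE_LANES 16u    /* Lanes in the assumed 512-bit float vector.  */
#define NARROW_LANES 4u   /* Lanes in the assumed 128-bit float vector.  */

/* Hypothetical helper: keep only the mask bits that belong to the low
   LANES lanes, so a wide operation cannot touch the upper lanes.  */
static uint16_t
narrow_mask (uint16_t mask, unsigned lanes)
{
  assert (lanes >= 1 && lanes <= WIDE_LANES);
  if (lanes == WIDE_LANES)
    return mask;
  return (uint16_t) (mask & ((1u << lanes) - 1u));
}

/* Scalar model of a 16-lane merge-masked move.  */
static void
wide_masked_move (float dst[WIDE_LANES], const float src[WIDE_LANES],
                  uint16_t mask)
{
  for (unsigned i = 0; i < WIDE_LANES; i++)
    if (mask & (1u << i))
      dst[i] = src[i];
}

/* Emulate a 4-lane masked move with the wide one by forcing the upper
   twelve mask bits to zero first.  */
static void
narrow_masked_move_via_wide (float dst[WIDE_LANES],
                             const float src[WIDE_LANES], uint16_t mask)
{
  wide_masked_move (dst, src, narrow_mask (mask, NARROW_LANES));
}
```

For the scalar (ss) forms the same idea would apply with a single lane, i.e. a mask reduced to its lowest bit.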
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Thu, Nov 17, 2016 at 4:20 AM, Andrew Senkevich wrote:
> On 16 Nov 2016 at 19:21, "Bernd Schmidt" wrote:
>>
>> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>>
>>> 2016-11-15 17:56 GMT+03:00 Jeff Law:
>>>>
>>>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>>>>
>>>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>>>>>>
>>>>>> --- a/gcc/genmodes.c
>>>>>> +++ b/gcc/genmodes.c
>>>>>> --- a/gcc/init-regs.c
>>>>>> +++ b/gcc/init-regs.c
>>>>>> --- a/gcc/machmode.h
>>>>>> +++ b/gcc/machmode.h
>>>>>>
>>>>>> These are middle-end changes, you will need a separate review for
>>>>>> these.
>>>>>
>>>>> Who could review these changes?
>>>>
>>>> I can. I likely dropped the message because it looked x86 specific, so
>>>> if you could resend it'd be appreciated.
>>>
>>> Attached (diff with previous only in fixed comments typos).
>>
>> Next time please split middle-end changes out from target-related stuff
>> and send them separately.
>>
>> These ones are OK.
>>
>> Bernd
>
> Hi HJ, could you please commit it?

Done.

--
H.J.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-16 19:21 GMT+03:00 Bernd Schmidt:
> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>
>> 2016-11-15 17:56 GMT+03:00 Jeff Law:
>>>
>>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>>>
>>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>>>>>
>>>>> --- a/gcc/genmodes.c
>>>>> +++ b/gcc/genmodes.c
>>>>> --- a/gcc/init-regs.c
>>>>> +++ b/gcc/init-regs.c
>>>>> --- a/gcc/machmode.h
>>>>> +++ b/gcc/machmode.h
>>>>>
>>>>> These are middle-end changes, you will need a separate review for
>>>>> these.
>>>>
>>>> Who could review these changes?
>>>
>>> I can. I likely dropped the message because it looked x86 specific, so
>>> if you could resend it'd be appreciated.
>>
>> Attached (diff with previous only in fixed comments typos).
>
> Next time please split middle-end changes out from target-related stuff and
> send them separately.

Ok.

> These ones are OK.
>
> Bernd

Thanks! Who could commit it?

--
WBR,
Andrew
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
> 2016-11-15 17:56 GMT+03:00 Jeff Law:
>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>>>>
>>>> --- a/gcc/genmodes.c
>>>> +++ b/gcc/genmodes.c
>>>> --- a/gcc/init-regs.c
>>>> +++ b/gcc/init-regs.c
>>>> --- a/gcc/machmode.h
>>>> +++ b/gcc/machmode.h
>>>>
>>>> These are middle-end changes, you will need a separate review for
>>>> these.
>>>
>>> Who could review these changes?
>>
>> I can. I likely dropped the message because it looked x86 specific, so
>> if you could resend it'd be appreciated.
>
> Attached (diff with previous only in fixed comments typos).

Next time please split middle-end changes out from target-related stuff
and send them separately.

These ones are OK.

Bernd
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-15 17:56 GMT+03:00 Jeff Law:
> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>
>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>>>
>>> --- a/gcc/genmodes.c
>>> +++ b/gcc/genmodes.c
>>> --- a/gcc/init-regs.c
>>> +++ b/gcc/init-regs.c
>>> --- a/gcc/machmode.h
>>> +++ b/gcc/machmode.h
>>>
>>> These are middle-end changes, you will need a separate review for these.
>>
>> Who could review these changes?
>
> I can. I likely dropped the message because it looked x86 specific, so if
> you could resend it'd be appreciated.

Attached (diff with previous only in fixed comments typos).

--
WBR,
Andrew

new_avx512_instructions_15.11.patch
Description: Binary data
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>>
>> --- a/gcc/genmodes.c
>> +++ b/gcc/genmodes.c
>> --- a/gcc/init-regs.c
>> +++ b/gcc/init-regs.c
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>>
>> These are middle-end changes, you will need a separate review for these.
>
> Who could review these changes?

I can. I likely dropped the message because it looked x86 specific, so if
you could resend it'd be appreciated.

jeff
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-11 14:16 GMT+03:00 Uros Bizjak:
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> --- a/gcc/init-regs.c
> +++ b/gcc/init-regs.c
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
>
> These are middle-end changes, you will need a separate review for these.

Who could review these changes?

--
WBR,
Andrew
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Mon, Nov 14, 2016 at 7:28 PM, Andrew Senkevich wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak:
>> The x86 part of the patch is OK with the above changes and additional
>> target attribute test for flags2 ISA features.
>
> Fixed according to your comments, I will follow up with additional tests soon.

OK.

Thanks,
Uros.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-11 14:29 GMT+03:00 Jakub Jelinek:
> Hi!
>
> I've noticed preexisting:
>
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>
>> --- a/gcc/config/i386/i386-modes.def
>> +++ b/gcc/config/i386/i386-modes.def
>> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */
>> VECTOR_MODES (FLOAT, 32); /*V16HF V8SF V4DF */
>> VECTOR_MODES (FLOAT, 64); /* V32HF V16SF V8DF */
>> VECTOR_MODES (FLOAT, 128);/* V64HF V32SF V16DF */
>
> The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
> that? None of them create V*HF mode, but they do create V*TF mode.

I have fixed it in the new patch.

--
WBR,
Andrew
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-11 14:16 GMT+03:00 Uros Bizjak:
> The x86 part of the patch is OK with the above changes and additional
> target attribute test for flags2 ISA features.

Fixed according to your comments, I will follow up with additional tests soon.

--
WBR,
Andrew

new_avx512_instructions_14.11.patch
Description: Binary data
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi!

I've noticed preexisting:

On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */
> VECTOR_MODES (FLOAT, 32); /*V16HF V8SF V4DF */
> VECTOR_MODES (FLOAT, 64); /* V32HF V16SF V8DF */
> VECTOR_MODES (FLOAT, 128);/* V64HF V32SF V16DF */

The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
that? None of them create V*HF mode, but they do create V*TF mode.

	Jakub
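To make the point about the comments concrete, here is a small worked sketch that derives vector mode names from a vector width. It rests on assumptions that are not taken from the patch: element sizes of 4 bytes for SF, 8 for DF and 16 for TF, no HF element mode, and no single-element vector modes. The helper and table names are hypothetical; this is not how genmodes is implemented.

```c
#include <stdio.h>
#include <string.h>

struct elem_mode
{
  const char *name;   /* Element mode suffix.  */
  unsigned bytes;     /* Assumed element size in bytes.  */
};

/* Assumed float element modes; HF is deliberately absent.  */
static const struct elem_mode float_elems[] =
  { { "SF", 4 }, { "DF", 8 }, { "TF", 16 } };

/* Write the space-separated vector mode names for a WIDTH-byte float
   vector into BUF.  Elements that do not divide WIDTH evenly, or that
   would give a one-element vector, are skipped.  */
static void
float_vector_modes (unsigned width, char *buf, size_t buflen)
{
  if (buflen == 0)
    return;
  buf[0] = '\0';
  for (size_t i = 0; i < sizeof float_elems / sizeof float_elems[0]; i++)
    {
      unsigned bytes = float_elems[i].bytes;
      size_t used = strlen (buf);
      int n;

      if (width % bytes != 0 || width / bytes < 2)
        continue;
      n = snprintf (buf + used, buflen - used, "%sV%u%s",
                    used ? " " : "", width / bytes, float_elems[i].name);
      if (n < 0 || (size_t) n >= buflen - used)
        break;
    }
}
```

Under these assumptions a 64-byte vector yields V16SF, V8DF and V4TF, which matches the observation that V*TF modes exist while V*HF modes do not.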
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Thu, Nov 10, 2016 at 6:18 PM, Andrew Senkevich wrote:
> 2016-11-10 19:36 GMT+03:00 Jakub Jelinek:
>> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>>> Hi,
>>>
>>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>>
>>> It requires additional patch for register allocator from Vladimir
>>> Makarov to be committed before.
>>
>> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
>> tabs), can you repost as attachment or configure your MUA not to do this?
>>
>> Just a couple of random nits follow:
>>
>>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>>
>> This mentions an option that doesn't exist, is that s/dd// ?
>
> Yes.
> Attached fixed version.

A couple of questions and comments below.

You are introducing flag2 ABI option flags. There are no tests for the
corresponding __target__ attribute, please add some tests, similar to
gcc.target/i386/funcspec-?.c. These can be in a follow-up patch.

Please add the new options to the g++.dg/other/i386-{2,3}.C tests. These are
like gcc.target/i386/sse-{22,23}.c for C++.

Also, I guess we want to support these new options with
__builtin_cpu_supports. Please add this functionality in a follow-up
patch.

+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+

No, we are extremely low on single-letter constraints. We will use
these for possible future new register sets. Use Yv or something
similar instead.

+//additional structure for isa flags

Please use C comments throughout the patch.
@@ -1465,11 +1472,14 @@ enum reg_class
 { 0x11,0x1fe0,0x0 }, /* FLOAT_INT_REGS */\
 { 0x1ff100ff,0xffe0, 0x1f }, /* INT_SSE_REGS */ \
 { 0x1ff1,0xffe0, 0x1f }, /* FLOAT_INT_SSE_REGS */\
- { 0x0, 0x0, 0x1fc0 }, /* MASK_EVEX_REGS */ \
+ { 0x0, 0x0, 0x1fc0 }, /* MASK_EVEX_REGS */\
 { 0x0, 0x0, 0x1fe0 }, /* MASK_REGS */ \
-{ 0x,0x,0x1 }\
+{ 0x1fe0,0xe000, 0x1f }, /* MOD4_SSE_REGS */ \
+{ 0x,0x,0x1 }\
 }
+/* { 0x0220,0x2000, 0x02 },*/ /* MOD4_SSE_REGS */
+

Please remove commented out code. Also, please fix whitespace at the
new entry.

+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and AVX512F and AVX5124VNNIW built-in functions and code generation.

Too many "and"s in the description.

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.

The x86 part of the patch is OK with the above changes and additional
target attribute test for flags2 ISA features.

Uros.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-10 20:14 GMT+03:00 Vladimir N Makarov:
> On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
>>
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> I've just committed the necessary patch.

Thanks, Vladimir.

--
WBR,
Andrew
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
2016-11-10 19:36 GMT+03:00 Jakub Jelinek:
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
> tabs), can you repost as attachment or configure your MUA not to do this?
>
> Just a couple of random nits follow:
>
>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>
> This mentions an option that doesn't exist, is that s/dd// ?

Yes.
Attached fixed version.

--
WBR,
Andrew

new_avx512_instructions.patch
Description: Binary data
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
> Hi,
>
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

I've just committed the necessary patch.
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> Hi,
>
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
tabs), can you repost as attachment or configure your MUA not to do this?

Just a couple of random nits follow:

> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.

This mentions an option that doesn't exist, is that s/dd// ?

> * gcc.target/i386/sse-13.c: Ditto.

> @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
> {
>    opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>    opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> +
> + //turn off additional isa flags

Comments start with capital letter, end with ., there should be space
between // and T, better use /* ... */ style comment to match other comments
in the file.

> + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> + opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> + opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +
> }

The formatting looks very weird.

	Jakub
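As an illustration of the comment style asked for above (capital first letter, trailing period, /* ... */ delimiters), here is a self-contained sketch of the same clear-and-record pattern. The struct, the mask values and the function name are hypothetical stand-ins invented for this example; they are not the real gcc_options fields or OPTION_MASK_ISA_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask bits standing in for the new ISA options.  */
#define DEMO_MASK_ISA_AVX5124FMAPS (UINT64_C (1) << 0)
#define DEMO_MASK_ISA_AVX5124VNNIW (UINT64_C (1) << 1)

/* Cut-down stand-in for the second ISA flags word and its companion
   word that records which bits were set or cleared explicitly.  */
struct demo_options
{
  uint64_t isa_flags2;
  uint64_t isa_flags2_explicit;
};

/* Clear the ISA bits in MASK and record that they were changed
   explicitly.  */
static void
demo_unset_isa (struct demo_options *opts, uint64_t mask)
{
  assert (opts);

  /* Turn off additional ISA flags.  */
  opts->isa_flags2 &= ~mask;
  opts->isa_flags2_explicit |= mask;
}
```

Each statement stays on one line here, which also avoids the wrapped assignments that made the quoted hunk hard to read.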
[PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
Hi, this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions. It requires additional patch for register allocator from Vladimir Makarov to be committed before. gcc/ * common/config/i386/i386-common.c (OPTION_MASK_ISA_AVX5124FMAPS_SET, OPTION_MASK_ISA_AVX5124FMAPS_UNSET, OPTION_MASK_ISA_AVX5124VNNIW_SET, OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New. (ix86_handle_option): Handle OPT_mavx5124fmaps, OPT_mavx5124vnniw. * config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h. * config/i386/avx5124fmapsintrin.h: New file. * config/i386/avx5124vnniwintrin.h: Ditto. * config/i386/constraints.md (h): New constraint. * config/i386/cpuid.h: (bit_AVX5124VNNIW, bit_AVX5124FMAPS): New. * config/i386/driver-i386.c (host_detect_local_cpu): Detect avx5124fmaps, avx5124vnniw. * config/i386/i386-builtin-types.def: Add types V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI, V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF, V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF, V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI, V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI, V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI. * config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask, __builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss, __builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask, __builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss, __builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd, __builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds, __builtin_ia32_vp4dpwssds_mask): New. * config/i386/i386-c.c (ix86_target_macros_internal): Define __AVX5124FMAPS__, __AVX5124VNNIW__. * config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256), VECTOR_MODE (INT, SI, 64)): New modes. * config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps, -mavx5124vnniw. (PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define. (ix86_option_override_internal): Handle new options. (ix86_valid_target_attribute_inner_p): Add avx5124fmaps, avx5124vnniw. 
(ix86_expand_builtin): Handle new builtins. (ix86_additional_allocno_class_p): New. * config/i386/i386.h (TARGET_AVX5124FMAPS, TARGET_AVX5124FMAPS_P, TARGET_AVX5124VNNIW, TARGET_AVX5124VNNIW_P): Define. (reg_class): Add MOD4_SSE_REGS. (MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New. * config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw. * config/i386/immintrin.h: Include avx5124fmapsintrin.h, avx5124vnniwintrin.h. * config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD, UNSPEC_VP4FNMADD, UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS. (define_mode_iterator IMOD4): New. (define_mode_attr imod4_narrow): Ditto. (define_insn "mov"): Ditto. (define_insn "avx5124fmaddps_4fmaddps"): Ditto. (define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto. (define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto. (define_insn "avx5124fmaddps_4fmaddss"): Ditto. (define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto. (define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto. (define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto. (define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto. (define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto. (define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto. * init-regs.c (initialize_uninitialized_regs): Add emit_clobber call. * genmodes.c (mode_size_inline): Extend return type. * machmode.h (mode_size, mode_base_align): Extend type. gcc/testsuite/ * gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test. * gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto. 
* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto. * gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto. * gcc.target/i386/avx5124fmaps-check.h: Ditto. * gcc.target/i386/avx5124vnniw-check.h: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto. * gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.