Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-21 Thread Jakub Jelinek
On Mon, Nov 21, 2016 at 08:40:37PM +0300, Andrew Senkevich wrote:
> > FWIW, I came across the same error in my own testing and raised
> > bug 78451.
> 
> Can we fix it with the following patch? Regtesting in progress.
> 
> PR target/78451
> * gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
> _mm512_setzero_ps.
> * gcc/config/i386/avx5124vnniwintrin.h: Ditto.

That is just a workaround, we want to fix the real bug.  I'll have a look
tomorrow.

Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-21 Thread Andrew Senkevich
2016-11-21 20:12 GMT+03:00 Martin Sebor :
> On 11/20/2016 11:16 AM, Uros Bizjak wrote:
>>
>> On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak  wrote:
>>>
>>> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek  wrote:

 On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>
> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
> compared to before r242569, on i686-linux there is still:
> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
> compared to pre-r242569 (so some further fix is needed).


 And finally here is yet another patch that fixes pr57756 on i686-linux.
 Ok for trunk together with the other 3 patches?
>>>
>>>
>>> OK for the whole patch series.
>>
>>
>> Hm, I still see (both, 32bit and 64bit targets):
>>
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
>>  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>>  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
>> '_mm512_maskz_4fmadd_ps':^M
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
>> failed in call to always_inline '_mm512_setzero_ps': target specific
>> option mismatch^M
>> In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
>>  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
>>  from
>> /home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
>> /ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
>> called from here^M
>> compiler exited with status 1
>> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
>> Excess errors:
>> /ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
>> failed in call to always_inline '_mm512_setzero_ps': target specific
>> option mismatch
>
>
> FWIW, I came across the same error in my own testing and raised
> bug 78451.

Can we fix it with the following patch? Regtesting in progress.

PR target/78451
* gcc/config/i386/avx5124fmapsintrin.h: Avoid call to
_mm512_setzero_ps.
* gcc/config/i386/avx5124vnniwintrin.h: Ditto.

diff --git a/gcc/config/i386/avx5124fmapsintrin.h
b/gcc/config/i386/avx5124fmapsintrin.h
index 6113ee9..dd9a322
--- a/gcc/config/i386/avx5124fmapsintrin.h
+++ b/gcc/config/i386/avx5124fmapsintrin.h
@@ -74,7 +74,9 @@ _mm512_maskz_4fmadd_ps (__mmask16 __U,
  (__v16sf) __E,
  (__v16sf) __A,
  (const __v4sf *) __F,
- (__v16sf) _mm512_setzero_ps (),
+ (__v16sf) {0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0},
  (__mmask16) __U);
 }

@@ -161,7 +163,9 @@ _mm512_maskz_4fnmadd_ps (__mmask16 __U,
  (__v16sf) __E,
  (__v16sf) __A,
  (const __v4sf *) __F,
- (__v16sf) _mm512_setzero_ps (),
+ (__v16sf) {0, 0, 0, 0,
+ 0, 0, 0, 0, 0, 0, 0, 0,
+ 0, 0, 0, 0},
  (__mmask16) __U);
 }

diff --git a/gcc/config/i386/avx5124vnniwintrin.h
b/gcc/config/i386/avx5124vnniwintrin.h
index 392c6a5..a4faa24
--- a/gcc/config/i386/avx5124vnniwintrin.h
+++ b/gcc/config/i386/avx5124vnniwintrin.h
@@ -75,7 +75,9 @@ _mm512_maskz_4dpwssd_epi32 (__mmask16 __U, __m512i
__A, __m512i __B,
   (__v16si) __E,
   (__v16si) __A,
   (const __v4si *) __F,
-  (__v16si) _mm512_setzero_ps (),
+  (__v16si) {0, 0, 0, 0,
+  0, 0, 0, 0, 0, 0, 0, 0,
+  0, 0, 0, 0},
   (__mmask16) __U);
 }

@@ -120,7 +122,9 @@ _mm512_maskz_4dpwssds_epi32 (__mmask16 __U,
__m512i __A, __m512i __B,
(__v16si) __E,
(__v16si) __A,
(const __v4si *) __F,
-   (__v16si) _mm512_setzero_ps (),
+   (__v16si) {0, 0, 0, 0,
+   0, 0, 0, 0, 0, 0, 0, 0,
+   0, 0, 0, 0},
(__mmask16) __U);
 }


--
WBR,
Andrew


sse-22a-fix.patch
Description: Binary data


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-21 Thread Martin Sebor

On 11/20/2016 11:16 AM, Uros Bizjak wrote:

On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak  wrote:

On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek  wrote:

On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:

On x86_64-linux with the 3 patches I'm not seeing any new FAILs
compared to before r242569, on i686-linux there is still:
+FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
+FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
compared to pre-r242569 (so some further fix is needed).


And finally here is yet another patch that fixes pr57756 on i686-linux.
Ok for trunk together with the other 3 patches?


OK for the whole patch series.


Hm, I still see (both, 32bit and 64bit targets):

In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
'_mm512_maskz_4fmadd_ps':^M
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch^M
In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
called from here^M
compiler exited with status 1
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
Excess errors:
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch


FWIW, I came across the same error in my own testing and raised
bug 78451.

Martin



Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-20 Thread Uros Bizjak
On Sat, Nov 19, 2016 at 7:52 PM, Uros Bizjak  wrote:
> On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek  wrote:
>> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>>> compared to before r242569, on i686-linux there is still:
>>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>>> compared to pre-r242569 (so some further fix is needed).
>>
>> And finally here is yet another patch that fixes pr57756 on i686-linux.
>> Ok for trunk together with the other 3 patches?
>
> OK for the whole patch series.

Hm, I still see (both, 32bit and 64bit targets):

In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:45:0,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h: In function
'_mm512_maskz_4fmadd_ps':^M
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch^M
In file included from /ssd/uros/gcc-build/gcc/include/immintrin.h:71:0,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22.c:223,^M
 from
/home/uros/gcc-svn/trunk/gcc/testsuite/gcc.target/i386/sse-22a.c:7:^M
/ssd/uros/gcc-build/gcc/include/avx5124fmapsintrin.h:77:17: note:
called from here^M
compiler exited with status 1
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
Excess errors:
/ssd/uros/gcc-build/gcc/include/avx512fintrin.h:244:1: error: inlining
failed in call to always_inline '_mm512_setzero_ps': target specific
option mismatch

Uros.


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Uros Bizjak
On Sat, Nov 19, 2016 at 6:24 PM, Jakub Jelinek  wrote:
> On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
>> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
>> compared to before r242569, on i686-linux there is still:
>> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
>> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
>> compared to pre-r242569 (so some further fix is needed).
>
> And finally here is yet another patch that fixes pr57756 on i686-linux.
> Ok for trunk together with the other 3 patches?

OK for the whole patch series.

Big thanks,
Uros.

>
> 2016-11-19  Jakub Jelinek  
>
> * config/i386/i386.c (ix86_can_inline_p): Use || instead of &
> when checking if callee's isa flags are subset of caller's isa flags.
> Fix comment wording.
>
> --- gcc/config/i386/i386.c.jj   2016-11-19 18:02:56.0 +0100
> +++ gcc/config/i386/i386.c  2016-11-19 18:21:23.649463040 +0100
> @@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
>struct cl_target_option *caller_opts = TREE_TARGET_OPTION 
> (caller_tree);
>struct cl_target_option *callee_opts = TREE_TARGET_OPTION 
> (callee_tree);
>
> -  /* Callee's isa options should a subset of the caller's, i.e. a SSE4 
> function
> -can inline a SSE2 function but a SSE2 function can't inline a SSE4
> -function.  */
> +  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
> +function can inline a SSE2 function but a SSE2 function can't inline
> +a SSE4 function.  */
>if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
> - != callee_opts->x_ix86_isa_flags) &
> - ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> - != callee_opts->x_ix86_isa_flags2))
> +  != callee_opts->x_ix86_isa_flags)
> + || ((caller_opts->x_ix86_isa_flags2 & 
> callee_opts->x_ix86_isa_flags2)
> + != callee_opts->x_ix86_isa_flags2))
> ret = false;
>
>/* See if we have the same non-isa options.  */
>
>
> Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Jakub Jelinek
On Sat, Nov 19, 2016 at 12:28:22PM +0100, Jakub Jelinek wrote:
> On x86_64-linux with the 3 patches I'm not seeing any new FAILs
> compared to before r242569, on i686-linux there is still:
> +FAIL: gcc.target/i386/pr57756.c  (test for errors, line 6)
> +FAIL: gcc.target/i386/pr57756.c  (test for warnings, line 14)
> compared to pre-r242569 (so some further fix is needed).

And finally here is yet another patch that fixes pr57756 on i686-linux.
Ok for trunk together with the other 3 patches?

2016-11-19  Jakub Jelinek  

* config/i386/i386.c (ix86_can_inline_p): Use || instead of &
when checking if callee's isa flags are subset of caller's isa flags.
Fix comment wording.

--- gcc/config/i386/i386.c.jj   2016-11-19 18:02:56.0 +0100
+++ gcc/config/i386/i386.c  2016-11-19 18:21:23.649463040 +0100
@@ -6981,13 +6981,13 @@ ix86_can_inline_p (tree caller, tree cal
   struct cl_target_option *caller_opts = TREE_TARGET_OPTION (caller_tree);
   struct cl_target_option *callee_opts = TREE_TARGET_OPTION (callee_tree);
 
-  /* Callee's isa options should a subset of the caller's, i.e. a SSE4 
function
-can inline a SSE2 function but a SSE2 function can't inline a SSE4
-function.  */
+  /* Callee's isa options should be a subset of the caller's, i.e. a SSE4
+function can inline a SSE2 function but a SSE2 function can't inline
+a SSE4 function.  */
   if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
- != callee_opts->x_ix86_isa_flags) &
- ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
- != callee_opts->x_ix86_isa_flags2))
+  != callee_opts->x_ix86_isa_flags)
+ || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
+ != callee_opts->x_ix86_isa_flags2))
ret = false;
 
   /* See if we have the same non-isa options.  */


Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Andrew Senkevich
2016-11-19 13:17 GMT+03:00 Uros Bizjak :
> On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinek  wrote:
>> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
>>> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
>>> > I'm seeing lots of ICEs with this.
>>>
>>> Here is untested fix for that, will bootstrap/regtest it soon (after my
>>> current set of bootstraps finishes).
>>>
>>> 2016-11-18  Jakub Jelinek  
>>>
>>>   * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>>>   don't initialize it, don't use it for the case where it isn't
>>>   provable %{z} nor using the same argument, instead move merge
>>>   argument into a new pseudo and use that as target.  Formatting fixes.
>>
>> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
>> fixed a couple of FAILs, but not tons of others.
>>
>> Here is another patch I'm going to test which fixes many other FAILs, but
>> still some are left:
>> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
>> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
>> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
>> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
>> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
>> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
>> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
>> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
>> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
>> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
>> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
>
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-vrsioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't went deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality.some more.

Completely my bad. Starting from addition last intrinsics testing gone wrong.
Will double check next time to avoid repeating in the future.

>> Will debug even those.

Thank you, Jakub.


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Jakub Jelinek
On Sat, Nov 19, 2016 at 11:17:55AM +0100, Uros Bizjak wrote:
> > Here is another patch I'm going to test which fixes many other FAILs, but
> > still some are left:
> > FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> > FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> > FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> > FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> > FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> > FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> > FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> > FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> > FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> > FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> > FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
> 
> I wonder why patch submitter didn't get these failures during
> regtesting. There are plenty of tests (the above multi-vrsioning
> tests) that depend on correct handling of ISA variables. I assumed
> that these tests passed and consequently didn't went deep into the
> implementation, but rather requested a couple of additional tests that
> exercised added functionality.some more.

Dunno, clearly the patch has not been tested at all, at least not in
the form that has been checked in.
I've now bootstrapped/regtested on x86_64-linux and i686-linux all these
3 patches:
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg01992.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02026.html
http://gcc.gnu.org/ml/gcc-patches/2016-11/msg02027.html
, e.g. on x86_64-linux they fix:
-FAIL: gcc.target/i386/avx-2.c (internal compiler error)
-FAIL: gcc.target/i386/avx-2.c (test for excess errors)
-FAIL: gcc.target/i386/avx2-gather-2.c scan-tree-dump-times vect "note: 
vectorized 1 loops in function" 16
-FAIL: gcc.target/i386/avx2-gather-6.c scan-tree-dump-times vect "note: 
vectorized 1 loops in function" 1
-FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times 
vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-ceil-sfix-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-ceil-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times 
vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-sfix-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-ceilf-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times 
vcvttpd2dq[^\\n]*zmm[0-9].{7}(?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-floor-sfix-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]*zmm[0-9](?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-floor-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times 
vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-sfix-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-floorf-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-gather-2.c scan-tree-dump-times vect "note: 
vectorized 1 loops in function" 16
-FAIL: gcc.target/i386/avx512f-gather-5.c scan-assembler-times 
gather[^\\n]*zmm[0-9]+{%k[1-7]}(?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c scan-assembler-times 
vcvtpd2dq[^\\n]+ymm[0-9](?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-rint-sfix-vec-2.c scan-assembler-times 
vinserti64x4[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-rintf-sfix-vec-2.c scan-assembler-times 
vcvtps2dq[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times 
vcvttpd2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-round-sfix-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 2
-FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times 
vcvttps2dq[^\\n]+zmm[0-9].{7}(?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-roundf-sfix-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-trunc-vec-2.c scan-assembler-times 
vrndscalepd[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/avx512f-truncf-vec-2.c scan-assembler-times 
vrndscaleps[^\\n]+zmm[0-9](?:\\n|[ t]+#) 1
-FAIL: gcc.target/i386/funcspec-8.c  (test for errors, line 104)
-FAIL: gcc.target/i386/funcspec-8.c  

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Uros Bizjak
On Sat, Nov 19, 2016 at 9:05 AM, Jakub Jelinek  wrote:
> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
>> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
>> > I'm seeing lots of ICEs with this.
>>
>> Here is untested fix for that, will bootstrap/regtest it soon (after my
>> current set of bootstraps finishes).
>>
>> 2016-11-18  Jakub Jelinek  
>>
>>   * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>>   don't initialize it, don't use it for the case where it isn't
>>   provable %{z} nor using the same argument, instead move merge
>>   argument into a new pseudo and use that as target.  Formatting fixes.
>
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
> fixed a couple of FAILs, but not tons of others.
>
> Here is another patch I'm going to test which fixes many other FAILs, but
> still some are left:
> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)

I wonder why patch submitter didn't get these failures during
regtesting. There are plenty of tests (the above multi-vrsioning
tests) that depend on correct handling of ISA variables. I assumed
that these tests passed and consequently didn't went deep into the
implementation, but rather requested a couple of additional tests that
exercised added functionality.some more.

> Will debug even those.

Thanks!

Uros.

> 2016-11-19  Jakub Jelinek  
>
> * config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2,
> ix86_add_new_builtins): Formatting fixes.
> (ix86_expand_builtin): Use || instead of && for isa vs. isa2.
> (ix86_get_builtin): Likewise.
>
> --- gcc/config/i386/i386.c.jj   2016-11-18 22:30:16.0 +0100
> +++ gcc/config/i386/i386.c  2016-11-19 08:37:45.748175866 +0100
> @@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c
>  means that *both* cpuid bits must be set for the built-in to be 
> available.
>  Handle this here.  */
>if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL)
> - mask &= ~OPTION_MASK_ISA_AVX512VL;
> +   mask &= ~OPTION_MASK_ISA_AVX512VL;
>
>mask &= ~OPTION_MASK_ISA_64BIT;
>if (mask == 0
> @@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c
>
>  static inline tree
>  def_builtin2 (HOST_WIDE_INT mask, const char *name,
> -enum ix86_builtin_func_type tcode,
> -enum ix86_builtins code)
> + enum ix86_builtin_func_type tcode,
> + enum ix86_builtins code)
>  {
>tree decl = NULL_TREE;
>
> @@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const
>tree type = ix86_get_builtin_func_type (tcode);
>decl = add_builtin_function (name, type, code, BUILT_IN_MD,
>NULL, NULL_TREE);
> - ix86_builtins[(int) code] = decl;
> - ix86_builtins_isa[(int) code].set_and_not_built_p = false;
> +  ix86_builtins[(int) code] = decl;
> +  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
>  }
>else
>  {
> @@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const
>
>  static inline tree
>  def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
> -  enum ix86_builtin_func_type tcode, enum ix86_builtins code)
> +   enum ix86_builtin_func_type tcode, enum ix86_builtins 
> code)
>  {
>tree decl = def_builtin2 (mask, name, tcode, code);
>if (decl)
> @@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask,
>  static void
>  ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
>  {
> -  if (((isa & deferred_isa_values) == 0)
> -  && ((isa2 & deferred_isa_values2) == 0))
> +  if ((isa & deferred_isa_values) == 0
> +  && (isa2 & deferred_isa_values2) == 0)
>  return;
>
>/* Bits in ISA value can be removed from potential isa values.  */
> @@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa
>
>for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
>  {
> -  if ix86_builtins_isa[i].isa & isa) != 0) || 
> ((ix86_builtins_isa[i].isa2 & isa2) 

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Jakub Jelinek
On Sat, Nov 19, 2016 at 09:05:05AM +0100, Jakub Jelinek wrote:
> On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
> > On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> > > I'm seeing lots of ICEs with this.
> > 
> > Here is untested fix for that, will bootstrap/regtest it soon (after my
> > current set of bootstraps finishes).
> > 
> > 2016-11-18  Jakub Jelinek  
> > 
> > * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
> > don't initialize it, don't use it for the case where it isn't
> > provable %{z} nor using the same argument, instead move merge
> > argument into a new pseudo and use that as target.  Formatting fixes.
> 
> Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
> fixed a couple of FAILs, but not tons of others.
> 
> Here is another patch I'm going to test which fixes many other FAILs, but
> still some are left:
> FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
> FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
> FAIL: gcc.target/i386/mvc1.c (internal compiler error)
> FAIL: gcc.target/i386/mvc1.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c (internal compiler error)
> FAIL: gcc.target/i386/mvc6.c (test for excess errors)
> FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
> FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
> FAIL: gcc.target/i386/mvc8.c (internal compiler error)
> FAIL: gcc.target/i386/mvc8.c (test for excess errors)
> FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
> FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
> FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
> FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
> FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
> Will debug even those.

The fix for that (still not bootstrapped/regtested) is below.
Will now bootstrap/regtest all 3 patches and hopefully all the 4fma*
introduced regressions will be gone.

2016-11-19  Jakub Jelinek  

* config/i386/i386.c (ix86_valid_target_attribute_tree): Don't
clear opts->x_ix86_isa_flags, clear opts->x_ix86_isa_flags2
instead and using = 0 instead of &= 0.

--- gcc/config/i386/i386.c.jj   2016-11-19 08:54:37.0 +0100
+++ gcc/config/i386/i386.c  2016-11-19 09:20:52.314913008 +0100
@@ -6845,7 +6845,7 @@ ix86_valid_target_attribute_tree (tree a
 | OPTION_MASK_ABI_64
 | OPTION_MASK_ABI_X32
 | OPTION_MASK_CODE16);
- opts->x_ix86_isa_flags &= 0;
+ opts->x_ix86_isa_flags2 = 0;
}
   else if (!orig_arch_specified)
opts->x_ix86_arch_string = NULL;


Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-19 Thread Jakub Jelinek
On Fri, Nov 18, 2016 at 09:30:06PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> > I'm seeing lots of ICEs with this.
> 
> Here is untested fix for that, will bootstrap/regtest it soon (after my
> current set of bootstraps finishes).
> 
> 2016-11-18  Jakub Jelinek  
> 
>   * config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
>   don't initialize it, don't use it for the case where it isn't
>   provable %{z} nor using the same argument, instead move merge
>   argument into a new pseudo and use that as target.  Formatting fixes.

Now successfully bootstrapped/regtested on x86_64-linux and i686-linux and
fixed a couple of FAILs, but not tons of others.

Here is another patch I'm going to test which fixes many other FAILs, but
still some are left:
FAIL: gcc.target/i386/funcspec-3.c (internal compiler error)
FAIL: gcc.target/i386/funcspec-3.c (test for excess errors)
FAIL: gcc.target/i386/mvc1.c (internal compiler error)
FAIL: gcc.target/i386/mvc1.c (test for excess errors)
FAIL: gcc.target/i386/mvc6.c (internal compiler error)
FAIL: gcc.target/i386/mvc6.c (test for excess errors)
FAIL: gcc.target/i386/mvc6.c scan-assembler vpshufb
FAIL: gcc.target/i386/mvc6.c scan-assembler punpcklbw
FAIL: gcc.target/i386/mvc8.c (internal compiler error)
FAIL: gcc.target/i386/mvc8.c (test for excess errors)
FAIL: gcc.target/i386/pr67995-2.c (internal compiler error)
FAIL: gcc.target/i386/pr67995-2.c (test for excess errors)
FAIL: gcc.target/i386/pr71652-3.c (internal compiler error)
FAIL: gcc.target/i386/pr71652-3.c  (test for errors, line 5)
FAIL: gcc.target/i386/pr71652-3.c (test for excess errors)
Will debug even those.

2016-11-19  Jakub Jelinek  

* config/i386/i386.c (def_builtin, def_builtin2, def_builtin_const2,
ix86_add_new_builtins): Formatting fixes.
(ix86_expand_builtin): Use || instead of && for isa vs. isa2.
(ix86_get_builtin): Likewise.

--- gcc/config/i386/i386.c.jj   2016-11-18 22:30:16.0 +0100
+++ gcc/config/i386/i386.c  2016-11-19 08:37:45.748175866 +0100
@@ -30924,7 +30924,7 @@ def_builtin (HOST_WIDE_INT mask, const c
 means that *both* cpuid bits must be set for the built-in to be 
available.
 Handle this here.  */
   if (mask & ix86_isa_flags & OPTION_MASK_ISA_AVX512VL)
- mask &= ~OPTION_MASK_ISA_AVX512VL;
+   mask &= ~OPTION_MASK_ISA_AVX512VL;
 
   mask &= ~OPTION_MASK_ISA_64BIT;
   if (mask == 0
@@ -30976,8 +30976,8 @@ def_builtin_const (HOST_WIDE_INT mask, c
 
 static inline tree
 def_builtin2 (HOST_WIDE_INT mask, const char *name,
-enum ix86_builtin_func_type tcode,
-enum ix86_builtins code)
+ enum ix86_builtin_func_type tcode,
+ enum ix86_builtins code)
 {
   tree decl = NULL_TREE;
 
@@ -30992,8 +30992,8 @@ def_builtin2 (HOST_WIDE_INT mask, const
   tree type = ix86_get_builtin_func_type (tcode);
   decl = add_builtin_function (name, type, code, BUILT_IN_MD,
   NULL, NULL_TREE);
- ix86_builtins[(int) code] = decl;
- ix86_builtins_isa[(int) code].set_and_not_built_p = false;
+  ix86_builtins[(int) code] = decl;
+  ix86_builtins_isa[(int) code].set_and_not_built_p = false;
 }
   else
 {
@@ -31016,7 +31016,7 @@ def_builtin2 (HOST_WIDE_INT mask, const
 
 static inline tree
 def_builtin_const2 (HOST_WIDE_INT mask, const char *name,
-  enum ix86_builtin_func_type tcode, enum ix86_builtins code)
+   enum ix86_builtin_func_type tcode, enum ix86_builtins code)
 {
   tree decl = def_builtin2 (mask, name, tcode, code);
   if (decl)
@@ -31034,8 +31034,8 @@ def_builtin_const2 (HOST_WIDE_INT mask,
 static void
 ix86_add_new_builtins (HOST_WIDE_INT isa, HOST_WIDE_INT isa2)
 {
-  if (((isa & deferred_isa_values) == 0)
-  && ((isa2 & deferred_isa_values2) == 0))
+  if ((isa & deferred_isa_values) == 0
+  && (isa2 & deferred_isa_values2) == 0)
 return;
 
   /* Bits in ISA value can be removed from potential isa values.  */
@@ -31048,7 +31048,8 @@ ix86_add_new_builtins (HOST_WIDE_INT isa
 
   for (i = 0; i < (int)IX86_BUILTIN_MAX; i++)
 {
-  if ix86_builtins_isa[i].isa & isa) != 0) || 
((ix86_builtins_isa[i].isa2 & isa2) != 0))
+  if (((ix86_builtins_isa[i].isa & isa) != 0
+  || (ix86_builtins_isa[i].isa2 & isa2) != 0)
  && ix86_builtins_isa[i].set_and_not_built_p)
{
  tree decl, type;
@@ -36549,7 +36550,7 @@ ix86_expand_builtin (tree exp, rtx targe
  whether it is supported.  */
   if ((ix86_builtins_isa[fcode].isa
&& !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
-  && (ix86_builtins_isa[fcode].isa2
+  || (ix86_builtins_isa[fcode].isa2
  && !(ix86_builtins_isa[fcode].isa2 & ix86_isa_flags2)))
 {
   char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa,
@@ -38514,7 

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-18 Thread Jakub Jelinek
On Fri, Nov 18, 2016 at 08:41:01PM +0100, Jakub Jelinek wrote:
> I'm seeing lots of ICEs with this.

Here is untested fix for that, will bootstrap/regtest it soon (after my
current set of bootstraps finishes).

2016-11-18  Jakub Jelinek  

* config/i386/i386.c (ix86_expand_builtin): Remove msk_mov variable,
don't initialize it, don't use it for the case where it isn't
provable %{z} nor using the same argument, instead move merge
argument into a new pseudo and use that as target.  Formatting fixes.

--- gcc/config/i386/i386.c.jj   2016-11-18 20:04:31.0 +0100
+++ gcc/config/i386/i386.c  2016-11-18 21:21:17.764190127 +0100
@@ -38220,14 +38220,12 @@ rdseed_step:
   rtx (*fcn) (rtx, rtx, rtx, rtx);
   rtx (*fcn_mask) (rtx, rtx, rtx, rtx, rtx);
   rtx (*fcn_maskz) (rtx, rtx, rtx, rtx, rtx, rtx);
-  rtx (*msk_mov) (rtx, rtx, rtx, rtx);
   int masked = 1;
   machine_mode mode, wide_mode, nar_mode;
 
   nar_mode  = V4SFmode;
   mode  = V16SFmode;
   wide_mode = V64SFmode;
-  msk_mov   = gen_avx512f_loadv16sf_mask;
   fcn_mask  = gen_avx5124fmaddps_4fmaddps_mask;
   fcn_maskz = gen_avx5124fmaddps_4fmaddps_maskz;
 
@@ -38270,7 +38268,6 @@ rdseed_step:
  wide_mode = V64SImode;
  fcn_mask  = gen_avx5124vnniw_vp4dpwssd_mask;
  fcn_maskz = gen_avx5124vnniw_vp4dpwssd_maskz;
- msk_mov   = gen_avx512f_loadv16si_mask;
  goto v4fma_expand;
 
case IX86_BUILTIN_4DPWSSDS_MASK:
@@ -38279,7 +38276,6 @@ rdseed_step:
  wide_mode = V64SImode;
  fcn_mask  = gen_avx5124vnniw_vp4dpwssds_mask;
  fcn_maskz = gen_avx5124vnniw_vp4dpwssds_maskz;
- msk_mov   = gen_avx512f_loadv16si_mask;
  goto v4fma_expand;
 
case IX86_BUILTIN_4FMAPS_MASK:
@@ -38295,11 +38291,11 @@ v4fma_expand:
wide_reg = gen_reg_rtx (wide_mode);
for (i = 0; i < 4; i++)
  {
-   args[i] = CALL_EXPR_ARG (exp, i);
+   args[i] = CALL_EXPR_ARG (exp, i);
ops[i] = expand_normal (args[i]);
 
-   emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, (i) * 64),
- ops[i]);
+   emit_move_insn (gen_rtx_SUBREG (mode, wide_reg, i * 64),
+   ops[i]);
  }
 
accum = expand_normal (CALL_EXPR_ARG (exp, 4));
@@ -38318,7 +38314,7 @@ v4fma_expand:
  emit_insn (fcn (target, accum, wide_reg, mem));
else
  {
-   rtx merge, mask;
+   rtx merge, mask;
merge = expand_normal (CALL_EXPR_ARG (exp, 6));
 
mask = expand_normal (CALL_EXPR_ARG (exp, 7));
@@ -38340,18 +38336,16 @@ v4fma_expand:
merge = force_reg (mode, merge);
emit_insn (fcn_mask (target, wide_reg, mem, merge, mask));
  }
-   /* Merge with something unknown might happen if we z-mask w/ 
-O0.  */
+   /* Merge with something unknown might happen if we z-mask w/ 
-O0.  */
else
  {
-   rtx tmp = target;
-   emit_insn (fcn_mask (tmp, wide_reg, mem, tmp, mask));
-
-   target = force_reg (mode, merge);
-   emit_insn (msk_mov (target, tmp, target, mask));
+   target = gen_reg_rtx (mode);
+   emit_move_insn (target, merge);
+   emit_insn (fcn_mask (target, wide_reg, mem, target, mask));
  }
  }
- return target;
-   }
+   return target;
+ }
 
case IX86_BUILTIN_4FNMASS:
  fcn = gen_avx5124fmaddps_4fnmaddss;
@@ -38366,7 +38360,6 @@ v4fma_expand:
case IX86_BUILTIN_4FNMASS_MASK:
  fcn_mask = gen_avx5124fmaddps_4fnmaddss_mask;
  fcn_maskz = gen_avx5124fmaddps_4fnmaddss_maskz;
- msk_mov   = gen_avx512vl_loadv4sf_mask;
  goto s4fma_expand;
 
case IX86_BUILTIN_4FMASS_MASK:
@@ -38380,22 +38373,21 @@ v4fma_expand:
 
fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
-   msk_mov   = gen_avx512vl_loadv4sf_mask;
 
 s4fma_expand:
mode = V4SFmode;
wide_reg = gen_reg_rtx (V64SFmode);
for (i = 0; i < 4; i++)
  {
-rtx tmp;
-args[i] = CALL_EXPR_ARG (exp, i);
-ops[i] = expand_normal (args[i]);
+   rtx tmp;
+   args[i] = CALL_EXPR_ARG (exp, i);
+   ops[i] = expand_normal (args[i]);
 
-tmp = gen_reg_rtx (SFmode);
-emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
+   tmp = gen_reg_rtx (SFmode);
+   emit_move_insn (tmp, gen_rtx_SUBREG (SFmode, ops[i], 0));
 
-emit_move_insn 

Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-18 Thread Jakub Jelinek
Hi!

On Thu, Nov 17, 2016 at 02:18:57PM -0800, H.J. Lu wrote:
> > Hi HJ, could you please commit it?
> 
> Done.

I'm seeing lots of ICEs with this.

E.g. reduced:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
typedef unsigned char __mmask8;
typedef float __v4sf __attribute__ ((__vector_size__ (16)));

static inline  __m128 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}

 __m128
foo (__mmask8 __U, __m128 __A, __m128 __B, __m128 __C, __m128 __D, __m128 __E, 
__m128 *__F)
{
  return (__m128) __builtin_ia32_4fmaddss_mask ((__v4sf) __B,
  (__v4sf) __C,
  (__v4sf) __D,
  (__v4sf) __E,
  (__v4sf) __A,
  (const __v4sf *) __F,
  (__v4sf) _mm_setzero_ps (),
  (__mmask8) __U);
}

ICEs with -mavx5124fmaps -O0, but succeeds with
-mavx512vl -mavx5124fmaps -O0 or -mavx5124fmaps -O2.

fcn_mask = gen_avx5124fmaddps_4fmaddss_mask;
fcn_maskz = gen_avx5124fmaddps_4fmaddss_maskz;
msk_mov   = gen_avx512vl_loadv4sf_mask;

looks wrong, while -mavx5124fmaps implies -mavx512f, it doesn't
imply -mavx512vl, so using -mavx512vl insns unconditionally is just wrong.
You need some fallback if avx512vl isn't available, perhaps use
avx512f 512-bit masked insns with bits in the mask forced to pick only the
ones you want?

Also, seems there are various formatting issues in the change,
e.g. shortly after s4fma_expand: there is indentation by 3 chars relative to
above { instead of 2, gen_rtx_SUBREG (V16SFmode, tmp, 0)); has extra 1 char
indentation, some lines too long.

Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-17 Thread H.J. Lu
On Thu, Nov 17, 2016 at 4:20 AM, Andrew Senkevich
 wrote:
> 16 Ноя 2016 г. 19:21 пользователь "Bernd Schmidt" 
> написал:
>
>
>>
>> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>>
>>> 2016-11-15 17:56 GMT+03:00 Jeff Law :

 On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>
>
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>>
>>
>> --- a/gcc/genmodes.c
>> +++ b/gcc/genmodes.c
>> --- a/gcc/init-regs.c
>> +++ b/gcc/init-regs.c
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>>
>> These are middle-end changes, you will need a separate review for
>> these.
>
>
>
> Who could review these changes?


 I can.  I likely dropped the message because it looked x86 specific, so
 if
 you could resend it'd be appreciated.
>>>
>>>
>>> Attached (diff with previous only in fixed comments typos).
>>
>>
>> Next time please split middle-end changes out from target-related stuff
>> and send them separately.
>>
>> These ones are OK.
>>
>>
>> Bernd
>
> Hi HJ, could you please commit it?

Done.

-- 
H.J.


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-16 Thread Andrew Senkevich
2016-11-16 19:21 GMT+03:00 Bernd Schmidt :
> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>
>> 2016-11-15 17:56 GMT+03:00 Jeff Law :
>>>
>>> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:


 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>
>
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> --- a/gcc/init-regs.c
> +++ b/gcc/init-regs.c
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
>
> These are middle-end changes, you will need a separate review for
> these.



 Who could review these changes?
>>>
>>>
>>> I can.  I likely dropped the message because it looked x86 specific, so
>>> if
>>> you could resend it'd be appreciated.
>>
>>
>> Attached (diff with previous only in fixed comments typos).
>
>
> Next time please split middle-end changes out from target-related stuff and
> send them separately.

Ok.

> These ones are OK.
>
>
> Bernd

Thanks!

Who could commit it?


--
WBR,
Andrew


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-16 Thread Bernd Schmidt

On 11/15/2016 05:31 PM, Andrew Senkevich wrote:

2016-11-15 17:56 GMT+03:00 Jeff Law :

On 11/15/2016 05:55 AM, Andrew Senkevich wrote:


2016-11-11 14:16 GMT+03:00 Uros Bizjak :


--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.



Who could review these changes?


I can.  I likely dropped the message because it looked x86 specific, so if
you could resend it'd be appreciated.


Attached (diff with previous only in fixed comments typos).


Next time please split middle-end changes out from target-related stuff 
and send them separately.


These ones are OK.


Bernd


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Andrew Senkevich
2016-11-15 17:56 GMT+03:00 Jeff Law :
> On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>>
>> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>>>
>>> --- a/gcc/genmodes.c
>>> +++ b/gcc/genmodes.c
>>> --- a/gcc/init-regs.c
>>> +++ b/gcc/init-regs.c
>>> --- a/gcc/machmode.h
>>> +++ b/gcc/machmode.h
>>>
>>> These are middle-end changes, you will need a separate review for these.
>>
>>
>> Who could review these changes?
>
> I can.  I likely dropped the message because it looked x86 specific, so if
> you could resend it'd be appreciated.

Attached (diff with previous only in fixed comments typos).


--
WBR,
Andrew


new_avx512_instructions_15.11.patch
Description: Binary data


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Jeff Law

On 11/15/2016 05:55 AM, Andrew Senkevich wrote:

2016-11-11 14:16 GMT+03:00 Uros Bizjak :

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.


Who could review these changes?
I can.  I likely dropped the message because it looked x86 specific, so 
if you could resend it'd be appreciated.


jeff


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Andrew Senkevich
2016-11-11 14:16 GMT+03:00 Uros Bizjak :
> --- a/gcc/genmodes.c
> +++ b/gcc/genmodes.c
> --- a/gcc/init-regs.c
> +++ b/gcc/init-regs.c
> --- a/gcc/machmode.h
> +++ b/gcc/machmode.h
>
> These are middle-end changes, you will need a separate review for these.

Who could review these changes?


--
WBR,
Andrew


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-15 Thread Uros Bizjak
On Mon, Nov 14, 2016 at 7:28 PM, Andrew Senkevich
 wrote:
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>> The x86 part of the patch is OK with the above changes and additional
>> target attribute test for flags2 ISA features..
>
> Fixed according your comments, I will followup with additional tests soon.

OK.

Thanks,
Uros.


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-14 Thread Andrew Senkevich
2016-11-11 14:29 GMT+03:00 Jakub Jelinek :
> Hi!
>
> I've noticed preexisting:
>
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>
>> --- a/gcc/config/i386/i386-modes.def
>> +++ b/gcc/config/i386/i386-modes.def
>> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */
>>  VECTOR_MODES (FLOAT, 32); /*V16HF V8SF V4DF */
>>  VECTOR_MODES (FLOAT, 64); /*   V32HF V16SF V8DF */
>>  VECTOR_MODES (FLOAT, 128);/*  V64HF V32SF V16DF */
>
> The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
> that?  None of them create V*HF mode, but they do create V*TF mode.

I have fixed it in new patch.


--
WBR,
Andrew


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-14 Thread Andrew Senkevich
2016-11-11 14:16 GMT+03:00 Uros Bizjak :
> The x86 part of the patch is OK with the above changes and additional
> target attribute test for flags2 ISA features..

Fixed according your comments, I will followup with additional tests soon.


--
WBR,
Andrew


new_avx512_instructions_14.11.patch
Description: Binary data


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-11 Thread Jakub Jelinek
Hi!

I've noticed preexisting:

On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:

> --- a/gcc/config/i386/i386-modes.def
> +++ b/gcc/config/i386/i386-modes.def
> @@ -84,6 +84,7 @@ VECTOR_MODES (FLOAT, 16); /* V8HF V4SF V2DF */
>  VECTOR_MODES (FLOAT, 32); /*V16HF V8SF V4DF */
>  VECTOR_MODES (FLOAT, 64); /*   V32HF V16SF V8DF */
>  VECTOR_MODES (FLOAT, 128);/*  V64HF V32SF V16DF */

The VECTOR_MODES (FLOAT, comments don't really match reality, shall we fix
that?  None of them create V*HF mode, but they do create V*TF mode.

Jakub


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-11 Thread Uros Bizjak
On Thu, Nov 10, 2016 at 6:18 PM, Andrew Senkevich
 wrote:
> 2016-11-10 19:36 GMT+03:00 Jakub Jelinek :
>> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>>> Hi,
>>>
>>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>>
>>> It requires additional patch for register allocator from Vladimir
>>> Makarov to be committed before.
>>
>> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
>> tabs), can you repost as attachment or configure your MUA not to do this?
>>
>> Just a couple of random nits follow:
>>
>>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>>
>> This mentions an option that doesn't exist, is that s/dd// ?
>
> Yes.
> Attached fixed version.

A couple of questions and comments below.

You are introducing flag2 ABI option flags. There are no tests for
corresponding __target__ attribute, please add some tests, similar to
gcc.target/i386/funcspec-?.c. These can be in a follow-up patch.

Please add new option to g++.dg/other/i386-{2,3}.C tests. These are
like gcc.target/i386/sse-{22,23}.c for c++.

Also, I guess we want to support these new options with
__builtin_cpu_supports. Please add this functionality in a follow-up
patch.

+(define_register_constraint "h" "TARGET_AVX512F ? MOD4_SSE_REGS : NO_REGS"
+ "Any EVEX encodable SSE register, which has number factor of four.")
+
No, we are extremely low on a single-letter constraints. We will use
these for possible future new register sets. Use Yv or something
similar instead.

+//additional structure for isa flags

Please use c comments throughout the patch.

@@ -1465,11 +1472,14 @@ enum reg_class
 {   0x11,0x1fe0,0x0 },   /* FLOAT_INT_REGS */\
 { 0x1ff100ff,0xffe0,   0x1f },   /* INT_SSE_REGS */  \
 { 0x1ff1,0xffe0,   0x1f },   /* FLOAT_INT_SSE_REGS */\
-   { 0x0,   0x0, 0x1fc0 },   /* MASK_EVEX_REGS */   \
+   { 0x0,   0x0, 0x1fc0 },   /* MASK_EVEX_REGS */\
{ 0x0,   0x0, 0x1fe0 },   /* MASK_REGS */ \
-{ 0x,0x,0x1 }\
+{ 0x1fe0,0xe000,   0x1f },   /* MOD4_SSE_REGS */ \
+{ 0x,0x,0x1 }\
 }

+/* { 0x0220,0x2000,   0x02 },*/   /* MOD4_SSE_REGS */
+

Please remove commented out code. Also, please fix whitespace at the new entry.

+mavx5124fmaps
+Target Report Mask(ISA_AVX5124FMAPS) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124FMAPS built-in functions and code generation.
+
+mavx5124vnniw
+Target Report Mask(ISA_AVX5124VNNIW) Var(ix86_isa_flags2) Save
+Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and
AVX512F and AVX5124VNNIW built-in functions and code generation.

Too much "and"s in the description.

--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
--- a/gcc/init-regs.c
+++ b/gcc/init-regs.c
--- a/gcc/machmode.h
+++ b/gcc/machmode.h

These are middle-end changes, you will need a separate review for these.

The x86 part of the patch is OK with the above changes and additional
target attribute test for flags2 ISA features..

Uros.


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 20:14 GMT+03:00 Vladimir N Makarov :
>
>
> On 11/10/2016 11:27 AM, Andrew Senkevich wrote:
>>
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>>
>>
> I've just committed the necessary patch.

Thanks, Vladimir.


--
WBR,
Andrew


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
2016-11-10 19:36 GMT+03:00 Jakub Jelinek :
> On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
>> Hi,
>>
>> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
>>
>> It requires additional patch for register allocator from Vladimir
>> Makarov to be committed before.
>
> Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
> tabs), can you repost as attachment or configure your MUA not to do this?
>
> Just a couple of random nits follow:
>
>> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.
>
> This mentions an option that doesn't exist, is that s/dd// ?

Yes.
Attached fixed version.


--
WBR,
Andrew


new_avx512_instructions.patch
Description: Binary data


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Vladimir N Makarov



On 11/10/2016 11:27 AM, Andrew Senkevich wrote:

Hi,

this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires additional patch for register allocator from Vladimir
Makarov to be committed before.



I've just committed the necessary patch.



Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Jakub Jelinek
On Thu, Nov 10, 2016 at 07:27:00PM +0300, Andrew Senkevich wrote:
> Hi,
> 
> this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.
> 
> It requires additional patch for register allocator from Vladimir
> Makarov to be committed before.

Your MUA ate tabs (and in the ChangeLog you're using spaces instead of
tabs), can you repost as attachment or configure your MUA not to do this?

Just a couple of random nits follow:

> * gcc.target/i386/sse-12.c: Add -mavx5124fmaddps.

This mentions an option that doesn't exist, is that s/dd// ?

> * gcc.target/i386/sse-13.c: Ditto.

> @@ -399,6 +403,13 @@ ix86_handle_option (struct gcc_options *opts,
>   {
>opts->x_ix86_isa_flags &= ~OPTION_MASK_ISA_AVX512F_UNSET;
>opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX512F_UNSET;
> +
> +  //turn off additional isa flags

Comments start with capital letter, end with ., there should be space
between // and T, better use /* ... */ style comment to match other
comments in the file.

> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124FMAPS_UNSET;
> +  opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +  opts->x_ix86_isa_flags2_explicit |=
> OPTION_MASK_ISA_AVX5124VNNIW_UNSET;
> +
>   }

The formatting looks very weird.

Jakub


[PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-10 Thread Andrew Senkevich
Hi,

this patch enabled AVX512_4FMAPS and AVX512_4VNNIW instructions.

It requires additional patch for register allocator from Vladimir
Makarov to be committed before.

gcc/
* common/config/i386/i386-common.c
(OPTION_MASK_ISA_AVX5124FMAPS_SET,
OPTION_MASK_ISA_AVX5124FMAPS_UNSET,
OPTION_MASK_ISA_AVX5124VNNIW_SET,
OPTION_MASK_ISA_AVX5124VNNIW_UNSET): New.
(ix86_handle_option): Handle OPT_mavx5124fmaps,
OPT_mavx5124vnniw.
* config.gcc: Add avx5124fmapsintrin.h, avx5124vnniwintrin.h.
* config/i386/avx5124fmapsintrin.h: New file.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/constraints.md (h): New constraint.
* config/i386/cpuid.h: (bit_AVX5124VNNIW,
bit_AVX5124FMAPS): New.
* config/i386/driver-i386.c (host_detect_local_cpu):
Detect avx5124fmaps, avx5124vnniw.
* config/i386/i386-builtin-types.def: Add types
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF_V16SF_UHI,
V16SF_FTYPE_V16SF_V16SF_V16SF_V16SF_V16SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF,
V4SF_FTYPE_V4SF_V4SF_V4SF_V4SF_V4SF_PCV4SF_V4SF_UQI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI,
V16SI_FTYPE_V16SI_V16SI_V16SI_V16SI_V16SI_PCV4SI_V16SI_UHI.
* config/i386/i386-builtin.def (__builtin_ia32_4fmaddps_mask,
__builtin_ia32_4fmaddps, __builtin_ia32_4fmaddss,
__builtin_ia32_4fmaddss_mask, __builtin_ia32_4fnmaddps_mask,
__builtin_ia32_4fnmaddps, __builtin_ia32_4fnmaddss,
__builtin_ia32_4fnmaddss_mask, __builtin_ia32_vp4dpwssd,
__builtin_ia32_vp4dpwssd_mask, __builtin_ia32_vp4dpwssds,
__builtin_ia32_vp4dpwssds_mask): New.
* config/i386/i386-c.c (ix86_target_macros_internal):
Define __AVX5124FMAPS__, __AVX5124VNNIW__.
* config/i386/i386-modes.def (VECTOR_MODES (FLOAT, 256),
VECTOR_MODE (INT, SI, 64)): New modes.
* config/i386/i386.c (ix86_target_string): Add -mavx5124fmaps,
-mavx5124vnniw.
(PTA_AVX5124FMAPS, PTA_AVX5124VNNIW): Define.
(ix86_option_override_internal): Handle new options.
(ix86_valid_target_attribute_inner_p): Add avx5124fmaps,
avx5124vnniw.
(ix86_expand_builtin): Handle new builtins.
(ix86_additional_allocno_class_p): New.
* config/i386/i386.h (TARGET_AVX5124FMAPS,
TARGET_AVX5124FMAPS_P,
TARGET_AVX5124VNNIW,
TARGET_AVX5124VNNIW_P): Define.
(reg_class): Add MOD4_SSE_REGS.
(MOD4_SSE_REG_P, MOD4_SSE_REGNO_P): New.
* config/i386/i386.opt: Add mavx5124fmaps, mavx5124vnniw.
* config/i386/immintrin.h: Include avx5124fmapsintrin.h,
avx5124vnniwintrin.h.
* config/i386/sse.md (unspec): Add UNSPEC_VP4FMADD,
UNSPEC_VP4FNMADD,
UNSPEC_VP4DPWSSD, UNSPEC_VP4DPWSSDS.
(define_mode_iterator IMOD4): New.
(define_mode_attr imod4_narrow): Ditto.
(define_insn "mov"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fmaddss_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddps_maskz"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_mask"): Ditto.
(define_insn "avx5124fmaddps_4fnmaddss_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssd_maskz"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_mask"): Ditto.
(define_insn "avx5124vnniw_vp4dpwssds_maskz"): Ditto.
* init-regs.c (initialize_uninitialized_regs): Add emit_clobber call.
* genmodes.c (mode_size_inline): Extend return type.
* machmode.h (mode_size, mode_base_align): Extend type.
gcc/testsuite/
* gcc.target/i386/avx5124fmadd-v4fmaddps-1.c: New test.
* gcc.target/i386/avx5124fmadd-v4fmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-1.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddps-2.c: Ditto.
* gcc.target/i386/avx5124fmadd-v4fnmaddss-1.c: Ditto.
* gcc.target/i386/avx5124fmaps-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-check.h: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-1.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssd-2.c: Ditto.
* gcc.target/i386/avx5124vnniw-vp4dpwssds-1.c: Ditto.