from:"Allan Sandfeld Jensen"

Re: issue with behavior change of gcc -r between gcc-8 and gcc-9

2020-04-02 Thread Allan Sandfeld Jensen

On Wednesday, 1 April 2020 19:48:11 CEST Olivier Hainque wrote:
> 
> -r 's business was to arrange for the linker not to
> complain because the closure is incomplete, leaving us
> with complete control of the closure.
> 
> It doesn't seem to me there was a really strong motivation
> to suddenly have -r influence the closure the way it now does.
> 
> Would it be possible to revert to the previous behavior
> and document it ?
> 
> Or maybe allow it to be controllable by the target ports ?
> 
> Or provide something to bring back the flexibility we had
> if we really believe the default should change ? (I'm not
> convinced)

-r is used for relinking. The idea behind the change was to make it directly 
suitable for that. It takes object files and relinks them into a new object 
file. It gives the caller complete control.

It sounds like you are missing some way to add startfiles? A reverse of 
-nostartfiles?

But hopefully you can just use the linker directly? Unless you have LTO 
enabled object files you dont need the compiler to link.

`Allan

Re: [PATCH] Come up with -flto=auto option.

2019-07-24 Thread Allan Sandfeld Jensen

On Mittwoch, 24. Juli 2019 08:45:21 CEST Martin Liška wrote:
> On 7/24/19 12:11 AM, Allan Sandfeld Jensen wrote:
> > On Dienstag, 23. Juli 2019 10:30:07 CEST Martin Liška wrote:
> >> Hi.
> >> 
> >> As we as openSUSE started using -flto, I see it very handy to have
> >> an option value that will automatically detect number of cores
> >> that can be used for parallel LTRANS phase.
> >> 
> >> Thoughts?
> > 
> > That's really nice.
> > 
> > How much extra work would it be to make it support a posix make jobserver?
> 
> We do support it via -flto=jobserver:
> 
Good to know :)

> 
> Problem is that nowadays you how much more common make systems like ninja,
> meson and others that probably do not support that.
> 
There are patches to enable it in ninja, and I know some Linux distros apply 
the patches by default. Though that is more listening, so it probably requires 
launching ninja using make, if you want to be able to pass it own to gcc.

'Allan

Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Allan Sandfeld Jensen

On Dienstag, 23. Juli 2019 10:30:07 CEST Martin Liška wrote:
> Hi.
> 
> As we as openSUSE started using -flto, I see it very handy to have
> an option value that will automatically detect number of cores
> that can be used for parallel LTRANS phase.
> 
> Thoughts?
> 
That's really nice. 

How much extra work would it be to make it support a posix make jobserver? 

As far as I understand, you would need to guess a partition size first (as 
your patch here does), but then only start each job when given a token from 
the jobserver FD.

With that the integration to existing build infrastructure would be optimal.

Cheers
'Allan

[Patch] Make x86-64 a generic architecture as documented

2019-07-13 Thread Allan Sandfeld Jensen

Hello

Changing -march=x86-64 behaviour to match documentation and common usage.

Question: Is this also good for -m32 -march=x86-64 or will generic then mean 
something else?

Thank
Allan

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 364466b6b6f..90f9cbc3c35 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2019-07-13  Allan Sandfeld Jensen 
+
+   * gcc/common/config/i386/i386-common.c (processor_alias_table): Change
+   x86-64 architecture to use generic tuning and scheduling instead of 
K8.
+
 2019-07-11  Jakub Jelinek  

PR target/91124
diff --git a/gcc/common/config/i386/i386-common.c b/gcc/common/config/i386/
i386-common.c
index a394f874fe4..19ace226190 100644
--- a/gcc/common/config/i386/i386-common.c
+++ b/gcc/common/config/i386/i386-common.c
@@ -1673,7 +1673,7 @@ const pta processor_alias_table[] =
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR},
   {"athlon-mp", PROCESSOR_ATHLON, CPU_ATHLON,
 PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE | PTA_FXSR},
-  {"x86-64", PROCESSOR_K8, CPU_K8,
+  {"x86-64", PROCESSOR_GENERIC, CPU_GENERIC,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR},
   {"eden-x2", PROCESSOR_K8, CPU_K8,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_FXSR},

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-09-01 Thread Allan Sandfeld Jensen

On Montag, 27. August 2018 15:37:15 CEST Joseph Myers wrote:
> On Sun, 26 Aug 2018, Allan Sandfeld Jensen wrote:
> > Patch updated. I specifically edited a number of the existing tests that
> > used both -r and -nostdlib and removed -nostdlib so the patch is
> > exercised by existing tests. The patch bootstrapped, I didn't notice any
> > relevant failures when running the test suite (though I could have missed
> > something, I am never comfortable reading that output).
> 
> Note that Iain's comments also included that the patch is incomplete
> because of more specs in gcc.c (VTABLE_VERIFICATION_SPEC,
> SANITIZER_EARLY_SPEC, SANITIZER_SPEC) that needs corresponding updates to
> handle -r like -nostdlib.

Updated (but tests not rerun)>From 1d164bced7979c94767c260174e3c486d4fc8c5d Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sat, 1 Sep 2018 12:59:14 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-09-01 Allan Sandfeld Jensen 

gcc/doc/
* invoke.texi: Document -r.

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
(VTABLE_VERIFICATION_SPEC): Ditto
(SANITIZER_EARLY_SPEC): Ditto
(SANITIZER_SPEC): Ditto
* config/darwin.h (LINK_COMMAND_SPEC): Ditto
* cp/g++spec.c (lang_specific_driver): Ditto
* fortran/gfortranspec.c (lang_specific_driver): Ditto
* go/gospec.c (lang_specific_driver): Ditto

gcc/testsuite/
* g++.dg/ipa/pr64059.C: Removed now redundant -nostdlib.
* g++.dg/lto/20081109-1_0.C: Ditto
* g++.dg/lto/20090302_0.C: Ditto
* g++.dg/lto/pr45621_0.C: Ditto
* g++.dg/lto/pr60567_0.C: Ditto
* g++.dg/lto/pr62026.C: Ditto
* gcc.dg/lto/pr45736_0.c: Ditto
* gcc.dg/lto/pr52634_0.c: Ditto
* gfortran.dg/lto/20091016-1_0.f90: Ditto
* gfortran.dg/lto/pr79108_0.f90: Ditto
---
 gcc/config/darwin.h|  8 
 gcc/cp/g++spec.c   |  1 +
 gcc/doc/invoke.texi|  7 ++-
 gcc/fortran/gfortranspec.c |  1 +
 gcc/gcc.c  | 18 +-
 gcc/go/gospec.c|  1 +
 gcc/testsuite/g++.dg/ipa/pr64059.C |  2 +-
 gcc/testsuite/g++.dg/lto/20081109-1_0.C|  2 +-
 gcc/testsuite/g++.dg/lto/20090302_0.C  |  2 +-
 gcc/testsuite/g++.dg/lto/pr45621_0.C   |  2 +-
 gcc/testsuite/g++.dg/lto/pr60567_0.C   |  2 +-
 gcc/testsuite/g++.dg/lto/pr62026.C |  2 +-
 gcc/testsuite/gcc.dg/lto/pr45736_0.c   |  2 +-
 gcc/testsuite/gcc.dg/lto/pr52634_0.c   |  2 +-
 gcc/testsuite/gfortran.dg/lto/20091016-1_0.f90 |  2 +-
 gcc/testsuite/gfortran.dg/lto/pr79108_0.f90|  2 +-
 16 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index cd6d6521658..87f610259c0 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -180,20 +180,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-09-01 Thread Allan Sandfeld Jensen

On Montag, 27. August 2018 15:37:15 CEST Joseph Myers wrote:
> On Sun, 26 Aug 2018, Allan Sandfeld Jensen wrote:
> > Patch updated. I specifically edited a number of the existing tests that
> > used both -r and -nostdlib and removed -nostdlib so the patch is
> > exercised by existing tests. The patch bootstrapped, I didn't notice any
> > relevant failures when running the test suite (though I could have missed
> > something, I am never comfortable reading that output).
> 
> Note that Iain's comments also included that the patch is incomplete
> because of more specs in gcc.c (VTABLE_VERIFICATION_SPEC,
> SANITIZER_EARLY_SPEC, SANITIZER_SPEC) that needs corresponding updates to
> handle -r like -nostdlib.

Okay, I can add that, or whoever commits the patch can add that. We can also 
improve the feature if we discover more places that needs updating. Do you 
want me to post an version updated with with these two places? 

'Allan

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-26 Thread Allan Sandfeld Jensen

On Donnerstag, 23. August 2018 23:24:02 CEST Joseph Myers wrote:
> On Thu, 23 Aug 2018, Iain Sandoe wrote:
> > Joseph: As a side-comment, is there a reason that we don’t exclude
> > gomp/itm/fortran/gcov from the link for -nostdlib / -nodefaultlib?
> > 
> > If we are relying on the lib self-specs for this, then we’re not
> > succeeding since the one we build at the moment don’t include those
> > clauses.
> 
> Well, fortran/gfortranspec.c for example has
> 
> case OPT_nostdlib:
> case OPT_nodefaultlibs:
> case OPT_c:
> case OPT_S:
> case OPT_fsyntax_only:
> case OPT_E:
>   /* These options disable linking entirely or linking of the
>  standard libraries.  */
>   library = 0;
>   break;
> 
> and only uses libgfortran.spec if (library).  So it's certainly meant to
> avoid linking with libgfortran or its dependencies if -nostdlib.

Patch updated. I specifically edited a number of the existing tests that used 
both -r and -nostdlib and removed -nostdlib so the patch is exercised by 
existing tests. The patch bootstrapped, I didn't notice any relevant failures 
when running the test suite (though I could have missed something, I am never
comfortable reading that output).

'Allan

>From 07ed41a9afd107c5d45feb1ead7a74ca735a1bb2 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sun, 26 Aug 2018 20:02:54 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-08-26 Allan Sandfeld Jensen 

gcc/doc/
* invoke.texi: Document -r.

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
* config/darwin.h (LINK_COMMAND_SPEC): Handle -r like -nostdlib.
* cp/g++spec.c (lang_specific_driver): Handle -r like -nostdlib.
* fortran/gfortranspec.c (lang_specific_driver): Handle -r like -nostdlib.
* go/gospec.c (lang_specific_driver): Handle -r like -nostdlib.

gcc/testsuite/
* g++.dg/ipa/pr64059.C: Removed now redundant -nostdlib.
* g++.dg/lto/20081109-1_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/20090302_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr45621_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr60567_0.C: Removed now redundant -nostdlib.
* g++.dg/lto/pr62026.C: Removed now redundant -nostdlib.
* gcc.dg/lto/pr45736_0.c: Removed now redundant -nostdlib.
* gcc.dg/lto/pr52634_0.c: Removed now redundant -nostdlib.
* gfortran.dg/lto/20091016-1_0.f90: Removed now redundant -nostdlib.
* gfortran.dg/lto/pr79108_0.f90: Removed now redundant -nostdlib.

---
 gcc/config/darwin.h| 8 
 gcc/cp/g++spec.c   | 1 +
 gcc/doc/invoke.texi| 7 ++-
 gcc/fortran/gfortranspec.c | 1 +
 gcc/gcc.c  | 6 +++---
 gcc/go/gospec.c| 1 +
 gcc/testsuite/g++.dg/ipa/pr64059.C | 2 +-
 gcc/testsuite/g++.dg/lto/20081109-1_0.C| 2 +-
 gcc/testsuite/g++.dg/lto/20090302_0.C  | 2 +-
 gcc/testsuite/g++.dg/lto/pr45621_0.C   | 2 +-
 gcc/testsuite/g++.dg/lto/pr60567_0.C   | 2 +-
 gcc/testsuite/g++.dg/lto/pr62026.C | 2 +-
 gcc/testsuite/gcc.dg/lto/pr45736_0.c   | 2 +-
 gcc/testsuite/gcc.dg/lto/pr52634_0.c   | 2 +-
 gcc/testsuite/gfortran.dg/lto/20091016-1_0.f90 | 2 +-
 gcc/testsuite/gfortran.dg/lto/pr79108_0.f90| 2 +-
 16 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index cd6d6521658..87f610259c0 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -180,20 +180,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-26 Thread Allan Sandfeld Jensen

On Dienstag, 21. August 2018 00:38:58 CEST Joseph Myers wrote:
> On Fri, 3 Aug 2018, Allan Sandfeld Jensen wrote:
> > > I think you're changing the wrong place for this.  If you want -r to be
> > > usable with GCC without using -nostdlib (which is an interesting
> > > question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> > > overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> > 
> > Okay, so like this?
> 
> Could you confirm if this has passed a bootstrap and testsuite run, with
> no testsuite regressions compared to GCC without the patch applied?  I
> think it looks mostly OK (modulo ChangeLog rewrites and a missing second
> space after '.' in the manual change) but I'd like to make sure it's
> passed the usual testing before preparing it for commit.

I didn't think of running the tests since it only affects command line 
options, but it did bootstrap, and behave as expected for my specific usecase.

I will update and run the tests when I have time.

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-11 Thread Allan Sandfeld Jensen

On Samstag, 11. August 2018 11:18:39 CEST Jakub Jelinek wrote:
> On Sat, Aug 11, 2018 at 10:59:26AM +0200, Allan Sandfeld Jensen wrote:
> > +/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
> > +   using movss or movsd.  */
> > +static bool
> > +expand_vec_perm_movs (struct expand_vec_perm_d *d)
> > +{
> > +  machine_mode vmode = d->vmode;
> > +  unsigned i, nelt = d->nelt;
> > +  rtx x;
> > +
> > +  if (d->one_operand_p)
> > +return false;
> > +
> > +  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
> > +;
> > +  else
> > +return false;
> > +
> > +  /* Only the first element is changed. */
> 
> Two spaces after .
> 
> > +  if (d->perm[0] != nelt && d->perm[0] != 0)
> > +return false;
> > +  for (i = 1; i < nelt; ++i) {
> > +{
> > +  if (d->perm[i] != i + nelt - d->perm[0])
> > +return false;
> > +}
> > +  }
> 
> Extraneous {}s (both pairs, the outer ones even badly indented).
> 
> Otherwise LGTM.
> 
Updated:

Note as an infrequent contributor don't have commit access, so I need someone 
reviewing to also commit.

'Allan
>From e33241e5ddc7fa57c4ba7893669af7f7e636125e Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Sat, 11 Aug 2018 11:52:21 +0200
Subject: [PATCH] Match movss and movsd "blend" instructions

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-08-11 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h |  2 +-
 gcc/config/i386/i386.c  | 41 +
 gcc/config/i386/xmmintrin.h |  5 -
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7554fd1f659..15a3caa94c3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46145,6 +46145,43 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+;
+  else
+return false;
+
+  /* Only the first element is changed.  */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+return false;
+  for (i = 1; i < nelt; ++i)
+if (d->perm[i] != i + nelt - d->perm[0])
+  return false;
+
+  if (d->testing_p)
+return true;
+
+  if (d->perm[0] == nelt)
+x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46887,6 +46924,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..f770570295c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,10 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return (__m128) __builtin_shuffle ((__v4sf)__A, (__v4sf)__B,
+ __extension__
+ (__attribute__((__vector_size__ (16))) int)
+ {4,1,2,3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */
-- 
2.17.1

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-11 Thread Allan Sandfeld Jensen

Updated:

Match movss and movsd "blend" instructions

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-08-11 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7554fd1f659..485850096e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46145,6 +46145,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+;
+  else
+return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+return false;
+  for (i = 1; i < nelt; ++i) {
+{
+  if (d->perm[i] != i + nelt - d->perm[0])
+return false;
+}
+  }
+
+  if (d->testing_p)
+return true;
+
+  if (d->perm[0] == nelt)
+x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46887,6 +46927,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..f770570295c 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,10 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return (__m128) __builtin_shuffle ((__v4sf)__A, (__v4sf)__B,
+ __extension__
+ (__attribute__((__vector_size__ (16))) int)
+ {4,1,2,3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-11 Thread Allan Sandfeld Jensen

On Freitag, 3. August 2018 13:56:12 CEST Allan Sandfeld Jensen wrote:
> On Mittwoch, 1. August 2018 18:32:30 CEST Joseph Myers wrote:
> > On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > > gcc/
> > > 
> > > * gcc.c: Correct default specs for -r
> > 
> > I don't follow why your changes (which would need describing for each
> > individual spec changed) are corrections.
> > 
> > >  /* config.h can define LIB_SPEC to override the default libraries.  */
> > >  #ifndef LIB_SPEC
> > > 
> > > -#define LIB_SPEC "%{!shared:%{g*:-lg}
> > > %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}" +#define LIB_SPEC
> > > "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}">
> > > 
> > >  #endif
> > 
> > '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> > following specs are used unless both -shared and -r are specified, which
> > seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> > expand to nothing if either -shared or -r is passed and to what follows if
> > neither is passed.
> > 
> > And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> > as it's generally overridden by targets - and normally for targets using
> > ELF shared libraries, for example, -lc *does* have to be used when linking
> > with -shared.
> > 
> > I think you're changing the wrong place for this.  If you want -r to be
> > usable with GCC without using -nostdlib (which is an interesting
> > question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> > overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
> Okay, so like this?


Any further comments or corrections to updated patch in the parent message?

'Allan

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-03 Thread Allan Sandfeld Jensen

On Mittwoch, 1. August 2018 18:32:30 CEST Joseph Myers wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > gcc/
> > 
> > * gcc.c: Correct default specs for -r
> 
> I don't follow why your changes (which would need describing for each
> individual spec changed) are corrections.
> 
> >  /* config.h can define LIB_SPEC to override the default libraries.  */
> >  #ifndef LIB_SPEC
> > 
> > -#define LIB_SPEC "%{!shared:%{g*:-lg}
> > %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}" +#define LIB_SPEC
> > "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"> 
> >  #endif
> 
> '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> following specs are used unless both -shared and -r are specified, which
> seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> expand to nothing if either -shared or -r is passed and to what follows if
> neither is passed.
> 
> And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> as it's generally overridden by targets - and normally for targets using
> ELF shared libraries, for example, -lc *does* have to be used when linking
> with -shared.
> 
> I think you're changing the wrong place for this.  If you want -r to be
> usable with GCC without using -nostdlib (which is an interesting
> question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
Okay, so like this?
>From 9de68f2ef0b77a0c0bcf0d83232e7fc34b006406 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 1 Aug 2018 18:07:05 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-07-29 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c (LINK_COMMAND_SPEC): Handle -r like -nostdlib
* config/darwin.h (LINK_COMMAND_SPEC): Handle -r like -nostdlib
---
 gcc/config/darwin.h | 8 
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 3 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index 980ad9b4057..6ad657a4958 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -178,20 +178,20 @@ extern GTY(()) int darwin_ms_struct;
"%X %{s} %{t} %{Z} %{u*} \
 %{e*} %{r} \
 %{o*}%{!o:-o a.out} \
-%{!nostdlib:%{!nostartfiles:%S}} \
+%{!nostdlib:%{!r:%{!nostartfiles:%S}}} \
 %{L*} %(link_libgcc) %o %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} \
 %{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1): \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libgomp.a%s; : -lgomp } } \
 %{fgnu-tm: \
   %{static|static-libgcc|static-libstdc++|static-libgfortran: libitm.a%s; : -litm } } \
-%{!nostdlib:%{!nodefaultlibs:\
+%{!nostdlib:%{!r:%{!nodefaultlibs:\
   %{%:sanitize(address): -lasan } \
   %{%:sanitize(undefined): -lubsan } \
   %(link_ssp) \
   " DARWIN_EXPORT_DYNAMIC " %

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen

On Donnerstag, 2. August 2018 23:15:28 CEST Marc Glisse wrote:
> On Thu, 2 Aug 2018, Allan Sandfeld Jensen wrote:
> > I forgot. One of the things that makes using __builtin_shuffle ugly is
> > that
> > __v4si  as the suffle argument needs to be in _mm_move_ss, is declared
> > in emmintrin.h, but _mm_move_ss is in xmmintrin.h.
> 
> __v4si is some internal detail, I don't see much issue with moving it to
> xmmintrin.h if you want to use it there.
> 
> > In general the gcc __builtin_shuffle syntax with the argument being a
> > vector is kind of ackward. At least for the declaring intrinsics, the
> > clang still where the permutator is extra argument is easier to deal
> > with:
> > __builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
> > vs
> > __builtin_shuffle(a, b, 4, 0, 1, 2)
> 
> __builtin_shufflevector IIRC
> 
> >> The question is what users expect and get when they use -O0 with
> >> intrinsics?> 
> > Here is the version with __builtin_shuffle. It might be more expectable
> > -O0, but it is also uglier.
> 
> I am not convinced -O0 is very important.
> 
Me neither, and in any case I would argue the logic that recognizes the vector 
constructions patterns are not optimizations but instruction matching.

> If you start extending your approach to _mm_add_sd and others, while one
> instruction is easy enough to recognize, if we put several in a row, they
> will be partially simplified and may become harder to recognize.
> { x*(y+v[0]-z), v[1] } requires that you notice that the upper part of
> this vector is v[1], i.e. the upper part of a vector whose lower part
> appears somewhere in the arbitrarily complex expression for the lower
> part of the result. And you then have to propagate the fact that you are
> doing vector operations all the way back to v[0].
> 
> I don't have a strong opinion on what the best approach is.

Yes, I am not sure all of those could be done exhaustively with the existing 
logic, and it might also be of dubious value as in almost all cases the ps 
instructions have the same latency and bandwidth as the ss instructions, so 
developers should probably use _ps versions as they are scheduled better by 
the compiler (or at least better by gcc).
It was just an idea, and I haven't tried it at this point.

'Allan

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen

On Donnerstag, 2. August 2018 23:46:37 CEST Jakub Jelinek wrote:
> On Thu, Aug 02, 2018 at 10:50:58PM +0200, Allan Sandfeld Jensen wrote:
> > Here is the version with __builtin_shuffle. It might be more expectable
> > -O0, but it is also uglier.
> 
> I don't find anything ugly on it, except the formatting glitches (missing
> space before (, overlong line, and useless __extension__.
> Improving code generated for __builtin_shuffle is desirable too.
> 

__extension__ is needed when using the the {...} initialization otherwise -
std=C89 will produce warnings about standards.  The line is a bit long, but I 
thought it looked better like this rather than adding any emergency line 
breaks. Is there a hard limit?

> > --- a/gcc/config/i386/xmmintrin.h
> > +++ b/gcc/config/i386/xmmintrin.h
> > @@ -1011,7 +1011,8 @@ _mm_storer_ps (float *__P, __m128 __A)
> > 
> >  extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__,
> >  __artificial__)) _mm_move_ss (__m128 __A, __m128 __B)
> >  {
> > 
> > -  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
> > +  return __extension__ (__m128) __builtin_shuffle((__v4sf)__A,
> > (__v4sf)__B, + 
> > (__attribute__((__vector_size__ (16))) int){4, 1, 2, 3});
> And obviously use __v4si here instead of __attribute__((__vector_size__
> (16))) int.
> 
__v4si is declared in emmintrin.h, so I couldn't use it here unless I moved 
the definition. I tried changing as little as possible to not trigger bike 
shedding.

'Allan

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen

On Donnerstag, 2. August 2018 11:18:41 CEST Richard Biener wrote:
> On Thu, Aug 2, 2018 at 11:12 AM Allan Sandfeld Jensen
> 
>  wrote:
> > On Mittwoch, 1. August 2018 18:51:41 CEST Marc Glisse wrote:
> > > On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > > >  extern __inline __m128d __attribute__((__gnu_inline__,
> > > >  __always_inline__,
> > > > 
> > > > __artificial__))
> > > > 
> > > >  _mm_move_sd (__m128d __A, __m128d __B)
> > > >  {
> > > > 
> > > > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > > > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > > > 
> > > >  }
> > > 
> > > If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> > > wonder if we should be explicit and use __builtin_shuffle instead of
> > > relying on some forwprop pass to transform it. Maybe not, just asking.
> > > And
> > > the answer need not even be the same for _mm_move_sd and _mm_move_ss.
> > 
> > I wrote it this way because this pattern could later also be used for the
> > other _ss intrinsics, such as _mm_add_ss, where a _builtin_shuffle could
> > not. To match the other intrinsics the logic that tries to match vector
> > construction just needs to be extended to try merge patterns even if one
> > of the subexpressions is not simple.
> 
> The question is what users expect and get when they use -O0 with intrinsics?
> 
> Richard.
> 
Here is the version with __builtin_shuffle. It might be more expectable -O0, 
but it is also uglier.

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..6501638f619 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d) __builtin_shuffle((__v2df)__A, (__v2df)__B, (__v2di){2, 1});
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..2337ef5ea08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46143,6 +46143,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+;
+  else
+return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+return false;
+  for (i = 1; i < nelt; ++i) {
+{
+  if (d->perm[i] != i + nelt - d->perm[0])
+return false;
+}
+  }
+
+  if (d->testing_p)
+return true;
+
+  if (d->perm[0] == nelt)
+x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46885,6 +46925,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..45b99ff87d5 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,8 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return __extension__ (__m128) __builtin_shuffle((__v4sf)__A, (__v4sf)__B,
+  (__attribute__((__vector_size__ (16))) int){4, 1, 2, 3});
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen

On Mittwoch, 1. August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> >  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
> > 
> > __artificial__))
> > 
> >  _mm_move_sd (__m128d __A, __m128d __B)
> >  {
> > 
> > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > 
> >  }
> 
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.

I forgot. One of the things that makes using __builtin_shuffle ugly is that 
__v4si  as the suffle argument needs to be in _mm_move_ss, is declared
in emmintrin.h, but _mm_move_ss is in xmmintrin.h.

In general the gcc __builtin_shuffle syntax with the argument being a vector 
is kind of ackward. At least for the declaring intrinsics, the clang still 
where the permutator is extra argument is easier to deal with:
__builtin_shuffle(a, b, (__v4si){4, 0, 1, 2})
 vs
 __builtin_shuffle(a, b, 4, 0, 1, 2)

Re: [PATCH][x86] Match movss and movsd "blend" instructions

2018-08-02 Thread Allan Sandfeld Jensen

On Mittwoch, 1. August 2018 18:51:41 CEST Marc Glisse wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> >  extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__,
> > 
> > __artificial__))
> > 
> >  _mm_move_sd (__m128d __A, __m128d __B)
> >  {
> > 
> > -  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
> > +  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
> > 
> >  }
> 
> If the goal is to have it represented as a VEC_PERM_EXPR internally, I
> wonder if we should be explicit and use __builtin_shuffle instead of
> relying on some forwprop pass to transform it. Maybe not, just asking. And
> the answer need not even be the same for _mm_move_sd and _mm_move_ss.

I wrote it this way because this pattern could later also be used for the 
other _ss intrinsics, such as _mm_add_ss, where a _builtin_shuffle could not. 
To match the other intrinsics the logic that tries to match vector 
construction just needs to be extended to try merge patterns even if one of 
the subexpressions is not simple.

'Allan

Re: [Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen

On Mittwoch, 1. August 2018 18:32:30 CEST Joseph Myers wrote:
> On Wed, 1 Aug 2018, Allan Sandfeld Jensen wrote:
> > gcc/
> > 
> > * gcc.c: Correct default specs for -r
> 
> I don't follow why your changes (which would need describing for each
> individual spec changed) are corrections.
> 
> >  /* config.h can define LIB_SPEC to override the default libraries.  */
> >  #ifndef LIB_SPEC
> > 
> > -#define LIB_SPEC "%{!shared:%{g*:-lg}
> > %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}" +#define LIB_SPEC
> > "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"> 
> >  #endif
> 
> '!' binds more closely than '|' in specs.  That is, !shared|!r means the
> following specs are used unless both -shared and -r are specified, which
> seems nonsensical to me.  I'd expect something more like "shared|r:;" to
> expand to nothing if either -shared or -r is passed and to what follows if
> neither is passed.
> 
> And that ignores that this LIB_SPEC value in gcc.c is largely irrelevant,
> as it's generally overridden by targets - and normally for targets using
> ELF shared libraries, for example, -lc *does* have to be used when linking
> with -shared.
> 
> I think you're changing the wrong place for this.  If you want -r to be
> usable with GCC without using -nostdlib (which is an interesting
> question), you actually need to change LINK_COMMAND_SPEC (also sometimes
> overridden for targets) to handle -r more like -nostdlib -nostartfiles.
> 
Ok, thanks for the information, I will investigate that. 

> > -#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
> > +#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC
> > "} "
> What's this "-ar" option you're handling here?

Dead code from a previous more ambitious version of the patch. I will remove.

`Allan

[Patch][GCC] Document and fix -r (partial linking)

2018-08-01 Thread Allan Sandfeld Jensen

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-08-01 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)>From 638966e6c7e072ca46c6af0664fbd57bedbfff80 Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 1 Aug 2018 18:07:05 +0200
Subject: [PATCH] Fix and document -r option

The option has existed and been working for years,
make sure it implies the right extra options, and list
it in the documentation.

2018-07-29 Allan Sandfeld Jensen 

gcc/doc

* invoke.texi: Document -r

gcc/
* gcc.c: Correct default specs for -r
---
 gcc/doc/invoke.texi | 7 ++-
 gcc/gcc.c   | 6 +++---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 6047d82065a..7da30bd9d99 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -518,7 +518,7 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Link Options,,Options for Linking}.
 @gccoptlist{@var{object-file-name}  -fuse-ld=@var{linker}  -l@var{library} @gol
 -nostartfiles  -nodefaultlibs  -nolibc  -nostdlib @gol
--pie  -pthread  -rdynamic @gol
+-pie  -pthread  -r  -rdynamic @gol
 -s  -static -static-pie -static-libgcc  -static-libstdc++ @gol
 -static-libasan  -static-libtsan  -static-liblsan  -static-libubsan @gol
 -shared  -shared-libgcc  -symbolic @gol
@@ -12444,6 +12444,11 @@ x86 Cygwin and MinGW targets.  On some targets this option also sets
 flags for the preprocessor, so it should be used consistently for both
 compilation and linking.
 
+@item -r
+@opindex r
+Produce a relocatable object as output. This is also known as partial
+linking.
+
 @item -rdynamic
 @opindex rdynamic
 Pass the flag @option{-export-dynamic} to the ELF linker, on targets
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 780d4859ef3..858a5600c14 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -675,7 +675,7 @@ proper position among the other output files.  */
 
 /* config.h can define LIB_SPEC to override the default libraries.  */
 #ifndef LIB_SPEC
-#define LIB_SPEC "%{!shared:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
+#define LIB_SPEC "%{!shared|!r:%{g*:-lg} %{!p:%{!pg:-lc}}%{p:-lc_p}%{pg:-lc_p}}"
 #endif
 
 /* When using -fsplit-stack we need to wrap pthread_create, in order
@@ -797,7 +797,7 @@ proper position among the other output files.  */
 /* config.h can define STARTFILE_SPEC to override the default crt0 files.  */
 #ifndef STARTFILE_SPEC
 #define STARTFILE_SPEC  \
-  "%{!shared:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
+  "%{!shared|!r:%{pg:gcrt0%O%s}%{!pg:%{p:mcrt0%O%s}%{!p:crt0%O%s}}}"
 #endif
 
 /* config.h can define ENDFILE_SPEC to override the default crtn files.  */
@@ -936,7 +936,7 @@ proper position among the other output files.  */
 #else
 #define LD_PIE_SPEC ""
 #endif
-#define LINK_PIE_SPEC "%{static|shared|r:;" PIE_SPEC ":" LD_PIE_SPEC "} "
+#define LINK_PIE_SPEC "%{static|shared|r|ar:;" PIE_SPEC ":" LD_PIE_SPEC "} "
 #endif
 
 #ifndef LINK_BUILDID_SPEC
-- 
2.17.0

[PATCH][x86] Match movss and movsd "blend" instructions

2018-08-01 Thread Allan Sandfeld Jensen

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c
>From e96b3aa9017ad0d19238c923146196405cc4e5af Mon Sep 17 00:00:00 2001
From: Allan Sandfeld Jensen 
Date: Wed, 9 May 2018 12:35:14 +0200
Subject: [PATCH] Match movss and movsd blends

Adds the ability to match movss and movsd as blend patterns,
implemented in a new method to be able to match these before shuffles,
while keeping other blends after.

2018-07-29 Allan Sandfeld Jensen 

gcc/config/i386

* i386.cc (expand_vec_perm_movs): New method matching movs
patterns.
* i386.cc (expand_vec_perm_1): Try the new method.

gcc/testsuite

* gcc.target/i386/sse2-movs.c: New test.
---
 gcc/config/i386/emmintrin.h   |  2 +-
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/xmmintrin.h   |  2 +-
 gcc/testsuite/gcc.target/i386/sse2-movs.c | 21 +++
 4 files changed, 67 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/sse2-movs.c

diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index b940a39d27b..1efd943bac4 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -113,7 +113,7 @@ _mm_setzero_pd (void)
 extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_sd (__m128d __A, __m128d __B)
 {
-  return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B);
+  return __extension__ (__m128d)(__v2df){__B[0],__A[1]};
 }
 
 /* Load two DPFP values from P.  The address must be 16-byte aligned.  */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ee409cfe7e4..2337ef5ea08 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -46143,6 +46143,46 @@ expand_vselect_vconcat (rtx target, rtx op0, rtx op1,
   return ok;
 }
 
+/* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
+   using movss or movsd.  */
+static bool
+expand_vec_perm_movs (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  unsigned i, nelt = d->nelt;
+  rtx x;
+
+  if (d->one_operand_p)
+return false;
+
+  if (TARGET_SSE2 && (vmode == V2DFmode || vmode == V4SFmode))
+;
+  else
+return false;
+
+  /* Only the first element is changed. */
+  if (d->perm[0] != nelt && d->perm[0] != 0)
+return false;
+  for (i = 1; i < nelt; ++i) {
+{
+  if (d->perm[i] != i + nelt - d->perm[0])
+return false;
+}
+  }
+
+  if (d->testing_p)
+return true;
+
+  if (d->perm[0] == nelt)
+x = gen_rtx_VEC_MERGE (vmode, d->op1, d->op0, GEN_INT (1));
+  else
+x = gen_rtx_VEC_MERGE (vmode, d->op0, d->op1, GEN_INT (1));
+
+  emit_insn (gen_rtx_SET (d->target, x));
+
+  return true;
+}
+
 /* A subroutine of ix86_expand_vec_perm_builtin_1.  Try to implement D
in terms of blendp[sd] / pblendw / pblendvb / vpblendd.  */
 
@@ -46885,6 +46925,10 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
 	}
 }
 
+  /* Try movss/movsd instructions.  */
+  if (expand_vec_perm_movs (d))
+return true;
+
   /* Finally, try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
 			  d->testing_p))
diff --git a/gcc/config/i386/xmmintrin.h b/gcc/config/i386/xmmintrin.h
index f64f3f74a0b..699f681e054 100644
--- a/gcc/config/i386/xmmintrin.h
+++ b/gcc/config/i386/xmmintrin.h
@@ -1011,7 +1011,7 @@ _mm_storer_ps (float *__P, __m128 __A)
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_move_ss (__m128 __A, __m128 __B)
 {
-  return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B);
+  return __extension__ (__m128)(__v4sf){__B[0],__A[1],__A[2],__A[3]};
 }
 
 /* Extracts one of the four words of A.  The selector N must be immediate.  */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-movs.c b/gcc/testsuite/gcc.target/i386/sse2-movs.c
new file mode 100644
index 000..79f486cfa82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-movs.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-final { scan-assembler "movss" } } */
+/* { dg-final {

Re: [Patch] Request for comments: short int builtins

2018-05-20 Thread Allan Sandfeld Jensen

On Sonntag, 20. Mai 2018 15:07:59 CEST Richard Biener wrote:
> On May 20, 2018 11:01:54 AM GMT+02:00, Allan Sandfeld Jensen 
<li...@carewolf.com> wrote:
> >A little over a year back we had a regression in a point release of gcc
> >
> >because the builtin __builtin_clzs got removed from i386, in part
> >because it
> >is was wrongly named for a target specific builtin, but we were using
> >it in Qt
> >since it existed in multiple compilers. I got the patch removing it
> >partially
> >reverted and the problem solved, but in the meantime I had worked on a
> >patch
> >to make it a generic builtin instead. I have rebased it and added it
> >below,
> >should I clean it up futher, finish the other builtins add tests and
> >propose
> >it, or is this not something we want?
> 
> Can't users simply do clz((unsigned short) s) - 16? GCC should be able to
> handle this for instruction selection With The addition of some folding
> patterns using the corresponding internal function.
> 
Of course, but we already have the builtin for i386, and a version of the 
builtin for all integer types except short for all platforms. Note the patch 
also generally adds short versions for all the general integer builtins, not 
just clzs and they are not all that trivial to synthesize (without knowing the 
trick, which gcc does).


'Allan

[Patch] Request for comments: short int builtins

2018-05-20 Thread Allan Sandfeld Jensen

A little over a year back we had a regression in a point release of gcc 
because the builtin __builtin_clzs got removed from i386, in part because it 
is was wrongly named for a target specific builtin, but we were using it in Qt 
since it existed in multiple compilers. I got the patch removing it partially 
reverted and the problem solved, but in the meantime I had worked on a patch 
to make it a generic builtin instead. I have rebased it and added it below, 
should I clean it up futher, finish the other builtins add tests and propose 
it, or is this not something we want?

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index 5365befd351..74a84e653e4 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -232,6 +232,8 @@ DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONG, BT_INT, BT_LONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_ULONG, BT_INT, BT_ULONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_LONGLONG, BT_INT, BT_LONGLONG)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_ULONGLONG, BT_INT, BT_ULONGLONG)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_INT16, BT_INT, BT_INT16)
+DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINT16, BT_INT, BT_UINT16)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_INTMAX, BT_INT, BT_INTMAX)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_UINTMAX, BT_INT, BT_UINTMAX)
 DEF_FUNCTION_TYPE_1 (BT_FN_INT_PTR, BT_INT, BT_PTR)
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 9a2bf8c7d38..a8ad00e8016 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -10689,14 +10689,17 @@ is_inexpensive_builtin (tree decl)
   case BUILT_IN_CLZIMAX:
   case BUILT_IN_CLZL:
   case BUILT_IN_CLZLL:
+  case BUILT_IN_CLZS:
   case BUILT_IN_CTZ:
   case BUILT_IN_CTZIMAX:
   case BUILT_IN_CTZL:
   case BUILT_IN_CTZLL:
+  case BUILT_IN_CTZS:
   case BUILT_IN_FFS:
   case BUILT_IN_FFSIMAX:
   case BUILT_IN_FFSL:
   case BUILT_IN_FFSLL:
+  case BUILT_IN_FFSS:
   case BUILT_IN_IMAXABS:
   case BUILT_IN_FINITE:
   case BUILT_IN_FINITEF:
@@ -10734,10 +10737,12 @@ is_inexpensive_builtin (tree decl)
   case BUILT_IN_POPCOUNTL:
   case BUILT_IN_POPCOUNTLL:
   case BUILT_IN_POPCOUNTIMAX:
+  case BUILT_IN_POPCOUNTS:
   case BUILT_IN_POPCOUNT:
   case BUILT_IN_PARITYL:
   case BUILT_IN_PARITYLL:
   case BUILT_IN_PARITYIMAX:
+  case BUILT_IN_PARITYS:
   case BUILT_IN_PARITY:
   case BUILT_IN_LABS:
   case BUILT_IN_LLABS:
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 449d08d682f..618ee798767 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -848,15 +848,18 @@ DEF_GCC_BUILTIN(BUILT_IN_CLZ, "clz", 
BT_FN_INT_UINT, ATTR_CONST_NOTHROW_
 DEF_GCC_BUILTIN(BUILT_IN_CLZIMAX, "clzimax", BT_FN_INT_UINTMAX, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLZL, "clzl", BT_FN_INT_ULONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLZLL, "clzll", BT_FN_INT_ULONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CLZS, "clzs", BT_FN_INT_UINT16, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CONSTANT_P, "constant_p", BT_FN_INT_VAR, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZ, "ctz", BT_FN_INT_UINT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZIMAX, "ctzimax", BT_FN_INT_UINTMAX, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZL, "ctzl", BT_FN_INT_ULONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CTZLL, "ctzll", BT_FN_INT_ULONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CTZS, "ctzs", BT_FN_INT_UINT16, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSB, "clrsb", BT_FN_INT_INT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBIMAX, "clrsbimax", BT_FN_INT_INTMAX, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBL, "clrsbl", BT_FN_INT_LONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_CLRSBLL, "clrsbll", BT_FN_INT_LONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_CLRSBS, "clrsbs", BT_FN_INT_INT16, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_DCGETTEXT, "dcgettext", 
BT_FN_STRING_CONST_STRING_CONST_STRING_INT, ATTR_FORMAT_ARG_2)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_DGETTEXT, "dgettext", 
BT_FN_STRING_CONST_STRING_CONST_STRING, ATTR_FORMAT_ARG_2)
 DEF_GCC_BUILTIN(BUILT_IN_DWARF_CFA, "dwarf_cfa", BT_FN_PTR, 
ATTR_NULL)
@@ -878,6 +881,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFS, "ffs", 
BT_FN_INT_INT, ATTR_CONST_NOTHROW_L
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSIMAX, "ffsimax", BT_FN_INT_INTMAX, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSL, "ffsl", BT_FN_INT_LONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL, "ffsll", BT_FN_INT_LONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSS, "ffss", BT_FN_INT_INT16, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID,

Re: [PATCH] Use two source permute for vector initialization (PR 85692, take 2)

2018-05-10 Thread Allan Sandfeld Jensen

On Donnerstag, 10. Mai 2018 09:57:22 CEST Jakub Jelinek wrote:
> On Wed, May 09, 2018 at 04:53:19PM +0200, Allan Sandfeld Jensen wrote:
> > > > @@ -2022,8 +2022,9 @@ simplify_vector_constructor
> > > > (gimple_stmt_iterator
> > > > *gsi)>
> > > > 
> > > >elem_type = TREE_TYPE (type);
> > > >elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> > > > 
> > > > -  vec_perm_builder sel (nelts, nelts, 1);
> > > > -  orig = NULL;
> > > > +  vec_perm_builder sel (nelts, 2, nelts);
> > > 
> > > Why this change?  I admit the vec_parm_builder arguments are confusing,
> > > but
> > > I think the second times third is the number of how many indices are
> > > being
> > > pushed into the vector, so I think (nelts, nelts, 1) is right.
> > 
> > I had the impression it was what was selected from. In any case, I changed
> > it because without I get crash when vec_perm_indices is created later
> > with a possible nparms of 2.
> 
> The documentation is apparently in vector-builder.h:
>This class is a wrapper around auto_vec for building vectors of T.
>It aims to encode each vector as npatterns interleaved patterns,
>where each pattern represents a sequence:
> 
>  { BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, BASE1 + STEP*3, ... }
> 
>The first three elements in each pattern provide enough information
>to derive the other elements.  If all patterns have a STEP of zero,
>we only need to encode the first two elements in each pattern.
>If BASE1 is also equal to BASE0 for all patterns, we only need to
>encode the first element in each pattern.  The number of encoded
>elements per pattern is given by nelts_per_pattern.
> 
>The class can be used in two ways:
> 
>1. It can be used to build a full image of the vector, which is then
>   canonicalized by finalize ().  In this case npatterns is initially
>   the number of elements in the vector and nelts_per_pattern is
>   initially 1.
> 
>2. It can be used to build a vector that already has a known encoding.
>   This is preferred since it is more efficient and copes with
>   variable-length vectors.  finalize () then canonicalizes the encoding
>   to a simpler form if possible.
> 
> As the vector is constant width and we are building the full image of the
> vector, the right arguments are (nelts, nelts, 1) as per 1. above, and the
> finalization can perhaps change it to something more compact.
> 
> > > (and sorry for missing your patch first, the PR wasn't ASSIGNED and
> > > there
> > > was no link to gcc-patches for it).
> > 
> > It is okay. You are welcome to take it over. I am not a regular gcc
> > contributor and thus not well-versed in the details, only the basic logic
> > of how things work.
> 
> Ok, here is my version of the patch.  Bootstrapped/regtested on x86_64-linux
> and i686-linux, ok for trunk?
> 
Looks good to me if that counts for anything.

'Allan

Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-09 Thread Allan Sandfeld Jensen

On Mittwoch, 9. Mai 2018 11:08:02 CEST Jakub Jelinek wrote:
> On Tue, May 08, 2018 at 01:25:35PM +0200, Allan Sandfeld Jensen wrote:
> > 2018-05-08 Allan Sandfeld Jensen <allan.jen...@qt.io>
> 
> 2 spaces between date and name and two spaces between name and email
> address.
> 
> > gcc/
> > 
> > PR tree-optimization/85692
> > * tree-ssa-forwprop.c (simplify_vector_constructor): Try two
> > source permute as well.
> > 
> > gcc/testsuite
> > 
> > * gcc.target/i386/pr85692.c: Test two simply constructions are
> > detected as permute instructions.
> 
> Just
>   * gcc.target/i386/pr85692.c: New test.
> 
> > diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c
> > b/gcc/testsuite/gcc.target/i386/pr85692.c new file mode 100644
> > index 000..322c1050161
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr85692.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -msse4.1" } */
> > +/* { dg-final { scan-assembler "unpcklps" } } */
> > +/* { dg-final { scan-assembler "blendps" } } */
> > +/* { dg-final { scan-assembler-not "shufps" } } */
> > +/* { dg-final { scan-assembler-not "unpckhps" } } */
> > +
> > +typedef float v4sf __attribute__ ((vector_size (16)));
> > +
> > +v4sf unpcklps(v4sf a, v4sf b)
> > +{
> > +return v4sf{a[0],b[0],a[1],b[1]};
> 
> Though, not really sure if this has been tested at all.
> The above is valid only in C++ (and only C++11 and above), while the
> test is compiled as C and thus has to fail.
>
Yes, I thought it had been tested, but it wasn't. It also needs to change the 
first line to be a compile and not run test.

> > @@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator
> > *gsi)> 
> >elem_type = TREE_TYPE (type);
> >elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> > 
> > -  vec_perm_builder sel (nelts, nelts, 1);
> > -  orig = NULL;
> > +  vec_perm_builder sel (nelts, 2, nelts);
> 
> Why this change?  I admit the vec_parm_builder arguments are confusing, but
> I think the second times third is the number of how many indices are being
> pushed into the vector, so I think (nelts, nelts, 1) is right.
> 
I had the impression it was what was selected from. In any case, I changed it 
because without I get crash when vec_perm_indices is created later with a 
possible nparms of 2.

> > @@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator
> > *gsi)> 
> > return false;
> > 
> >op1 = gimple_assign_rhs1 (def_stmt);
> >ref = TREE_OPERAND (op1, 0);
> > 
> > -  if (orig)
> > +  if (orig1)
> > 
> > {
> > 
> > - if (ref != orig)
> > -   return false;
> > + if (ref == orig1 || orig2)
> > +   {
> > + if (ref != orig1 && ref != orig2)
> > +   return false;
> > +   }
> > + else
> > +   {
> > + if (TREE_CODE (ref) != SSA_NAME)
> > +   return false;
> > + if (! VECTOR_TYPE_P (TREE_TYPE (ref))
> > + || ! useless_type_conversion_p (TREE_TYPE (op1),
> > + TREE_TYPE (TREE_TYPE (ref
> > +   return false;
> > + if (TREE_TYPE (orig1) != TREE_TYPE (ref))
> > +   return false;
> 
> I think even different type is acceptable here, as long as its conversion to
> orig1's type is useless.
> 
> Furthermore, I think the way you wrote the patch with 2 variables rather
> than an array of 2 elements means too much duplication, this else block
> is a duplication of the else block below.  See the patch I've added to the
> PR
It seemed to me it was clearer like this, but I can see your point.

> (and sorry for missing your patch first, the PR wasn't ASSIGNED and there
> was no link to gcc-patches for it).
> 
It is okay. You are welcome to take it over. I am not a regular gcc 
contributor and thus not well-versed in the details, only the basic logic of 
how things work.

'Allan

Re: [Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen

On Dienstag, 8. Mai 2018 12:42:33 CEST Richard Biener wrote:
> On Tue, May 8, 2018 at 12:37 PM, Allan Sandfeld Jensen
> 
> <li...@carewolf.com> wrote:
> > I have tried to fix PR85692 that I opened.
> 
> Please add a testcase as well.  It also helps if you shortly tell what
> the patch does
> in your mail.
> 
Okay. I have updated the patch with a test-case based on my motivating 
examples. The patch just extends patching a vector construction to not just a 
single source permute instruction, but also a two source permute instruction.
commit 15c0f6a933d60b085416a59221851b604b955958
Author: Allan Sandfeld Jensen <allan.jen...@qt.io>
Date:   Tue May 8 13:16:18 2018 +0200

Try two source permute for vector construction

simplify_vector_constructor() was detecting when vector construction could
be implemented as a single source permute, but was not detecting when
it could be implemented as a double source permute. This patch adds the
    second case.

2018-05-08 Allan Sandfeld Jensen <allan.jen...@qt.io>

gcc/

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Try two
source permute as well.

gcc/testsuite

* gcc.target/i386/pr85692.c: Test two simply constructions are
detected as permute instructions.

diff --git a/gcc/testsuite/gcc.target/i386/pr85692.c b/gcc/testsuite/gcc.target/i386/pr85692.c
new file mode 100644
index 000..322c1050161
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr85692.c
@@ -0,0 +1,18 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -msse4.1" } */
+/* { dg-final { scan-assembler "unpcklps" } } */
+/* { dg-final { scan-assembler "blendps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+/* { dg-final { scan-assembler-not "unpckhps" } } */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+
+v4sf unpcklps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[0],a[1],b[1]};
+}
+
+v4sf blendps(v4sf a, v4sf b)
+{
+return v4sf{a[0],b[1],a[2],b[3]};
+}
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, ))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   , _code)
 	  || conv_code

[Patch] Use two source permute for vector initialization (PR 85692)

2018-05-08 Thread Allan Sandfeld Jensen

I have tried to fix PR85692 that I opened.

2018-05-08  Allan Sandfeld Jense 

PR tree-optimization/85692
* tree-ssa-forwprop.c (simplify_vector_constructor): Detect
  two source permute operations as well.
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 58ec6b47a5b..fbee8064160 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -2004,7 +2004,7 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
-  tree op, op2, orig, type, elem_type;
+  tree op, op2, orig1, orig2, type, elem_type;
   unsigned elem_size, i;
   unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
@@ -2022,8 +2022,9 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  vec_perm_builder sel (nelts, nelts, 1);
-  orig = NULL;
+  vec_perm_builder sel (nelts, 2, nelts);
+  orig1 = NULL;
+  orig2 = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
   FOR_EACH_VEC_SAFE_ELT (CONSTRUCTOR_ELTS (op), i, elt)
@@ -2063,10 +2064,26 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op1 = gimple_assign_rhs1 (def_stmt);
   ref = TREE_OPERAND (op1, 0);
-  if (orig)
+  if (orig1)
 	{
-	  if (ref != orig)
-	return false;
+	  if (ref == orig1 || orig2)
+	{
+	  if (ref != orig1 && ref != orig2)
+	return false;
+	}
+	  else
+	{
+	  if (TREE_CODE (ref) != SSA_NAME)
+		return false;
+	  if (! VECTOR_TYPE_P (TREE_TYPE (ref))
+		  || ! useless_type_conversion_p (TREE_TYPE (op1),
+		  TREE_TYPE (TREE_TYPE (ref
+		return false;
+	  if (TREE_TYPE (orig1) != TREE_TYPE (ref))
+		return false;
+	  orig2 = ref;
+	  maybe_ident = false;
+	  }
 	}
   else
 	{
@@ -2076,12 +2093,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	  || ! useless_type_conversion_p (TREE_TYPE (op1),
 	  TREE_TYPE (TREE_TYPE (ref
 	return false;
-	  orig = ref;
+	  orig1 = ref;
 	}
   unsigned int elt;
   if (maybe_ne (bit_field_size (op1), elem_size)
 	  || !constant_multiple_p (bit_field_offset (op1), elem_size, ))
 	return false;
+  if (orig2 && ref == orig2)
+	elt += nelts;
   if (elt != i)
 	maybe_ident = false;
   sel.quick_push (elt);
@@ -2089,14 +2108,17 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (i < nelts)
 return false;
 
-  if (! VECTOR_TYPE_P (TREE_TYPE (orig))
+  if (! VECTOR_TYPE_P (TREE_TYPE (orig1))
   || maybe_ne (TYPE_VECTOR_SUBPARTS (type),
-		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig
+		   TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig1
 return false;
 
+  if (!orig2)
+orig2 = orig1;
+
   tree tem;
   if (conv_code != ERROR_MARK
-  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig),
+  && (! supportable_convert_operation (conv_code, type, TREE_TYPE (orig1),
 	   , _code)
 	  || conv_code == CALL_EXPR))
 return false;
@@ -2104,16 +2126,16 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
   if (maybe_ident)
 {
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_from_tree (gsi, orig);
+	gimple_assign_set_rhs_from_tree (gsi, orig1);
   else
-	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	gimple_assign_set_rhs_with_ops (gsi, conv_code, orig1,
 	NULL_TREE, NULL_TREE);
 }
   else
 {
   tree mask_type;
 
-  vec_perm_indices indices (sel, 1, nelts);
+  vec_perm_indices indices (sel, 2, nelts);
   if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
   mask_type
@@ -2125,15 +2147,14 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 	return false;
   op2 = vec_perm_indices_to_tree (mask_type, indices);
   if (conv_code == ERROR_MARK)
-	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig, orig, op2);
+	gimple_assign_set_rhs_with_ops (gsi, VEC_PERM_EXPR, orig1, orig2, op2);
   else
 	{
 	  gimple *perm
-	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig)),
-   VEC_PERM_EXPR, orig, orig, op2);
-	  orig = gimple_assign_lhs (perm);
+	= gimple_build_assign (make_ssa_name (TREE_TYPE (orig1)),
+   VEC_PERM_EXPR, orig1, orig2, op2);
 	  gsi_insert_before (gsi, perm, GSI_SAME_STMT);
-	  gimple_assign_set_rhs_with_ops (gsi, conv_code, orig,
+	  gimple_assign_set_rhs_with_ops (gsi, conv_code, gimple_assign_lhs (perm),
 	  NULL_TREE, NULL_TREE);
 	}
 }

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-05-02 Thread Allan Sandfeld Jensen

On Tuesday 02 May 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 03:15:11PM +0200, Allan Sandfeld Jensen wrote:
> > Okay, I have tried that, and I also made it more obvious how the
> > intrinsics can become non-immediate shift.
> > 
> > 
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> > index b58f5050db0..b9406550fc5 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,10 @@
> > +2017-04-22  Allan Sandfeld Jensen  <sandf...@kde.org>
> > +
> > +   * config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
> > +   Use vector intrinstics instead of builtins.
> > +   * config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
> > +   Use vector intrinstics instead of builtins.
> > +
> > 
> >  2017-04-21  Uros Bizjak  <ubiz...@gmail.com>
> >  
> > * config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv.
> > 
> > diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
> > index 82f170a3d61..64ba52b244e 100644
> > --- a/gcc/config/i386/avx2intrin.h
> > +++ b/gcc/config/i386/avx2intrin.h
> > @@ -665,13 +665,6 @@ _mm256_slli_si256 (__m256i __A, const int __N)
> > 
> >  extern __inline __m256i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > 
> > -_mm256_slli_epi16 (__m256i __A, int __B)
> > -{
> > -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> > -}
> > -
> > -extern __inline __m256i
> > -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > 
> >  _mm256_sll_epi16 (__m256i __A, __m128i __B)
> >  {
> >  
> >return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B);
> > 
> > @@ -679,9 +672,11 @@ _mm256_sll_epi16 (__m256i __A, __m128i __B)
> > 
> >  extern __inline __m256i
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > 
> > -_mm256_slli_epi32 (__m256i __A, int __B)
> > +_mm256_slli_epi16 (__m256i __A, int __B)
> > 
> >  {
> > 
> > -  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
> > +  if (__builtin_constant_p(__B))
> > +return ((unsigned int)__B < 16) ? (__m256i)((__v16hi)__A << __B) :
> > _mm256_setzero_si256(); +  return _mm256_sll_epi16(__A,
> > _mm_cvtsi32_si128(__B));
> > 
> >  }
> 
> The formatting is wrong, missing spaces before function names and opening
> (, too long lines.  Also, you've removed some builtin uses like
> __builtin_ia32_psllwi256 above, but haven't removed those builtins from the
> compiler (unlike the intrinsics, the builtins are not supported and can be
> removed).  But I guess the primary question is on Uros, do we
> want to handle this in the *intrin.h headers and thus increase the size
> of those (and their parsing time etc.), or do we want to handle this
> in the target folders (tree as well as gimple one), where we'd convert
> e.g. __builtin_ia32_psllwi256 to the shift if the shift count is constant.
> 
Ok. I will await what you decide.

Btw. I thought of an alternative idea: Make a new set of built-ins, called for 
instance __builtin_lshift and __builtin_rshift, that translates simply to 
GIMPLE shifts, just like cpp_shifts currently does, the only difference being 
the new shifts (unlike C/C++ shifts) are defined for all shift sizes and on 
negative values.  With this also variable shift intrinsics can be written 
without builtins. Though to do this would making a whole set of them for all 
integer types, it would need to be implemented in the c-parser like 
__buitin_shuffle, and not with the other generic builtins.

Would that make sense?

Best regards
`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 09:33:09AM +0200, Allan Sandfeld Jensen wrote:
> > --- a/gcc/config/i386/avx2intrin.h
> > +++ b/gcc/config/i386/avx2intrin.h
> > @@ -667,7 +667,7 @@ extern __inline __m256i
> > 
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm256_slli_epi16 (__m256i __A, int __B)
> >  {
> > 
> > -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> > +  return ((__B & 0xff) < 16) ? (__m256i)((__v16hi)__A << (__B & 0xff)) :
> > _mm256_setzero_si256();
> > 
> >  }
> 
> What is the advantage of doing that when you replace one operation with
> several (&, <, ?:, <<)?
> I'd say instead we should fold the builtins if in the gimple fold target
> hook we see the shift count constant and can decide based on that.
> Or we could use __builtin_constant_p (__B) to decide whether to use
> the generic vector shifts or builtin, but that means larger IL.
> 
Okay, I have tried that, and I also made it more obvious how the intrinsics 
can become non-immediate shift.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b58f5050db0..b9406550fc5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-04-22  Allan Sandfeld Jensen  <sandf...@kde.org>
+
+	* config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
+	Use vector intrinstics instead of builtins.
+	* config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
+	Use vector intrinstics instead of builtins.
+
 2017-04-21  Uros Bizjak  <ubiz...@gmail.com>
 
 	* config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv.
diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
index 82f170a3d61..64ba52b244e 100644
--- a/gcc/config/i386/avx2intrin.h
+++ b/gcc/config/i386/avx2intrin.h
@@ -665,13 +665,6 @@ _mm256_slli_si256 (__m256i __A, const int __N)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_slli_epi16 (__m256i __A, int __B)
-{
-  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
-}
-
-extern __inline __m256i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_sll_epi16 (__m256i __A, __m128i __B)
 {
   return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B);
@@ -679,9 +672,11 @@ _mm256_sll_epi16 (__m256i __A, __m128i __B)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_slli_epi32 (__m256i __A, int __B)
+_mm256_slli_epi16 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
+  if (__builtin_constant_p(__B))
+return ((unsigned int)__B < 16) ? (__m256i)((__v16hi)__A << __B) : _mm256_setzero_si256();
+  return _mm256_sll_epi16(__A, _mm_cvtsi32_si128(__B));
 }
 
 extern __inline __m256i
@@ -693,9 +688,11 @@ _mm256_sll_epi32 (__m256i __A, __m128i __B)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_slli_epi64 (__m256i __A, int __B)
+_mm256_slli_epi32 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psllqi256 ((__v4di)__A, __B);
+  if (__builtin_constant_p(__B))
+return ((unsigned int)__B < 32) ? (__m256i)((__v8si)__A << __B) : _mm256_setzero_si256();
+  return _mm256_sll_epi32(__A, _mm_cvtsi32_si128(__B));
 }
 
 extern __inline __m256i
@@ -707,6 +704,15 @@ _mm256_sll_epi64 (__m256i __A, __m128i __B)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_slli_epi64 (__m256i __A, int __B)
+{
+  if (__builtin_constant_p(__B))
+return ((unsigned int)__B < 64) ? (__m256i)((__v4di)__A << __B) : _mm256_setzero_si256();
+  return _mm256_sll_epi64(__A, _mm_cvtsi32_si128(__B));
+}
+
+extern __inline __m256i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srai_epi16 (__m256i __A, int __B)
 {
   return (__m256i)__builtin_ia32_psrawi256 ((__v16hi)__A, __B);
@@ -756,13 +762,6 @@ _mm256_srli_si256 (__m256i __A, const int __N)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_srli_epi16 (__m256i __A, int __B)
-{
-  return (__m256i)__builtin_ia32_psrlwi256 ((__v16hi)__A, __B);
-}
-
-extern __inline __m256i
-__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srl_epi16 (__m256i __A, __m128i __B)
 {
   return (__m256i)__builtin_ia32_psrlw256((__v16hi)__A, (__v8hi)__B);
@@ -770,9 +769,11 @@ _mm256_srl_epi16 (__m256i __A, __m128i __B)
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_srli_epi32 (__m256i __A, int __B)
+_mm256_srli_epi16 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psrldi256 ((__v8si)__A, __B);
+  if (__builtin_constant_p(__B))
+return ((unsigned int)__B < 16) ? (__m256i)((__v16hu)__A >>

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 11:01:29AM +0200, Allan Sandfeld Jensen wrote:
> > On Monday 24 April 2017, Jakub Jelinek wrote:
> > > On Mon, Apr 24, 2017 at 10:34:58AM +0200, Allan Sandfeld Jensen wrote:
> > > > That is a different instruction. That is the vpsllw not vpsllwi
> > > > 
> > > > The intrinsics I changed is the immediate version, I didn't change
> > > > the non- immediate version. It is probably a bug if you can give
> > > > non-immediate values to the immediate only intrinsic. At least both
> > > > versions handles it, if in different ways, but is is illegal
> > > > arguments.
> > > 
> > > The documentation is unclear on that and I've only recently fixed up
> > > some cases where these intrinsics weren't able to handle non-constant
> > > arguments in some cases, while both ICC and clang coped with that
> > > fine.
> > > So it is clearly allowed and handled by all the compilers and needs to
> > > be supported, people use that in real-world code.
> > 
> > Undoubtedly it happens. I just make a mistake myself that created that
> > case. But it is rather unfortunate, and means we make wrong code
> > currently for corner case values.
> 
> The intrinsic documentation is poor, usually you have a good documentation
> on what the instructions do, and then you just have to guess what the
> intrinsics do.  You can of course ask Intel for clarification.
> 
> If you try:
> #include 
> 
> __m128i
> foo (__m128i a, int b)
> {
>   return _mm_slli_epi16 (a, b);
> }
> and call it with 257 from somewhere else, you can see that all the
> compilers will give you zero vector.  And similarly if you use 257
> literally instead of b.  So what the intrinsic (unlike the instruction)
> actually does is that it compares all bits of the imm8 argument (supposedly
> using unsigned comparison) and if it is bigger than 15 (or 7 or 31 or 63
> depending on the bitsize of element) it yields 0 vector.
> 
Good point. I was using intel's documentation at 
https://software.intel.com/sites/landingpage/IntrinsicsGuide/, but if all 
compilers including us does something else, practicality wins.

It did make me curious and test out what _mm_slli_epi16(v, -250); compiles to. 
For some reason that becomes an undefined shift using the non-immediate sll in 
gcc, but returns the zero-vector in clang. With my patch it was a 6 bit shift, 
but that is apparently not de-facto standard.


`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 10:34:58AM +0200, Allan Sandfeld Jensen wrote:
> > That is a different instruction. That is the vpsllw not vpsllwi
> > 
> > The intrinsics I changed is the immediate version, I didn't change the
> > non- immediate version. It is probably a bug if you can give
> > non-immediate values to the immediate only intrinsic. At least both
> > versions handles it, if in different ways, but is is illegal arguments.
> 
> The documentation is unclear on that and I've only recently fixed up some
> cases where these intrinsics weren't able to handle non-constant arguments
> in some cases, while both ICC and clang coped with that fine.
> So it is clearly allowed and handled by all the compilers and needs to be
> supported, people use that in real-world code.
> 
Undoubtedly it happens. I just make a mistake myself that created that case. 
But it is rather unfortunate, and means we make wrong code currently for 
corner case values.

Note the difference in definition between the two intrinsics: 
_mm_ssl_epi16:
FOR j := 0 to 7
i := j*16
IF count[63:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend(a[i+15:i] << count[63:0])
FI
ENDFOR

_mm_ssli_epi16:
FOR j := 0 to 7
i := j*16
IF imm8[7:0] > 15
dst[i+15:i] := 0
ELSE
dst[i+15:i] := ZeroExtend(a[i+15:i] << imm8[7:0])
FI
ENDFOR

For a value such as 257, the immediate version does a 1 bit shift, while the 
non-immediate returns a zero vector. A simple function using the immediate 
intrinsic has to have an if-statement, if transformed to using the non-
immediate instruction.

`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Allan Sandfeld Jensen wrote:
> On Monday 24 April 2017, Jakub Jelinek wrote:
> > On Mon, Apr 24, 2017 at 10:02:40AM +0200, Allan Sandfeld Jensen wrote:
> > > > That said, both the options I've mentioned above provide the same
> > > > advantages and don't have the disadvantages of pessimizing normal
> > > > code.
> > > 
> > > What pessimizing? This produce the same or better code for all legal
> > > arguments. The only difference besides better generated code is that it
> > > allows
> > 
> > No.  Have you really tried that?
> > 
> > > the intrinsics to be used incorrectly with non-literal arguments
> > > because we lack the C-extension for constexp to prevent that.
> > 
> > Consider e.g. -O2 -mavx2 -mtune=intel:
> > #include 
> > 
> > __m256i
> > foo (__m256i x, int s)
> > {
> > 
> >   return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)x, s);
> > 
> > }
> > 
> > __m256i
> > bar (__m256i x, int s)
> > {
> > 
> >   return ((s & 0xff) < 16) ? (__m256i)((__v16hi)x << (s & 0xff)) :
> > _mm256_setzero_si256 (); }
> > 
> > The first one generates
> > 
> > movl%edi, %edi
> > vmovq   %rdi, %xmm1
> > vpsllw  %xmm1, %ymm0, %ymm0
> > ret
> > 
> > (because that is actually what the instruction does), the second one
> 
> That is a different instruction. That is the vpsllw not vpsllwi
> 
> The intrinsics I changed is the immediate version, I didn't change the non-
> immediate version. It is probably a bug if you can give non-immediate
> values to the immediate only intrinsic. At least both versions handles it,
> if in different ways, but is is illegal arguments.
> 
Though I now that I think about it, this means my change of to the existing 
sse-psslw-1.c test and friends is wrong, because it uses variable input.

`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 10:02:40AM +0200, Allan Sandfeld Jensen wrote:
> > > That said, both the options I've mentioned above provide the same
> > > advantages and don't have the disadvantages of pessimizing normal code.
> > 
> > What pessimizing? This produce the same or better code for all legal
> > arguments. The only difference besides better generated code is that it
> > allows
> 
> No.  Have you really tried that?
> 
> > the intrinsics to be used incorrectly with non-literal arguments because
> > we lack the C-extension for constexp to prevent that.
> 
> Consider e.g. -O2 -mavx2 -mtune=intel:
> #include 
> 
> __m256i
> foo (__m256i x, int s)
> {
>   return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)x, s);
> }
> 
> __m256i
> bar (__m256i x, int s)
> {
>   return ((s & 0xff) < 16) ? (__m256i)((__v16hi)x << (s & 0xff)) :
> _mm256_setzero_si256 (); }
> 
> The first one generates
> movl%edi, %edi
> vmovq   %rdi, %xmm1
> vpsllw  %xmm1, %ymm0, %ymm0
> ret
> (because that is actually what the instruction does), the second one
That is a different instruction. That is the vpsllw not vpsllwi

The intrinsics I changed is the immediate version, I didn't change the non-
immediate version. It is probably a bug if you can give non-immediate values 
to the immediate only intrinsic. At least both versions handles it, if in 
different ways, but is is illegal arguments.

`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 09:51:29AM +0200, Allan Sandfeld Jensen wrote:
> > On Monday 24 April 2017, Jakub Jelinek wrote:
> > > On Mon, Apr 24, 2017 at 09:33:09AM +0200, Allan Sandfeld Jensen wrote:
> > > > --- a/gcc/config/i386/avx2intrin.h
> > > > +++ b/gcc/config/i386/avx2intrin.h
> > > > @@ -667,7 +667,7 @@ extern __inline __m256i
> > > > 
> > > >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> > > >  _mm256_slli_epi16 (__m256i __A, int __B)
> > > >  {
> > > > 
> > > > -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> > > > +  return ((__B & 0xff) < 16) ? (__m256i)((__v16hi)__A << (__B &
> > > > 0xff)) : _mm256_setzero_si256();
> > > > 
> > > >  }
> > > 
> > > What is the advantage of doing that when you replace one operation with
> > > several (&, <, ?:, <<)?
> > > I'd say instead we should fold the builtins if in the gimple fold
> > > target hook we see the shift count constant and can decide based on
> > > that. Or we could use __builtin_constant_p (__B) to decide whether to
> > > use the generic vector shifts or builtin, but that means larger IL.
> > 
> > The advantage is that in this builtin, the __B is always a literal (or
> > constexpr), so the if statement is resolved at compile time.
> 
> Do we really want to support all the thousands _mm* intrinsics in constexpr
> contexts?  People can just use generic vectors instead.
> 
I would love to support it, but first we need a C extension attribute matching 
constexpr, and I consider it a separate issue.

> That said, both the options I've mentioned above provide the same
> advantages and don't have the disadvantages of pessimizing normal code.
> 
What pessimizing? This produce the same or better code for all legal 
arguments. The only difference besides better generated code is that it allows 
the intrinsics to be used incorrectly with non-literal arguments because we 
lack the C-extension for constexp to prevent that.

`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Monday 24 April 2017, Jakub Jelinek wrote:
> On Mon, Apr 24, 2017 at 09:33:09AM +0200, Allan Sandfeld Jensen wrote:
> > --- a/gcc/config/i386/avx2intrin.h
> > +++ b/gcc/config/i386/avx2intrin.h
> > @@ -667,7 +667,7 @@ extern __inline __m256i
> > 
> >  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> >  _mm256_slli_epi16 (__m256i __A, int __B)
> >  {
> > 
> > -  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
> > +  return ((__B & 0xff) < 16) ? (__m256i)((__v16hi)__A << (__B & 0xff)) :
> > _mm256_setzero_si256();
> > 
> >  }
> 
> What is the advantage of doing that when you replace one operation with
> several (&, <, ?:, <<)?
> I'd say instead we should fold the builtins if in the gimple fold target
> hook we see the shift count constant and can decide based on that.
> Or we could use __builtin_constant_p (__B) to decide whether to use
> the generic vector shifts or builtin, but that means larger IL.

The advantage is that in this builtin, the __B is always a literal (or 
constexpr), so the if statement is resolved at compile time.

`Allan

Re: [PATCH] [x86] Avoid builtins for SSE/AVX2 immidiate logical shifts

2017-04-24 Thread Allan Sandfeld Jensen

On Saturday 22 April 2017, Allan Sandfeld Jensen wrote:
> Replaces definitions of immediate logical shift intrinsics with GCC
> extension syntax. Tests are added to ensure the intrinsics still produce
> the right instructions and that a few basic optimizations now work.
> 
> Compared to the earlier version of the patch, all potentially undefined
> shifts are now avoided, which also means no variable shifts or arithmetic
> right shifts.

Fixed 2 errors in the tests.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b58f5050db0..b9406550fc5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-04-22  Allan Sandfeld Jensen  <sandf...@kde.org>
+
+	* config/i386/emmintrin.h (_mm_slli_*, _mm_srli_*):
+	Use vector intrinstics instead of builtins.
+	* config/i386/avx2intrin.h (_mm256_slli_*, _mm256_srli_*):
+	Use vector intrinstics instead of builtins.
+
 2017-04-21  Uros Bizjak  <ubiz...@gmail.com>
 
 	* config/i386/i386.md (*extzvqi_mem_rex64): Move above *extzv.
diff --git a/gcc/config/i386/avx2intrin.h b/gcc/config/i386/avx2intrin.h
index 82f170a3d61..acb49734131 100644
--- a/gcc/config/i386/avx2intrin.h
+++ b/gcc/config/i386/avx2intrin.h
@@ -667,7 +667,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_slli_epi16 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B);
+  return ((__B & 0xff) < 16) ? (__m256i)((__v16hi)__A << (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
@@ -681,7 +681,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_slli_epi32 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B);
+  return ((__B & 0xff) < 32) ? (__m256i)((__v8si)__A << (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
@@ -695,7 +695,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_slli_epi64 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psllqi256 ((__v4di)__A, __B);
+  return ((__B & 0xff) < 64) ? (__m256i)((__v4di)__A << (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
@@ -758,7 +758,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srli_epi16 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psrlwi256 ((__v16hi)__A, __B);
+  return ((__B & 0xff) < 16) ? (__m256i) ((__v16hu)__A >> (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
@@ -772,7 +772,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srli_epi32 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psrldi256 ((__v8si)__A, __B);
+  return ((__B & 0xff) < 32) ? (__m256i) ((__v8su)__A >> (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
@@ -786,7 +786,7 @@ extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_srli_epi64 (__m256i __A, int __B)
 {
-  return (__m256i)__builtin_ia32_psrlqi256 ((__v4di)__A, __B);
+  return ((__B & 0xff) < 64) ? (__m256i) ((__v4du)__A >> (__B & 0xff)) : _mm256_setzero_si256();
 }
 
 extern __inline __m256i
diff --git a/gcc/config/i386/emmintrin.h b/gcc/config/i386/emmintrin.h
index 828f4a07a9b..5c048d9fd0d 100644
--- a/gcc/config/i386/emmintrin.h
+++ b/gcc/config/i386/emmintrin.h
@@ -1140,19 +1140,19 @@ _mm_mul_epu32 (__m128i __A, __m128i __B)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi16 (__m128i __A, int __B)
 {
-  return (__m128i)__builtin_ia32_psllwi128 ((__v8hi)__A, __B);
+  return ((__B & 0xff) < 16) ? (__m128i)((__v8hi)__A << (__B & 0xff)) : _mm_setzero_si128();
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi32 (__m128i __A, int __B)
 {
-  return (__m128i)__builtin_ia32_pslldi128 ((__v4si)__A, __B);
+  return ((__B & 0xff) < 32) ? (__m128i)((__v4si)__A << (__B & 0xff)) : _mm_setzero_si128();
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_slli_epi64 (__m128i __A, int __B)
 {
-  return (__m128i)__builtin_ia32_psllqi128 ((__v2di)__A, __B);
+  return ((__B & 0xff) < 64) ? (__m128i)((__v2di)__A << (__B & 0xff)) : _mm_setzero_si128();
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -1205,19 +1205,19 @@ _mm_slli_si128 (__m128i __A, const int __N)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_srli_epi16 (__m128i __A, int __B)
 {
-  return (__m128i)__builtin_ia32_psrlwi128 ((__v8hi)__A, __B);
+  return ((__B & 0xff) < 16) ? (__

Re: [Patch][i386] PR 70118: Fix ubsan warning on SSE2 loadl_epi64 and storel_epi64

2016-12-09 Thread Allan Sandfeld Jensen

On Tuesday 06 December 2016, Uros Bizjak wrote:
> Hello!
> 
> > 2016-11-30  Allan Sandfeld Jensen  <allan.jen...@qt.io>
> > 
> >PR target/70118
> >* gcc/config/i386/mmintrin.h (__m64_u): New type
> >* gcc/config/i386/emmintrin.h (_mm_loadl_epi64, _mm_storel_epi64):
> >Make the allowed unaligned memory access explicit.
> 
> OK.
> 
Thanks

Note I don't have write access.

Best regards
`Allan

Re: [Patch][i386] PR 70118: Fix ubsan warning on SSE2 loadl_epi64 and storel_epi64

2016-12-05 Thread Allan Sandfeld Jensen

Trying again, this time with changelog.

gcc/

2016-11-30  Allan Sandfeld Jensen  <allan.jen...@qt.io>

PR target/70118
* gcc/config/i386/mmintrin.h (__m64_u): New type
* gcc/config/i386/emmintrin.h (_mm_loadl_epi64, _mm_storel_epi64):
   Make the allowed unaligned memory access explicit.
Index: gcc/config/i386/emmintrin.h
===
--- gcc/config/i386/emmintrin.h	(revision 242892)
+++ gcc/config/i386/emmintrin.h	(working copy)
@@ -703,9 +703,9 @@
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadl_epi64 (__m128i const *__P)
+_mm_loadl_epi64 (__m128i_u const *__P)
 {
-  return _mm_set_epi64 ((__m64)0LL, *(__m64 *)__P);
+  return _mm_set_epi64 ((__m64)0LL, *(__m64_u *)__P);
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -721,9 +721,9 @@
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_storel_epi64 (__m128i *__P, __m128i __B)
+_mm_storel_epi64 (__m128i_u *__P, __m128i __B)
 {
-  *(long long *)__P = ((__v2di)__B)[0];
+  *(__m64_u *)__P = (__m64) ((__v2di)__B)[0];
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
Index: gcc/config/i386/mmintrin.h
===
--- gcc/config/i386/mmintrin.h	(revision 242892)
+++ gcc/config/i386/mmintrin.h	(working copy)
@@ -37,6 +37,9 @@
vector types, and their scalar components.  */
 typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
 
+/* Unaligned version of the same type  */
+typedef int __m64_u __attribute__ ((__vector_size__ (8), __may_alias__, __aligned__ (1)));
+
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((__vector_size__ (8)));
 typedef short __v4hi __attribute__ ((__vector_size__ (8)));

Re: [Patch][i386] PR 70118: Fix ubsan warning on SSE2 loadl_epi64 and storel_epi64

2016-11-27 Thread Allan Sandfeld Jensen

On Sunday 27 November 2016, Marc Glisse wrote:
> On Sat, 26 Nov 2016, Allan Sandfeld Jensen wrote:
> > Use the recently introduced unaligned variant of __m128i and add a
> > similar __m64 and use those to make it clear these two intrinsics
> > require neither 128- bit nor 64-bit alignment.
> 
> Thanks for doing this. You'll want Uros or Kirill to review your patch.
> There are probably several more places that could do with an unaligned
> fix, but we don't have to find them all at once.
> First I found it strange to use __m64, but then it actually seems like a
> good call to use a type that is not just aligned(1) but also may_alias.
> 
> +  *(__m64_u *)__P = __m64(((__v2di)__B)[0]);
> 
> gcc complains about this syntax for me, it wants parentheses around
> __m64... Did it pass the testsuite for you?

Fixed, it now matches the move just below.
Index: gcc/config/i386/emmintrin.h
===
--- gcc/config/i386/emmintrin.h	(revision 242892)
+++ gcc/config/i386/emmintrin.h	(working copy)
@@ -703,9 +703,9 @@
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadl_epi64 (__m128i const *__P)
+_mm_loadl_epi64 (__m128i_u const *__P)
 {
-  return _mm_set_epi64 ((__m64)0LL, *(__m64 *)__P);
+  return _mm_set_epi64 ((__m64)0LL, *(__m64_u *)__P);
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -721,9 +721,9 @@
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_storel_epi64 (__m128i *__P, __m128i __B)
+_mm_storel_epi64 (__m128i_u *__P, __m128i __B)
 {
-  *(long long *)__P = ((__v2di)__B)[0];
+  *(__m64_u *)__P = (__m64) ((__v2di)__B)[0];
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
Index: gcc/config/i386/mmintrin.h
===
--- gcc/config/i386/mmintrin.h	(revision 242892)
+++ gcc/config/i386/mmintrin.h	(working copy)
@@ -37,6 +37,9 @@
vector types, and their scalar components.  */
 typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
 
+/* Unaligned version of the same type  */
+typedef int __m64_u __attribute__ ((__vector_size__ (8), __may_alias__, __aligned__ (1)));
+
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((__vector_size__ (8)));
 typedef short __v4hi __attribute__ ((__vector_size__ (8)));

[Patch][i386] PR 70118: Fix ubsan warning on SSE2 loadl_epi64 and storel_epi64

2016-11-26 Thread Allan Sandfeld Jensen

Use the recently introduced unaligned variant of __m128i and add a similar 
__m64 and use those to make it clear these two intrinsics require neither 128-
bit nor 64-bit alignment.

`Allan
Index: gcc/config/i386/emmintrin.h
===
--- gcc/config/i386/emmintrin.h	(revision 242753)
+++ gcc/config/i386/emmintrin.h	(working copy)
@@ -703,9 +703,9 @@
 }
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_loadl_epi64 (__m128i const *__P)
+_mm_loadl_epi64 (__m128i_u const *__P)
 {
-  return _mm_set_epi64 ((__m64)0LL, *(__m64 *)__P);
+  return _mm_set_epi64 ((__m64)0LL, *(__m64_u *)__P);
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -721,9 +721,9 @@
 }
 
 extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_storel_epi64 (__m128i *__P, __m128i __B)
+_mm_storel_epi64 (__m128i_u *__P, __m128i __B)
 {
-  *(long long *)__P = ((__v2di)__B)[0];
+  *(__m64_u *)__P = __m64(((__v2di)__B)[0]);
 }
 
 extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
Index: gcc/config/i386/mmintrin.h
===
--- gcc/config/i386/mmintrin.h	(revision 242753)
+++ gcc/config/i386/mmintrin.h	(working copy)
@@ -37,6 +37,9 @@
vector types, and their scalar components.  */
 typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
 
+/* Unaligned version of the same type  */
+typedef int __m64_u __attribute__ ((__vector_size__ (8), __may_alias__, __aligned__ (1)));
+
 /* Internal data types for implementing the intrinsics.  */
 typedef int __v2si __attribute__ ((__vector_size__ (8)));
 typedef short __v4hi __attribute__ ((__vector_size__ (8)));

Re: [PATCH] libiberty: Fix -Wimplicit-fallthrough warnings.

2016-11-18 Thread Allan Sandfeld Jensen

On Wednesday 02 November 2016, Mark Wielaard wrote:
> -case 11: c+=((hashval_t)k[10]<<24);
> -case 10: c+=((hashval_t)k[9]<<16);
> -case 9 : c+=((hashval_t)k[8]<<8);
> +case 11: c+=((hashval_t)k[10]<<24);  /* fall through */
> +case 10: c+=((hashval_t)k[9]<<16);   /* fall through */
> +case 9 : c+=((hashval_t)k[8]<<8);/* fall through */
>/* the first byte of c is reserved for the length */

This really highlights another exception -Wimplicit-fallthough should tolerate 
at least on level 1. Single line of code.

case X: my_statement();
case y: my_statement();

and 
case X:
my_statement();
case y:
my_statement();

In both cases, the lack of break is obvious at a glance and thus a comment 
highlighting it has never been needed and shouldn't be enforced.

Well, at least in my opinion, and it would take away all the rest of false 
positives I run into.

Best regards
`Allan

Re: [PATCH] Introduce -Wimplicit-fallthrough={0,1,2,3,4,5}

2016-10-13 Thread Allan Sandfeld Jensen

On Tuesday 11 October 2016, Jakub Jelinek wrote:
> Hi!
> 
> The following patch introduces difference warning levels for
> -Wimplicit-fallthrough warning, so projects can choose if they want to
> honor only attributes (-Wimplicit-fallthrough=5), or what kind of comments.
> =4 is very picky and accepts only very small amount of comments, =3 is what
> we had before this patch, =2 looks case insensitively for falls?[
> \t-]*thr(u|ough) anywhere in the comment, =1 accepts any comment, =0 is
> the same as -Wno-implicit-fallthrough - disables the warning.

I would suggest also looking for comments with "no break" in them, as that is 
another common way to annotate the intentional lack of a 'break'.

If you want another example besides the linux kernel, I unified all our fall 
through comments in qtbase in August: 
https://codereview.qt-project.org/#/c/163595/

Though Qt is far from -Wimpliciit-fallthough clean after that, another 
colleague is working on that since we have traditionally aimed for -Wall -
Wextra -Werror, though it will seriously wreck hawock with readability in 
several places with unrolled loops and switches on integers.

Best regards
`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-26 Thread Allan Sandfeld Jensen

On Monday 26 January 2015, H.J. Lu wrote:
 On Mon, Jan 26, 2015 at 11:08 AM, Allan Sandfeld Jensen
 
 carew...@gmail.com wrote:
  On Monday 26 January 2015, H.J. Lu wrote:
  On Mon, Jan 26, 2015 at 10:53 AM, Allan Sandfeld Jensen
  
  carew...@gmail.com wrote:
 Committed with a bunch of fixes (e.g. missing fold_builtin_cpu
 part in gcc/config/i386/i386.c, and mv17.C test didn't compile
 at all due to missing parenthesis).
 
 ... and now with committed ChangeLog and patch.
 
 gcc/ChangeLog:
 * config/i386/i386.c (get_builtin_code_for_version): Add
 support for BMI and BMI2 multiversion functions.
 (fold_builtin_cpu): Add F_BMI and F_BMI2.
 
 libgcc/ChangeLog:
 * config/i386/cpuinfo.c (enum processor_features): Add
 FEATURE_BMI and FEATURE_BMI2.
 (get_available_features): Detect FEATURE_BMI and
 FEATURE_BMI2.
 
 testsuite/ChangeLog:
 * gcc.target/i386/funcspec-5.c: Test new multiversion
 targets. * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion
 dispatcher.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9ec40cb..441911d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34289,15 +34289,18 @@ get_builtin_code_for_version (tree decl,
tree *predica te_list)

 P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,

-P_PROC_SSE4_2,

 P_POPCNT,

+P_PROC_SSE4_2,

 P_AVX,
 P_PROC_AVX,

+P_BMI,
+P_PROC_BMI,

 P_FMA4,
 P_XOP,
 P_PROC_XOP,
 P_FMA,
 P_PROC_FMA,

+P_BMI2,

 P_AVX2,
 P_PROC_AVX2,
 P_AVX512F,

This changed the priority of P_POPCNT and caused

FAIL: g++.dg/ext/mv1.C  -std=gnu++98 execution test
FAIL: g++.dg/ext/mv1.C  -std=gnu++11 execution test
FAIL: g++.dg/ext/mv1.C  -std=gnu++14 execution test

on Nehalem and Westmere machines:

mv1.exe:
/export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/mv1.C:
51: int main(): Assertion `val == 5' failed.

since val is 6 now.

Right. I am not sure why popcnt was prioritized below arch=corei7.
The logic is supposed to be that any target that includes an
extension is prioritized
   
   I don't understand your question.  popcnt feature is separate from
   -march. Its priority has nothing to do with -march=corei7.
   
   arch=corei7 implies popcnt. See PTA_NEHALEM in i386.c. The test would
   probably work with -march=core2.
   
   AFAIK The logic of the priorities in multiversioning is that
   architecture specific functions are chosen over feature specific,
   unless the feature is one that isn't required by the architecture.
  
  On SSE4.2 machines, we should choose
  
  int __attribute__ ((target(arch=corei7))) foo ();
  
  over
  
  int __attribute__ ((target(popcnt))) foo ();
  
  But we shouldn't choose
  
  int __attribute__ ((target(arch=corei7))) foo ();
  
  over
  
  int __attribute__ ((target(arch=corei7,popcnt))) foo ();
  
  I guess since they represent the exact same effective ISA, they would
  have equal priority, so that it would likely chose whatever comes last.
 
 I have no strong opinion on this.  But this is a user visible compiler
 behavior change.  We should issue a warning/note here.

Yes, or revert that part of the patch. It should have been a separate patch 
anyway.

`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-26 Thread Allan Sandfeld Jensen

On Monday 26 January 2015, you wrote:
 On Mon, Jan 26, 2015 at 10:38 AM, Allan Sandfeld Jensen
 
 al...@carewolf.com wrote:
  On Monday 26 January 2015, H.J. Lu wrote:
  On Sun, Jan 25, 2015 at 10:37 AM, Uros Bizjak ubiz...@gmail.com wrote:
   On Sun, Jan 25, 2015 at 7:23 PM, Uros Bizjak ubiz...@gmail.com wrote:
   On Sat, Jan 24, 2015 at 11:49 AM, Allan Sandfeld Jensen
   
   al...@carewolf.com wrote:
   On Saturday 24 January 2015, Uros Bizjak wrote:
   On Mon, Jan 12, 2015 at 6:02 PM, Uros Bizjak ubiz...@gmail.com 
wrote:
Hello!

On Wed, Dec 31, 2014 at 01:28:47PM +0100, Allan Sandfeld Jensen
  
  wrote:
 I recently wanted to use multiversioning for BMI2 specific
 extensions PDEP/PEXT, and noticed it wasn't there. So I wrote
 this patch to add it, and also added AES, F16C and BMI1 for
 completeness.

AES nor F16C doesn't make any sense IMHO for multiversioning,
you need special intrinsics for that anyway and when you use
them, the function will fail to compile without those
features. Multiversioning only makes sense for ISA features
the compiler uses for normal C/C++ code without any
intrinsics.

Patch reduced to just adding BMI and BMI2 multiversioning:
+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+ * config/i386/i386.c (get_builtin_code_for_version): Add
+ support for BMI and BMI2 multiversion functions.

+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+ * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
+ * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.

+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+ * config/i386/cpuinfo.c (enum processor_features): Add
FEATURE_BMI and + FEATURE_BMI2.
+ (get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.

OK for mainline
   
   Allan, did you commit the patch to mainline? I don't see it in SVN
   logs.
   
   (If you don't have SVN commit access, please mention it in the
   patch submission, so someone will commit the patch for you).
   
   Sorry. I don't have SVN commit access.
   
   Committed with a bunch of fixes (e.g. missing fold_builtin_cpu part
   in gcc/config/i386/i386.c, and mv17.C test didn't compile at all due
   to missing parenthesis).
   
   ... and now with committed ChangeLog and patch.
   
   gcc/ChangeLog:
   * config/i386/i386.c (get_builtin_code_for_version): Add
   support for BMI and BMI2 multiversion functions.
   (fold_builtin_cpu): Add F_BMI and F_BMI2.
   
   libgcc/ChangeLog:
   * config/i386/cpuinfo.c (enum processor_features): Add FEATURE_BMI
   and FEATURE_BMI2.
   (get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.
   
   testsuite/ChangeLog:
   * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
   * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
  
  diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
  index 9ec40cb..441911d 100644
  --- a/gcc/config/i386/i386.c
  +++ b/gcc/config/i386/i386.c
  @@ -34289,15 +34289,18 @@ get_builtin_code_for_version (tree decl, tree
  *predica te_list)
  
   P_PROC_SSE4_A,
   P_SSE4_1,
   P_SSE4_2,
  
  -P_PROC_SSE4_2,
  
   P_POPCNT,
  
  +P_PROC_SSE4_2,
  
   P_AVX,
   P_PROC_AVX,
  
  +P_BMI,
  +P_PROC_BMI,
  
   P_FMA4,
   P_XOP,
   P_PROC_XOP,
   P_FMA,
   P_PROC_FMA,
  
  +P_BMI2,
  
   P_AVX2,
   P_PROC_AVX2,
   P_AVX512F,
  
  This changed the priority of P_POPCNT and caused
  
  FAIL: g++.dg/ext/mv1.C  -std=gnu++98 execution test
  FAIL: g++.dg/ext/mv1.C  -std=gnu++11 execution test
  FAIL: g++.dg/ext/mv1.C  -std=gnu++14 execution test
  
  on Nehalem and Westmere machines:
  
  mv1.exe:
  /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/mv1.C:51:
  int main(): Assertion `val == 5' failed.
  
  since val is 6 now.
  
  Right. I am not sure why popcnt was prioritized below arch=corei7. The
  logic is supposed to be that any target that includes an extension is
  prioritized
 
 I don't understand your question.  popcnt feature is separate from -march.
 Its priority has nothing to do with -march=corei7.
 
arch=corei7 implies popcnt. See PTA_NEHALEM in i386.c. The test would probably 
work with -march=core2.

AFAIK The logic of the priorities in multiversioning is that architecture 
specific functions are chosen over feature specific, unless the feature is one 
that isn't required by the architecture.

`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-26 Thread Allan Sandfeld Jensen

On Monday 26 January 2015, H.J. Lu wrote:
 On Mon, Jan 26, 2015 at 10:53 AM, Allan Sandfeld Jensen
 
 carew...@gmail.com wrote:
Committed with a bunch of fixes (e.g. missing fold_builtin_cpu
part in gcc/config/i386/i386.c, and mv17.C test didn't compile at
all due to missing parenthesis).

... and now with committed ChangeLog and patch.

gcc/ChangeLog:
* config/i386/i386.c (get_builtin_code_for_version): Add
support for BMI and BMI2 multiversion functions.
(fold_builtin_cpu): Add F_BMI and F_BMI2.

libgcc/ChangeLog:
* config/i386/cpuinfo.c (enum processor_features): Add
FEATURE_BMI and FEATURE_BMI2.
(get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.

testsuite/ChangeLog:
* gcc.target/i386/funcspec-5.c: Test new multiversion targets.
* g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
   
   diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
   index 9ec40cb..441911d 100644
   --- a/gcc/config/i386/i386.c
   +++ b/gcc/config/i386/i386.c
   @@ -34289,15 +34289,18 @@ get_builtin_code_for_version (tree decl,
   tree *predica te_list)
   
P_PROC_SSE4_A,
P_SSE4_1,
P_SSE4_2,
   
   -P_PROC_SSE4_2,
   
P_POPCNT,
   
   +P_PROC_SSE4_2,
   
P_AVX,
P_PROC_AVX,
   
   +P_BMI,
   +P_PROC_BMI,
   
P_FMA4,
P_XOP,
P_PROC_XOP,
P_FMA,
P_PROC_FMA,
   
   +P_BMI2,
   
P_AVX2,
P_PROC_AVX2,
P_AVX512F,
   
   This changed the priority of P_POPCNT and caused
   
   FAIL: g++.dg/ext/mv1.C  -std=gnu++98 execution test
   FAIL: g++.dg/ext/mv1.C  -std=gnu++11 execution test
   FAIL: g++.dg/ext/mv1.C  -std=gnu++14 execution test
   
   on Nehalem and Westmere machines:
   
   mv1.exe:
   /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/mv1.C:51:
   int main(): Assertion `val == 5' failed.
   
   since val is 6 now.
   
   Right. I am not sure why popcnt was prioritized below arch=corei7. The
   logic is supposed to be that any target that includes an extension is
   prioritized
  
  I don't understand your question.  popcnt feature is separate from
  -march. Its priority has nothing to do with -march=corei7.
  
  arch=corei7 implies popcnt. See PTA_NEHALEM in i386.c. The test would
  probably work with -march=core2.
  
  AFAIK The logic of the priorities in multiversioning is that architecture
  specific functions are chosen over feature specific, unless the feature
  is one that isn't required by the architecture.
 
 On SSE4.2 machines, we should choose
 
 int __attribute__ ((target(arch=corei7))) foo ();
 
 over
 
 int __attribute__ ((target(popcnt))) foo ();
 
 But we shouldn't choose
 
 int __attribute__ ((target(arch=corei7))) foo ();
 
 over
 
 int __attribute__ ((target(arch=corei7,popcnt))) foo ();

I guess since they represent the exact same effective ISA, they would have 
equal priority, so that it would likely chose whatever comes last. 

`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-26 Thread Allan Sandfeld Jensen

On Monday 26 January 2015, H.J. Lu wrote:
 On Sun, Jan 25, 2015 at 10:37 AM, Uros Bizjak ubiz...@gmail.com wrote:
  On Sun, Jan 25, 2015 at 7:23 PM, Uros Bizjak ubiz...@gmail.com wrote:
  On Sat, Jan 24, 2015 at 11:49 AM, Allan Sandfeld Jensen
  
  al...@carewolf.com wrote:
  On Saturday 24 January 2015, Uros Bizjak wrote:
  On Mon, Jan 12, 2015 at 6:02 PM, Uros Bizjak ubiz...@gmail.com wrote:
   Hello!
   
   On Wed, Dec 31, 2014 at 01:28:47PM +0100, Allan Sandfeld Jensen 
wrote:
I recently wanted to use multiversioning for BMI2 specific
extensions PDEP/PEXT, and noticed it wasn't there. So I wrote
this patch to add it, and also added AES, F16C and BMI1 for
completeness.
   
   AES nor F16C doesn't make any sense IMHO for multiversioning, you
   need special intrinsics for that anyway and when you use them,
   the function will fail to compile without those features.
   Multiversioning only makes sense for ISA features the compiler
   uses for normal C/C++ code without any intrinsics.
   
   Patch reduced to just adding BMI and BMI2 multiversioning:
   +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
   +
   + * config/i386/i386.c (get_builtin_code_for_version): Add
   + support for BMI and BMI2 multiversion functions.
   
   +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
   +
   + * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
   + * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
   
   +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
   +
   + * config/i386/cpuinfo.c (enum processor_features): Add FEATURE_BMI
   and + FEATURE_BMI2.
   + (get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.
   
   OK for mainline
  
  Allan, did you commit the patch to mainline? I don't see it in SVN
  logs.
  
  (If you don't have SVN commit access, please mention it in the patch
  submission, so someone will commit the patch for you).
  
  Sorry. I don't have SVN commit access.
  
  Committed with a bunch of fixes (e.g. missing fold_builtin_cpu part in
  gcc/config/i386/i386.c, and mv17.C test didn't compile at all due to
  missing parenthesis).
  
  ... and now with committed ChangeLog and patch.
  
  gcc/ChangeLog:
  * config/i386/i386.c (get_builtin_code_for_version): Add
  support for BMI and BMI2 multiversion functions.
  (fold_builtin_cpu): Add F_BMI and F_BMI2.
  
  libgcc/ChangeLog:
  * config/i386/cpuinfo.c (enum processor_features): Add FEATURE_BMI
  and FEATURE_BMI2.
  (get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.
  
  testsuite/ChangeLog:
  * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
  * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
 
 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 9ec40cb..441911d 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -34289,15 +34289,18 @@ get_builtin_code_for_version (tree decl, tree
 *predica te_list)
  P_PROC_SSE4_A,
  P_SSE4_1,
  P_SSE4_2,
 -P_PROC_SSE4_2,
  P_POPCNT,
 +P_PROC_SSE4_2,
  P_AVX,
  P_PROC_AVX,
 +P_BMI,
 +P_PROC_BMI,
  P_FMA4,
  P_XOP,
  P_PROC_XOP,
  P_FMA,
  P_PROC_FMA,
 +P_BMI2,
  P_AVX2,
  P_PROC_AVX2,
  P_AVX512F,
 
 This changed the priority of P_POPCNT and caused
 
 FAIL: g++.dg/ext/mv1.C  -std=gnu++98 execution test
 FAIL: g++.dg/ext/mv1.C  -std=gnu++11 execution test
 FAIL: g++.dg/ext/mv1.C  -std=gnu++14 execution test
 
 on Nehalem and Westmere machines:
 
 mv1.exe:
 /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/ext/mv1.C:51: int
 main(): Assertion `val == 5' failed.
 
 since val is 6 now.

Right. I am not sure why popcnt was prioritized below arch=corei7. The logic 
is supposed to be that any target that includes an extension is prioritized 
above that extension. I included the change in the order in i386.c, but not 
the updated test. It is probably better to not change the order in i386.c, or 
at least change that in a separate commit.

`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-24 Thread Allan Sandfeld Jensen

On Saturday 24 January 2015, Uros Bizjak wrote:
 On Mon, Jan 12, 2015 at 6:02 PM, Uros Bizjak ubiz...@gmail.com wrote:
  Hello!
  
  On Wed, Dec 31, 2014 at 01:28:47PM +0100, Allan Sandfeld Jensen wrote:
   I recently wanted to use multiversioning for BMI2 specific extensions
   PDEP/PEXT, and noticed it wasn't there. So I wrote this patch to add
   it, and also added AES, F16C and BMI1 for completeness.
  
  AES nor F16C doesn't make any sense IMHO for multiversioning, you need
  special intrinsics for that anyway and when you use them, the function
  will fail to compile without those features.
  Multiversioning only makes sense for ISA features the compiler uses for
  normal C/C++ code without any intrinsics.
  
  Patch reduced to just adding BMI and BMI2 multiversioning:
  +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
  +
  + * config/i386/i386.c (get_builtin_code_for_version): Add
  + support for BMI and BMI2 multiversion functions.
  
  +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
  +
  + * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
  + * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
  
  +2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
  +
  + * config/i386/cpuinfo.c (enum processor_features): Add FEATURE_BMI and
  + FEATURE_BMI2.
  + (get_available_features): Detect FEATURE_BMI and FEATURE_BMI2.
  
  OK for mainline
 
 Allan, did you commit the patch to mainline? I don't see it in SVN logs.
 
 (If you don't have SVN commit access, please mention it in the patch
 submission, so someone will commit the patch for you).
 
Sorry. I don't have SVN commit access.

`Allan

Re: [Patch, i386] Support BMI and BMI2 targets in multiversioning

2015-01-10 Thread Allan Sandfeld Jensen

On Wednesday 31 December 2014, Jakub Jelinek wrote:
 On Wed, Dec 31, 2014 at 01:28:47PM +0100, Allan Sandfeld Jensen wrote:
  I recently wanted to use multiversioning for BMI2 specific extensions
  PDEP/PEXT, and noticed it wasn't there. So I wrote this patch to add it,
  and also added AES, F16C and BMI1 for completeness.
 
 AES nor F16C doesn't make any sense IMHO for multiversioning, you need
 special intrinsics for that anyway and when you use them, the function will
 fail to compile without those features.
 Multiversioning only makes sense for ISA features the compiler uses for
 normal C/C++ code without any intrinsics.
 
Patch reduced to just adding BMI and BMI2 multiversioning:

Best regards
`Allan
commit aa5b98407bcc3ae2f1fef70455b89447112c3e2c
Author: Allan Sandfeld Jensen allan.jen...@digia.com
Date:   Fri Dec 26 21:14:01 2014 +0100

BMI and BMI2 multiversion support

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ff8a5e6..6afd58a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+	* config/i386/i386.c (get_builtin_code_for_version): Add
+	support for BMI and BMI2 multiversion functions.
+
 2014-12-27  H.J. Lu  hongjiu...@intel.com
 
 	PR target/64409
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d693fdb..9ae9bca 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34261,15 +34261,18 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,
-P_PROC_SSE4_2,
 P_POPCNT,
+P_PROC_SSE4_2,
 P_AVX,
 P_PROC_AVX,
+P_BMI,
+P_PROC_BMI,
 P_FMA4,
 P_XOP,
 P_PROC_XOP,
 P_FMA,
 P_PROC_FMA,
+P_BMI2,
 P_AVX2,
 P_PROC_AVX2,
 P_AVX512F,
@@ -34295,12 +34298,14 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
   {sse4a, P_SSE4_A},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
-  {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
+  {sse4.2, P_SSE4_2},
   {avx, P_AVX},
+  {bmi, P_BMI},
   {fma4, P_FMA4},
   {xop, P_XOP},
   {fma, P_FMA},
+  {bmi2, P_BMI2},
   {avx2, P_AVX2},
   {avx512f, P_AVX512F}
 };
@@ -34395,7 +34400,7 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	  break;
 	case PROCESSOR_BTVER2:
 	  arg_str = btver2;
-	  priority = P_PROC_AVX;
+	  priority = P_PROC_BMI;
 	  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index ef6ddcc..5b11622 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+	* gcc.target/i386/funcspec-5.c: Test new multiversion targets.
+	* g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
+
 2014-12-28  H.J. Lu  hongjiu...@intel.com
 
 	* gcc.target/i386/pr57003.c: Skip on x32.
diff --git a/gcc/testsuite/g++.dg/ext/mv17.C b/gcc/testsuite/g++.dg/ext/mv17.C
new file mode 100644
index 000..311f217
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mv17.C
@@ -0,0 +1,91 @@
+/* Test case to check if Multiversioning works for BMI and BMI2.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options -O2 } */
+
+#include assert.h
+
+// Check BMI feature selection works
+int foo () __attribute__((target(default)));
+int foo () __attribute__((target(bmi)));
+int foo () __attribute__((target(bmi2)));
+
+// Check specialized versions for archs with BMI is chosen over generic BMI versions.
+int bar () __attribute__((target(default)));
+int bar () __attribute__((target(bmi)));
+int bar () __attribute__((target(bmi2)));
+int bar () __attribute__((target(arch=btver2)));
+int bar () __attribute__((target(arch=haswell)));
+
+int main ()
+{
+  int val = foo ();
+
+  if (__builtin_cpu_supports (bmi2))
+assert (val == 2);
+  else if (__builtin_cpu_supports (bmi))
+assert (val == 1);
+  else
+assert (val == 0);
+
+  val = bar ();
+
+  if (__builtin_cpu_is (btver2)
+assert (val == 5);
+  else if (__builtin_cpu_is (haswell))
+assert (val == 6);
+  else if (__builtin_cpu_supports (bmi2))
+assert (val == 2);
+  else if (__builtin_cpu_supports (bmi))
+assert (val == 1);
+  else
+assert (val == 0);
+
+  return 0;
+}
+
+int __attribute__ ((target(default)))
+foo ()
+{
+  return 0;
+}
+
+int __attribute__ ((target(bmi)))
+foo ()
+{
+  return 1;
+}
+int __attribute__ ((target(bmi2)))
+foo ()
+{
+  return 2;
+}
+
+int __attribute__ ((target(default)))
+bar ()
+{
+  return 0;
+}
+
+int __attribute__ ((target(bmi)))
+bar ()
+{
+  return 1;
+}
+int __attribute__ ((target(bmi2)))
+bar ()
+{
+  return 2;
+}
+
+int __attribute__ ((target(arch=btver2)))
+bar ()
+{
+  return 5;
+}
+
+int __attribute__ ((target(arch=haswell)))
+bar ()
+{
+  return 6;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-5.c b/gcc/testsuite/gcc.target/i386/funcspec-5.c
index 269e610..2e316a7

[Patch, i386] Support AES, F16C, BMI and BMI2 targets in multiversioning

2014-12-31 Thread Allan Sandfeld Jensen

I recently wanted to use multiversioning for BMI2 specific extensions 
PDEP/PEXT, and noticed it wasn't there. So I wrote this patch to add it, and 
also added AES, F16C and BMI1 for completeness.

Happy new year
`Allan
commit 062c09d45d22302ffbd4f86d88e16a1a0d49cd80
Author: Allan Sandfeld Jensen allan.jen...@digia.com
Date:   Fri Dec 26 21:14:01 2014 +0100

AES, F16C BMI and BMI2 multiversion support

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ff8a5e6..83f16a5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+   * config/i386/i386.c (get_builtin_code_for_version): Add
+   support for AES, BMI, BMI2 and F16C multiversion functions.
+
 2014-12-27  H.J. Lu  hongjiu...@intel.com
 
PR target/64409
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d693fdb..a1b74dc 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -34261,15 +34261,22 @@ get_builtin_code_for_version (tree decl, tree 
*predicate_list)
 P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,
-P_PROC_SSE4_2,
 P_POPCNT,
+P_PROC_SSE4_2,
+P_AES,
+P_PROC_AES,
 P_AVX,
 P_PROC_AVX,
+P_F16C,
+P_PROC_F16C,
+P_BMI,
+P_PROC_BMI,
 P_FMA4,
 P_XOP,
 P_PROC_XOP,
 P_FMA,
 P_PROC_FMA,
+P_BMI2,
 P_AVX2,
 P_PROC_AVX2,
 P_AVX512F,
@@ -34295,12 +34302,16 @@ get_builtin_code_for_version (tree decl, tree 
*predicate_list)
   {sse4a, P_SSE4_A},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
-  {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
+  {sse4.2, P_SSE4_2},
+  {aes, P_AES},
   {avx, P_AVX},
+  {f16c, P_F16C},
+  {bmi, P_BMI},
   {fma4, P_FMA4},
   {xop, P_XOP},
   {fma, P_FMA},
+  {bmi2, P_BMI2},
   {avx2, P_AVX2},
   {avx512f, P_AVX512F}
 };
@@ -34350,21 +34361,25 @@ get_builtin_code_for_version (tree decl, tree 
*predicate_list)
  priority = P_PROC_SSSE3;
  break;
case PROCESSOR_NEHALEM:
- if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_AES)
+ if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_AES) {
arg_str = westmere;
- else
+priority = P_PROC_AES;
+  } else {
/* We translate arch=corei7 and arch=nehalem to
   corei7 so that it will be mapped to M_INTEL_COREI7
   as cpu type to cover all M_INTEL_COREI7_XXXs.  */
arg_str = corei7;
- priority = P_PROC_SSE4_2;
+priority = P_PROC_SSE4_2;
+  }
  break;
case PROCESSOR_SANDYBRIDGE:
- if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_F16C)
+ if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_F16C) {
arg_str = ivybridge;
- else
+   priority = P_PROC_F16C;
+ } else {
arg_str = sandybridge;
- priority = P_PROC_AVX;
+   priority = P_PROC_AVX;
+ }
  break;
case PROCESSOR_HASWELL:
  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_ADX)
@@ -34395,7 +34410,7 @@ get_builtin_code_for_version (tree decl, tree 
*predicate_list)
  break;
case PROCESSOR_BTVER2:
  arg_str = btver2;
- priority = P_PROC_AVX;
+ priority = P_PROC_BMI;
  break;
case PROCESSOR_BDVER1:
  arg_str = bdver1;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index ef6ddcc..5b11622 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+   * gcc.target/i386/funcspec-5.c: Test new multiversion targets.
+   * g++.dg/ext/mv17.C: Test BMI/BMI2 multiversion dispatcher.
+
 2014-12-28  H.J. Lu  hongjiu...@intel.com
 
* gcc.target/i386/pr57003.c: Skip on x32.
diff --git a/gcc/testsuite/g++.dg/ext/mv17.C b/gcc/testsuite/g++.dg/ext/mv17.C
new file mode 100644
index 000..311f217
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mv17.C
@@ -0,0 +1,91 @@
+/* Test case to check if Multiversioning works for BMI and BMI2.  */
+/* { dg-do run { target i?86-*-* x86_64-*-* } } */
+/* { dg-options -O2 } */
+
+#include assert.h
+
+// Check BMI feature selection works
+int foo () __attribute__((target(default)));
+int foo () __attribute__((target(bmi)));
+int foo () __attribute__((target(bmi2)));
+
+// Check specialized versions for archs with BMI is chosen over generic BMI 
versions.
+int bar () __attribute__((target(default)));
+int bar () __attribute__((target(bmi)));
+int bar () __attribute__((target(bmi2)));
+int bar () __attribute__((target(arch=btver2)));
+int bar () __attribute__((target(arch=haswell)));
+
+int main ()
+{
+  int val = foo ();
+
+  if (__builtin_cpu_supports (bmi2))
+assert (val == 2);
+  else

Re: [Patch, i386] Support AES, F16C, BMI and BMI2 targets in multiversioning

2014-12-31 Thread Allan Sandfeld Jensen

On Wednesday 31 December 2014, Jakub Jelinek wrote:
 On Wed, Dec 31, 2014 at 01:28:47PM +0100, Allan Sandfeld Jensen wrote:
  I recently wanted to use multiversioning for BMI2 specific extensions
  PDEP/PEXT, and noticed it wasn't there. So I wrote this patch to add it,
  and also added AES, F16C and BMI1 for completeness.
 
 AES nor F16C doesn't make any sense IMHO for multiversioning, you need
 special intrinsics for that anyway and when you use them, the function will
 fail to compile without those features.
 Multiversioning only makes sense for ISA features the compiler uses for
 normal C/C++ code without any intrinsics.
 
I disagree. You just include the intrinsics and use them. Inside a function 
with the right target rule they will work even if they don't outside. This 
works even without multiversioning, e.g. just specific functions with specific 
targets.

`Allan

Re: [Patch, i386] Separate Intel processor with expanded ISA

2014-01-26 Thread Allan Sandfeld Jensen

Updated patch with test.
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ccbea0f..e80c30b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2013-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+	* config/i386/i386.c (get_builtin_code_for_version): Separate
+	Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
+	Broadwell from Haswell. 
+
 2014-01-25  Walter Lee  w...@tilera.com
 
 	* config/tilegx/sync.md (atomic_fetch_sub): Fix negation and
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cf79486..50f3624 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31298,18 +31298,27 @@ get_builtin_code_for_version (tree decl, tree *predicate_list)
 	  priority = P_PROC_SSSE3;
 	  break;
 	case PROCESSOR_NEHALEM:
-	  /* We translate arch=corei7 and arch=nehelam to
-		 corei7 so that it will be mapped to M_INTEL_COREI7
-		 as cpu type to cover all M_INTEL_COREI7_XXXs.  */
-	  arg_str = corei7;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_AES)
+		arg_str = westmere;
+	  else
+		/* We translate arch=corei7 and arch=nehelam to
+		   corei7 so that it will be mapped to M_INTEL_COREI7
+		   as cpu type to cover all M_INTEL_COREI7_XXXs.  */
+		arg_str = corei7;
 	  priority = P_PROC_SSE4_2;
 	  break;
 	case PROCESSOR_SANDYBRIDGE:
-	  arg_str = sandybridge;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_F16C)
+		arg_str = ivybridge;
+	  else
+		arg_str = sandybridge;
 	  priority = P_PROC_AVX;
 	  break;
 	case PROCESSOR_HASWELL:
-	  arg_str = haswell;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_ADX)
+		arg_str = broadwell;
+	  else
+		arg_str = haswell;
 	  priority = P_PROC_AVX2;
 	  break;
 	case PROCESSOR_BONNELL:
diff --git a/gcc/testsuite/g++.dg/ext/mv16.C b/gcc/testsuite/g++.dg/ext/mv16.C
new file mode 100644
index 000..4737c79
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/mv16.C
@@ -0,0 +1,69 @@
+// Test that dispatching can choose the right multiversion
+// for Intel CPUs with the same internal GCC processor id
+// but slighly different sets of x86 extensions.
+
+// { dg-do run { target i?86-*-* x86_64-*-* } }
+// { dg-require-ifunc  }
+// { dg-options -O2 }
+
+#include assert.h
+
+int __attribute__ ((target(default)))
+foo ()
+{
+return 0;
+}
+
+int __attribute__ ((target(arch=nehalem)))
+foo ()
+{
+return 4;
+}
+
+int __attribute__ ((target(arch=westmere)))
+foo ()
+{
+return 5;
+}
+
+int __attribute__ ((target(arch=sandybridge)))
+foo ()
+{
+return 8;
+}
+
+int __attribute__ ((target(arch=ivybridge)))
+foo ()
+{
+return 9;
+}
+
+int __attribute__ ((target(arch=haswell)))
+foo ()
+{
+return 12;
+}
+
+int main ()
+{
+int val = foo ();
+
+if (__builtin_cpu_is (nehalem))
+	assert (val == 4);
+else
+if (__builtin_cpu_is (westmere))
+	assert (val == 5);
+else
+if (__builtin_cpu_is (sandybridge))
+	assert (val == 8);
+else
+if (__builtin_cpu_is (ivybridge))
+	assert (val == 9);
+else
+if (__builtin_cpu_is (haswell))
+	assert (val == 12);
+else
+	assert (val == 0);
+  
+return 0;
+}

Re: [Patch, i386] Separate Intel processor with expanded ISA

2014-01-07 Thread Allan Sandfeld Jensen

No comments? 

On Sunday 29 December 2013, Allan Sandfeld Jensen wrote:
 The function dispatcher might currently choose functions declared with
 target(arch=ivybridge) on a Sandy Bridge CPU. This happens because the
 function is only detected as sandybridge when generated. The attached patch
 detects Westmere, Ivybridge and Broadwell processors based on new ISAs they
 support.
 
 Regards
 `Allan

[Patch, i386] Separate Intel processor with expanded ISA

2013-12-28 Thread Allan Sandfeld Jensen

The function dispatcher might currently choose functions declared with 
target(arch=ivybridge) on a Sandy Bridge CPU. This happens because the 
function is only detected as sandybridge when generated. The attached patch 
detects Westmere, Ivybridge and Broadwell processors based on new ISAs they 
support.

Regards
`Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 206233)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-29  Allan Sandfeld Jensen  sandf...@kde.org
+
+	* config/i386/i386.c (get_builtin_code_for_version): Separate
+	Westmere from Nehalem, Ivy Bridge from Sandy Bridge and
+	Broadwell from Haswell. 
+
 2013-12-28  Eric Botcazou  ebotca...@adacore.com
 
 	* doc/invoke.texi (output file options): Document -fada-spec-parent.
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206233)
+++ gcc/config/i386/i386.c	(working copy)
@@ -30030,18 +30005,27 @@
 	  priority = P_PROC_SSSE3;
 	  break;
 	case PROCESSOR_NEHALEM:
-	  /* We translate arch=corei7 and arch=nehelam to
-		 corei7 so that it will be mapped to M_INTEL_COREI7
-		 as cpu type to cover all M_INTEL_COREI7_XXXs.  */
-	  arg_str = corei7;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_AES)
+		arg_str = westmere;
+	  else
+		/* We translate arch=corei7 and arch=nehelam to
+		   corei7 so that it will be mapped to M_INTEL_COREI7
+		   as cpu type to cover all M_INTEL_COREI7_XXXs.  */
+		arg_str = corei7;
 	  priority = P_PROC_SSE4_2;
 	  break;
 	case PROCESSOR_SANDYBRIDGE:
-	  arg_str = sandybridge;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_F16C)
+		arg_str = ivybridge;
+	  else
+		arg_str = sandybridge;
 	  priority = P_PROC_AVX;
 	  break;
 	case PROCESSOR_HASWELL:
-	  arg_str = haswell;
+	  if (new_target-x_ix86_isa_flags  OPTION_MASK_ISA_ADX)
+		arg_str = broadwell;
+	  else
+		arg_str = haswell;
 	  priority = P_PROC_AVX2;
 	  break;
 	case PROCESSOR_BONNELL:

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-26 Thread Allan Sandfeld Jensen

On Thursday 26 December 2013, Gopalasubramanian, Ganesh wrote:
 Hi,
 
  (get_amd_cpu): Handle AMD_BOBCAT, AMD_JAGUAR, AMDFAM15H_BDVER2 and
  AMDFAM15H_BDVER3.
 
 As mentioned earlier, we would like to stick with BTVER1 and BTVER2 instead
 of using BOBCAT or JAGUAR. Attached patch does the changes.
 
Sorry for missing your comment. Thanks for fixing it. Renaming the comments 
with the AMD family names might be overdoing it though. 

`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-25 Thread Allan Sandfeld Jensen

On Wednesday 25 December 2013, Uros Bizjak wrote:
 On Tue, Dec 24, 2013 at 4:17 PM, Allan Sandfeld Jensen
 
 carew...@gmail.com wrote:
  Will libgcc/config/i386/cpuinfo.c update be a separate patch?
  Should we use a single definition for both i386.c and libgcc?
  
  Currently they need to be in the same patch. But yes, moving the
  definition out to a common header would probably be a good idea to
  reduce potential mismatches in future.
 
 This could be a follow-up patch.
 
  How does the patch get commited after being accepted? It has been many
  years since I last contributed to gcc, and I can not remember the rest
  of the process, and doubt it is still the same.
 
 Please send me ChangeLog entries, and I will commit the patch for you.
 
 Patch is OK for mainline.
 
Ah, sorry I moved the ChangeLog entries to ChangeLog.arch but it didn't get 
included by svn diff. I have attached it.

Christmas greetings
`Allan
gcc/

2013-12-23  Allan Sandfeld Jensen sandf...@kde.org

PR gcc/59422
* config/i386/i386.c: Extend function multiversioning
to better support recent Intel and AMD models.

2013-12-23   H.J. Lu  hongjiu...@intel.com

   * config/i386/i386.c (get_builtin_code_for_version): Handle
   PROCESSOR_HASWELL and PROCESSOR_SILVERMONT.
   (processor_model): Add M_INTEL_COREI7_IVYBRIDGE and
   M_INTEL_COREI7_HASWELL.
   (arch_names_table): Add ivybridge, haswell, bonnell,
   silvermont.

libgcc/

2013-12-23  Allan Sandfeld Jensen sandf...@kde.org

PR gcc/59422
* config/i386/cpuinfo.c: Detect sse4a, fma4, xop and fma
ISAs and recent Intel and AMD models.

2013-12-23   H.J. Lu  hongjiu...@intel.com

   * config/i386/cpuinfo.c (processor_subtypes): Add
   INTEL_COREI7_IVYBRIDGE and INTEL_COREI7_HASWELL.
   (get_intel_cpu): Check Ivy Bridge and Haswell processors.

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-24 Thread Allan Sandfeld Jensen

On Monday 23 December 2013, H.J. Lu wrote:
 
 If you use
 
 {corei7-avx, M_INTEL_COREI7_SANYBRIDGE},
 {core-avx2, M_INTEL_COREI7_HASWELL},
 
 will it cause any problems?  When there are both
 
Actually I seems I don't need these definitions any more after your clean-up 
of Intel architecture names. I have attached patch with them removed (and 
named haswell enums back to corei7_haswell).

If both target(arch=corei7-avx) and target(arch=sandybridge) is present 
the dispatcher appears to choose sandybridge. If you want a warning for 
duplicates in this case, I suggest adding it in a later patch.

`Allan
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206179)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29970,16 +29970,21 @@
 P_SSE3,
 P_SSSE3,
 P_PROC_SSSE3,
-P_SSE4_a,
-P_PROC_SSE4_a,
+P_SSE4_A,
+P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29998,11 +30003,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_A},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30054,26 +30063,50 @@
 	  arg_str = nehalem;
 	  priority = P_PROC_SSE4_2;
 	  break;
-case PROCESSOR_SANDYBRIDGE:
-  arg_str = sandybridge;
-  priority = P_PROC_SSE4_2;
-  break;
+	case PROCESSOR_SANDYBRIDGE:
+	  arg_str = sandybridge;
+	  priority = P_PROC_AVX;
+	  break;
+	case PROCESSOR_HASWELL:
+	  arg_str = haswell;
+	  priority = P_PROC_AVX2;
+	  break;
 	case PROCESSOR_BONNELL:
 	  arg_str = bonnell;
 	  priority = P_PROC_SSSE3;
 	  break;
+	case PROCESSOR_SILVERMONT:
+	  arg_str = silvermont;
+	  priority = P_PROC_SSE4_2;
+	  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
-	  priority = P_PROC_SSE4_a;
+	  priority = P_PROC_SSE4_A;
 	  break;
+	case PROCESSOR_BTVER1:
+	  arg_str = bobcat;
+	  priority = P_PROC_SSE4_A;
+	  break;
+	case PROCESSOR_BTVER2:
+	  arg_str = jaguar;
+	  priority = P_PROC_AVX;
+	  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
+	case PROCESSOR_BDVER3:
+	  arg_str = bdver3;
+	  priority = P_PROC_FMA;
+	  break;
+	case PROCESSOR_BDVER4:
+	  arg_str = bdver4;
+	  priority = P_PROC_AVX2;
+	  break;
 	}  
 	}
 
@@ -30938,6 +30971,10 @@
 F_SSE4_2,
 F_AVX,
 F_AVX2,
+F_SSE4_A,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_MAX
   };
 
@@ -30955,6 +30992,8 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SILVERMONT,
+M_AMD_BOBCAT,
+M_AMD_JAGUAR,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30965,7 +31004,9 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_COREI7_HASWELL
   };
 
   static struct _arch_names_table
@@ -30984,15 +31025,21 @@
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
   {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {ivybridge, M_INTEL_COREI7_IVYBRIDGE},
+  {haswell, M_INTEL_COREI7_HASWELL},
+  {bonnell, M_INTEL_BONNELL},
+  {silvermont, M_INTEL_SILVERMONT},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL},
+  {bobcat, M_AMD_BOBCAT},  
   {amdfam15h, M_AMDFAM15H},
   {bdver1, M_AMDFAM15H_BDVER1},
   {bdver2, M_AMDFAM15H_BDVER2},
   {bdver3, M_AMDFAM15H_BDVER3},
   {bdver4, M_AMDFAM15H_BDVER4},
+  {jaguar, M_AMD_JAGUAR},  
 };
 
   static struct _isa_names_table
@@ -31009,9 +31056,13 @@
   {sse2,   F_SSE2},
   {sse3,   F_SSE3},
   {ssse3,  F_SSSE3},
+  {sse4a,  F_SSE4_A},
   {sse4.1, F_SSE4_1},
   {sse4.2, F_SSE4_2},
   {avx,F_AVX},
+  {fma4,   F_FMA4},
+  {xop,F_XOP},
+  {fma,F_FMA},
   {avx2,   F_AVX2}
 };
 
Index: gcc/testsuite/gcc.target/i386/funcspec-5.c
===
--- gcc/testsuite/gcc.target/i386/funcspec-5.c	(revision 206179)
+++ gcc/testsuite/gcc.target/i386/funcspec-5.c	(working copy)
@@ -17,7 +17,9 @@

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-24 Thread Allan Sandfeld Jensen

On Tuesday 24 December 2013, H.J. Lu wrote:
 
 Will libgcc/config/i386/cpuinfo.c update be a separate patch?
 Should we use a single definition for both i386.c and libgcc?

Currently they need to be in the same patch. But yes, moving the definition 
out to a common header would probably be a good idea to reduce potential 
mismatches in future.

How does the patch get commited after being accepted? It has been many years 
since I last contributed to gcc, and I can not remember the rest of the 
process, and doubt it is still the same.

Regards
`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-23 Thread Allan Sandfeld Jensen

On Monday 23 December 2013, H.J. Lu wrote:
 On Thu, Dec 19, 2013 at 11:20:39AM +0100, Allan Sandfeld Jensen wrote:
  On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
Sorry, I must have been looking at an older version, but as I said I
already did enable it in the latest patch. (see
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )
   
   Sorry for causing another revision but we would like to stick with
   btver1 and btver2 rather than BOBCAT or JAGUAR. Therefore the
   changes would be like
  
  I will need to make an updated patch to move the new ISAs to the end of
  the list anyway. I will send it in a few days to give AMD or Intel
  developers time to comment on the current version.
 
 I renamed Intel processor names. Please update your patch.  Here is my
 patch to add more Intel processor support.  You can add it to your
 patch.
 
Updated patch attached. Rebased, fixed coding style, moved new ISA enums to 
the end and applied H.J.Lu's patch.

`Allan
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206179)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29970,16 +29970,21 @@
 P_SSE3,
 P_SSSE3,
 P_PROC_SSSE3,
-P_SSE4_a,
-P_PROC_SSE4_a,
+P_SSE4_A,
+P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29998,11 +30003,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_A},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30054,26 +30063,50 @@
 	  arg_str = nehalem;
 	  priority = P_PROC_SSE4_2;
 	  break;
-case PROCESSOR_SANDYBRIDGE:
-  arg_str = sandybridge;
-  priority = P_PROC_SSE4_2;
-  break;
+	case PROCESSOR_SANDYBRIDGE:
+	  arg_str = sandybridge;
+	  priority = P_PROC_AVX;
+	  break;
+	case PROCESSOR_HASWELL:
+	  arg_str = haswell;
+	  priority = P_PROC_SSE4_2;
+	  break;
 	case PROCESSOR_BONNELL:
 	  arg_str = bonnell;
 	  priority = P_PROC_SSSE3;
 	  break;
+	case PROCESSOR_SILVERMONT:
+	  arg_str = silvermont;
+	  priority = P_PROC_SSE4_2;
+	  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
-	  priority = P_PROC_SSE4_a;
+	  priority = P_PROC_SSE4_A;
 	  break;
+	case PROCESSOR_BTVER1:
+	  arg_str = bobcat;
+	  priority = P_PROC_SSE4_A;
+	  break;
+	case PROCESSOR_BTVER2:
+	  arg_str = jaguar;
+	  priority = P_PROC_AVX;
+	  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
+	case PROCESSOR_BDVER3:
+	  arg_str = bdver3;
+	  priority = P_PROC_FMA;
+	  break;
+	case PROCESSOR_BDVER4:
+	  arg_str = bdver4;
+	  priority = P_PROC_AVX2;
+	  break;
 	}  
 	}
 
@@ -30938,6 +30971,10 @@
 F_SSE4_2,
 F_AVX,
 F_AVX2,
+F_SSE4_A,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_MAX
   };
 
@@ -30955,6 +30992,10 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SILVERMONT,
+M_INTEL_COREI7_AVX,
+M_INTEL_CORE_AVX2,
+M_AMD_BOBCAT,
+M_AMD_JAGUAR,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30965,7 +31006,9 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_CORE_HASWELL
   };
 
   static struct _arch_names_table
@@ -30983,16 +31026,24 @@
   {corei7, M_INTEL_COREI7},
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
+  {corei7-avx, M_INTEL_COREI7_AVX},  
   {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {ivybridge, M_INTEL_COREI7_IVYBRIDGE},
+  {core-avx2, M_INTEL_CORE_AVX2},
+  {haswell, M_INTEL_CORE_HASWELL},
+  {bonnell, M_INTEL_BONNELL},
+  {silvermont, M_INTEL_SILVERMONT},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL},
+  {bobcat, M_AMD_BOBCAT},  
   {amdfam15h, M_AMDFAM15H},
   {bdver1, M_AMDFAM15H_BDVER1},
   {bdver2, M_AMDFAM15H_BDVER2},
   {bdver3, M_AMDFAM15H_BDVER3},
   {bdver4, M_AMDFAM15H_BDVER4},
+  {jaguar, M_AMD_JAGUAR},  
 };
 
   static struct _isa_names_table
@@ -31009,9

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-23 Thread Allan Sandfeld Jensen

On Monday 23 December 2013, Allan Sandfeld Jensen wrote:
 On Monday 23 December 2013, H.J. Lu wrote:
  On Thu, Dec 19, 2013 at 11:20:39AM +0100, Allan Sandfeld Jensen wrote:
   On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
 Sorry, I must have been looking at an older version, but as I said
 I already did enable it in the latest patch. (see
 http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )

Sorry for causing another revision but we would like to stick with
btver1 and btver2 rather than BOBCAT or JAGUAR. Therefore the
changes would be like
   
   I will need to make an updated patch to move the new ISAs to the end of
   the list anyway. I will send it in a few days to give AMD or Intel
   developers time to comment on the current version.
  
  I renamed Intel processor names. Please update your patch.  Here is my
  patch to add more Intel processor support.  You can add it to your
  patch.
 
 Updated patch attached. Rebased, fixed coding style, moved new ISA enums to
 the end and applied H.J.Lu's patch.
 
Fixed merging mistake that left haswell with SSE4_2 priority.

`Allan
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206179)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29970,16 +29970,21 @@
 P_SSE3,
 P_SSSE3,
 P_PROC_SSSE3,
-P_SSE4_a,
-P_PROC_SSE4_a,
+P_SSE4_A,
+P_PROC_SSE4_A,
 P_SSE4_1,
 P_SSE4_2,
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29998,11 +30003,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_A},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30054,26 +30063,50 @@
 	  arg_str = nehalem;
 	  priority = P_PROC_SSE4_2;
 	  break;
-case PROCESSOR_SANDYBRIDGE:
-  arg_str = sandybridge;
-  priority = P_PROC_SSE4_2;
-  break;
+	case PROCESSOR_SANDYBRIDGE:
+	  arg_str = sandybridge;
+	  priority = P_PROC_AVX;
+	  break;
+	case PROCESSOR_HASWELL:
+	  arg_str = haswell;
+	  priority = P_PROC_AVX2;
+	  break;
 	case PROCESSOR_BONNELL:
 	  arg_str = bonnell;
 	  priority = P_PROC_SSSE3;
 	  break;
+	case PROCESSOR_SILVERMONT:
+	  arg_str = silvermont;
+	  priority = P_PROC_SSE4_2;
+	  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
-	  priority = P_PROC_SSE4_a;
+	  priority = P_PROC_SSE4_A;
 	  break;
+	case PROCESSOR_BTVER1:
+	  arg_str = bobcat;
+	  priority = P_PROC_SSE4_A;
+	  break;
+	case PROCESSOR_BTVER2:
+	  arg_str = jaguar;
+	  priority = P_PROC_AVX;
+	  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
+	case PROCESSOR_BDVER3:
+	  arg_str = bdver3;
+	  priority = P_PROC_FMA;
+	  break;
+	case PROCESSOR_BDVER4:
+	  arg_str = bdver4;
+	  priority = P_PROC_AVX2;
+	  break;
 	}  
 	}
 
@@ -30938,6 +30971,10 @@
 F_SSE4_2,
 F_AVX,
 F_AVX2,
+F_SSE4_A,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_MAX
   };
 
@@ -30955,6 +30992,10 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SILVERMONT,
+M_INTEL_COREI7_AVX,
+M_INTEL_CORE_AVX2,
+M_AMD_BOBCAT,
+M_AMD_JAGUAR,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30965,7 +31006,9 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_CORE_HASWELL
   };
 
   static struct _arch_names_table
@@ -30983,16 +31026,24 @@
   {corei7, M_INTEL_COREI7},
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
+  {corei7-avx, M_INTEL_COREI7_AVX},  
   {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {ivybridge, M_INTEL_COREI7_IVYBRIDGE},
+  {core-avx2, M_INTEL_CORE_AVX2},
+  {haswell, M_INTEL_CORE_HASWELL},
+  {bonnell, M_INTEL_BONNELL},
+  {silvermont, M_INTEL_SILVERMONT},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL},
+  {bobcat, M_AMD_BOBCAT},  
   {amdfam15h, M_AMDFAM15H},
   {bdver1, M_AMDFAM15H_BDVER1},
   {bdver2, M_AMDFAM15H_BDVER2},
   {bdver3

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-23 Thread Allan Sandfeld Jensen

On Monday 23 December 2013, H.J. Lu wrote:
 On Mon, Dec 23, 2013 at 8:57 AM, Allan Sandfeld Jensen
 
 carew...@gmail.com wrote:
  On Monday 23 December 2013, Allan Sandfeld Jensen wrote:
  On Monday 23 December 2013, H.J. Lu wrote:
   On Thu, Dec 19, 2013 at 11:20:39AM +0100, Allan Sandfeld Jensen wrote:
On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
  Sorry, I must have been looking at an older version, but as I
  said I already did enable it in the latest patch. (see
  http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )
 
 Sorry for causing another revision but we would like to stick with
 btver1 and btver2 rather than BOBCAT or JAGUAR. Therefore
 the changes would be like

I will need to make an updated patch to move the new ISAs to the end
of the list anyway. I will send it in a few days to give AMD or
Intel developers time to comment on the current version.
   
   I renamed Intel processor names. Please update your patch.  Here is my
   patch to add more Intel processor support.  You can add it to your
   patch.
  
  Updated patch attached. Rebased, fixed coding style, moved new ISA enums
  to the end and applied H.J.Lu's patch.
  
  Fixed merging mistake that left haswell with SSE4_2 priority.
  
  `Allan
 
 +M_INTEL_COREI7_AVX,
 +M_INTEL_CORE_AVX2,
 
 Do we need them?   M_INTEL_COREI7_AVX is the same
 M_INTEL_COREI7_SANDYBRIDGE and M_INTEL_CORE_AVX2
 is the same as M_INTEL_COREI7_HASWELL.
 
M_INTEL_COREI7_AVX is the common model for both sandybridge and ivybridge. 
Matching PROCESSOR_SANDYBRIDGE, or march=corei7-avx. Similarly 
M_INTEL_CORE_AVX2 is the common model for haswell and broadwell, matching 
PROCESSOR_HASWELL or march=core-avx2.

 +M_INTEL_CORE_HASWELL
 
 Please change M_INTEL_CORE_HASWELL to M_INTEL_COREI7_HASWELL.
 
I used the name core_haswell to make its prefix match that of its model 
core_avx2 (as opposed to corei7_avx for instance).

 +  {corei7-avx, M_INTEL_COREI7_AVX},
 +  {core-avx2, M_INTEL_CORE_AVX2},
 
 Why do we need them?

Without the existence of these entries, __attribute__((target(corei7-avx))) 
or __attribute__((target(core-avx2)) failed to compile because of how 
parameters to attributes were verified.

Regards
`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-19 Thread Allan Sandfeld Jensen

On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
  Yes, I changed that in the last patch, though I consider it momentarily
  problematic because you do not yet enable AVX with march=btver2 (AVX
  versions would currently be better than btver2 versions for a btver2
  arch), but expect
 
 march=btver2 will be fixed soon.
 
 The  processor_alias_table entry for btver2 in i386.c enables AVX.
 
 snip
   {btver2, PROCESSOR_BTVER2, CPU_BTVER2,
 PTA_64BIT | PTA_MMX |  PTA_SSE  | PTA_SSE2 | PTA_SSE3
 
 | PTA_SSSE3 | PTA_SSE4A |PTA_ABM | PTA_CX16 | PTA_SSE4_1
 | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX
 | PTA_BMI | PTA_F16C | PTA_MOVBE | PTA_PRFCHW
 | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT},
 
 /snip
 
 The assembly listing for a simple test (compiled with -march=btver2) also
 has -mavx enabled. So, can you please enable AVX for btver2?
 
Sorry, I must have been looking at an older version, but as I said I already 
did enable it in the latest patch. (see 
http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )

`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-19 Thread Allan Sandfeld Jensen

On Thursday 19 December 2013, Gopalasubramanian, Ganesh wrote:
  Sorry, I must have been looking at an older version, but as I said I
  already did enable it in the latest patch. (see
  http://gcc.gnu.org/ml/gcc-patches/2013-12/msg01577.html )
 
 Sorry for causing another revision but we would like to stick with btver1
 and btver2 rather than BOBCAT or JAGUAR. Therefore the changes would
 be like
 
I will need to make an updated patch to move the new ISAs to the end of the 
list anyway. I will send it in a few days to give AMD or Intel developers time 
to comment on the current version.

`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Allan Sandfeld Jensen

Update patch. Solved __attribute((target(arch=corei7-avx))) by defining 
proper architectures for the recent Intel families instead of renaming 
submodels. 

I am thinking the patch is starting to touch a bit many different details, 
perhaps it should be split up, or is it good as is?

Regards
`Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 206065)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen sandf...@kde.org
+
+PR gcc/59422
+* config/i386/i386.c: Extend function multiversioning
+to better support recent Intel and AMD models.
+
 2013-12-17  Jan Hubicka  hubi...@ucw.cz
 
 	* ipa-devirt.c (get_polymorphic_call_info): Fix offset calculatoin
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 206065)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29965,9 +29965,14 @@
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29986,11 +29991,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_a},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30044,25 +30053,49 @@
 	  break;
 case PROCESSOR_COREI7_AVX:
   arg_str = corei7-avx;
-  priority = P_PROC_SSE4_2;
+  priority = P_PROC_AVX;
   break;
+case PROCESSOR_HASWELL:
+  arg_str = core-avx2;
+  priority = P_PROC_AVX2;
+  break;
 	case PROCESSOR_ATOM:
 	  arg_str = atom;
 	  priority = P_PROC_SSSE3;
 	  break;
+case PROCESSOR_SLM:
+  arg_str = slm;
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
 	  priority = P_PROC_SSE4_a;
 	  break;
+case PROCESSOR_BTVER1:
+  arg_str = bobcat;
+  priority = P_PROC_SSE4_a;
+  break;
+case PROCESSOR_BTVER2:
+  arg_str = jaguar;
+  priority = P_PROC_AVX;
+  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
-	}  
+case PROCESSOR_BDVER3:
+  arg_str = bdver3;
+  priority = P_PROC_FMA;
+  break;
+case PROCESSOR_BDVER4:
+  arg_str = bdver4;
+  priority = P_PROC_AVX2;
+  break;
+}  
 	}
 
   cl_target_option_restore (global_options, cur_target);
@@ -30922,9 +30955,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX
   };
@@ -30943,6 +30980,10 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SLM,
+M_INTEL_COREI7_AVX,
+M_INTEL_CORE_AVX2,
+M_AMD_BOBCAT,
+M_AMD_JAGUAR,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30953,7 +30994,9 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_CORE_HASWELL
   };
 
   static struct _arch_names_table
@@ -30971,11 +31014,17 @@
   {corei7, M_INTEL_COREI7},
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
+  {corei7-avx, M_INTEL_COREI7_AVX},
   {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {ivybridge, M_INTEL_COREI7_IVYBRIDGE},
+  {core-avx2, M_INTEL_CORE_AVX2},
+  {haswell, M_INTEL_CORE_HASWELL},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL},
+  {bobcat, M_AMD_BOBCAT},
+  {jaguar, M_AMD_JAGUAR},
   {amdfam15h, M_AMDFAM15H},
   {bdver1, M_AMDFAM15H_BDVER1},
   {bdver2, M_AMDFAM15H_BDVER2},
@@ -30997,9 +31046,13 @@
   {sse2,   F_SSE2},
   {sse3,   F_SSE3},
   {ssse3,  F_SSSE3},
+  {sse4a,  F_SSE4_a},
   {sse4.1, F_SSE4_1},
   {sse4.2, F_SSE4_2},
   {avx,F_AVX},
+  {fma4,   F_FMA4},
+  {xop,F_XOP},
+  {fma,F_FMA},
   {avx2,   F_AVX2}
 };
 
Index: gcc/testsuite/gcc.target/i386/funcspec-5.c
===
--- gcc/testsuite/gcc.target/i386/funcspec-5.c

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-18 Thread Allan Sandfeld Jensen

On Wednesday 18 December 2013, Gopalasubramanian, Ganesh wrote:
 Ping!
 
 Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:
  Yes, I figured that was the original idea behind it, but the final family
  of the jaguar processors seems to have become 16h instead of 14h
  (bobcat) at some point.
 
 Yes. It is amdfam16h. I was supposed to pass on some comments on the patch.
 1. Amdfam16h for Jaguar.
 2. For Jaguar, the priority needs to be AVX (AVX got included into the
 Jaguar ISA).
 
Yes, I changed that in the last patch, though I consider it momentarily 
problematic because you do not yet enable AVX with march=btver2 (AVX versions 
would currently be better than btver2 versions for a btver2 arch), but expect 
march=btver2 will be fixed soon.

Regards
'Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-17 Thread Allan Sandfeld Jensen

On Monday 16 December 2013, Uros Bizjak wrote:
 On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak ubiz...@gmail.com wrote:
  On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
  
  carew...@gmail.com wrote:
  Hi again
  
  On Wednesday 11 December 2013, Uros Bizjak wrote:
  Hello!
  
   PR gcc/59422
   
   This patch extends the supported targets for function multi versiong
   to also include Haswell, Silvermont, and the most recent AMD models.
   It also prioritizes AVX2 versions over AMD specific pre-AVX2
   versions.
  
  Please add a ChangeLog entry and attach the complete patch. Please
  also state how you tested the patch, as outlined in the instructions
  [1].
  
  [1] http://gcc.gnu.org/contribute.html
  
  Updated patch for better CPU model detection and added ChangeLog.
  
  The patch has been tested with the attached test.cpp. Verified that it
  doesn't build before the patch, and that it builds after, and verified
  it selects correct versions at runtime based on either CPU model or
  supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
  II).
  
  Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is
  that an old term for what has become the Jaguar architecture?
  
  Thanks for the patch!
  
  However, you should not change the existing order of enums in
  cpuinfo.c (enum processor_vendor, enum processor_types, enum
  processor_subtypes, enum processor_features), but new entries should
  be added at the end (before *_MAX entry, if exists) of the enum. The
  enums (enum processor_features and enum processor_model) in
  config/i386/i386.c should mirror these changes. Please see [1].
  
  Probably, we should document this in the source...
  
  -  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
  +  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
  
  Huh... Thanks for catching this. -march=sandybridge is not recognized...
 
 -  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
 +  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
 +  {core-avx-i, M_INTEL_COREI7_IVYBRIDGE},
 +  {core-avx2, M_INTEL_COREI7_HASWELL},
 
 Ah, no. These names are not intended to be used in -march. We can
 follow the tradition and use sandybridge, ivybridge and haswell here.
 
I had the problem that arch=corei7-avx was not recognized as a valid 
property argument until I made that change. I thought it was the intend to 
merge this list of models with the canonical names, but perhaps it is an error 
in the new parameter validation?

Note that similarly arch=sandybridge is accepted as a valid property 
argument but then fails as an invalid argument for march.

Regards
`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-17 Thread Allan Sandfeld Jensen

On Tuesday 17 December 2013, Allan Sandfeld Jensen wrote:
 On Monday 16 December 2013, Uros Bizjak wrote:
  On Mon, Dec 16, 2013 at 10:34 AM, Uros Bizjak ubiz...@gmail.com wrote:
   On Sun, Dec 15, 2013 at 7:54 PM, Allan Sandfeld Jensen
   
   carew...@gmail.com wrote:
   Hi again
   
   On Wednesday 11 December 2013, Uros Bizjak wrote:
   Hello!
   
PR gcc/59422

This patch extends the supported targets for function multi
versiong to also include Haswell, Silvermont, and the most recent
AMD models. It also prioritizes AVX2 versions over AMD specific
pre-AVX2 versions.
   
   Please add a ChangeLog entry and attach the complete patch. Please
   also state how you tested the patch, as outlined in the instructions
   [1].
   
   [1] http://gcc.gnu.org/contribute.html
   
   Updated patch for better CPU model detection and added ChangeLog.
   
   The patch has been tested with the attached test.cpp. Verified that it
   doesn't build before the patch, and that it builds after, and verified
   it selects correct versions at runtime based on either CPU model or
   supported ISA (tested on 3 machines: SandyBridge, IvyBridge and Phenom
   II).
   
   Btw, I couldn't find anything that corresponds to gcc's btver2 arch.
   Is that an old term for what has become the Jaguar architecture?
   
   Thanks for the patch!
   
   However, you should not change the existing order of enums in
   cpuinfo.c (enum processor_vendor, enum processor_types, enum
   processor_subtypes, enum processor_features), but new entries should
   be added at the end (before *_MAX entry, if exists) of the enum. The
   enums (enum processor_features and enum processor_model) in
   config/i386/i386.c should mirror these changes. Please see [1].
   
   Probably, we should document this in the source...
   
   -  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
   +  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
   
   Huh... Thanks for catching this. -march=sandybridge is not
   recognized...
  
  -  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
  +  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
  +  {core-avx-i, M_INTEL_COREI7_IVYBRIDGE},
  +  {core-avx2, M_INTEL_COREI7_HASWELL},
  
  Ah, no. These names are not intended to be used in -march. We can
  follow the tradition and use sandybridge, ivybridge and haswell here.
 
 I had the problem that arch=corei7-avx was not recognized as a valid
 property argument until I made that change. I thought it was the intend to
 merge this list of models with the canonical names, but perhaps it is an
 error in the new parameter validation?
 
Ah, sorry. I think I misremembered the problem. After reviewing the code 
again, I think the only problem is with target(arch=core-avx-i) because it 
is not in the list of architectures (because it is treated as the same 
architecture as corei7-avx presumably).

I will revert the sandybridge name change in the next patch, and make the new 
names match.

`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-16 Thread Allan Sandfeld Jensen

Patch updated to keep libgcc enums backwards compatible.

`Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 205984)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen sandf...@kde.org
+
+PR gcc/59422
+* config/i386/i386.c: Extend function multiversioning
+to better support recent Intel and AMD models.
+
 2013-12-14  Marek Polacek  pola...@redhat.com
 
 	PR sanitizer/59503
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 205984)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29962,9 +29962,14 @@
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29983,11 +29988,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_a},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30041,25 +30050,49 @@
 	  break;
 case PROCESSOR_COREI7_AVX:
   arg_str = corei7-avx;
-  priority = P_PROC_SSE4_2;
+  priority = P_PROC_AVX;
   break;
+case PROCESSOR_HASWELL:
+  arg_str = core-avx2;
+  priority = P_PROC_AVX2;
+  break;
 	case PROCESSOR_ATOM:
 	  arg_str = atom;
 	  priority = P_PROC_SSSE3;
 	  break;
+case PROCESSOR_SLM:
+  arg_str = slm;
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
 	  priority = P_PROC_SSE4_a;
 	  break;
+case PROCESSOR_BTVER1:
+  arg_str = btver1;
+  priority = P_PROC_SSE4_a;
+  break;
+case PROCESSOR_BTVER2:
+  arg_str = btver2;
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
-	}  
+case PROCESSOR_BDVER3:
+  arg_str = bdver3;
+  priority = P_PROC_FMA;
+  break;
+case PROCESSOR_BDVER4:
+  arg_str = bdver4;
+  priority = P_PROC_AVX2;
+  break;
+}  
 	}
 
   cl_target_option_restore (global_options, cur_target);
@@ -30919,9 +30952,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX
   };
@@ -30940,6 +30977,7 @@
 M_AMDFAM10H,
 M_AMDFAM15H,
 M_INTEL_SLM,
+M_AMDFAM14H,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
@@ -30950,7 +30988,10 @@
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
-M_AMDFAM15H_BDVER4
+M_AMDFAM15H_BDVER4,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_COREI7_HASWELL,
+M_AMDFAM14H_BTVER1,
   };
 
   static struct _arch_names_table
@@ -30968,11 +31009,15 @@
   {corei7, M_INTEL_COREI7},
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
-  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
+  {core-avx-i, M_INTEL_COREI7_IVYBRIDGE},
+  {core-avx2, M_INTEL_COREI7_HASWELL},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL},
+  {amdfam14h, M_AMDFAM14H},
+  {btver1, M_AMDFAM14H_BTVER1},
   {amdfam15h, M_AMDFAM15H},
   {bdver1, M_AMDFAM15H_BDVER1},
   {bdver2, M_AMDFAM15H_BDVER2},
@@ -30994,9 +31039,13 @@
   {sse2,   F_SSE2},
   {sse3,   F_SSE3},
   {ssse3,  F_SSSE3},
+  {sse4a,  F_SSE4_a},
   {sse4.1, F_SSE4_1},
   {sse4.2, F_SSE4_2},
   {avx,F_AVX},
+  {fma4,   F_FMA4},
+  {xop,F_XOP},
+  {fma,F_FMA},
   {avx2,   F_AVX2}
 };
 
Index: libgcc/ChangeLog
===
--- libgcc/ChangeLog	(revision 205984)
+++ libgcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen sandf...@kde.org
+
+PR gcc/59422
+* config/i386/cpuinfo.c: Detect sse4a, fma4, xop and fma
+ISAs and recent Intel and AMD models.
+
 2013-12-12  Zhenqiang Chen  zhenqiang.c...@arm.com
 
 	* config.host (arm*-*-uclinux*): Move t-arm before t-bpabi.
Index

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-16 Thread Allan Sandfeld Jensen

On Monday 16 December 2013, Gopalasubramanian, Ganesh wrote:
  Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is
  that an old term for what has become the Jaguar architecture?
 
 Yes, btver2 = jaguar. We have the name as per its family name (i.e,
 bobcat family) in GCC. Similarly we have the names bdver2 =
 piledriver, bdver3 = steamroller as per their family (bulldozer)
 name.
 
Yes, I figured that was the original idea behind it, but the final family of 
the jaguar processors seems to have become 16h instead of 14h (bobcat) at some 
point.

Regards
`Allan

Re: [Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-15 Thread Allan Sandfeld Jensen

Hi again
On Wednesday 11 December 2013, Uros Bizjak wrote:
 Hello!
 
  PR gcc/59422
  
  This patch extends the supported targets for function multi versiong to
  also include Haswell, Silvermont, and the most recent AMD models. It
  also prioritizes AVX2 versions over AMD specific pre-AVX2 versions.
 
 Please add a ChangeLog entry and attach the complete patch. Please
 also state how you tested the patch, as outlined in the instructions
 [1].
 
 [1] http://gcc.gnu.org/contribute.html
 
Updated patch for better CPU model detection and added ChangeLog.

The patch has been tested with the attached test.cpp. Verified that it doesn't 
build before the patch, and that it builds after, and verified it selects 
correct versions at runtime based on either CPU model or supported ISA (tested 
on 3 machines: SandyBridge, IvyBridge and Phenom II).

Btw, I couldn't find anything that corresponds to gcc's btver2 arch. Is that 
an old term for what has become the Jaguar architecture?

`Allan
Index: gcc/ChangeLog
===
--- gcc/ChangeLog	(revision 205984)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2013-12-14  Allan Sandfeld Jensen sandf...@kde.org
+
+PR gcc/59422
+* config/i386/i386.c: Extend function multiversioning
+to better support recent Intel and AMD models.
+
 2013-12-14  Marek Polacek  pola...@redhat.com
 
 	PR sanitizer/59503
Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c	(revision 205984)
+++ gcc/config/i386/i386.c	(working copy)
@@ -29962,9 +29962,14 @@
 P_PROC_SSE4_2,
 P_POPCNT,
 P_AVX,
+P_PROC_AVX,
+P_FMA4,
+P_XOP,
+P_PROC_XOP,
+P_FMA,
+P_PROC_FMA,
 P_AVX2,
-P_FMA,
-P_PROC_FMA
+P_PROC_AVX2
   };
 
  enum feature_priority priority = P_ZERO;
@@ -29983,11 +29988,15 @@
   {sse, P_SSE},
   {sse2, P_SSE2},
   {sse3, P_SSE3},
+  {sse4a, P_SSE4_a},
   {ssse3, P_SSSE3},
   {sse4.1, P_SSE4_1},
   {sse4.2, P_SSE4_2},
   {popcnt, P_POPCNT},
   {avx, P_AVX},
+  {fma4, P_FMA4},
+  {xop, P_XOP},
+  {fma, P_FMA},
   {avx2, P_AVX2}
 };
 
@@ -30041,25 +30050,49 @@
 	  break;
 case PROCESSOR_COREI7_AVX:
   arg_str = corei7-avx;
-  priority = P_PROC_SSE4_2;
+  priority = P_PROC_AVX;
   break;
+case PROCESSOR_HASWELL:
+  arg_str = core-avx2;
+  priority = P_PROC_AVX2;
+  break;
 	case PROCESSOR_ATOM:
 	  arg_str = atom;
 	  priority = P_PROC_SSSE3;
 	  break;
+case PROCESSOR_SLM:
+  arg_str = slm;
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_AMDFAM10:
 	  arg_str = amdfam10h;
 	  priority = P_PROC_SSE4_a;
 	  break;
+case PROCESSOR_BTVER1:
+  arg_str = btver1;
+  priority = P_PROC_SSE4_a;
+  break;
+case PROCESSOR_BTVER2:
+  arg_str = btver2;
+  priority = P_PROC_SSE4_2;
+  break;
 	case PROCESSOR_BDVER1:
 	  arg_str = bdver1;
-	  priority = P_PROC_FMA;
+	  priority = P_PROC_XOP;
 	  break;
 	case PROCESSOR_BDVER2:
 	  arg_str = bdver2;
 	  priority = P_PROC_FMA;
 	  break;
-	}  
+case PROCESSOR_BDVER3:
+  arg_str = bdver3;
+  priority = P_PROC_FMA;
+  break;
+case PROCESSOR_BDVER4:
+  arg_str = bdver4;
+  priority = P_PROC_AVX2;
+  break;
+}  
 	}
 
   cl_target_option_restore (global_options, cur_target);
@@ -30919,9 +30952,13 @@
 F_SSE2,
 F_SSE3,
 F_SSSE3,
+F_SSE4_a,
 F_SSE4_1,
 F_SSE4_2,
 F_AVX,
+F_FMA4,
+F_XOP,
+F_FMA,
 F_AVX2,
 F_MAX
   };
@@ -30938,15 +30975,20 @@
 M_INTEL_CORE2,
 M_INTEL_COREI7,
 M_AMDFAM10H,
+M_AMDFAM14H,
 M_AMDFAM15H,
 M_INTEL_SLM,
 M_CPU_SUBTYPE_START,
 M_INTEL_COREI7_NEHALEM,
 M_INTEL_COREI7_WESTMERE,
 M_INTEL_COREI7_SANDYBRIDGE,
+M_INTEL_COREI7_IVYBRIDGE,
+M_INTEL_COREI7_HASWELL,
 M_AMDFAM10H_BARCELONA,
 M_AMDFAM10H_SHANGHAI,
 M_AMDFAM10H_ISTANBUL,
+M_AMDFAM14H_BTVER1,
+M_AMDFAM14H_BTVER2,
 M_AMDFAM15H_BDVER1,
 M_AMDFAM15H_BDVER2,
 M_AMDFAM15H_BDVER3,
@@ -30968,11 +31010,16 @@
   {corei7, M_INTEL_COREI7},
   {nehalem, M_INTEL_COREI7_NEHALEM},
   {westmere, M_INTEL_COREI7_WESTMERE},
-  {sandybridge, M_INTEL_COREI7_SANDYBRIDGE},
+  {corei7-avx, M_INTEL_COREI7_SANDYBRIDGE},
+  {core-avx-i, M_INTEL_COREI7_IVYBRIDGE},
+  {core-avx2, M_INTEL_COREI7_HASWELL},
   {amdfam10h, M_AMDFAM10H},
   {barcelona, M_AMDFAM10H_BARCELONA},
   {shanghai, M_AMDFAM10H_SHANGHAI},
   {istanbul, M_AMDFAM10H_ISTANBUL

[Patch, i386] PR 59422 - Support more targets for function multi versioning

2013-12-10 Thread Allan Sandfeld Jensen

PR gcc/59422

This patch extends the supported targets for function multi versiong to also 
include Haswell, Silvermont, and the most recent AMD models. It also 
prioritizes AVX2 versions over AMD specific pre-AVX2 versions.

69 matches

Mail list logo