Re: [Darwin, committed] Fix PR87030 and tidy config fragments.

2019-07-23 Thread Iain Sandoe


> On 23 Jul 2019, at 21:31, Iain Sandoe  wrote:
> 
> This is about 32/64b host and multilib support across the range of Darwin
> systems.
> 
> Prior to Darwin8 (OS X 10.4), the toolchains support only PowerPC and only 
> 32b.
> 
> On Darwin8 it is possible to target a 64b multilib, but with support limited
> to a few of the main libraries on the system (not a recommended 
> configuration).
> 
> From Darwin9 to Darwin17 (OSX 10.5 to 10.13) it is possible to have either
> 32 or 64b hosted toolchains, with support for a 64 or 32b multilib 
> respectively.
> 
> On Darwin9 the kernel is 32b, but with support for 64b executables, so it's
> conventional to build a 32b host toolchain supporting a 64b multilib.  However
> this is not enforced (merely a convention).
> 
> There is also some platform hardware supporting Darwin10/11 which is only 32b
> and for which the same situation applies.  However, from Darwin10 to Darwin17,
> the majority of platform hardware supports a 64b kernel and it's conventional
> to build a 64b host toolchain with support for a 32b multilib.
> 
> On/from Darwin18 (OS X 10.14), the development headers (in the SDK) no longer
> expose the interfaces for the 32b multilib support (although sufficient 
> runtime
> support remains installed that the testsuite can be run for a 32b multilib).
> 
> The PR is raised against this latter situation since the absence of exposed
> interfaces causes a 'default' bootstrap fail regardless of the availability of
> the runtimes.  Given the number of permutations, I felt it warranted a general
> solution, especially since the current scheme of target headers and t-make
> fragments has become somewhat messy.
> 
> The changes here enforce the single 32b PowerPC multilib for Darwin < 8 and 
> the
> single X86 64b multilib for Darwin >= 18.  This means that there is no longer
> any need to configure Darwin18+ '--disable-multilib', but also that if you 
> want
> to use the ability to continue to test the compiler's 32b multilib there, you
> need to make a configuration targeting an earlier OS version (and using the
> SDK from that).
> 
> It has been tested across a range of Darwin systems (back to Darwin9, since 
> self-
> hosting on Darwin8 and 7 is currently in need of some tweaks).
> 
> applied to mainline
> thanks
> Iain

I missed committing the changes to one file and some comments, and failed to 
add the
patch here too.  corrected below.
Iain

> 
> gcc/
> 
>   PR bootstrap/87030
>   * config.gcc (*-*-darwin*): Don't include CPU t-darwin here.
>   (i[34567]86-*-darwin*): Adjust to use biarch files. Produce
>   an error message if i686-darwin configuration is attempted for
>   Darwin >= 18.
>   (x86_64-*-darwin*): Switch to single multilib for Darwin >= 18.
>   (powerpc-*-darwin*): Use biarch files where needed.
>   (powerpc64-*-darwin*): Likewise.
>   * config/i386/darwin.h (REAL_LIBGCC_SPEC): Move to new biarch file.
>   (DARWIN_ARCH_SPEC, DARWIN_SUBARCH_SPEC): Revise for default single
>   arch case.
>   * config/i386/darwin32-biarch.h: New.
>   * config/i386/darwin64.h: Rename.
>   * gcc/config/i386/darwin64-biarch.h: To this.
>   * config/i386/t-darwin: Rename.
>   * gcc/config/i386/t-darwin32-biarch: To this.
>   * config/i386/t-darwin64: Rename.
>   * gcc/config/i386/t-darwin64-biarch: To this.
>   * config/rs6000/darwin32-biarch.h: New.
>   * config/rs6000/darwin64.h: Rename.
>   * config/rs6000/darwin64-biarch.h: To this.
>   (DARWIN_ARCH_SPEC, DARWIN_SUBARCH_SPEC): Revise for default single
>   arch case.
>   * config/rs6000/t-darwin8: Rename.
>   * config/rs6000/t-darwin32-biarch: To this.
>   * config/rs6000/t-darwin64 Rename.
>   * config/rs6000/t-darwin64-biarch: To this

diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
index 93bd5588b8..fea3f73854 100644
--- a/gcc/config/i386/darwin.h
+++ b/gcc/config/i386/darwin.h
@@ -139,9 +139,6 @@ along with GCC; see the file COPYING3.  If not see
   " ASM_OPTIONS " -force_cpusubtype_ALL \
   %{static}" ASM_MMACOSX_VERSION_MIN_SPEC
 
-#define DARWIN_ARCH_SPEC "%{m64:x86_64;:i386}"
-#define DARWIN_SUBARCH_SPEC DARWIN_ARCH_SPEC
-
 #undef ENDFILE_SPEC
 #define ENDFILE_SPEC \
   "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
@@ -149,12 +146,15 @@ along with GCC; see the file COPYING3.  If not see
%{mpc64:crtprec64.o%s} \
%{mpc80:crtprec80.o%s}" TM_DESTRUCTOR
 
+#define DARWIN_ARCH_SPEC "x86_64"
+
+/* We default to x86_64 for single-arch builds, bi-arch overrides.  */
 #undef SUBTARGET_EXTRA_SPECS
 #define SUBTARGET_EXTRA_SPECS   \
   DARWIN_EXTRA_SPECS\
-  { "darwin_arch", DARWIN_ARCH_SPEC },  \
+  { "darwin_arch", DARWIN_ARCH_SPEC }, \
   { "darwin_crt2", "" },\
-  { "darwin_subarch", 

[Committed] PR fortran/54072 -- More fun with BOZ

2019-07-23 Thread Steve Kargl
I've committed the attached patch as a follow-up to
the recent BOZ (r273747).  It removes a few leftover
comments as well as fixes the PR.

2019-07-23  Steven G. Kargl  

 PR fortran/54072
 * check.c (gfc_invalid_boz): Fix comment.
 (illegal_boz_arg): New function.
 (gfc_check_transfer): Use to arguments.
 (gfc_check_storage_size): Ditto.
 (gfc_check_complex): Remove leftover comment from BOZ patch.
 * primary.c (match_boz_constant): Remove leftover comment. 


2019-07-23  Steven G. Kargl  

 PR fortran/54072
 * gfortran.dg/illegal_boz_arg_1.f90: New tests.

-- 
Steve
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c	(revision 273747)
+++ gcc/fortran/check.c	(working copy)
@@ -35,10 +35,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "target-memory.h"
 
 /* A BOZ literal constant can appear in a limited number of contexts.
-   gfc_invalid_boz() is a help function to simplify error/warning generation.
-   Note, gfortran accepts the nonstandard 'X' for 'Z' the nonstandard
-   suffix location.  If -fallow-invalid-boz is used, then issue a warning;
-   otherwise issue an error.  */
+   gfc_invalid_boz() is a helper function to simplify error/warning
+   generation.  gfortran accepts the nonstandard 'X' for 'Z', and gfortran
+   allows the BOZ indicator to appear as a suffix.  If -fallow-invalid-boz
+   is used, then issue a warning; otherwise issue an error.  */
 
 bool
 gfc_invalid_boz (const char *msg, locus *loc)
@@ -54,6 +54,20 @@ gfc_invalid_boz (const char *msg, locus *loc)
 }
 
 
+/* Issue an error for an illegal BOZ argument.  */
+static bool
+illegal_boz_arg (gfc_expr *x)
+{
+  if (x->ts.type == BT_BOZ)
+{
+  gfc_error ("BOZ literal constant at %L cannot be an actual argument "
+		 "to %qs", >where, gfc_current_intrinsic);
+  return true;
+}
+
+  return false;
+}
+
 /* Some precedures take two arguments such that both cannot be BOZ.  */
 
 static bool
@@ -2202,8 +2216,6 @@ gfc_check_co_sum (gfc_expr *a, gfc_expr *result_image,
 bool
 gfc_check_complex (gfc_expr *x, gfc_expr *y)
 {
-
-  /* FIXME BOZ.  What to do with complex?  */
   if (!boz_args_check (x, y))
 return false;
 
@@ -5894,6 +5906,12 @@ gfc_check_transfer (gfc_expr *source, gfc_expr *mold, 
   return false;
 }
 
+  if (source->ts.type == BT_BOZ && illegal_boz_arg (source))
+return false;
+
+  if (mold->ts.type == BT_BOZ && illegal_boz_arg (mold))
+return false;
+
   /* MOLD shall be a scalar or array of any type.  */
   if (mold->ts.type == BT_PROCEDURE
   && mold->symtree->n.sym->attr.subroutine == 1)
@@ -7124,6 +7142,9 @@ gfc_check_storage_size (gfc_expr *a, gfc_expr *kind)
 		 gfc_current_intrinsic, >where);
   return false;
 }
+
+  if (a->ts.type == BT_BOZ && illegal_boz_arg (a))
+return false;
 
   if (kind == NULL)
 return true;
Index: gcc/fortran/primary.c
===
--- gcc/fortran/primary.c	(revision 273747)
+++ gcc/fortran/primary.c	(working copy)
@@ -494,7 +494,6 @@ match_boz_constant (gfc_expr **result)
   e->boz.str = XCNEWVEC (char, length + 1);
   strncpy (e->boz.str, buffer, length);
 
-  /* FIXME BOZ.  */
   if (!gfc_in_match_data ()
   && (!gfc_notify_std(GFC_STD_F2003, "BOZ used outside a DATA "
 			  "statement at %L", >where)))
Index: gcc/testsuite/gfortran.dg/illegal_boz_arg_1.f90
===
--- gcc/testsuite/gfortran.dg/illegal_boz_arg_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/illegal_boz_arg_1.f90	(working copy)
@@ -0,0 +1,9 @@
+! { dg-do compile }
+program foo
+   implicit none
+   integer :: i = 42
+   print *, storage_size(z'1234') ! { dg-error "cannot be an actual" }
+   print *, transfer(z'1234', i)  ! { dg-error "cannot be an actual" }
+   print *, transfer(i, z'1234')  ! { dg-error "cannot be an actual" }
+   print *, transfer(i, i, z'1234')   ! { dg-error "must be INTEGER" }
+end program foo


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-23 Thread Andrew MacLeod

On 7/23/19 5:33 AM, Richard Biener wrote:



What irange contains internally is a bit arbitrary.  It's really an API
for managing a set of subranges.  We could have used trees internally
just as easily, then we wouldnt need a type field. Since we were doing
lots of operations, rather than going back and forth from trees all the
time, we just used the underlying wide_int directly.   we could have
fiddle farted around with HOST_WIDE_INT or whatever, but wide_int is
there, has all the operations, and it works fine. so thats what it
currently is on the branch.

But a range has no type.  Period.  The fact that with the in-tree implementation
there's access to a type is artificial.  All workers in the in-tree
implementation
get a type for the operation they carry out.

But somehow irange doesn't get that.

In fact, an irange shouldn't be bound to any type, and the usual
"carry out multiplication of two integer typed vars with the result
in integer" on irange should be multiply two iranges (with unbound
result) plus a "truncate to integer type" operation.



so following thru on the implementation details of doing that,  we do 
the exact same thing that VRP does for multiply.. we eventually call 
wide_int_range_multiplicative_op.

The code from tree-vrp.c:

  wide_int res_lb, res_ub;
  wide_int vr0_lb = wi::to_wide (vr0_min);
  wide_int vr0_ub = wi::to_wide (vr0_max);
  wide_int vr1_lb = wi::to_wide (vr1->min ());
  wide_int vr1_ub = wi::to_wide (vr1->max ());
  bool overflow_undefined = TYPE_OVERFLOW_UNDEFINED (type);
  unsigned prec = TYPE_PRECISION (type);

  if (wide_int_range_multiplicative_op (res_lb, res_ub,
    code, TYPE_SIGN (type), prec,
    vr0_lb, vr0_ub, vr1_lb, vr1_ub,
    overflow_undefined))
    vr->set (VR_RANGE, wide_int_to_tree (type, res_lb),
 wide_int_to_tree (type, res_ub));


Which , lo and behold, it requires a type in order to get the signop 
right in the 4th argument.  we also use the type to figure out the 
precision in the 5th argument, as well as the overflow condition at the 
end.


So it *is* bound to a type in order to do the operation, its just a 
matter of whether you pass that type around, or include it with the 
object.  I simply can't imagine why you think it isn't.


sure, in this case the LHS, vr0, and vr1 are all the same type.. (or 
should be type compatible_p), so we can pass in one type, but not all 
operations follow that pattern... casts, shifts, comparisons, etc can 
have different types in the operand positions.  We include it with each 
range and  we always have accurate info associated with the operand.


How is that  a bad thing?


We are treating a range object as a unique self contained object.
Therefore, the range has a type so we know how to print it, and can
confirm before any operation that the ranges being operated on are
appropriately matched.  We found and opened bugzillas over the past
couple years for places where our code caught bugs because a range was
created and then operated on in a way that was not compatible with
another range.  I think there is a still an open one against ada(?)
where the switch and case  are different precision.

I'm fine with sanity-checking where it makes sense but looking at
the irange code the fact that there is a type is a fundamental requirement.
IMHO that's bad design.


we could remove the type from the range object.. we aren't critically 
tied to it, but then virtually everywhere we pass a range, we'll be 
passing in the type to be operated on, or the sign flag, or overflow 
flag,   or some combination of those.  Its only a matter of time until 
someone gets them out of whack.. It seems far more logical  to simply 
keep the type associated so we can pick up overflow, signop, precision 
and such as needed.. and do some sanity checking along the way.  thats 
what the type field is for after all, to consolidate all the info you 
might want...  Why pass the extra parameters when you don' t need to.




  From my point of view, a range object is similar to a tree node. A tree
node has the bits to indicate what the value is, but also associates a
type with those bits within the same object.  This is less error prone
than passing around the bits and the type separately.  As ranges are
starting to be used in many places outside of VRP, we should do the same
thing with ranges.  WIth value_range it would actually be free since
there is already a tree for the bounds already which contains the type.

Not for VARYING or UNDEFINED.


Personally, Id put it in both for consistency.  I view a range as an 
object we are associating with either an expression or an ssa-name. That 
range should then have a consistent type with that name or expression.  
Even if the range is empty, I would still expect it to be compatible 
since I'm usually associating it with an ssa_name or expression.



Because you seem so dead set 

Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Jeff Law
On 7/23/19 8:23 AM, Martin Liška wrote:
> On 7/23/19 3:57 PM, Jeff Law wrote:
>> On 7/23/19 7:50 AM, Michael Matz wrote:
>>> Hi,
>>>
>>> On Tue, 23 Jul 2019, Jeff Law wrote:
>>>
> great you found time to make this. It should become the default for
> -flto IMO.
 I was going to hack it into the rpm configury bits since we have access
 to the # cores there.  But an auto-selector within GCC is even better.

 BTW, isn't this all going to wreck havoc with reproducible builds since 
 partitioning can affect code generation?  That's one of our open 
 questions...
>>>
>>> See Richi for this, but the reason for -flto=auto (or just -flto, or 
>>> whatever the endresult will be) is _also_ reproducible builds, because 
>>> some packages like to encode the compile flags into their binaries and 
>>> hence would change depending on build host just because of "-flto=32" vs. 
>>> "-flto=64" even when the code remains exactly the same.
>> Makes sense.
>>
>> What did you end up doing with old autoconf scripts that aren't LTO
>> safe?  Many of the older style tests to see if a function exists are
>> broken by LTO.  I've seen more issues with this than anything in the LTO
>> space so far.
> 
> Well, I've seen some of these failures, but only a few.
Many appear to be silent, possibly not really affecting anything (like
all the packages that test for doprnt, but really don't care about it in
the end).But there were enough real failures that I put in auditing
to detect any cases where we get different config.h files with LTO vs
non-LTO and that is tripping often enough to have my concerns about how
much work it's going to be to get everything fixed.


But still, overall we're moving forward.  Next step is to get everything
classified into buckets and start iterating.  Presumably you'd be open
to a google doc of some kind where we can coordinate any such efforts?

jeff

ps.  I'm on PTO July 25 to Aug 5, so not much is going to happen in the
next couple weeks :-)


Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Allan Sandfeld Jensen
On Dienstag, 23. Juli 2019 10:30:07 CEST Martin Liška wrote:
> Hi.
> 
> As we as openSUSE started using -flto, I see it very handy to have
> an option value that will automatically detect number of cores
> that can be used for parallel LTRANS phase.
> 
> Thoughts?
> 
That's really nice. 

How much extra work would it be to make it support a posix make jobserver? 

As far as I understand, you would need to guess a partition size first (as 
your patch here does), but then only start each job when given a token from 
the jobserver FD.

With that the integration to existing build infrastructure would be optimal.

Cheers
'Allan




[PATCH] i386: Separate costs of pseudo registers from hard registers

2019-07-23 Thread H.J. Lu
On Mon, Jun 24, 2019 at 9:16 AM H.J. Lu  wrote:
>
> On Mon, Jun 24, 2019 at 6:37 AM Richard Biener  wrote:
> >
> > On Thu, 20 Jun 2019, Jan Hubicka wrote:
> >
> > > > > Currently, costs of moves are also used for costs of RTL expressions. 
> > > > >   This
> > > > > patch:
> > > > >
> > > > > https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00405.html
> > > > >
> > > > > includes:
> > > > >
> > > > > diff --git a/gcc/config/i386/x86-tune-costs.h 
> > > > > b/gcc/config/i386/x86-tune-costs.h
> > > > > index e943d13..8409a5f 100644
> > > > > --- a/gcc/config/i386/x86-tune-costs.h
> > > > > +++ b/gcc/config/i386/x86-tune-costs.h
> > > > > @@ -1557,7 +1557,7 @@ struct processor_costs skylake_cost = {
> > > > >{4, 4, 4}, /* cost of loading integer registers
> > > > >  in QImode, HImode and SImode.
> > > > >  Relative to reg-reg move (2).  */
> > > > > -  {6, 6, 6}, /* cost of storing integer registers */
> > > > > +  {6, 6, 3}, /* cost of storing integer registers */
> > > > >2, /* cost of reg,reg fld/fst */
> > > > >{6, 6, 8}, /* cost of loading fp registers
> > > > >  in SFmode, DFmode and XFmode */
> > >
> > > Well, it seems that the patch was fixing things on wrong spot - the
> > > tables are intended to be mostly latency based. I think we ought to
> > > document divergences from these including benchmarks where the change
> > > helped. Otherwise it is very hard to figure out why the entry does not
> > > match the reality.
> > > > >
> > > > > It lowered the cost for SImode store and made it cheaper than 
> > > > > SSE<->integer
> > > > > register move.  It caused a regression:
> > > > >
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90878
> > > > >
> > > > > Since the cost for SImode store is also used to compute scalar_store
> > > > > in ix86_builtin_vectorization_cost, it changed loop costs in
> > > > >
> > > > > void
> > > > > foo (long p2, long *diag, long d, long i)
> > > > > {
> > > > >   long k;
> > > > >   k = p2 < 3 ? p2 + p2 : p2 + 3;
> > > > >   while (i < k)
> > > > > diag[i++] = d;
> > > > > }
> > > > >
> > > > > As the result, the loop is unrolled 4 times with -O3 -march=skylake,
> > > > > instead of 3.
> > > > >
> > > > > My patch separates costs of moves from costs of RTL expressions.  We 
> > > > > have
> > > > > a follow up patch which restores the cost for SImode store back to 6 
> > > > > and leave
> > > > > the cost of scalar_store unchanged.  It keeps loop unrolling 
> > > > > unchanged and
> > > > > improves powf performance in glibc by 30%.  We are collecting SPEC 
> > > > > CPU 2017
> > > > > data now.
> > >
> > > I have seen the problem with scalar_store with AMD tuning as well.
> > > It seems to make SLP vectorizer to be happy about idea of turning
> > > sequence of say integer tores into code which moves all the values into
> > > AVX register and then does one vector store.
> > >
> > > The cost basically compare cost of N scalar stores to 1 scalar store +
> > > vector construction. Vector construction then N*sse_op+addss.
> > >
> > > With testcase:
> > >
> > > short array[8];
> > > test (short a,short b,short c,short d,short e,short f,short g,short h)
> > > {
> > >   array[0]=a;
> > >   array[1]=b;
> > >   array[2]=c;
> > >   array[3]=d;
> > >   array[4]=e;
> > >   array[5]=f;
> > >   array[6]=g;
> > >   array[7]=h;
> > > }
> > > int iarray[8];
> > > test2 (int a,int b,int c,int d,int e,int f,int g,int h)
> > > {
> > >   iarray[0]=a;
> > >   iarray[1]=b;
> > >   iarray[2]=c;
> > >   iarray[3]=d;
> > >   iarray[4]=e;
> > >   iarray[5]=f;
> > >   iarray[6]=g;
> > >   iarray[7]=h;
> > > }
> > >
> > > I get the following codegen:
> > >
> > >
> > > test:
> > > vmovd   %edi, %xmm0
> > > vmovd   %edx, %xmm2
> > > vmovd   %r8d, %xmm1
> > > vmovd   8(%rsp), %xmm3
> > > vpinsrw $1, 16(%rsp), %xmm3, %xmm3
> > > vpinsrw $1, %esi, %xmm0, %xmm0
> > > vpinsrw $1, %ecx, %xmm2, %xmm2
> > > vpinsrw $1, %r9d, %xmm1, %xmm1
> > > vpunpckldq  %xmm2, %xmm0, %xmm0
> > > vpunpckldq  %xmm3, %xmm1, %xmm1
> > > vpunpcklqdq %xmm1, %xmm0, %xmm0
> > > vmovaps %xmm0, array(%rip)
> > > ret
> > >
> > > test2:
> > > vmovd   %r8d, %xmm5
> > > vmovd   %edx, %xmm6
> > > vmovd   %edi, %xmm7
> > > vpinsrd $1, %r9d, %xmm5, %xmm1
> > > vpinsrd $1, %ecx, %xmm6, %xmm3
> > > vpinsrd $1, %esi, %xmm7, %xmm0
> > > vpunpcklqdq %xmm3, %xmm0, %xmm0
> > > vmovd   16(%rbp), %xmm4
> > > vpinsrd $1, 24(%rbp), %xmm4, %xmm2
> > > vpunpcklqdq %xmm2, %xmm1, %xmm1
> > > vinserti128 $0x1, %xmm1, %ymm0, %ymm0
> > > vmovdqu %ymm0, iarray(%rip)
> > > vzeroupper
> > >   ret
> > >
> > > which is about 20% slower on my skylake notebook than the
> > > non-SLP-vectorized variant.
> > >
> > > I wonder if the vec_construct costs should be made more realistic.
> > > It is computed as:

Re: [PATCH,fortran] Handle BOZ in accordance to Fortran 2018 standard (1st batch)

2019-07-23 Thread Steve Kargl
On Mon, Jul 22, 2019 at 02:33:08PM +0200, Dominique d'Humières wrote:
> (A) I see the following failures
> 
> FAIL: libgomp.fortran/reduction4.f90   -O0  (test for excess errors)
> …
> FAIL: libgomp.fortran/reduction5.f90   -Os  (test for excess errors)
> 

I added a fix for this to the megapatch.  It is rather slow
to test on i586-*-freebsd, so I tend to forget to test libgomp.

> (B) The points mentioned in
> https://gcc.gnu.org/ml/fortran/2017-10/msg00037.html have been fixed,
> except the points (3) and (6) which still give an ICE. I understand
> that the coddles are invalid,
>  but IMO they should give an error along the line
> 
> "BOZ literal at %L outside a DATA statement and outside INT/REAL/DBLE/CMPLX »
> 

I'll fix this shortly.  A BOZ is not allowed in transfer()
or storage_size().  I need to update check.c with a new
illegal_boz_arg() function and use it with the right checking
functions.

-- 
Steve


Re: [PATCH,fortran] Handle BOZ in accordance to Fortran 2018 standard

2019-07-23 Thread Steve Kargl
On Sat, Jul 20, 2019 at 10:12:07AM -0700, Jerry DeLisle wrote:
> On 7/17/19 8:32 PM, Steve Kargl wrote:
> > I will be away until Monday.  Plenty of time for a review.
> > 
> > 
> 
> ---snip --
> 
> Something not quite right here in this comment.
> 
> 
> +/* A BOZ literal constant can appear in a limited number of contexts.
> +   gfc_invalid_boz() is a help function to simplify error/warning generation.
> +   Note, gfortran accepts the nonstandard 'X' for 'Z' the nonstandard
>^<<< in >>?
> +   suffix location.  If -fallow-invalid-boz is used, then issue a warning;
> +   otherwise issue an error.  */
> 

Whoops forgot to fix this before committing.  I'll fix shortly.

> 
> +  /* FIXME BOZ.  What to do with complex?  */
> 
> Is your question here regarding whether to treat the two real storage 
> locations 
> as one single area and pad? Or to duplicate one BOZ pattern into each? Or 
> require two BOZ patterns to be provided? or something else?
> 

Old comment.  I remove shortly.  The issue center arounds
complex(z'1234',z'4567').  This is now rejected as there is
no information on how to convert the BOZ.


> --- snip ---
> 
> -
> +  /* FIXME BOZ.  */
> if (!gfc_in_match_data ()
> && (!gfc_notify_std(GFC_STD_F2003, "BOZ used outside a DATA "
> -   "statement at %C")))
> -  return MATCH_ERROR;
> +   "statement at %L", >where)))
> +return MATCH_ERROR;
> 
> Maybe expand the comment a bit to better hint at the issue.
> 

Whoops. Again, committed before fixing.  I'll update shortly.

> --- snip ---
> 
> The patch applies cleanly and tests OK on my machines here. I am very much in 
> favor of deprecating LONG and SHORT which are way too ambiguous.
> 
> I say OK to commit.
> 

Thanks.  Committed.

-- 
Steve


[Darwin, committed] Fix PR87030 and tidy config fragments.

2019-07-23 Thread Iain Sandoe
This is about 32/64b host and multilib support across the range of Darwin
systems.

Prior to Darwin8 (OS X 10.4), the toolchains support only PowerPC and only 32b.

On Darwin8 it is possible to target a 64b multilib, but with support limited
to a few of the main libraries on the system (not a recommended configuration).

>From Darwin9 to Darwin17 (OSX 10.5 to 10.13) it is possible to have either
32 or 64b hosted toolchains, with support for a 64 or 32b multilib respectively.

On Darwin9 the kernel is 32b, but with support for 64b executables, so it's
conventional to build a 32b host toolchain supporting a 64b multilib.  However
this is not enforced (merely a convention).

There is also some platform hardware supporting Darwin10/11 which is only 32b
and for which the same situation applies.  However, from Darwin10 to Darwin17,
the majority of platform hardware supports a 64b kernel and it's conventional
to build a 64b host toolchain with support for a 32b multilib.

On/from Darwin18 (OS X 10.14), the development headers (in the SDK) no longer
expose the interfaces for the 32b multilib support (although sufficient runtime
support remains installed that the testsuite can be run for a 32b multilib).

The PR is raised against this latter situation since the absence of exposed
interfaces causes a 'default' bootstrap fail regardless of the availability of
the runtimes.  Given the number of permutations, I felt it warranted a general
solution, especially since the current scheme of target headers and t-make
fragments has become somewhat messy.

The changes here enforce the single 32b PowerPC multilib for Darwin < 8 and the
single X86 64b multilib for Darwin >= 18.  This means that there is no longer
any need to configure Darwin18+ '--disable-multilib', but also that if you want
to use the ability to continue to test the compiler's 32b multilib there, you
need to make a configuration targeting an earlier OS version (and using the
SDK from that).

It has been tested across a range of Darwin systems (back to Darwin9, since 
self-
hosting on Darwin8 and 7 is currently in need of some tweaks).

applied to mainline
thanks
Iain

gcc/

PR bootstrap/87030
* config.gcc (*-*-darwin*): Don't include CPU t-darwin here.
(i[34567]86-*-darwin*): Adjust to use biarch files. Produce
an error message if i686-darwin configuration is attempted for
Darwin >= 18.
(x86_64-*-darwin*): Switch to single multilib for Darwin >= 18.
(powerpc-*-darwin*): Use biarch files where needed.
(powerpc64-*-darwin*): Likewise.
* config/i386/darwin.h (REAL_LIBGCC_SPEC): Move to new biarch file.
(DARWIN_ARCH_SPEC, DARWIN_SUBARCH_SPEC): Revise for default single
arch case.
* config/i386/darwin32-biarch.h: New.
* config/i386/darwin64.h: Rename.
* gcc/config/i386/darwin64-biarch.h: To this.
* config/i386/t-darwin: Rename.
* gcc/config/i386/t-darwin32-biarch: To this.
* config/i386/t-darwin64: Rename.
* gcc/config/i386/t-darwin64-biarch: To this.
* config/rs6000/darwin32-biarch.h: New.
* config/rs6000/darwin64.h: Rename.
* config/rs6000/darwin64-biarch.h: To this.
(DARWIN_ARCH_SPEC, DARWIN_SUBARCH_SPEC): Revise for default single
arch case.
* config/rs6000/t-darwin8: Rename.
* config/rs6000/t-darwin32-biarch: To this.
* config/rs6000/t-darwin64 Rename.
* config/rs6000/t-darwin64-biarch: To this.



Re: [PATCH v3 0/5] OpenRISC updates for 10 (fpu, fixes)

2019-07-23 Thread Stafford Horne
Hi All,

This is all upstream now.  Thank you for the reviews.

On Tue, Jul 09, 2019 at 10:06:21PM +0900, Stafford Horne wrote:
> Hello,
> 
> New since v2:
>  - Fix comment formatting pointed out by Segher in valatile patch
>  - Fix issue and add test for rotrsi3 options pointed out by Segher
>  - Fix issue with reg mask for doubles being backwards Pointed out by Segher
>and Richard.
> 
> New since v1:
>  - Changed 64-bit FPU operations to use explicit register pairs as per spec
>revision suggested by Richard Henderson.
>  - Added patch for new -mrori option
>  - Added patch for msoft-div fix from other series (no changes)
>  - Fixed volatile spelling pointed out by Bernhard 
>Reutner-Fischer 
> 
> This is a set of patches to bring FPU support to the OpenRISC backend.  The
> backend also add support for 64-bit floating point operations on 32-bit cores
> using register pairs, see orfpx64a32 [0].
> 
> This binutils patches are already upstream.
> 
> The toolchain has been tested using the gcc and binutils testsuites as well as
> floating point test suites running on sim and an fpga soft core 
> or1k_marocchino.
> [1]
> 
> I have also included a few fixes to PRs:
> 
>  - 90362 or1k: Soft divide does not work correctly
>  - 90363 or1k: Extra mask insn after load from memory
> 
> This whole patch series can be found on my github repo [2] as well.
> 
> If all is OK, I plan to commit these to master (gcc 10).  Then back port the 
> PR
> fixes to the GCC 9 branch, I will ask for guidance when I start to do the
> backporting.
> 
> -Stafford
> 
> [0] https://openrisc.io/proposals/orfpx64a32
> [1] https://github.com/openrisc/or1k_marocchino
> [2] g...@github.com:stffrdhrn/gcc.git or1k-fpu-3
> 
> 
> *** BLURB HERE ***
> 
> Stafford Horne (5):
>   or1k: Fix code quality for volatile memory loads
>   or1k: Fix issues with msoft-div
>   or1k: Add mrori option, fix option docs
>   or1k: Initial support for FPU
>   or1k: only force reg for immediates
> 
>  gcc/config.gcc|   2 +
>  gcc/config/or1k/constraints.md|   4 +
>  gcc/config/or1k/elf.opt   |   6 +-
>  gcc/config/or1k/or1k.c|  50 +++--
>  gcc/config/or1k/or1k.h|   3 +
>  gcc/config/or1k/or1k.md   | 131 --
>  gcc/config/or1k/or1k.opt  |  78 +
>  gcc/config/or1k/predicates.md |  30 +
>  gcc/doc/invoke.texi   |  77 -
>  gcc/testsuite/gcc.target/or1k/div-mul-3.c |  31 +
>  gcc/testsuite/gcc.target/or1k/ror-4.c |   8 ++
>  gcc/testsuite/gcc.target/or1k/ror-5.c |   9 ++
>  gcc/testsuite/gcc.target/or1k/shftimm-1.c |   8 +-
>  gcc/testsuite/gcc.target/or1k/swap-1.c|  70 
>  gcc/testsuite/gcc.target/or1k/swap-2.c|  47 
>  libgcc/config/or1k/lib1funcs.S|   6 +-
>  16 files changed, 484 insertions(+), 76 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/or1k/div-mul-3.c
>  create mode 100644 gcc/testsuite/gcc.target/or1k/ror-4.c
>  create mode 100644 gcc/testsuite/gcc.target/or1k/ror-5.c
>  create mode 100644 gcc/testsuite/gcc.target/or1k/swap-1.c
>  create mode 100644 gcc/testsuite/gcc.target/or1k/swap-2.c
> 
> -- 
> 2.21.0
> 


[committed] Add PR markers and test for pr86061

2019-07-23 Thread Jeff Law

I was wandering through the DSE BZs.  pr86061 is something I just
recently fixed.  I've added markers to the relevant ChangeLog entries
and added the test from the BZ to the testsuite as well.

jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index d405921425f..c6c0f4a70af 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -400,6 +400,7 @@
 
 2019-07-19  Jeff Law  
 
+   PR tree-optimization/86061
* tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle
strncpy.  Drop some trivial dead code.
(maybe_trim_memstar_call): Handle strncpy.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 8e6b02668fb..14fe6b9cb27 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-07-23  Jeff Law  
+
+   PR tree-optimization/86061
+   * gcc.dg/tree-ssa/pr86061.c: New test.
+
 2019-07-23  Richard Biener  
 
PR tree-optimization/83518
@@ -176,6 +181,7 @@
 
 2019-07-19  Jeff Law  
 
+   PR tree-optimization/86061
* gcc.dg/tree-ssa/ssa-dse-37.c: New test.
* gcc.dg/tree-ssa/ssa-dse-38.c: New test.
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr86061.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr86061.c
new file mode 100644
index 000..f2b079298c4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr86061.c
@@ -0,0 +1,20 @@
+/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */
+
+
+struct S { int i; char n[128]; int j; };
+
+void f (char*);
+
+void g (struct S *p)
+{
+  char a[sizeof p->n + 1];
+
+  __builtin_memset (a, 0, sizeof a);   // dead store, can be eliminated
+
+  __builtin_strncpy (a, p->n, sizeof a - 1);
+  a[sizeof a - 1] = '\0';
+
+  f (a);
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead call" 1 "dse1" } } */


[PATCH] Fix typo LIBGCCJIT_SYMLINK -> LIBGCCJIT_SONAME_SYMLINK

2019-07-23 Thread Arvind Sankar
There seems to be a typo in gcc/jit/Make-lang.in where it references an
undefined LIBGCCJIT_SYMLINK instead of LIBGCCJIT_SONAME_SYMLINK.
---
 gcc/jit/Make-lang.in | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/jit/Make-lang.in b/gcc/jit/Make-lang.in
index 660f54d78bd..5cfd6035d8c 100644
--- a/gcc/jit/Make-lang.in
+++ b/gcc/jit/Make-lang.in
@@ -65,7 +65,7 @@ LIBGCCJIT_SONAME_OPTION = \
 -Wl$(COMMA)$(LD_SONAME_OPTION)$(COMMA)$(LIBGCCJIT_SONAME))
 
 jit: $(LIBGCCJIT_FILENAME) \
-   $(LIBGCCJIT_SYMLINK) \
+   $(LIBGCCJIT_SONAME_SYMLINK) \
$(LIBGCCJIT_LINKER_NAME_SYMLINK) \
$(FULL_DRIVER_NAME)
 
@@ -301,9 +301,8 @@ jit.uninstall:
 # We just have to delete files specific to us.
 
 jit.mostlyclean:
-   -rm -f $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SYMLINK)
+   -rm -f $(LIBGCCJIT_FILENAME) $(LIBGCCJIT_SONAME_SYMLINK)
-rm -f $(LIBGCCJIT_LINKER_NAME_SYMLINK) $(FULL_DRIVER_NAME)
-   -rm -f $(LIBGCCJIT_SONAME)
-rm -f $(jit_OBJS)
 
 jit.clean:
-- 
2.21.0


Merge from trunk to gccgo branch

2019-07-23 Thread Ian Lance Taylor
I merged trunk revision 273743 to the gccgo branch.

Ian


Go patch committed: Use correct value in 2-case select send

2019-07-23 Thread Ian Lance Taylor
This Go frontend patch by Cherry Zhang fixes the 2-case select send.
In the channel-send case, the value to be sent may needs an (implicit)
type conversion to the channel element type.  This patch ensures that
we use the correct value type for the send.  This fixes
https://golang.org/issue/33235.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 273713)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-b7bce0dbccb978d33eb8ce0bffc02fae2c2857c1
+480477ca64c3001b9c7e92ef8b978dc92a5912d2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/statements.cc
===
--- gcc/go/gofrontend/statements.cc (revision 273577)
+++ gcc/go/gofrontend/statements.cc (working copy)
@@ -5880,6 +5880,7 @@ Select_statement::lower_two_case(Block*
  : this->clauses_->at(1));
   Location loc = this->location();
   Expression* chan = chancase.channel();
+  Type* valtype = chan->type()->channel_type()->element_type();
 
   Temporary_statement* chantmp = Statement::make_temporary(NULL, chan, loc);
   b->add_statement(chantmp);
@@ -5891,7 +5892,8 @@ Select_statement::lower_two_case(Block*
 {
   // if selectnbsend(chan, ) { body } else { default body }
 
-  Temporary_statement* ts = Statement::make_temporary(NULL, 
chancase.val(), loc);
+  Temporary_statement* ts =
+Statement::make_temporary(valtype, chancase.val(), loc);
   // Tell the escape analysis that the value escapes, as it may be sent
   // to a channel.
   ts->set_value_escapes();
@@ -5904,7 +5906,6 @@ Select_statement::lower_two_case(Block*
 }
   else
 {
-  Type* valtype = chan->type()->channel_type()->element_type();
   Temporary_statement* ts = Statement::make_temporary(valtype, NULL, loc);
   b->add_statement(ts);
 


[PATCH 3/4] New IPA-SRA implementation

2019-07-23 Thread Martin Jambor
This patch actually adds the analysis bits of IPA-SRA - both the
function summary generation and the interprocedural analysis and
decision stage.  The transformation itself then happens in the call
graph cloning infrastructure changes which are in the previous patch.
Please see the cover letter of the whole patch-set for more
information.

Martin

2019-07-23  Martin Jambor  

* Makefile.in (GTFILES): Added ipa-sra.c.
(OBJS): Added ipa-sra.o.
* dbgcnt.def: Added ipa_sra_params and ipa_sra_retvalues.
* doc/invoke.texi (ipa-sra-max-replacements): New.
* ipa-sra.c: New file.
* lto-section-in.c (lto_section_name): Added ipa-sra section.
* lto-streamer.h (lto_section_type): Likewise.
* params.def (PARAM_IPA_SRA_MAX_REPLACEMENTS): New.
* passes.def: Add new IPA-SRA.
* tree-pass.h (make_pass_ipa_sra): Declare.
---
 gcc/Makefile.in  |3 +-
 gcc/dbgcnt.def   |2 +
 gcc/doc/invoke.texi  |5 +
 gcc/ipa-sra.c| 4014 ++
 gcc/lto-section-in.c |3 +-
 gcc/lto-streamer.h   |1 +
 gcc/params.def   |7 +
 gcc/passes.def   |1 +
 gcc/tree-pass.h  |1 +
 9 files changed, 4035 insertions(+), 2 deletions(-)
 create mode 100644 gcc/ipa-sra.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9573f58221c..0cccaa64b40 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1369,6 +1369,7 @@ OBJS = \
init-regs.o \
internal-fn.o \
ipa-cp.o \
+   ipa-sra.o \
ipa-devirt.o \
ipa-fnsummary.o \
ipa-polymorphic-call.o \
@@ -2528,7 +2529,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/reload.h $(srcdir)/caller-save.c $(srcdir)/symtab.c \
   $(srcdir)/alias.c $(srcdir)/bitmap.c $(srcdir)/cselib.c $(srcdir)/cgraph.c \
   $(srcdir)/ipa-prop.c $(srcdir)/ipa-cp.c $(srcdir)/ipa-utils.h \
-  $(srcdir)/ipa-param-manipulation.h $(srcdir)/dbxout.c \
+  $(srcdir)/ipa-param-manipulation.h $(srcdir)/ipa-sra.c $(srcdir)/dbxout.c \
   $(srcdir)/signop.h \
   $(srcdir)/dwarf2out.h \
   $(srcdir)/dwarf2asm.c \
diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 19936375505..c023c755af8 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -167,6 +167,8 @@ DEBUG_COUNTER (if_after_combine)
 DEBUG_COUNTER (if_after_reload)
 DEBUG_COUNTER (if_conversion)
 DEBUG_COUNTER (if_conversion_tree)
+DEBUG_COUNTER (ipa_sra_params)
+DEBUG_COUNTER (ipa_sra_retvalues)
 DEBUG_COUNTER (ira_move)
 DEBUG_COUNTER (local_alloc_for_sched)
 DEBUG_COUNTER (merged_ipa_icf)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7bac080dca1..9cee2479b2c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11835,6 +11835,11 @@ parameters only when their cumulative size is less or 
equal to
 @option{ipa-sra-ptr-growth-factor} times the size of the original
 pointer parameter.
 
+@item ipa-sra-max-replacements
+Maximum pieces of an aggregate that IPA-SRA tracks.  As a
+consequence, it is also the maximum number of replacements of a formal
+parameter.
+
 @item sra-max-scalarization-size-Ospeed
 @itemx sra-max-scalarization-size-Osize
 The two Scalar Reduction of Aggregates passes (SRA and IPA-SRA) aim to
diff --git a/gcc/ipa-sra.c b/gcc/ipa-sra.c
new file mode 100644
index 000..67c76db13c3
--- /dev/null
+++ b/gcc/ipa-sra.c
@@ -0,0 +1,4014 @@
+/* Interprocedural scalar replacement of aggregates
+   Copyright (C) 2008-2019 Free Software Foundation, Inc.
+
+   Contributed by Martin Jambor 
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+/* IPA-SRA is an interprocedural pass that removes unused function return
+   values (turning functions returning a value which is never used into void
+   functions), removes unused function parameters.  It can also replace an
+   aggregate parameter by a set of other parameters representing part of the
+   original, turning those passed by reference into new ones which pass the
+   value directly.
+
+   The pass is a true IPA one, which means that it works in three stages in
+   order to be able to take advantage of LTO.  First, summaries about functions
+   and each calls are generated.  Function summaries (often called call graph
+   node summaries) contain mainly information about which parameters are
+   potential transformation candidates and which bits of candidates are
+   accessed.  We 

[PATCH 1/4] Remove old IPA-SRA, introduce tree-sra.h

2019-07-23 Thread Martin Jambor
This patch removes the old IPA-SRA.  Please see the covert letter for
more information about the whole patch-set.

Martin

2019-07-23  Martin Jambor  

* dbgcnt.def: Remove eipa_sra.
* passes.def: Remove old IPA-SRA.
* tree-pass.h (make_pass_early_ipa_sra): Remove declaration.
* tree-sra.c: Removed IPA-SRA.  Include tree-sra.h.
(type_internals_preclude_sra_p): Make public.
* tree-sra.h: New file.
---
 gcc/dbgcnt.def  |1 -
 gcc/passes.def  |1 -
 gcc/tree-pass.h |1 -
 gcc/tree-sra.c  | 1859 +--
 gcc/tree-sra.h  |   31 +
 5 files changed, 68 insertions(+), 1825 deletions(-)
 create mode 100644 gcc/tree-sra.h

diff --git a/gcc/dbgcnt.def b/gcc/dbgcnt.def
index 230072f7bb5..19936375505 100644
--- a/gcc/dbgcnt.def
+++ b/gcc/dbgcnt.def
@@ -156,7 +156,6 @@ DEBUG_COUNTER (df_byte_scan)
 DEBUG_COUNTER (dse)
 DEBUG_COUNTER (dse1)
 DEBUG_COUNTER (dse2)
-DEBUG_COUNTER (eipa_sra)
 DEBUG_COUNTER (gcse2_delete)
 DEBUG_COUNTER (global_alloc_at_func)
 DEBUG_COUNTER (global_alloc_at_reg)
diff --git a/gcc/passes.def b/gcc/passes.def
index 1a7fd144f87..99c0fee3155 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -89,7 +89,6 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_dse);
  NEXT_PASS (pass_cd_dce);
  NEXT_PASS (pass_phiopt, true /* early_p */);
- NEXT_PASS (pass_early_ipa_sra);
  NEXT_PASS (pass_tail_recursion);
  NEXT_PASS (pass_convert_switch);
  NEXT_PASS (pass_cleanup_eh);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 1c8df3d0a71..8f03c84132b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -355,7 +355,6 @@ extern gimple_opt_pass *make_pass_early_tree_profile 
(gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cleanup_eh (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_sra_early (gcc::context *ctxt);
-extern gimple_opt_pass *make_pass_early_ipa_sra (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_recursion (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tail_calls (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_fix_loops (gcc::context *ctxt);
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 03c1a2ae9e9..bf57c44d2bf 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -96,15 +96,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfg.h"
 #include "tree-dfa.h"
 #include "tree-ssa.h"
-#include "symbol-summary.h"
-#include "ipa-param-manipulation.h"
-#include "ipa-prop.h"
 #include "params.h"
 #include "dbgcnt.h"
-#include "tree-inline.h"
-#include "ipa-fnsummary.h"
-#include "ipa-utils.h"
 #include "builtins.h"
+#include "tree-sra.h"
 
 
 /* Enumeration of all aggregate reductions we can do.  */
@@ -169,8 +164,7 @@ struct access
   struct access *first_child;
 
   /* In intraprocedural SRA, pointer to the next sibling in the access tree as
- described above.  In IPA-SRA this is a pointer to the next access
- belonging to the same group (having the same representative).  */
+ described above.  */
   struct access *next_sibling;
 
   /* Pointers to the first and last element in the linked list of assign
@@ -185,9 +179,6 @@ struct access
  when grp_to_be_replaced flag is set.  */
   tree replacement_decl;
 
-  /* Is this access an access to a non-addressable field? */
-  unsigned non_addressable : 1;
-
   /* Is this access made in reverse storage order? */
   unsigned reverse : 1;
 
@@ -260,19 +251,6 @@ struct access
 
   /* Should TREE_NO_WARNING of a replacement be set?  */
   unsigned grp_no_warning : 1;
-
-  /* Is it possible that the group refers to data which might be (directly or
- otherwise) modified?  */
-  unsigned grp_maybe_modified : 1;
-
-  /* Set when this is a representative of a pointer to scalar (i.e. by
- reference) parameter which we consider for turning into a plain scalar
- (i.e. a by value parameter).  */
-  unsigned grp_scalar_ptr : 1;
-
-  /* Set when we discover that this pointer is not safe to dereference in the
- caller.  */
-  unsigned grp_not_necessarilly_dereferenced : 1;
 };
 
 typedef struct access *access_p;
@@ -349,29 +327,6 @@ static struct obstack name_obstack;
propagated to their assignment counterparts. */
 static struct access *work_queue_head;
 
-/* Number of parameters of the analyzed function when doing early ipa SRA.  */
-static int func_param_count;
-
-/* scan_function sets the following to true if it encounters a call to
-   __builtin_apply_args.  */
-static bool encountered_apply_args;
-
-/* Set by scan_function when it finds a recursive call.  */
-static bool encountered_recursive_call;
-
-/* Set by scan_function when it finds a recursive call with less actual
-   arguments than formal parameters..  */
-static bool encountered_unchangable_recursive_call;
-
-/* This is a table in which for each basic block and parameter 

[PATCH 4/4] Modifications to the testsuite

2019-07-23 Thread Martin Jambor
This are all modifications to the testsuite required to get to the
state described in the cover letter of the entire IPA-SRA
patch-series.  Please note that ipa/ipa-sra-2.c and ipa/ipa-sra-6.c
should actually be svn rm-ed instead as they try to invoke
functionality that the new IPA-SRA does not have (splitting aggregates
passed by reference into individual bits passed by reference).  For
more information, see the cover letter of the whole IPA-SRA patch-set.

Martin

2019-07-23  Martin Jambor  

* g++.dg/ipa/pr81248.C: Adjust dg-options and dump-scan.
* gcc.dg/ipa/ipa-sra-1.c: Likewise.
* gcc.dg/ipa/ipa-sra-10.c: Likewise.
* gcc.dg/ipa/ipa-sra-11.c: Likewise.
* gcc.dg/ipa/ipa-sra-3.c: Likewise.
* gcc.dg/ipa/ipa-sra-4.c: Likewise.
* gcc.dg/ipa/ipa-sra-5.c: Likewise.
* gcc.dg/ipa/ipacost-2.c: Disable ipa-sra.
* gcc.dg/ipa/ipcp-agg-9.c: Likewise.
* gcc.dg/ipa/pr78121.c: Adjust scan pattern.
* gcc.dg/ipa/vrp1.c: Likewise.
* gcc.dg/ipa/vrp2.c: Likewise.
* gcc.dg/ipa/vrp3.c: Likewise.
* gcc.dg/ipa/vrp7.c: Likewise.
* gcc.dg/ipa/vrp8.c: Likewise.
* gcc.dg/noreorder.c: use noipa attribute instead of noinline.
* gcc.dg/ipa/20040703-wpa.c: New test.
* gcc.dg/ipa/ipa-sra-12.c: New test.
* gcc.dg/ipa/ipa-sra-13.c: Likewise.
* gcc.dg/ipa/ipa-sra-14.c: Likewise.
* gcc.dg/ipa/ipa-sra-15.c: Likewise.
* gcc.dg/ipa/ipa-sra-16.c: Likewise.
* gcc.dg/ipa/ipa-sra-17.c: Likewise.
* gcc.dg/ipa/ipa-sra-18.c: Likewise.
* gcc.dg/ipa/ipa-sra-19.c: Likewise.
* gcc.dg/ipa/ipa-sra-20.c: Likewise.
* gcc.dg/ipa/ipa-sra-21.c: Likewise.
* gcc.dg/ipa/ipa-sra-22.c: Likewise.
* gcc.dg/sso/ipa-sra-1.c: Likewise.
* g++.dg/ipa/ipa-sra-2.C: Likewise.
* g++.dg/ipa/ipa-sra-3.C: Likewise.
* gcc.dg/tree-ssa/ipa-cp-1.c: Make return value used.
* g++.dg/ipa/devirt-19.C: Add missing return, add -fipa-cp-clone
option.
* g++.dg/lto/devirt-19_0.C: Add -fipa-cp-clone option.

* gcc.dg/ipa/ipa-sra-2.c: Mark to be removed.
* gcc.dg/ipa/ipa-sra-6.c: Likewise.
---
 gcc/testsuite/g++.dg/ipa/devirt-19.C |   5 +-
 gcc/testsuite/g++.dg/ipa/ipa-sra-1.C |  46 +++
 gcc/testsuite/g++.dg/ipa/ipa-sra-2.C |  19 +++
 gcc/testsuite/g++.dg/ipa/ipa-sra-3.C |   9 ++
 gcc/testsuite/g++.dg/ipa/pr81248.C   |   4 +-
 gcc/testsuite/g++.dg/lto/devirt-19_0.C   |   2 +-
 gcc/testsuite/gcc.dg/ipa/20040703-wpa.c  | 151 +++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-1.c |   4 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-10.c|   4 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-11.c|   6 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-12.c|  50 
 gcc/testsuite/gcc.dg/ipa/ipa-sra-13.c|  49 
 gcc/testsuite/gcc.dg/ipa/ipa-sra-14.c|  60 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-15.c|  61 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-16.c|  74 +++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-17.c| 102 +++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-18.c|  49 
 gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c|  31 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-2.c |   9 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-20.c|  38 ++
 gcc/testsuite/gcc.dg/ipa/ipa-sra-21.c|  33 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-22.c|  56 +
 gcc/testsuite/gcc.dg/ipa/ipa-sra-3.c |   7 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-4.c |   8 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-5.c |   4 +-
 gcc/testsuite/gcc.dg/ipa/ipa-sra-6.c |   6 +-
 gcc/testsuite/gcc.dg/ipa/ipacost-2.c |   4 +-
 gcc/testsuite/gcc.dg/ipa/ipcp-agg-9.c|   2 +-
 gcc/testsuite/gcc.dg/ipa/pr78121.c   |   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp1.c  |   4 +-
 gcc/testsuite/gcc.dg/ipa/vrp2.c  |   4 +-
 gcc/testsuite/gcc.dg/ipa/vrp3.c  |   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp7.c  |   2 +-
 gcc/testsuite/gcc.dg/ipa/vrp8.c  |   2 +-
 gcc/testsuite/gcc.dg/noreorder.c |   6 +-
 gcc/testsuite/gcc.dg/sso/ipa-sra-1.c |  57 +
 gcc/testsuite/gcc.dg/tree-ssa/ipa-cp-1.c |   2 +-
 37 files changed, 931 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/ipa-sra-1.C
 create mode 100644 gcc/testsuite/g++.dg/ipa/ipa-sra-2.C
 create mode 100644 gcc/testsuite/g++.dg/ipa/ipa-sra-3.C
 create mode 100644 gcc/testsuite/gcc.dg/ipa/20040703-wpa.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-12.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-13.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-14.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-15.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-16.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-17.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-18.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/ipa-sra-19.c
 create mode 100644 

[PATCH 0/4] True IPA reimplementation of IPA-SRA (v3)

2019-07-23 Thread Martin Jambor
Hello,

I have been working a bit more on the IPA-SRA reimplementation that I
have previously posted here in:

   https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00472.html


This reimplementation is a full IPA pass that can handle strictly
connected components in the call-graph, can take advantage of LTO and
does not weirdly switch functions in pass-pipeline like our current
quasi-IPA SRA does.  Unlike the current IPA-SRA it can also remove
return values, even in SCCs.  On the other hand, it is less powerful
when it comes to structures passed by reference.  By design it will
not create references to bits of an aggregate because that turned out
to be just obfuscation in practice.  However, it also cannot usually
split aggregates passed by reference that are just passed to another
function (where splitting would be useful) because it cannot perform
the same TBAA analysis like the current implementation which already
knows what types it should look at because it has access to bodies of
all functions attempts to modify (for an example see
https://gcc.gnu.org/ml/gcc-patches/2019-05/msg00955.html).

Since May I have switched the order of IPA-CP and IPA-SRA, so that
IPA-CP is run before IPA-SRA and it can propagate constants in
aggregates before these are split and tracking is lost.  This has
uncovered two small bugs, as did LTO-building Mozilla Firefox with the
patch and another issue was discovered when bootstrapping on ppc64le.
All of these are fixed.  I have also added two dbgcounters, one for
processing parameters and another for return values removal.

The patch has successfully bootstrapped, LTO-bootstrapped and
LTO-profilebootstrapped on x86_64-linux.  Testing on x86_64 fixes 24
LTO tests in guality/pr68860-1.c but regresses 13 guality tests, 3 of
which are not LTO.  Ther patch also passed bootstrap and testing on
aarch64-linux and ppc64le-linux.

When building the big libxul library of Mozilla Firefox with both LTO
and profile feedback, the patch has modified 21093 functions out of
259745 it has seen, so slightly over 8%.  In the process, it has
removed 6693 unused return values and 1538 unused parameters and split
7126 parameters into 8394 smaller components.  Note that IPA-CP can
also remove unused scalar parameters in simple situations, and since
it now runs first, the 1538 are only those which IPA-CP left behind
for one reason or another.  The built Firefux could browse the Amazon
web site for a while without any apparent problem.

After the patches are approved, I would like to commit them in one big
commit but in order to ease review I have split the big patch into
four smaller (but still rather big) ones:
  1. Removal of old IPA-SRA,
  2. New parameter manipulation infrastructure,
  3. New IPA-SRA implementation, and
  4. Modifications to the testsuite.

When applied one after another they should compile but I have only
exhaustively tested the complete patch.  If you want to experiment
with new IPA-SRA or the parameter manipulation infrastructure it
introduces, I suggest that you clone the jamborm/ipa-sra branch from
our git "mirror"
(https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/jamborm/ipa-sra).

Thanks in advance for any questions, comments and suggestions,

Martin



2019-07-23  Martin Jambor  

* coretypes.h (cgraph_edge): Declare.
* ipa-param-manipulation.c: Rewrite.
* ipa-param-manipulation.h: Likewise.
* Makefile.in (GTFILES): Added ipa-param-manipulation.h and ipa-sra.c.
(OBJS): Added ipa-sra.o.
* cgraph.h (ipa_replace_map): Removed fields old_tree, replace_p
and ref_p, added fields param_adjustments and performed_splits.
(struct cgraph_clone_info): Remove ags_to_skip and
combined_args_to_skip, new field param_adjustments.
(cgraph_node::create_clone): Changed parameters to use
ipa_param_adjustments.
(cgraph_node::create_virtual_clone): Likewise.
(cgraph_node::create_virtual_clone_with_body): Likewise.
(tree_function_versioning): Likewise.
(cgraph_build_function_type_skip_args): Removed.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Convert to
using ipa_param_adjustments.
(clone_of_p): Likewise.
* cgraphclones.c (cgraph_build_function_type_skip_args): Removed.
(build_function_decl_skip_args): Likewise.
(duplicate_thunk_for_node): Adjust parameters using
ipa_param_body_adjustments, copy param_adjustments instead of
args_to_skip.
(cgraph_node::create_clone): Convert to using ipa_param_adjustments.
(cgraph_node::create_virtual_clone): Likewise.
(cgraph_node::create_version_clone_with_body): Likewise.
(cgraph_materialize_clone): Likewise.
(symbol_table::materialize_all_clones): Likewise.
* ipa-fnsummary.c (ipa_fn_summary_t::duplicate): Simplify
ipa_replace_map check.
* ipa-cp.c (get_replacement_map): Do not initialize removed 

Re: [PATCH] MIPS: Fix GCC `noreorder' for undefined R5900 short loops

2019-07-23 Thread Fredrik Noring
Hi Maciej,

> > How can one reasonably prevent the bug when compiling a whole Linux
> > distribution with thousands of packages if GAS merely warns and proceeds
> > in many cases?
> 
>  I think the tools must not set a policy.  By using `.set noreorder' the 
> user told the toolchain he knows better and asked to keep its hands away.
> 
>  As I say you can use `-Werror' for code auditing.  And in most cases 
> manually scheduled delay slots in handcoded assembly are a sign of a 
> problem with the piece of code affected, regardless of the R5900 erratum.

Well, it seems practical to use unofficial tools and a patched GAS to fix
this assembly bug, then. It's a grave problem for the R5900 and it needs to
be reliably detected.

> > > > [ In theory, GAS could actually insert NOPs prior to the noreorder 
> > > > directive,
> > > > to make the loop longer that six instructions, but GAS does not have 
> > > > that
> > > > kind of capability. Another option that GCC prevents is to place a NOP 
> > > > in
> > > > the delay slot. ]
> > > 
> > >  Well, GAS does have that capability, although of course it is not 
> > > enabled 
> > > for `noreorder' code.
> > 
> > Hmm? I'm unable to make sense of that, since such NOPs are unaffected by
> > noreorder. This is what I meant:
> > 
> > loop:   addiu   $5,$5,1
> > addiu   $4,$4,1
> > lb  $2,-1($5)
> > nop /* These two NOPs would prevent the */
> > nop /* bug while preserving noreorder. */
> > .setnoreorder
> > .setnomacro
> > bne $2,$3,loop
> > sb  $2,-1($4)
> > .setmacro
> > .setreorder
> 
>  See `nops_for_insn'.  However again, as I noted, `.set noreorder' tells 
> GAS not to analyse the dependencies for code within.  There is no need to 
> schedule this delay slot manually, and if this is generated code, then the 
> producer (GCC) should have taken care of the hazards, be it architectural 
> or errata.  If this is manually written code, then the author asked for 
> trouble.

I'm using the principle above to unobtrusively adjust problematic kernel
code, via a short_loop_war macro. Here is one patched instance:

--- a/arch/mips/boot/compressed/head.S
+++ b/arch/mips/boot/compressed/head.S
@@ -13,6 +13,7 @@
  */
 
 #include 
+#include 
 #include 
 
.set noreorder
@@ -29,6 +30,7 @@ start:
PTR_LA  a0, _edata
PTR_LA  a2, _end
 1: sw  zero, 0(a0)
+   short_loop_war(1b)
bne a2, a0, 1b
 addiu  a0, a0, 4
 
@@ -48,6 +50,7 @@ start:
jr  k0
 nop
 3:
+   short_loop_war(3b)
b   3b
 nop
END(start)

The short_loop_war macro is designed to be placed just before the branch,
and it inserts NOPs as necessary for the R5900:

#ifdef CONFIG_CPU_R5900
.macro  short_loop_war loop_target
.setinstruction_count,2 + (. - \loop_target) / 4
.ifgt   7 - instruction_count
.rept   7 - instruction_count
nop
.endr
.endif
.endm
#else
.macro  short_loop_war loop_target
.endm
#endif

>  Well, BogoMIPS is just an arbitrary number.

So presumably the noreorder BogoMIPS variant can be replaced with a
single reorder variant that works with all MIPS implementations?

>  There is a data dependency here, so manual delay slot scheduling was 
> unavoidable.  I'd suggest using this as a functional equivalent for the 
> R5900:
> 
>   addiu   a0,a0,1
> 1:addiu   a0,a0,-1
>   bneza0,1b
> 
> and avoiding the irregularity for a0==0.

Thanks!

> > I used the suggested patch, and recompiled the kernel with it, and verified
> > with the machine code tool that there were no undefined loops. Wouldn't that
> > be enough?
> 
>  It would break if GAS was invoked without `-mfix-r5900' (e.g. by using an 
> explicit `-Wa,-mno-fix-r5900' option).

I see, hmm.

> > I tried to limit the case to branch delays only, which is a rough
> > approximation. Call and jump delay slots are acceptable. Are you referring
> > to this piece?
> > 
> > ;; Can the instruction be put into a delay slot?
> > (define_attr "can_delay" "no,yes"
> >   (if_then_else (and (eq_attr "type" "!branch,call,jump")
> >  (eq_attr "hazard" "none")
> >  (match_test "get_attr_insn_count (insn) == 1"))
> > (const_string "yes")
> > (const_string "no")))
> 
>  Yes.  My suggestion does not preclude limiting the workaround to branches 
> only while being precise about what the situation is, i.e. branches still 
> require a delay slot instruction (unlike MIPS16 or uMIPS compact branches) 
> but no real instruction is suitable.  I'd prefer if GCC actually scheduled 
> the required dummy NOP instruction explicitly.

OK.

Fredrik


Re: [PATCH 0/3] add support for POD struct convention (PR 61339)

2019-07-23 Thread Arvind Sankar
On Tue, Jul 23, 2019 at 10:31:16AM -0600, Martin Sebor wrote:
> On 7/23/19 10:11 AM, Arvind Sankar wrote:
> > Hi, SVN rev 273311 appears to have been committed without regenerating
> > gcc/config*?
> 
> That commit wasn't meant to change the configure script since
> the warning implementation isn't part of the patch.  Let me
> back it out.
> 
> Martin

Thanks


Re: [PATCH 0/3] add support for POD struct convention (PR 61339)

2019-07-23 Thread Martin Sebor

On 7/23/19 10:11 AM, Arvind Sankar wrote:

Hi, SVN rev 273311 appears to have been committed without regenerating
gcc/config*?


That commit wasn't meant to change the configure script since
the warning implementation isn't part of the patch.  Let me
back it out.

Martin


Re: [PATCH] PR91195: fix -Wmaybe-uninitialized warning for conditional store optimization

2019-07-23 Thread Martin Sebor

On 7/22/19 10:26 PM, JiangNing OS wrote:

This patch is to fix PR91195. Is it OK for trunk?

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 711a31ea597..4db36644160 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2019-07-22  Jiangning Liu  
+
+   PR middle-end/91195
+   * tree-ssa-phiopt.c (cond_store_replacement): Work around
+   -Wmaybe-uninitialized warning.
+
  2019-07-22  Stafford Horne  
  
  	* config/or1k/or1k.c (or1k_expand_compare): Check for int before

diff --git a/gcc/tree-ssa-phiopt.c b/gcc/tree-ssa-phiopt.c
index b64bde695f4..7a86007d087 100644
--- a/gcc/tree-ssa-phiopt.c
+++ b/gcc/tree-ssa-phiopt.c
@@ -2240,6 +2240,14 @@ cond_store_replacement (basic_block middle_bb, 
basic_block join_bb,
tree base = get_base_address (lhs);
if (!auto_var_p (base) || TREE_ADDRESSABLE (base))
return false;
+
+  /* The transformation below will inherently introduce a memory load,
+for which LHS may not be initialized yet if it is not in NOTRAP,
+so a -Wmaybe-uninitialized warning message could be triggered.
+Since it's a bit hard to have a very accurate uninitialization
+analysis to memory reference, we disable the warning here to avoid
+confusion.  */
+  TREE_NO_WARNING (lhs) = 1;


The no-warning bit is sometimes (typically?) set by the middle-end
after a warning has been issued, to avoid triggering other warnings
down the line for an already diagnosed expression.  Unless it's
done just for the purposes of a single pass and the bit is cleared
afterwards, using it to avoid potential false positives doesn't
seem like a robust solution.  It will mask warnings for constructs
that have been determined to be invalid.

FWIW, the middle-end is susceptible to quite a few false positives
that would nice to avoid.  We have discussed various approaches to
the problem but setting the no-warning bit seems like too blunt of
an instrument.

Martin


Re: [PATCH 0/3] add support for POD struct convention (PR 61339)

2019-07-23 Thread Arvind Sankar
Hi, SVN rev 273311 appears to have been committed without regenerating
gcc/config*?

Thanks.


Re: [ARM][PATCH 2/2] Remove redundant constructs added for FP16 support.

2019-07-23 Thread Srinath Parvathaneni
Hi,

Pinging for review of https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01922.html

Regards,
SRI.


From: Srinath Parvathaneni 

Sent: 29 May 2019 15:48
To: gcc-patches@gcc.gnu.org 

Cc: nd ; Richard Earnshaw 
; Ramana 
Radhakrishnan 
; Kyrylo 
Tkachov 
Subject: [ARM][PATCH 2/2] Remove redundant constructs added for FP16 support.

Hello,

The patch reworks some of the VRND and VCVT code added for the FP16
extension support to remove the redundant UNSPECS and related
constructs.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check
and for arm-none-eabi with cross-compiled check-gcc on an
ARMv8.4-A emulator.

Ok for trunk? If ok, could someone please commit the patch on my behalf,
I don't have commit rights.

2019-05-29 Srinath Parvathaneni 

   Matthew Wahab  

* config/arm/iterators.md (VCVT_HF_US_N): Remove.
(VCVT_SI_US_N): Remove.
(VCVT_HF_US): Remove.
(VCVTH_US): Remove.
(FP16_RND): Remove.
(sup): Remove UNSPEC_VCVTA_S, UNSPEC_VCVTA_U, UNSPEC_VCVTM_S,
UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U, UNSPEC_VCVTP_S,
UNSPEC_VCVTP_U, UNSPEC_VCVT_HF_S_N, UNSPEC_VCVT_HF_U_N,
UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N, UNSPEC_VCVTH_S,
UNSPEC_VCVTH_U.
(vcvth_op): Remove.
(fp16_rnd_insn): Remove.
* config/arm/unspecs.md: Remove UNSPEC_VCVT_HF_S_N,
UNSPEC_VCVT_HF_U_N, UNSPEC_VCVT_SI_S_N, UNSPEC_VCVT_SI_U_N,
UNSPEC_VCVTH_S, UNSPEC_VCVTH_U, UNSPEC_VCVTA_S, UNSPEC_VCVTA_U,
UNSPEC_VCVTM_S, UNSPEC_VCVTM_U, UNSPEC_VCVTN_S, UNSPEC_VCVTN_U,
UNSPEC_VCVTP_S, UNSPEC_VCVTP_U, UNSPEC_VRND, UNSPEC_VRNDA,
UNSPEC_VRNDI, UNSPEC_VRNDM, UNSPEC_VRNDN, UNSPEC_VRNDP,
UNSPEC_VRNDX.
* config/arm/vfp.md (neon_vcvthhf): Replace VCVTH_US with
VCVT_US.
(neon_vcvthsi): Likewise.
(neon_vcvth_nhf_unspec): Replace VCVTH_US_N with VCVT_US_N.
(neon_vcvth_nhf): Likewise.
(neon_vcvth_nsi_unspec): Replace VCVTH_SI_US_N with
VCVT_US_N.
(neon_vcvth_nsi): Likewise.



Re: [ARM][PATCH 1/2] Support HFmode for standard names implemented with VRINT instructions.

2019-07-23 Thread Srinath Parvathaneni
Hi,


Pinging for review of https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01921.html.


Regards,

SRI.


From: gcc-patches-ow...@gcc.gnu.org 
 on behalf 
of Srinath Parvathaneni 

Sent: 29 May 2019 15:47:59
To: gcc-patches@gcc.gnu.org 

Cc: nd ; Richard Earnshaw 
; Ramana 
Radhakrishnan 
; Kyrylo 
Tkachov 
Subject: [ARM][PATCH 1/2] Support HFmode for standard names implemented with 
VRINT instructions.

Hello,

The initial implementation of the FP16 extension added HFmode support to
a limited number of the standard names.  Following
https://gcc.gnu.org/ml/gcc-patches/2016-08/msg00168.html , this patch
extends the HFmode support to the names implemented by the ARM
 and l expanders: btrunc, ceil, round,
floor, nearbyint and rint. This patch also changes the patterns
supporting the neon_vrnd* and neon_vcvt* Adv.SIMD intrinsics to use the
standard names, where apropriate.

No new tests are added. The ARM tests for the SF and DF mode variants of
these names are based on GCC builtin functions and there doesn't seem to
be an obvious alternative to trigger the new patterns through the
standard names. The pattern definitions though are tested through the
Adv.SIMD intrinsics.

The following patch reworks the implementation for HFmode VRINT to
remove a number of redundant constructs that were added in the initial
implementation.

The two patches have been tested for arm-none-linux-gnueabihf with
native bootstrap and make check and for arm-none-eabi with
cross-compiled check-gcc on an ARMv8.4-A emulator.

Ok for trunk? If ok, could someone please commit the patch on my behalf,
I don't have commit rights.

2019-05-29 Srinath Parvathaneni 

   Matthew Wahab  

* config/arm/iterators.md (fp16_rnd_str): Replace UNSPEC_VRND
values with equivalent UNSPEC_VRINT values.  Add UNSPEC_NVRINTZ,
UNSPEC_NVRINTA, UNSPEC_NVRINTM, UNSPEC_NVRINTN, UNSPEC_NVRINTP,
UNSPEC_NVRINTX.
(vrint_variant): Fix some white-space.
(vrint_predicable): Fix some white-space.
* config/arm/neon.md (neon_v): Replace
FP16_RND iterator with NEON_VRINT and update the rest of the
pattern accordingly.
(neon_vcvt): Replace with
neon_vcvt.
(neon_vcvt): New.
(neon_vcvtn): New.
* config/arm/unspecs.md: Add UNSPEC_VRINTN.
* config/arm/vfp.md (neon_vhf): Convert to an
expander invoking hf2.
(neon_vrndihf): Remove.
(neon_vrndnhf): New.
(neon_vcvthsi): Remove.
(hf2): New.
(lhfsi2): New.
(neon_vcvthsi): New.
(neon_vcvtnhsi): New.



Re: [C++ PATCH] PR c++/90590 Suppress warning for enumeration value not handled in switch warning

2019-07-23 Thread Matthew Beliveau
ping

On Tue, Jul 16, 2019 at 8:40 AM Marek Polacek  wrote:
>
> On Mon, Jul 15, 2019 at 09:47:26AM -0400, Matthew Beliveau wrote:
> > Okay I kept the TYPE_MAIN_VARIANT and dropped the accidental new line!
> > Hopefully this should be fine!
>
> CCing Joseph as this is c-family/ stuff.
>
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2019-07-12  Matthew Beliveau  
> >
> >   PR c++/90590
> >   * c-warn.c (c_do_switch_warnings): Suppress warning for enumerators
> >   with reserved names that are in a system header.
> >
> >   * c-c++-common/pr90590-1.c: New test.
> >   * c-c++-common/pr90590-1.h: New test.
> >   * c-c++-common/pr90590-2.c: New test.
> >   * c-c++-common/pr90590-2.h: New test.
> >
> > diff --git gcc/c-family/c-warn.c gcc/c-family/c-warn.c
> > index b5d09e761d7..51c54a283e5 100644
> > --- gcc/c-family/c-warn.c
> > +++ gcc/c-family/c-warn.c
> > @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "gcc-rich-location.h"
> >  #include "gimplify.h"
> >  #include "c-family/c-indentation.h"
> > +#include "c-family/c-spellcheck.h"
> >  #include "calls.h"
> >  #include "stor-layout.h"
> >
> > @@ -1628,6 +1629,15 @@ c_do_switch_warnings (splay_tree cases, location_t 
> > switch_location,
> >if (cond && tree_int_cst_compare (cond, value))
> >   continue;
> >
> > +  /* If the enumerator is defined in a system header and uses a 
> > reserved
> > +  name, then we continue to avoid throwing a warning.  */
> > +  location_t loc = DECL_SOURCE_LOCATION
> > + (TYPE_STUB_DECL (TYPE_MAIN_VARIANT (type)));
> > +  if (in_system_header_at (loc)
> > +   && name_reserved_for_implementation_p
> > +   (IDENTIFIER_POINTER (TREE_PURPOSE (chain
> > + continue;
> > +
> >/* If there is a default_node, the only relevant option is
> >Wswitch-enum.  Otherwise, if both are enabled then we prefer
> >to warn using -Wswitch because -Wswitch is enabled by -Wall
> > diff --git gcc/testsuite/c-c++-common/pr90590-1.c 
> > gcc/testsuite/c-c++-common/pr90590-1.c
> > new file mode 100644
> > index 000..4e11debb7fa
> > --- /dev/null
> > +++ gcc/testsuite/c-c++-common/pr90590-1.c
> > @@ -0,0 +1,15 @@
> > +// PR c++/90590
> > +// { dg-options -Wswitch }
> > +#include "pr90590-1.h"
> > +
> > +void
> > +g ()
> > +{
> > +  enum E e = _A;
> > +  switch (e) // { dg-bogus "enumeration value '_C' not handled in switch" }
> > +{
> > +case _A:
> > +case _B:
> > +  break;
> > +}
> > +}
> > diff --git gcc/testsuite/c-c++-common/pr90590-1.h 
> > gcc/testsuite/c-c++-common/pr90590-1.h
> > new file mode 100644
> > index 000..22f1a7d5d52
> > --- /dev/null
> > +++ gcc/testsuite/c-c++-common/pr90590-1.h
> > @@ -0,0 +1,2 @@
> > +#pragma GCC system_header
> > +enum E { _A, _B, _C };
> > diff --git gcc/testsuite/c-c++-common/pr90590-2.c 
> > gcc/testsuite/c-c++-common/pr90590-2.c
> > new file mode 100644
> > index 000..23da97f9d74
> > --- /dev/null
> > +++ gcc/testsuite/c-c++-common/pr90590-2.c
> > @@ -0,0 +1,11 @@
> > +// PR c++/90590
> > +// { dg-options -Wswitch }
> > +
> > +#include "pr90590-2.h"
> > +
> > +void
> > +fn ()
> > +{
> > +  switch (c.b) // { dg-bogus "enumeration value" }
> > +;
> > +}
> > diff --git gcc/testsuite/c-c++-common/pr90590-2.h 
> > gcc/testsuite/c-c++-common/pr90590-2.h
> > new file mode 100644
> > index 000..e4f8635576f
> > --- /dev/null
> > +++ gcc/testsuite/c-c++-common/pr90590-2.h
> > @@ -0,0 +1,4 @@
> > +#pragma GCC system_header
> > +struct {
> > +  enum { _A } b;
> > +} c;
>
>
> Marek


Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Martin Liška
On 7/23/19 3:57 PM, Jeff Law wrote:
> On 7/23/19 7:50 AM, Michael Matz wrote:
>> Hi,
>>
>> On Tue, 23 Jul 2019, Jeff Law wrote:
>>
 great you found time to make this. It should become the default for
 -flto IMO.
>>> I was going to hack it into the rpm configury bits since we have access
>>> to the # cores there.  But an auto-selector within GCC is even better.
>>>
>>> BTW, isn't this all going to wreck havoc with reproducible builds since 
>>> partitioning can affect code generation?  That's one of our open 
>>> questions...
>>
>> See Richi for this, but the reason for -flto=auto (or just -flto, or 
>> whatever the endresult will be) is _also_ reproducible builds, because 
>> some packages like to encode the compile flags into their binaries and 
>> hence would change depending on build host just because of "-flto=32" vs. 
>> "-flto=64" even when the code remains exactly the same.
> Makes sense.
> 
> What did you end up doing with old autoconf scripts that aren't LTO
> safe?  Many of the older style tests to see if a function exists are
> broken by LTO.  I've seen more issues with this than anything in the LTO
> space so far.

Well, I've seen some of these failures, but only a few.

Martin

> 
> We're capturing config.h files and comparing them with and without LTO
> to at least be able to enumerate where these issues are.   Sadly this
> test gets lots of false positives due to flag and timestamp encodings
> into the config.h files :(
> 
> jeff
> 



[committed] Re: [PATCH] make gdbhooks.py idempotent with respect to reloading

2019-07-23 Thread Vladislav Ivanishin
Vladislav Ivanishin  writes:

> Vladislav Ivanishin  writes:
>
>> Hi!
>>
>> It is nice to be able to reload the pretty printers and convenience
>> functions from gdbhooks.py without exiting GDB: reloading cc1 takes
>> several seconds (plus, the debugging session is lost).
>>
>> Previously:
>>
>>(gdb) python import imp; imp.reload(gdbhooks);
>>RuntimeError: pretty-printer already registered: gcc
>>
>> Fixing this turned out easier than I expected.
>> (gdb) py help (gdb.printing)
>> revealed, that we can pass replace parameter to register_pretty_printer
>> (which is False by default).
>>
>> With the patch:
>>
>> (gdb) python import imp; imp.reload(gdbhooks);
>> Successfully loaded GDB hooks for GCC
>>
>> gcc/
>>
>> * gdbhooks.py: Pass replace=True to
>> gdb.printing.register_pretty_printer.
>> ---
>>  gcc/gdbhooks.py | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
>> index 15a5ceaa56f..26a09749aa3 100644
>> --- a/gcc/gdbhooks.py
>> +++ b/gcc/gdbhooks.py
>> @@ -602,7 +602,8 @@ def build_pretty_printer():
>>  
>>  gdb.printing.register_pretty_printer(
>>  gdb.current_objfile(),
>> -build_pretty_printer())
>> +build_pretty_printer(),
>> +replace=True)
>>  
>>  def find_gcc_source_dir():
>>  # Use location of global "g" to locate the source tree
>> -- 
>> 2.22.0
>>
>>
>> OK?
>
> I actually think, that change is obvious.  It has proven useful and I've
> not run into any issues with it.
>
>> BTW, perhaps I should also add a convenience function for 'import imp;
>> imp.reload(gdbhooks)' or something to that effect?
>
> I added a user-defined gdb command and a short alias for it.  I think,
> this is obvious too, but would feel more comfortable if someone OKs it.
> Dave?
>
> [PATCH] gdbinit.in: add reload-gdbhooks (rh) command
>
> gcc/
> * gdbinit.in (reload-gdbhooks): New command with an attached doc 
> string.
> (rh): New alias for it.
> ---
>  gcc/gdbinit.in | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/gcc/gdbinit.in b/gcc/gdbinit.in
> index cec55f86749..2454441f859 100644
> --- a/gcc/gdbinit.in
> +++ b/gcc/gdbinit.in
> @@ -216,6 +216,16 @@ is emitted (as opposed to those warnings that are 
> suppressed by
>  command-line options).
>  end
>  
> +define reload-gdbhooks
> +python import imp; imp.reload(gdbhooks)
> +end
> +
> +document reload-gdbhooks
> +Load the gdbhooks.py module again in order to pick up any changes made to it.
> +end
> +
> +alias rh = reload-gdbhooks
> +
>  # Define some macros helpful to gdb when it is expanding macros.
>  macro define __FILE__ "gdb"
>  macro define __LINE__ 1

Committed both patches as obvious, revisions 273737 and 273738
respectively.

-- 
Vlad


[PR rtl-optimization/91173] Backport to GCC 8 and 9

2019-07-23 Thread Matthew Beliveau
Tested on GCC 8 and 9.
Bootstrapped/regtested on x86_64-linux, ok for trunk?

2019-07-23  Matthew Beliveau  
	
	Backported from mainline
	2019-07-16  Jeff Law  
	
	PR rtl-optimization/91173
	* tree-ssa-address.c (addr_for_mem_ref): If the base is an
	SSA_NAME with a constant value, fold its value into the offset
	and clear the base before calling gen_addr_rtx.

	* g++.dg/pr91173.C: New test.

diff --git gcc/testsuite/g++.dg/pr91173.C gcc/testsuite/g++.dg/pr91173.C
new file mode 100644
index 000..b8fb41ba0cd
--- /dev/null
+++ gcc/testsuite/g++.dg/pr91173.C
@@ -0,0 +1,45 @@
+class a {
+  int b;
+  void *c;
+
+public:
+  bool aa();
+  int () {
+if (aa()) {
+  void *d(c);
+  return static_cast(d)[b];
+}
+return *(int *)0;
+  }
+};
+typedef enum {E} e;
+class f : public a {
+  int g;
+
+public:
+  int ac() {
+if (g)
+  return 1;
+return ac();
+  }
+};
+int *ad;
+struct h {
+  static int ae(e, int *m) {
+f ag;
+int *ah;
+while (!0) {
+  ad = ();
+  ah = ad + ag.ac();
+  while (ad < ah)
+*m = *ad++;
+}
+  }
+};
+template 
+void i(int *, int *, int, int *, e n, int *o) {
+  h::ae(n, o);
+}
+int aq, ar, as, at, au;
+void aw() { i(, , as, , (e)0, ); }
+
diff --git gcc/tree-ssa-address.c gcc/tree-ssa-address.c
index 1c17e935914..2e5d87734d6 100644
--- gcc/tree-ssa-address.c
+++ gcc/tree-ssa-address.c
@@ -259,6 +259,20 @@ addr_for_mem_ref (struct mem_address *addr, addr_space_t as,
 	 ? expand_expr (addr->index, NULL_RTX, pointer_mode, EXPAND_NORMAL)
 	 : NULL_RTX);
 
+  /* addr->base could be an SSA_NAME that was set to a constant value.  The
+ call to expand_expr may expose that constant.  If so, fold the value
+ into OFF and clear BSE.  Otherwise we may later try to pull a mode from
+ BSE to generate a REG, which won't work with constants because they
+ are modeless.  */
+  if (bse && GET_CODE (bse) == CONST_INT)
+{
+  if (off)
+	off = simplify_gen_binary (PLUS, pointer_mode, bse, off);
+  else
+	off = bse;
+  gcc_assert (GET_CODE (off) == CONST_INT);
+  bse = NULL_RTX;
+}
   gen_addr_rtx (pointer_mode, sym, bse, idx, st, off, , NULL, NULL);
   if (pointer_mode != address_mode)
 address = convert_memory_address (address_mode, address);


[PATCH][RFC][x86] Fix PR91154, add SImode smax, allow SImode add in SSE regs

2019-07-23 Thread Richard Biener


The following fixes the runtime regression of 456.hmmer caused
by matching ICC in code generation and using cmov more aggressively
(through GIMPLE level MAX_EXPR usage).  Appearantly (discovered
by manual assembler editing) using the SSE unit for performing
SImode loads, adds and then two singed max operations plus stores
is quite a bit faster than cmovs - even faster than the original
single cmov plus branchy second max.  Even more so for AMD CPUs
than Intel CPUs.

Instead of hacking up some pattern recognition pass to transform
integer mode memory-to-memory computation chains involving
conditional moves to "vector" code (similar to what STV does
for TImode ops on x86_64) the following simply allows SImode
into SSE registers (support for this is already there in some
important places like move patterns!).  For the particular
case of 456.hmmer the required support is loads/stores
(already implemented), SImode adds and SImode smax.

So the patch adds a smax pattern for SImode (we don't have any
for scalar modes but currently expand via a conditional move sequence)
emitting as SSE vector max or cmp/cmov depending on the alternative.

And it amends the *add_1 pattern with SSE alternatives
(which have to come before the memory alternative as IRA otherwise
doesn't consider reloading a memory operand to a register).

With this in place the runtime of 456.hmmer improves by 10%
on Haswell which is back to before regression speed but not
to same levels as seen with manually editing just the single
important loop.

I'm currently benchmarking all SPEC CPU 2006 on Haswell.  More
interesting is probably Zen where moves crossing the
integer - vector domain are excessively expensive (they get
done via the stack).

Clearly this approach will run into register allocation issues
but it looks cleaner than writing yet another STV-like pass
(STV itself is quite awkwardly structured so I refrain from
touching it...).

Anyway - comments?  It seems to me that MMX-in-SSE does
something very similar.

Bootstrapped on x86_64-unknown-linux-gnu, previous testing
revealed some issue.  Forgot that *add_1 also handles
DImode..., fixed below, re-testing in progress.

Thanks,
Richard.

2019-07-23  Richard Biener  

PR target/91154
* config/i386/i386.md (smaxsi3): New.
(*add_1): Add SSE and AVX variants.
* config/i386/i386.c (ix86_lea_for_add_ok): Do not allow
SSE registers.

Index: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md (revision 273732)
+++ gcc/config/i386/i386.md (working copy)
@@ -1881,6 +1881,33 @@ (define_expand "mov"
   ""
   "ix86_expand_move (mode, operands); DONE;")
 
+(define_insn "smaxsi3"
+ [(set (match_operand:SI 0 "register_operand" "=r,v,x")
+   (smax:SI (match_operand:SI 1 "register_operand" "%0,v,0")
+(match_operand:SI 2 "register_operand" "r,v,x")))
+  (clobber (reg:CC FLAGS_REG))]
+  "TARGET_SSE4_1"
+{
+  switch (get_attr_type (insn))
+{
+case TYPE_SSEADD:
+  if (which_alternative == 1)
+return "vpmaxsd\t{%2, %1, %0|%0, %1, %2}";
+  else
+return "pmaxsd\t{%2, %0|%0, %2}";
+case TYPE_ICMOV:
+  /* ???  Instead split this after reload?  */
+  return "cmpl\t{%2, %0|%0, %2}\n"
+   "\tcmovl\t{%2, %0|%0, %2}";
+default:
+  gcc_unreachable ();
+}
+}
+  [(set_attr "isa" "noavx,avx,noavx")
+   (set_attr "prefix" "orig,vex,orig")
+   (set_attr "memory" "none")
+   (set_attr "type" "icmov,sseadd,sseadd")])
+
 (define_insn "*mov_xor"
   [(set (match_operand:SWI48 0 "register_operand" "=r")
(match_operand:SWI48 1 "const0_operand"))
@@ -5368,10 +5395,10 @@ (define_insn_and_split "*add3_doubl
 })
 
 (define_insn "*add_1"
-  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
+  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,v,x,r,r,r")
(plus:SWI48
- (match_operand:SWI48 1 "nonimmediate_operand" "%0,0,r,r")
- (match_operand:SWI48 2 "x86_64_general_operand" "re,m,0,le")))
+ (match_operand:SWI48 1 "nonimmediate_operand" "%0,v,0,0,r,r")
+ (match_operand:SWI48 2 "x86_64_general_operand" "re,v,x,*m,0,le")))
(clobber (reg:CC FLAGS_REG))]
   "ix86_binary_operator_ok (PLUS, mode, operands)"
 {
@@ -5390,10 +5417,23 @@ (define_insn "*add_1"
   return "dec{}\t%0";
}
 
+case TYPE_SSEADD:
+  if (which_alternative == 1)
+{
+  if (mode == SImode)
+   return "%vpaddd\t{%2, %1, %0|%0, %1, %2}";
+ else
+   return "%vpaddq\t{%2, %1, %0|%0, %1, %2}";
+   }
+  else if (mode == SImode)
+   return "paddd\t{%2, %0|%0, %2}";
+  else
+   return "paddq\t{%2, %0|%0, %2}";
+
 default:
   /* For most processors, ADD is faster than LEA.  This alternative
 was added to use ADD as much as possible.  */
-  if (which_alternative == 2)
+  if (which_alternative == 4)
 

Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Jeff Law
On 7/23/19 7:50 AM, Michael Matz wrote:
> Hi,
> 
> On Tue, 23 Jul 2019, Jeff Law wrote:
> 
>>> great you found time to make this. It should become the default for
>>> -flto IMO.
>> I was going to hack it into the rpm configury bits since we have access
>> to the # cores there.  But an auto-selector within GCC is even better.
>>
>> BTW, isn't this all going to wreck havoc with reproducible builds since 
>> partitioning can affect code generation?  That's one of our open 
>> questions...
> 
> See Richi for this, but the reason for -flto=auto (or just -flto, or 
> whatever the endresult will be) is _also_ reproducible builds, because 
> some packages like to encode the compile flags into their binaries and 
> hence would change depending on build host just because of "-flto=32" vs. 
> "-flto=64" even when the code remains exactly the same.
Makes sense.

What did you end up doing with old autoconf scripts that aren't LTO
safe?  Many of the older style tests to see if a function exists are
broken by LTO.  I've seen more issues with this than anything in the LTO
space so far.

We're capturing config.h files and comparing them with and without LTO
to at least be able to enumerate where these issues are.   Sadly this
test gets lots of false positives due to flag and timestamp encodings
into the config.h files :(

jeff


Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Michael Matz
Hi,

On Tue, 23 Jul 2019, Jeff Law wrote:

> > great you found time to make this. It should become the default for
> > -flto IMO.
> I was going to hack it into the rpm configury bits since we have access
> to the # cores there.  But an auto-selector within GCC is even better.
> 
> BTW, isn't this all going to wreck havoc with reproducible builds since 
> partitioning can affect code generation?  That's one of our open 
> questions...

See Richi for this, but the reason for -flto=auto (or just -flto, or 
whatever the endresult will be) is _also_ reproducible builds, because 
some packages like to encode the compile flags into their binaries and 
hence would change depending on build host just because of "-flto=32" vs. 
"-flto=64" even when the code remains exactly the same.


Ciao,
Michael.


[PATCH][MSP430] Reject -m{code,data}-region options unless large memory model is selected

2019-07-23 Thread Jozef Lawrynowicz
The attached patch adds error messages to be emitted when -mcode/data-region are
used without the accompanying -mlarge option that enables the large memory
model.
Use of the upper/either regions without -mlarge can cause relocation overflows
or stack mismanagement when incorrect call/return instructions are generated.
In most cases, using the lower region without -mlarge has no effect, but it can
affect code generation unexpectedly (msp430_do_not_relax_short_jumps), or
add the ".lower" prefix to some section names.

Similarly, handling of the upper/lower/either attributes has been
modified so the attributes will be ignored, and a warning will be emitted, when
they are used without -mlarge.

I have added the error messages to the driver so that uses of
-mcode/data-region=none can be caught. The compiler cannot catch "none" since
it is the default value that the option variable is initialized to.

So only users calling cc1* directly will see the errors in
msp430_option_override, but I also have updated them to be more similar to the
new errors emitted by the driver.

Successfully regtested on trunk for msp430-elf.

Ok for trunk?
>From 0c37dc30d67229392ae8f834e462dd6336f7ccfc Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 22 Jul 2019 21:37:45 +0100
Subject: [PATCH] MSP430: Disallow mcode/data-region unless the large memory
 model is in use

gcc/ChangeLog:

2019-07-23  Jozef Lawrynowicz  

	* config/msp430/msp430.h (DRIVER_SELF_SPECS): Define and emit errors
	when -m{code,data}-region are used without -mlarge.
	* config/msp430/msp430.c (msp430_option_override): Error when a
	non-default code or data region is used without -mlarge.
	(msp430_section_attr): Emit a warning and do not add upper/lower/either
	attributes when they are used without -mlarge.

gcc/testsuite/ChangeLog:

2019-07-23  Jozef Lawrynowicz  

	* gcc.target/msp430/pr78818-data-region.c: Add -mlarge to dg-options.
	* gcc.target/msp430/region-misuse-code.c: New test.
	* gcc.target/msp430/region-misuse-data.c: Likewise.
	* gcc.target/msp430/region-misuse-code-data.c: Likewise.
	* gcc.target/msp430/region-attribute-misuse.c: Likewise.
---
 gcc/config/msp430/msp430.c| 35 ---
 gcc/config/msp430/msp430.h|  8 +
 .../gcc.target/msp430/pr78818-data-region.c   |  3 +-
 .../msp430/region-attribute-misuse.c  | 16 +
 .../msp430/region-misuse-code-data.c  |  4 +++
 .../gcc.target/msp430/region-misuse-code.c|  4 +++
 .../gcc.target/msp430/region-misuse-data.c|  4 +++
 7 files changed, 69 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/msp430/region-attribute-misuse.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/region-misuse-code-data.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/region-misuse-code.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/region-misuse-data.c

diff --git a/gcc/config/msp430/msp430.c b/gcc/config/msp430/msp430.c
index 265c2f642d8..c7b774e71a6 100644
--- a/gcc/config/msp430/msp430.c
+++ b/gcc/config/msp430/msp430.c
@@ -875,10 +875,26 @@ msp430_option_override (void)
   if (TARGET_LARGE && !msp430x)
 error ("%<-mlarge%> requires a 430X-compatible %<-mmcu=%>");
 
-  if (msp430_code_region == MSP430_REGION_UPPER && ! msp430x)
-error ("%<-mcode-region=upper%> requires 430X-compatible cpu");
-  if (msp430_data_region == MSP430_REGION_UPPER && ! msp430x)
-error ("%<-mdata-region=upper%> requires 430X-compatible cpu");
+  if (!TARGET_LARGE && msp430_code_region == MSP430_REGION_EITHER)
+error ("%<-mcode-region=either%> requires the large memory model "
+	   "(%<-mlarge%>)");
+  else if (!TARGET_LARGE && msp430_code_region == MSP430_REGION_UPPER)
+error ("%<-mcode-region=upper%> requires the large memory model "
+	   "(%<-mlarge%>)");
+  else if (!TARGET_LARGE && msp430_code_region == MSP430_REGION_LOWER)
+error ("%<-mcode-region=lower%> requires the large memory model "
+	   "(%<-mlarge%>)");
+
+  if (!TARGET_LARGE && msp430_data_region == MSP430_REGION_EITHER)
+error ("%<-mdata-region=either%> requires the large memory model "
+	   "(%<-mlarge%>)");
+  else if (!TARGET_LARGE && msp430_data_region == MSP430_REGION_UPPER)
+error ("%<-mdata-region=upper%> requires the large memory model "
+	   "(%<-mlarge%>)");
+  else if (!TARGET_LARGE && msp430_data_region == MSP430_REGION_LOWER)
+error ("%<-mdata-region=lower%> requires the large memory model "
+	   "(%<-mlarge%>)");
+
 
   if (flag_exceptions || flag_non_call_exceptions
   || flag_unwind_tables || flag_asynchronous_unwind_tables)
@@ -2038,6 +2054,17 @@ msp430_section_attr (tree * node,
 	message = "already marked with 'upper' attribute";
 }
 
+  /* It does not make sense to use upper/lower/either attributes without
+ -mlarge.
+ Without -mlarge, "lower" is the default and only region, so is redundant.
+ Without -mlarge, "upper" will (and "either" might) place code/data in the
+ upper 

Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Richard Biener
On Tue, Jul 23, 2019 at 3:09 PM Jeff Law  wrote:
>
> On 7/23/19 3:20 AM, Jan Hubicka wrote:
> >> Hi.
> >>
> >> As we as openSUSE started using -flto, I see it very handy to have
> >> an option value that will automatically detect number of cores
> >> that can be used for parallel LTRANS phase.
> >>
> >> Thoughts?
> > Hi,
> > great you found time to make this. It should become the default for
> > -flto IMO.
> I was going to hack it into the rpm configury bits since we have access
> to the # cores there.  But an auto-selector within GCC is even better.
>
> BTW, isn't this all going to wreck havoc with reproducible builds since
> partitioning can affect code generation?  That's one of our open
> questions...

Partitioning is independent on N in -flto=N it's just dependent on
the LTO partitioning --params.

Richard.

> Jeff


Re: Add ARRAY_REF based access patch disambiguation

2019-07-23 Thread Richard Biener
On Fri, 19 Jul 2019, Jan Hubicka wrote:

> Hi,
> this patch adds bare bones of disambiguation of access paths via
> ARRAY_REF.  Similarly to COMPONENT_REF we need a matched ARRAY_REF size
> and prove that indexes are actually different.
> 
> This adds about 20 new disambiguations to tramp3d.
> 
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * tree-ssa-alias.c (nonoverlapping_component_refs_since_match_p):
>   Rename to ...
>   (nonoverlapping_refs_since_match_p): ... this; handle also ARRAY_REFs.
>   (alias_stats): Update stats.
>   (dump_alias_stats): Likewise.
>   (aliasing_matching_component_refs_p): Add partial_overlap argument;
>   pass it to nonoverlapping_refs_since_match_p.
>   (aliasing_component_refs_walk): Update call of
>   aliasing_matching_component_refs_p
>   (nonoverlapping_array_refs_p): New function.
>   (decl_refs_may_alias_p, indirect_ref_may_alias_decl_p,
>   indirect_refs_may_alias_p): Update calls of
>   nonoverlapping_refs_since_match_p.
>   * gcc.dg/tree-ssa/alias-access-path-10.c: New testcase.
> 
> Index: tree-ssa-alias.c
> ===
> --- tree-ssa-alias.c  (revision 273570)
> +++ tree-ssa-alias.c  (working copy)
> @@ -87,7 +87,7 @@ along with GCC; see the file COPYING3.
> this file.  Low-level disambiguators dealing with points-to
> information are in tree-ssa-structalias.c.  */
>  
> -static int nonoverlapping_component_refs_since_match_p (tree, tree, tree, 
> tree);
> +static int nonoverlapping_refs_since_match_p (tree, tree, tree, tree, bool);
>  static bool nonoverlapping_component_refs_p (const_tree, const_tree);
>  
>  /* Query statistics for the different low-level disambiguators.
> @@ -104,9 +104,9 @@ static struct {
>unsigned HOST_WIDE_INT aliasing_component_refs_p_no_alias;
>unsigned HOST_WIDE_INT nonoverlapping_component_refs_p_may_alias;
>unsigned HOST_WIDE_INT nonoverlapping_component_refs_p_no_alias;
> -  unsigned HOST_WIDE_INT 
> nonoverlapping_component_refs_since_match_p_may_alias;
> -  unsigned HOST_WIDE_INT 
> nonoverlapping_component_refs_since_match_p_must_overlap;
> -  unsigned HOST_WIDE_INT 
> nonoverlapping_component_refs_since_match_p_no_alias;
> +  unsigned HOST_WIDE_INT nonoverlapping_refs_since_match_p_may_alias;
> +  unsigned HOST_WIDE_INT nonoverlapping_refs_since_match_p_must_overlap;
> +  unsigned HOST_WIDE_INT nonoverlapping_refs_since_match_p_no_alias;
>  } alias_stats;
>  
>  void
> @@ -137,15 +137,15 @@ dump_alias_stats (FILE *s)
>  alias_stats.nonoverlapping_component_refs_p_no_alias,
>  alias_stats.nonoverlapping_component_refs_p_no_alias
>  + alias_stats.nonoverlapping_component_refs_p_may_alias);
> -  fprintf (s, "  nonoverlapping_component_refs_since_match_p: "
> +  fprintf (s, "  nonoverlapping_refs_since_match_p: "
>  HOST_WIDE_INT_PRINT_DEC" disambiguations, "
>  HOST_WIDE_INT_PRINT_DEC" must overlaps, "
>  HOST_WIDE_INT_PRINT_DEC" queries\n",
> -alias_stats.nonoverlapping_component_refs_since_match_p_no_alias,
> -alias_stats.nonoverlapping_component_refs_since_match_p_must_overlap,
> -alias_stats.nonoverlapping_component_refs_since_match_p_no_alias
> -+ alias_stats.nonoverlapping_component_refs_since_match_p_may_alias
> -+ 
> alias_stats.nonoverlapping_component_refs_since_match_p_must_overlap);
> +alias_stats.nonoverlapping_refs_since_match_p_no_alias,
> +alias_stats.nonoverlapping_refs_since_match_p_must_overlap,
> +alias_stats.nonoverlapping_refs_since_match_p_no_alias
> ++ alias_stats.nonoverlapping_refs_since_match_p_may_alias
> ++ alias_stats.nonoverlapping_refs_since_match_p_must_overlap);
>fprintf (s, "  aliasing_component_refs_p: "
>  HOST_WIDE_INT_PRINT_DEC" disambiguations, "
>  HOST_WIDE_INT_PRINT_DEC" queries\n",
> @@ -856,7 +856,8 @@ type_has_components_p (tree type)
>  
>  /* MATCH1 and MATCH2 which are part of access path of REF1 and REF2
> respectively are either pointing to same address or are completely
> -   disjoint.
> +   disjoint. If PARITAL_OVERLAP is true, assume that outermost arrays may
> +   just partly overlap.
>  
> Try to disambiguate using the access path starting from the match
> and return false if there is no conflict.
> @@ -867,24 +868,27 @@ static bool
>  aliasing_matching_component_refs_p (tree match1, tree ref1,
>   poly_int64 offset1, poly_int64 max_size1,
>   tree match2, tree ref2,
> - poly_int64 offset2, poly_int64 max_size2)
> + poly_int64 offset2, poly_int64 max_size2,
> + bool partial_overlap)
>  {
>poly_int64 offadj, sztmp, msztmp;
>bool reverse;
>  
> -
> -  get_ref_base_and_extent (match2, , , , );
> -  offset2 -= offadj;
> -  

Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Jeff Law
On 7/23/19 3:20 AM, Jan Hubicka wrote:
>> Hi.
>>
>> As we as openSUSE started using -flto, I see it very handy to have
>> an option value that will automatically detect number of cores
>> that can be used for parallel LTRANS phase.
>>
>> Thoughts?
> Hi,
> great you found time to make this. It should become the default for
> -flto IMO.
I was going to hack it into the rpm configury bits since we have access
to the # cores there.  But an auto-selector within GCC is even better.

BTW, isn't this all going to wreck havoc with reproducible builds since
partitioning can affect code generation?  That's one of our open
questions...

Jeff


[PATCH][MSP430] Allow lower-case "r" to be used in register names by defining ADDITIONAL_REGISTER_NAMES (PR target/70320)

2019-07-23 Thread Jozef Lawrynowicz
The attached patch enables a lower-case "r" to be used in register names
specified in asm statements clobber list or command line options, in addition to
the upper case "R" that is currently supported.

Successfully regtested on trunk for msp430-elf.

Ok for trunk?
>From d639b2ba7d4a93d790bde3ad55df751116eab04b Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 22 Jul 2019 10:35:43 +0100
Subject: [PATCH] MSP430 additional register

gcc/ChangeLog:

2019-07-23  Jozef Lawrynowicz  

	PR target/70320
	* config/msp430/msp430.h: Define ADDITIONAL_REGISTER_NAMES.

gcc/testsuite/ChangeLog:

2019-07-23  Jozef Lawrynowicz  

	PR target/70320
	* gcc.target/msp430/asm-register-names-lower-case.c: New test.
	* gcc.target/msp430/asm-register-names-upper-case.c: Likewise.
---
 gcc/config/msp430/msp430.h| 22 
 .../msp430/asm-register-names-lower-case.c| 25 +++
 .../msp430/asm-register-names-upper-case.c| 25 +++
 3 files changed, 72 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/msp430/asm-register-names-lower-case.c
 create mode 100644 gcc/testsuite/gcc.target/msp430/asm-register-names-upper-case.c

diff --git a/gcc/config/msp430/msp430.h b/gcc/config/msp430/msp430.h
index 1288b1a263d..f97cbec8d21 100644
--- a/gcc/config/msp430/msp430.h
+++ b/gcc/config/msp430/msp430.h
@@ -224,6 +224,28 @@ extern const char * msp430_select_hwmult_lib (int, const char **);
   "argptr"			\
 }
 
+/* Allow lowercase "r" to be used in register names instead of upper
+   case "R".  */
+#define ADDITIONAL_REGISTER_NAMES	\
+{	\
+{ "r0",  0 },			\
+{ "r1",  1 },			\
+{ "r2",  2 },			\
+{ "r3",  3 },			\
+{ "r4",  4 },			\
+{ "r5",  5 },			\
+{ "r6",  6 },			\
+{ "r7",  7 },			\
+{ "r8",  8 },			\
+{ "r9",  9 },			\
+{ "r10", 10 },			\
+{ "r11", 11 },			\
+{ "r12", 12 },			\
+{ "r13", 13 },			\
+{ "r14", 14 },			\
+{ "r15", 15 }			\
+}
+
 enum reg_class
 {
   NO_REGS,
diff --git a/gcc/testsuite/gcc.target/msp430/asm-register-names-lower-case.c b/gcc/testsuite/gcc.target/msp430/asm-register-names-lower-case.c
new file mode 100644
index 000..98e39298484
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/asm-register-names-lower-case.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-expand" } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R4" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R5" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R6" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R7" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R8" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R9" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R10" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R11" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R12" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R13" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R14" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R15" expand } } */
+
+/* PR target/70320
+   Check that a lower case "r" in register names is accepted in
+   an asm statement clobber list.  */
+
+void
+foo (void)
+{
+  __asm__ ("" : : : "r4", "r5", "r6", "r7", "r8", "r9", "r10", "r11", "r12",
+	   "r13", "r14", "r15");
+}
diff --git a/gcc/testsuite/gcc.target/msp430/asm-register-names-upper-case.c b/gcc/testsuite/gcc.target/msp430/asm-register-names-upper-case.c
new file mode 100644
index 000..b417a8c5df4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/msp430/asm-register-names-upper-case.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-rtl-expand" } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R4" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R5" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R6" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R7" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R8" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R9" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R10" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R11" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R12" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R13" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R14" expand } } */
+/* { dg-final { scan-rtl-dump "(?n)clobber.*R15" expand } } */
+
+/* PR target/70320
+   Check that a lower case "r" in register names is accepted in
+   an asm statement clobber list.  */
+
+void
+foo (void)
+{
+  __asm__ ("" : : : "R4", "R5", "R6", "R7", "R8", "R9", "R10", "R11", "R12",
+	   "R13", "R14", "R15");
+}
-- 
2.17.1



Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:

> On Tue, 23 Jul 2019 at 17:48, Richard Biener  wrote:
> >
> > On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
> >
> > > On Tue, 23 Jul 2019 at 16:36, Richard Biener  wrote:
> > > >
> > > > On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
> > > >
> > > > > On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
> > > > >  wrote:
> > > > > >
> > > > > > Not really my area, but FWIW...
> > > > > >
> > > > > > Prathamesh Kulkarni  writes:
> > > > > > > Hi,
> > > > > > > The attached patch tries to fix PR91166.
> > > > > > > Does it look OK ?
> > > > > > > Bootstrap+test in progress on aarch64-linux-gnu and 
> > > > > > > x86_64-unknown-linux-gnu.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prathamesh
> > > > > > >
> > > > > > > 2019-07-17  Prathamesh Kulkarni  
> > > > > > >
> > > > > > >   PR middle-end/91166
> > > > > > >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> > > > > > >   (define_predicates): Add entry for uniform_vector_p.
> > > > > > >
> > > > > > > testsuite/
> > > > > > >   * gcc.target/aarch64/sve/pr91166.c: New test.
> > > > > > >
> > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > > index 4a7aa0185d8..2ad98c28fd8 100644
> > > > > > > --- a/gcc/match.pd
> > > > > > > +++ b/gcc/match.pd
> > > > > > > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not 
> > > > > > > see
> > > > > > > integer_valued_real_p
> > > > > > > integer_pow2p
> > > > > > > uniform_integer_cst_p
> > > > > > > -   HONOR_NANS)
> > > > > > > +   HONOR_NANS
> > > > > > > +   uniform_vector_p)
> > > > > > >
> > > > > > >  /* Operator lists.  */
> > > > > > >  (define_operator_list tcc_comparison
> > > > > > > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > > >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > > > > > (type; })
> > > > > > > (if (changed)
> > > > > > >  (vec_perm { op0; } { op1; } { op2; }))
> > > > > > > +
> > > > > > > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same 
> > > > > > > element.  */
> > > > > > > +(simplify
> > > > > > > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > > > > > > + { @0; })
> > > > > > > +
> > > > > > > +(simplify
> > > > > > > + (vec_perm uniform_vector_p@0 @0 @1)
> > > > > > > + { @0; })
> > > > > >
> > > > > > No need for the curly braces here, can use "@0" as the target of
> > > > > > the simplification.
> > > > > >
> > > > > > It'd probably be worth using (match ...) to define a new predicate
> > > > > > that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> > > > > > calling into uniform_vector_p for the latter two.
> > > > > Hi,
> > > > > Thanks for the suggestions.
> > > > > Does this version look OK ?
> > > >
> > > > Can you write
> > > >
> > > > +(simplify
> > > > + (vec_perm (vec_same_elem_p@0 @1) @0 @2)
> > > > + @0)
> > > >
> > > > as
> > > >
> > > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > > >
> > > > ?
> > > (simplify
> > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > >  @0)
> > >
> > > results in:
> > > gimple-match.c: In function ‘bool
> > > gimple_simplify_VEC_PERM_EXPR(gimple_match_op*, gimple**, tree_node*
> > > (*)(tree), code_helper, tree, tree, tree, tree)’:
> > > gimple-match.c:103031:36: error: cannot convert ‘tree_node* (*)(tree)’
> > > {aka ‘tree_node* (*)(tree_node*)’} to ‘tree_node**’
> > >if (gimple_vec_same_elem_p (op0, valueize))
> > > ^~~~
> > >
> > > because gimple_vec_same_elem_p has tree *res_ops as 2nd param and
> > > we're passing valueize as 2nd arg.
> >
> > Ah, you need the
> >
> > (match vec_same_elem_p
> >  @0
> >  (if (uniform_vector_p (@0)))
> >
> > (match vec_same_elem_p
> >  (vec_duplicate @0))
> >
> > form then.
> Thanks, that worked.
> Is the attached patch OK to commit ?

Yes.

Thanks,
Richard.

> Thanks,
> Prathamesh
> >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Otherwise looks OK.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Richard
> > > > > >
> > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > > > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > > > new file mode 100644
> > > > > > > index 000..42654be3b31
> > > > > > > --- /dev/null
> > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > > > @@ -0,0 +1,20 @@
> > > > > > > +/* { dg-do compile } */
> > > > > > > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" 
> > > > > > > } */
> > > > > > > +
> > > > > > > +void
> > > > > > > +f1 (double x[][4])
> > > > > > > +{
> > > > > > > +  for (int i = 0; i < 4; ++i)
> > > > > > > +for (int j = 0; j < 4; ++j)
> > > > > > > +  x[i][j] = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +void
> > > > > > > +f2 (double x[][4], double y)
> > > > > > > +{
> > > > > > > +  for (int i = 0; i < 4; ++i)
> > > > > 

Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Prathamesh Kulkarni
On Tue, 23 Jul 2019 at 17:48, Richard Biener  wrote:
>
> On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
>
> > On Tue, 23 Jul 2019 at 16:36, Richard Biener  wrote:
> > >
> > > On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
> > >
> > > > On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
> > > >  wrote:
> > > > >
> > > > > Not really my area, but FWIW...
> > > > >
> > > > > Prathamesh Kulkarni  writes:
> > > > > > Hi,
> > > > > > The attached patch tries to fix PR91166.
> > > > > > Does it look OK ?
> > > > > > Bootstrap+test in progress on aarch64-linux-gnu and 
> > > > > > x86_64-unknown-linux-gnu.
> > > > > >
> > > > > > Thanks,
> > > > > > Prathamesh
> > > > > >
> > > > > > 2019-07-17  Prathamesh Kulkarni  
> > > > > >
> > > > > >   PR middle-end/91166
> > > > > >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> > > > > >   (define_predicates): Add entry for uniform_vector_p.
> > > > > >
> > > > > > testsuite/
> > > > > >   * gcc.target/aarch64/sve/pr91166.c: New test.
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index 4a7aa0185d8..2ad98c28fd8 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
> > > > > > integer_valued_real_p
> > > > > > integer_pow2p
> > > > > > uniform_integer_cst_p
> > > > > > -   HONOR_NANS)
> > > > > > +   HONOR_NANS
> > > > > > +   uniform_vector_p)
> > > > > >
> > > > > >  /* Operator lists.  */
> > > > > >  (define_operator_list tcc_comparison
> > > > > > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > > > > (type; })
> > > > > > (if (changed)
> > > > > >  (vec_perm { op0; } { op1; } { op2; }))
> > > > > > +
> > > > > > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  
> > > > > > */
> > > > > > +(simplify
> > > > > > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > > > > > + { @0; })
> > > > > > +
> > > > > > +(simplify
> > > > > > + (vec_perm uniform_vector_p@0 @0 @1)
> > > > > > + { @0; })
> > > > >
> > > > > No need for the curly braces here, can use "@0" as the target of
> > > > > the simplification.
> > > > >
> > > > > It'd probably be worth using (match ...) to define a new predicate
> > > > > that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> > > > > calling into uniform_vector_p for the latter two.
> > > > Hi,
> > > > Thanks for the suggestions.
> > > > Does this version look OK ?
> > >
> > > Can you write
> > >
> > > +(simplify
> > > + (vec_perm (vec_same_elem_p@0 @1) @0 @2)
> > > + @0)
> > >
> > > as
> > >
> > >  (vec_perm vec_same_elem_p@0 @0 @1)
> > >
> > > ?
> > (simplify
> >  (vec_perm vec_same_elem_p@0 @0 @1)
> >  @0)
> >
> > results in:
> > gimple-match.c: In function ‘bool
> > gimple_simplify_VEC_PERM_EXPR(gimple_match_op*, gimple**, tree_node*
> > (*)(tree), code_helper, tree, tree, tree, tree)’:
> > gimple-match.c:103031:36: error: cannot convert ‘tree_node* (*)(tree)’
> > {aka ‘tree_node* (*)(tree_node*)’} to ‘tree_node**’
> >if (gimple_vec_same_elem_p (op0, valueize))
> > ^~~~
> >
> > because gimple_vec_same_elem_p has tree *res_ops as 2nd param and
> > we're passing valueize as 2nd arg.
>
> Ah, you need the
>
> (match vec_same_elem_p
>  @0
>  (if (uniform_vector_p (@0)))
>
> (match vec_same_elem_p
>  (vec_duplicate @0))
>
> form then.
Thanks, that worked.
Is the attached patch OK to commit ?

Thanks,
Prathamesh
>
> > Thanks,
> > Prathamesh
> > >
> > > Otherwise looks OK.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > >
> > > > > Thanks,
> > > > > Richard
> > > > >
> > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > > new file mode 100644
> > > > > > index 000..42654be3b31
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > > @@ -0,0 +1,20 @@
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } 
> > > > > > */
> > > > > > +
> > > > > > +void
> > > > > > +f1 (double x[][4])
> > > > > > +{
> > > > > > +  for (int i = 0; i < 4; ++i)
> > > > > > +for (int j = 0; j < 4; ++j)
> > > > > > +  x[i][j] = 0;
> > > > > > +}
> > > > > > +
> > > > > > +void
> > > > > > +f2 (double x[][4], double y)
> > > > > > +{
> > > > > > +  for (int i = 0; i < 4; ++i)
> > > > > > +for (int j = 0; j < 4; ++j)
> > > > > > +  x[i][j] = y;
> > > > > > +}
> > > > > > +
> > > > > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } 
> > > > > > */
> > > >
> > >
> > > --
> > > Richard Biener 
> > > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
> >
>
> --

Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:

> On Tue, 23 Jul 2019 at 16:36, Richard Biener  wrote:
> >
> > On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
> >
> > > On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
> > >  wrote:
> > > >
> > > > Not really my area, but FWIW...
> > > >
> > > > Prathamesh Kulkarni  writes:
> > > > > Hi,
> > > > > The attached patch tries to fix PR91166.
> > > > > Does it look OK ?
> > > > > Bootstrap+test in progress on aarch64-linux-gnu and 
> > > > > x86_64-unknown-linux-gnu.
> > > > >
> > > > > Thanks,
> > > > > Prathamesh
> > > > >
> > > > > 2019-07-17  Prathamesh Kulkarni  
> > > > >
> > > > >   PR middle-end/91166
> > > > >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> > > > >   (define_predicates): Add entry for uniform_vector_p.
> > > > >
> > > > > testsuite/
> > > > >   * gcc.target/aarch64/sve/pr91166.c: New test.
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 4a7aa0185d8..2ad98c28fd8 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
> > > > > integer_valued_real_p
> > > > > integer_pow2p
> > > > > uniform_integer_cst_p
> > > > > -   HONOR_NANS)
> > > > > +   HONOR_NANS
> > > > > +   uniform_vector_p)
> > > > >
> > > > >  /* Operator lists.  */
> > > > >  (define_operator_list tcc_comparison
> > > > > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > > > (type; })
> > > > > (if (changed)
> > > > >  (vec_perm { op0; } { op1; } { op2; }))
> > > > > +
> > > > > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
> > > > > +(simplify
> > > > > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > > > > + { @0; })
> > > > > +
> > > > > +(simplify
> > > > > + (vec_perm uniform_vector_p@0 @0 @1)
> > > > > + { @0; })
> > > >
> > > > No need for the curly braces here, can use "@0" as the target of
> > > > the simplification.
> > > >
> > > > It'd probably be worth using (match ...) to define a new predicate
> > > > that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> > > > calling into uniform_vector_p for the latter two.
> > > Hi,
> > > Thanks for the suggestions.
> > > Does this version look OK ?
> >
> > Can you write
> >
> > +(simplify
> > + (vec_perm (vec_same_elem_p@0 @1) @0 @2)
> > + @0)
> >
> > as
> >
> >  (vec_perm vec_same_elem_p@0 @0 @1)
> >
> > ?
> (simplify
>  (vec_perm vec_same_elem_p@0 @0 @1)
>  @0)
> 
> results in:
> gimple-match.c: In function ‘bool
> gimple_simplify_VEC_PERM_EXPR(gimple_match_op*, gimple**, tree_node*
> (*)(tree), code_helper, tree, tree, tree, tree)’:
> gimple-match.c:103031:36: error: cannot convert ‘tree_node* (*)(tree)’
> {aka ‘tree_node* (*)(tree_node*)’} to ‘tree_node**’
>if (gimple_vec_same_elem_p (op0, valueize))
> ^~~~
> 
> because gimple_vec_same_elem_p has tree *res_ops as 2nd param and
> we're passing valueize as 2nd arg.

Ah, you need the

(match vec_same_elem_p
 @0
 (if (uniform_vector_p (@0)))

(match vec_same_elem_p
 (vec_duplicate @0))

form then.

> Thanks,
> Prathamesh
> >
> > Otherwise looks OK.
> >
> > Thanks,
> > Richard.
> >
> > > Thanks,
> > > Prathamesh
> > >
> > > >
> > > > Thanks,
> > > > Richard
> > > >
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > new file mode 100644
> > > > > index 000..42654be3b31
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > > @@ -0,0 +1,20 @@
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > > > > +
> > > > > +void
> > > > > +f1 (double x[][4])
> > > > > +{
> > > > > +  for (int i = 0; i < 4; ++i)
> > > > > +for (int j = 0; j < 4; ++j)
> > > > > +  x[i][j] = 0;
> > > > > +}
> > > > > +
> > > > > +void
> > > > > +f2 (double x[][4], double y)
> > > > > +{
> > > > > +  for (int i = 0; i < 4; ++i)
> > > > > +for (int j = 0; j < 4; ++j)
> > > > > +  x[i][j] = y;
> > > > > +}
> > > > > +
> > > > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */
> > >
> >
> > --
> > Richard Biener 
> > SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> > GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Prathamesh Kulkarni
On Tue, 23 Jul 2019 at 16:36, Richard Biener  wrote:
>
> On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:
>
> > On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
> >  wrote:
> > >
> > > Not really my area, but FWIW...
> > >
> > > Prathamesh Kulkarni  writes:
> > > > Hi,
> > > > The attached patch tries to fix PR91166.
> > > > Does it look OK ?
> > > > Bootstrap+test in progress on aarch64-linux-gnu and 
> > > > x86_64-unknown-linux-gnu.
> > > >
> > > > Thanks,
> > > > Prathamesh
> > > >
> > > > 2019-07-17  Prathamesh Kulkarni  
> > > >
> > > >   PR middle-end/91166
> > > >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> > > >   (define_predicates): Add entry for uniform_vector_p.
> > > >
> > > > testsuite/
> > > >   * gcc.target/aarch64/sve/pr91166.c: New test.
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index 4a7aa0185d8..2ad98c28fd8 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
> > > > integer_valued_real_p
> > > > integer_pow2p
> > > > uniform_integer_cst_p
> > > > -   HONOR_NANS)
> > > > +   HONOR_NANS
> > > > +   uniform_vector_p)
> > > >
> > > >  /* Operator lists.  */
> > > >  (define_operator_list tcc_comparison
> > > > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > > (type; })
> > > > (if (changed)
> > > >  (vec_perm { op0; } { op1; } { op2; }))
> > > > +
> > > > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
> > > > +(simplify
> > > > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > > > + { @0; })
> > > > +
> > > > +(simplify
> > > > + (vec_perm uniform_vector_p@0 @0 @1)
> > > > + { @0; })
> > >
> > > No need for the curly braces here, can use "@0" as the target of
> > > the simplification.
> > >
> > > It'd probably be worth using (match ...) to define a new predicate
> > > that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> > > calling into uniform_vector_p for the latter two.
> > Hi,
> > Thanks for the suggestions.
> > Does this version look OK ?
>
> Can you write
>
> +(simplify
> + (vec_perm (vec_same_elem_p@0 @1) @0 @2)
> + @0)
>
> as
>
>  (vec_perm vec_same_elem_p@0 @0 @1)
>
> ?
(simplify
 (vec_perm vec_same_elem_p@0 @0 @1)
 @0)

results in:
gimple-match.c: In function ‘bool
gimple_simplify_VEC_PERM_EXPR(gimple_match_op*, gimple**, tree_node*
(*)(tree), code_helper, tree, tree, tree, tree)’:
gimple-match.c:103031:36: error: cannot convert ‘tree_node* (*)(tree)’
{aka ‘tree_node* (*)(tree_node*)’} to ‘tree_node**’
   if (gimple_vec_same_elem_p (op0, valueize))
^~~~

because gimple_vec_same_elem_p has tree *res_ops as 2nd param and
we're passing valueize as 2nd arg.

Thanks,
Prathamesh
>
> Otherwise looks OK.
>
> Thanks,
> Richard.
>
> > Thanks,
> > Prathamesh
> >
> > >
> > > Thanks,
> > > Richard
> > >
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > new file mode 100644
> > > > index 000..42654be3b31
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > > @@ -0,0 +1,20 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > > > +
> > > > +void
> > > > +f1 (double x[][4])
> > > > +{
> > > > +  for (int i = 0; i < 4; ++i)
> > > > +for (int j = 0; j < 4; ++j)
> > > > +  x[i][j] = 0;
> > > > +}
> > > > +
> > > > +void
> > > > +f2 (double x[][4], double y)
> > > > +{
> > > > +  for (int i = 0; i < 4; ++i)
> > > > +for (int j = 0; j < 4; ++j)
> > > > +  x[i][j] = y;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */
> >
>
> --
> Richard Biener 
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)


Re: Use ODR for canonical types construction in LTO

2019-07-23 Thread Martin Liška
Hi.

Honza can you please fix this:

gcc/ipa-devirt.c:1616:12: warning: implicit conversion of NULL constant to 
'bool' [-Wnull-conversion]

Thanks,
Martin


Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Prathamesh Kulkarni wrote:

> On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
>  wrote:
> >
> > Not really my area, but FWIW...
> >
> > Prathamesh Kulkarni  writes:
> > > Hi,
> > > The attached patch tries to fix PR91166.
> > > Does it look OK ?
> > > Bootstrap+test in progress on aarch64-linux-gnu and 
> > > x86_64-unknown-linux-gnu.
> > >
> > > Thanks,
> > > Prathamesh
> > >
> > > 2019-07-17  Prathamesh Kulkarni  
> > >
> > >   PR middle-end/91166
> > >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> > >   (define_predicates): Add entry for uniform_vector_p.
> > >
> > > testsuite/
> > >   * gcc.target/aarch64/sve/pr91166.c: New test.
> > >
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 4a7aa0185d8..2ad98c28fd8 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
> > > integer_valued_real_p
> > > integer_pow2p
> > > uniform_integer_cst_p
> > > -   HONOR_NANS)
> > > +   HONOR_NANS
> > > +   uniform_vector_p)
> > >
> > >  /* Operator lists.  */
> > >  (define_operator_list tcc_comparison
> > > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE 
> > > (type; })
> > > (if (changed)
> > >  (vec_perm { op0; } { op1; } { op2; }))
> > > +
> > > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
> > > +(simplify
> > > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > > + { @0; })
> > > +
> > > +(simplify
> > > + (vec_perm uniform_vector_p@0 @0 @1)
> > > + { @0; })
> >
> > No need for the curly braces here, can use "@0" as the target of
> > the simplification.
> >
> > It'd probably be worth using (match ...) to define a new predicate
> > that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> > calling into uniform_vector_p for the latter two.
> Hi,
> Thanks for the suggestions.
> Does this version look OK ?

Can you write

+(simplify
+ (vec_perm (vec_same_elem_p@0 @1) @0 @2)
+ @0)

as

 (vec_perm vec_same_elem_p@0 @0 @1)

?

Otherwise looks OK.

Thanks,
Richard.
 
> Thanks,
> Prathamesh
> 
> >
> > Thanks,
> > Richard
> >
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > new file mode 100644
> > > index 000..42654be3b31
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > > @@ -0,0 +1,20 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > > +
> > > +void
> > > +f1 (double x[][4])
> > > +{
> > > +  for (int i = 0; i < 4; ++i)
> > > +for (int j = 0; j < 4; ++j)
> > > +  x[i][j] = 0;
> > > +}
> > > +
> > > +void
> > > +f2 (double x[][4], double y)
> > > +{
> > > +  for (int i = 0; i < 4; ++i)
> > > +for (int j = 0; j < 4; ++j)
> > > +  x[i][j] = y;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

[PATCH] [MIPS] Add machine mode to get_fcsr pattern operand

2019-07-23 Thread Mihailo Stojanovic
From: Mihailo Stojanovic 

Hi,

Missing machine mode for the unspec_volatile operand of get_fcsr
patterns causes an ICE in simplify_subreg on n64 ABI.

This adds the missing machine modes and a new test.

Tested on mips64el-mti-linux-gnu.

Ok for trunk and possibly backport?

Cheers,
Mihailo

gcc/

* config/mips/mips.md (mips_get_fcsr, *mips_get_fcsr): Use SI
  machine mode for unspec_volatile operand.
* testsuite/gcc.target/mips/get-fcsr-3.c: New test.
---
 gcc/config/mips/mips.md| 4 ++--
 gcc/testsuite/gcc.target/mips/get-fcsr-3.c | 9 +
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/get-fcsr-3.c

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index e17b1d5..abc8485e 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7588,7 +7588,7 @@
 ;; __builtin_mips_get_fcsr: move the FCSR into operand 0.
 (define_expand "mips_get_fcsr"
   [(set (match_operand:SI 0 "register_operand")
-   (unspec_volatile [(const_int 0)] UNSPEC_GET_FCSR))]
+   (unspec_volatile:SI [(const_int 0)] UNSPEC_GET_FCSR))]
   "TARGET_HARD_FLOAT_ABI"
 {
   if (TARGET_MIPS16)
@@ -7600,7 +7600,7 @@
 
 (define_insn "*mips_get_fcsr"
   [(set (match_operand:SI 0 "register_operand" "=d")
-   (unspec_volatile [(const_int 0)] UNSPEC_GET_FCSR))]
+   (unspec_volatile:SI [(const_int 0)] UNSPEC_GET_FCSR))]
   "TARGET_HARD_FLOAT"
   "cfc1\t%0,$31")
 
diff --git a/gcc/testsuite/gcc.target/mips/get-fcsr-3.c 
b/gcc/testsuite/gcc.target/mips/get-fcsr-3.c
new file mode 100644
index 000..7bb97b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/get-fcsr-3.c
@@ -0,0 +1,9 @@
+/* { dg-options "-mabi=64 -mhard-float" } */
+
+NOMIPS16 unsigned int
+foo (void)
+{
+  return __builtin_mips_get_fcsr () & 0x1;
+}
+
+/* { dg-final { scan-assembler "cfc1" } } */
-- 
2.7.4



Re: PR91166 - Unfolded ZIPs of constants

2019-07-23 Thread Prathamesh Kulkarni
On Fri, 19 Jul 2019 at 18:12, Richard Sandiford
 wrote:
>
> Not really my area, but FWIW...
>
> Prathamesh Kulkarni  writes:
> > Hi,
> > The attached patch tries to fix PR91166.
> > Does it look OK ?
> > Bootstrap+test in progress on aarch64-linux-gnu and 
> > x86_64-unknown-linux-gnu.
> >
> > Thanks,
> > Prathamesh
> >
> > 2019-07-17  Prathamesh Kulkarni  
> >
> >   PR middle-end/91166
> >   * match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
> >   (define_predicates): Add entry for uniform_vector_p.
> >
> > testsuite/
> >   * gcc.target/aarch64/sve/pr91166.c: New test.
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 4a7aa0185d8..2ad98c28fd8 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
> > integer_valued_real_p
> > integer_pow2p
> > uniform_integer_cst_p
> > -   HONOR_NANS)
> > +   HONOR_NANS
> > +   uniform_vector_p)
> >
> >  /* Operator lists.  */
> >  (define_operator_list tcc_comparison
> > @@ -5568,3 +5569,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type; 
> > })
> > (if (changed)
> >  (vec_perm { op0; } { op1; } { op2; }))
> > +
> > +/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
> > +(simplify
> > + (vec_perm (vec_duplicate@0 @1) @0 @2)
> > + { @0; })
> > +
> > +(simplify
> > + (vec_perm uniform_vector_p@0 @0 @1)
> > + { @0; })
>
> No need for the curly braces here, can use "@0" as the target of
> the simplification.
>
> It'd probably be worth using (match ...) to define a new predicate
> that handles (vec_duplicate ...), VECTOR_CST and CONSTRUCTOR,
> calling into uniform_vector_p for the latter two.
Hi,
Thanks for the suggestions.
Does this version look OK ?

Thanks,
Prathamesh

>
> Thanks,
> Richard
>
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > new file mode 100644
> > index 000..42654be3b31
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
> > @@ -0,0 +1,20 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
> > +
> > +void
> > +f1 (double x[][4])
> > +{
> > +  for (int i = 0; i < 4; ++i)
> > +for (int j = 0; j < 4; ++j)
> > +  x[i][j] = 0;
> > +}
> > +
> > +void
> > +f2 (double x[][4], double y)
> > +{
> > +  for (int i = 0; i < 4; ++i)
> > +for (int j = 0; j < 4; ++j)
> > +  x[i][j] = y;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */
2019-07-23  Prathamesh Kulkarni  

PR middle-end/91166
* match.pd (vec_perm_expr(v, v, mask) -> v): New pattern.
(define_predicates): Add entry for uniform_vector_p.
(vec_same_elem_p): New match pattern.

testsuite/
* gcc.target/aarch64/sve/pr91166.c: New test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 4a7aa0185d8..f14670a7982 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -36,7 +36,8 @@ along with GCC; see the file COPYING3.  If not see
integer_valued_real_p
integer_pow2p
uniform_integer_cst_p
-   HONOR_NANS)
+   HONOR_NANS
+   uniform_vector_p)
 
 /* Operator lists.  */
 (define_operator_list tcc_comparison
@@ -5568,3 +5569,15 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  { bitsize_int (at * tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type; })
(if (changed)
 (vec_perm { op0; } { op1; } { op2; }))
+
+/* VEC_PERM_EXPR (v, v, mask) -> v where v contains same element.  */
+
+(match (vec_same_elem_p @0)
+ uniform_vector_p@0)
+
+(match (vec_same_elem_p @0)
+ (vec_duplicate @0))
+
+(simplify
+ (vec_perm (vec_same_elem_p@0 @1) @0 @2)
+ @0)
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c 
b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
new file mode 100644
index 000..42654be3b31
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr91166.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8.2-a+sve -fdump-tree-optimized" } */
+
+void
+f1 (double x[][4]) 
+{
+  for (int i = 0; i < 4; ++i)
+for (int j = 0; j < 4; ++j)
+  x[i][j] = 0;
+}
+
+void
+f2 (double x[][4], double y)
+{
+  for (int i = 0; i < 4; ++i)
+for (int j = 0; j < 4; ++j)
+  x[i][j] = y;
+}
+
+/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized"} } */


[PATCH] Deduce automatically number of cores for -flto option.

2019-07-23 Thread Martin Liška
On 7/23/19 11:20 AM, Jan Hubicka wrote:
> Hi,
> great you found time to make this. It should become the default for
> -flto IMO.

Works for me. Then I'm suggesting to not come up with -flto=auto and
only document that -flto passed during linking will automatically detect
number of cores. It's the same what clang does.

> 
> I think we also should auto-detect the case where jobserver is available
> and in that case let make to connect to the outer jobserver.  (We should
> also really convince make developers to invent way to connect to it w/o
> the extra + role)

Good idea, however, I don't have any experience with make and I'm leaving that
to you as a follow up improvement.

Martin
>From 72489809a50a52f37748fe9ca4e7382ef2faa00c Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 23 Jul 2019 10:14:31 +0200
Subject: [PATCH] Deduce automatically number of cores for -flto option.

gcc/ChangeLog:

2019-07-23  Martin Liska  

	* doc/invoke.texi: Document new behavior.
	* lto-wrapper.c (cpuset_popcount): New function
	is a copy of libgomp/config/linux/proc.c.
	(init_num_threads): Likewise.
	(run_gcc): Automatically detect core count for -flto.
---
 gcc/doc/invoke.texi |   3 +-
 gcc/lto-wrapper.c   | 121 +++-
 2 files changed, 122 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 77a2d561e38..f5bfea3f003 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10396,7 +10396,8 @@ If you specify the optional @var{n}, the optimization and code
 generation done at link time is executed in parallel using @var{n}
 parallel jobs by utilizing an installed @command{make} program.  The
 environment variable @env{MAKE} may be used to override the program
-used.  The default value for @var{n} is 1.
+used.  The default value for @var{n} is automatically detected based
+on number of cores.
 
 You can also specify @option{-flto=jobserver} to use GNU make's
 job server mode to determine the number of parallel jobs. This
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 946897726d0..7e29ecb3618 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1110,6 +1110,110 @@ cmp_priority (const void *a, const void *b)
   return *((const int *)b)-*((const int *)a);
 }
 
+/* Number of CPUs that can be used for parallel LTRANS phase.  */
+
+static unsigned long nthreads_var = 0;
+
+#ifdef HAVE_PTHREAD_AFFINITY_NP
+unsigned long cpuset_size;
+static unsigned long get_cpuset_size;
+cpu_set_t *cpusetp;
+
+unsigned long
+static cpuset_popcount (unsigned long cpusetsize, cpu_set_t *cpusetp)
+{
+#ifdef CPU_COUNT_S
+  /* glibc 2.7 and above provide a macro for this.  */
+  return CPU_COUNT_S (cpusetsize, cpusetp);
+#else
+#ifdef CPU_COUNT
+  if (cpusetsize == sizeof (cpu_set_t))
+/* glibc 2.6 and above provide a macro for this.  */
+return CPU_COUNT (cpusetp);
+#endif
+  size_t i;
+  unsigned long ret = 0;
+  STATIC_ASSERT (sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int));
+  for (i = 0; i < cpusetsize / sizeof (cpusetp->__bits[0]); i++)
+{
+  unsigned long int mask = cpusetp->__bits[i];
+  if (mask == 0)
+	continue;
+  ret += __builtin_popcountl (mask);
+}
+  return ret;
+#endif
+}
+#endif
+
+/* At startup, determine the default number of threads.  It would seem
+   this should be related to the number of cpus online.  */
+
+static void
+init_num_threads (void)
+{
+#ifdef HAVE_PTHREAD_AFFINITY_NP
+#if defined (_SC_NPROCESSORS_CONF) && defined (CPU_ALLOC_SIZE)
+  cpuset_size = sysconf (_SC_NPROCESSORS_CONF);
+  cpuset_size = CPU_ALLOC_SIZE (cpuset_size);
+#else
+  cpuset_size = sizeof (cpu_set_t);
+#endif
+
+  cpusetp = (cpu_set_t *) xmalloc (gomp_cpuset_size);
+  do
+{
+  int ret = pthread_getaffinity_np (pthread_self (), gomp_cpuset_size,
+	cpusetp);
+  if (ret == 0)
+	{
+	  /* Count only the CPUs this process can use.  */
+	  nthreads_var = cpuset_popcount (cpuset_size, cpusetp);
+	  if (nthreads_var == 0)
+	break;
+	  get_cpuset_size = cpuset_size;
+#ifdef CPU_ALLOC_SIZE
+	  unsigned long i;
+	  for (i = cpuset_size * 8; i; i--)
+	if (CPU_ISSET_S (i - 1, cpuset_size, cpusetp))
+	  break;
+	  cpuset_size = CPU_ALLOC_SIZE (i);
+#endif
+	  return;
+	}
+  if (ret != EINVAL)
+	break;
+#ifdef CPU_ALLOC_SIZE
+  if (cpuset_size < sizeof (cpu_set_t))
+	cpuset_size = sizeof (cpu_set_t);
+  else
+	cpuset_size = cpuset_size * 2;
+  if (cpuset_size < 8 * sizeof (cpu_set_t))
+	cpusetp
+	  = (cpu_set_t *) realloc (cpusetp, cpuset_size);
+  else
+	{
+	  /* Avoid fatal if too large memory allocation would be
+	 requested, e.g. kernel returning EINVAL all the time.  */
+	  void *p = realloc (cpusetp, cpuset_size);
+	  if (p == NULL)
+	break;
+	  cpusetp = (cpu_set_t *) p;
+	}
+#else
+  break;
+#endif
+}
+  while (1);
+  cpuset_size = 0;
+  nthreads_var = 1;
+  free (cpusetp);
+  cpusetp = NULL;
+#endif
+#ifdef _SC_NPROCESSORS_ONLN
+  nthreads_var = sysconf 

[PATCH] Fix PR91231

2019-07-23 Thread Richard Biener


With LTO it's easy to run out of linemap encoding space (surprisingly
so for columns even though LTO never uses any column besides zero...).
But some code cannot deal with locations being dropped to zero
"after the fact" (without LTO you do not get re-encoding), in this
example it is the inline-entry markers which assert that the
corresponding BLOCK has a location associated with an inline.

The fix here is to drop those stmts when we end up re-encoding
a formerly inlined_function_outer_scope_p as UNKNOWN_LOCATION.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2019-07-23  Richard Biener  

PR debug/91231
* lto-streamer-in.c (input_function): Drop inline-entry markers
that ended up with an unknown location block.

Index: gcc/lto-streamer-in.c
===
--- gcc/lto-streamer-in.c   (revision 273718)
+++ gcc/lto-streamer-in.c   (working copy)
@@ -1140,6 +1140,14 @@ input_function (tree fn_decl, struct dat
  ? !MAY_HAVE_DEBUG_MARKER_STMTS
  : !MAY_HAVE_DEBUG_BIND_STMTS))
remove = true;
+ /* In case the linemap overflows locations can be dropped
+to zero.  Thus do not keep nonsensical inline entry markers
+we'd later ICE on.  */
+ tree block;
+ if (gimple_debug_inline_entry_p (stmt)
+ && (block = gimple_block (stmt))
+ && !inlined_function_outer_scope_p (block))
+   remove = true;
  if (is_gimple_call (stmt)
  && gimple_call_internal_p (stmt))
{


[PATCH] Fix PR83518 for aarch64

2019-07-23 Thread Richard Biener


This is the last piece to fix the testcase for aarch64 and it's
gimplification of int a[] = { 1,2,3} to a = *.LC0;

It also fixes having calls in GIMPLE FE IL when emitting a CFG.

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-07-23  Richard Biener  

PR tree-optimization/83518
* tree-ssa-sccvn.c (vn_reference_lookup_3): Handle aggregate
init from a constant even when partial defs are already recorded.

c/
* gimple-parser.c (c_parser_parse_gimple_body): When we have
a CFG also rebuild cgraph edges.

* gcc.dg/tree-ssa/ssa-fre-79.c: New testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-79.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-79.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-79.c  (working copy)
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+/* { dg-options "-O -fgimple -fdump-tree-fre1" } */
+
+struct S { char a[4]; };
+const struct S cs = { 1, 2, 3, 4 };
+
+int __GIMPLE(ssa,startwith("fre"))
+main ()
+{
+  struct S s;
+  short _1;
+
+  __BB(2):
+  s = cs;
+  s.a[1] = _Literal (char) 3;
+  _1 = __MEM  ( + 1);
+  if (_1 != _Literal (short) 0x303)
+goto __BB3;
+  else
+goto __BB4;
+
+  __BB(3):
+  __builtin_abort ();
+
+  __BB(4):
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-not "abort" "fre1" } } */
Index: gcc/c/gimple-parser.c
===
--- gcc/c/gimple-parser.c   (revision 273657)
+++ gcc/c/gimple-parser.c   (working copy)
@@ -356,6 +356,8 @@ c_parser_parse_gimple_body (c_parser *cp
   gcov_type t = PARAM_VALUE (PARAM_GIMPLE_FE_COMPUTED_HOT_BB_THRESHOLD);
   set_hot_bb_threshold (t);
   update_max_bb_count ();
+  cgraph_node::get_create (cfun->decl);
+  cgraph_edge::rebuild_edges ();
 }
   dump_function (TDI_gimple, current_function_decl);
 }
Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 273667)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2702,9 +2702,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   && gimple_assign_single_p (def_stmt)
   && (DECL_P (gimple_assign_rhs1 (def_stmt))
   || TREE_CODE (gimple_assign_rhs1 (def_stmt)) == MEM_REF
-  || handled_component_p (gimple_assign_rhs1 (def_stmt)))
-  /* Handling this is more complicated, give up for now.  */
-  && data->partial_defs.is_empty ())
+  || handled_component_p (gimple_assign_rhs1 (def_stmt
 {
   tree base2;
   int i, j, k;
@@ -2808,8 +2806,30 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   /* Try folding the new reference to a constant.  */
   tree val = fully_constant_vn_reference_p (vr);
   if (val)
-   return vn_reference_lookup_or_insert_for_pieces
-(vuse, vr->set, vr->type, vr->operands, val);
+   {
+ if (data->partial_defs.is_empty ())
+   return vn_reference_lookup_or_insert_for_pieces
+   (vuse, vr->set, vr->type, vr->operands, val);
+ /* This is the only interesting case for partial-def handling
+coming from targets that like to gimplify init-ctors as
+aggregate copies from constant data like aarch64 for
+PR83518.  */
+ if (maxsize.is_constant ()
+ && known_eq (ref->size, maxsize))
+   {
+ pd_data pd;
+ pd.rhs = val;
+ pd.offset = 0;
+ pd.size = maxsizei / BITS_PER_UNIT;
+ return data->push_partial_def (pd, vuse, maxsizei);
+   }
+   }
+
+  /* Continuing with partial defs isn't easily possible here, we
+ have to find a full def from further lookups from here.  Probably
+not worth the special-casing everywhere.  */
+  if (!data->partial_defs.is_empty ())
+   return (void *)-1;
 
   /* Adjust *ref from the new operands.  */
   if (!ao_ref_init_from_vn_reference (, vr->set, vr->type, vr->operands))


Re: [PATCH] Remove a pointless global var

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Alexander Monakov wrote:

> On Tue, 23 Jul 2019, Richard Biener wrote:
> 
> > > Can you let me handle sort.cc changes once there's agreement
> > > which way to go?
> > 
> > Sure - feel free to give option 3 a shot.  I'm just worried
> > about the system.h redirect which would be then disabled for C?
> 
> *nod*   My preference would be to poison qsort for C, but there's also
> the possibility to employ option 1 for C while using option 3 for C++.

Works for me.  For C we can use a static redirect in the header as well
and do the usual casting.  Or the poisoning (not sure if that works).
We probably also want to redirect qsort_r from the system then.

Richard.


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-23 Thread Richard Biener
On Tue, Jul 23, 2019 at 1:59 AM Jeff Law  wrote:
>
> On 7/16/19 12:37 PM, Andrew MacLeod wrote:
> > On 7/9/19 5:56 AM, Richard Biener wrote:
> >> On Tue, Jul 9, 2019 at 9:28 AM Aldy Hernandez  wrote:
> >>>
> >>>
> >>> On 7/4/19 6:33 AM, Richard Biener wrote:
>  On Wed, Jul 3, 2019 at 2:17 PM Aldy Hernandez  wrote:
> > On 7/3/19 7:08 AM, Richard Biener wrote:
> >> On Wed, Jul 3, 2019 at 11:19 AM Aldy Hernandez 
> >> wrote:
> > How about we keep VARYING and UNDEFINED typeless until right before we
> > call into the ranger.  At which point, we have can populate min/max
> > because we have the tree_code and the type handy.  So right before we
> > call into the ranger do:
> >
> >   if (varying_p ())
> > foo->set_varying(TYPE);
> >
> > This would avoid the type cache, and keep the ranger happy.
>  you cannot do set_varying on the static const range but instead
>  you'd do
> 
>  value_range tem (*foo);
>  if (varying_p ())
>   tem->set_full_range (TYPE);
> 
>  which I think we already do in some places.  Thus my question _where_
>  you actually need this.
> >>> Basically, everywhere.  By having a type for varying/undefined, we don't
> >>> have to special case anything.  Sure, we could for example, special case
> >>> the invert operation for undefined / varying.  And we could special case
> >>> everything dealing with ranges to handle varying and undefined, but why?
> >>>We could also pass a type argument everywhere, but that's just ugly.
> >>> However, I do understand your objection to the type cache.
> >>>
> >>> How about the attached approach?  Set the type for varying/undefined
> >>> when we know it, while avoiding touching the CONST varying.  Then right
> >>> before calling the ranger, pass down a new varying node with min/max for
> >>> any varyings that were still typeless until that point.
> >>>
> >>> I have taken care of never adding a set_varying() that was not already
> >>> there.  Would this keep the const happy?
> >>>
> >>> Technically we don't need to set varying/undef types for every instance
> >>> in VRP, but we need it at least for the code that will be shared with
> >>> range-ops (extract_range_from_multiplicative_op, union, intersect, etc).
> >>>I just figured if we have the information, might as well set it for
> >>> consistency.
> >>>
> >>> If you like this approach, I can rebase the other patches that depend on
> >>> this one.
> >> OK, so I went ant checked what you do for class irange which has
> >> a type but no kind member (but constructors with a kind).  It also
> >> uses wide_int members for storage.  For a pure integer constant
> >> range representation this represents somewhat odd choices;  I'd
> >> have elided the m_type member completely here, it seems fully
> >> redundant.  Only range operations need to be carried out in a
> >> specific type (what I was suggesting above).  Even the precision
> >> encoded in the wide_int members is redundant then (I'd have
> >> expected widest_int here and trailing-wide-ints for optimizing
> >> storage).
> >
> > What irange contains internally is a bit arbitrary.  It's really an API
> > for managing a set of subranges.  We could have used trees internally
> > just as easily, then we wouldnt need a type field. Since we were doing
> > lots of operations, rather than going back and forth from trees all the
> > time, we just used the underlying wide_int directly.   we could have
> > fiddle farted around with HOST_WIDE_INT or whatever, but wide_int is
> > there, has all the operations, and it works fine. so thats what it
> > currently is on the branch.
> >
> > We are treating a range object as a unique self contained object.
> > Therefore, the range has a type so we know how to print it, and can
> > confirm before any operation that the ranges being operated on are
> > appropriately matched.  We found and opened bugzillas over the past
> > couple years for places where our code caught bugs because a range was
> > created and then operated on in a way that was not compatible with
> > another range.  I think there is a still an open one against ada(?)
> > where the switch and case  are different precision.
> >
> > From my point of view, a range object is similar to a tree node. A tree
> > node has the bits to indicate what the value is, but also associates a
> > type with those bits within the same object.  This is less error prone
> > than passing around the bits and the type separately.  As ranges are
> > starting to be used in many places outside of VRP, we should do the same
> > thing with ranges.  WIth value_range it would actually be free since
> > there is already a tree for the bounds already which contains the type.
> >
> >
> >
> >
> >
> >> to fold_range/op_range?  The API also seems to be oddly
> >> constrained to binary ops.  Anyway, the way you build
> >> the operator table requires an awful lot of global C++ ctor
> >> 

Add znver2 scheduler model

2019-07-23 Thread Jan Hubicka
Hi,
this patch adds znver2 scheduler model. Znver2 is close enough to znver1
that I have decided to implement it in one automaton.  The main
difference is extra AGU unit that seems to be used for store only
(according to CPU diagram), the fact that 256bit vector operations are
no longer split (and thus they behave like 128bit) and reduced latency
of fp multiply and conversion operations.

The patch seems to have very little effect on overall performance but
since we do not model the out of order core and thus we think that the
CPU is mostly starved by not having enough parallelism to exectue.
Still it is better to be precise.

Bootstrapped/regtested x86_64-linux, commited.

* common/config/i386/i386-common.c: Use PROCESSOR_ZNVER2 scheduler for
znver2.
* config/i386/znver1.md: Enable patterns for znver2 and add store
variants which use extra AGU unit.
Index: common/config/i386/i386-common.c
===
--- common/config/i386/i386-common.c(revision 273727)
+++ common/config/i386/i386-common.c(working copy)
@@ -1760,7 +1760,7 @@ const pta processor_alias_table[] =
   | PTA_RDRND | PTA_MOVBE | PTA_MWAITX | PTA_ADX | PTA_RDSEED
   | PTA_CLZERO | PTA_CLFLUSHOPT | PTA_XSAVEC | PTA_XSAVES
   | PTA_SHA | PTA_LZCNT | PTA_POPCNT},
-  {"znver2", PROCESSOR_ZNVER2, CPU_ZNVER1,
+  {"znver2", PROCESSOR_ZNVER2, CPU_ZNVER2,
 PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3
   | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1
   | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_AVX2
Index: config/i386/znver1.md
===
--- config/i386/znver1.md   (revision 273727)
+++ config/i386/znver1.md   (working copy)
@@ -17,10 +17,11 @@
 ;; .
 ;;
 
 (define_attr "znver1_decode" "direct,vector,double"
   (const_string "direct"))
 
-;; AMD znver1 Scheduling
+;; AMD znver1 and znver2 Scheduling
 ;; Modeling automatons for zen decoders, integer execution pipes,
 ;; AGU pipes and floating point execution units.
 (define_automaton "znver1, znver1_ieu, znver1_fp, znver1_agu")
@@ -51,13 +52,21 @@
 (define_cpu_unit "znver1-ieu3" "znver1_ieu")
 (define_reservation "znver1-ieu" 
"znver1-ieu0|znver1-ieu1|znver1-ieu2|znver1-ieu3")
 
-;; 2 AGU pipes.
+;; 2 AGU pipes in znver1 and 3 AGU pipes in znver2
+;; According to CPU diagram last AGU unit is used only for stores.
 (define_cpu_unit "znver1-agu0" "znver1_agu")
 (define_cpu_unit "znver1-agu1" "znver1_agu")
+(define_cpu_unit "znver2-agu2" "znver1_agu")
 (define_reservation "znver1-agu-reserve" "znver1-agu0|znver1-agu1")
+(define_reservation "znver2-store-agu-reserve" 
"znver1-agu0|znver1-agu1|znver2-agu2")
 
+;; Load is 4 cycles. We do not model reservation of load unit.
+;;(define_reservation "znver1-load" "znver1-agu-reserve, nothing, nothing, 
nothing")
 (define_reservation "znver1-load" "znver1-agu-reserve")
+;; Store operations differs between znver1 and znver2 because extra AGU
+;; was added.
 (define_reservation "znver1-store" "znver1-agu-reserve")
+(define_reservation "znver2-store" "znver2-store-agu-reserve")
 
 ;; vectorpath (microcoded) instructions are single issue instructions.
 ;; So, they occupy all the integer units.
@@ -65,6 +74,9 @@
  +znver1-ieu2+znver1-ieu3
  +znver1-agu0+znver1-agu1")
 
+(define_reservation "znver2-ivector" "znver1-ieu0+znver1-ieu1
+ +znver1-ieu2+znver1-ieu3
+ +znver1-agu0+znver1-agu1+znver2-agu2")
 ;; Floating point unit 4 FP pipes.
 (define_cpu_unit "znver1-fp0" "znver1_fp")
 (define_cpu_unit "znver1-fp1" "znver1_fp")
@@ -76,6 +88,9 @@
 (define_reservation "znver1-fvector" "znver1-fp0+znver1-fp1
  +znver1-fp2+znver1-fp3
  +znver1-agu0+znver1-agu1")
+(define_reservation "znver2-fvector" "znver1-fp0+znver1-fp1
+ +znver1-fp2+znver1-fp3
+ +znver1-agu0+znver1-agu1+znver2-agu2")
 
 ;; Call instruction
 (define_insn_reservation "znver1_call" 1
@@ -83,27 +98,36 @@
  (eq_attr "type" "call,callv"))
 "znver1-double,znver1-store,znver1-ieu0|znver1-ieu3")
 
+(define_insn_reservation "znver2_call" 1
+(and (eq_attr "cpu" "znver2")
+ (eq_attr "type" "call,callv"))
+"znver1-double,znver2-store,znver1-ieu0|znver1-ieu3")
+
 ;; General instructions
 (define_insn_reservation "znver1_push" 1
 (and (eq_attr "cpu" "znver1")
  (and (eq_attr "type" "push")
-  (eq_attr "memory" "none,unknown")))
+  (eq_attr "memory" "store")))
  

Update X86_TUNE_AVOID_256FMA_CHAINS for znver2

2019-07-23 Thread Jan Hubicka
Hi,
this patch enables logic which avoid FMA for matrix multiplicaiton loop
for 256 bit vectors. The underlying issue is same as with znver1. While
combined latency of mutliply and add operations is slower than FMA, the
dependency chain in matrix multiplication depends only on additions
that are faster.

Bootstrapped/regtested x86_64-linux, comitted.

* config/i386/i386-options.c (ix86_option_override_internal): Default
PARAM_AVOID_FMA_MAX_BITS to 256 for znver2.
* conifg/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS): Set for
ZNVER2.

Index: config/i386/i386-options.c
===
--- config/i386/i386-options.c  (revision 273727)
+++ config/i386/i386-options.c  (working copy)
@@ -2779,7 +2779,11 @@ ix86_option_override_internal (bool main
 opts->x_flag_cf_protection
   = (cf_protection_level) (opts->x_flag_cf_protection | CF_SET);
 
-  if (ix86_tune_features [X86_TUNE_AVOID_128FMA_CHAINS])
+  if (ix86_tune_features [X86_TUNE_AVOID_256FMA_CHAINS])
+maybe_set_param_value (PARAM_AVOID_FMA_MAX_BITS, 256,
+  opts->x_param_values,
+  opts_set->x_param_values);
+  else if (ix86_tune_features [X86_TUNE_AVOID_128FMA_CHAINS])
 maybe_set_param_value (PARAM_AVOID_FMA_MAX_BITS, 128,
   opts->x_param_values,
   opts_set->x_param_values);
Index: config/i386/x86-tune.def
===
--- config/i386/x86-tune.def(revision 273727)
+++ config/i386/x86-tune.def(working copy)
@@ -431,6 +431,10 @@ DEF_TUNE (X86_TUNE_USE_GATHER, "use_gath
smaller FMA chain.  */
 DEF_TUNE (X86_TUNE_AVOID_128FMA_CHAINS, "avoid_fma_chains", m_ZNVER)
 
+/* X86_TUNE_AVOID_256FMA_CHAINS: Avoid creating loops with tight 256bit or
+   smaller FMA chain.  */
+DEF_TUNE (X86_TUNE_AVOID_256FMA_CHAINS, "avoid_fma256_chains", m_ZNVER2)
+
 /*/
 /* AVX instruction selection tuning (some of SSE flags affects AVX, too) */
 /*/


Re: [range-ops] patch 01/04: types for VR_UNDEFINED and VR_VARYING

2019-07-23 Thread Richard Biener
On Tue, Jul 16, 2019 at 8:37 PM Andrew MacLeod  wrote:
>
> On 7/9/19 5:56 AM, Richard Biener wrote:
> > On Tue, Jul 9, 2019 at 9:28 AM Aldy Hernandez  wrote:
> >>
> >>
> >> On 7/4/19 6:33 AM, Richard Biener wrote:
> >>> On Wed, Jul 3, 2019 at 2:17 PM Aldy Hernandez  wrote:
>  On 7/3/19 7:08 AM, Richard Biener wrote:
> > On Wed, Jul 3, 2019 at 11:19 AM Aldy Hernandez  wrote:
>  How about we keep VARYING and UNDEFINED typeless until right before we
>  call into the ranger.  At which point, we have can populate min/max
>  because we have the tree_code and the type handy.  So right before we
>  call into the ranger do:
> 
>    if (varying_p ())
>  foo->set_varying(TYPE);
> 
>  This would avoid the type cache, and keep the ranger happy.
> >>> you cannot do set_varying on the static const range but instead you'd do
> >>>
> >>> value_range tem (*foo);
> >>> if (varying_p ())
> >>>  tem->set_full_range (TYPE);
> >>>
> >>> which I think we already do in some places.  Thus my question _where_
> >>> you actually need this.
> >> Basically, everywhere.  By having a type for varying/undefined, we don't
> >> have to special case anything.  Sure, we could for example, special case
> >> the invert operation for undefined / varying.  And we could special case
> >> everything dealing with ranges to handle varying and undefined, but why?
> >>We could also pass a type argument everywhere, but that's just ugly.
> >> However, I do understand your objection to the type cache.
> >>
> >> How about the attached approach?  Set the type for varying/undefined
> >> when we know it, while avoiding touching the CONST varying.  Then right
> >> before calling the ranger, pass down a new varying node with min/max for
> >> any varyings that were still typeless until that point.
> >>
> >> I have taken care of never adding a set_varying() that was not already
> >> there.  Would this keep the const happy?
> >>
> >> Technically we don't need to set varying/undef types for every instance
> >> in VRP, but we need it at least for the code that will be shared with
> >> range-ops (extract_range_from_multiplicative_op, union, intersect, etc).
> >>I just figured if we have the information, might as well set it for
> >> consistency.
> >>
> >> If you like this approach, I can rebase the other patches that depend on
> >> this one.
> > OK, so I went ant checked what you do for class irange which has
> > a type but no kind member (but constructors with a kind).  It also
> > uses wide_int members for storage.  For a pure integer constant
> > range representation this represents somewhat odd choices;  I'd
> > have elided the m_type member completely here, it seems fully
> > redundant.  Only range operations need to be carried out in a
> > specific type (what I was suggesting above).  Even the precision
> > encoded in the wide_int members is redundant then (I'd have
> > expected widest_int here and trailing-wide-ints for optimizing
> > storage).
>
> What irange contains internally is a bit arbitrary.  It's really an API
> for managing a set of subranges.  We could have used trees internally
> just as easily, then we wouldnt need a type field. Since we were doing
> lots of operations, rather than going back and forth from trees all the
> time, we just used the underlying wide_int directly.   we could have
> fiddle farted around with HOST_WIDE_INT or whatever, but wide_int is
> there, has all the operations, and it works fine. so thats what it
> currently is on the branch.

But a range has no type.  Period.  The fact that with the in-tree implementation
there's access to a type is artificial.  All workers in the in-tree
implementation
get a type for the operation they carry out.

But somehow irange doesn't get that.

In fact, an irange shouldn't be bound to any type, and the usual
"carry out multiplication of two integer typed vars with the result
in integer" on irange should be multiply two iranges (with unbound
result) plus a "truncate to integer type" operation.

>
> We are treating a range object as a unique self contained object.
> Therefore, the range has a type so we know how to print it, and can
> confirm before any operation that the ranges being operated on are
> appropriately matched.  We found and opened bugzillas over the past
> couple years for places where our code caught bugs because a range was
> created and then operated on in a way that was not compatible with
> another range.  I think there is a still an open one against ada(?)
> where the switch and case  are different precision.

I'm fine with sanity-checking where it makes sense but looking at
the irange code the fact that there is a type is a fundamental requirement.
IMHO that's bad design.

>
>  From my point of view, a range object is similar to a tree node. A tree
> node has the bits to indicate what the value is, but also associates a
> type with those bits within the same object.  This is less error 

Update x86-tune-costs.h for znver2

2019-07-23 Thread Jan Hubicka
Hi,
this patch updates znver2 costs to match reality.  In particular we
re-benchmarked memcpy strategies and it looks that glibc now wins even
for relatively small blocks. 
Moreover I updated costs of moves to reflect that znver2 has 256 vector
paths and faster multiplication.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* x86-tune-costs.h (znver2_memcpy): Update.
(znver2_costs): Update 256 bit SSE costs and multiplication.
Index: config/i386/x86-tune-costs.h
===
--- config/i386/x86-tune-costs.h(revision 273727)
+++ config/i386/x86-tune-costs.h(working copy)
@@ -1279,12 +1279,12 @@ struct processor_costs znver1_cost = {
 static stringop_algs znver2_memcpy[2] = {
   {libcall, {{6, loop, false}, {14, unrolled_loop, false},
 {-1, rep_prefix_4_byte, false}}},
-  {libcall, {{16, loop, false}, {8192, rep_prefix_8_byte, false},
+  {libcall, {{16, loop, false}, {64, rep_prefix_4_byte, false},
 {-1, libcall, false;
 static stringop_algs znver2_memset[2] = {
   {libcall, {{8, loop, false}, {24, unrolled_loop, false},
 {2048, rep_prefix_4_byte, false}, {-1, libcall, false}}},
-  {libcall, {{48, unrolled_loop, false}, {8192, rep_prefix_8_byte, false},
+  {libcall, {{24, rep_prefix_4_byte, false}, {128, rep_prefix_8_byte, false},
 {-1, libcall, false;
 
 struct processor_costs znver2_cost = {
@@ -1335,11 +1335,11 @@ struct processor_costs znver2_cost = {
   in SImode and DImode.  */
   {8, 8},  /* cost of storing MMX registers
   in SImode and DImode.  */
-  2, 3, 6, /* cost of moving XMM,YMM,ZMM
+  2, 2, 3, /* cost of moving XMM,YMM,ZMM
   register.  */
-  {6, 6, 6, 10, 20},   /* cost of loading SSE registers
+  {6, 6, 6, 6, 12},/* cost of loading SSE registers
   in 32,64,128,256 and 512-bit.  */
-  {6, 6, 6, 10, 20},   /* cost of unaligned loads.  */
+  {6, 6, 6, 6, 12},/* cost of unaligned loads.  */
   {8, 8, 8, 8, 16},/* cost of storing SSE registers
   in 32,64,128,256 and 512-bit.  */
   {8, 8, 8, 8, 16},/* cost of unaligned stores.  */
@@ -1372,7 +1372,7 @@ struct processor_costs znver2_cost = {
   COSTS_N_INSNS (1),   /* cost of cheap SSE instruction.  */
   COSTS_N_INSNS (3),   /* cost of ADDSS/SD SUBSS/SD insns.  */
   COSTS_N_INSNS (3),   /* cost of MULSS instruction.  */
-  COSTS_N_INSNS (4),   /* cost of MULSD instruction.  */
+  COSTS_N_INSNS (3),   /* cost of MULSD instruction.  */
   COSTS_N_INSNS (5),   /* cost of FMA SS instruction.  */
   COSTS_N_INSNS (5),   /* cost of FMA SD instruction.  */
   COSTS_N_INSNS (10),  /* cost of DIVSS instruction.  */


Re: [PATCH] Remove a pointless global var

2019-07-23 Thread Alexander Monakov
On Tue, 23 Jul 2019, Richard Biener wrote:

> > Can you let me handle sort.cc changes once there's agreement
> > which way to go?
> 
> Sure - feel free to give option 3 a shot.  I'm just worried
> about the system.h redirect which would be then disabled for C?

*nod*   My preference would be to poison qsort for C, but there's also
the possibility to employ option 1 for C while using option 3 for C++.

Alexander


Re: [PATCH] Come up with -flto=auto option.

2019-07-23 Thread Jan Hubicka
> Hi.
> 
> As we as openSUSE started using -flto, I see it very handy to have
> an option value that will automatically detect number of cores
> that can be used for parallel LTRANS phase.
> 
> Thoughts?
Hi,
great you found time to make this. It should become the default for
-flto IMO.

I think we also should auto-detect the case where jobserver is available
and in that case let make to connect to the outer jobserver.  (We should
also really convince make developers to invent way to connect to it w/o
the extra + role)

Honza
> 
> gcc/ChangeLog:
> 
> 2019-07-23  Martin Liska  
> 
>   * doc/invoke.texi: Document the new option value.
>   * lto-wrapper.c (cpuset_popcount): New function
>   is a copy of libgomp/config/linux/proc.c.
>   (init_num_threads): Likewise.
>   (run_gcc): Support -flto=auto.
> ---
>  gcc/doc/invoke.texi |   3 ++
>  gcc/lto-wrapper.c   | 124 +++-
>  2 files changed, 126 insertions(+), 1 deletion(-)
> 
> 

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 77a2d561e38..58656fbe1e1 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -10398,6 +10398,9 @@ parallel jobs by utilizing an installed 
> @command{make} program.  The
>  environment variable @env{MAKE} may be used to override the program
>  used.  The default value for @var{n} is 1.
>  
> +You can specify @var{auto} to automatically detect number of
> +cores that will determine the number of parallel jobs.
> +
>  You can also specify @option{-flto=jobserver} to use GNU make's
>  job server mode to determine the number of parallel jobs. This
>  is useful when the Makefile calling GCC is already executing in parallel.
> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
> index 946897726d0..5451285f896 100644
> --- a/gcc/lto-wrapper.c
> +++ b/gcc/lto-wrapper.c
> @@ -1110,6 +1110,110 @@ cmp_priority (const void *a, const void *b)
>return *((const int *)b)-*((const int *)a);
>  }
>  
> +/* Number of CPUs that can be used for parallel LTRANS phase.  */
> +
> +static unsigned long nthreads_var = 0;
> +
> +#ifdef HAVE_PTHREAD_AFFINITY_NP
> +unsigned long cpuset_size;
> +static unsigned long get_cpuset_size;
> +cpu_set_t *cpusetp;
> +
> +unsigned long
> +static cpuset_popcount (unsigned long cpusetsize, cpu_set_t *cpusetp)
> +{
> +#ifdef CPU_COUNT_S
> +  /* glibc 2.7 and above provide a macro for this.  */
> +  return CPU_COUNT_S (cpusetsize, cpusetp);
> +#else
> +#ifdef CPU_COUNT
> +  if (cpusetsize == sizeof (cpu_set_t))
> +/* glibc 2.6 and above provide a macro for this.  */
> +return CPU_COUNT (cpusetp);
> +#endif
> +  size_t i;
> +  unsigned long ret = 0;
> +  STATIC_ASSERT (sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int));
> +  for (i = 0; i < cpusetsize / sizeof (cpusetp->__bits[0]); i++)
> +{
> +  unsigned long int mask = cpusetp->__bits[i];
> +  if (mask == 0)
> + continue;
> +  ret += __builtin_popcountl (mask);
> +}
> +  return ret;
> +#endif
> +}
> +#endif
> +
> +/* At startup, determine the default number of threads.  It would seem
> +   this should be related to the number of cpus online.  */
> +
> +static void
> +init_num_threads (void)
> +{
> +#ifdef HAVE_PTHREAD_AFFINITY_NP
> +#if defined (_SC_NPROCESSORS_CONF) && defined (CPU_ALLOC_SIZE)
> +  cpuset_size = sysconf (_SC_NPROCESSORS_CONF);
> +  cpuset_size = CPU_ALLOC_SIZE (cpuset_size);
> +#else
> +  cpuset_size = sizeof (cpu_set_t);
> +#endif
> +
> +  cpusetp = (cpu_set_t *) xmalloc (gomp_cpuset_size);
> +  do
> +{
> +  int ret = pthread_getaffinity_np (pthread_self (), gomp_cpuset_size,
> + cpusetp);
> +  if (ret == 0)
> + {
> +   /* Count only the CPUs this process can use.  */
> +   nthreads_var = cpuset_popcount (cpuset_size, cpusetp);
> +   if (nthreads_var == 0)
> + break;
> +   get_cpuset_size = cpuset_size;
> +#ifdef CPU_ALLOC_SIZE
> +   unsigned long i;
> +   for (i = cpuset_size * 8; i; i--)
> + if (CPU_ISSET_S (i - 1, cpuset_size, cpusetp))
> +   break;
> +   cpuset_size = CPU_ALLOC_SIZE (i);
> +#endif
> +   return;
> + }
> +  if (ret != EINVAL)
> + break;
> +#ifdef CPU_ALLOC_SIZE
> +  if (cpuset_size < sizeof (cpu_set_t))
> + cpuset_size = sizeof (cpu_set_t);
> +  else
> + cpuset_size = cpuset_size * 2;
> +  if (cpuset_size < 8 * sizeof (cpu_set_t))
> + cpusetp
> +   = (cpu_set_t *) realloc (cpusetp, cpuset_size);
> +  else
> + {
> +   /* Avoid fatal if too large memory allocation would be
> +  requested, e.g. kernel returning EINVAL all the time.  */
> +   void *p = realloc (cpusetp, cpuset_size);
> +   if (p == NULL)
> + break;
> +   cpusetp = (cpu_set_t *) p;
> + }
> +#else
> +  break;
> +#endif
> +}
> +  while (1);
> +  cpuset_size = 0;
> +  nthreads_var = 1;
> +  free (cpusetp);
> +  cpusetp = NULL;
> +#endif
> +#ifdef 

Re: [PATCH] Remove a pointless global var

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Alexander Monakov wrote:

> On Tue, 23 Jul 2019, Richard Biener wrote:
> 
> > OK, so like below.  Slight complication is that C++ doesn't allow
> > the void * to int (*)() casting which means we probably have to go
> > with the template wrapper.  I have to check if we use gcc_qsort
> > from C code, the prototypes are currently in system.h.
> 
> This is a bit confusing.  I see the patch implements option 1, but
> you're saying you'd prefer option 3 (which is also my preference)?

I just wanted to see how it plays out thus I took the simplest
which is option 1.  But yes, if option 3 works I'm all for it.

> Calls from C code should be very rare, it's possible to adjust them
> manually if going for option 3.
> 
> > Of course easiest is to simply rewrite all qsort calls to use
> > [gcc_]qsort_r and not provide qsort and poison it.  I'm leaning
> > towards this "solution".
> 
> All ~150 of them?

:/

> > On targets with register passing conventions we can also
> > completely elide the cmp2to3 wrapper (ok, that's a hack).
> 
> A bad hack indeed.  Option 3 allows getting efficient code
> in common cases without causing undefined behavior.
> 
> Can you let me handle sort.cc changes once there's agreement
> which way to go?

Sure - feel free to give option 3 a shot.  I'm just worried
about the system.h redirect which would be then disabled for C?

Thanks,
Richard.


Re: [PATCH] Remove a pointless global var

2019-07-23 Thread Alexander Monakov
On Tue, 23 Jul 2019, Richard Biener wrote:

> OK, so like below.  Slight complication is that C++ doesn't allow
> the void * to int (*)() casting which means we probably have to go
> with the template wrapper.  I have to check if we use gcc_qsort
> from C code, the prototypes are currently in system.h.

This is a bit confusing.  I see the patch implements option 1, but
you're saying you'd prefer option 3 (which is also my preference)?

Calls from C code should be very rare, it's possible to adjust them
manually if going for option 3.

> Of course easiest is to simply rewrite all qsort calls to use
> [gcc_]qsort_r and not provide qsort and poison it.  I'm leaning
> towards this "solution".

All ~150 of them?

> On targets with register passing conventions we can also
> completely elide the cmp2to3 wrapper (ok, that's a hack).

A bad hack indeed.  Option 3 allows getting efficient code
in common cases without causing undefined behavior.

Can you let me handle sort.cc changes once there's agreement
which way to go?

Alexander


[PATCH] Come up with -flto=auto option.

2019-07-23 Thread Martin Liška
Hi.

As we as openSUSE started using -flto, I see it very handy to have
an option value that will automatically detect number of cores
that can be used for parallel LTRANS phase.

Thoughts?

gcc/ChangeLog:

2019-07-23  Martin Liska  

* doc/invoke.texi: Document the new option value.
* lto-wrapper.c (cpuset_popcount): New function
is a copy of libgomp/config/linux/proc.c.
(init_num_threads): Likewise.
(run_gcc): Support -flto=auto.
---
 gcc/doc/invoke.texi |   3 ++
 gcc/lto-wrapper.c   | 124 +++-
 2 files changed, 126 insertions(+), 1 deletion(-)


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 77a2d561e38..58656fbe1e1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10398,6 +10398,9 @@ parallel jobs by utilizing an installed @command{make} program.  The
 environment variable @env{MAKE} may be used to override the program
 used.  The default value for @var{n} is 1.
 
+You can specify @var{auto} to automatically detect number of
+cores that will determine the number of parallel jobs.
+
 You can also specify @option{-flto=jobserver} to use GNU make's
 job server mode to determine the number of parallel jobs. This
 is useful when the Makefile calling GCC is already executing in parallel.
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index 946897726d0..5451285f896 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -1110,6 +1110,110 @@ cmp_priority (const void *a, const void *b)
   return *((const int *)b)-*((const int *)a);
 }
 
+/* Number of CPUs that can be used for parallel LTRANS phase.  */
+
+static unsigned long nthreads_var = 0;
+
+#ifdef HAVE_PTHREAD_AFFINITY_NP
+unsigned long cpuset_size;
+static unsigned long get_cpuset_size;
+cpu_set_t *cpusetp;
+
+unsigned long
+static cpuset_popcount (unsigned long cpusetsize, cpu_set_t *cpusetp)
+{
+#ifdef CPU_COUNT_S
+  /* glibc 2.7 and above provide a macro for this.  */
+  return CPU_COUNT_S (cpusetsize, cpusetp);
+#else
+#ifdef CPU_COUNT
+  if (cpusetsize == sizeof (cpu_set_t))
+/* glibc 2.6 and above provide a macro for this.  */
+return CPU_COUNT (cpusetp);
+#endif
+  size_t i;
+  unsigned long ret = 0;
+  STATIC_ASSERT (sizeof (cpusetp->__bits[0]) == sizeof (unsigned long int));
+  for (i = 0; i < cpusetsize / sizeof (cpusetp->__bits[0]); i++)
+{
+  unsigned long int mask = cpusetp->__bits[i];
+  if (mask == 0)
+	continue;
+  ret += __builtin_popcountl (mask);
+}
+  return ret;
+#endif
+}
+#endif
+
+/* At startup, determine the default number of threads.  It would seem
+   this should be related to the number of cpus online.  */
+
+static void
+init_num_threads (void)
+{
+#ifdef HAVE_PTHREAD_AFFINITY_NP
+#if defined (_SC_NPROCESSORS_CONF) && defined (CPU_ALLOC_SIZE)
+  cpuset_size = sysconf (_SC_NPROCESSORS_CONF);
+  cpuset_size = CPU_ALLOC_SIZE (cpuset_size);
+#else
+  cpuset_size = sizeof (cpu_set_t);
+#endif
+
+  cpusetp = (cpu_set_t *) xmalloc (gomp_cpuset_size);
+  do
+{
+  int ret = pthread_getaffinity_np (pthread_self (), gomp_cpuset_size,
+	cpusetp);
+  if (ret == 0)
+	{
+	  /* Count only the CPUs this process can use.  */
+	  nthreads_var = cpuset_popcount (cpuset_size, cpusetp);
+	  if (nthreads_var == 0)
+	break;
+	  get_cpuset_size = cpuset_size;
+#ifdef CPU_ALLOC_SIZE
+	  unsigned long i;
+	  for (i = cpuset_size * 8; i; i--)
+	if (CPU_ISSET_S (i - 1, cpuset_size, cpusetp))
+	  break;
+	  cpuset_size = CPU_ALLOC_SIZE (i);
+#endif
+	  return;
+	}
+  if (ret != EINVAL)
+	break;
+#ifdef CPU_ALLOC_SIZE
+  if (cpuset_size < sizeof (cpu_set_t))
+	cpuset_size = sizeof (cpu_set_t);
+  else
+	cpuset_size = cpuset_size * 2;
+  if (cpuset_size < 8 * sizeof (cpu_set_t))
+	cpusetp
+	  = (cpu_set_t *) realloc (cpusetp, cpuset_size);
+  else
+	{
+	  /* Avoid fatal if too large memory allocation would be
+	 requested, e.g. kernel returning EINVAL all the time.  */
+	  void *p = realloc (cpusetp, cpuset_size);
+	  if (p == NULL)
+	break;
+	  cpusetp = (cpu_set_t *) p;
+	}
+#else
+  break;
+#endif
+}
+  while (1);
+  cpuset_size = 0;
+  nthreads_var = 1;
+  free (cpusetp);
+  cpusetp = NULL;
+#endif
+#ifdef _SC_NPROCESSORS_ONLN
+  nthreads_var = sysconf (_SC_NPROCESSORS_ONLN);
+#endif
+}
 
 /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */
 
@@ -1124,6 +1228,7 @@ run_gcc (unsigned argc, char *argv[])
   const char *collect_gcc, *collect_gcc_options;
   int parallel = 0;
   int jobserver = 0;
+  int auto_parallel = 0;
   bool no_partition = false;
   struct cl_decoded_option *fdecoded_options = NULL;
   struct cl_decoded_option *offload_fdecoded_options = NULL;
@@ -1251,6 +1356,11 @@ run_gcc (unsigned argc, char *argv[])
 	  jobserver = 1;
 	  parallel = 1;
 	}
+	  else if (strcmp (option->arg, "auto") == 0)
+	{
+	  auto_parallel = 1;
+	  parallel = 1;
+	}
 	  else
 	{
 	  parallel = 

[Ada] Eliminate redundant overflow checks for conversions from fixed-point

2019-07-23 Thread Pierre-Marie de Rodat
This eliminates redundant overflow checks that are generated for
conversions from fixed-point to integer types when range checks are also
enabled (which is the default), as the former checks are subsumed into
the latter checks.

No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Eric Botcazou  

gcc/ada/

* checks.adb (Activate_Overflow_Check): Remove redundant
argument.
* exp_ch4.adb (Discrete_Range_Check): Reset the overflow flag.
(Expand_N_Type_Conversion): Do not reset it here.--- gcc/ada/checks.adb
+++ gcc/ada/checks.adb
@@ -435,7 +435,7 @@ package body Checks is
 
   --  Fall through for cases where we do set the flag
 
-  Set_Do_Overflow_Check (N, True);
+  Set_Do_Overflow_Check (N);
   Possible_Local_Raise (N, Standard_Constraint_Error);
end Activate_Overflow_Check;
 

--- gcc/ada/exp_ch4.adb
+++ gcc/ada/exp_ch4.adb
@@ -11000,6 +11000,11 @@ package body Exp_Ch4 is
 Rewrite (Expr, Unchecked_Convert_To (Ityp, Expr));
  end if;
 
+ --  Reset overflow flag, since the range check will include
+ --  dealing with possible overflow, and generate the check.
+
+ Set_Do_Overflow_Check (N, False);
+
  Generate_Range_Check (Expr, Target_Type, CE_Range_Check_Failed);
   end Discrete_Range_Check;
 
@@ -12096,11 +12101,6 @@ package body Exp_Ch4 is
   or else (Is_Fixed_Point_Type (Etype (Expression (N)))
 and then Conversion_OK (N)))
  then
---  Reset overflow flag, since the range check will include
---  dealing with possible overflow, and generate the check.
-
-Set_Do_Overflow_Check (N, False);
-
 --  If Address is either a source type or target type,
 --  suppress range check to avoid typing anomalies when
 --  it is a visible integer type.



[Ada] Plug small loophole in Generate_Range_Check

2019-07-23 Thread Pierre-Marie de Rodat
The Generate_Range_Check routine is responsible for generating range
checks in the scalar case.  It automatically deals with possible
overflow in the process when the source and the target base types are
different.

However there is one case where overflow is not dealt with correctly,
namely when the target base type is narrower than the source base type
and both are floating-point types. In this case, the routine will
convert the source type to the target base type without checking for
overflow. In practice this does not matter much because the conversion
would yield an infinity on overflow, which would then fail the
subsequent range check. However it's more correct to have a proper
overflow check with -gnateF than relying on the infinity.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Eric Botcazou  

gcc/ada/

* checks.adb (Convert_And_Check_Range): Add Suppress parameter
and pass it in the call to Insert_Actions.  Rename local
variable.
(Generate_Range_Check): Minor comment fixes.  Pass Range_Check
in the first call to Convert_And_Check_Range and All_Checks in
the second call.
* exp_ch4.adb (Expand_N_Type_Conversion): Reset the
Do_Overflow_Check flag in the float-to-float case too if there
is also a range check.

gcc/testsuite/

* gnat.dg/range_check5.adb: New testcase.--- gcc/ada/checks.adb
+++ gcc/ada/checks.adb
@@ -6841,18 +6841,19 @@ package body Checks is
   Source_Base_Type : constant Entity_Id  := Base_Type (Source_Type);
   Target_Base_Type : constant Entity_Id  := Base_Type (Target_Type);
 
-  procedure Convert_And_Check_Range;
-  --  Convert the conversion operand to the target base type and save in
-  --  a temporary. Then check the converted value against the range of the
-  --  target subtype.
+  procedure Convert_And_Check_Range (Suppress : Check_Id);
+  --  Convert N to the target base type and save the result in a temporary.
+  --  The action is analyzed using the default checks as modified by the
+  --  given Suppress argument. Then check the converted value against the
+  --  range of the target subtype.
 
   -
   -- Convert_And_Check_Range --
   -
 
-  procedure Convert_And_Check_Range is
- Tnn   : constant Entity_Id := Make_Temporary (Loc, 'T', N);
- Conv_Node : Node_Id;
+  procedure Convert_And_Check_Range (Suppress : Check_Id) is
+ Tnn: constant Entity_Id := Make_Temporary (Loc, 'T', N);
+ Conv_N : Node_Id;
 
   begin
  --  For enumeration types with non-standard representation this is a
@@ -6867,36 +6868,26 @@ package body Checks is
and then Present (Enum_Pos_To_Rep (Source_Base_Type))
and then Is_Integer_Type (Target_Base_Type)
  then
-Conv_Node :=
-  OK_Convert_To
-(Typ  => Target_Base_Type,
- Expr => Duplicate_Subexpr (N));
-
- --  Common case
-
+Conv_N := OK_Convert_To (Target_Base_Type, Duplicate_Subexpr (N));
  else
-Conv_Node :=
-  Make_Type_Conversion (Loc,
-Subtype_Mark => New_Occurrence_Of (Target_Base_Type, Loc),
-Expression   => Duplicate_Subexpr (N));
+Conv_N := Convert_To (Target_Base_Type, Duplicate_Subexpr (N));
  end if;
 
- --  We make a temporary to hold the value of the converted value
- --  (converted to the base type), and then do the test against this
- --  temporary. The conversion itself is replaced by an occurrence of
- --  Tnn and followed by the explicit range check. Note that checks
- --  are suppressed for this code, since we don't want a recursive
- --  range check popping up.
+ --  We make a temporary to hold the value of the conversion to the
+ --  target base type, and then do the test against this temporary.
+ --  N itself is replaced by an occurrence of Tnn and followed by
+ --  the explicit range check.
 
  -- Tnn : constant Target_Base_Type := Target_Base_Type (N);
  -- [constraint_error when Tnn not in Target_Type]
+ -- Tnn
 
  Insert_Actions (N, New_List (
Make_Object_Declaration (Loc,
  Defining_Identifier => Tnn,
  Object_Definition   => New_Occurrence_Of (Target_Base_Type, Loc),
  Constant_Present=> True,
- Expression  => Conv_Node),
+ Expression  => Conv_N),
 
Make_Raise_Constraint_Error (Loc,
  Condition =>
@@ -6904,7 +6895,7 @@ package body Checks is
  Left_Opnd  => New_Occurrence_Of (Tnn, Loc),
  Right_Opnd => New_Occurrence_Of (Target_Type, Loc)),
  Reason => Reason)),
-   Suppress => 

[Ada] Fix binding of ghost units with finalizer

2019-07-23 Thread Pierre-Marie de Rodat
Linking of an enabled ghost unit which requires a finalizer lead to an
error, as the name generated by the binder for calling the finalizer was
not the same as the name chosen by the compiler. Now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Yannick Moy  

gcc/ada/

* exp_ch7.adb (Create_Finalizer): Force finalizer not to be
Ghost enabled.
* exp_dbug.adb (Get_External_Name): Explain special case of
Ghost finalizer.

gcc/testsuite/

* gnat.dg/ghost6.adb, gnat.dg/ghost6_pkg.ads: New testcase.--- gcc/ada/exp_ch7.adb
+++ gcc/ada/exp_ch7.adb
@@ -2035,6 +2035,13 @@ package body Exp_Ch7 is
 
 Analyze (Fin_Body, Suppress => All_Checks);
  end if;
+
+ --  Never consider that the finalizer procedure is enabled Ghost, even
+ --  when the corresponding unit is Ghost, as this would lead to an
+ --  an external name with a ___ghost_ prefix that the binder cannot
+ --  generate, as it has no knowledge of the Ghost status of units.
+
+ Set_Is_Checked_Ghost_Entity (Fin_Id, False);
   end Create_Finalizer;
 
   --

--- gcc/ada/exp_dbug.adb
+++ gcc/ada/exp_dbug.adb
@@ -914,6 +914,14 @@ package body Exp_Dbug is
   --  names produced for Ghost entities, while "__ghost_" can appear in
   --  names of entities inside a child/local package called "Ghost".
 
+  --  The compiler-generated finalizer for an enabled Ghost unit is treated
+  --  specially, as its name must be known to the binder, which has no
+  --  knowledge of Ghost status. In that case, the finalizer is not marked
+  --  as Ghost so that no prefix is added. Note that the special ___ghost_
+  --  prefix is retained when the Ghost unit is ignored, which still allows
+  --  inspecting the final executable for the presence of an ignored Ghost
+  --  finalizer procedure.
+
   if Is_Ghost_Entity (E)
 and then not Is_Compilation_Unit (E)
 and then (Name_Len < 9

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/ghost6.adb
@@ -0,0 +1,10 @@
+--  { dg-do link }
+--  { dg-options "-gnata -g" }
+
+with Ghost6_Pkg;
+
+procedure Ghost6 is
+   X : Ghost6_Pkg.T with Ghost;
+begin
+   null;
+end Ghost6;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/ghost6_pkg.ads
@@ -0,0 +1,7 @@
+with Ada.Finalization;
+
+package Ghost6_Pkg with
+  Ghost
+is
+   type T is new Ada.Finalization.Controlled with null record;
+end Ghost6_Pkg;



[Ada] Issue error on SPARK ownership rule violation

2019-07-23 Thread Pierre-Marie de Rodat
A modified rule in SPARK RM specifies that object declarations of
anonymous access type should only occur immediately in subprogram, entry
or block. Now checked.

There is no impact on compilation.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Yannick Moy  

gcc/ada/

* sem_spark.ads (Is_Local_Context): New function.
* sem_spark.adb (Check_Declaration): Issue errors on violations
of SPARK RM 3.10(4)
(Process_Path): Do not issue error on borrow/observe during
elaboration, as these are caught by the new rule.--- gcc/ada/sem_spark.adb
+++ gcc/ada/sem_spark.adb
@@ -1419,9 +1419,37 @@ package body Sem_SPARK is
 Check_Expression (Subtype_Indication (Decl), Read);
 
  when N_Object_Declaration =>
+Expr := Expression (Decl);
+
 Check_Type (Target_Typ);
 
-Expr := Expression (Decl);
+--  A declaration of a stand-alone object of an anonymous access
+--  type shall have an explicit initial value and shall occur
+--  immediately within a subprogram body, an entry body, or a
+--  block statement (SPARK RM 3.10(4)).
+
+if Is_Anonymous_Access_Type (Target_Typ) then
+   declare
+  Scop : constant Entity_Id := Scope (Target);
+   begin
+  if not Is_Local_Context (Scop) then
+ if Emit_Messages then
+Error_Msg_N
+  ("object of anonymous access type must be declared "
+   & "immediately within a subprogram, entry or block "
+   & "(SPARK RM 3.10(4))", Decl);
+ end if;
+  end if;
+   end;
+
+   if No (Expr) then
+  if Emit_Messages then
+ Error_Msg_N ("object of anonymous access type must be "
+  & "initialized (SPARK RM 3.10(4))", Decl);
+  end if;
+   end if;
+end if;
+
 if Present (Expr) then
Check_Assignment (Target => Target,
  Expr   => Expr);
@@ -2848,9 +2876,14 @@ package body Sem_SPARK is
  --  independently for R permission. Outputs are checked
  --  independently to have RW permission on exit.
 
- when Pragma_Contract_Cases
+ --  Postconditions are checked for correct use of 'Old, but starting
+ --  from the corresponding declaration, in order to avoid dealing with
+ --  with contracts on generic subprograms, which are not handled in
+ --  GNATprove.
+
+ when Pragma_Precondition
 | Pragma_Postcondition
-| Pragma_Precondition
+| Pragma_Contract_Cases
 | Pragma_Refined_Post
  =>
 null;
@@ -3993,6 +4026,16 @@ package body Sem_SPARK is
   end case;
end Is_Deep;
 
+   --
+   -- Is_Local_Context --
+   --
+
+   function Is_Local_Context (Scop : Entity_Id) return Boolean is
+   begin
+  return Is_Subprogram_Or_Entry (Scop)
+or else Ekind (Scop) = E_Block;
+   end Is_Local_Context;
+

-- Is_Path_Expression --

@@ -4863,13 +4906,10 @@ package body Sem_SPARK is
 
  when Borrow =>
 
---  Forbidden during elaboration
+--  Forbidden during elaboration, an error is already issued in
+--  Check_Declaration, just return.
 
 if Inside_Elaboration then
-   if not Inside_Procedure_Call and then Emit_Messages then
-  Error_Msg_N ("illegal borrow during elaboration", Expr);
-   end if;
-
return;
 end if;
 
@@ -4882,13 +4922,10 @@ package body Sem_SPARK is
 
  when Observe =>
 
---  Forbidden during elaboration
+--  Forbidden during elaboration, an error is already issued in
+--  Check_Declaration, just return.
 
 if Inside_Elaboration then
-   if not Inside_Procedure_Call and then Emit_Messages then
-  Error_Msg_N ("illegal observe during elaboration", Expr);
-   end if;
-
return;
 end if;
 

--- gcc/ada/sem_spark.ads
+++ gcc/ada/sem_spark.ads
@@ -162,4 +162,8 @@ package Sem_SPARK is
 
function Is_Traversal_Function (E : Entity_Id) return Boolean;
 
+   function Is_Local_Context (Scop : Entity_Id) return Boolean;
+   --  Return if a given scope defines a local context where it is legal to
+   --  declare a variable of anonymous access type.
+
 end Sem_SPARK;



[Ada] Iterators are view-specific

2019-07-23 Thread Pierre-Marie de Rodat
Operational aspects, such as Default_Iterator, are view-specific, and if
such an aspect appears on the full view of a private type, an object of
the type cannot be iterated upon if it is not in the scope of the full
view, This patch diagnoses properly an attempt to iterate over such an
object.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Ed Schonberg  

gcc/ada/

* aspects.ads: New table Operational_Aspect, used to distinguish
between aspects that are view-specific, such as those related to
iterators, and representation aspects that apply to all views of
a type.
* aspects.adb (Find_Aspect): If the aspect being sought is
operational, do not ecamine the full view of a private type to
retrieve it.
* sem_ch5.adb (Analyze_Iterator_Specification): Improve error
message when the intended domain of iteration does not implement
the required iterator aspects.

gcc/testsuite/

* gnat.dg/iter5.adb: Add an expected error.
* gnat.dg/iter6.adb: New testcase.--- gcc/ada/aspects.adb
+++ gcc/ada/aspects.adb
@@ -225,7 +225,10 @@ package body Aspects is
 Owner := Root_Type (Owner);
  end if;
 
- if Is_Private_Type (Owner) and then Present (Full_View (Owner)) then
+ if Is_Private_Type (Owner)
+and then Present (Full_View (Owner))
+and then not Operational_Aspect (A)
+ then
 Owner := Full_View (Owner);
  end if;
   end if;

--- gcc/ada/aspects.ads
+++ gcc/ada/aspects.ads
@@ -277,6 +277,20 @@ package Aspects is
   Aspect_Warnings   => True,
   others=> False);
 
+   --  The following array indicates aspects that specify operational
+   --  characteristics, and thus are view-specific. Representation
+   --  aspects break privacy, as they are needed during expansion and
+   --  code generation.
+   --  List is currently incomplete ???
+
+   Operational_Aspect : constant array (Aspect_Id) of Boolean :=
+ (Aspect_Constant_Indexing  => True,
+  Aspect_Default_Iterator   => True,
+  Aspect_Iterator_Element   => True,
+  Aspect_Iterable   => True,
+  Aspect_Variable_Indexing  => True,
+  others=> False);
+
--  The following array indicates aspects for which multiple occurrences of
--  the same aspect attached to the same declaration are allowed.
 

--- gcc/ada/sem_ch5.adb
+++ gcc/ada/sem_ch5.adb
@@ -2234,8 +2234,17 @@ package body Sem_Ch5 is
It : Interp;
 
 begin
+   --  THe domain of iteralion must implement either the RM
+   --  iterator interface, or the SPARK Iterable aspect.
+
if No (Iterator) then
-  null;  --  error reported below
+  if No
+ (Find_Aspect (Etype (Iter_Name), Aspect_Iterable))
+  then
+ Error_Msg_NE ("cannot iterate over&",
+   N, Base_Type (Etype (Iter_Name)));
+ return;
+  end if;
 
elsif not Is_Overloaded (Iterator) then
   Check_Reverse_Iteration (Etype (Iterator));

--- gcc/testsuite/gnat.dg/iter5.adb
+++ gcc/testsuite/gnat.dg/iter5.adb
@@ -4,7 +4,7 @@ with Iter5_Pkg;
 
 procedure Iter5 is
 begin
-   for The_Filename of Iter5_Pkg.Iterator_For ("C:\Program_Files") loop
+   for The_Filename of Iter5_Pkg.Iterator_For ("C:\Program_Files") loop  --  { dg-error "cannot iterate over \"Item\"" }
   null;
end loop;
 end Iter5;

--- /dev/null
new file mode 100644
+++ gcc/testsuite/gnat.dg/iter6.adb
@@ -0,0 +1,40 @@
+--  { dg-do compile }
+
+with Ada.Iterator_Interfaces;
+
+procedure Iter6 is
+  package Pkg is
+  type Item (<>) is limited private;
+   private
+
+   type Cursor is null record;
+
+  function Constant_Reference (The_Item : aliased Item;
+   Unused_Index : Cursor) return String
+ is ("");
+
+  function Has_More (Data : Cursor) return Boolean is (False);
+
+  package List_Iterator_Interfaces is new Ada.Iterator_Interfaces
+(Cursor, Has_More);
+
+   function Iterate (The_Item : Item)
+		return List_Iterator_Interfaces.Forward_Iterator'class
+ is (raise Program_Error);
+
+  type Item (Name_Length : Natural) is tagged limited record
+ null;
+  end record
+  with
+Constant_Indexing => Constant_Reference,
+Default_Iterator  => Iterate,
+Iterator_Element  => String;
+  end Pkg; use Pkg;
+
+  type Item_Ref is access Item;
+  function F return Item_Ref is (null);
+begin
+   for I of F.all loop --  { dg-error "cannot iterate over \"Item\"" }
+ null;
+   end loop;
+end;



[Ada] Minor tweak to -gnatR output

2019-07-23 Thread Pierre-Marie de Rodat
This makes sure that the numbers present in the -gnatR output are
printed in decimal format in all cases, since the hexadecimal format is
not compatible with the JSON syntax.

Tested on x86_64-pc-linux-gnu, committed on trunk

2019-07-23  Eric Botcazou  

gcc/ada/

* repinfo.adb (List_Component_Layout): Pass Decimal to UI_Write.
(Write_Val): Likewise.--- gcc/ada/repinfo.adb
+++ gcc/ada/repinfo.adb
@@ -1150,7 +1150,7 @@ package body Repinfo is
 if Ekind (Ent) = E_Discriminant then
Spaces (Indent);
Write_Str ("  ""discriminant"": ");
-   UI_Write (Discriminant_Number (Ent));
+   UI_Write (Discriminant_Number (Ent), Decimal);
Write_Line (",");
 end if;
 Spaces (Indent);
@@ -1181,7 +1181,7 @@ package body Repinfo is
 Spaces (Max_Spos_Length - 2);
 
 if Starting_Position /= Uint_0 then
-   UI_Write (Starting_Position);
+   UI_Write (Starting_Position, Decimal);
Write_Str (" + ");
 end if;
 
@@ -1205,7 +1205,7 @@ package body Repinfo is
 Sbit := Sbit - SSU;
  end if;
 
- UI_Write (Sbit);
+ UI_Write (Sbit, Decimal);
 
  if List_Representation_Info_To_JSON then
 Write_Line (", ");
@@ -1227,13 +1227,13 @@ package body Repinfo is
 Lbit := Sbit + Esiz - 1;
 
 if List_Representation_Info_To_JSON then
-   UI_Write (Esiz);
+   UI_Write (Esiz, Decimal);
 else
if Lbit >= 0 and then Lbit < 10 then
   Write_Char (' ');
end if;
 
-   UI_Write (Lbit);
+   UI_Write (Lbit, Decimal);
 end if;
 
  --  The test for Esize (Ent) not Uint_0 here is an annoying special
@@ -2414,7 +2414,7 @@ package body Repinfo is
  end if;
 
   else
- UI_Write (Val);
+ UI_Write (Val, Decimal);
   end if;
end Write_Val;
 



Re: [PATCH] Remove a pointless global var

2019-07-23 Thread Richard Biener
On Mon, 22 Jul 2019, Alexander Monakov wrote:

> Hi!
> 
> On Mon, 22 Jul 2019, Richard Biener wrote:
> 
> > I've noticed that we have quite some instances using qsort
> > and passing down state to the comparator via a global var.
> > Now that we have our own qsort implementation I wonder what
> > the best course of action is to add a variant passing down
> > such state to the comparator?  Copying nearly the whole
> > implementation would be possible but also some marshalling
> > of three-argument comparator calls to two-argument functions
> > (and some ABIs can do without actual marshalling).  The
> > other option is templating everything (ugh).
> 
> I think there are three choices.
> 
> 1. Have gcc/sort.cc implement only qsort_r, use a "universal comparator
> adapter"
> 
> int cmp2to3(void *a, void *b, void *ctx)
> {
>   return (int (*)(void *, void *))ctx(a, b);
> }
> 
> to adapt existing qsort uses.
> 
> 2. Have gcc/sort.cc implement both qsort and qsort_r by duplicating code,
> either copying manually (ugh) or by turning netsort, mergesort, qsort_chk
> to templates.
> 
> 3. Have gcc/sort.cc implement only qsort_r, but have existing qsort callers
> pass their comparators to a fancy C++ "universal comparator adapter" so
> they can be inlined into the three-argument wrapper and thus
> speed/size penalties are tiny (only from always loading the context pointer
> that is ignored by legacy two-argument comparators):
> 
> void qsort_r(void *, size_t, size_t, int cmp(void *, void *, void *));
> 
> template
> int cmp2to3(void *a, void *b, void *c)
> {
>   return cmp(a, b);
> }
> 
> #define qsort(base, sz, n, cmp) \
>   qsort_r(base, sz, n, cmp2to3)
> 
> static int my_cmp(void *a, void *b)
> {
>   return 0;
> }
> 
> void test(void *base, size_t sz, size_t n)
> {
>   qsort(base, sz, n, my_cmp);
> }

OK, so like below.  Slight complication is that C++ doesn't allow
the void * to int (*)() casting which means we probably have to go
with the template wrapper.  I have to check if we use gcc_qsort
from C code, the prototypes are currently in system.h.

Of course easiest is to simply rewrite all qsort calls to use
[gcc_]qsort_r and not provide qsort and poison it.  I'm leaning
towards this "solution".

On targets with register passing conventions we can also
completely elide the cmp2to3 wrapper (ok, that's a hack).

Richard.

Index: gcc/system.h
===
--- gcc/system.h(revision 273657)
+++ gcc/system.h(working copy)
@@ -1200,10 +1200,15 @@ helper_const_non_const_cast (const char
 /* GCC qsort API-compatible functions: except in release-checking compilers,
redirect 4-argument qsort calls to gcc_qsort; keep 1-argument invocations
corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
-void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
+void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *,
+   void *), void *);
 void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
+void gcc_qsort_r (void *, size_t, size_t, int (*)(const void *, const void *,
+ void *), void *);
 void gcc_stablesort (void *, size_t, size_t,
 int (*)(const void *, const void *));
+void gcc_stablesort_r (void *, size_t, size_t,
+  int (*)(const void *, const void *, void *), void *);
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
 #undef qsort
 #define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) 
(__VA_ARGS__)
Index: gcc/sort.cc
===
--- gcc/sort.cc (revision 273657)
+++ gcc/sort.cc (working copy)
@@ -46,12 +46,13 @@ along with GCC; see the file COPYING3.
 #endif
 
 /* C-style qsort comparator function type.  */
-typedef int cmp_fn (const void *, const void *);
+typedef int cmp_fn (const void *, const void *, void *);
 
 /* Structure holding read-mostly (read-only in netsort) context.  */
 struct sort_ctx
 {
   cmp_fn *cmp; // pointer to comparator
+  void *cmp_d; // pointer to comparator data
   char   *out; // output buffer
   size_t n;// number of elements
   size_t size; // element size
@@ -128,10 +129,10 @@ do {
This is noinline to avoid code growth and confine invocation
to a single call site, assisting indirect branch prediction.  */
 noinline static intptr_t
-cmp1 (char *e0, char *e1, cmp_fn *cmp)
+cmp1 (char *e0, char *e1, cmp_fn *cmp, void *cmp_d)
 {
   intptr_t x = (intptr_t)e0 ^ (intptr_t)e1;
-  return x & (cmp (e0, e1) >> 31);
+  return x & (cmp (e0, e1, cmp_d) >> 31);
 }
 
 /* Execute network sort on 2 to 5 elements from IN, placing them into C->OUT.
@@ -141,7 +142,7 @@ netsort (char *in, sort_ctx *c)
 {
 #define CMP(e0, e1)   \
 do {  \
-  intptr_t x = cmp1 (e1, e0, c->cmp); \
+  intptr_t x = cmp1 (e1, 

Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-23 Thread Richard Biener
On Tue, 23 Jul 2019, Kewen.Lin wrote:

> on 2019/7/22 下午3:18, Richard Biener wrote:
> > On Mon, 22 Jul 2019, Segher Boessenkool wrote:
> > 
> >> Hi!
> >>
> >> (Maybe I am missing half of the discussion -- sorry if so).
> >>
> >> I think we should have a new iv for just the doloop (which can have the
> >> same starting value and step and type as another iv).
> >>
> >> Has this been considered?
> > 
> > I was also suggesting this (maybe with too many words ;)).  If
> > it's a doloop target add such IV (candidate!) which has zero
> > use-cost for the increment and compare but a (target configurable)
> > penalty for other uses.  Invasiveness of this approach is probably
> > that you need to distinguish this candidate by making it a new
> > kind (or maybe we can just have a specia candidate number...).
> > 
> 
> Hi Richard,
> 
> Really appreciate your comments on this, very sorry not to go with this.
> Since this patch is for TARGET_HAVE_COUNT_REG_DECR_P, I was thinking
> it's fairly enough to reuse the existing IV cands and just zeroing doloop
> use cost with them.  I'm very happy to unify it.  If you/Segher/Bin don't
> have any concerns, I'd like to make it as one follow up item.
> 
> One thing to double check is this dedicated IV will follow decrement
> instead of increment align with doloop optimize?  Then it looks to shape
> the loop closing to doloop pattern, at least it's decrement.

I think doloop support should be as "simple" as always adding a
candidate starting from niter (-1?), step -1 marked as DOLOOP_IV
(which is then used in costing, making uses in the IV update and
the compare zero cost and uses in other places according to the
target by using an appropriate hardreg for the fake RTL we create).

IV costing and elimination should then choose the doloop IV if that's
profitable.

Richard.

Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-23 Thread Richard Biener
On Mon, 22 Jul 2019, Segher Boessenkool wrote:

> On Mon, Jul 22, 2019 at 09:18:10AM +0200, Richard Biener wrote:
> > On Mon, 22 Jul 2019, Segher Boessenkool wrote:
> > 
> > > Hi!
> > > 
> > > (Maybe I am missing half of the discussion -- sorry if so).
> > > 
> > > I think we should have a new iv for just the doloop (which can have the
> > > same starting value and step and type as another iv).
> > > 
> > > Has this been considered?
> > 
> > I was also suggesting this (maybe with too many words ;)).  If
> > it's a doloop target add such IV (candidate!) which has zero
> > use-cost for the increment and compare but a (target configurable)
> > penalty for other uses.  Invasiveness of this approach is probably
> > that you need to distinguish this candidate by making it a new
> > kind (or maybe we can just have a specia candidate number...).
> 
> Or just set some (boolean) flag in the candidate.
> 
> I think it should simply not be allowed for any use except the doloop
> uses at all?  You can have multiple ivs for the same loop just fine,
> right?  And costs will make everything work out, if the costs are set
> correctly?

Sure.  Upthread it was mentioned some targets can easily use the
counter IV in other IV uses so it's really a matter of costs.  That is,
IVOPTs generated "fake" RTL should, for doloop IVs, choose an
appropriate register so the target can do costing.

Richard.


[PATCH V2, rs6000] Support vrotr3 for int vector types

2019-07-23 Thread Kewen.Lin
Hi Segher,

Thanks for the clarification! 

Compared with the previous one, this add vrl_and define_insn(s)
for explicit AND (truncation) as Jakub's suggestion.

Bootstrapped and regtested on powerpc64le-unknown-linux-gnu, is it ok
for trunk?

Thanks,
Kewen


gcc/ChangeLog

2019-07-23  Kewen Lin  

* config/rs6000/predicates.md (vint_reg_or_const_vector): New predicate.
* config/rs6000/vector.md (vrotr3): New define_expand.
* config/rs6000/altivec.md (vrl_and): New define_insns.

gcc/testsuite/ChangeLog

2019-07-23  Kewen Lin  

* gcc.target/powerpc/vec_rotate-1.c: New test.
* gcc.target/powerpc/vec_rotate-2.c: New test.

on 2019/7/19 下午11:06, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Jul 19, 2019 at 10:21:06AM +0800, Kewen.Lin wrote:
>> on 2019/7/19 上午3:48, Segher Boessenkool wrote:
>>> On Thu, Jul 18, 2019 at 01:44:36PM +0800, Kewen.Lin wrote:
 on 2019/7/17 下午9:40, Segher Boessenkool wrote:
> On Wed, Jul 17, 2019 at 04:32:15PM +0800, Kewen.Lin wrote:
>> Regression testing just launched, is it OK for trunk if it's bootstrapped
>> and regresstested on powerpc64le-unknown-linux-gnu?
>
>> +;; Expanders for rotatert to make use of vrotl
>> +(define_expand "vrotr3"
>> +  [(set (match_operand:VEC_I 0 "vint_operand")
>> +(rotatert:VEC_I (match_operand:VEC_I 1 "vint_operand")
>> +  (match_operand:VEC_I 2 
>> "vint_reg_or_const_vector")))]
>
> Having any rotatert in a define_expand or define_insn will regress.
> 
> This is wrong.  I don't know why I thought this for a while.
> 
> There shouldn't be any rotatert in anything that goes through recog, but
> that is everything *except* define_expand.  So define_insn, define_split,
> define_peephole, define_peephole2 (and define_insn_and_split, which is
> just syntactic sugar).
> 
>> Thanks for further explanation!  Sorry that, but I didn't find this 
>> HAVE_rotatert definition.  I guess it's due to the preparation is always 
>> "DONE"?  Then it doesn't really generate rotatert. 
> 
> You only had one in a define_expand.  That is fine, that pattern is never
> recognised against.  HAVE_rotatert means that something somewhere will
> recognise rotatert RTL insns; if it isn't set, it doesn't make sense to
> ever create them, because they will never match.
> 
>> although I can see rotatert in insn like below, it seems fine in note?
> 
> Sure, many things are allowed in notes that can never show up in RTL
> proper.
> 
> So, this approach will work fine, and not be too bad.  Could you do a
> new patch with it?  It's simple to do, and even if the generic thing
> will happen eventually, this is a nice stepping stone for that.
> 
> Thanks, and sorry for the confusion,
> 
> 
> Segher
> 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index b6a22d9010c..2b6a957d4a6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1666,6 +1666,60 @@
   "vrl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
+;; Here these vrl_and are for vrotr3 expansion.
+;; since SHIFT_COUNT_TRUNCATED is set as zero, to append one explicit
+;; AND to indicate truncation but emit vrl insn.
+(define_insn "vrlv2di_and"
+  [(set (match_operand:V2DI 0 "register_operand" "=v")
+   (and:V2DI
+ (rotate:V2DI (match_operand:V2DI 1 "register_operand" "v")
+(match_operand:V2DI 2 "register_operand" "v"))
+ (const_vector:V2DI [(const_int 63) (const_int 63)])))]
+  "VECTOR_UNIT_P8_VECTOR_P (V2DImode)"
+  "vrld %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv4si_and"
+  [(set (match_operand:V4SI 0 "register_operand" "=v")
+   (and:V4SI
+ (rotate:V4SI (match_operand:V4SI 1 "register_operand" "v")
+(match_operand:V4SI 2 "register_operand" "v"))
+ (const_vector:V4SI [(const_int 31) (const_int 31)
+ (const_int 31) (const_int 31)])))]
+  "VECTOR_UNIT_ALTIVEC_P (V4SImode)"
+  "vrlw %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv8hi_and"
+  [(set (match_operand:V8HI 0 "register_operand" "=v")
+   (and:V8HI
+ (rotate:V8HI (match_operand:V8HI 1 "register_operand" "v")
+(match_operand:V8HI 2 "register_operand" "v"))
+ (const_vector:V8HI [(const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)
+ (const_int 15) (const_int 15)])))]
+  "VECTOR_UNIT_ALTIVEC_P (V8HImode)"
+  "vrlh %0,%1,%2"
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "vrlv16qi_and"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (and:V16QI
+   (rotate:V16QI (match_operand:V16QI 1 "register_operand" "v")
+  (match_operand:V16QI 2 "register_operand" "v"))
+   (const_vector:V16QI [(const_int 7) (const_int 7)
+   (const_int 7) (const_int 7)
+   

[PATCH v5 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-23 Thread Kewen.Lin
Hi Bin,

This patch follows your suggestion, to avoid use infinite cost iv cand to 
rewrite.
In order to allow other IV cands to be considered, zeroing the iv cand cost if 
its users are only doloop uses.  (See the typical case in previous reply.)

Could you please have a look?  Thanks in advance!


Kewen
-

gcc/ChangeLog

2019-07-23  Kewen Lin  

PR middle-end/80791
* target.def (have_count_reg_decr_p): New hook.
* doc/tm.texi.in (TARGET_HAVE_COUNT_REG_DECR_P): New hook.
* doc/tm.texi: Regenerate.
* config/rs6000/rs6000.c (rs6000_have_count_reg_decr_p): New function.
(TARGET_HAVE_COUNT_REG_DECR_P): New macro.
* tree-ssa-loop-ivopts.c (adjust_group_iv_cost_for_doloop): New 
function.
(find_doloop_use): Likewise.
(record_group): Init doloop_p.
(determine_group_iv_cost): Call adjust_group_iv_cost_for_doloop.
(tree_ssa_iv_optimize_loop): Call function have_count_reg_decr_p, 
generic_predict_doloop_p and find_doloop_use.
(generic_predict_doloop_p): Update attribute.
(iv_ca_set_no_cp): Adjust cand cost handling for doloop.
(iv_ca_set_cp): Likewise.
(iv_ca_new): Init n_cand_doloop_uses.
(iv_ca_free): Free n_cand_doloop_uses.

gcc/testsuite/ChangeLog

2019-07-23  Kewen Lin  

PR middle-end/80791
* gcc.dg/tree-ssa/ivopts-lt.c: Adjust.



diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6667cd0..e98aba9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1912,6 +1912,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_PREDICT_DOLOOP_P
 #define TARGET_PREDICT_DOLOOP_P rs6000_predict_doloop_p
 
+#undef TARGET_HAVE_COUNT_REG_DECR_P
+#define TARGET_HAVE_COUNT_REG_DECR_P true
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c2aa4d0..5477294 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11618,6 +11618,14 @@ loops, and will help ivopts to make some decisions.
 The default version of this hook returns false.
 @end deftypefn
 
+@deftypevr {Target Hook} bool TARGET_HAVE_COUNT_REG_DECR_P
+Return true if the target supports hardware count register for decrement
+and branch.  This count register can't be used as general register since
+moving to/from a general register from/to it is very expensive.
+For the targets with this support, ivopts can take doloop use as zero cost.
+The default value is false.
+@end deftypevr
+
 @deftypefn {Target Hook} bool TARGET_CAN_USE_DOLOOP_P (const widest_int 
@var{}, const widest_int @var{_max}, unsigned int 
@var{loop_depth}, bool @var{entered_at_top})
 Return true if it is possible to use low-overhead loops (@code{doloop_end}
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b4d57b8..5f43b27 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7946,6 +7946,8 @@ to by @var{ce_info}.
 
 @hook TARGET_PREDICT_DOLOOP_P
 
+@hook TARGET_HAVE_COUNT_REG_DECR_P
+
 @hook TARGET_CAN_USE_DOLOOP_P
 
 @hook TARGET_INVALID_WITHIN_DOLOOP
diff --git a/gcc/target.def b/gcc/target.def
index 71b6972..8a64e5b 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4246,6 +4246,16 @@ The default version of this hook returns false.",
  bool, (struct loop *loop),
  default_predict_doloop_p)
 
+DEFHOOKPOD
+(have_count_reg_decr_p,
+ "Return true if the target supports hardware count register for decrement\n\
+and branch.  This count register can't be used as general register since\n\
+moving to/from a general register from/to it is very expensive.\n\
+For the targets with this support, ivopts can take doloop use as zero cost.\n\
+The default value is false.",
+ bool, false)
+
+
 DEFHOOK
 (can_use_doloop_p,
  "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
index 7d5859b..3486e1a 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ivopts-lt.c
@@ -18,5 +18,5 @@ f1 (char *p, uintptr_t i, uintptr_t n)
 }
 
 /* { dg-final { scan-tree-dump-times "PHI" 1 "ivopts" } } */
-/* { dg-final { scan-tree-dump-times "PHI  vuses;
 };
@@ -612,6 +614,9 @@ struct ivopts_data
 
   /* Whether the loop body can only be exited via single exit.  */
   bool loop_single_exit_p;
+
+  /* Whether the loop has doloop comparison use.  */
+  bool doloop_use_p;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -630,6 +635,9 @@ struct iv_ca
   /* Number of times each candidate is used.  */
   unsigned *n_cand_uses;
 
+  /* How many doloop uses for each candidates.  */
+  unsigned *n_cand_doloop_uses;
+
   /* The candidates used.  */
   bitmap cands;
 
@@ -1528,6 +1536,7 @@ record_group (struct ivopts_data *data, enum 

Re: [PATCH v3 3/3] PR80791 Consider doloop cmp use in ivopts

2019-07-23 Thread Kewen.Lin
Hi Segher,

on 2019/7/23 上午5:43, Segher Boessenkool wrote:
> On Mon, Jul 22, 2019 at 09:18:10AM +0200, Richard Biener wrote:
>> On Mon, 22 Jul 2019, Segher Boessenkool wrote:
>>
>>> Hi!
>>>
>>> (Maybe I am missing half of the discussion -- sorry if so).
>>>
>>> I think we should have a new iv for just the doloop (which can have the
>>> same starting value and step and type as another iv).
>>>
>>> Has this been considered?
>>
>> I was also suggesting this (maybe with too many words ;)).  If
>> it's a doloop target add such IV (candidate!) which has zero
>> use-cost for the increment and compare but a (target configurable)
>> penalty for other uses.  Invasiveness of this approach is probably
>> that you need to distinguish this candidate by making it a new
>> kind (or maybe we can just have a specia candidate number...).
> 
> Or just set some (boolean) flag in the candidate.
> 
> I think it should simply not be allowed for any use except the doloop
> uses at all?  

For the targets where the iteration count doesn't sit in its hardware count
register, we may need to allow the IV to be used for other suitable uses?

> You can have multiple ivs for the same loop just fine,
> right?  

Yes.

> And costs will make everything work out, if the costs are set
> correctly?

There are some cases requiring to do IV elimination, it might require some
cost adjustment/tuning to keep this.  I met this when I did pre-bind the
BIV for it, if the dedicated IV has the best cost and is associated to
doloop use, it probably stops the others to merge.

If my understanding is correct, this is more like to transform the loop
into doloop pattern earlier, the penalty of mis-predication of doloop can
be more? Pros is the setup code sequence for iteration count happens in
middle-end, can be optimized better (RTL misses some range info).