C++ PATCH for c++/69688 (bogus error with -Wsign-compare)

2016-02-05 Thread Marek Polacek
This issue is similar to c++/68586 -- maybe_constant_value returned a stale
value for a decl from the cache.  Fixed by clearing the caches when we
store the init value for a decl.
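The failure mode is the usual one for a memoizing evaluator: a cached result keyed on a declaration outlives a change to that declaration's initializer. The following is a minimal C sketch of that pattern, not GCC code: `toy_decl`, `fold_cached`, `clear_toy_cache` and `store_toy_init` are hypothetical names, and the one-entry cache only stands in for the real fold and cv caches.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a declaration whose initializer is set later.  */
struct toy_decl
{
  int init;                     /* current initial value */
};

/* Hypothetical one-entry cache keyed on the decl pointer.  */
static const struct toy_decl *cache_key = NULL;
static int cache_value;
static bool cache_valid = false;

/* Return the decl's value, memoizing it the way a fold cache would.  */
int
fold_cached (const struct toy_decl *d)
{
  if (cache_valid && cache_key == d)
    return cache_value;         /* stale if d->init changed since caching */
  cache_key = d;
  cache_value = d->init;
  cache_valid = true;
  return cache_value;
}

void
clear_toy_cache (void)
{
  cache_valid = false;
}

/* Analogue of the fix in store_init_value: update, then purge.  */
void
store_toy_init (struct toy_decl *d, int value)
{
  d->init = value;
  clear_toy_cache ();
}
```

Reading through the cache before the initializer is stored yields the stale value; purging the cache in the store path forces a fresh computation afterwards.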

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-02-05  Marek Polacek  

PR c++/69688
* typeck2.c (store_init_value): Clear cv and fold caches.

* g++.dg/init/const12.C: New test.

diff --git gcc/cp/typeck2.c gcc/cp/typeck2.c
index 419faa2..737dfe4 100644
--- gcc/cp/typeck2.c
+++ gcc/cp/typeck2.c
@@ -783,6 +783,10 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
   if (TREE_CODE (type) == ERROR_MARK)
     return NULL_TREE;
 
+  /* Here, DECL may change value; purge caches.  */
+  clear_fold_cache ();
+  clear_cv_cache ();
+
   if (MAYBE_CLASS_TYPE_P (type))
     {
       if (TREE_CODE (init) == TREE_LIST)
diff --git gcc/testsuite/g++.dg/init/const12.C gcc/testsuite/g++.dg/init/const12.C
index e69de29..2f6f9b2 100644
--- gcc/testsuite/g++.dg/init/const12.C
+++ gcc/testsuite/g++.dg/init/const12.C
@@ -0,0 +1,20 @@
+// PR c++/69688
+// { dg-do compile }
+// { dg-options "-Wsign-compare" }
+
+struct S
+{
+  static const int s;
+  static const char c[];
+  static wchar_t w[];
+
+  S ()
+  {
+    for (int i = 0; i < s; i++)
+      w[i] = 0;
+  }
+};
+
+const char S::c[] = "x";
+const int S::s = sizeof (S::c) - 1;
+wchar_t S::w[S::s];

Marek


Re: [PATCH][wwwdocs] Add blurb to top of gcc-6/changes.html

2016-02-05 Thread Kyrill Tkachov

Hi David,

On 05/02/16 15:24, David Malcolm wrote:

The attached patch adds a summary blurb to the top of the GCC 6 changes
page, emphasizing that there have been more improvements than we could
ever hope to enumerate, and providing prominent links to the porting
page, and to the general documentation (albeit to the "Development"
version for now, since we haven't yet branched).


I like the idea. I've been trying to get to the porting-to
page and always end up giving up and just typing the URL explicitly.

+
+This page is a brief summary of the huge number of improvements made to
+GCC in GCC 6.  For more information, see the
+Porting to GCC 6 page and the
+full GCC documentation.
+


I'd say "brief summary of some of the huge number of improvements..."
as we don't summarise all of the improvements made ;)

Cheers,
Kyrill


RE: [Patch, MIPS] Patch for PR 68400, a mips16 bug

2016-02-05 Thread Andrew Bennett


> -Original Message-
> From: Richard Sandiford [mailto:rdsandif...@googlemail.com]
> Sent: 03 February 2016 22:45
> To: Andrew Bennett
> Cc: Matthew Fortune; Steve Ellcey; gcc-patches@gcc.gnu.org;
> c...@codesourcery.com
> Subject: Re: [Patch, MIPS] Patch for PR 68400, a mips16 bug
> 
> Andrew Bennett  writes:
> >> -Original Message-
> >> From: Matthew Fortune
> >> Sent: 30 January 2016 16:46
> >> To: Richard Sandiford; Steve Ellcey
> >> Cc: gcc-patches@gcc.gnu.org; c...@codesourcery.com; Andrew Bennett
> >> Subject: RE: [Patch, MIPS] Patch for PR 68400, a mips16 bug
> >>
> >> Richard Sandiford  writes:
> >> > "Steve Ellcey "  writes:
> >> > > Here is a patch for PR68400.  The problem is that and_operands_ok was
> >> > > checking one operand to see if it was a memory_operand but MIPS16
> >> > > addressing is more restrictive than what the general memory_operand
> >> > > allows.  The fix was to call mips_classify_address if TARGET_MIPS16 is
> >> > > set because it will do a more complete mips16 addressing check and
> >> > > reject operands that do not meet the more restrictive requirements.
> >> > >
> >> > > I ran the GCC testsuite with no regressions and have included a test
> >> > > case as part of this patch.
> >> > >
> >> > > OK to checkin?
> >> > >
> >> > > Steve Ellcey
> >> > > sell...@imgtec.com
> >> > >
> >> > >
> >> > > 2016-01-26  Steve Ellcey  
> >> > >
> >> > > 	PR target/68400
> >> > > 	* config/mips/mips.c (and_operands_ok): Add MIPS16 check.
> >> > >
> >> > >
> >> > >
> >> > > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> >> > > index dd54d6a..adeafa3 100644
> >> > > --- a/gcc/config/mips/mips.c
> >> > > +++ b/gcc/config/mips/mips.c
> >> > > @@ -8006,9 +8006,18 @@ mask_low_and_shift_p (machine_mode mode, rtx mask, rtx shift, int maxlen)
> >> > >  bool
> >> > >  and_operands_ok (machine_mode mode, rtx op1, rtx op2)
> >> > >  {
> >> > > -  return (memory_operand (op1, mode)
> >> > > -          ? and_load_operand (op2, mode)
> >> > > -          : and_reg_operand (op2, mode));
> >> > > +
> >> > > +  if (memory_operand (op1, mode))
> >> > > +    {
> >> > > +      if (TARGET_MIPS16) {
> >> > > +        struct mips_address_info addr;
> >> > > +        if (!mips_classify_address (&addr, op1, mode, false))
> >> > > +          return false;
> >> > > +      }
> >> >
> >> > Nit: brace formatting.
> >> >
> >> > It looks like the patch only works by accident.  The code above
> >> > is passing the MEM, rather than the address inside the MEM, to
> >> > mips_classify_address.  Since (mem (mem ...)) is never valid on MIPS,
> >> > the effect is to disable the memory alternatives of *and3_mips16
> >> > unconditionally.
> >> >
> >> > The addresses that occur in the testcase are valid as far as
> >> > mips_classify_address is concerned.  FWIW, there shouldn't be any
> >> > difference between the addresses that memory_operand accepts and the
> >> > addresses that mips_classify_address accepts.
> >> >
> >> > In theory, the "W" constraints in *and3_mips16 are supposed to
> >> > tell the target-independent code that this instruction cannot handle
> >> > constant addresses or stack-based addresses.  That seems to be working
> >> > correctly during RA for the testcase.  The problem comes in regcprop,
> >> > which ends up creating a second stack pointer rtx distinct from
> >> > stack_pointer_rtx:
> >> >
> >> > (reg/f:SI 29 $sp [375])
> >> >
> >> > (Note the ORIGINAL_REGNO of 375.)  This then defeats the test in
> >> > mips_stack_address_p:
> >> >
> >> > bool
> >> > mips_stack_address_p (rtx x, machine_mode mode)
> >> > {
> >> >   struct mips_address_info addr;
> >> >
> >> >   return (mips_classify_address (&addr, x, mode, false)
> >> >           && addr.type == ADDRESS_REG
> >> >           && addr.reg == stack_pointer_rtx);
> >> > }
> >> >
> >> > Change the == to rtx_equal_p and the test passes.  I don't think that's
> >> > the correct fix though -- the fix is to stop a second stack pointer rtx
> >> > from being created.
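The distinction here is between pointer identity and structural equality. A minimal C sketch, using a hypothetical `toy_reg` struct rather than GCC's `rtx`, shows how a second object for register 29 (like the `(reg/f:SI 29 $sp [375])` produced by regcprop) fails an identity test against the shared stack pointer object while passing a comparison of register numbers in the spirit of `rtx_equal_p`:

```c
#include <stdbool.h>

/* Hypothetical, much-simplified register object; real rtxes carry more.  */
struct toy_reg
{
  unsigned int regno;           /* hard register number, e.g. 29 for $sp */
  unsigned int original_regno;  /* pseudo it came from, e.g. 375 */
};

/* The one shared stack pointer object, analogous to stack_pointer_rtx.  */
static struct toy_reg toy_stack_pointer = { 29, 29 };

/* Identity test, as in "addr.reg == stack_pointer_rtx".  */
bool
is_sp_by_identity (const struct toy_reg *r)
{
  return r == &toy_stack_pointer;
}

/* Structural test: compare only the register number and ignore
   bookkeeping such as ORIGINAL_REGNO.  */
bool
is_sp_by_value (const struct toy_reg *r)
{
  return r->regno == toy_stack_pointer.regno;
}
```

Switching only the MIPS code to the second test would hide the duplicate object locally, while target-independent code that compares against stack_pointer_rtx by pointer would still see two different stack pointers.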
> >>
> >> Agreed, though I'm inclined to say do both. We actually hit this
> >> same issue while testing some 4.9.2 based tools recently but I hadn't
> >> got confirmation from Andrew (cc'd) whether it was definitely the same
> >> issue. Andrew fixed this by updating all tests against stack_pointer_rtx
> >> to compare register numbers instead (but rtx_equal_p is better still).
> 
> It looks from the patch like it's only "all" for the MIPS target.
> Target-independent code would continue to expect pointer equality.
> 
> So sorry to be awkward, but I really don't think it's a good idea
> to do both.  If we want to allow more than one stack pointer rtx,
> we should do it consistently across the codebase rather than in
> specific parts of one target.  And if we do that, there's no
> need to "fix" the regcprop.c issue; we'd then have redefined
> things so that the 

[PATCH] S/390: PR 69625: Add test case

2016-02-05 Thread Dominik Vogt
The attached patch adds a testcase for PR 69625.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
ChangeLog

* gcc.target/s390/pr69625.c: Add test case.
>From 08df803a901078cabca938d0a7d0d6120b7c6132 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Fri, 5 Feb 2016 15:13:08 +0100
Subject: [PATCH] S/390: PR 69625: Add test case.

---
 gcc/testsuite/gcc.target/s390/pr69625.c | 37 +
 1 file changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr69625.c

diff --git a/gcc/testsuite/gcc.target/s390/pr69625.c b/gcc/testsuite/gcc.target/s390/pr69625.c
new file mode 100644
index 000..4536307
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr69625.c
@@ -0,0 +1,37 @@
+/* Test for PR 69625; make sure that a leaf vararg function does not overwrite
+   the caller's r6.  */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort (void);
+
+__attribute__ ((noinline))
+int
+foo (int x, ...)
+{
+  __builtin_va_list vl;
+  int i;
+
+  __asm__ __volatile__ ("lghi %%r6,0" : : : "r6");
+  __builtin_va_start(vl, x);
+  for (i = 2; i <= 6; i++)
+    x += __builtin_va_arg(vl, int);
+  __builtin_va_end (vl);
+
+  return x;
+}
+
+__attribute__ ((noinline))
+void
+bar (int r2, int r3, int r4, int r5, int r6)
+{
+  foo (r2, r3, r4, r5, r6);
+  if (r6 != 6)
+    abort ();
+}
+
+int
+main (void)
+{
+  bar (2, 3, 4, 5, 6);
+}
-- 
2.3.0



Re: [PATCH] Fix c/69643, named address space wrong-code

2016-02-05 Thread Tom Tromey
> "rth" == Richard Henderson  writes:

rth> The user-friendly way to do this would probably be some sort of pragma
rth> that allows user-defined address spaces, and user-defined conversion
rth> between them. But that's certainly not going to happen in the
rth> near-term.

Related is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850

When I was playing with this I think I tried initially to base the
user-defined address spaces on the built-in support.  I think I started
by trying to save part of the number space for user-defined spaces.
However, this got messy and in the end I just went with a separate
attribute... not very nice really.

Tom


Re: [PATCH] add basic .gitattributes files to notice whitespace issues

2016-02-05 Thread David Malcolm
On Thu, 2016-02-04 at 21:00 -0500, tbsaunde+...@tbsaunde.org wrote:
> From: Trevor Saunders 
> 
> Hi,
> 
> We can tell git to highlight whitespace errors in diffs, and if you
> enable the default pre-commit hook git won't allow you to make a commit
> with a whitespace error violating the rules you told it about.  These
> files as are could be improved some, they don't enforce whitespace rules
> on testsuite .exp files, and I'm not sure if we want to allow whitespace
> errors in testsuites for libraries, but I'd like to see if other people
> can suggest other improvements.

Does this only apply to changed lines in a commit?

Does the gcc/testsuite/.gitattributes file fully disable the top-level
one within gcc/testsuite?  Note that in the testsuite we'd want to have
the ability to have testcases with poor whitespace: in particular 
-Wmisleading-indentation needs to be able to be tested with poor
whitespace, and I suspect we'll want to add testcases for how well
diagnostics cope with mixed tabs and spaces etc.

> gcc/testsuite/ChangeLog:
> 
> 2016-02-04  Trevor Saunders  
> 
>   * .gitattributes: New file.
> 
> ChangeLog:
> 
> 2016-02-04  Trevor Saunders  
> 
>   * .gitattributes: New file.
> ---
>  .gitattributes   | 1 +
>  gcc/testsuite/.gitattributes | 1 +
>  2 files changed, 2 insertions(+)
>  create mode 100644 .gitattributes
>  create mode 100644 gcc/testsuite/.gitattributes
> 
> diff --git a/.gitattributes b/.gitattributes
> new file mode 100644
> index 000..b38d7f1
> --- /dev/null
> +++ b/.gitattributes
> @@ -0,0 +1 @@
> +*.{c,C,cc,h} whitespace=indent-with-non-tab,space-before-tab,trailing-space
> diff --git a/gcc/testsuite/.gitattributes b/gcc/testsuite/.gitattributes
> new file mode 100644
> index 000..562b12e
> --- /dev/null
> +++ b/gcc/testsuite/.gitattributes
> @@ -0,0 +1 @@
> +* -whitespace


RE: [PATCH] [ARC] Add single/double IEEE precision FPU support.

2016-02-05 Thread Claudiu Zissulescu
> P.S.: if code that is missing prototypes for stdarg functions is of no
> concern, there is another ABI alternative that might give good code
> density for architectures like ARC that have pre-decrement addressing
> modes and allow immediates to be pushed:
> 
> You could put all unnamed arguments on the stack (thus simplifying varargs
> processing), and leave all registers not used for argument passing
> call-saved.  Thus, the callers wouldn't have to worry about saving these
> registers or reloading their values from the stack.
> 
> For gcc, this would require making the call fusage really work - probably
> involving a hook to tell the middle-end that the port really wants that -
> or a kludge to make affected call insn not look like call insns, similar
> to the sfuncs.

Unfortunately, we need to be compatible with the previous ABI for the time
being.  I am now investigating passing DI-like modes in registers that do
not form an even-odd pair.  The biggest challenge is how to pass such a mode
partially, without introducing odd/even register classes.


Re: [PING, PATCH] PR/69089: C++-11: Ignore "alignas(0)".

2016-02-05 Thread Dominik Vogt
Can this be approved?

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69089

On Mon, Jan 04, 2016 at 12:33:21PM +0100, Dominik Vogt wrote:
> On Fri, Jan 01, 2016 at 05:53:08PM -0700, Martin Sebor wrote:
> > On 12/31/2015 04:50 AM, Dominik Vogt wrote:
> > >The attached patch fixes C++-11 handling of "alignas(0)" which
> > >should be ignored but currently generates an error message.  A
> > >test case is included; the patch has been tested on S390x.  Since
> > >it's a language issue it should be independent of the backend
> > >used.
> > 
> > The patch doesn't handle value-dependent expressions(*).
> 
> > It seems that the problem is in handle_aligned_attribute() calling
> > check_user_alignment() with the second argument (ALLOW_ZERO)
> > set to false.  Calling it with true fixes the problem and handles
> > value-dependent expressions (I haven't done any more testing beyond
> > that).
> 
> Like the attached patch?  (Passes the testsuite on s390x.)
> 
> But wouldn't an "aligned" attribute be added, allowing the backend
> to possibly generate an error or a warning?

> gcc/c-family/ChangeLog
> 
>   PR/69089
>   * c-common.c (handle_aligned_attribute): Allow 0 as an argument to the
>   "aligned" attribute.
> 
> gcc/testsuite/ChangeLog
> 
>   PR/69089
>   * g++.dg/cpp0x/alignas5.C: New test.

> >From 2461293b9070da74950fd0ae055d1239cc69ce67 Mon Sep 17 00:00:00 2001
> From: Dominik Vogt 
> Date: Wed, 30 Dec 2015 15:08:52 +0100
> Subject: [PATCH] C++-11: Ignore "alignas(0)" instead of generating an
>  error message.
> 
> This is required by the C++-11 standard.
> ---
>  gcc/c-family/c-common.c   |  2 +-
>  gcc/testsuite/g++.dg/cpp0x/alignas5.C | 29 +
>  2 files changed, 30 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/alignas5.C
> 
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 653d1dc..9eb25a9 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -7804,7 +7804,7 @@ handle_aligned_attribute (tree *node, tree ARG_UNUSED (name), tree args,
>else if (TYPE_P (*node))
>  type = node, is_type = 1;
>  
> -  if ((i = check_user_alignment (align_expr, false)) == -1
> +  if ((i = check_user_alignment (align_expr, true)) == -1
>|| !check_cxx_fundamental_alignment_constraints (*node, i, flags))
>  *no_add_attrs = true;
>else if (is_type)
> diff --git a/gcc/testsuite/g++.dg/cpp0x/alignas5.C b/gcc/testsuite/g++.dg/cpp0x/alignas5.C
> new file mode 100644
> index 000..f3252a9
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/alignas5.C
> @@ -0,0 +1,29 @@
> +// PR c++/69089
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-Wno-attributes" }
> +
> +alignas (0) int valid1;
> +alignas (1 - 1) int valid2;
> +struct Tvalid
> +{
> +  alignas (0) int i;
> +  alignas (2 * 0) int j;
> +};
> +
> +alignas (-1) int invalid1; /* { dg-error "not a positive power of 2" } */
> +alignas (1 - 2) int invalid2; /* { dg-error "not a positive power of 2" } */
> +struct Tinvalid
> +{
> +  alignas (-1) int i; /* { dg-error "not a positive power of 2" } */
> +  alignas (2 * 0 - 1) int j; /* { dg-error "not a positive power of 2" } */
> +};
> +
> +template <int N> struct TNvalid1 { alignas (N) int i; };
> +TNvalid1<0> SNvalid1;
> +template <int N> struct TNvalid2 { alignas (N) int i; };
> +TNvalid2<1 - 1> SNvalid2;
> +
> +template <int N> struct TNinvalid1 { alignas (N) int i; }; /* { dg-error "not a positive power of 2" } */
> +TNinvalid1<-1> SNinvalid1;
> +template <int N> struct TNinvalid2 { alignas (N) int i; }; /* { dg-error "not a positive power of 2" } */
> +TNinvalid2<1 - 2> SNinvalid2;
> -- 
> 2.3.0
> 
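The effect of the ALLOW_ZERO argument can be sketched in isolation. This is a hypothetical simplification, not GCC's check_user_alignment: `sketch_check_alignment` takes an already-evaluated integer instead of a tree, returns log2 of a valid alignment, and returns -1 when the request should be dropped, with a flag saying whether that deserves the "not a positive power of 2" diagnostic. Treating alignas (0) as silently ignored follows the C++11 rule cited in the patch.

```c
#include <stdbool.h>

/* Hypothetical sketch: validate a requested alignment ALIGN.
   Returns log2 (ALIGN) for a positive power of two.  Returns -1 when the
   request should be dropped; *IS_ERROR says whether a diagnostic is due.
   With ALLOW_ZERO, a zero request is dropped without a diagnostic.  */
int
sketch_check_alignment (long long align, bool allow_zero, bool *is_error)
{
  *is_error = false;
  if (align == 0 && allow_zero)
    return -1;                  /* alignas (0): ignored, no error */
  if (align <= 0 || (align & (align - 1)) != 0)
    {
      *is_error = true;         /* not a positive power of 2 */
      return -1;
    }
  int log = 0;
  while ((1LL << log) < align)
    log++;
  return log;
}
```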

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



[PATCH][wwwdocs] Add blurb to top of gcc-6/changes.html

2016-02-05 Thread David Malcolm
The attached patch adds a summary blurb to the top of the GCC 6 changes
page, emphasizing that there have been more improvements than we could
ever hope to enumerate, and providing prominent links to the porting
page, and to the general documentation (albeit to the "Development"
version for now, since we haven't yet branched).

Validates.

OK to commit?

Dave

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.56
diff -u -p -r1.56 changes.html
--- htdocs/gcc-6/changes.html	3 Feb 2016 22:57:05 -	1.56
+++ htdocs/gcc-6/changes.html	5 Feb 2016 15:25:04 -
@@ -11,6 +11,13 @@
 
 GCC 6 Release SeriesChanges, New Features, and Fixes
 
+
+This page is a brief summary of the huge number of improvements made to
+GCC in GCC 6.  For more information, see the
+Porting to GCC 6 page and the
+full GCC documentation.
+
+
 
 Caveats
   


Re: [PATCH] Fix up bootstrap on i686 --with-arch=corei7 --with-fpmath=sse (PR bootstrap/69677)

2016-02-05 Thread H.J. Lu
On Fri, Feb 5, 2016 at 1:11 AM, Uros Bizjak  wrote:
> On Fri, Feb 5, 2016 at 12:04 AM, Jakub Jelinek  wrote:
>> Hi!
>>
>> As mentioned in the PR, the convert_scalars_to_vector hunk is important,
>> without that we e.g. miscompile simplify-rtx.c.
>> The following patch restores that hunk and extends disabling of TARGET_STV
>> also for the 64-bit, but not 128-bit, aligned preferred or incoming stack
>> boundaries (also non-default).
>>
>> Bootstrapped/regtested on i686-linux --with-arch=corei7 --with-tune=corei7
>> --with-fpmath=sse.
>>
>> Alternatively, it is enough to just adjust stack_alignment_estimated in
>> there and keep stack_alignment_needed as is, that version has also been
>> successfully bootstrapped on i686-linux --with-arch=corei7
>> --with-tune=corei7 --with-fpmath=sse.
>>
>> 2016-02-04  Jakub Jelinek  
>>
>> PR bootstrap/69677
>> * config/i386/i386.c (convert_scalars_to_vector): Readd stack
>> alignment fixes.
>> (ix86_option_override_internal): Disable TARGET_STV even for
>> -m{incoming,preferred}-stack-boundary=3.
>
> Let's go with this patch to resolve the bootstrap problem. As said
> earlier, after gcc-6 is released, we can fix the problem in a proper
> way.
>
> OK.
>
> Thanks,
> Uros.
>
>> --- gcc/config/i386/i386.c.jj   2016-02-04 18:59:38.309204574 +0100
>> +++ gcc/config/i386/i386.c  2016-02-04 21:54:02.439904261 +0100
>> @@ -3588,6 +3588,16 @@ convert_scalars_to_vector ()
>>    bitmap_obstack_release (NULL);
>>    df_process_deferred_rescans ();
>>
>> +  /* Conversion means we may have 128bit register spills/fills
>> +     which require aligned stack.  */
>> +  if (converted_insns)
>> +    {
>> +      if (crtl->stack_alignment_needed < 128)
>> +        crtl->stack_alignment_needed = 128;
>> +      if (crtl->stack_alignment_estimated < 128)
>> +        crtl->stack_alignment_estimated = 128;
>> +    }
>> +
>>    return 0;
>>  }
>> @@ -5443,12 +5453,12 @@ ix86_option_override_internal (bool main
>>      opts->x_target_flags |= MASK_VZEROUPPER;
>>    if (!(opts_set->x_target_flags & MASK_STV))
>>      opts->x_target_flags |= MASK_STV;
>> -  /* Disable STV if -mpreferred-stack-boundary=2 or
>> -     -mincoming-stack-boundary=2 - the needed
>> +  /* Disable STV if -mpreferred-stack-boundary={2,3} or
>> +     -mincoming-stack-boundary={2,3} - the needed
>>       stack realignment will be extra cost the pass doesn't take into
>>       account and the pass can't realign the stack.  */
>> -  if (ix86_preferred_stack_boundary < 64
>> -      || ix86_incoming_stack_boundary < 64)
>> +  if (ix86_preferred_stack_boundary < 128
>> +      || ix86_incoming_stack_boundary < 128)
>>      opts->x_target_flags &= ~MASK_STV;
>>    if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]
>>        && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
>>
>> Jakub

I checked in this testcase.

-- 
H.J.
--
Index: ChangeLog
===
--- ChangeLog (revision 233179)
+++ ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2016-02-05  H.J. Lu  
+
+ PR target/69677
+ * gcc.target/i386/pr69677.c: New test.
+
 2016-02-05  Patrick Palka  

  PR c++/68948
Index: gcc.target/i386/pr69677.c
===
--- gcc.target/i386/pr69677.c (nonexistent)
+++ gcc.target/i386/pr69677.c (working copy)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { ia32 } } } */
+/* { dg-options "-O2 -mno-avx -march=corei7 -fdump-rtl-final" } */
+
+extern const unsigned int a[];
+extern const unsigned long long b[];
+
+int
+fn1 (unsigned int p1, unsigned long long p2)
+{
+  unsigned int p3;
+
+  p3 = a[p1];
+  if (p3 == 0 || p3 > 64)
+    return 0;
+
+  p2 &= b[p1];
+  return p2 == ((unsigned long long) 1 << (p3 - 1));
+}
+
+// { dg-final { scan-rtl-dump-not "S16 A64\[^\n\]*\\\*movv2di_internal" "final" } }
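The two i386.c hunks quoted in this thread encode a simple rule: STV may introduce 128-bit spills and fills, so the function's stack alignment is raised to 128 bits after conversion, and STV is turned off up front when the preferred or incoming boundary is below 128 bits (-m*-stack-boundary=N means 2^N bytes, so =2 is 32 bits and =3 is 64 bits). A standalone C sketch of that rule, with hypothetical names and plain integers standing in for `crtl` and the option globals:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two crtl fields the patch adjusts.  */
struct toy_stack_info
{
  unsigned int alignment_needed;     /* in bits */
  unsigned int alignment_estimated;  /* in bits */
};

/* Mirrors the first hunk: after STV converted some insns, make sure the
   frame can hold 128-bit (16-byte) vector spill slots.  */
void
toy_after_stv (struct toy_stack_info *s, int converted_insns)
{
  if (converted_insns)
    {
      if (s->alignment_needed < 128)
        s->alignment_needed = 128;
      if (s->alignment_estimated < 128)
        s->alignment_estimated = 128;
    }
}

/* Mirrors the second hunk: STV stays enabled only if both boundaries
   (in bits) are at least 128.  */
bool
toy_stv_allowed (unsigned int preferred_bits, unsigned int incoming_bits)
{
  return !(preferred_bits < 128 || incoming_bits < 128);
}
```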


Re: [PATCH PR69652, Regression]

2016-02-05 Thread Yuri Rumyantsev
Hi All,

Here is an updated patch - I went back to moving call statements as well,
since masked loads are represented by internal calls. I also assume that for
the following simple loop
  for (i = 0; i < n; i++)
    if (b1[i])
      a1[i] = sqrtf(a2[i] * a2[i] + a3[i] * a3[i]);
motion must be done for all vector statements in semi-hammock including SQRT.
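As a reading aid: a masked store writes only the lanes whose condition bit is set, and optimize_mask_stores (as we understand it) sinks such stores, together with the vector statements that feed them, into a block that is skipped when the whole mask is zero. A scalar C emulation of one vector iteration, assuming a hypothetical vectorization factor of 4 and using a squared value instead of the sqrtf expression above; `toy_mask_store` and `toy_vector_step` are illustrative names only:

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_VF 4  /* hypothetical vectorization factor */

/* Emulate one masked store: write VAL[l] to DST[l] only where MASK[l].  */
void
toy_mask_store (float *dst, const float *val, const bool *mask)
{
  for (size_t l = 0; l < TOY_VF; l++)
    if (mask[l])
      dst[l] = val[l];
}

/* One vector iteration of "if (b1[i]) a1[i] = a2[i] * a2[i];" after
   if-conversion.  Returns true if the store block was executed.  */
bool
toy_vector_step (float *a1, const float *a2, const int *b1)
{
  bool mask[TOY_VF];
  bool any = false;
  for (size_t l = 0; l < TOY_VF; l++)
    {
      mask[l] = b1[l] != 0;
      any |= mask[l];
    }
  if (!any)
    return false;               /* all-zero mask: skip computation and store */
  float val[TOY_VF];
  for (size_t l = 0; l < TOY_VF; l++)
    val[l] = a2[l] * a2[l];     /* computed for every lane */
  toy_mask_store (a1, val, mask);
  return true;
}
```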

Bootstrap and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:

2016-02-05  Yuri Rumyantsev  

PR tree-optimization/69652
* tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
skipped scalar statements, introduce variable LAST_VUSE to keep
vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
beginning of current masked store processing, did source re-formatting,
skip parsing of debug gimples, stop processing if a gimple with
volatile operand has been encountered, save scalar statement
with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
iterator, change vuse of all saved scalar statements to LAST_VUSE if
it makes sense.

gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr69652.c: New test.

2016-02-04 19:40 GMT+03:00 Jakub Jelinek :
> On Thu, Feb 04, 2016 at 05:46:27PM +0300, Yuri Rumyantsev wrote:
>> Here is a patch that cures the issues with non-correct vuse for scalar
>> statements during code motion, i.e. if vuse of scalar statement is
>> vdef of masked store which has been sunk to new basic block, we must
>> fix it up.  The patch also fixed almost all remarks pointed out by
>> Jakub.
>>
>> Bootstrapping and regression testing on x86-64 did not show any new failures.
>> Is it OK for trunk?
>>
>> ChangeLog:
>> 2016-02-04  Yuri Rumyantsev  
>>
>> PR tree-optimization/69652
>> * tree-vect-loop.c (optimize_mask_stores): Move declaration of STMT1
>> to nested loop, introduce new SCALAR_VUSE vector to keep vuse of all
>> skipped scalar statements, introduce variable LAST_VUSE that has
>> vuse of LAST_STORE, add assertion that SCALAR_VUSE is empty in the
>> beginning of current masked store processing, did source re-formatting,
>> skip parsing of debug gimples, stop processing when call or gimple
>> with volatile operand have been encountered, save scalar statement
>> with vuse in SCALAR_VUSE, skip processing debug statements in IMM_USE
>> iterator, change vuse of all saved scalar statements to LAST_VUSE if
>> it makes sense.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/torture/pr69652.c: New test.
>
> Your mailer breaks ChangeLog formatting, so it is hard to check the
> formatting of the ChangeLog entry.
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr69652.c b/gcc/testsuite/gcc.dg/torture/pr69652.c
> new file mode 100644
> index 000..91f30cf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr69652.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ffast-math -ftree-vectorize " } */
> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */
> +
> +void fn1(double **matrix, int column, int row, int n)
> +{
> +  int k;
> +  for (k = 0; k < n; k++)
> +    if (matrix[row][k] != matrix[column][k])
> +      {
> +        matrix[column][k] = -matrix[column][k];
> +        matrix[row][k] = matrix[row][k] - matrix[column][k];
> +      }
> +}
> \ No newline at end of file
>
> Please make sure the last line of the test is a new-line.
>
> @@ -6971,6 +6972,8 @@ optimize_mask_stores (struct loop *loop)
>        gsi_next (&gsi))
> {
>   stmt = gsi_stmt (gsi);
> + if (is_gimple_debug (stmt))
> +   continue;
>   if (is_gimple_call (stmt)
>   && gimple_call_internal_p (stmt)
>   && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
>
> This is not needed, you do something only for is_gimple_call,
> which is never true if is_gimple_debug, so the code used to be fine as is.
>
> + /* Skip debug sstatements.  */
>
> s/ss/s/
>
> + if (is_gimple_debug (gsi_stmt (gsi)))
> +   continue;
> + stmt1 = gsi_stmt (gsi);
> + /* Do not consider writing to memory,volatile and call
>
> Missing space after ,
>
> + /* Skip scalar statements.  */
> + if (!VECTOR_TYPE_P (TREE_TYPE (lhs)))
> +   {
> + /* If scalar statement has vuse we need to modify it
> +when another masked store will be sunk.  */
> + if (gimple_vuse (stmt1))
> +   scalar_vuse.safe_push (stmt1);
>   continue;
> +   }
>
> I don't think it is safe to take for granted that the scalar stmts are all
> going to be DCEd, but I could be wrong.
>
> + /* Check that LHS does not have uses outside of STORE_BB.  */
> + res = true;
> + FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
> +   {
> + gimple 

Re: [PATCH 3/3] [RFC] Treat a gimplification failure as an internal error

2016-02-05 Thread Patrick Palka
On Thu, Jan 14, 2016 at 4:31 PM, Jeff Law  wrote:
> On 01/10/2016 08:20 PM, Patrick Palka wrote:
>>
>> On Thu, Dec 31, 2015 at 10:40 AM, Patrick Palka 
>> wrote:
>>>
>>> This patch makes it so that a gimplification failure is considered to be
>>> an internal error under normal circumstances, so that we otherwise avoid
>>> silently generating wrong code if e.g. a buggy frontend fed us a
>>> malformed tree.
>>>
>>> The rationale for this change is that it's better to abort compilation
>>> than to silently generate wrong code.  During gimplification we should
>>> only see e.g. an error_mark_node if the frontend has already issued an
>>> error.  Otherwise it is likely a bug in the frontend.
>>>
>>> This patch, for example, turns the PR c++/68948 wrong-code bug into an
>>> ICE on invalid bug.  During testing it also caught two latent "bugs"
>>> (patches 1 and 2 in this series).
>>>
>>> This series was tested on x86_64-pc-linux-gnu, with
>>> --enable-languages=all,ada,go,
>>> no new regressions.
>>>
>>> Does this seem like a reasonable invariant to add to the gimplifier?
>>>
>>> gcc/cp/ChangeLog:
>>>
>>>  * cp-gimplify.c (gimplify_expr_stmt): Don't convert an
>>>  error_mark_node to an empty statement.
>
> So this passes any such error_mark_nodes through to the gimplifier, which
> will give us a nice error.  Right?

(Sorry for the late reply..)

Yes, this change to gimplify_expr_stmt() and the change made to
gimplify_decl_expr() are to make sure that we propagate any relevant
internal error_mark_nodes to gimplify_expr(), which will trigger the
assertion therein.  This one in particular is pretty important since
the C++ FE seems to make a lot of EXPR_STMTs.

>
>>>
>>> gcc/ChangeLog:
>>>
>>>  * gimplify.c (gimplify_return_expr): Remove a redundant test
>>>  for error_mark_node.

This one is just a simplification -- earlier in the function we have
already tested for error_mark_node.

>>>  (gimplify_decl_expr): Return GS_ERROR if an initializer is an
>>>  error_mark_node.
>>>  (gimplify_expr): Treat a gimplification failure as an internal
>>>  error.  Remove now-redundant GIMPLE_CHECKING checking code.
>
> I'd generally be in favor of a change like this; I don't offhand recall any
> rationale behind allowing gimplification to continue after hitting an error.
>
> My worry is that we're potentially opening ourselves up to a slew of ICEs as
> the gimplifier as a whole I don't think has been audited to ensure that it
> handles error_mark_node in a sane fashion.
>
> So I'd tend to want to wait for the next stage1.

No problem, I will make sure to ping this patch then.
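The invariant proposed in this thread can be stated compactly: a gimplification failure is acceptable only if a diagnostic has already been issued; otherwise it points at a front-end bug and should stop compilation rather than yield wrong code. A hypothetical C sketch of that decision (the enum and the `errors_seen` counter are stand-ins, not the gimplifier's real interfaces):

```c
#include <stdbool.h>

/* Hypothetical subset of gimplification results.  */
enum toy_gimplify_status { TOY_GS_ERROR = -2, TOY_GS_UNHANDLED, TOY_GS_OK };

/* Return true if the result should be treated as an internal compiler
   error: a failure with no earlier diagnostic means the front end handed
   over a malformed tree, e.g. a stray error_mark_node.  */
bool
toy_is_internal_error (enum toy_gimplify_status status, int errors_seen)
{
  return status == TOY_GS_ERROR && errors_seen == 0;
}
```

Jeff's concern maps onto the second argument: any path that reaches the error result without a prior diagnostic would start turning into an ICE once the check is enforced.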


Re: [PATCH] add basic .gitattributes files to notice whitespace issues

2016-02-05 Thread Trevor Saunders
On Fri, Feb 05, 2016 at 10:33:51AM -0500, David Malcolm wrote:
> On Thu, 2016-02-04 at 21:00 -0500, tbsaunde+...@tbsaunde.org wrote:
> > From: Trevor Saunders 
> > 
> > Hi,
> > 
> > We can tell git to highlight whitespace errors in diffs, and if you
> > enable the default pre-commit hook git won't allow you to make a commit
> > with a whitespace error violating the rules you told it about.  These
> > files as are could be improved some, they don't enforce whitespace rules
> > on testsuite .exp files, and I'm not sure if we want to allow whitespace
> > errors in testsuites for libraries, but I'd like to see if other people
> > can suggest other improvements.
> 
> Does this only apply to changed lines in a commit?

yes

> Does the gcc/testsuite/.gitattributes file fully disable the top-level
> one within gcc/testsuite?  Note that in the testsuite we'd want to have
> the ability to have testcases with poor whitespace: in particular 

yes, and this is the exact reason I did it.  Though I suspect we could
be stricter about the .exp files and the ChangeLogs, but something is
better than nothing.

Trev

> -Wmisleading-indentation needs to be able to be tested with poor
> whitespace, and I suspect we'll want to add testcases for how well
> diagnostics cope with mixed tabs and spaces etc.
> 
> > gcc/testsuite/ChangeLog:
> > 
> > 2016-02-04  Trevor Saunders  
> > 
> > * .gitattributes: New file.
> > 
> > ChangeLog:
> > 
> > 2016-02-04  Trevor Saunders  
> > 
> > * .gitattributes: New file.
> > ---
> >  .gitattributes   | 1 +
> >  gcc/testsuite/.gitattributes | 1 +
> >  2 files changed, 2 insertions(+)
> >  create mode 100644 .gitattributes
> >  create mode 100644 gcc/testsuite/.gitattributes
> > 
> > diff --git a/.gitattributes b/.gitattributes
> > new file mode 100644
> > index 000..b38d7f1
> > --- /dev/null
> > +++ b/.gitattributes
> > @@ -0,0 +1 @@
> > +*.{c,C,cc,h} whitespace=indent-with-non-tab,space-before-tab,trailing-space
> > diff --git a/gcc/testsuite/.gitattributes b/gcc/testsuite/.gitattributes
> > new file mode 100644
> > index 000..562b12e
> > --- /dev/null
> > +++ b/gcc/testsuite/.gitattributes
> > @@ -0,0 +1 @@
> > +* -whitespace


[PATCH, testsuite]: Require avx_runtime target for gcc.target/i386/pr69577.c

2016-02-05 Thread Uros Bizjak
This is a runtime test and requires correct runtime support.

2016-02-05  Uros Bizjak  

* gcc.target/i386/pr69577.c: Require avx_runtime target.

Tested on x86_64-linux-gnu AVX target and committed to mainline SVN.

Uros.
Index: gcc.target/i386/pr69577.c
===
--- gcc.target/i386/pr69577.c   (revision 233143)
+++ gcc.target/i386/pr69577.c   (working copy)
@@ -1,5 +1,4 @@
-/* { dg-do run } */
-/* { dg-require-effective-target avx } */
+/* { dg-do run { target avx_runtime } } */
 /* { dg-require-effective-target int128 } */
 /* { dg-options "-O -fno-forward-propagate -fno-split-wide-types -mavx" } */
 


Re: [PATCH, testsuite]: Move gcc.dg/sancov/asan.c to gcc.dg/asan/sancov-1.c

2016-02-05 Thread Uros Bizjak
On Thu, Feb 4, 2016 at 2:21 PM, Uros Bizjak  wrote:
> On Thu, Feb 4, 2016 at 1:49 PM, Andreas Schwab  wrote:
>
>>>> OTOH, does this testcase even get a chance to run?
>>>
>>> It's not a runtime check.
>>
>> But it didn't link until today:
>>
>> Running /opt/gcc/gcc-20160203/gcc/testsuite/gcc.dg/sancov/sancov.exp ...
>> Executing on host: /opt/gcc/gcc-20160203/Build/gcc/xgcc -B/opt/gcc/gcc-20160203/Build/gcc/ fsanitize_address29346.c -fno-diagnostics-show-caret -fdiagnostics-color=never  -fsanitize=address  -lm  -o fsanitize_address29346.exe(timeout = 300)
>> spawn -ignore SIGHUP /opt/gcc/gcc-20160203/Build/gcc/xgcc -B/opt/gcc/gcc-20160203/Build/gcc/ fsanitize_address29346.c -fno-diagnostics-show-caret -fdiagnostics-color=never -fsanitize=address -lm -o fsanitize_address29346.exe.
>> /usr/aarch64-suse-linux/bin/ld: cannot find libasan_preinit.o: No such file or directory.
>> /usr/aarch64-suse-linux/bin/ld: cannot find -lasan.
>> collect2: error: ld returned 1 exit status.
>> compiler exited with status 1
>> output is:
>> /usr/aarch64-suse-linux/bin/ld: cannot find libasan_preinit.o: No such file or directory.
>> /usr/aarch64-suse-linux/bin/ld: cannot find -lasan.
>> collect2: error: ld returned 1 exit status.
>>
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O0
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O1
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O2
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O3
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O0 -g
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O1 -g
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O2 -g
>> UNSUPPORTED: gcc.dg/sancov/asan.c   -O3 -g
>
> We have to move the testcase from sancov to asan directory, otherwise
> the test won't work (and it didn't work even on x86_64, since the
> effective target test always failed).
>
> 2016-02-04  Uros Bizjak  
>
> * gcc.dg/sancov/asan.c: Move to ...
> * gcc.dg/asan/sancov-1.c: ... here.
>
> The patch was tested on x86_64-linux-gnu and I have double checked
> that the testcase passes.

Now committed to mainline SVN.

Uros.


Re: [PATCH] tweak -Wplacement-new to fix #69662

2016-02-05 Thread Martin Sebor

struct A
{
   int i, ar[];
};

int main()
{
   int k = 24;
   struct A a = { 1, 2, 3, 4 };
   int j = 42;
   return a.ar[1];
}

G++ accepts this testcase and happily puts k and j in the same stack
slots as elements of a.ar[], while GCC rejects it.  We shouldn't accept
it.  In any case, we should usually follow the C front end's lead on
this C compatibility feature.


I agree with that approach.  Going forward, after the 6.0 release,
I'd like to get back to this area and fix these remaining problems
and where it makes sense tighten up the C++ requirements to bring
them closer to C's.

But since GCC does allow global/static objects of structs with
flexible array members to be initialized, and (presumably) will
continue to even after the above is rejected in C++, the code
in the patch that detects overflowing such variables will continue
to serve its purpose.  By way of an example, this is diagnosed
with the patch and should continue to be:

  typedef __typeof__ (sizeof 0) size_t;

  void* operator new (size_t, void *p) { return p; }
  void* operator new[] (size_t, void *p) { return p; }

  struct Ax { char n, a []; } ax = { 1, { 2, 3, 4 } };

  void foo () {
      new (ax.a) short;
      new (ax.a) int;
  }

  t.c:6:51: warning: initialization of a flexible array member [-Wpedantic]
   struct Ax { char n, a []; } ax = { 1, { 2, 3, 4 } };
                                                     ^
  t.c: In function ‘void foo()’:
  t.c:10:16: warning: placement new constructing an object of type ‘int’ and size ‘4’ in a region of type ‘char []’ and size ‘3’ [-Wplacement-new=]
       new (ax.a) int;
                  ^~~

That said, I'm interested in improving the -Wplacement-new warning
after 6.0 is done.  But unless there is a problem with the patch,
I would like to (need to) get back to my other projects that have
been put on hold to help with the release.

Martin

PS As an aside, I believe the root cause of the bug in your test
case is the same as in 28865 that was fixed in the C front end
just a couple of years ago by rejecting initialization of auto
variables with flexible array members(*).  An alternate solution
would have been to correct the .size directive to reflect the
size of the object rather than the size of the type and continue
to accept the construct.

[*] GCC in C mode still allows the following which has the same
problem as 28865.

struct A
{
  int i, ar[];
} aa = { 1, 2, 3, 4 };

int main()
{
  int k = 24;
  struct A a = aa;
  int j = 42;
  return a.ar[1];
}



[PATCH] Fix up move_plus_up (PR rtl-optimization/69691)

2016-02-05 Thread Jakub Jelinek
Hi!

As mentioned in the PR, move_plus_up on
(subreg:SI (plus:DI (reg/f:DI 20 frame)
                    (const_int 16 [0x10])) 0)
returns
(plus:SI (plus:SI (subreg:SI (reg/f:DI 20 frame) 0)
                  (const_int 16 [0x10]))
         (const_int 16 [0x10]))
which is wrong: the original adds just 16, but the returned
rtx adds twice that.
The problem is that subreg_reg is (as verified in the conditions)
a PLUS, and we want to add the lowpart of the PLUS's first operand
to CST, which is the lowpart of the PLUS's second operand,
but we were actually returning the lowpart of the whole PLUS plus
the lowpart of the PLUS's second operand.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2016-02-05  Jakub Jelinek  

PR rtl-optimization/69691
* lra-eliminations.c (move_plus_up): Don't add the addend twice.

* gcc.c-torture/execute/pr69691.c: New test.

--- gcc/lra-eliminations.c.jj   2016-01-14 20:57:03.0 +0100
+++ gcc/lra-eliminations.c  2016-02-05 16:54:42.142004224 +0100
@@ -303,7 +303,8 @@ move_plus_up (rtx x)
 subreg_lowpart_offset (x_mode,
subreg_reg_mode));
   if (cst && CONSTANT_P (cst))
-   return gen_rtx_PLUS (x_mode, lowpart_subreg (x_mode, subreg_reg,
+   return gen_rtx_PLUS (x_mode, lowpart_subreg (x_mode,
+XEXP (subreg_reg, 0),
 subreg_reg_mode), cst);
 }
   return x;
--- gcc/testsuite/gcc.c-torture/execute/pr69691.c.jj2016-02-05 
17:08:31.582557031 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr69691.c   2016-02-05 
17:08:24.0 +0100
@@ -0,0 +1,127 @@
+/* PR rtl-optimization/69691 */
+
+char u[] = { 46, 97, 99, 104, 52, 0 };
+char *v[] = { u, 0 };
+struct S { char a[10]; struct S *b[31]; };
+struct S r[7], *r2 = r;
+static struct S *w = 0;
+
+__attribute__((noinline, noclone)) int
+fn (int x)
+{
+  if (__builtin_strchr (u, x) || x == 96)
+return x;
+  __builtin_abort ();
+}
+
+__attribute__((noinline, noclone)) int
+foo (char x)
+{
+  if (x == 0)
+__builtin_abort ();
+  if (fn (x) >= 96 && fn (x) <= 122)
+return (fn (x) - 96);
+  else if (x == 46)
+return 0;
+  else
+{
+  __builtin_printf ("foo %d\n", x);
+  return -1;
+}
+}
+
+__attribute__((noinline, noclone)) void
+bar (char **x)
+{
+  char **b, c, *d, e[500], *f, g[10];
+  int z, l, h, i;
+  struct S *s;
+
+  w = r2++;
+  for (b = x; *b; b++)
+{
+  __builtin_strcpy (e, *b);
+  f = e;
+  do
+   {
+ d = __builtin_strchr (f, 32);
+ if (d)
+   *d = 0;
+ l = __builtin_strlen (f);
+ h = 0;
+ s = w;
+ __builtin_memset (g, 0, sizeof (g));
+ for (z = 0; z < l; z++)
+   {
+ c = f[z];
+ if (c >= 48 && c <= 57)
+   g[h] = c - 48;
+ else
+   {
+ i = foo (c);
+ if (!s->b[i])
+   {
+ s->b[i] = r2++;
+ if (r2 == [7])
+   __builtin_abort ();
+   }
+ s = s->b[i];
+ h++;
+   }
+   }
+ __builtin_memcpy (s->a, g, 10);
+ if (d)
+   f = d + 1;
+   }
+  while (d);
+}
+}
+
+__attribute__((noinline, noclone)) void
+baz (char *x)
+{
+  char a[300], b[300];
+  int z, y, t, l;
+  struct S *s;
+
+  l = __builtin_strlen (x);
+  *a = 96;
+  for (z = 0; z < l; z++)
+{
+  a[z + 1] = fn ((unsigned int) x[z]);
+  if (foo (a[z + 1]) <= 0)
+   return;
+}
+  a[l + 1] = 96;
+  l += 2;
+  __builtin_memset (b, 0, l + 2);
+
+  if (!w)
+return;
+
+  for (z = 0; z < l; z++)
+{
+  s = w;
+  for (y = z; y < l; y++)
+   {
+ s = s->b[foo (a[y])];
+ if (!s)
+   break;
+ for (t = 0; t <= y - z + 2; t++)
+   if (s->a[t] > b[z + t])
+ b[z + t] = s->a[t];
+   }
+}
+  for (z = 3; z < l - 2; z++)
+if ((b[z] & 1) == 1)
+ asm ("");
+}
+
+int
+main ()
+{
+  bar (v);
+  char c[] = { 97, 97, 97, 97, 97, 0 };
+  baz (c);
+  return 0;
+}

Jakub


Re: [Patch, Fortran] PR 69495: unused-label warning does not tell which flag triggered it

2016-02-05 Thread Janus Weil
Hi all,

I have slightly updated the patch now to avoid string-breaking issues
(even if it may not be a problem at all, as mentioned by Joseph). I also
removed the questionable part in intrinsic.c that I was not sure
about.

This version of the patch should not be too far from obvious now. Ok for trunk?

Cheers,
Janus



2016-02-03 23:27 GMT+01:00 Joseph Myers :
> On Wed, 3 Feb 2016, Manfred Schwarb wrote:
>
>> There are 2 things with translation, and there is a third issue:
>> - As you noticed, breaking things differently means translation has to be
>>   done again.
>> - Normally, each string is translated independently, and depending on the
>>   language there may be lack of context (e.g. adjectives get different
>>   suffixes depending on the noun).
>
> I believe gettext works fine with (compile-time) string constant
> concatenation - that is, extracts the whole concatenated string for
> translation, so these are non-issues.  What doesn't work includes:
>
> * Runtime concatenation of strings or otherwise putting English fragments
> together at runtime.
>
> * String constant concatenation where one of the concatenated pieces comes
> from a macro expansion.
>
> * The argument for translation being a conditional expression:
>
>   error (cond ? "message 1" : "message 2");
>
> (in this case, only one of the messages may be extracted for translation,
> so you need to mark both of them up with appropriate macros such as G_).
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
Index: gcc/fortran/check.c
===
--- gcc/fortran/check.c (Revision 233182)
+++ gcc/fortran/check.c (Arbeitskopie)
@@ -5180,7 +5180,8 @@ gfc_check_transfer (gfc_expr *source, gfc_expr *mo
 return true;
 
   if (source_size < result_size)
-gfc_warning (0, "Intrinsic TRANSFER at %L has partly undefined result: "
+gfc_warning (OPT_Wsurprising,
+"Intrinsic TRANSFER at %L has partly undefined result: "
 "source size %ld < result size %ld", >where,
 (long) source_size, (long) result_size);
 
Index: gcc/fortran/frontend-passes.c
===
--- gcc/fortran/frontend-passes.c   (Revision 233182)
+++ gcc/fortran/frontend-passes.c   (Arbeitskopie)
@@ -715,10 +715,12 @@ do_warn_function_elimination (gfc_expr *e)
   if (e->expr_type != EXPR_FUNCTION)
 return;
   if (e->value.function.esym)
-gfc_warning (0, "Removing call to function %qs at %L",
+gfc_warning (OPT_Wfunction_elimination,
+"Removing call to function %qs at %L",
 e->value.function.esym->name, &(e->where));
   else if (e->value.function.isym)
-gfc_warning (0, "Removing call to function %qs at %L",
+gfc_warning (OPT_Wfunction_elimination,
+"Removing call to function %qs at %L",
 e->value.function.isym->name, &(e->where));
 }
 /* Callback function for the code walker for doing common function
Index: gcc/fortran/invoke.texi
===
--- gcc/fortran/invoke.texi (Revision 233182)
+++ gcc/fortran/invoke.texi (Arbeitskopie)
@@ -709,8 +709,10 @@ Check the code for syntax errors, but do not actua
 will generate module files for each module present in the code, but no
 other output file.
 
-@item -pedantic
+@item -Wpedantic
+@itemx -pedantic
 @opindex @code{pedantic}
+@opindex @code{Wpedantic}
 Issue warnings for uses of extensions to Fortran 95.
 @option{-pedantic} also applies to C-language constructs where they
 occur in GNU Fortran source files, such as use of @samp{\e} in a
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c   (Revision 233182)
+++ gcc/fortran/resolve.c   (Arbeitskopie)
@@ -2127,7 +2127,8 @@ resolve_elemental_actual (gfc_expr *expr, gfc_code
  && (set_by_optional || arg->expr->rank != rank)
  && !(isym && isym->id == GFC_ISYM_CONVERSION))
{
- gfc_warning (0, "%qs at %L is an array and OPTIONAL; IF IT IS "
+ gfc_warning (OPT_Wpedantic,
+  "%qs at %L is an array and OPTIONAL; IF IT IS "
   "MISSING, it cannot be the actual argument of an "
   "ELEMENTAL procedure unless there is a non-optional "
   "argument with the same rank (12.4.1.5)",
@@ -3685,7 +3686,8 @@ resolve_operator (gfc_expr *e)
  else
msg = "Inequality comparison for %s at %L";
 
- gfc_warning (0, msg, gfc_typename (>ts), >where);
+ gfc_warning (OPT_Wcompare_reals, msg,
+  gfc_typename (>ts), >where);
}
}
 
@@ -14890,12 +14892,13 @@ warn_unused_fortran_label (gfc_st_label *label)
   switch (label->referenced)
 {
 case ST_LABEL_UNKNOWN:

[RFC] [MIPS] Enable non-executable PT_GNU_STACK support

2016-02-05 Thread Faraz Shahbazker
Enable non-executable stack mode if assembler and linker support it.

Currently the MIPS FPU emulator uses eXecute Out of Line (XOL) on the stack to
handle instructions in the delay slots of FPU branches.  Because of this, MIPS
cannot have a non-executable stack.  While the solution on the kernel side is
not yet finalized, we propose the required tools-side changes now, so that the
tools are ready for a seamless transition whenever a fixed kernel becomes
available.

glibc/dynamic linker:

* When non-executable stack is requested, first check AT_FLAGS in the
  auxiliary vector to decide if this kernel supports a non-executable
  stack. Persist with the non-executable mode specified on the
  PT_GNU_STACK segment only if kernel supports it, else revert to an
  executable stack.

* The 25th bit (1<<24) in AT_FLAGS is reserved for use by the kernel to
  indicate that it supports a non-executable stack on MIPS.

* glibc's ABIVERSION is incremented from 3 to 5, so that applications linked
  for this glibc can't be accidentally run against older versions. ABIVERSION
  4 has been skipped over because it was chosen for IFUNC support, which is
  still under review.

Patch under review: https://sourceware.org/ml/libc-alpha/2016-01/msg00567.html

binutils:

* Increment the ABIVERSION to 5 for objects with non-executable stacks.

Patch under review: https://sourceware.org/ml/binutils/2016-02/msg00087.html

gcc:

* Check if assembler/dynamic linker support the new behaviour
  (ABIVERSION >= 5). If yes, enable non-executable stack by default
  for all objects.

gcc/ChangeLog
* configure.ac: Check if assembler supports the new PT_GNU_STACK
ABI change; if yes, enable non-executable stack mode by default.
* configure: Regenerate.
* config.in: Regenerate.
* config/mips/mips.c: Define TARGET_ASM_FILE_END to indicate
stack mode for each C file if LD_MIPS_GNUSTACK is enabled.

libgcc/ChangeLog
* config/mips/crti.S: Add .note.GNU-stack marker if LD_MIPS_GNUSTACK
support is enabled.
* config/mips/crtn.S: Add .note.GNU-stack marker if LD_MIPS_GNUSTACK
support is enabled.
* config/mips/mips16.S: Add .note.GNU-stack marker if
LD_MIPS_GNUSTACK support is enabled.
* config/mips/vr4120-div.S: Add .note.GNU-stack marker if
LD_MIPS_GNUSTACK support is enabled.

-- gcc/configure.ac gcc/config/mips/mips.c config/mips/crti.S config/mips/crtn.S config/mips/mips16.S config/mips/vr4120-div.S
---
 gcc/config/mips/mips.c  |5 +
 gcc/configure.ac|   23 +++
 libgcc/config/mips/crti.S   |6 ++
 libgcc/config/mips/crtn.S   |6 ++
 libgcc/config/mips/mips16.S |7 +++
 libgcc/config/mips/vr4120-div.S |7 +++
 6 files changed, 54 insertions(+)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index ea18ad6..c3eefc0 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -20194,6 +20194,11 @@ mips_promote_function_mode (const_tree type 
ATTRIBUTE_UNUSED,
 #undef TARGET_HARD_REGNO_SCRATCH_OK
 #define TARGET_HARD_REGNO_SCRATCH_OK mips_hard_regno_scratch_ok
 
+#if HAVE_LD_MIPS_GNUSTACK
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END file_end_indicate_exec_stack
+#endif
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-mips.h"
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 0a626e9..9b8190e 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4562,6 +4562,29 @@ pointers into PC-relative form.])
   AC_MSG_ERROR(
[Requesting --with-nan= requires assembler support for -mnan=])
 fi
+
+AC_CACHE_CHECK([linker for GNU-stack ABI support],
+  [gcc_cv_ld_mips_gnustack],
+  [gcc_cv_ld_mips_gnustack=no
+   if test x$gcc_cv_as != x \
+  -a x$gcc_cv_ld != x \
+  -a x$gcc_cv_readelf != x ; then
+cat > conftest.s < /dev/null 2>&1 \
+   && $gcc_cv_ld -o conftest conftest.o > /dev/null 2>&1; then
+  abi_version=`$gcc_cv_readelf -h conftest 2>&1 | grep "ABI Version:" 
| cut -d: -f2 | tr -d '[[:space:]]'`
+  if test "$abi_version" -ge 5; then
+gcc_cv_ld_mips_gnustack=yes
+  fi
+fi
+   fi
+   rm -f conftest.s conftest.o conftest])
+if test x$gcc_cv_ld_mips_gnustack = xyes; then
+   AC_DEFINE(HAVE_LD_MIPS_GNUSTACK, 1,
+  [Define if your linker can handle PT_GNU_STACK segments correctly.])
+fi
 ;;
 s390*-*-*)
 gcc_GAS_CHECK_FEATURE([.gnu_attribute support],
diff --git a/libgcc/config/mips/crti.S b/libgcc/config/mips/crti.S
index 8521d3c..aa85d94 100644
--- a/libgcc/config/mips/crti.S
+++ b/libgcc/config/mips/crti.S
@@ -21,6 +21,12 @@ a copy of the GCC Runtime Library Exception along with this 
program;
 see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 .  */
 
+#include "config.h"
+#if HAVE_LD_MIPS_GNUSTACK
+/* An executable 

[PATCH PR68021]Don't add biv candidate if it's not incremented by a single stmt

2016-02-05 Thread Bin Cheng
Hi,
As reported in PR68021, there is a long-standing ivopt bug.  By design, ivopt
tries to identify and reuse induction variables in the original input.
Naturally, we don't need to compute such an original biv, because the
computation is already there.  Every time an iv use is rewritten, ivopt checks
whether the candidate is an original biv using the code below:

  /* An important special case -- if we are asked to express value of
     the original iv by itself, just exit; there is no need to
     introduce a new computation (that might also need casting the
     variable to unsigned and back).  */
  if (cand->pos == IP_ORIGINAL
      && cand->incremented_at == use->stmt)
    {
      enum tree_code stmt_code;

      gcc_assert (is_gimple_assign (use->stmt));
      gcc_assert (gimple_assign_lhs (use->stmt) == cand->var_after);

      /* Check whether we may leave the computation unchanged.
         This is the case only if it does not rely on other
         computations in the loop -- otherwise, the computation
         we rely upon may be removed in remove_unused_ivs,
         thus leading to ICE.  */
      stmt_code = gimple_assign_rhs_code (use->stmt);
      if (stmt_code == PLUS_EXPR
          || stmt_code == MINUS_EXPR
          || stmt_code == POINTER_PLUS_EXPR)
        {
          if (gimple_assign_rhs1 (use->stmt) == cand->var_before)
            op = gimple_assign_rhs2 (use->stmt);
          else if (gimple_assign_rhs2 (use->stmt) == cand->var_before)
            op = gimple_assign_rhs1 (use->stmt);
          else
            op = NULL_TREE;
        }
      else
        op = NULL_TREE;

      if (op && expr_invariant_in_loop_p (data->current_loop, op))
        return;
    }

Note that this code can only handle a biv of a specific form, with a single
explicit increment stmt of the form "biv_after = biv_before + step".

Unfortunately, in rare cases like this one, the biv is incremented in two stmts:
  biv_x = biv_before + step_part_1;
  biv_after = biv_x + step_part_2;

That's why control flow reaches the ICE point.  We should not fix the code at
the ICE point, because:
1) We shouldn't rewrite a biv candidate.  Even if there is no correctness
issue, rewriting it introduces redundant code.
2) For a non-biv candidate, all the computation at the ICE point has already
been done in the iv cost computation part.  In other words, if control flow
reaches here, "gcc_assert (comp != NULL)" will hold.

Back to this issue, there are two possible fixes.  The first is to specially
rewrite the increment stmts mentioned above into:
  biv_after = biv_before + step
This fix needs more changes, because we are already past the candidate
creation point and would have to do the computation ourselves.

The other fix is simply not to add the biv candidate.  That is, we check a
stricter condition when adding a biv candidate, which guarantees that control
flow never reaches the ICE point.  It won't cause worse code (well, maybe a
type conversion from unsigned to signed), since we add exactly the same
candidate anyway, just not as a biv candidate.  In fact, we create/use another
candidate which has the same {base, step} as the biv.  The computation of the
biv then becomes dead code and is removed by subsequent passes.

This patch fixes the issue using the second method.  I am bootstrapping and
testing on x86_64 and AArch64 (testing still ongoing).  Is it OK if there are
no failures?

Thanks,
bin

2016-02-04  Bin Cheng  

PR tree-optimization/68021
* tree-ssa-loop-ivopts.c (increment_stmt_for_biv_p): New function.
(add_iv_candidate_for_biv, rewrite_use_nonlinear_expr): Call above
function checking if stmt is increment stmt for biv.

gcc/testsuite/ChangeLog
2016-02-04  Bin Cheng  

PR tree-optimization/68021
* gcc.dg/tree-ssa/pr68021.c: New test.

diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 3faed93..adc28aa 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -3035,6 +3035,27 @@ add_standard_iv_candidates (struct ivopts_data *data)
   build_int_cst (long_long_integer_type_node, 1), true, NULL);
 }
 
+/* Return true if STMT is the increment stmt for a biv.  BEFORE and AFTER
+   are the corresponding vars before and after increment.  */
+
+static bool
+increment_stmt_for_biv_p (gimple *stmt, tree before, tree after)
+{
+  enum tree_code code;
+
+  gcc_assert (is_gimple_assign (stmt));
+  gcc_assert (gimple_assign_lhs (stmt) == after);
+
+  code = gimple_assign_rhs_code (stmt);
+  if (code != PLUS_EXPR && code != MINUS_EXPR && code != POINTER_PLUS_EXPR)
+return false;
+
+  if (gimple_assign_rhs1 (stmt) != before
+  && gimple_assign_rhs2 (stmt) != before)
+return false;
+
+  return true;
+}
 
 /* Adds candidates bases on the old induction variable IV.  */
 
@@ -3078,7 +3099,11 @@ add_iv_candidate_for_biv (struct ivopts_data *data, 
struct iv *iv)
   /* Don't add candidate if it's from another PHI node because
 it's an affine iv appearing 

Re: [PATCH] Fix valgrind reported issue during char constant lexing (PR c++/69628)

2016-02-05 Thread Bernd Schmidt

On 02/03/2016 09:05 PM, Jakub Jelinek wrote:

2016-02-03  Jakub Jelinek  

PR c++/69628
* charset.c (cpp_interpret_charconst): Clear *PCHARS_SEEN
and *UNSIGNEDP if bailing out early due to errors.

* g++.dg/parse/pr69628.C: New test.


Ok.


Bernd



Re: [PATCH] tweak -Wplacement-new to fix #69662

2016-02-05 Thread Jason Merrill

On 02/05/2016 11:41 AM, Martin Sebor wrote:

But since GCC does allow global/static objects of structs with
flexible array members to be initialized, and (presumably) will
continue to even after the above is rejected in C++, the code
in the patch that detects overflowing such variables will continue
to serve its purpose.


Good point.


+@item -Wplacement-new=1
+This is the default warning level of @option{-Wplacement-new}.  At this
+level the warning is not issued for some strictly invalid constructs that
+GCC allows as extensions for compatibility with legacy code.  For example,
+the following invalid @code{new} expression is not diagnosed at this level.
+@smallexample
+struct S @{ int n, a[1]; @};
+S *s = (S *)malloc (sizeof *s + 31 * sizeof s->a[0]);
+new (s->a)int [32]();
+@end smallexample


I'd say "undefined" rather than "invalid" here.

OK with that change.

Jason



Re: Fix c/69522, memory management issue in c-parser

2016-02-05 Thread Bernd Schmidt

Ping.

On 01/29/2016 12:40 PM, Bernd Schmidt wrote:

Let's say we have

struct a {
  int x[1];
  int y[1];
} x = { 0, { 0 } };
           ^

When we reach the marked brace, we call into push_init_level, where we
notice that we have implicit initializers (for x[]) lying around that we
should deal with now that we've seen another open brace. The problem is
that we've created a new obstack for the initializer of y, and this is
where we also put data for the inits of x, freeing it when we see the
close brace for the initialization of y.

In the actual testcase, which is a little more complex to actually
demonstrate the issue, we end up allocating two init elts at the same
address (because of premature freeing) and place them in the same tree,
which ends up containing a cycle because of this. Then we hang.

Fixed by this patch, which splits off a new function
finish_implicit_inits from push_init_level and ensures it's called with
the outer obstack instead of the new one in the problematic case.

Bootstrapped and tested on x86_64-linux, ok?


Bernd


Re: [PATCH] Fix PR c++/68948 (wrong code generation due to invalid constructor call)

2016-02-05 Thread Jason Merrill

On 02/05/2016 09:13 AM, Jason Merrill wrote:

On 02/05/2016 07:54 AM, Patrick Palka wrote:

On Thu, 4 Feb 2016, Patrick Palka wrote:


The compiler correctly detects and diagnoses invalid constructor calls
such as C::C () in a non-template context but it fails to do so while
processing a class template.  [ Section 3.4.3.1 of the standard is what
makes these forms of constructor calls illegal -- see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68948#c9  ]

In a non-template context this diagnosis would take place in
build_new_method_call, called from finish_call_expr, but while
processing a class template we may exit early out of finish_call_expr
and never call build_new_method_call.

Thus we never diagnose this invalid constructor call during template
processing.  So then during instantiation of the enclosing template we
call tsubst_baselink on this constructor call, during which the call to
lookup_fnfields returns NULL (because it finds the injected class type C
not the constructor C).  Because the lookup failed, tsubst_baselink
returns error_mark_node and this node persists all the way through to
gimplification where it silently gets discarded.

This patch fixes this problem by diagnosing these invalid constructor
calls in tsubst_baselink.  Alternatively, we can rewire finish_call_expr to
avoid exiting early while processing a template if the call in question
is a constructor call.  I'm not sure which approach is better.  This
approach seems more conservative, since it's just attaching an error
message to an existing error path.


And here is the other approach, which rewires finish_call_expr:


I like the second approach better, but you're right that the first is
more conservative, so let's go with the first for GCC 6 and switch to
the second for GCC 7.


I'm also applying this patch so that similar issues ICE rather than 
silently generate bad code.


commit f15c5ca5e31d39fb13ef700afeb43aad0c8c7903
Author: Jason Merrill 
Date:   Fri Feb 5 10:34:33 2016 -0500

	Make issues similar to PR c++/68948 fail loudly.

	* semantics.c (finish_expr_stmt): If expr is error_mark_node,
	make sure we've seen_error().

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 95c4f19..c9f9db4 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -673,6 +673,9 @@ finish_expr_stmt (tree expr)
 
   if (expr != NULL_TREE)
 {
+  /* If we ran into a problem, make sure we complained.  */
+  gcc_assert (expr != error_mark_node || seen_error ());
+
   if (!processing_template_decl)
 	{
 	  if (warn_sequence_point)


Remove -fshort-double (PR60410)

2016-02-05 Thread Bernd Schmidt
This patch fixes PR60410 by removing -fshort-double. Nick earlier
proposed a fix for the crash, but Richard B suggested removing the option
entirely, and I agree with that. It's a pointless ABI-changing option
on most targets, and if a port really needs it, it should be a -m option
that tweaks DOUBLE_TYPE_SIZE.


It turns out that there is still a mips config that enables it for a set 
of multilibs. As mentioned here:

  https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02140.html
I haven't managed to build that config; it fails while configuring libgcc.
A MIPS maintainer would have to speak up as to whether that config is
useful at all.


In any case, the patch below was bootstrapped and tested on 
x86_64-linux. Ok?



Bernd

	PR target/60410
	* tree.c (build_common_tree_nodes): Remove short_double argument.
	All callers changed.
	* tree.h (build_common_tree_nodes): Adjust declaration.
	* doc/invoke.texi (-fshort-double): Remove documentation.
	* config/mips/t-img-elf (MULTILIB_OPTIONS, MULTILIB_DIRNAMES,
	MULTILIB_EXCEPTIONS): Remove -fshort-double variant.
	* lto-wrapper.c (merge_and_complain, append_compiler_options,
	append_linker_options): Don't handle OPT_fshort_double.
	
c-family/
	PR target/60410
	* c.opt (fshort-double): Remove.

testsuite/
	PR target/60410
	* gcc.dg/lto/pr55113_0.c: Remove test.

diff --git a/gcc/ada/gcc-interface/misc.c b/gcc/ada/gcc-interface/misc.c
index 992ac0a..75e467b 100644
--- a/gcc/ada/gcc-interface/misc.c
+++ b/gcc/ada/gcc-interface/misc.c
@@ -355,7 +355,7 @@ gnat_init (void)
 {
   /* Do little here, most of the standard declarations are set up after the
  front-end has been run.  Use the same `char' as C for Interfaces.C.  */
-  build_common_tree_nodes (flag_signed_char, false);
+  build_common_tree_nodes (flag_signed_char);
 
   /* In Ada, we use an unsigned 8-bit type for the default boolean type.  */
   boolean_type_node = make_unsigned_type (8);
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 378afae..3d84316 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -5237,7 +5237,7 @@ c_common_nodes_and_builtins (void)
   tree va_list_arg_type_node;
   int i;
 
-  build_common_tree_nodes (flag_signed_char, flag_short_double);
+  build_common_tree_nodes (flag_signed_char);
 
   /* Define `int' and `char' first so that dbx will output them first.  */
   record_builtin_type (RID_INT, NULL, integer_type_node);
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index f243744..24858cd 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1435,10 +1435,6 @@ frtti
 C++ ObjC++ Optimization Var(flag_rtti) Init(1)
 Generate run time type descriptor information.
 
-fshort-double
-C ObjC C++ ObjC++ LTO Optimization Var(flag_short_double)
-Use the same size for double as for float.
-
 fshort-enums
 C ObjC C++ ObjC++ LTO Optimization Var(flag_short_enums)
 Use the narrowest integer type possible for enumeration types.
diff --git a/gcc/config/mips/t-img-elf b/gcc/config/mips/t-img-elf
index eca0a2e..8c22853 100644
--- a/gcc/config/mips/t-img-elf
+++ b/gcc/config/mips/t-img-elf
@@ -20,19 +20,14 @@
 # A multilib for mips32r6+LE
 # A multilib for mips64r6
 # A multilib for mips64r6+LE
-# A multilib for mips32r6+LE+singlefloat+shortdouble
 
-MULTILIB_OPTIONS = mips64r6 mabi=64 EL msoft-float/msingle-float fshort-double
-MULTILIB_DIRNAMES = mips64r6 64 el sof sgl short
+MULTILIB_OPTIONS = mips64r6 mabi=64 EL msoft-float/msingle-float
+MULTILIB_DIRNAMES = mips64r6 64 el sof sgl
 MULTILIB_MATCHES = EL=mel EB=meb
 
 # Don't build 64r6 with single-float
 MULTILIB_EXCEPTIONS += mips64r6/*msingle-float*
-MULTILIB_EXCEPTIONS += mips64r6/*fshort-double*
 
 MULTILIB_EXCEPTIONS += mabi=64*
 MULTILIB_EXCEPTIONS += msingle-float*
 MULTILIB_EXCEPTIONS += *msingle-float
-MULTILIB_EXCEPTIONS += fshort-double
-MULTILIB_EXCEPTIONS += EL/fshort-double
-MULTILIB_EXCEPTIONS += *msoft-float/fshort-double
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fcc404e..ad536ba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -488,8 +488,7 @@ Objective-C and Objective-C++ Dialects}.
 -fpcc-struct-return  -fpic  -fPIC -fpie -fPIE -fno-plt @gol
 -fno-jump-tables @gol
 -frecord-gcc-switches @gol
--freg-struct-return  -fshort-enums @gol
--fshort-double  -fshort-wchar @gol
+-freg-struct-return  -fshort-enums  -fshort-wchar @gol
 -fverbose-asm  -fpack-struct[=@var{n}]  @gol
 -fleading-underscore  -ftls-model=@var{model} @gol
 -fstack-reuse=@var{reuse_level} @gol
@@ -11090,14 +11089,6 @@ is equivalent to the smallest integer type that has enough room.
 code that is not binary compatible with code generated without that switch.
 Use it to conform to a non-default application binary interface.
 
-@item -fshort-double
-@opindex fshort-double
-Use the same size for @code{double} as for @code{float}.
-
-@strong{Warning:} the @option{-fshort-double} switch causes GCC to generate
-code that is not binary compatible with code generated without that 

Re: [PATCH] Fix PR c++/69283 (auto deduction fails when ADL is required)

2016-02-05 Thread Patrick Palka

On Tue, 19 Jan 2016, Patrick Palka wrote:


On Tue, Jan 19, 2016 at 9:56 AM, Jason Merrill  wrote:

On 01/18/2016 10:55 PM, Patrick Palka wrote:


mark_used is wrongly diagnosing a use of a TEMPLATE_DECL (e.g. the call
to f1 in function f3 of auto-fn29.C below) for having an undeduced
'auto' return type.  This doesn't make sense, because an 'auto' used
inside a template doesn't get deduced until after the template is
instantiated.  So for a TEMPLATE_DECL we shouldn't diagnose a use of
undeduced 'auto' here.  After instantiation, presumably we will call
mark_used on the resulting FUNCTION_DECL which will check for undeduced
auto appropriately.
@@ -5112,7 +5112,9 @@ mark_used (tree decl, tsubst_flags_t complain)
|| DECL_LANG_SPECIFIC (decl) == NULL
|| DECL_THUNK_P (decl))
  {
-  if (!processing_template_decl && type_uses_auto (TREE_TYPE (decl)))
+  if (!processing_template_decl
+ && TREE_CODE (decl) != TEMPLATE_DECL
+ && type_uses_auto (TREE_TYPE (decl)))



How does a TEMPLATE_DECL get in here?  Does it have null DECL_LANG_SPECIFIC?


In the test case auto-fn29.C, when instantiating the template
function f3, we call tsubst on the CALL_EXPR "f1 (v);".  There, ADL is
performed on the identifier f1 (which is the CALL_EXPR_FN), and it
returns the TEMPLATE_DECL f1.  Then mark_used is called on this
CALL_EXPR_FN, but only if it is a decl.

If in the test case the call "f1 (v);" is replaced with "Ape::f1
(v);", then the CALL_EXPR_FN is an OVERLOAD (to the TEMPLATE_DECL
f1), i.e. not a decl, so we don't call mark_used on it in tsubst.

The DECL_LANG_SPECIFIC of this decl is not null.



I'd think mark_used of a TEMPLATE_DECL should return after setting
TREE_USED; there's nothing else to do with it.


Consider it changed.


Here's the updated patch, with mark_used made to return earlier in case
of a TEMPLATE_DECL.  Also, auto-fn31.C is now consolidated into
auto-fn29.C since the two test cases were so similar.

-- >8 --

gcc/cp/ChangeLog:

PR c++/69283
PR c++/67835
* decl2.c (mark_used): When given a TEMPLATE_DECL, return after
setting its TREE_USED.

gcc/testsuite/ChangeLog:

PR c++/69283
PR c++/67835
* g++.dg/cpp1y/auto-fn29.C: New test.
* g++.dg/cpp1y/auto-fn30.C: New test.
---
 gcc/cp/decl2.c |  4 
 gcc/testsuite/g++.dg/cpp1y/auto-fn29.C | 34 ++
 gcc/testsuite/g++.dg/cpp1y/auto-fn30.C | 21 +
 3 files changed, 59 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn29.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/auto-fn30.C

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 7d68961..15d7617 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -5068,6 +5068,10 @@ mark_used (tree decl, tsubst_flags_t complain)

   /* Set TREE_USED for the benefit of -Wunused.  */
   TREE_USED (decl) = 1;
+
+  if (TREE_CODE (decl) == TEMPLATE_DECL)
+    return true;
+
   if (DECL_CLONED_FUNCTION_P (decl))
     TREE_USED (DECL_CLONED_FUNCTION (decl)) = 1;

diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn29.C b/gcc/testsuite/g++.dg/cpp1y/auto-fn29.C
new file mode 100644
index 000..f9260e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn29.C
@@ -0,0 +1,34 @@
+// PR c++/69283
+// { dg-do compile { target c++14 } }
+
+namespace Ape {
+   struct Type {};
+
+   template <typename T>
+   auto f1(T const& v){
+   return true;
+   }
+
+   template <typename T>
+   auto f2(T const& v){
+   return f2(v); // { dg-error "auto" }
+   }
+}
+
+namespace Baboon {
+   template <typename T>
+   bool f3(T const& v){
+   return f1(v);
+   }
+
+   template <typename T>
+   bool f4(T const& v){
+   f2(v);
+   }
+}
+
+int main(){
+   Ape::Type x;
+   Baboon::f3(x);
+   Baboon::f4(x);
+}
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn30.C b/gcc/testsuite/g++.dg/cpp1y/auto-fn30.C
new file mode 100644
index 000..e005e6e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn30.C
@@ -0,0 +1,21 @@
+// PR c++/67835
+// { dg-do compile { target c++14 } }
+
+template <typename Tag, typename T>
+auto g(Tag tag, T x) {
+ return f(tag, x);
+}
+
+namespace abc {
+struct tag {};
+
+struct A {};
+
+template <typename T>
+auto f(tag, T x) { return x; }
+}
+
+int main() {
+ g(abc::tag(), abc::A());
+ return 0;
+}
--
2.7.0.303.g36d4cae



Re: [PATCH v3] PR48344: Fix unrecognizable insn error with -fstack-limit-register=r2

2016-02-05 Thread Kelvin Nilsen




Ping.  Thanks.

On 01/27/2016 11:12 AM, Kelvin Nilsen wrote:

This patch has been bootstrapped and tested on
powerpc64le-unknown-linux-gnu with no regressions.  Is this OK for
trunk?

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48344 for the
original problem report.  The error resulted because gcc's processing
of command-line options within gcc initialization code originally
preceded the processing of target-specific configuration hooks.

In the unpatched GCC implementation, the Pmode (pointer mode) variable
has not yet been initialized when the -fstack-limit-register
command-line option is processed.  As a consequence, the
stack-limiting register is not assigned a proper mode.  RTL
instructions that use this stack-limiting register therefore have an
unspecified mode, do not match any instruction pattern, and trigger
the "unrecognizable insn" error.

The fix represented in this patch is to defer the command-line
processing related to command-line specification of a stack-limiting
register until after target-specific initialization has been completed.
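A minimal sketch of the ordering the patch enforces, assuming a toy model: every name below (TargetState, DeferredOptions, init_target, handle_stack_limit_option) is hypothetical and is not GCC's real option-handling code. The option is only recorded while the command line is parsed, and it is acted on after target initialization has fixed the pointer mode:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names are GCC internals.
struct TargetState {
  int pointer_mode_bits = 0;  // stands in for Pmode; 0 means "not initialized yet"
};

struct StackLimitReg {
  std::string name;
  int mode_bits = 0;  // mode assigned to the stack-limit register
};

// Options are only recorded while parsing the command line.
struct DeferredOptions {
  std::vector<std::string> stack_limit_register;
};

// Target-specific initialization is what finally fixes the pointer mode.
void init_target(TargetState &t, int bits) { t.pointer_mode_bits = bits; }

// Runs only after init_target, so the register gets an initialized mode.
StackLimitReg handle_stack_limit_option(const DeferredOptions &opts,
                                        const TargetState &t) {
  assert(t.pointer_mode_bits != 0 && "target must be initialized first");
  StackLimitReg reg;
  if (!opts.stack_limit_register.empty()) {
    reg.name = opts.stack_limit_register.back();  // assumption: last option wins
    reg.mode_bits = t.pointer_mode_bits;
  }
  return reg;
}
```

Calling handle_stack_limit_option before init_target trips the assertion, which is the toy analogue of creating the register while Pmode is still uninitialized.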

Some questions and issues raised in response to version 2 of this
patch are addressed below:

1. Concerns regarding possible unintended consequences that might
result from moving all target-specific initialization to precede the
invocation of the handle_common_deferred_options () function are
addressed by preserving the original initialization order and moving
the relevant command-line processing to follow the target-specific
initialization.

2. A question was raised as to whether Pmode can change with attribute
target.  It cannot.



Here is the original message:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02146.html





Re: [PATCH] s390: Add -fsplit-stack support

2016-02-05 Thread Ulrich Weigand
Marcin Kościelnicki wrote:

> I'll stay with checking for larl - while I can imagine someone adding a 
> new conditional branch instruction, I don't see a need for another 
> larl-like instruction.  Besides, this way the failure mode for an 
> unknown instruction would be producing an error, instead of silently 
> emitting code with unfixed prologue.

OK, fine with me.  B.t.w. Andreas has checked in the sibcall fix,
so you no longer should be seeing larl used for sibcalls.

> I've updated and resubmitted the gold patch.

Thanks!

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



New Finnish PO file for 'gcc' (version 6.1-b20160131)

2016-02-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Finnish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/fi.po

(This file, 'gcc-6.1-b20160131.fi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




Re: [PATCH] Fix PR c++/68948 (wrong code generation due to invalid constructor call)

2016-02-05 Thread Patrick Palka
On Fri, Feb 5, 2016 at 12:51 PM, Jason Merrill  wrote:
> On 02/05/2016 09:13 AM, Jason Merrill wrote:
>>
>> On 02/05/2016 07:54 AM, Patrick Palka wrote:
>>>
>>> On Thu, 4 Feb 2016, Patrick Palka wrote:
>>>
 The compiler correctly detects and diagnoses invalid constructor calls
 such as C::C () in a non-template context but it fails to do so while
 processing a class template.  [ Section 3.4.3.1 of the standard is what
 makes these forms of constructor calls illegal -- see
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68948#c9  ]

 In a non-template context this diagnosis would take place in
 build_new_method_call, called from finish_call_expr, but while
 processing a class template we may exit early out of finish_call_expr
 and never call build_new_method_call.

 Thus we never diagnose this invalid constructor call during template
 processing.  So then during instantiation of the enclosing template we
 call tsubst_baselink on this constructor call, during which the call to
 lookup_fnfields returns NULL (because it finds the injected class type C,
 not the constructor C).  Because the lookup failed, tsubst_baselink
 returns error_mark_node and this node persists all the way through to
 gimplification where it silently gets discarded.

 This patch fixes this problem by diagnosing these invalid constructor
 calls in tsubst_baselink.  Alternatively, we can rewire finish_call_expr
 to avoid exiting early while processing a template if the call in question
 is a constructor call.  I'm not sure which approach is better.  This
 approach seems more conservative, since it's just attaching an error
 message to an existing error path.
>>>
>>>
>>> And here is the other approach, which rewires finish_call_expr:
>>
>>
>> I like the second approach better, but you're right that the first is
>> more conservative, so let's go with the first for GCC 6 and switch to
>> the second for GCC 7.
>
>
> I'm also applying this patch so that similar issues ICE rather than silently
> generate bad code.
>

Cool! Good idea.


Re: C++ PATCH for c++/69688 (bogus error with -Wsign-compare)

2016-02-05 Thread Jason Merrill

On 02/05/2016 05:32 PM, Marek Polacek wrote:

   if (TREE_CODE (type) == ERROR_MARK)
 return NULL_TREE;

+  /* Here, DECL may change value; purge caches.  */
+  clear_fold_cache ();
+  clear_cv_cache ();
+
   if (MAYBE_CLASS_TYPE_P (type))


This should happen after computing the value to be stored, not before. 
Also, could you combine those two functions into one?  There's no reason 
for callers such as this to need to call two different functions.
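A toy model of the stale-cache problem and of both review comments, assuming plain std::map caches keyed by a declaration's name. purge_value_caches and the other names are hypothetical, not GCC's actual cache API. A lookup memoizes a derived value, so storing a new initializer must purge the caches, through a single entry point and only after the new value has been computed:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: "decls" map names to stored initial values, and two
// memoization caches remember results derived from them.
std::map<std::string, long> decl_values;
std::map<std::string, long> cv_cache;    // stands in for the constant-value cache
std::map<std::string, long> fold_cache;  // stands in for the fold cache (unused here)

long maybe_constant_value(const std::string &decl) {
  auto it = cv_cache.find(decl);
  if (it != cv_cache.end())
    return it->second;  // stale if DECL's value changed since it was cached
  long v = decl_values.count(decl) ? decl_values[decl] : 0;
  cv_cache[decl] = v;
  return v;
}

// One entry point, so callers cannot forget one of the two caches.
void purge_value_caches() {
  cv_cache.clear();
  fold_cache.clear();
}

void store_init_value(const std::string &decl, long init) {
  // Compute the value to be stored first; in real code this step may itself
  // consult (and refill) the caches, which is not modeled here.
  long value = init;
  decl_values[decl] = value;
  // Purge afterwards, so nothing cached during the computation survives.
  purge_value_caches();
}
```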


Jason



Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Vladimir Makarov

On 02/05/2016 04:25 AM, Richard Biener wrote:

The following patch fixes the performance regression for 435.gromacs
on x86_64 with AVX2 (Haswell or bdver2) caused by

2015-12-18  Andreas Krebbel  

* ira.c (ira_setup_alts): Move the scan for commutative modifier
to the first loop to make it work even with disabled alternatives.

which in itself is a desirable change giving the RA more freedom.

It turns out the fix makes an existing issue more severe in detecting
more swappable alternatives and thus exiting ira_setup_alts with
operands swapped in recog_data.  This seems to give a slight preference
to choose alternatives with the operands swapped (I didn't try to
investigate how IRA handles the "merged" alternative mask and
operand swapping in its further processing).
The alternative mask excludes alternatives that will definitely be rejected
in LRA.  This approach speeds up LRA (a lot was done to speed up RA,
but it still consumes a big chunk of compile time, which is unusual for
all compilers).


LRA and reload prefer insns without a commutative operand swap when all
other costs are equal.

Of course previous RTL optimizers and canonicalization rules as well
as backend patterns are tuned towards the not-swapped variant, and thus
it happens that doing more swaps ends up in slower code (I didn't closely
investigate).

So I tested the following patch which simply makes sure that
ira_setup_alts does not alter recog_data.

On an Intel Haswell machine I get (base is with the patch, peak is with
the above change reverted):

                                   Estimated                      Estimated
                  Base     Base       Base       Peak     Peak       Peak
Benchmarks        Ref.   Run Time     Ratio      Ref.   Run Time     Ratio
--------------   ------  ---------  ---------   ------  ---------  ---------
435.gromacs        7140        264    27.1 S      7140        270    26.5 S
435.gromacs        7140        264    27.1 *      7140        269    26.6 S
435.gromacs        7140        263    27.1 S      7140        269    26.5 *
==========================================================================
435.gromacs        7140        264    27.1 *      7140        269    26.5 *

which means the patched result is even better than before Andreas'
change.  Current trunk homes in at a Run Time of 321s (which is
the regression to fix).
  Thanks for working on this, Richard.  It is not easy to find reasons
for worse code on modern processors after such a small change.  As RA is
based on heuristics, it is hard to predict the change for a specific
benchmark.  I remember I checked Andreas' patch on SPEC2000 in the hope
that it would also improve x86-64 code, but I did not see a difference.


It is sometimes hard even to say how a specific (non-heuristic)
optimization will affect a specific benchmark's performance when a lot of
unknowns are involved (from alignment to CPU internals).  A year ago I
tried to use ML to choose the best options.  I used a set of about 100 C
benchmarks (and even more functions).  For practically every benchmark,
I found an option modification to -Ofast that resulted in faster code, but
ML prediction did not work at all.

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?


OK.  Thanks again.

2016-02-05   Richard Biener  

PR rtl-optimization/69274
* ira.c (ira_setup_alts): Do not change recog_data.operand
order.

Index: gcc/ira.c
===
--- gcc/ira.c   (revision 231814)
+++ gcc/ira.c   (working copy)
@@ -1888,10 +1888,11 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
}
if (commutative < 0)
break;
-  if (curr_swapped)
-   break;
+  /* Swap forth and back to avoid changing recog_data.  */
std::swap (recog_data.operand[commutative],
 recog_data.operand[commutative + 1]);
+  if (curr_swapped)
+   break;
  }
  }
  




RFA (convert): PATCH for c++/69631 (wrong overflow error with -fwrapv)

2016-02-05 Thread Jason Merrill
The problem here was that the call to convert_to_integer_nofold was 
still pushing the conversion down into the multiply expression, and when 
we do the multiplication in unsigned short it overflows.  This patch 
fixes convert_to_integer_1 to not do any truncation distribution when 
!dofold.  But that broke several testcases, because fold() doesn't ever 
try to do that distribution, and so was missing some optimizations 
revealed by the distribution.  So this patch changes cp_fold to redo the 
conversion with dofold enabled.  I also changed the C++ convert() entry 
point to do folding conversion again, since a few places in 
convert_to_integer_1, and probably elsewhere in the back end, expect that.
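Truncation distribution is legal because narrowing to 16 bits commutes with multiplication modulo 2^16. The sketch below uses plain C++ integer arithmetic, not GCC's tree folding, and its function names are hypothetical. It shows that both forms agree. As a C++-level aside, it also shows why the narrowed operands are widened to uint32_t before multiplying:

```cpp
#include <cassert>
#include <cstdint>

// Original shape, (T)(a * b): multiply wide, then narrow once.
std::uint16_t narrow_after(std::uint32_t a, std::uint32_t b) {
  return static_cast<std::uint16_t>(a * b);  // unsigned: wraps mod 2^32, no UB
}

// Distributed shape: narrow the operands, multiply, narrow again.
// Holding the narrowed values in uint32_t matters in C++: two uint16_t
// operands would be promoted to signed int, and 65535 * 65535 overflows int.
std::uint16_t narrow_before(std::uint32_t a, std::uint32_t b) {
  std::uint32_t na = static_cast<std::uint16_t>(a);
  std::uint32_t nb = static_cast<std::uint16_t>(b);
  return static_cast<std::uint16_t>(na * nb);
}
```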


For this email I didn't re-indent the truncation distribution code in 
convert_to_integer_1 to make it easier to read; for what I check in 
should I leave it like this, re-indent it, or do something else like a 
goto to avoid the need for re-indentation?


Tested x86_64-pc-linux-gnu, OK for trunk?
commit 11d9ebd11a5951c60e132456075cd4dc42ff7e71
Author: Jason Merrill 
Date:   Wed Feb 3 16:57:42 2016 -0500

	PR c++/69631
	* convert.c (convert_to_integer_1): Check dofold on truncation
	distribution.

diff --git a/gcc/convert.c b/gcc/convert.c
index dd7d818..b828bdf 100644
--- a/gcc/convert.c
+++ b/gcc/convert.c
@@ -105,12 +105,12 @@ convert_to_pointer (tree type, tree expr)
 }
 
 /* A wrapper around convert_to_pointer_1 that only folds the
-   expression if it is CONSTANT_CLASS_P.  */
+   expression if DOFOLD, or if it is CONSTANT_CLASS_P.  */
 
 tree
-convert_to_pointer_nofold (tree type, tree expr)
+convert_to_pointer_maybe_fold (tree type, tree expr, bool dofold)
 {
-  return convert_to_pointer_1 (type, expr, CONSTANT_CLASS_P (expr));
+  return convert_to_pointer_1 (type, expr, dofold || CONSTANT_CLASS_P (expr));
 }
 
 /* Convert EXPR to some floating-point type TYPE.
@@ -403,12 +403,12 @@ convert_to_real (tree type, tree expr)
 }
 
 /* A wrapper around convert_to_real_1 that only folds the
-   expression if it is CONSTANT_CLASS_P.  */
+   expression if DOFOLD, or if it is CONSTANT_CLASS_P.  */
 
 tree
-convert_to_real_nofold (tree type, tree expr)
+convert_to_real_maybe_fold (tree type, tree expr, bool dofold)
 {
-  return convert_to_real_1 (type, expr, CONSTANT_CLASS_P (expr));
+  return convert_to_real_1 (type, expr, dofold || CONSTANT_CLASS_P (expr));
 }
 
 /* Convert EXPR to some integer (or enum) type TYPE.
@@ -669,6 +669,7 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
 	 two narrow values can be combined in their narrow type even to
 	 make a wider result--are handled by "shorten" in build_binary_op.  */
 
+  if (dofold)
   switch (ex_form)
 	{
 	case RSHIFT_EXPR:
@@ -857,9 +858,6 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
 	  /* This is not correct for ABS_EXPR,
 	 since we must test the sign before truncation.  */
 	  {
-	if (!dofold)
-	  break;
-
 	/* Do the arithmetic in type TYPEX,
 	   then convert result to TYPE.  */
 	tree typex = type;
@@ -895,7 +893,6 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
 	 the conditional and never loses.  A COND_EXPR may have a throw
 	 as one operand, which then has void type.  Just leave void
 	 operands as they are.  */
-	  if (dofold)
 	return
 	  fold_build3 (COND_EXPR, type, TREE_OPERAND (expr, 0),
 			   VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 1)))
@@ -968,19 +965,13 @@ convert_to_integer (tree type, tree expr)
   return convert_to_integer_1 (type, expr, true);
 }
 
-/* Convert EXPR to some integer (or enum) type TYPE.
-
-   EXPR must be pointer, integer, discrete (enum, char, or bool), float,
-   fixed-point or vector; in other cases error is called.
-
-   The result of this is always supposed to be a newly created tree node
-   not in use in any existing structure.  The tree node isn't folded,
-   beside EXPR is of constant class.  */
+/* A wrapper around convert_to_integer_1 that only folds the
+   expression if DOFOLD, or if it is CONSTANT_CLASS_P.  */
 
 tree
-convert_to_integer_nofold (tree type, tree expr)
+convert_to_integer_maybe_fold (tree type, tree expr, bool dofold)
 {
-  return convert_to_integer_1 (type, expr, CONSTANT_CLASS_P (expr));
+  return convert_to_integer_1 (type, expr, dofold || CONSTANT_CLASS_P (expr));
 }
 
 /* Convert EXPR to the complex type TYPE in the usual ways.  If FOLD_P is
@@ -1059,12 +1050,12 @@ convert_to_complex (tree type, tree expr)
 }
 
 /* A wrapper around convert_to_complex_1 that only folds the
-   expression if it is CONSTANT_CLASS_P.  */
+   expression if DOFOLD, or if it is CONSTANT_CLASS_P.  */
 
 tree
-convert_to_complex_nofold (tree type, tree expr)
+convert_to_complex_maybe_fold (tree type, tree expr, bool dofold)
 {
-  return convert_to_complex_1 (type, expr, CONSTANT_CLASS_P (expr));
+  return convert_to_complex_1 (type, expr, dofold || CONSTANT_CLASS_P (expr));
 }
 
 /* Convert 

Re: Remove -fshort-double (PR60410)

2016-02-05 Thread Jeff Law

On 02/05/2016 12:31 PM, Bernd Schmidt wrote:

This patch fixes PR60410 by removing -fshort-double. Nick earlier
proposed a fix for the crash, but Richard B suggested removing the option
entirely, and I'd agree with that. It's a pointless ABI-changing option
on most targets, and if a port really needs it, it should be a -m option
that tweaks DOUBLE_TYPE_SIZE.

It turns out that there is still a mips config that enables it for a set
of multilibs. As mentioned here:
   https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02140.html
I haven't managed to make that config build; it fails while configuring libgcc.
A mips maintainer would have to speak up as to whether that config is
useful at all or not.

In any case, the patch below was bootstrapped and tested on
x86_64-linux. Ok?
Shouldn't c-opt.def be changed to note the option is deprecated and 
ignored rather than totally removing it?


OK with that change.

Jeff


Re: [PATCH] s390: Add -fsplit-stack support

2016-02-05 Thread Marcin Kościelnicki

On 04/02/16 17:27, Ulrich Weigand wrote:

Marcin Kościelnicki wrote:


Fair enough.  Here's what I'm going to implement in gold:

- any PLT relocation: call
- PC32DBL on a larl: non-call
- PC32DBL otherwise: call
- any other relocation: non-call

Does that sound right?


Hmm, I'm wondering about the PC32DBL choices.  There are now
a large number of other non-call instructions that use PC32DBL,
including lrl, strl, crl, cgrl, cgfrl, ...

However, these all access *data* at the pointed-to location,
so it is quite unlikely they would ever be used with a
function symbol.  So, assuming that you also check that the
target of the relocation is a function symbol, treating only
larl as non-call might be OK.


Yeah, I make sure the symbol is a STT_FUNC.


Maybe a more conservative approach might be to make the decision
the other way round: for PC32DBL check for *branch* instructions,
and treat only those as calls.  There are just a few branch
instructions using PC32DBL:

brasl  (call)
brcl   (conditional or unconditional sibcall)
brcth  (???)

where the last one is extremely unlikely (but theoretically
possible) to be used as a conditional sibcall combined with
a register decrement; I don't think this can ever happen
with current compilers, however.


I'll stay with checking for larl - while I can imagine someone adding a 
new conditional branch instruction, I don't see a need for another 
larl-like instruction.  Besides, this way the failure mode for an 
unknown instruction would be producing an error, instead of silently 
emitting code with unfixed prologue.


For full completeness, there are also PC16DBL relocations that
*could* target called functions, but only when compiling with
the -msmall-exec flag to assume total executable size is less
than 64 KB.  These are used by the following instructions:

bras
brc
brct
brctg
brxh
brxhg
brxle
brxlg
crj
cgrj
clrj
clgrj
cij
cgij
clij
clgij

Note that those are *all* branch instructions, so it might
make sense to add any PC16DBL targeting a function symbol
to the list of calls, just in case.  (But since basically
nobody ever uses -msmall-exec, it doesn't really matter
much either.)


Ah right, I've added PC16DBL to the "always call" list.
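The classification the thread converged on can be sketched as a single predicate. This is a hypothetical illustration, not gold's actual code: the enums stand in for real relocation types and decoded instructions, and the STT_FUNC check is assumed to apply to every case:

```cpp
#include <cassert>

// Hypothetical stand-ins; not gold's or the s390 ABI's real identifiers.
enum class RelocKind { Plt, Pc32Dbl, Pc16Dbl, Other };
enum class Insn { Larl, Other };

// Rules from the thread, for relocations against a function symbol:
//   any PLT relocation  -> call
//   PC32DBL on larl     -> non-call
//   PC32DBL otherwise   -> call
//   PC16DBL             -> call (only branch instructions use it)
//   anything else       -> non-call
bool is_call_reloc(RelocKind kind, Insn insn, bool target_is_function) {
  if (!target_is_function)
    return false;  // assumption: only STT_FUNC targets are considered
  switch (kind) {
  case RelocKind::Plt:
  case RelocKind::Pc16Dbl:
    return true;
  case RelocKind::Pc32Dbl:
    return insn != Insn::Larl;
  case RelocKind::Other:
    return false;
  }
  return false;
}
```

Treating every PC32DBL instruction other than larl as a call matches the failure mode argued for in the thread: an unknown instruction yields an error rather than a silently unfixed prologue.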


Bye,
Ulrich



I've updated and resubmitted the gold patch.

Marcin Kościelnicki


Re: [PATCH] PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges.

2016-02-05 Thread Jeff Law

On 02/05/2016 06:43 AM, Alexander Fomin wrote:

Hi!

A version of this patch was submitted about a year ago by Igor
Zamyatin. It's an attempt to fix PR rtl-optimization/64081 by enabling
RTL loop unrolling for duplicated exit blocks and back edges.

At the time it caused an AIX bootstrap failure, but now it's OK according
to David's testing. I've also bootstrapped and regtested it on
x86_64-linux-gnu.

Is it still OK for trunk now, or do you consider this v7 stuff?
Anyway, it's a regression.

Thanks,
Alexander
---
gcc/

PR rtl-optimization/64081
* loop-iv.c (def_pred_latch_p): New function.
(latch_dominating_def): Allow specific cases with non-single
definitions.
(iv_get_reaching_def): Likewise.
(check_complex_exit_p): New function.
(check_simple_exit): Use check_complex_exit_p to allow certain cases
with exits not executing on any iteration.

gcc/testsuite

PR rtl-optimization/64081
* gcc.dg/pr64081.c: New test.
Normally I'd say that if it was approved before, then it's still good to 
go, since there haven't been major conceptual changes in this code between 
when the patch was originally written and now.


However, in this instance the patch had been reported to cause problems 
on AIX, problems that we can't reproduce now -- which makes me want to 
be more cautious.  Was it a problem with the patch, or some other latent 
issue -- we don't know at this point.


So I think the way to go is to apply this patch on top of r219827, where 
it caused the AIX failure.  Then bootstrap on AIX and determine the root 
cause of the AIX bootstrap failure.  If it's this patch, then update 
the patch as needed.  If the patch is just exposing a latent bug 
elsewhere, we should evaluate whether or not that latent bug has been 
fixed before applying this fix to the trunk.


It's considerably more work, but ISTM it's the right thing to do.

jeff


Re: [PATCH] Fix up move_plus_up (PR rtl-optimization/69691)

2016-02-05 Thread Jeff Law

On 02/05/2016 11:17 AM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, move_plus_up on
(subreg:SI (plus:DI (reg/f:DI 20 frame)
 (const_int 16 [0x10])) 0)
returns
(plus:SI (plus:SI (subreg:SI (reg/f:DI 20 frame) 0)
 (const_int 16 [0x10]))
 (const_int 16 [0x10]))
which is wrong: the original adds just 16, but the returned
rtx adds twice that.
The problem is that subreg_reg is (verified in the conditions)
a PLUS, and we want to add the lowpart of the PLUS's first operand
to CST, which is the lowpart of the PLUS's second operand,
but we were actually returning the lowpart of the whole PLUS plus
the lowpart of the PLUS's second operand.
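The arithmetic of the fix can be checked with plain integers by modeling (subreg:SI (plus:DI base cst) 0) as the low 32 bits of a 64-bit sum. This is a sketch. The function names are hypothetical and stand in for the RTL rewrite, not for the lra-eliminations.c code itself:

```cpp
#include <cassert>
#include <cstdint>

// Model of a lowpart SImode subreg of a DImode value.
std::uint32_t lowpart(std::uint64_t x) { return static_cast<std::uint32_t>(x); }

// Fixed rewrite: lowpart (base) + lowpart (cst), computed in 32 bits.
std::uint32_t move_plus_up_fixed(std::uint64_t base, std::uint64_t cst) {
  return lowpart(base) + lowpart(cst);
}

// Buggy shape: lowpart of the *whole* PLUS, plus the constant a second time.
std::uint32_t move_plus_up_buggy(std::uint64_t base, std::uint64_t cst) {
  return lowpart(base + cst) + lowpart(cst);
}
```

Because 32-bit unsigned addition wraps, the fixed form agrees with the original subreg even when the 64-bit sum carries out of the low word.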

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2016-02-05  Jakub Jelinek  

PR rtl-optimization/69691
* lra-eliminations.c (move_plus_up): Don't add the addend twice.

* gcc.c-torture/execute/pr69691.c: New test.

OK.
jeff



Re: [PATCH][ARM][0/4] Fixing PR target/65932

2016-02-05 Thread Kyrill Tkachov


On 04/02/16 19:35, H.J. Lu wrote:

On Thu, Feb 4, 2016 at 1:34 AM, Kyrill Tkachov
 wrote:

On 04/02/16 09:13, Ramana Radhakrishnan wrote:

On Fri, Jan 22, 2016 at 9:52 AM, Kyrill Tkachov
 wrote:

Hi all,

PR target/65932 is a wrong-code bug affecting arm and has manifested
itself when compiling the Linux kernel, so it's something that we really
ought to fix. The problem stems from the fact that PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
PROMOTE_MODE also marks the promotion as unsigned, whereas
TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
being wrongly zero-extended instead of sign-extended.  This also occurs
in PR target/67714.
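A minimal illustration of why the two hooks must agree, using plain C++ casts in place of the mode promotion (the function names are hypothetical): the same 16-bit pattern widens to different 32-bit values depending on which extension is applied:

```cpp
#include <cassert>
#include <cstdint>

// Sign extension: reinterpret the 16 bits as signed, then widen.
// (The uint16_t -> int16_t conversion is modular in C++20 and in GCC.)
std::int32_t sign_extend16(std::uint16_t bits) {
  return static_cast<std::int16_t>(bits);
}

// Zero extension: the upper 16 bits become zero.
std::int32_t zero_extend16(std::uint16_t bits) {
  return static_cast<std::int32_t>(bits);
}
```

For a short holding -1 (bit pattern 0xFFFF) the results are -1 and 65535; for non-negative values the two agree, which is why such a mismatch can stay hidden until a negative value reaches the wrongly promoted register.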

Jim Wilson tried a few approaches, and from the discussion
on the PR and on the ML
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
the preferred approach is to make PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE match up. Changing
TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI change, so
we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
series.

It has been posted at
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
and this series is based on top of the arm.h hunk of that patch.

There have been some concerns about the codegen quality fallout from
Jim's patch, which is what the remaining patches in this series address:

* 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
wmul-1.c and wmul-2.c are cases where we no longer generate
sign-extend+multiply (and accumulate) instructions but instead generate
the normal full-width multiplies (the operands are sign-extended from
preceding sign-extending loads).  This is a regression for some targets
on which the sign-extending form is faster. Patches 2 and 3 address this.
gcc.target/arm/wmul-3.c is a test where we actually end up generating
better code, so the testcase just needs to be adjusted.
Patch 4 deals with that.

* Sign-extending rather than zero-extending short values means we make
more use of the ldrsb and ldrsh arm instructions rather than the
zero-extending ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have
more limited addressing modes (only REG + REG), which could hurt code
size. However, the change also means that we can now merge sequences of
load-zero-extend followed by a sign-extend into a single
load-sign-extend.
So we'd turn a (ldrh ; sxth) into an (ldrsh).
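The (ldrh ; sxth) to (ldrsh) merge relies on sign-extending a zero-extended halfword giving the same result as a sign-extending load. A sketch with hypothetical C++ functions named after the instructions; they model only the data movement, in host byte order:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// "ldrh": load 16 bits and zero-extend into a 32-bit register.
std::uint32_t ldrh(const unsigned char *p) {
  std::uint16_t h;
  std::memcpy(&h, p, sizeof h);  // host byte order; enough for this model
  return h;
}

// "sxth": sign-extend the low 16 bits of a register.
std::int32_t sxth(std::uint32_t r) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(r));
}

// "ldrsh": load 16 bits and sign-extend in a single step.
std::int32_t ldrsh(const unsigned char *p) {
  std::int16_t h;
  std::memcpy(&h, p, sizeof h);
  return h;
}
```

sxth only looks at the low 16 bits, so the zero-extension done by ldrh is irrelevant to its result, and the pair collapses to one ldrsh.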

I've built a few popular embedded benchmarks for a Cortex-M0 target
(Thumb1) and looked at the generated assembly for -Os and -O2. I did see
both effects. That is, ldrh instructions with an immediate being turned
into two instructions:
  ldrh    r4, [r4, #12]
->
  movs    r5, #12
  ldrsh   r4, [r4, r5]

But I also observed the beneficial effect. Various register allocation
perturbations also featured in the changes.
Overall, for every codebase I've looked at, at both -Os and -O2, the new
code was slightly smaller. Probably not enough to call this an outright
win, but definitely not a regression overall.

As for ARM and Thumb2 states, this series doesn't have a major impact.
SPEC2006 benchmarks are slightly reduced in size (but nothing to write
home about) and there are no code quality regressions. And even the
regressions caught by the testsuite in the wmul-[12].c tests don't
actually manifest in practice in SPEC, but they are fixed anyway.

The series contains one target-independent change to CSE in patch 3 that
I'll explain there.

The series has been bootstrapped and tested on arm, aarch64 and x86_64.
Is this ok for trunk together with Jim's arm.h hunk from
g ?


The whole series is OK provided the middle-end hunk has been approved.


Thanks.
Richi approved the midend hunk at:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01733.html

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69676



Yeah, we already have https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69671.
Seems to be something with the x86 patterns. Kirill has assigned it to himself.

Thanks,
Kyrill




Re: [Patch, testsuite] Require int32 target support in sso tests

2016-02-05 Thread Eric Botcazou
> Anyone have a sense of how this is supposed to work and what is wrong?  The
> lines appear to be the same to me.  :-(

You probably need to tweak the regexps again to make it accept the ^M.

-- 
Eric Botcazou


[PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener

The following patch fixes the performance regression for 435.gromacs
on x86_64 with AVX2 (Haswell or bdver2) caused by

2015-12-18  Andreas Krebbel  

* ira.c (ira_setup_alts): Move the scan for commutative modifier
to the first loop to make it work even with disabled alternatives.

which in itself is a desirable change giving the RA more freedom.

It turns out the fix makes an existing issue more severe in detecting
more swappable alternatives and thus exiting ira_setup_alts with
operands swapped in recog_data.  This seems to give a slight preference
to choose alternatives with the operands swapped (I didn't try to
investigate how IRA handles the "merged" alternative mask and
operand swapping in its further processing).

Of course previous RTL optimizers and canonicalization rules as well
as backend patterns are tuned towards the not-swapped variant, and thus
it happens that doing more swaps ends up in slower code (I didn't closely
investigate).

So I tested the following patch which simply makes sure that
ira_setup_alts does not alter recog_data.
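The "swap forth and back" structure of the patch can be sketched in isolation. This assumes a plain int array in place of recog_data.operand and a caller-supplied predicate in place of the alternative checks; both are hypothetical. Both operand orders are evaluated, and because the swap now happens before the break, the second swap restores the original order:

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for recog_data.operand.
using Operands = std::array<int, 3>;

// Evaluate the operands in both orders of the commutative pair (c, c + 1),
// returning how many orders the predicate accepts. The operands are left
// in their original order on return.
template <typename Pred>
int count_matching_orders(Operands &ops, int c, Pred accepts) {
  int matches = 0;
  for (bool curr_swapped = false;; curr_swapped = true) {
    if (accepts(ops))
      ++matches;
    std::swap(ops[c], ops[c + 1]);  // swap forth (1st pass), back (2nd pass)
    if (curr_swapped)
      break;
  }
  return matches;
}
```

With the pre-patch ordering (break before the swap on the second pass), the function would return with ops[c] and ops[c + 1] still exchanged.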

On an Intel Haswell machine I get (base is with the patch, peak is with
the above change reverted):

                                   Estimated                      Estimated
                  Base     Base       Base       Peak     Peak       Peak
Benchmarks        Ref.   Run Time     Ratio      Ref.   Run Time     Ratio
--------------   ------  ---------  ---------   ------  ---------  ---------
435.gromacs        7140        264    27.1 S      7140        270    26.5 S
435.gromacs        7140        264    27.1 *      7140        269    26.6 S
435.gromacs        7140        263    27.1 S      7140        269    26.5 *
==========================================================================
435.gromacs        7140        264    27.1 *      7140        269    26.5 *

which means the patched result is even better than before Andreas'
change.  Current trunk homes in at a Run Time of 321s (which is
the regression to fix).

Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?

Thanks,
Richard.

2016-02-05   Richard Biener  

PR rtl-optimization/69274
* ira.c (ira_setup_alts): Do not change recog_data.operand
order.

Index: gcc/ira.c
===
--- gcc/ira.c   (revision 231814)
+++ gcc/ira.c   (working copy)
@@ -1888,10 +1888,11 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
}
   if (commutative < 0)
break;
-  if (curr_swapped)
-   break;
+  /* Swap forth and back to avoid changing recog_data.  */
   std::swap (recog_data.operand[commutative],
 recog_data.operand[commutative + 1]);
+  if (curr_swapped)
+   break;
 }
 }
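The hunk keeps recog_data intact by always performing the second std::swap, so the two swaps cancel out before the loop exits. A minimal standalone C sketch of that control flow; the integer operand array, the callback type and the toy constraint op1_is_reg are hypothetical stand-ins, not IRA's real data structures.

```c
#include <stdbool.h>

/* Hypothetical check standing in for "does some alternative match".  */
typedef bool (*alt_check_fn) (const int *ops);

static void swap_ints (int *a, int *b)
{
  int t = *a;
  *a = *b;
  *b = t;
}

/* Try OPS as given and, if COMMUTATIVE >= 0, with operands COMMUTATIVE
   and COMMUTATIVE + 1 swapped.  Returns how many operand orders CHECK
   accepted.  OPS is back in its original order when this returns.  */
int count_accepting_orders (int *ops, int commutative, alt_check_fn check)
{
  int accepted = 0;
  bool curr_swapped = false;
  for (;;)
    {
      if (check (ops))
        accepted++;
      if (commutative < 0)
        break;
      /* Swap forth and back: on the second pass this undoes the first.  */
      swap_ints (&ops[commutative], &ops[commutative + 1]);
      if (curr_swapped)
        break;
      curr_swapped = true;
    }
  return accepted;
}

/* Toy constraint: accept when operand 1 is a "register" (id < 100).  */
bool op1_is_reg (const int *ops)
{
  return ops[1] < 100;
}
```

Testing curr_swapped only after the swap, as the patch does, is what guarantees the caller never sees the swapped order.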
 


Re: [PATCH] S/390: Correct documentation typos.

2016-02-05 Thread Andreas Krebbel
On 02/01/2016 02:12 PM, Dominik Vogt wrote:
> gcc/ChangeLog
> 
>   * doc/extend.texi: S/390: Correct some typos.
Applied. Thanks!

-Andreas-



Re: [PATCH] Fix up bootstrap on i686 --with-arch=corei7 --with-fpmath=sse (PR bootstrap/69677)

2016-02-05 Thread Uros Bizjak
On Fri, Feb 5, 2016 at 12:04 AM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, the convert_scalars_to_vector hunk is important,
> without that we e.g. miscompile simplify-rtx.c.
> The following patch restores that hunk and extends disabling of TARGET_STV
> also for the 64-bit, but not 128-bit, aligned preferred or incoming stack
> boundaries (also non-default).
>
> Bootstrapped/regtested on i686-linux --with-arch=corei7 --with-tune=corei7
> --with-fpmath=sse.
>
> Alternatively, it is enough to just adjust stack_alignment_estimated in
> there and keep stack_alignment_needed as is, that version has also been
> successfully bootstrapped on i686-linux --with-arch=corei7
> --with-tune=corei7 --with-fpmath=sse.
>
> 2016-02-04  Jakub Jelinek  
>
> PR bootstrap/69677
> * config/i386/i386.c (convert_scalars_to_vector): Readd stack
> alignment fixes.
> (ix86_option_override_internal): Disable TARGET_STV even for
> -m{incoming,preferred}-stack-boundary=3.

Let's go with this patch to resolve the bootstrap problem. As said
earlier, after gcc-6 is released, we can fix the problem in a proper
way.

OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2016-02-04 18:59:38.309204574 +0100
> +++ gcc/config/i386/i386.c  2016-02-04 21:54:02.439904261 +0100
> @@ -3588,6 +3588,16 @@ convert_scalars_to_vector ()
>bitmap_obstack_release (NULL);
>df_process_deferred_rescans ();
>
> +  /* Conversion means we may have 128bit register spills/fills
> + which require aligned stack.  */
> +  if (converted_insns)
> +{
> +  if (crtl->stack_alignment_needed < 128)
> +   crtl->stack_alignment_needed = 128;
> +  if (crtl->stack_alignment_estimated < 128)
> +   crtl->stack_alignment_estimated = 128;
> +}
> +
>return 0;
>  }
> @@ -5443,12 +5453,12 @@ ix86_option_override_internal (bool main
>  opts->x_target_flags |= MASK_VZEROUPPER;
>if (!(opts_set->x_target_flags & MASK_STV))
>  opts->x_target_flags |= MASK_STV;
> -  /* Disable STV if -mpreferred-stack-boundary=2 or
> - -mincoming-stack-boundary=2 - the needed
> +  /* Disable STV if -mpreferred-stack-boundary={2,3} or
> + -mincoming-stack-boundary={2,3} - the needed
>   stack realignment will be extra cost the pass doesn't take into
>   account and the pass can't realign the stack.  */
> -  if (ix86_preferred_stack_boundary < 64
> -  || ix86_incoming_stack_boundary < 64)
> +  if (ix86_preferred_stack_boundary < 128
> +  || ix86_incoming_stack_boundary < 128)
>  opts->x_target_flags &= ~MASK_STV;
>if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]
>&& !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
>
> Jakub
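The option arguments are exponents: -mpreferred-stack-boundary=N and -mincoming-stack-boundary=N request a 2**N byte alignment, while GCC keeps the boundary in bits. That is why the single test "< 128" covers both =2 (32 bits) and =3 (64 bits). A small sketch of the arithmetic; the helper names are hypothetical.

```c
#include <stdbool.h>

/* -m{preferred,incoming}-stack-boundary=N means 2**N bytes; return the
   same boundary in bits (8 bits per byte).  */
unsigned stack_boundary_bits (unsigned n)
{
  return 8u << n;
}

/* Mirrors the patched condition: STV stays enabled only if both the
   preferred and the incoming boundary are at least 128 bits.  */
bool stv_allowed (unsigned preferred_n, unsigned incoming_n)
{
  return stack_boundary_bits (preferred_n) >= 128
         && stack_boundary_bits (incoming_n) >= 128;
}
```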


Re: [PATCH][ARM] PR target/69161: Don't ignore mode when matching comparison operator in cstore-like patterns

2016-02-05 Thread Kyrill Tkachov

Ping
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02309.html

Thanks,
Kyrill

On 29/01/16 14:27, Kyrill Tkachov wrote:

Hi all,

Similar to aarch64, the arm port also suffers from PR target/69161 when combine
tries to propagate a CCmode comparison into a vec_duplicate, creating invalid
RTL that ICEs.
Please refer to the PR and the aarch64 fix for more info.
The fix for arm is very similar.
We define a new predicate identical to arm_comparison_operator but
make it a normal (non-special) predicate so that it gets the usual mode checks.
This prevents combine from matching an intermediate CCmode cstore
(where it's doing an SImode SET of a CCmode source), which it would then
try to propagate into a V4SImode vec_duplicate.

The offending patterns are the cstore patterns, so this patch updates them
to use the new predicate with mode checks. Both arm and thumb patterns
are updated.
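A toy model of the difference between the two kinds of predicate, assuming that a normal predicate gets an implicit test that the operand's mode matches the requested mode (or that the requested mode is VOIDmode). The real generated test has extra cases, e.g. for constants, which are omitted here; all types and names below are hypothetical.

```c
#include <stdbool.h>

/* Toy model, not GCC's real rtx or machine_mode.  */
enum toy_mode { TOY_VOIDmode, TOY_SImode, TOY_CCmode, TOY_V4SImode };
enum toy_code { TOY_EQ, TOY_NE, TOY_PLUS };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
};

static bool is_comparison (const struct toy_rtx *op)
{
  return op->code == TOY_EQ || op->code == TOY_NE;
}

/* "Special" predicate: no implicit mode test, only the body decides.  */
bool special_comparison_p (const struct toy_rtx *op, enum toy_mode mode)
{
  (void) mode;
  return is_comparison (op);
}

/* Normal predicate: implicit mode test wrapped around the same body.  */
bool normal_comparison_p (const struct toy_rtx *op, enum toy_mode mode)
{
  return (mode == TOY_VOIDmode || op->mode == mode) && is_comparison (op);
}
```

In this model a CCmode comparison used where an SImode operator is required passes the special predicate but is rejected by the normal one, which is the behaviour the new arm_comparison_operator_mode predicate is meant to provide.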

There was no codegen difference observed on SPEC2006 for arm.
Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-01-29  Kyrylo Tkachov  

PR target/69161
* config/arm/predicates.md (arm_comparison_operator_mode):
New predicate.
* config/arm/arm.md (*mov_scc): Use arm_comparison_operator_mode
instead of arm_comparison_operator.
(*mov_negscc): Likewise.
(*mov_notscc): Likewise.
* config/arm/thumb2.md (*thumb2_mov_scc): Likewise.
(*thumb2_mov_negscc): Likewise.
(*thumb2_mov_negscc_strict_it): Likewise.
(*thumb2_mov_notscc): Likewise.
(*thumb2_mov_notscc_strict_it): Likewise.




[PATCH][ARM] Tests for arm_restrict_it patterns in thumb2.md

2016-02-05 Thread Kyrill Tkachov

Hi all,

I've been auditing the patterns in the arm backend that were added/modified for
-mrestrict-it and I've come up with a few runtime tests that end up generating those patterns.
This patch contains 4 tests for 4 patterns that have paths specific to -mrestrict-it.

There were some patterns, like thumb2_mov_negscc_strict_it and thumb2_mov_notscc_strict_it,
that I could not generate at all, because the earlier RTL optimisers always generated some
equivalent but different (and at least as good) sequence, so these splitters never matched.
I think we could safely remove them, but not at this stage.

These tests should give us a bit more test coverage of the -mrestrict-it functionality.

Ok for trunk?

Thanks,
Kyrill

2016-02-05  Kyrylo Tkachov  

* gcc.target/arm/cond_sub_restrict_it.c: New test.
* gcc.target/arm/condarith_restrict_it.c: Likewise.
* gcc.target/arm/movcond_restrict_it.c: Likewise.
* gcc.target/arm/negscc_restrict_it.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/arm/cond_sub_restrict_it.c b/gcc/testsuite/gcc.target/arm/cond_sub_restrict_it.c
new file mode 100644
index ..8411643e95782876b71b3beccc699a2e85022e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cond_sub_restrict_it.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mthumb -O -mrestrict-it" }  */
+
+int a;
+
+__attribute__((noinline, noclone)) int
+fn1 (int c, int d)
+{
+  a -= c == d;
+  return a;
+}
+
+int
+main (void)
+{
+  a = 10;
+  if (fn1 (4, 4) != 9)
+    __builtin_abort ();
+
+  a = 5;
+  if (fn1 (3, 4) != 5)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/condarith_restrict_it.c b/gcc/testsuite/gcc.target/arm/condarith_restrict_it.c
new file mode 100644
index ..ad0d15b0ebdb09988fa7ca11abb1e096edf3bcf1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/condarith_restrict_it.c
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mthumb -O2 -mrestrict-it" }  */
+
+__attribute__ ((noinline, noclone)) void
+fn2 ()
+{
+  __builtin_printf ("4");
+}
+
+enum
+{
+  ONE = 1,
+  TWO
+} a;
+
+int b;
+
+__attribute__ ((noinline, noclone)) int
+fn1 ()
+{
+  int c = b == 0;
+  if (a <= ONE)
+    if (b == 0)
+      fn2 ();
+  if (a)
+    if (c)
+      a = 0;
+
+  return a;
+}
+
+int
+main (void)
+{
+  a = ONE;
+  b = 1;
+  if (fn1 () != ONE)
+    __builtin_abort ();
+
+  a = TWO;
+  b = 0;
+  if (fn1 () != 0)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/movcond_restrict_it.c b/gcc/testsuite/gcc.target/arm/movcond_restrict_it.c
new file mode 100644
index ..f1f9cfa351bb8c2732c3dced30d8bc17f0aa573d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/movcond_restrict_it.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mthumb -O3 -mrestrict-it" }  */
+
+int a;
+
+__attribute__ ((noinline, noclone)) int
+fn1 (int c, int d)
+{
+  if (c > d)
+    a = 1;
+  else
+    a = -1;
+  return a;
+}
+
+int
+main (void)
+{
+  if (fn1 (4, 5) != -1)
+    __builtin_abort ();
+
+  if (fn1 (5, 4) != 1)
+    __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/negscc_restrict_it.c b/gcc/testsuite/gcc.target/arm/negscc_restrict_it.c
new file mode 100644
index ..b24c6ece7271dc4cfbb718155841282ff8f63bf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/negscc_restrict_it.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-options "-mthumb -O2 -mrestrict-it" }  */
+
+__attribute__ ((noinline, noclone)) int
+fn1 (int a, int b)
+{
+  return (a == b ? 0 : -1);
+}
+
+int
+main (void)
+{
+  if (fn1 (3, 3) != 0)
+    __builtin_abort ();
+
+  if (fn1 (6, 7) != -1)
+    __builtin_abort ();
+}
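The negscc test's expected values follow from a simple identity: a == b ? 0 : -1 equals -(a != b), i.e. the negated store-flag value that, as far as we can tell, the *mov_negscc-style patterns compute. A portable sketch checking the identity; the function names are hypothetical.

```c
/* The value the negscc test expects from its fn1.  */
int negscc_expected (int a, int b)
{
  return a == b ? 0 : -1;
}

/* The same value written as a negated comparison result: (a != b) is
   0 or 1 in C, so its negation is 0 or -1.  */
int negscc_as_neg_scc (int a, int b)
{
  return -(a != b);
}
```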


Re: [PATCH] libstdc++: S/390: Add missing baseline_symbols.txt for s390x/-m31.

2016-02-05 Thread Andreas Krebbel
On 02/01/2016 05:53 PM, Dominik Vogt wrote:
> The attached patch copies the existing 
> libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
> to .../s390x-linux-gnu/32/baseline_symbols.txt.  This fixes the
> abi test failure on s390x with -m31.
> 
> The patch is gzip'ed because it's large.

Applied.  Thanks!

-Andreas-




Re: [patch, Fortran] Fix PR 60526, variable name has already been declared as a type

2016-02-05 Thread Andre Vehreschild
Hi Thomas,

please note: This is not a review. I don't have the privileges to do so.

To prevent memory clutter, I'd advise using:

char u_name[GFC_MAX_SYMBOL_LEN + 1]; 

and saving us all the dynamic memory allocation/free. Furthermore, how
about switching to:

  strncpy (u_name, name, nlen + 1);
  u_name[0] = TOUPPER (u_name[0]);

That way strncpy() can use its full power and copy aligned data using
longs, vector instructions or whatever. At least it has the potential.
With (u_)name+1 we always have an odd start address and I doubt
strncpy can use an optimized copy algorithm. I know that now we copy
one byte twice, but that shouldn't really matter.
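A minimal standalone sketch of the suggested shape: a fixed-size buffer instead of heap allocation, one strncpy of the whole name including the terminator, then an in-place uppercase of the first byte. TOY_MAX_SYMBOL_LEN = 63 and the helper name are assumptions standing in for GFC_MAX_SYMBOL_LEN and the real decl.c code; plain toupper replaces GCC's TOUPPER macro.

```c
#include <ctype.h>
#include <string.h>

/* Assumed stand-in for GFC_MAX_SYMBOL_LEN; value chosen for illustration.  */
#define TOY_MAX_SYMBOL_LEN 63

/* Write the capitalized symtree name for type NAME into U_NAME, which has
   room for TOY_MAX_SYMBOL_LEN + 1 bytes.  Returns 0 on success, -1 if
   NAME is too long to fit.  */
int capitalize_type_name (char u_name[TOY_MAX_SYMBOL_LEN + 1],
                          const char *name)
{
  size_t nlen = strlen (name);
  if (nlen > TOY_MAX_SYMBOL_LEN)
    return -1;
  /* Copy the whole name, terminator included, from the start of both
     buffers ...  */
  strncpy (u_name, name, nlen + 1);
  /* ... then fix up the first byte, which has been copied twice.  */
  u_name[0] = (char) toupper ((unsigned char) u_name[0]);
  return 0;
}
```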

Besides my ideas above the patch and test looks ok to me (I didn't do a
regtest though).

Regards,
Andre

On Thu, 4 Feb 2016 23:03:13 +0100
Thomas Koenig  wrote:

> Hello world,
> 
> For a type 'foo', we use a symtree 'Foo'. This led to accept-invalid
> when a variable name was already declared as a type.  This rather
> self-explanatory patch fixes that.
> 
> Regression-tested.  OK for trunk and 5?  (Do we still care about 4.9?)
> 
> Regards
> 
>   Thomas
> 
> 
> 2016-02-03  Thomas Koenig  
> 
>  PR fortran/60526
>  * decl.c (build_sym):  If the name has already been defined as a
>  type, issue error and return false.
> 
> 2016-02-03  Thomas Koenig  
> 
>  PR fortran/60526
>  * gfortran.dg/type_decl_4.f90:  New test.


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] S/390: Do not require -march=z13 on s390 but only on s390x.

2016-02-05 Thread Andreas Krebbel
On 02/04/2016 01:28 PM, Dominik Vogt wrote:
> gcc/testsuite/ChangeLog
> 
>   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove -march=z13 for s390 (only
>   necessary on s390x).

Applied. Thanks!

-Andreas-



Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-05 Thread Eric Botcazou
> True, but the compiler bug is in the backends - it just now gets
> exposed more easily (read: w/o user intervention).

As far as I can see there are no problematic types in the source code, i.e. 
it's plain ANSI C, so we shouldn't be creating types with non-canonical 
alignment from that.

> Doesn't "fix" the same issue popping up in other passes.

Which pass does the alignment promotion?  Maybe it's the one to be fixed.

-- 
Eric Botcazou


Re: [PATCH] S/390: Fix r6 vararg handling.

2016-02-05 Thread Jakub Jelinek
On Fri, Feb 05, 2016 at 12:43:38PM +0100, Andreas Krebbel wrote:
> Dominik just made me aware of this stupid copy-and-paste bug which left me
> with the very same loops twice :(
> I've committed the attached patch to fix this:
> 
> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
> index 1667c11..2cf7096 100644
> --- a/gcc/config/s390/s390.c
> +++ b/gcc/config/s390/s390.c
> @@ -9326,10 +9326,6 @@ s390_register_info_set_ranges ()
>for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
>cfun_frame_layout.first_restore_gpr = (i == 16) ? -1 : i;
>cfun_frame_layout.last_restore_gpr = (i == 16) ? -1 : j;
> -
> -  /* Now the range of GPRs which need saving.  */
> -  for (i = 0; i < 16 && cfun_gpr_save_slot (i) != SAVE_SLOT_STACK; i++);
> -  for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
>cfun_frame_layout.first_save_gpr = (i == 16) ? -1 : i;
>cfun_frame_layout.last_save_gpr = (i == 16) ? -1 : j;
>  }

Thus
  cfun_frame_layout.first_save_gpr = cfun_frame_layout.first_restore_gpr;
  cfun_frame_layout.last_save_gpr = cfun_frame_layout.last_restore_gpr;
?  Are those supposed to be equivalent just here, or everywhere?

Jakub


Use plain -fopenacc to enable OpenACC kernels processing (was: [PATCH, 6/16] Add pass_oacc_kernels)

2016-02-05 Thread Thomas Schwinge
Hi!

On Mon, 9 Nov 2015 18:39:19 +0100, Tom de Vries  wrote:
> On 09/11/15 16:35, Tom de Vries wrote:
> > this patch series for stage1 trunk adds support to:
> > - parallelize oacc kernels regions using parloops, and
> > - map the loops onto the oacc gang dimension.

> Atm, the parallelization behaviour for the kernels region is controlled 
> by flag_tree_parallelize_loops, which is also used to control generic 
> auto-parallelization by autopar using omp. That is not ideal, and we may 
> want a separate flag (or param) to control the behaviour for oacc 
> kernels, e.g. -foacc-kernels-gang-parallelize=. I'm open to suggestions.

I suggest to use plain -fopenacc to enable OpenACC kernels processing
(which just makes sense, I hope) ;-) and have later processing stages
determine the actual parametrization (currently: number of gangs) (that
is, Nathan's recent "Default compute dimensions" patches).
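Put differently, the proposal decouples the two instances of the parloops pass. A small C model of the resulting gate and thread-count choice, mirroring the gate and parallelize_loops hunks in this patch; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the two relevant command-line settings.  */
struct toy_flags
{
  bool openacc;               /* -fopenacc */
  unsigned parallelize_loops; /* -ftree-parallelize-loops=N */
};

/* Gate: the OpenACC kernels instance follows -fopenacc alone; the
   autopar instance keeps the old "N > 1" test.  */
bool toy_gate (const struct toy_flags *f, bool oacc_kernels_p)
{
  if (oacc_kernels_p)
    return f->openacc;
  return f->parallelize_loops > 1;
}

/* Thread count handed to the parallelizer; 0 means "determine later".  */
unsigned toy_n_threads (const struct toy_flags *f, bool oacc_kernels_p)
{
  return oacc_kernels_p ? 0 : f->parallelize_loops;
}
```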

The code changes are simple enough; OK for trunk?  (This patch depends on
my 'Un-parallelized OpenACC kernels constructs with nvptx offloading:
"avoid offloading"' pending review,
.)

Originally, I wanted to use:

    OMP_CLAUSE_NUM_GANGS_EXPR (clause)
      = build_int_cst (integer_type_node, n_threads == 0 ? -1 : n_threads);

... to store -1 "have the compiler decide" (instead of now 0 "have the
run-time decide", which might prevent some code optimizations, as I
understand it) for the n_threads == 0 case, but it seems that for an
offloaded OpenACC kernels region, gcc/omp-low.c:oacc_validate_dims is
called with the parameter "used" set to 0 instead of "gang", and then the
"Default anything left to 1 or a partitioned default" logic will default
dims["gang"] to oacc_min_dims["gang"] (that is, 1) instead of the
oacc_default_dims["gang"] (that is, 32).  Nathan, does that smell like a
bug (and could you look into that)?

diff --git gcc/tree-parloops.c gcc/tree-parloops.c
index 139e38c..e498e5b 100644
--- gcc/tree-parloops.c
+++ gcc/tree-parloops.c
@@ -2016,7 +2016,8 @@ transform_to_exit_first_loop (struct loop *loop,
 /* Create the parallel constructs for LOOP as described in gen_parallel_loop.
LOOP_FN and DATA are the arguments of GIMPLE_OMP_PARALLEL.
NEW_DATA is the variable that should be initialized from the argument
-   of LOOP_FN.  N_THREADS is the requested number of threads.  */
+   of LOOP_FN.  N_THREADS is the requested number of threads, which can be 0 if
+   that number is to be determined later.  */
 
 static void
 create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
@@ -2049,6 +2050,7 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   basic_block paral_bb = single_pred (bb);
   gsi = gsi_last_bb (paral_bb);
 
+  gcc_checking_assert (n_threads != 0);
   t = build_omp_clause (loc, OMP_CLAUSE_NUM_THREADS);
   OMP_CLAUSE_NUM_THREADS_EXPR (t)
= build_int_cst (integer_type_node, n_threads);
@@ -2221,7 +2223,8 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
 }
 
 /* Generates code to execute the iterations of LOOP in N_THREADS
-   threads in parallel.
+   threads in parallel, which can be 0 if that number is to be determined
+   later.
 
NITER describes number of iterations of LOOP.
REDUCTION_LIST describes the reductions existent in the LOOP.  */
@@ -2318,6 +2321,7 @@ gen_parallel_loop (struct loop *loop,
   else
m_p_thread=MIN_PER_THREAD;
 
+  gcc_checking_assert (n_threads != 0);
   many_iterations_cond =
fold_build2 (GE_EXPR, boolean_type_node,
 nit, build_int_cst (type, m_p_thread * n_threads));
@@ -3177,7 +3181,7 @@ oacc_entry_exit_ok (struct loop *loop,
 static bool
 parallelize_loops (bool oacc_kernels_p)
 {
-  unsigned n_threads = flag_tree_parallelize_loops;
+  unsigned n_threads;
   bool changed = false;
   struct loop *loop;
   struct loop *skip_loop = NULL;
@@ -3199,6 +3203,13 @@ parallelize_loops (bool oacc_kernels_p)
   if (cfun->has_nonlocal_label)
 return false;
 
+  /* For OpenACC kernels, n_threads will be determined later; otherwise, it's
+ the argument to -ftree-parallelize-loops.  */
+  if (oacc_kernels_p)
+n_threads = 0;
+  else
+n_threads = flag_tree_parallelize_loops;
+
   gcc_obstack_init (_obstack);
   reduction_info_table_type reduction_list (10);
 
@@ -3361,7 +3372,13 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual bool gate (function *) { return flag_tree_parallelize_loops > 1; }
+  virtual bool gate (function *)
+  {
+if (oacc_kernels_p)
+  return flag_openacc;
+else
+  return flag_tree_parallelize_loops > 1;
+  }
   virtual unsigned int execute (function *);
   opt_pass * clone () { return new pass_parallelize_loops (m_ctxt); }
   void set_pass_param (unsigned int n, bool param)
diff --git gcc/tree-ssa-loop.c gcc/tree-ssa-loop.c
index bdbade5..4c39fbc 100644
--- 

Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Jakub Jelinek
On Fri, Feb 05, 2016 at 01:10:26PM +0100, Richard Biener wrote:
> Otherwise bootstrap / testing went ok and a full SPEC 2k6 run doesn't
> show any regressions.

Any improvements there?

Jakub


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Jakub Jelinek
On Fri, Feb 05, 2016 at 01:35:03PM +0100, Richard Biener wrote:
> On Fri, 5 Feb 2016, Jakub Jelinek wrote:
> 
> > On Fri, Feb 05, 2016 at 01:10:26PM +0100, Richard Biener wrote:
> > > Otherwise bootstrap / testing went ok and a full SPEC 2k6 run doesn't
> > > show any regressions.
> > 
> > Any improvements there?
> 
> Noise, but I only did 1 run (I did 3 only for 435.gromacs to confirm
> the progress over pre-Andreas-patch state which I would otherwise
> have classified as noise as well).

And on that single run gromacs was non-noise?

> I can do a 3-run over the weekend if you think that's useful.  OTOH
> checking the patch in will get more benchmark/flag/HW coverage
> from our auto-testers (which also only do 1-runs of course and will
> see unrelated changes as well).

Maybe that is good enough.  Anyway, the patch is Vlad's area of expertise...

Jakub


Re: [PATCH] Fix c/69643, named address space wrong-code

2016-02-05 Thread Richard Biener
On Thu, Feb 4, 2016 at 11:35 PM, Richard Henderson  wrote:
> On 02/05/2016 08:59 AM, Richard Biener wrote:
>>>
>>> This version fails to fall through to the next code block when
>>>(1) Both types are pointers,
>>>(2) Both types have the same address space,
>>> which will do the wrong thing when
>>>(3) The pointers have different modes.
>>>
>>> Recall that several ports allow multiple modes for pointers.
>>
>>
>> Oh, I thought they would be different address spaces.
>
>
> They probably should be.
>
>> So we'd need to add a mode check as well.
>
>
> Yes.  But why make this one expression so complicated that it's hard to
> read, as opposed to letting the existing code that checks modes check the mode?

Works for me.

>> I hope we don't have different type bit-precision with the same mode for
>> pointers here?
>
>
> Not that I'm aware.  ;-)
>
>
> r~
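Richard Henderson's point can be illustrated with a toy decision function: matching address spaces alone are not enough to treat a pointer conversion as a no-op, because the pointer modes may still differ on ports that allow several pointer modes. The struct and function below are hypothetical and only model the decision, not the actual c/69643 code.

```c
#include <stdbool.h>

/* Toy description of a pointer type: its named address space and the
   width in bits of the machine mode used to hold it.  */
struct toy_ptr_type
{
  unsigned addr_space;
  unsigned mode_bits;
};

/* A conversion between two pointer types is only a no-op if BOTH the
   address space and the pointer mode agree.  */
bool toy_ptr_conversion_useless_p (struct toy_ptr_type outer,
                                   struct toy_ptr_type inner)
{
  if (outer.addr_space != inner.addr_space)
    return false;
  /* Same address space: still fall through to the mode check.  */
  return outer.mode_bits == inner.mode_bits;
}
```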


Re: [PATCH] Fix PR c++/68948 (wrong code generation due to invalid constructor call)

2016-02-05 Thread Patrick Palka

On Thu, 4 Feb 2016, Patrick Palka wrote:


The compiler correctly detects and diagnoses invalid constructor calls
such as C::C () in a non-template context but it fails to do so while
processing a class template.  [ Section 3.4.3.1 of the standard is what
makes these forms of constructor calls illegal -- see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68948#c9  ]

In a non-template context this diagnosis would take place in
build_new_method_call, called from finish_call_expr, but while
processing a class template we may exit early out of finish_call_expr
and never call build_new_method_call.

Thus we never diagnose this invalid constructor call during template
processing.  So then during instantiation of the enclosing template we
call tsubst_baselink on this constructor call, during which the call to
lookup_fnfields returns NULL (because it finds the injected class type C
not the constructor C).  Because the lookup failed, tsubst_baselink
returns error_mark_node and this node persists all the way through to
gimplification where it silently gets discarded.

This patch fixes this problem by diagnosing these invalid constructor
calls in tsubst_baselink.  Alternatively, we can rewire finish_call_expr
to avoid exiting early while processing a template if the call in question
is a constructor call.  I'm not sure which approach is better.  This
approach seems more conservative, since it's just attaching an error
message to an existing error path.


And here is the other approach, which rewires finish_call_expr:

-- >8 --

gcc/cp/ChangeLog:

PR c++/68948
* semantics.c (finish_call_expr): Don't assume a constructor
call is dependent if the "this" pointer is dependent.

gcc/testsuite/ChangeLog:

PR c++/68948
* g++.dg/template/pr68948.C: New test.
---
 gcc/cp/semantics.c  | 14 +--
 gcc/testsuite/g++.dg/template/pr68948.C | 41 +
 2 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/pr68948.C

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 95c4f19..31c03ae 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -2270,6 +2270,7 @@ finish_call_expr (tree fn, vec **args, bool disallow_virtual,
 related to CWG issues 515 and 1005.  */
  || (TREE_CODE (fn) != COMPONENT_REF
  && non_static_member_function_p (fn)
+ && !DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (get_first_fn (fn))
  && current_class_ref
  && type_dependent_expression_p (current_class_ref)))
{
@@ -2348,8 +2349,17 @@ finish_call_expr (tree fn, vec **args, bool disallow_virtual,
[class.access.base] says that we need to convert 'this' to B* as
part of the access, so we pass 'B' to maybe_dummy_object.  */

-  object = maybe_dummy_object (BINFO_TYPE (BASELINK_ACCESS_BINFO (fn)),
-  NULL);
+  if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (get_first_fn (fn)))
+   {
+ /* A constructor call always uses a dummy object.  (This constructor
+call which has the form A::A () is actually invalid and we are
+going to reject it later in build_new_method_call.)  */
+ object = build_dummy_object (BINFO_TYPE (BASELINK_ACCESS_BINFO (fn)));
+ gcc_assert (!type_dependent_expression_p (object));
+   }
+  else
+   object = maybe_dummy_object (BINFO_TYPE (BASELINK_ACCESS_BINFO (fn)),
+NULL);

   if (processing_template_decl)
{
diff --git a/gcc/testsuite/g++.dg/template/pr68948.C b/gcc/testsuite/g++.dg/template/pr68948.C
new file mode 100644
index 000..3cb6844
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/pr68948.C
@@ -0,0 +1,41 @@
+// PR c++/68948
+
+struct B { B (); B (int); };
+
+struct Time : B { };
+
+/* Here, A and B are unrelated types.  */
+
+template 
+struct A
+{
+  void TestBody ()
+  {
+B::B (); // { dg-error "cannot call constructor .B::B." }
+B::B::B (); // { dg-error "cannot call constructor .B::B." }
+B::B (0); // { dg-error "cannot call constructor .B::B." }
+  }
+};
+
+/* Here, C is (indirectly) derived from B.  */
+
+template 
+struct C : Time
+{
+  void TestBody ()
+  {
+B::B (); // { dg-error "cannot call constructor .B::B." }
+B::B::B (); // { dg-error "cannot call constructor .B::B." }
+B::B (0); // { dg-error "cannot call constructor .B::B." }
+Time::B (0);
+  }
+};
+
+int
+main (void)
+{
+  A a;
+  C c;
+  a.TestBody ();
+  c.TestBody ();
+}
--
2.7.0.303.g36d4cae



Re: [PATCH] S/390: Fix r6 vararg handling.

2016-02-05 Thread Andreas Krebbel
On 02/05/2016 12:50 PM, Jakub Jelinek wrote:
> On Fri, Feb 05, 2016 at 12:43:38PM +0100, Andreas Krebbel wrote:
>> Dominik just made me aware of this stupid copy-and-paste bug which left me
>> with the very same loops twice :(
>> I've committed the attached patch to fix this:
>>
>> diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
>> index 1667c11..2cf7096 100644
>> --- a/gcc/config/s390/s390.c
>> +++ b/gcc/config/s390/s390.c
>> @@ -9326,10 +9326,6 @@ s390_register_info_set_ranges ()
>>for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
>>cfun_frame_layout.first_restore_gpr = (i == 16) ? -1 : i;
>>cfun_frame_layout.last_restore_gpr = (i == 16) ? -1 : j;
>> -
>> -  /* Now the range of GPRs which need saving.  */
>> -  for (i = 0; i < 16 && cfun_gpr_save_slot (i) != SAVE_SLOT_STACK; i++);
>> -  for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
>>cfun_frame_layout.first_save_gpr = (i == 16) ? -1 : i;
>>cfun_frame_layout.last_save_gpr = (i == 16) ? -1 : j;
>>  }
> 
> Thus
>   cfun_frame_layout.first_save_gpr = cfun_frame_layout.first_restore_gpr;
>   cfun_frame_layout.last_save_gpr = cfun_frame_layout.last_restore_gpr;
erm, right

> ?  Are those supposed to be equivalent just here, or everywhere?
They will differ if vararg is being used.

-Andreas-
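Andreas' answer can be illustrated with a toy version of the range scan in s390_register_info_set_ranges: both scans read the same slot state, so computing the save range right after the restore range gives identical results, and the two only diverge once additional vararg registers are marked for the stack. The slot enum, the array and the choice of r2 to r5 as vararg registers are assumptions for illustration.

```c
/* Toy model of cfun_gpr_save_slot: only "on the stack" matters here.  */
enum { TOY_NUM_GPRS = 16 };
enum toy_slot { TOY_SLOT_NONE, TOY_SLOT_STACK };

/* Find the first and last GPR whose save slot is on the stack, using the
   same two scans as the s390 code; sets both to -1 if there is none.  */
void toy_stack_range (const enum toy_slot slots[TOY_NUM_GPRS],
                      int *first, int *last)
{
  int i, j;
  for (i = 0; i < TOY_NUM_GPRS && slots[i] != TOY_SLOT_STACK; i++)
    ;
  for (j = TOY_NUM_GPRS - 1; j > i && slots[j] != TOY_SLOT_STACK; j--)
    ;
  *first = (i == TOY_NUM_GPRS) ? -1 : i;
  *last = (i == TOY_NUM_GPRS) ? -1 : j;
}
```

Calling toy_stack_range twice on unchanged slots is exactly the duplicated-loop situation; marking r2..r5 in between widens only the second (save) range.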



Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Bernd Schmidt

On 02/05/2016 01:10 PM, Richard Biener wrote:

It fails

FAIL: gcc.target/i386/addr-sel-1.c scan-assembler b+1

on i?86 (or x86_64 -m32) though, generating

f:
.LFB0:
 .cfi_startproc
 movl    4(%esp), %eax
 leal    1(%eax), %edx
 movsbl  a+1(%eax), %eax
 movsbl  b(%edx), %edx
 addl    %edx, %eax
 ret


Well, it looks like the first movsbl load clobbers the potentially 
better base register, so trivial propagation doesn't work.


It might be another case where allowing 2->2 in combine would help. Or 
enabling -frename-registers and rerunning reload_combine afterwards.



Bernd


[Committed] S/390: Fix 64 bit sibcall

2016-02-05 Thread Andreas Krebbel
This fixes a problem revealed during the split-stack work:
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00322.html

Bootstrapped on s390x. No regressions.

Committed to mainline. Needs to be committed to 4.9.x and 5.x as well.

Bye,

-Andreas-

gcc/ChangeLog:

2016-02-05  Andreas Krebbel  

* config/s390/s390.c (s390_emit_call): Add missing 64 bit check.
---
 gcc/config/s390/s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 3be64de..c8f66c6 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -12470,7 +12470,7 @@ s390_emit_call (rtx addr_location, rtx tls_call, rtx result_reg,
  replace the symbol itself with the PLT stub.  */
   if (flag_pic && !SYMBOL_REF_LOCAL_P (addr_location))
 {
- if (retaddr_reg != NULL_RTX)
+ if (TARGET_64BIT || retaddr_reg != NULL_RTX)
{
  addr_location = gen_rtx_UNSPEC (Pmode,
  gen_rtvec (1, addr_location),
-- 
1.9.1



Re: [PATCH] S/390: Fix r6 vararg handling.

2016-02-05 Thread Andreas Krebbel
On 02/04/2016 05:58 PM, Andreas Krebbel wrote:
> +static void
> +s390_register_info_set_ranges ()
> +{
> +  int i, j;
> +
> +  /* Find the first and the last save slot supposed to use the stack
> + to set the restore range.
> + Vararg regs might be marked as save to stack but only the
> + call-saved regs really need restoring (i.e. r6).  This code
> + assumes that the vararg regs have not yet been recorded in
> + cfun_gpr_save_slot.  */
> +  for (i = 0; i < 16 && cfun_gpr_save_slot (i) != SAVE_SLOT_STACK; i++);
> +  for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
> +  cfun_frame_layout.first_restore_gpr = (i == 16) ? -1 : i;
> +  cfun_frame_layout.last_restore_gpr = (i == 16) ? -1 : j;
> +
> +  /* Now the range of GPRs which need saving.  */
> +  for (i = 0; i < 16 && cfun_gpr_save_slot (i) != SAVE_SLOT_STACK; i++);
> +  for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
> +  cfun_frame_layout.first_save_gpr = (i == 16) ? -1 : i;
> +  cfun_frame_layout.last_save_gpr = (i == 16) ? -1 : j;

Dominik just made me aware of this stupid copy-and-paste bug which left me
with the very same loops twice :(
I've committed the attached patch to fix this:

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1667c11..2cf7096 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -9326,10 +9326,6 @@ s390_register_info_set_ranges ()
   for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
   cfun_frame_layout.first_restore_gpr = (i == 16) ? -1 : i;
   cfun_frame_layout.last_restore_gpr = (i == 16) ? -1 : j;
-
-  /* Now the range of GPRs which need saving.  */
-  for (i = 0; i < 16 && cfun_gpr_save_slot (i) != SAVE_SLOT_STACK; i++);
-  for (j = 15; j > i && cfun_gpr_save_slot (j) != SAVE_SLOT_STACK; j--);
   cfun_frame_layout.first_save_gpr = (i == 16) ? -1 : i;
   cfun_frame_layout.last_save_gpr = (i == 16) ? -1 : j;
 }

Bye,

-Andreas-



Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener
On Fri, 5 Feb 2016, Richard Biener wrote:

> 
> The following patch fixes the performance regression for 435.gromacs
> on x86_64 with AVX2 (Haswell or bdver2) caused by
> 
> 2015-12-18  Andreas Krebbel  
> 
>   * ira.c (ira_setup_alts): Move the scan for commutative modifier
>   to the first loop to make it work even with disabled alternatives.
> 
> which in itself is a desirable change giving the RA more freedom.
> 
> It turns out the fix makes an existing issue more severe in detecting
> more swappable alternatives and thus exiting ira_setup_alts with
> operands swapped in recog_data.  This seems to give a slight preference
> to choose alternatives with the operands swapped (I didn't try to
> investigate how IRA handles the "merged" alternative mask and
> operand swapping in its further processing).
> 
> Of course, previous RTL optimizers and canonicalization rules, as well
> as backend patterns, are tuned towards the non-swapped variant, and thus
> doing more swaps ends up producing slower code (I didn't investigate
> closely).
> 
> So I tested the following patch which simply makes sure that
> ira_setup_alts does not alter recog_data.
> 
> On an Intel Haswell machine I get (base is with the patch, peak is with
> the above change reverted):
> 
>                   Estimated                    Estimated
>                 Base     Base    Base     Peak     Peak    Peak
> Benchmarks      Ref.   Run Time  Ratio    Ref.   Run Time  Ratio
> -------------- ------  -------- ------   ------  -------- ------
> 435.gromacs      7140      264   27.1 S    7140      270   26.5 S
> 435.gromacs      7140      264   27.1 *    7140      269   26.6 S
> 435.gromacs      7140      263   27.1 S    7140      269   26.5 *
> ==============================================================
> 435.gromacs      7140      264   27.1 *    7140      269   26.5 *
> 
> which means the patched result is even better than before Andreas
> change.  Current trunk homes in at a Run Time of 321s (which is
> the regression to fix).
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu, ok for trunk?

It fails

FAIL: gcc.target/i386/addr-sel-1.c scan-assembler b+1

on i?86 (or x86_64 -m32) though, generating

f:
.LFB0:
.cfi_startproc
movl    4(%esp), %eax
leal    1(%eax), %edx
movsbl  a+1(%eax), %eax
movsbl  b(%edx), %edx
addl    %edx, %eax
ret

before IRA we have

(insn 6 3 8 2 (parallel [
(set (reg:SI 87 [ _2 ])
(plus:SI (reg/v:SI 93 [ i ])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
]) /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.target/i386/addr-sel-1.c:13 218 {*addsi_1}
 (expr_list:REG_DEAD (reg/v:SI 93 [ i ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 8 6 10 2 (set (reg:SI 96)
(sign_extend:SI (mem/j:QI (plus:SI (reg:SI 87 [ _2 ])
(symbol_ref:SI ("a") [flags 0x2]  )) [0 a S1 A8]))) /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.target/i386/addr-sel-1.c:13 151 {extendqisi2}
 (nil))
(insn 10 8 11 2 (set (reg:SI 98)
(sign_extend:SI (mem/j:QI (plus:SI (reg:SI 87 [ _2 ])
(symbol_ref:SI ("b") [flags 0x2]  )) [0 b S1 A8]))) /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.target/i386/addr-sel-1.c:13 151 {extendqisi2}
 (expr_list:REG_DEAD (reg:SI 87 [ _2 ])

which makes me wonder why, if a single insn can do this, we didn't
combine/forwprop to it before RA.  Probably a non-single-use issue
(though I thought forwprop doesn't have this limitation).

Without the patch it is postreload that manages to combine them.
After the patch we allocate dx so that

(insn 10 8 11 2 (set (reg:SI 1 dx [98])
(sign_extend:SI (mem/j:QI (plus:SI (reg:SI 1 dx [orig:87 _2 ] [87])
(symbol_ref:SI ("b") [flags 0x2]  )) [0 b S1 A8]))) /space/rguenther/src/svn/trunk3/gcc/testsuite/gcc.target/i386/addr-sel-1.c:13 151 {extendqisi2}
 (nil))

which seems to confuse that transform enough to not carry it out
(reload_combine).  Hopefully somebody else can help here in a more
efficient way than I could.

IMHO it shouldn't block the patch if the patch itself looks good (I think
it's sound).  I'll have a look at the reload_combine issue early
next week if nobody beats me to it.  PR69689.

Otherwise bootstrap / testing went ok and a full SPEC 2k6 run doesn't
show any regressions.

Richard.

> Thanks,
> Richard.
> 
> 2016-02-05   Richard Biener  
> 
>   PR rtl-optimization/69274
>   * ira.c (ira_setup_alts): Do not change recog_data.operand
>   order.
> 
> Index: gcc/ira.c
> ===
> --- gcc/ira.c (revision 231814)
> +++ gcc/ira.c (working copy)
> @@ -1888,10 

Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener
On Fri, 5 Feb 2016, Jakub Jelinek wrote:

> On Fri, Feb 05, 2016 at 01:10:26PM +0100, Richard Biener wrote:
> > Otherwise bootstrap / testing went ok and a full SPEC 2k6 run doesn't
> > show any regressions.
> 
> Any improvements there?

Noise, but I only did 1 run (I did 3 only for 435.gromacs to confirm
the progress over the pre-Andreas-patch state, which I would otherwise
have classified as noise as well).

I can do a 3-run over the weekend if you think that's useful.  OTOH
checking the patch in will get more benchmark/flag/HW coverage
from our auto-testers (which also only do 1-runs of course and will
see unrelated changes as well).

Richard.


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener
On Fri, 5 Feb 2016, Bernd Schmidt wrote:

> On 02/05/2016 01:10 PM, Richard Biener wrote:
> > It fails
> > 
> > FAIL: gcc.target/i386/addr-sel-1.c scan-assembler b+1
> > 
> > on i?86 (or x86_64 -m32) though, generating
> > 
> > f:
> > .LFB0:
> >  .cfi_startproc
> >  movl    4(%esp), %eax
> >  leal    1(%eax), %edx
> >  movsbl  a+1(%eax), %eax
> >  movsbl  b(%edx), %edx
> >  addl    %edx, %eax
> >  ret
> 
> Well, it looks like the first movsbl load clobbers the potentially better base
> register, so trivial propagation doesn't work.
> 
> It might be another case where allowing 2->2 in combine would help. Or
> enabling -frename-registers and rerunning reload_combine afterwards.

Before postreload we have

(insn 6 21 8 2 (parallel [
(set (reg:SI 1 dx [orig:87 _2 ] [87])
(plus:SI (reg:SI 0 ax [99])
(const_int 1 [0x1])))
(clobber (reg:CC 17 flags))
])
(insn 8 6 10 2 (set (reg:SI 0 ax [96])
(sign_extend:SI (mem/j:QI (plus:SI (reg:SI 1 dx [orig:87 _2 ] [87])
(symbol_ref:SI ("a") [flags 0x2]  )) [0 a S1 A8])))
(insn 10 8 11 2 (set (reg:SI 1 dx [98])
(sign_extend:SI (mem/j:QI (plus:SI (reg:SI 1 dx [orig:87 _2 ] [87])
(symbol_ref:SI ("b") [flags 0x2]  )) [0 b S1 A8])))

so indeed the issue is not dx dying in insn 10 but ax dying in insn 8...

Maybe LRA can prefer to not do that if enough free registers are
available (that is, never re-use a register)?

With the above observation it seems less likely we can fix this
regression.  Should I continue looking for a less invasive fix, such as
also computing 'commutative' as it was before Andreas' patch and swapping
back only if the two disagree (meaning Andreas' patch introduced extra
swapping in recog_data)?

The variant below fixes the testcase but I have to check if it
also fixes the 416.gamess regression (I would guess so).

The patch is of course quite arbitrary, basically reverting the
operand-swapping part of Andreas' patch.

Thanks,
Richard.

Index: gcc/ira.c
===
--- gcc/ira.c   (revision 233172)
+++ gcc/ira.c   (working copy)
@@ -1774,7 +1774,7 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
   int nop, nalt;
   bool curr_swapped;
   const char *p;
-  int commutative = -1;
+  int commutative = -1, alt_commutative = -1;
 
   extract_insn (insn);
   alternative_mask preferred = get_preferred_alternatives (insn);
@@ -1838,6 +1838,8 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
  
  case '%':
/* The commutative modifier is handled above.  */
+   if (alt_commutative < 0)
+ alt_commutative = nop;
break;
 
  case '0':  case '1':  case '2':  case '3':  case '4':
@@ -1889,10 +1891,13 @@ ira_setup_alts (rtx_insn *insn, HARD_REG
}
   if (commutative < 0)
break;
+  /* Swap forth and back to avoid changing recog_data.  */
+  if (! curr_swapped
+ || alt_commutative < 0)
+   std::swap (recog_data.operand[commutative],
+  recog_data.operand[commutative + 1]);
   if (curr_swapped)
break;
-  std::swap (recog_data.operand[commutative],
-recog_data.operand[commutative + 1]);
 }
 }
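The "swap forth and back" idea in the patch above can be illustrated in isolation. This is a minimal, hypothetical C model, not IRA code: a plain int array stands in for recog_data.operand, and a caller-supplied predicate (match_fn, an assumed name) stands in for constraint matching. Both operand orders are tried, and the array is always restored before returning, so callers never observe the swapped order.

```c
#include <stdbool.h>

/* Hypothetical predicate type standing in for constraint matching.  */
typedef bool (*match_fn) (const int *ops);

/* Exchange ops[i] and ops[i + 1].  */
static void
swap_pair (int *ops, int i)
{
  int tmp = ops[i];
  ops[i] = ops[i + 1];
  ops[i + 1] = tmp;
}

/* Return a mask: bit 0 is set if MATCH accepts the operands in their
   original order, bit 1 if it accepts them with operands COMMUTATIVE and
   COMMUTATIVE + 1 exchanged.  A negative COMMUTATIVE means there is no
   commutative pair.  OPS is back in its original order on return.  */
static unsigned
try_both_orders (int *ops, int commutative, match_fn match)
{
  unsigned mask = 0;

  if (match (ops))
    mask |= 1u;
  if (commutative >= 0)
    {
      swap_pair (ops, commutative);   /* Swap forth...  */
      if (match (ops))
        mask |= 2u;
      swap_pair (ops, commutative);   /* ...and back.  */
    }
  return mask;
}

/* Example predicate: accept only if the first operand is smaller.  */
static bool
first_less_than_second (const int *ops)
{
  return ops[0] < ops[1];
}
```

With ops = {5, 3} and commutative = 0, only the swapped order matches, so the mask is 2 and ops still reads {5, 3} afterwards.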
 


Re: [PATCH] Fix PR c++/68948 (wrong code generation due to invalid constructor call)

2016-02-05 Thread Jason Merrill

On 02/05/2016 07:54 AM, Patrick Palka wrote:

On Thu, 4 Feb 2016, Patrick Palka wrote:


The compiler correctly detects and diagnoses invalid constructor calls
such as C::C () in a non-template context but it fails to do so while
processing a class template.  [ Section 3.4.3.1 of the standard is what
makes these forms of constructor calls illegal -- see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68948#c9  ]

In a non-template context this diagnosis would take place in
build_new_method_call, called from finish_call_expr, but while
processing a class template we may exit early out of finish_call_expr
and never call build_new_method_call.

Thus we never diagnose this invalid constructor call during template
processing.  So then during instantiation of the enclosing template we
call tsubst_baselink on this constructor call, during which the call to
lookup_fnfields returns NULL (because it finds the injected class type C
not the constructor C).  Because the lookup failed, tsubst_baselink
returns error_mark_node and this node persists all the way through to
gimplification where it silently gets discarded.

This patch fixes this problem by diagnosing these invalid constructor
calls in tsubst_baselink.  Alternatively, we can rewire finish_call_expr
to avoid exiting early while processing a template if the call in question
is a constructor call.  I'm not sure which approach is better.  This
approach seems more conservative, since it's just attaching an error
message to an existing error path.


And here is the other approach, which rewires finish_call_expr:


I like the second approach better, but you're right that the first is 
more conservative, so let's go with the first for GCC 6 and switch to 
the second for GCC 7.


Jason



Re: [PATCH] [ARC] Add single/double IEEE precision FPU support.

2016-02-05 Thread Joern Wolfgang Rennecke
P.S.: if code that is missing prototypes for stdarg functions is of no
concern, there is another ABI alternative that might give good code
density for architectures like ARC that have pre-decrement addressing
modes and allow immediates to be pushed:

You could put all unnamed arguments on the stack (thus simplifying
varargs processing), and leave all registers not used for argument
passing call-saved.  Thus, the callers wouldn't have to worry about
saving these registers or reloading their values from the stack.

For gcc, this would require making the call fusage really work (probably
involving a hook to tell the middle-end that the port really wants
that), or a kludge to make the affected call insns not look like call
insns, similar to the sfuncs.


Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations

2016-02-05 Thread Prathamesh Kulkarni
On 4 February 2016 at 16:31, Ramana Radhakrishnan
 wrote:
> On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni
>  wrote:
>> On 31 July 2015 at 15:04, Ramana Radhakrishnan
>>  wrote:
>>>
>>>
>>> On 29/07/15 11:09, Prathamesh Kulkarni wrote:
 Hi,
 This patch tries to implement division with multiplication by
 reciprocal using vrecpe/vrecps
 with -funsafe-math-optimizations and -freciprocal-math enabled.
 Tested on arm-none-linux-gnueabihf using qemu.
 OK for trunk ?

 Thank you,
 Prathamesh

>>>
>>> I've tried this in the past and never been convinced that 2 iterations are 
>>> enough to get to stability with this given that the results are only 
>>> precise for 8 bits / iteration. Thus I've always believed you need 3 
>>> iterations rather than 2 at which point I've never been sure that it's 
>>> worth it. So the testing that you've done with this currently is not enough 
>>> for this to go into the tree.
>>>
>>> I'd like this to be tested on a couple of different AArch32 implementations 
>>> with a wider range of inputs to verify that the results are acceptable as 
>>> well as running something like SPEC2k(6) with at least one iteration to
>>> ensure correctness.
>> Hi,
>> I got results of SPEC2k6 fp benchmarks:
>> a15: +0.64% overall, 481.wrf: +6.46%
>> a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76%
>> a57: +0.35% overall, 481.wrf: +3.84%
>> The other benchmarks had (almost) identical results.
>
> Thanks for the benchmarking results -  Please repost the patch with
> the changes that I had requested in my previous review - given it is
> now stage4, I would rather queue changes like this for stage1 now.
Hi,
Please find the updated patch attached.
It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and
arm-none-eabi.
However the test-case added in the patch (neon-vect-div-1.c) fails to
get vectorized at -O2
for armeb-none-linux-gnueabihf.
Charles suggested I try -O3, which worked.
It appears the test-case fails to get vectorized with
-fvect-cost-model=cheap (which is enabled by default at -O2)
and passes with -fno-vect-cost-model / -fvect-cost-model=dynamic.

I can't figure out why it fails with -fvect-cost-model=cheap.
From the vect dump (attached):
neon-vect-div-1.c:12:3: note: Setting misalignment to -1.
neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9

Thanks,
Prathamesh
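For reference, the refinement used by the expander under review (a vrecpe estimate followed by two vrecps/multiply steps) can be modelled in scalar C. This is a sketch under stated assumptions, not the NEON code: the recip_step helper mirrors what vrecps computes (2.0 minus the product of its operands), and the initial estimate x0 is supplied by the caller instead of coming from vrecpe. In exact arithmetic each step x = x * (2 - d * x) squares the relative error, so an estimate good to about 8 bits becomes good to about 16 and then about 32 bits. Rounding in each float operation, and special inputs such as zero, infinities and denormals, are not modelled here, which is why testing on real hardware with a wide range of inputs still matters.

```c
/* Scalar stand-in for one VRECPS lane: 2.0 - a * b.  */
static float
recip_step (float a, float b)
{
  return 2.0f - a * b;
}

/* Approximate N / D: refine the caller-supplied estimate X0 of 1 / D
   ITERS times with the Newton-Raphson update x = x * (2 - d * x), then
   multiply by N, mirroring the vrecpe / vrecps / vmul sequence.  */
static float
div_via_reciprocal (float n, float d, float x0, int iters)
{
  float x = x0;

  for (int i = 0; i < iters; i++)
    x = x * recip_step (d, x);
  return n * x;
}
```

With d = 3 and an estimate that is off by a factor of 1 + 1/256, zero steps leave an absolute error of about 1.3e-3, one step about 5e-6, and two steps leave only float rounding error.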
>
> Thanks,
> Ramana
>
>>
>> Thanks,
>> Prathamesh
>>>
>>>
>>> moving on to the patches.
>>>
 diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
 index 654d9d5..28c2e2a 100644
 --- a/gcc/config/arm/neon.md
 +++ b/gcc/config/arm/neon.md
 @@ -548,6 +548,32 @@
  (const_string "neon_mul_")))]
  )

>>>
>>> Please add a comment here.
>>>
 +(define_expand "div3"
 +  [(set (match_operand:VCVTF 0 "s_register_operand" "=w")
 +(div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w")
 +   (match_operand:VCVTF 2 "s_register_operand" "w")))]
>>>
>>> I want to double check that this doesn't collide with Alan's patches for 
>>> FP16 especially if he reuses the VCVTF iterator for all the vcvt f16 cases.
>>>
 +  "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal_math"
 +  {
 +rtx rec = gen_reg_rtx (mode);
 +rtx vrecps_temp = gen_reg_rtx (mode);
 +
 +/* Reciprocal estimate */
 +emit_insn (gen_neon_vrecpe (rec, operands[2]));
 +
 +/* Perform 2 iterations of Newton-Raphson method for better accuracy */
 +for (int i = 0; i < 2; i++)
 +  {
 + emit_insn (gen_neon_vrecps (vrecps_temp, rec, operands[2]));
 + emit_insn (gen_mul3 (rec, rec, vrecps_temp));
 +  }
 +
 +/* We now have reciprocal in rec, perform operands[0] = operands[1] * rec */
 +emit_insn (gen_mul3 (operands[0], operands[1], rec));
 +DONE;
 +  }
 +)
 +
 +
  (define_insn "mul3add_neon"
[(set (match_operand:VDQW 0 "s_register_operand" "=w")
  (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" "w")
 diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c b/gcc/testsuite/gcc.target/arm/vect-div-1.c
 new file mode 100644
 index 000..e562ef3
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c
 @@ -0,0 +1,14 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target arm_v8_neon_ok } */
 +/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize -fdump-tree-vect-all" } */
 +/* { dg-add-options arm_v8_neon } */
>>>
>>> No this is wrong.
>>>
>>> What is armv8 specific about this test ? This is just like another test 
>>> that is for Neon. vrecpe / vrecps are not instructions that were introduced 
>>> in the v8 version of the 

[PATCH: RL78] Optimize libgcc routines using clrw and clrb

2016-02-05 Thread Kaushik Phatak
Hi,
Please find below a simple patch which optimizes the loading of immediate
values by using the clrw or clrb instruction when 0x00 is being loaded
into a register.  The patch replaces movw/mov instructions with the
smaller clrw/clrb instructions: clrw and clrb generate only 1 byte of
opcode, compared to 3 bytes for movw and 2 bytes for mov.

In total this saves about 94 bytes of code in these libgcc routines.

The following routines have improved code size:
___mulsi3             :  2 bytes
___divsi3             : 20 bytes
___modsi3             : 20 bytes
___divhi3             : 10 bytes
___modhi3             : 10 bytes
___parityqi_internal  :  2 bytes
__int_cmpsf           :  2 bytes
___fixsfsi            :  5 bytes
___fixunssfsi         :  2 bytes
___floatsisf          :  6 bytes
_int_unpack_sf        :  1 byte
___addsf3             :  5 bytes
__rl78_int_pack_a_r8  :  2 bytes
___mulsf3             :  2 bytes
___divsf3             :  3 bytes
__gcc_bcmp            :  2 bytes


I have also attached a draft version of a similar patch
(rl78_libgcc_optimize_draft.patch), which goes further and replaces
movw of immediate 0 into other saddr registers with 2 instructions,
e.g.:
 START_FUNC ___modhi3
;; r8 = 4[sp] % 6[sp]
-   movw    de, #0
+   clrw    ax
+   movw    de, ax
mov a, [sp+5]

This improves code size by 1 byte for each such substitution, but it
does add an extra clock cycle.

We may consider this patch if we are purely looking for code size
improvement, assuming the libraries are built with -Os.  It gives a
total code size improvement of 134 bytes.

Patch1: rl78_libgcc_optimize_clrw.patch - 94 bytes improvement in code size.
Patch2: rl78_libgcc_optimize_draft.patch - 134 bytes improvement in code size.

Kindly review this patch and let me know what you think.
This is regression tested for rl78 -msim.

Best Regards,
Kaushik

p.s. Kindly ignore any disclaimers at the end of this e-mail, as they are
auto-inserted.  Apologies for the same.

2016-02-05  Kaushik Phatak 

* config/rl78/bit-count.S: Use clrw/clrb where possible.
* config/rl78/cmpsi2.S: Likewise.
* config/rl78/divmodhi.S: Likewise.
* config/rl78/divmodsi.S: Likewise.
* config/rl78/fpbit-sf.S: Likewise.
* config/rl78/fpmath-sf.S: Likewise.
* config/rl78/mulsi3.S: Likewise.

Index: libgcc/config/rl78/bit-count.S
===
--- libgcc/config/rl78/bit-count.S  (revision 3174)
+++ libgcc/config/rl78/bit-count.S  (working copy)
@@ -139,7 +139,7 @@
xor1    cy, a.5
xor1    cy, a.6
xor1    cy, a.7
-   movw    ax, #0
+   clrw    ax
bnc $1f
incw    ax
 1:
@@ -190,7 +190,7 @@
movw    ax, sp
addw    ax, #4
movw    hl, ax
-   mov a, #0
+   clrb    a
 1:
xch a, b
mov a, [hl]
@@ -207,7 +207,7 @@
bnz $1b
 
mov x, a
-   mov a, #0
+   clrb    a
movw    r8, ax
ret 
 END_FUNC   ___popcountqi_internal
Index: libgcc/config/rl78/cmpsi2.S
===
--- libgcc/config/rl78/cmpsi2.S (revision 3174)
+++ libgcc/config/rl78/cmpsi2.S (working copy)
@@ -162,8 +162,8 @@
 
;; They differ.  Subtract *S2 from *S1 and return as the result.
mov x, a
-   mov a, #0
-   mov r9, #0
+   clrb    a
+   clrb    r9
subw    ax, r8
 1:
movw    r8, ax
Index: libgcc/config/rl78/divmodhi.S
===
--- libgcc/config/rl78/divmodhi.S   (revision 3174)
+++ libgcc/config/rl78/divmodhi.S   (working copy)
@@ -576,7 +576,7 @@
 
 .macro NEG_AX
movw    hl, ax
-   movw    ax, #0
+   clrw    ax
subw    ax, [hl]
movw    [hl], ax
 .endm
Index: libgcc/config/rl78/divmodsi.S
===
--- libgcc/config/rl78/divmodsi.S   (revision 3174)
+++ libgcc/config/rl78/divmodsi.S   (working copy)
@@ -952,10 +952,10 @@
 
 .macro NEG_AX
movw    hl, ax
-   movw    ax, #0
+   clrw    ax
subw    ax, [hl]
movw    [hl], ax
-   movw    ax, #0
+   clrw    ax
sknc
decw    ax
subw    ax, [hl+2]
Index: libgcc/config/rl78/fpbit-sf.S
===
--- libgcc/config/rl78/fpbit-sf.S   (revision 3174)
+++ libgcc/config/rl78/fpbit-sf.S   (working copy)
@@ -117,7 +117,7 @@
call    $!__int_iszero
bnz $2f
;; At this point, both args are zero.
-   mov a, #0
+   clrb    a
ret
 
 2:
@@ -151,7 +151,7 @@
bc  $ybig_cmpsf ; branch if X < Y
bnz $xbig_cmpsf ; branch if X > Y
 
-   mov a, #0
+   clrb    a
ret
 
 xbig_cmpsf:   

Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Bernd Schmidt

On 02/05/2016 01:42 PM, Richard Biener wrote:

so indeed the issue is not dx dying in insn 10 but ax dying in insn 8...

Maybe LRA can prefer to not do that if enough free registers are
available (that is, never re-use a register)?


Maybe, but at this stage that will probably also have some unpredictable 
random effects. Essentially it sounds like the gromacs regression was 
one of these where we just got unlucky.
It might help to know exactly how the gromacs slowdown occurred, in case 
there's another way to fix it. Maybe you can add a dbgcnt to your patch 
to help pinpoint the area.



With the above observation it seems less likely we can fix this
regression.  Should I continue looking for a less invasive fix, such as
also computing 'commutative' as it was before Andreas' patch and swapping
back only if the two disagree (meaning Andreas' patch introduced extra
swapping in recog_data)?


I wouldn't really worry about it all that much, but I also think it 
would be good to know more precisely what went wrong for gromacs.



Bernd


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener
On Fri, 5 Feb 2016, Jakub Jelinek wrote:

> On Fri, Feb 05, 2016 at 01:35:03PM +0100, Richard Biener wrote:
> > On Fri, 5 Feb 2016, Jakub Jelinek wrote:
> > 
> > > On Fri, Feb 05, 2016 at 01:10:26PM +0100, Richard Biener wrote:
> > > > Otherwise bootstrap / testing went ok and a full SPEC 2k6 run doesn't
> > > > show any regressions.
> > > 
> > > Any improvements there?
> > 
> > Noise, but I only did 1 run (I did 3 only for 435.gromacs to confirm
> > the progress over the pre-Andreas-patch state, which I would otherwise
> > have classified as noise as well).
> 
> And on that single run gromacs was non-noise?

Well, it was like 264s vs. 269s but it looked good so I tried to
confirm progress independent of Andreas' change ;)  You can look
at the usual variance of the machine here:
http://gcc.opensuse.org/SPEC/CFP/sb-czerny-head-64-2006/recent.html
(that's of course including source changes for each dot).

So yes, in the full 1-run I count +/- 5s (out of ~260s) as noise.

Note that all SPEC tests ran on top of r23181[34].

> > I can do a 3-run over the weekend if you think that's useful.  OTOH
> > checking the patch in will get more benchmark/flag/HW covering
> > from our auto-testers (which also only do 1-runs of course and will
> > see unrelated changes as well).
> 
> Maybe that is good enough.  Anyway, the patch is Vlad's area of expertise...

Yes, and we need to decide on that fallout and whether we should go for
the uglier variant (it also fixes the gromacs regression for me).

Richard.


Re: [PATCH] Fix PR69274, 435.gromacs performance regression due to RA

2016-02-05 Thread Richard Biener
vfnmadd132ss    %xmm10, %xmm0, %xmm2
vmovss  %xmm2, 16(%rax)
vmovd   %edx, %xmm6
vsubss  %xmm5, %xmm6, %xmm5
vsubss  %xmm11, %xmm5, %xmm11
vmovss  %xmm11, -4(%rbp)
addq    $4, %r8
cmpq    %r15, %r8
jne .L4
movq    %r14, %rbp
vmovss  -52(%rsp), %xmm6
vaddss  -48(%rsp), %xmm6, %xmm8
vmovss  -44(%rsp), %xmm6
vaddss  -40(%rsp), %xmm6, %xmm2
vmovss  -36(%rsp), %xmm6
vaddss  -32(%rsp), %xmm6, %xmm0
vmovss  -52(%rsp), %xmm6
.L3:
leaq    (%r11,%r12,4), %rax
vaddss  -12(%rax), %xmm6, %xmm12
vmovss  %xmm12, -12(%rax)
vmovss  -44(%rsp), %xmm6
vaddss  -8(%rax), %xmm6, %xmm11
vmovss  %xmm11, -8(%rax)
vmovss  -36(%rsp), %xmm6
vaddss  -4(%rax), %xmm6, %xmm5
vmovss  %xmm5, -4(%rax)
vmovss  -48(%rsp), %xmm6
vaddss  (%rax), %xmm6, %xmm11
vmovss  %xmm11, (%rax)
vmovss  -40(%rsp), %xmm6
vaddss  4(%rax), %xmm6, %xmm11
vmovss  %xmm11, 4(%rax)
leaq    (%r11,%rbx,4), %rdx
vmovss  -32(%rsp), %xmm6
vaddss  -4(%rdx), %xmm6, %xmm11
vmovss  %xmm11, -4(%rdx)
vmovss  -28(%rsp), %xmm6
vaddss  12(%rax), %xmm6, %xmm1
vmovss  %xmm1, 12(%rax)
vmovss  -24(%rsp), %xmm5
vaddss  16(%rax), %xmm5, %xmm1
vmovss  %xmm1, 16(%rax)
leaq    (%r11,%rdi,4), %rax
vaddss  -4(%rax), %xmm3, %xmm1
vmovss  %xmm1, -4(%rax)
movq    216(%rsp), %rax
movq    104(%rsp), %rbx
leaq    (%rax,%rbx,4), %rax
vaddss  %xmm6, %xmm8, %xmm8
vaddss  (%rax), %xmm8, %xmm8
vmovss  %xmm8, (%rax)
vaddss  %xmm5, %xmm2, %xmm2
vaddss  4(%rax), %xmm2, %xmm2
vmovss  %xmm2, 4(%rax)
movq    216(%rsp), %rax
leaq    (%rax,%r9,4), %rax
vaddss  %xmm3, %xmm0, %xmm6
vaddss  -4(%rax), %xmm6, %xmm6
vmovss  %xmm6, -4(%rax)
movq    224(%rsp), %rax
movslq  (%rax,%rbp,4), %rax
salq    $2, %rax
movq    %rax, %rdx
addq    264(%rsp), %rdx
vmovss  -16(%rsp), %xmm6
vaddss  (%rdx), %xmm6, %xmm7
vmovss  %xmm7, (%rdx)
addq    296(%rsp), %rax
vmovss  -12(%rsp), %xmm6
vaddss  (%rax), %xmm6, %xmm0
vmovss  %xmm0, (%rax)
addq    $1, %rbp
cmpl    %ebp, 116(%rsp)
jne .L6
.L9:
addq    $160, %rsp
.cfi_remember_state
.cfi_def_cfa_offset 56
popq    %rbx
.cfi_def_cfa_offset 48
popq    %rbp
.cfi_def_cfa_offset 40
popq    %r12
.cfi_def_cfa_offset 32
popq    %r13
.cfi_def_cfa_offset 24
popq    %r14
.cfi_def_cfa_offset 16
popq    %r15
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L7:
.cfi_restore_state
vxorps  %xmm0, %xmm0, %xmm0
vmovaps %xmm0, %xmm2
vmovaps %xmm0, %xmm8
vmovss  %xmm0, -12(%rsp)
vmovss  %xmm0, -16(%rsp)
vmovss  %xmm0, -20(%rsp)
vmovss  %xmm0, -32(%rsp)
vmovss  %xmm0, -36(%rsp)
vmovss  %xmm0, -24(%rsp)
vmovss  %xmm0, -40(%rsp)
vmovss  %xmm0, -44(%rsp)
vmovss  %xmm0, -28(%rsp)
vmovss  %xmm0, -48(%rsp)
vmovss  %xmm0, -52(%rsp)
movslq  %r8d, %rax
movq    %rax, 104(%rsp)
vmovaps %xmm0, %xmm6
vmovaps %xmm0, %xmm3
jmp .L3
.cfi_endproc
.LFE0:
.size   inl1130_, .-inl1130_
.section        .rodata.cst4,"aM",@progbits,4
.align 4
.LC1:
.long   3225419776
.align 4
.LC2:
.long   3204448256
.align 4
.LC3:
.long   1086324736
.align 4
.LC4:
.long   1094713344
.ident  "GCC: (GNU) 6.0.0 20160205 (experimental) [trunk revision 233136]"
.section        .note.GNU-stack,"",@progbits
.file   "innerf.f"
.text
.p2align 4,,15
.globl  inl1130_
.type   inl1130_, @function
inl1130_:
.LFB0:
.cfi_startproc
pushq   %r15
.cfi_def_cfa_offset 16
.cfi_offset 15, -16
pushq   %r14
.cfi_def_cfa_offset 24
.cfi_offset 14, -24
pushq   %r13
.cfi_def_cfa_offset 32
.cfi_offset 13, -32
pushq   %r12
.cfi_def_cfa_offset 40
.cfi_offset 12, -40
pushq   %rbp
.cfi_def_cfa_offset 48
.cfi_offset 6, -48
pushq   %rbx
.cfi_def_cfa_offset 56
.cfi_offset 3, -56
subq    $160, %rsp
.cfi_def_cfa_offset 216
movq    %rsi, 120(%rsp)
movq    %rdx, 128(%rsp)
movq    %rcx, 144(%rsp)
movq    %r8, 136(%rsp)
movq    %r9, 152(%rsp)
movq    232(%rsp), %r9
movq    240(%rsp), %r10
movsl

Re: [PATCH] tweak -Wplacement-new to fix #69662

2016-02-05 Thread Jason Merrill

On 02/04/2016 04:35 PM, Martin Sebor wrote:

On 02/04/2016 02:10 PM, Jason Merrill wrote:

On 02/04/2016 03:22 PM, Martin Sebor wrote:

+  /* Refers to the declared object that contains the subobject referenced
+     by OPER.  When the object is initialized, makes it possible to determine
+     the actual size of a flexible array member used as the buffer passed
+     as OPER to placement new.  */
+  tree var_decl = NULL_TREE;


This doesn't make sense to me.  There should never be variables of types
with flexible array members; such types should be allocated with malloc
or the equivalent, along with enough extra space for the trailing array.
Indeed, the C compiler gives an error for initialization of a flexible
array member.
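A minimal C99 sketch of the allocation pattern described above, using a hypothetical struct A with one int member and a flexible int array: the object is sized at run time as sizeof (struct A) plus room for the trailing elements, rather than being declared as a variable with an initializer. The helper name make_a is an assumption for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example type with a flexible array member (C99).  */
struct A
{
  int i;
  int ar[];
};

/* Allocate a struct A whose trailing array holds N zeroed ints.
   Returns NULL if the size computation would overflow or malloc fails.  */
static struct A *
make_a (int i, size_t n)
{
  if (n > (SIZE_MAX - sizeof (struct A)) / sizeof (int))
    return NULL;

  struct A *p = malloc (sizeof (struct A) + n * sizeof (int));
  if (p == NULL)
    return NULL;

  p->i = i;
  for (size_t k = 0; k < n; k++)
    p->ar[k] = 0;
  return p;
}
```

The caller releases the object with free; indices at or beyond n are out of bounds, since no storage was reserved for them.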


GCC (in C mode) accepts, as an extension, statically initialized
objects of structs with flexible array members.  It rejects them
in other contexts (such as auto variables).


That seems right.


G++ accepts initialized objects of structs with flexible array
members in all contexts.  I don't know for sure if this difference
is intended or accidental but since C++ allows more safe constructs
than C does, and I can't think of anything wrong with allowing it,
the patch handles this case.  If we decide that initializing such
structs is a bad idea in C++ the special handling can be removed.

FWIW, I've been opening bugs for problems in this area as I find
them hoping to not just document them but also get a more complete
understanding of what's allowed as an extension and what's likely
a bug (and why), and eventually submit fixes.

For example, 68489 points out a problem with allowing arrays of
flexible array members.  Since there's no good way to fix that
problem such arrays should be rejected.  But I haven't been able
to come up with a reason why individual automatic structs with
flexible array members should be disallowed.  If there is such
a reason, we can start diagnosing them.  But I would be reluctant
to start outright rejecting them if they're potentially useful
because it could break working programs.


struct A
{
  int i, ar[];
};

int main()
{
  int k = 24;
  struct A a = { 1, 2, 3, 4 };
  int j = 42;
  return a.ar[1];
}

G++ accepts this testcase and happily puts k and j in the same stack 
slots as elements of a.ar[], while GCC rejects it.  We shouldn't accept 
it.  In any case, we should usually follow the C front end's lead on 
this C compatibility feature.


Jason



[PATCH] PR rtl-optimization/64081: Enable RTL loop unrolling for duplicated exit blocks and back edges.

2016-02-05 Thread Alexander Fomin
Hi!

A version of this patch was submitted about a year ago by Igor
Zamyatin.  It attempts to fix PR rtl-optimization/64081 by enabling
RTL loop unrolling for duplicated exit blocks and back edges.

At the time it caused AIX bootstrap failure, but now it's OK according
to David's testing. I've also bootstrapped and regtested it on
x86_64-linux-gnu.

Is it still OK for trunk now, or do you consider this GCC 7 material?
Anyway, it's a regression.

Thanks,
Alexander
---
gcc/

PR rtl-optimization/64081
* loop-iv.c (def_pred_latch_p): New function.
(latch_dominating_def): Allow specific cases with non-single
definitions.
(iv_get_reaching_def): Likewise.
(check_complex_exit_p): New function.
(check_simple_exit): Use check_complex_exit_p to allow certain cases
with exits not executing on any iteration.

gcc/testsuite

PR rtl-optimization/64081
* gcc.dg/pr64081.c: New test.
---
 gcc/loop-iv.c  | 114 -
 gcc/testsuite/gcc.dg/pr64081.c |  40 +++
 2 files changed, 142 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr64081.c
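The relaxed multiple-definition check described in the ChangeLog above can be modelled on a toy CFG. This is a hypothetical sketch, not loop-iv.c code: basic blocks are plain ints, is_latch_pred plays the role of def_pred_latch_p, and multi_defs_ok mirrors the loop the patch adds to iv_get_reaching_def, which rejects any definition outside a latch predecessor and then requires as many definitions as the latch has predecessors. The rtx_equal_p comparison of the definitions' patterns in latch_dominating_def is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical analogue of def_pred_latch_p: is DEF_BB one of the
   NPREDS immediate predecessors of the latch listed in LATCH_PREDS?  */
static bool
is_latch_pred (int def_bb, const int *latch_preds, size_t npreds)
{
  for (size_t i = 0; i < npreds; i++)
    if (latch_preds[i] == def_bb)
      return true;
  return false;
}

/* Hypothetical analogue of the multi-def path in iv_get_reaching_def:
   accept NDEFS reaching definitions only if each one sits in a latch
   predecessor and there are exactly as many definitions as latch
   predecessors (the patch counts definitions, as done here).  */
static bool
multi_defs_ok (const int *def_bbs, size_t ndefs,
               const int *latch_preds, size_t npreds)
{
  for (size_t i = 0; i < ndefs; i++)
    if (!is_latch_pred (def_bbs[i], latch_preds, npreds))
      return false;
  return ndefs == npreds;
}
```

For a latch with predecessors {3, 4}, definitions in blocks {4, 3} are accepted, while {3, 5} (block 5 is not a latch predecessor) and {3} alone (one predecessor lacks a definition) are rejected.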

diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
index fecaf8f..d7d3e0d 100644
--- a/gcc/loop-iv.c
+++ b/gcc/loop-iv.c
@@ -289,9 +289,27 @@ iv_analysis_loop_init (struct loop *loop)
   check_iv_ref_table_size ();
 }
 
+/* Checks that def is in an immediate predecessor of the latch block.  */
+
+static bool
+def_pred_latch_p (df_ref d_ref)
+{
+  basic_block bb = DF_REF_BB (d_ref);
+  edge_iterator ei;
+  edge e;
+
+  FOR_EACH_EDGE (e, ei, current_loop->latch->preds)
+{
+  if (e->src == bb)
+   return true;
+}
+  return false;
+}
+
 /* Finds the definition of REG that dominates loop latch and stores
it to DEF.  Returns false if there is not a single definition
-   dominating the latch.  If REG has no definition in loop, DEF
+   dominating the latch or all defs are same and they are on different
+   predecessors of loop latch.  If REG has no definition in loop, DEF
is set to NULL and true is returned.  */
 
 static bool
@@ -303,15 +321,28 @@ latch_dominating_def (rtx reg, df_ref *def)
 
   for (adef = DF_REG_DEF_CHAIN (regno); adef; adef = DF_REF_NEXT_REG (adef))
 {
+  /* Initialize this to true for the very first iteration when
+SINGLE_RD is NULL.  */
+  bool def_pred_latch = true;
+
   if (!bitmap_bit_p (df->blocks_to_analyze, DF_REF_BBNO (adef))
  || !bitmap_bit_p (_info->out, DF_REF_ID (adef)))
continue;
 
-  /* More than one reaching definition.  */
-  if (single_rd)
+  /* More than one reaching definition is ok in case definitions are
+in predecessors of latch block and those definitions are the same.
+Probably this could be relaxed and check for sub-dominance instead
+predecessor.  */
+  if (single_rd
+ && (!(def_pred_latch = def_pred_latch_p (adef))
+ || !rtx_equal_p( PATTERN (DF_REF_INSN (single_rd)),
+  PATTERN (DF_REF_INSN (adef)
return false;
 
-  if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef)))
+  /* It's ok if def is not executed on each iteration once we have defs on
+latch's predecessors.  */
+  if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef))
+ && !def_pred_latch)
return false;
 
   single_rd = adef;
@@ -326,10 +357,10 @@ latch_dominating_def (rtx reg, df_ref *def)
 static enum iv_grd_result
 iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 {
-  df_ref use, adef;
+  df_ref use, adef = NULL;
   basic_block def_bb, use_bb;
   rtx_insn *def_insn;
-  bool dom_p;
+  bool dom_p, dom_latch_p = false;
 
   *def = NULL;
   if (!simple_reg_p (reg))
@@ -344,11 +375,26 @@ iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
   if (!DF_REF_CHAIN (use))
 return GRD_INVARIANT;
 
-  /* More than one reaching def.  */
+  adef = DF_REF_CHAIN (use)->ref;
+  /* Having more than one reaching def is ok once all defs are in blocks which
+ are latch's predecessors.  */
   if (DF_REF_CHAIN (use)->next)
-return GRD_INVALID;
+{
+  df_link* defs;
+  unsigned int def_num = 0;
 
-  adef = DF_REF_CHAIN (use)->ref;
+  for (defs = DF_REF_CHAIN (use); defs; defs = defs->next)
+   {
+ if (!def_pred_latch_p (defs->ref))
+   return GRD_INVALID;
+ def_num++;
+   }
+  /* Make sure all preds contain definitions.  */
+  if (def_num != EDGE_COUNT (current_loop->latch->preds))
+   return GRD_INVALID;
+
+  dom_latch_p = true;
+}
 
   /* We do not handle setting only part of the register.  */
   if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
@@ -371,8 +417,8 @@ iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def)
 
   /* The definition does not dominate the use.  This is still OK if
  this may be a use of a biv, i.e. if the def_bb 

Re: [PATCH, i386, AVX512] Adjust expected result for kunpackb intrinsic in avx512f-klogic-2 test.

2016-02-05 Thread Kirill Yukhin
Hi Sasha,

On 04 Feb 17:59, Alexander Fomin wrote:
> OK for trunk and 5-branch?
Patch is OK for main trunk and release branches.

(IMHO, pretty much obvious).

--
Thanks, K


Re: [Patch, testsuite] Require int32 target support in sso tests

2016-02-05 Thread Eric Botcazou
> 2016-02-04  Senthil Kumar Selvaraj  
> 
>   * gcc/testsuite/gcc.dg/sso/p1.c: Add dg-require-effective-target int32.
>   * gcc/testsuite/gcc.dg/sso/p2.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/p3.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/p5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/p6.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/p7.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/p8.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q1.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q2.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q3.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q6.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q7.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/q8.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/r3.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/r5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/r6.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/r7.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/r8.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/s3.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/s5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/s6.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/s7.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/s8.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t1.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t2.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t3.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t6.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t7.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/t8.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/u5.c: Likewise.
>   * gcc/testsuite/gcc.dg/sso/u6.c: Likewise.

I'd rather patch the driver (gcc.dg/sso/sso.exp) itself.

-- 
Eric Botcazou


New Finnish PO file for 'cpplib' (version 6.1-b20160131)

2016-02-05 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'cpplib' has been submitted
by the Finnish team of translators.  The file is available at:

http://translationproject.org/latest/cpplib/fi.po

(This file, 'cpplib-6.1-b20160131.fi.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/cpplib/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/cpplib.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[PATCH, committed] Revert r232560 to fix PR target/69369

2016-02-05 Thread Ilya Enkovich
Hi,

This patch reverts r232560 which caused multiple failures
for Pointer Bounds Checker.  Patch was bootstrapped and
regtested on x86_64-pc-linux-gnu.  Applied to trunk.

Thanks,
Ilya
--
2016-02-05  Ilya Enkovich  

PR target/69369
Revert r232560:
2016-01-19  Jan Hubicka  

* cgraphunit.c (cgraph_node::reset): Clear thunk info and
instrumented_version.


diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b95c172..2c49d7b 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -366,14 +366,12 @@ cgraph_node::reset (void)
   memset (, 0, sizeof (local));
   memset (, 0, sizeof (global));
   memset (, 0, sizeof (rtl));
-  memset (, 0, sizeof (cgraph_thunk_info));
   analyzed = false;
   definition = false;
   alias = false;
   transparent_alias = false;
   weakref = false;
   cpp_implicit_alias = false;
-  instrumented_version = NULL;
 
   remove_callees ();
   remove_all_references ();


[PATCH][P1 tree-optimization/68541] Add heuristics to path splitting

2016-02-05 Thread Jeff Law
This patch addresses multiple issues with how we determine when to split 
paths.  The primary motivation is addressing 68541 (P1).



As I've gotten test codes from Ajit, I've been able to look closely at 
the path splitting opportunities, and what I've consistently seen is that, 
contrary to the original belief, CSE/DCE opportunities are rarely 
exposed by path splitting.  The benefit seems to come mostly from removing 
the unconditional jump inherent in a CFG diamond, even more so on 
MicroBlaze, where these jumps have delay slots.
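
To make the diamond shape concrete, here is a source-level sketch in C. Both functions are hypothetical and compute the same result; `sum_split` shows by hand what path splitting does to the CFG of `sum_diamond`: the join block (`s += t * 3`) is duplicated into each arm, so neither arm has to branch to a shared join. The compiler performs this transformation on the CFG, not on the source, and no claim is made here about whether GCC actually splits this particular loop.

```c
#include <assert.h>
#include <stddef.h>

/* Loop body is a diamond: two arms that meet again in a shared join
   block before reaching the latch.  */
long sum_diamond (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long s = 0;
  for (size_t i = 0; i < n; i++)
    {
      long t;
      if (a[i] & 1)
        {
          t = a[i] + 1;  /* "then" arm */
          t *= 2;
        }
      else
        {
          t = a[i] / 2;  /* "else" arm */
          t -= 1;
        }
      s += t * 3;        /* join block shared by both arms */
    }
  return s;
}

/* Same computation after splitting by hand: each arm carries its own
   copy of the join block, so there is no shared join to jump to.  */
long sum_split (const int *a, size_t n)
{
  assert (a != NULL || n == 0);
  long s = 0;
  for (size_t i = 0; i < n; i++)
    {
      long t;
      if (a[i] & 1)
        {
          t = a[i] + 1;
          t *= 2;
          s += t * 3;    /* first copy of the join block */
        }
      else
        {
          t = a[i] / 2;
          t -= 1;
          s += t * 3;    /* second copy of the join block */
        }
    }
  return s;
}
```

The price is code growth: every statement in the join block is emitted twice, which is why the amount of duplicated code needs a limit.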


While there are cases where splitting allows for CSE/DCE, they're the 
exception rather than the rule in the codes I've looked at.


With that in mind, we want to encourage path splitting when the amount 
of code we're duplicating is small.  We also want to balance that with 
not path splitting when the original code is something that may be 
if-convertible.


The vast majority of if-convertible codes are cases where the two 
predecessors of the join block are single statement blocks with their 
results feeding the same PHI in the join block.  We now reject that case 
for path splitting so as not to spoil if-conversion.


The only wrinkle with that heuristic is that some codes (MIN, MAX, ABS, 
COND, calls, etc.) are not good candidates for further if-conversion 
(i.e., we have a MIN_EXPR in both predecessors that feed the PHI in the 
join).  So we still allow those cases for splitting.  This can be 
easily refined as we find more cases or as the if-converter is extended.
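
A minimal C sketch of this first filter, under simplifying assumptions: each predecessor is reduced to a hypothetical `struct toy_pred`, and the code enum is invented for illustration (it is not GCC's `enum tree_code`). Splitting is rejected only when both predecessors are single statements feeding the same PHI and neither statement uses one of the MIN/MAX/ABS/COND/call codes. Treating a poor code in either arm as enough to keep splitting allowed is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified statement codes; GCC itself works on GIMPLE
   statements and enum tree_code, not on this enum.  */
enum toy_code { CODE_PLUS, CODE_MINUS, CODE_MULT, CODE_MIN, CODE_MAX,
                CODE_ABS, CODE_COND, CODE_CALL };

/* A predecessor of the join block, reduced to what the filter needs.  */
struct toy_pred {
  size_t n_stmts;       /* number of real (non-debug) statements */
  enum toy_code code;   /* code of its last statement */
  int phi_id;           /* join-block PHI fed by that statement, or -1 */
};

/* Codes that are unlikely to benefit from further if-conversion.  */
static bool poor_ifcvt_code (enum toy_code code)
{
  switch (code)
    {
    case CODE_MIN:
    case CODE_MAX:
    case CODE_ABS:
    case CODE_COND:
    case CODE_CALL:
      return true;
    default:
      return false;
    }
}

/* Return true if the diamond looks if-convertible, i.e. path splitting
   should be rejected so as not to spoil if-conversion.  */
static bool likely_if_converted (const struct toy_pred *p0,
                                 const struct toy_pred *p1)
{
  assert (p0 != NULL && p1 != NULL);
  /* Only single-statement predecessors are considered.  */
  if (p0->n_stmts != 1 || p1->n_stmts != 1)
    return false;
  /* Assumption: a poor candidate in either arm keeps splitting allowed.  */
  if (poor_ifcvt_code (p0->code) || poor_ifcvt_code (p1->code))
    return false;
  /* Both results must feed the same PHI in the join block.  */
  return p0->phi_id >= 0 && p0->phi_id == p1->phi_id;
}
```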


So with that in place as the first filter, we just need to put a limiter 
on the number of statements we allow to be copied.  I punted a bit and 
re-used the PARAM for jump threading.  They've got enough in common that 
I felt it was reasonable.  If I were to create a new PARAM, I'd probably 
start with a smaller default value after doing further instrumentation.
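
The statement-count limiter could look roughly like the sketch below. The toy block representation, the helper names, and the choice to skip labels and debug statements are assumptions for illustration. In GCC the limit would come from the jump threading PARAM (presumably `max-jump-thread-duplication-stmts`; worth confirming in params.def), which this sketch models as a plain `limit` argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds in a toy basic block.  */
enum toy_kind { TOY_LABEL, TOY_DEBUG, TOY_ASSIGN, TOY_CALL };

/* Count the statements that duplicating the block would really copy.
   Assumption: labels and debug statements generate no code, so they
   are not counted.  */
static unsigned count_stmts_in_toy_block (const enum toy_kind *stmts,
                                          size_t n)
{
  assert (stmts != NULL || n == 0);
  unsigned count = 0;
  for (size_t i = 0; i < n; i++)
    if (stmts[i] != TOY_LABEL && stmts[i] != TOY_DEBUG)
      count++;
  return count;
}

/* Allow the split only if the join block is no bigger than LIMIT, which
   stands in for the jump threading duplication PARAM.  */
static bool join_small_enough (const enum toy_kind *stmts, size_t n,
                               unsigned limit)
{
  return count_stmts_in_toy_block (stmts, n) <= limit;
}
```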


While I was working through this, I noticed we were path splitting in 
some cases where we shouldn't, particularly when we had a CFG like:


      a
     / \
    b   c
   / \ /
  e   d
     / \
    /   \
 latch   exit


Duplicating d for the edges b->d and c->d isn't as helpful because the 
duplicate can't be merged into b.  We now detect this kind of case and 
reject it for path splitting.


The new tests are a combination of reductions of codes from Ajit and my 
own poking around.


Looking further out, I really wonder if the low level aspects of path 
splitting like we're trying to capture here really belong in the 
cross-jumping phase of the RTL optimizers.  The higher level aspects 
(when we are able to expose CSE/DCE opportunities) should be driven by 
actually looking at the propagation/simplifications enabled by a 
potential split path.  But those are gcc-7 things.


Bootstrapped and regression tested on x86 linux.  Installed on the 
trunk.  I'll obviously keep an eye out for how the tests are handled on 
other platforms -- I don't think the tests are real sensitive to branch 
costs and such, but I've been surprised before.


Onward to the jump threading code explosion wrap-up...

jeff


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9a51133..a465156 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2016-02-05  Jeff Law  
+
+   PR tree-optimization/68541
+   * gimple-ssa-split-paths.c: Include tree-cfg.h and params.h.
+   (count_stmts_in_block): New function.
+   (poor_ifcvt_candidate_code): Likewise.
+   (is_feasible_trace): Add some heuristics to determine when path
+   splitting is profitable.
+   (find_block_to_duplicate_for_splitting_paths): Make sure the graph
+   is a diamond with a single exit.
+
 2016-02-05  Martin Sebor  
 
PR c++/69662
diff --git a/gcc/gimple-ssa-split-paths.c b/gcc/gimple-ssa-split-paths.c
index 40c099a..ac6de81 100644
--- a/gcc/gimple-ssa-split-paths.c
+++ b/gcc/gimple-ssa-split-paths.c
@@ -25,11 +25,13 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "gimple.h"
 #include "tree-pass.h"
+#include "tree-cfg.h"
 #include "cfganal.h"
 #include "cfgloop.h"
 #include "gimple-iterator.h"
 #include "tracer.h"
 #include "predict.h"
+#include "params.h"
 
 /* Given LATCH, the latch block in a loop, see if the shape of the
path reaching LATCH is suitable for being split by duplication.
@@ -79,6 +81,11 @@ find_block_to_duplicate_for_splitting_paths (basic_block latch)
  || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src))
return NULL;
 
+ /* And that the predecessors of BB each have a single successor.  */
+ if (!single_succ_p (EDGE_PRED (bb, 0)->src)
+ || !single_succ_p (EDGE_PRED (bb, 1)->src))
+   return NULL;
+
  /* So at this point we have a simple diamond for an IF-THEN-ELSE
 construct starting at BB_IDOM, with a join point at BB.  BB
 pass