Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote:
> >The point is warning on an alloca (0) may not be as clear cut as it
> >might seem.  It's probably a reasonable thing to do on the host, but on
> >a target, which might be embedded and explicitly overriding the builtin
> >alloca, warning on alloca (0) is less of a slam dunk.
> 
> I confess I haven't heard of such an implementation before but after
> some searching I have managed to find a few examples of it online,
> such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
> Texas Instruments.  For instance:

In the libiberty/alloca.c, I don't see anything special about alloca (0).
Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035).
alloca (0) just returns NULL after it.  The diffutils link is the same
alloca.c as in libiberty.  Returning NULL or returning a non-unique pointer
for alloca (0) is just fine, it is the same thing as returning NULL or
returning a non-unique pointer from malloc (0).  We definitely don't want
to warn for malloc (0), and IMNSHO don't want to warn for alloca (0),
because it behaves the same.

Jakub


Re: [PATCH][2/2] GIMPLE Frontend, middle-end changes

2016-11-17 Thread Richard Biener
On Thu, 17 Nov 2016, Jeff Law wrote:

> On 11/17/2016 01:20 AM, Richard Biener wrote:
> > On Wed, 16 Nov 2016, Jeff Law wrote:
> > 
> > > On 10/28/2016 05:51 AM, Richard Biener wrote:
> > > > 
> > > > These are the middle-end changes and additions to the testsuite.
> > > > 
> > > > They are pretty self-contained, I've organized the changelog
> > > > entries below in areas of changes:
> > > > 
> > > >  1) dump changes - we add a -gimple dump modifier that allows most
> > > >  function dumps to be directy fed back into the GIMPLE FE
> > > > 
> > > >  2) pass manager changes to implement the startwith("pass-name")
> > > >  feature which implements unit-testing for GIMPLE passes
> > > > 
> > > >  3) support for "SSA" input, a __PHI stmt that is lowered once the
> > > >  CFG is built, a facility to allow a specific SSA name to be allocated
> > > >  plus a small required change in the SSA rewriter to allow for
> > > >  pre-existing PHI arguments
> > > > 
> > > > Bootstrapped and tested on x86_64-unknown-linux-gnu (together with
> > > > [1/2]).
> > > > 
> > > > I can approve all these changes myself but any comments are welcome.
> > > My only worry would be ensuring that in the case where we're asking for a
> > > particular SSA_NAME in make_ssa_name_fn that we assert the requested name
> > > is
> > > available.
> > > 
> > > ISTM that if it's > the highest current version or in the freelist, then
> > > we
> > > ought to be safe.   If it isn't safe then we should either issue an error,
> > > or
> > > renumber the preexisting SSA_NAME (and determining if it's safe to
> > > renumber
> > > the preexisting SSA_NAME may require more context than we have).
> > 
> > An alternative interface would be to return NULL rather than asserting.
> > OTOH the single caller requesting a specific version first does a lookup
> > if the SSA name already exists, so it is safe.
> I have a slight preference for enforcing safety within the name manager, but I
> can live with doing the checking in the caller.

Actually safety is enforced by means of an assert ... the caller would 
have to check anyway (for a NULL result).  IMHO doing tricks like
re-numbering on already taken versions doesn't make sense as that doesn't
guarantee earlier assigned versions stay the same.

It's really a special thing for the parser which wants to preserve
SSA name versions as in the source.

Richard.


RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-17 Thread Michael Collison
James,

I incorporated all your suggestions, and successfully bootstrapped and re-ran 
the testsuite.

Okay for trunk?

2016-11-18  Michael Collison  

* config/aarch64/aarch64-protos.h
(aarch64_and_split_imm1, aarch64_and_split_imm2)
(aarch64_and_bitmask_imm): New prototypes
* config/aarch64/aarch64.c (aarch64_and_split_imm1):
New overloaded function to create bit mask covering the
lowest to highest bits set.
(aarch64_and_split_imm2): New overloaded functions to create bit
mask of zeros between first and last bit set.
(aarch64_and_bitmask_imm): New function to determine if a integer
is a valid two instruction "and" operation.
* config/aarch64/aarch64.md:(and3): New define_insn and _split
allowing wider range of constants with "and" operations.
* (ior3, xor3): Use new LOGICAL2 iterator to prevent
"and" operator from matching restricted constant range used for
ior and xor operators.
* config/aarch64/constraints.md (UsO constraint): New SImode constraint
for constants in "and" operantions.
(UsP constraint): New DImode constraint for constants in "and" 
operations.
* config/aarch64/iterators.md (lconst2): New mode iterator.
(LOGICAL2): New code iterator.
* config/aarch64/predicates.md (aarch64_logical_and_immediate): New
predicate
(aarch64_logical_and_operand): New predicate allowing extended constants
for "and" operations.
* testsuite/gcc.target/aarch64/and_const.c: New test to verify
additional constants are recognized and fewer instructions generated.
* testsuite/gcc.target/aarch64/and_const2.c: New test to verify
additional constants are recognized and fewer instructions generated.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Thursday, November 17, 2016 9:26 AM
To: Michael Collison
Cc: gcc-patches@gcc.gnu.org; nd
Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote:
> This patch is designed to improve code generation for "and" 
> instructions with certain immediate operands.
> 
> For the following test case:
> 
> int f2(int x)
> {
>x &= 0x0ff8;
> 
>x &= 0xff001fff;
> 
>return x;
> }
> 
> the trunk aarch64 compiler generates:
> 
> mov   w1, 8184
> movk  w1, 0xf00, lsl 16
> and   w0, w0, w1
> 
> We can generate one fewer instruction if we recognize certain 
> constants. With the attached patch the current trunk compiler generates:
> 
> and   w0, w0, 268435448
> and   w0, w0, -16769025
> 
> Bootstrapped and make check successfully completed with no regressions 
> on aarch64-linux-gnu.
> 
> Okay for trunk?

Hi Michael,

I have some minor comments in line.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;  
> HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);  
> int aarch64_get_condition_code (rtx);  bool aarch64_bitmask_imm 
> (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); 
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); 
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, 
> +machine_mode mode);
>  int aarch64_branch_cost (bool, bool);  enum aarch64_symbol_type 
> aarch64_classify_symbolic_expression (rtx);  bool 
> aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git 
> a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 
> 3e663eb..db82c5c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
> machine_mode mode)
>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in 
> +VAL_IN.  */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) {
> +  int first_bit_set = ctz_hwi (val_in);
> +  int last_bit_set = floor_log2 (val_in);
> +  gcc_assert (first_bit_set < last_bit_set);

This assert can never trigger (by definition) unless VAL_IN == 0 (in which case 
floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case you're 
testing for, then this assert would be more explicit as

 gcc_assert (val_in != 0)

I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first and 
last are ambiguous (you have them the opposite way round from what I'd consider 
first [highest, thus leading bits] and last [lowest/trailing bits]).

> +
> +  return ((HOST_WIDE_INT_UC (2) << last_bit_set) -
> +   (HOST_WIDE_INT_1U << first_bit_set)); }
> +
> +/* Creat

Re: [PATCH] fix PR68468

2016-11-17 Thread Waldemar Brodkorb
Hi Jakub,
Jakub Jelinek wrote,

> On Wed, Nov 16, 2016 at 07:31:59AM +0100, Waldemar Brodkorb wrote:
> > > On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote:
> > > > On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote:
> > > > >Hi,
> > > > >
> > > > >the following patch fixes PR68468.
> > > > >Patch is used for a while in Buildroot without issues.
> > > > >
> > > > >2016-11-05  Waldemar Brodkorb 
> > > 
> > > Two spaces before < instead of just one.
> > > > >
> > > > >   PR gcc/68468
> > > 
> > >   PR libgcc/68468
> > > instead.
> > > 
> > > > >   * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets.
> > > 
> > > Capital F in Fix.
> > > No libgcc/ prefix for files in libgcc/ChangeLog.
> > > 
> > > > This is ok.
> > > 
> > > I think Waldemar does not have SVN write access, are you going to check it
> > > in or who will do that?
> > 
> > Should I resend the patch with the suggested fixes or will someone
> > with write access fix it up for me?
> 
> As nobody committed it yet, I've made the changes and committed it for you.

Thanks!
 Waldemar


Re: PING [PATCH] enable -fprintf-return-value by default

2016-11-17 Thread Sandra Loosemore

On 11/16/2016 09:49 AM, Martin Sebor wrote:

I'm looking for an approval of the attached patch.

I've adjusted the documentation based on Sandra's input (i.e.,
documented the negative of the option rather than the positive;
thank you for the review, btw.)

On 11/08/2016 08:13 PM, Martin Sebor wrote:

The -fprintf-return-value optimization has been disabled since
the last time it caused a bootstrap failure on powerpc64le.  With
the underlying problems fixed GCC has bootstrapped fine on all of
powerpc64, powerpc64le and x86_64 and tested with no regressions.
I'd like to re-enable the option.  The attached patch does that.


[snip]

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 242500)
+++ gcc/doc/invoke.texi (working copy)
@@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}.
 -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol
 -fomit-frame-pointer -foptimize-sibling-calls @gol
 -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol
--fprefetch-loop-arrays -fprintf-return-value @gol
+-fprefetch-loop-arrays -fno-printf-return-value @gol
 -fprofile-correction @gol
 -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol
 -fprofile-reorder-functions @gol


Please keep this list alphabetized -- the other "-fno-*" options are
sorted as such.  The documentation parts of the patch are OK with that 
fixed, but I can't approve changing the default for the option.


-Sandra



[PATCH] [AArch64] Fix PR78382

2016-11-17 Thread Hurugalawadi, Naveen
Hi,

Please find attached the patch that fixes PR78382.

The "SYMBOL_SMALL_TLSGD" was not handled for ILP32. 
Hence it generates error when compiled for ILP32.
The attached patch adds the support and handles it properly as expected 
for ILP32.

Please review the patch and let me know if its okay?

Regression tested on AArch64 with no regressions.

Thanks,
Naveen

2016-11-18  Naveen H.S  

* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
Handle SYMBOL_SMALL_TLSGD for ILP32.
* config/aarch64/aarch64.md : tlsgd_small modified into
tlsgd_small_ to support SImode and DImode.
*tlsgd_small modified into *tlsgd_small_ to support SImode and
DImode.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 11d41cf..1688f0d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1374,10 +1374,17 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 case SYMBOL_SMALL_TLSGD:
   {
 	rtx_insn *insns;
-	rtx result = gen_rtx_REG (Pmode, R0_REGNUM);
+	rtx result;
+	if (TARGET_ILP32)
+	  result = gen_rtx_REG (SImode, R0_REGNUM);
+	else
+	  result = gen_rtx_REG (DImode, R0_REGNUM);
 
 	start_sequence ();
-	aarch64_emit_call_insn (gen_tlsgd_small (result, imm));
+	if (TARGET_ILP32)
+	  aarch64_emit_call_insn (gen_tlsgd_small_si (result, imm));
+	else
+	  aarch64_emit_call_insn (gen_tlsgd_small_di (result, imm));
 	insns = get_insns ();
 	end_sequence ();
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a652a7c..4833c7f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5089,20 +5089,20 @@
 ;; The TLS ABI specifically requires that the compiler does not schedule
 ;; instructions in the TLS stubs, in order to enable linker relaxation.
 ;; Therefore we treat the stubs as an atomic sequence.
-(define_expand "tlsgd_small"
+(define_expand "tlsgd_small_"
  [(parallel [(set (match_operand 0 "register_operand" "")
   (call (mem:DI (match_dup 2)) (const_int 1)))
-	 (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS)
+	 (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS)
 	 (clobber (reg:DI LR_REGNUM))])]
  ""
 {
   operands[2] = aarch64_tls_get_addr ();
 })
 
-(define_insn "*tlsgd_small"
+(define_insn "*tlsgd_small_"
   [(set (match_operand 0 "register_operand" "")
 	(call (mem:DI (match_operand:DI 2 "" "")) (const_int 1)))
-   (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS)
+   (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS)
(clobber (reg:DI LR_REGNUM))
   ]
   ""
diff --git a/gcc/testsuite/gcc.target/aarch64/pr78382.c b/gcc/testsuite/gcc.target/aarch64/pr78382.c
new file mode 100644
index 000..6c98e5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr78382.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O0 -fpic -mabi=ilp32 -mtls-dialect=trad" } */
+
+__thread int abc;
+void
+foo ()
+{
+  int *p;
+  p = &abc;
+}


Go patch committed: Remove more diffs from gofrontend and GCC repos

2016-11-17 Thread Ian Lance Taylor
This patch to gcc/go/gofrontend and libgo removes the remaining
extraneous differences between the GCC repo and gofrontend repo.  A
few files were somehow not deleted from the GCC repo.  The aclocal.m4
file was changed recently, and gcc/go/gofrontend/lex.cc was changed a
while back.  This patch simply restores them to the versions in the
gofrontend repo.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/lex.cc
===
--- gcc/go/gofrontend/lex.cc(revision 242581)
+++ gcc/go/gofrontend/lex.cc(working copy)
@@ -882,7 +882,7 @@ Lex::gather_identifier()
  && (cc < '0' || cc > '9'))
{
  // Check for an invalid character here, as we get better
- // error behavior if we swallow them as part of the
+ // error behaviour if we swallow them as part of the
  // identifier we are building.
  if ((cc >= ' ' && cc < 0x7f)
  || cc == '\t'
@@ -923,7 +923,7 @@ Lex::gather_identifier()
{
  // There is no valid place for a non-ASCII character
  // other than an identifier, so we get better error
- // handling behavior if we swallow this character after
+ // handling behaviour if we swallow this character after
  // giving an error.
  if (!issued_error)
go_error_at(this->location(),
Index: libgo/aclocal.m4
===
--- libgo/aclocal.m4(revision 242581)
+++ libgo/aclocal.m4(working copy)
@@ -579,27 +579,6 @@ if test x"${install_sh}" != xset; then
 fi
 AC_SUBST(install_sh)])
 
-# Copyright (C) 2003, 2005  Free Software Foundation, Inc.
-#
-# This file is free software; the Free Software Foundation
-# gives unlimited permission to copy and/or distribute it,
-# with or without modifications, as long as this notice is preserved.
-
-# serial 2
-
-# Check whether the underlying file-system supports filenames
-# with a leading dot.  For instance MS-DOS doesn't.
-AC_DEFUN([AM_SET_LEADING_DOT],
-[rm -rf .tst 2>/dev/null
-mkdir .tst 2>/dev/null
-if test -d .tst; then
-  am__leading_dot=.
-else
-  am__leading_dot=_
-fi
-rmdir .tst 2>/dev/null
-AC_SUBST([am__leading_dot])])
-
 # Add --enable-maintainer-mode option to configure. -*- Autoconf -*-
 # From Jim Meyering
 
@@ -764,74 +743,6 @@ case $mkdir_p in
 esac
 ])
 
-# Copyright (C) 1998, 1999, 2000, 2001, 2003, 2004, 2005, 2006, 2012
-# Free Software Foundation, Inc.
-#
-# This file is free software; the Free Software Foundation
-# gives unlimited permission to copy and/or distribute it,
-# with or without modifications, as long as this notice is preserved.
-
-# serial 6
-
-# AM_ENABLE_MULTILIB([MAKEFILE], [REL-TO-TOP-SRCDIR])
-# ---
-# Add --enable-multilib to configure.
-AC_DEFUN([AM_ENABLE_MULTILIB],
-[m4_warn([obsolete], [$0 will be removed from Automake core soon.
-Files implementing the "multilib" feature are (and will remain) available
-to the 'contrib/' directory in the Automake distribution.])]dnl
-[# Default to --enable-multilib
-AC_ARG_ENABLE(multilib,
-[  --enable-multilib   build many library versions (default)],
-[case "$enableval" in
-  yes) multilib=yes ;;
-  no)  multilib=no ;;
-  *)   AC_MSG_ERROR([bad value $enableval for multilib option]) ;;
- esac],
- [multilib=yes])
-
-# We may get other options which we leave undocumented:
-# --with-target-subdir, --with-multisrctop, --with-multisubdir
-# See config-ml.in if you want the gory details.
-
-if test "$srcdir" = "."; then
-  if test "$with_target_subdir" != "."; then
-multi_basedir="$srcdir/$with_multisrctop../$2"
-  else
-multi_basedir="$srcdir/$with_multisrctop$2"
-  fi
-else
-  multi_basedir="$srcdir/$2"
-fi
-AC_SUBST(multi_basedir)
-
-# Even if the default multilib is not a cross compilation,
-# it may be that some of the other multilibs are.
-if test $cross_compiling = no && test $multilib = yes \
-   && test "x${with_multisubdir}" != x ; then
-   cross_compiling=maybe
-fi
-
-AC_OUTPUT_COMMANDS([
-# Only add multilib support code if we just rebuilt the top-level
-# Makefile.
-case " $CONFIG_FILES " in
- *" ]m4_default([$1],Makefile)[ "*)
-   ac_file=]m4_default([$1],Makefile)[ . ${multi_basedir}/config-ml.in
-   ;;
-esac],
-  [
-srcdir="$srcdir"
-host="$host"
-target="$target"
-with_multisubdir="$with_multisubdir"
-with_multisrctop="$with_multisrctop"
-with_target_subdir="$with_target_subdir"
-ac_configure_args="${multilib_arg} ${ac_configure_args}"
-multi_basedir="$multi_basedir"
-CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
-CC="$CC"])])dnl
-
 # Helper functions for option handling. -*- Autoconf -*-
 
 # Copyright (C) 2001, 2002, 2003, 2005, 2008, 2010 Free Software
@@ -1077,6 +988,11 @@ AC_SUBST([am__tar])
 AC_SUBST(

libgo patch committed: Update configure, a few binary files

2016-11-17 Thread Ian Lance Taylor
This patch to libgo updates the configure script and a few binary
files.  The configure script was changed without going through the
external repo in a way that somehow dropped the
--with-system-libunwind argument and made a few other changes.  This
patch restores these changes.  A few binary files were changed in the
external repo in merges from the master repo, but the changes did not
get reflected in the GCC repo.  These changes are all irrelevant--a
couple of "http" became "https" in files that appeared compressed but
were not actually compressed, and a couple of timestamps were zeroed
out in files that actually were compressed.  Patch bootstrapped and
ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: libgo/configure
===
--- libgo/configure (revision 242581)
+++ libgo/configure (working copy)
@@ -810,6 +810,7 @@ enable_werror
 enable_version_specific_runtime_libs
 with_libffi
 with_libatomic
+with_system_libunwind
 '
   ac_precious_vars='build_alias
 host_alias
@@ -1459,6 +1460,7 @@ Optional Packages:
   both]
   --without-libffidon't use libffi
   --without-libatomic don't use libatomic
+  --with-system-libunwind use installed libunwind
 
 Some influential environment variables:
   CC  C compiler command
@@ -2483,6 +2485,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu
 
 
 
+
+
+
 ac_config_headers="$ac_config_headers config.h"
 
 
@@ -3467,12 +3472,10 @@ done
 
 cat confdefs.h - <<_ACEOF >conftest.$ac_ext
 /* end confdefs.h.  */
-#include 
+
 int
 main ()
 {
-FILE *f = fopen ("conftest.out", "w");
- return ferror (f) || fclose (f) != 0;
 
   ;
   return 0;
@@ -11206,7 +11209,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11207 "configure"
+#line 11212 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -14679,7 +14682,42 @@ $as_echo "#define AC_APPLE_UNIVERSAL_BUI
  esac
 
 
-GCC_CHECK_UNWIND_GETIPINFO
+
+
+# Check whether --with-system-libunwind was given.
+if test "${with_system_libunwind+set}" = set; then :
+  withval=$with_system_libunwind;
+fi
+
+  # If system-libunwind was not specifically set, pick a default setting.
+  if test x$with_system_libunwind = x; then
+case ${target} in
+  ia64-*-hpux*) with_system_libunwind=yes ;;
+  *) with_system_libunwind=no ;;
+esac
+  fi
+  # Based on system-libunwind and target, do we have ipinfo?
+  if  test x$with_system_libunwind = xyes; then
+case ${target} in
+  ia64-*-*) have_unwind_getipinfo=no ;;
+  *) have_unwind_getipinfo=yes ;;
+esac
+  else
+# Darwin before version 9 does not have _Unwind_GetIPInfo.
+
+case ${target} in
+  *-*-darwin[3-8]|*-*-darwin[3-8].*) have_unwind_getipinfo=no ;;
+  *) have_unwind_getipinfo=yes ;;
+esac
+
+  fi
+
+  if test x$have_unwind_getipinfo = xyes; then
+
+$as_echo "#define HAVE_GETIPINFO 1" >>confdefs.h
+
+  fi
+
 
 for ac_header in port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h 
sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h 
sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h 
netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h 
sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h 
netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h
 do :
@@ -16269,6 +16307,8 @@ ac_configure_args="${multilib_arg} ${ac_
 multi_basedir="$multi_basedir"
 CONFIG_SHELL=${CONFIG_SHELL-/bin/sh}
 CC="$CC"
+CXX="$CXX"
+GFORTRAN="$GFORTRAN"
 AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir"
 
 
Index: libgo/go/archive/zip/testdata/readme.notzip
===
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: libgo/go/archive/zip/testdata/readme.zip
===
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: libgo/go/compress/gzip/testdata/issue6550.gz
===
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: libgo/go/encoding/json/testdata/code.json.gz
===
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream


Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Martin Sebor

On 11/17/2016 05:32 PM, Martin Sebor wrote:

On 11/17/2016 03:52 PM, Jeff Law wrote:

On 11/17/2016 03:03 PM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote:

On 11/17/2016 11:24 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:

--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist
**ap, gfc_formal_arglist *formal,
  for (f = formal; f; f = f->next)
n++;

-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
+  /* Take care not to call alloca with a zero argument.  */
+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);

  for (i = 0; i < n; i++)
new_arg[i] = NULL;


Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
and we don't rely on those pointers being distinct from other
pointers.

On systems where alloca was implemented on top of malloc, alloca (0)
would
cause collection of alloca'd objects that had gone out of scope.


Ouch.  Do we support any such systems as hosts?  If yes, can't we just
define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1)
on those systems and leave the sane hosts as is?

I would guess they're all dead as hosts for building GCC.  I was most
familiar with hpux, but I'm pretty sure there were others as emacs
(IIRC) had a replacement alloca for systems without it as a builtin.
They probably all fall into the "retro-computing" bucket these days.

Essentially those systems worked by recording all the allocations as
well as the frame depth at which they occurred.  The next time alloca
was called, anything at a deeper depth than the current frame was
released.

So even if we called alloca (0) unexpectedly, it's not going to cause
anything to break.  Failing to call alloca (0) could run the system out
of heap memory.  It's left as an exercise to the reader to ponder how
that might happen -- it can and did happen building GCC "in the old
days".

The point is warning on an alloca (0) may not be as clear cut as it
might seem.  It's probably a reasonable thing to do on the host, but on
a target, which might be embedded and explicitly overriding the builtin
alloca, warning on alloca (0) is less of a slam dunk.


I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments.  For instance:

http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html


Their GCC-derived compilers apparently enable it with -fno-builtins.

Other references include source code derived from an implementation
with Doug Gwyn's name on it, such as this one:

https://opensource.apple.com/source/gnudiff/gnudiff-10/diffutils/alloca.c


There's also a copy in libiberty:
  https://gcc.gnu.org/onlinedocs/libiberty/Functions.html#index-alloca-58
  https://gcc.gnu.org/viewcvs/gcc/trunk/libiberty/alloca.c?view=markup



A comment at the top the file mentions the alloca(0) use case:

As a special case, alloca(0) reclaims storage without
allocating any.  It is a good idea to use alloca(0) in
your main control loop, etc. to force garbage collection.

That said, I don't view this as reason to exclude the warning from
-Wextra.  The vendor-provided compilers are obviously customized
versions of GCC with their own features, including their own set
of options and warnings.  It shouldn't stop us from enabling those
we expect to be useful to the typical GCC user on typical targets,
and especially those that can help expose sources of common bugs.
Recognizing I'm preaching to choir here: Calling alloca with any
argument is considered dangerous enough to be discouraged in most
man pages.  Alloca(0) is a potentially dangerous corner case of
an already dangerous API.  It seems at least as problematic as
any of the warnings already in -Wextra, and I'd say as many in
-Wall.

Even on systems with this unusual alloca implementation, since
alloca(0) is special (and expensive), warning on such calls would
likely be useful.  In fact, since calling alloca(0) in these
environments is actually important, warning on them coukd help
detect their unintended absence.  The few such calls that are
intended can be easily tweaked to suppress the warning.

Martin




Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-17 Thread Marek Polacek
On Thu, Nov 17, 2016 at 09:10:51AM +0100, Jakub Jelinek wrote:
> On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote:
> > Sorry.  So consider the following:
> > 
> > class S
> > {
> >   virtual void foo () = 0;
> > };
> > 
> > struct T {
> >   T &operator << (const char *s);
> > };
> > 
> > T t;
> > 
> > void
> > S::foo ()
> > {
> >   t << "a" << "b" << "c";
> > }
> > 
> > Before
> > 1498   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
> > 1499 ubsan_maybe_instrument_member_call (stmt, is_ctor);
> > 
> > the stmt will be
> > 
> > T::operator<< (T::operator<< (T::operator<< (&t, "a"), "b"), "c")
> > 
> > after ubsan_maybe_instrument_member_call it will be
> > 
> > T::operator<< (UBSAN_NULL (SAVE_EXPR  > "a"), "b")>, 4B, 0);, SAVE_EXPR  > "b")>;, "c")
> > 
> > and that's what is saved into the hash table.  Then another stmt will be the
> > inner
> > 
> > T::operator<< (T::operator<< (&t, "a"), "b")
> > 
> > which we instrument and put into the hash table, and so on.  But those
> > SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
> > T::operator<< calls and we kind of recursively instrument all of them.
> 
> But those SAVE_EXPRs are the same.  If you put a breakpoint on the:
> 489 if (op)
> 490   CALL_EXPR_ARG (stmt, 0) = op;
> line 490 in c-ubsan.c, on the above short testcase is called 2 times
> (the T< not to be NULL), rather than 3 times.
> When trying larger testcase like below I count ~ 120 calls (counted by hand,
> so it is possible it was the right 119 times).
> If this has not been linear, the testcase would be compiling for years.
> Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it,
> it will be just insanely huge, but each SAVE_EXPR appears exactly twice
> in its containing SAVE_EXPR and the second time cp_genericize_r sees
> the SAVE_EXPR, it will do the
> 1138/* Other than invisiref parms, don't walk the same tree twice.  */
> 1139if (p_set->contains (stmt))
> 1140  {
> 1141*walk_subtrees = 0;
> 1142return NULL_TREE;
> 1143  }
> early exit.
 
Clearly I misread things here.  I blame the cold I've caught!

In any case, I'm sorry for wasting your time and I'll just close the PR.

Marek


[PR78365] ICE in determine_value_range, at tree-ssa-loo p-niter.c:413

2016-11-17 Thread kugan

Hi,

I was relying on ipa_get_callee_param_type to get type of parameter and 
then convert arguments to this type while computing jump functions. 
However, in cases like shown in PR78365, ipa_get_callee_param_type, 
instead of giving up, would return the wrong type. I think the current 
uses of ipa_get_callee_param_type are fine with this.


Attached patch now uses callee's DECL_ARGUMENTS to get the type. If it 
cannot be found, then I would give up and set the jump function to varying.


Bootstrapped and regression tested on x86_64-linux-gnu with no new 
regressions. Is this OK for trunk?


Thanks,
Kugan

gcc/testsuite/ChangeLog:

2016-11-18  Kugan Vivekanandarajah  

PR IPA/78365
* gcc.dg/torture/pr78365.c: New test.

gcc/ChangeLog:

2016-11-18  Kugan Vivekanandarajah  

PR IPA/78365
* ipa-cp.c (propagate_constants_accross_call): get param type from 
callees
DECL_ARGUMENTS if available.
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Likewise.
(ipcp_update_vr):  Remove now redundant conversion of precision for VR.
* ipa-prop.h: Make ipa_get_callee_param_type local again.
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 2ec671f..924c846 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -2200,6 +2200,9 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
   struct ipa_edge_args *args;
   bool ret = false;
   int i, args_count, parms_count;
+  struct cgraph_node *calee = cs->callee;
+  tree fndecl = calee ? calee->decl : NULL_TREE;
+  tree parm = fndecl ? DECL_ARGUMENTS (fndecl) : NULL_TREE;
 
   callee = cs->callee->function_symbol (&availability);
   if (!callee->definition)
@@ -2247,7 +2250,6 @@ propagate_constants_accross_call (struct cgraph_edge *cs)
 {
   struct ipa_jump_func *jump_func = ipa_get_ith_jump_func (args, i);
   struct ipcp_param_lattices *dest_plats;
-  tree param_type = ipa_get_callee_param_type (cs, i);
 
   dest_plats = ipa_get_parm_lattices (callee_info, i);
   if (availability == AVAIL_INTERPOSABLE)
@@ -2265,10 +2267,12 @@ propagate_constants_accross_call (struct cgraph_edge 
*cs)
  if (opt_for_fn (callee->decl, flag_ipa_vrp))
ret |= propagate_vr_accross_jump_function (cs,
   jump_func, dest_plats,
-  param_type);
+  parm ?
+  TREE_TYPE (parm) : 
NULL_TREE);
  else
ret |= dest_plats->m_value_range.set_to_bottom ();
}
+  parm = parm ? DECL_CHAIN (parm) : NULL_TREE;
 }
   for (; i < parms_count; i++)
 ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info, i));
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 6321fdd..0f102c6c 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1651,7 +1651,7 @@ determine_locally_known_aggregate_parts (gcall *call, 
tree arg,
 /* Return the Ith param type of callee associated with call graph
edge E.  */
 
-tree
+static tree
 ipa_get_callee_param_type (struct cgraph_edge *e, int i)
 {
   int n;
@@ -1695,6 +1695,9 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
   gcall *call = cs->call_stmt;
   int n, arg_num = gimple_call_num_args (call);
   bool useful_context = false;
+  struct cgraph_node *calee = cs->callee;
+  tree fndecl = calee ? calee->decl : NULL_TREE;
+  tree parm = fndecl ? DECL_ARGUMENTS (fndecl) : NULL_TREE;
 
   if (arg_num == 0 || args->jump_functions)
 return;
@@ -1751,8 +1754,8 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
{
  wide_int min, max;
  value_range_type type;
- if (TREE_CODE (arg) == SSA_NAME
- && param_type
+ if (parm
+ && TREE_CODE (arg) == SSA_NAME
  && (type = get_range_info (arg, &min, &max))
  && (type == VR_RANGE || type == VR_ANTI_RANGE))
{
@@ -1764,7 +1767,7 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
  vr.equiv = NULL;
  extract_range_from_unary_expr (&jfunc->m_vr,
 NOP_EXPR,
-param_type,
+TREE_TYPE (parm),
 &vr, TREE_TYPE (arg));
  if (jfunc->m_vr.type == VR_RANGE
  || jfunc->m_vr.type == VR_ANTI_RANGE)
@@ -1775,6 +1778,7 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
  else
gcc_assert (!jfunc->vr_known);
}
+  parm = parm ? DECL_CHAIN (parm) : NULL_TREE;
 
   if (INTEGRAL_TYPE_P (TREE_TYPE (arg))
  && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST))
@@ -5699,8 +5703,6 @@ ipcp_update_vr (struct cgraph_node *node)
   if (vr[i].known
  &

Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Martin Sebor

On 11/17/2016 03:52 PM, Jeff Law wrote:

On 11/17/2016 03:03 PM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote:

On 11/17/2016 11:24 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:

--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist
**ap, gfc_formal_arglist *formal,
  for (f = formal; f; f = f->next)
n++;

-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
+  /* Take care not to call alloca with a zero argument.  */
+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);

  for (i = 0; i < n; i++)
new_arg[i] = NULL;


Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
and we don't rely on those pointers being distinct from other pointers.

On systems where alloca was implemented on top of malloc, alloca (0)
would
cause collection of alloca'd objects that had gone out of scope.


Ouch.  Do we support any such systems as hosts?  If yes, can't we just
define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1)
on those systems and leave the sane hosts as is?

I would guess they're all dead as hosts for building GCC.  I was most
familiar with hpux, but I'm pretty sure there were others as emacs
(IIRC) had a replacement alloca for systems without it as a builtin.
They probably all fall into the "retro-computing" bucket these days.

Essentially those systems worked by recording all the allocations as
well as the frame depth at which they occurred.  The next time alloca
was called, anything at a deeper depth than the current frame was released.

So even if we called alloca (0) unexpectedly, it's not going to cause
anything to break.  Failing to call alloca (0) could run the system out
of heap memory.  It's left as an exercise to the reader to ponder how
that might happen -- it can and did happen building GCC "in the old days".

The point is warning on an alloca (0) may not be as clear cut as it
might seem.  It's probably a reasonable thing to do on the host, but on
a target, which might be embedded and explicitly overriding the builtin
alloca, warning on alloca (0) is less of a slam dunk.


I confess I haven't heard of such an implementation before but after
some searching I have managed to find a few examples of it online,
such as in QNX or in the BlackBerry SDK, or in the GCC shipped by
Texas Instruments.  For instance:

http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html

Their GCC-derived compilers apparently enable it with -fno-builtins.

Other references include source code derived from an implementation
with Doug Gwyn's name on it, such as this one:

https://opensource.apple.com/source/gnudiff/gnudiff-10/diffutils/alloca.c

A comment at the top the file mentions the alloca(0) use case:

As a special case, alloca(0) reclaims storage without
allocating any.  It is a good idea to use alloca(0) in
your main control loop, etc. to force garbage collection.

That said, I don't view this as reason to exclude the warning from
-Wextra.  The vendor-provided compilers are obviously customized
versions of GCC with their own features, including their own set
of options and warnings.  It shouldn't stop us from enabling those
we expect to be useful to the typical GCC user on typical targets,
and especially those that can help expose sources of common bugs.
Recognizing I'm preaching to choir here: Calling alloca with any
argument is considered dangerous enough to be discouraged in most
man pages.  Alloca(0) is a potentially dangerous corner case of
an already dangerous API.  It seems at least as problematic as
any of the warnings already in -Wextra, and I'd say as many in
-Wall.

Even on systems with this unusual alloca implementation, since
alloca(0) is special (and expensive), warning on such calls would
likely be useful.  In fact, since calling alloca(0) in these
environments is actually important, warning on them coukd help
detect their unintended absence.  The few such calls that are
intended can be easily tweaked to suppress the warning.

Martin


libgo patch committed: Rewrite Go type to libffi type conversion in Go

2016-11-17 Thread Ian Lance Taylor
This patch to libgo rewrites the code that converts from a Go type to
a libffi type in Go.  As we move toward the Go 1.7 garbage collector,
it's essential that all allocation of values that can contain Go
pointers be done using the correct type descriptor.  That is simplest
if we do all such allocation in Go code.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 242509)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-d9189ebc139ff739af956094626ccc5eb92c3091
+bc5ad6d10092d6238495357468ee093f7caf39f9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 242509)
+++ libgo/Makefile.am   (working copy)
@@ -1052,6 +1052,14 @@ runtime_internal_atomic_lo_check_GOCFLAG
 runtime_internal_sys_lo_GOCFLAGS = -fgo-compiling-runtime
 runtime_internal_sys_lo_check_GOCFLAGS = -fgo-compiling-runtime
 
+# If libffi is supported (the normal case) use the ffi build tag for
+# the runtime package.
+if USE_LIBFFI
+matchargs_runtime = --tag=libffi
+else
+matchargs_runtime =
+endif
+
 # At least for now, we need -static-libgo for this test, because
 # otherwise we can't get the line numbers.
 # Also use -fno-inline to get better results from the memory profiler.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 242072)
+++ libgo/configure.ac  (working copy)
@@ -121,6 +121,7 @@ if test "$with_libffi" != no; then
 fi
 AC_SUBST(LIBFFI)
 AC_SUBST(LIBFFIINCS)
+AM_CONDITIONAL(USE_LIBFFI, test "$with_liffi" != "no")
 
 # See if the user wants to configure without libatomic. This is useful if we 
are
 # on an architecture for which libgo does not need an atomic support library 
and
Index: libgo/go/reflect/makefunc.go
===
--- libgo/go/reflect/makefunc.go(revision 241341)
+++ libgo/go/reflect/makefunc.go(working copy)
@@ -63,7 +63,7 @@ func MakeFunc(typ Type, fn func(args []V
method: -1,
}
 
-   makeFuncFFI(ftyp, unsafe.Pointer(impl))
+   makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(impl))
 
return Value{t, unsafe.Pointer(&impl), flag(Func) | flagIndir}
 }
@@ -102,7 +102,7 @@ func makeMethodValue(op string, v Value)
rcvr:   rcvr,
}
 
-   makeFuncFFI(ftyp, unsafe.Pointer(fv))
+   makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(fv))
 
return Value{ft, unsafe.Pointer(&fv), v.flag&flagRO | flag(Func) | 
flagIndir}
 }
@@ -128,7 +128,7 @@ func makeValueMethod(v Value) Value {
rcvr:   v,
}
 
-   makeFuncFFI(ftyp, unsafe.Pointer(impl))
+   makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(impl))
 
return Value{t, unsafe.Pointer(&impl), v.flag&flagRO | flag(Func) | 
flagIndir}
 }
Index: libgo/go/reflect/makefunc_ffi.go
===
--- libgo/go/reflect/makefunc_ffi.go(revision 241341)
+++ libgo/go/reflect/makefunc_ffi.go(working copy)
@@ -10,7 +10,10 @@ import (
 
 // The makeFuncFFI function, written in C, fills in an FFI closure.
 // It arranges for ffiCall to be invoked directly from FFI.
-func makeFuncFFI(ftyp *funcType, impl unsafe.Pointer)
+func makeFuncFFI(cif unsafe.Pointer, impl unsafe.Pointer)
+
+// The makeCIF function, implemented in the runtime package, allocates a CIF.
+func makeCIF(ft *funcType) unsafe.Pointer
 
 // FFICallbackGo implements the Go side of the libffi callback.
 // It is exported so that C code can call it.
Index: libgo/go/reflect/makefunc_ffi_c.c
===
--- libgo/go/reflect/makefunc_ffi_c.c   (revision 241341)
+++ libgo/go/reflect/makefunc_ffi_c.c   (working copy)
@@ -8,7 +8,7 @@
 
 #ifdef USE_LIBFFI
 
-#include "go-ffi.h"
+#include "ffi.h"
 
 #if FFI_GO_CLOSURES
 #define USE_LIBFFI_CLOSURES
@@ -18,7 +18,7 @@
 
 /* Declare C functions with the names used to call from Go.  */
 
-void makeFuncFFI(const struct __go_func_type *ftyp, void *impl)
+void makeFuncFFI(void *cif, void *impl)
   __asm__ (GOSYM_PREFIX "reflect.makeFuncFFI");
 
 #ifdef USE_LIBFFI_CLOSURES
@@ -70,20 +70,15 @@ ffi_callback (ffi_cif* cif __attribute__
 /* Allocate an FFI closure and arrange to call ffi_callback.  */
 
 void
-makeFuncFFI(const struct __go_func_type *ftyp, void *impl)
+makeFuncFFI(void *cif, void *impl)
 {
-  ffi_cif *cif;
-
-  cif = (ffi_cif *) __go_alloc (sizeof (ffi_cif));
-  __go_func_to_cif (ftyp, 0, 0, cif);
-
-  ffi_prep_go_closure(impl, cif, ffi_callback);
+  ffi_prep_go_closure(impl, (ffi_cif*)cif, ffi_callback);
 }
 
 #else /* !defined(USE_LIBFFI_CLOSURES) */
 
 void
-mak

Re: [PATCH] warn on overflow in calls to allocation functions (bugs 77531 and 78284)

2016-11-17 Thread Martin Sebor

Attached is an update to the patch that avoids duplicating the
-Walloca-larger-than warnings.  This version also avoids warning
for calls with zero allocation size to functions declared with
the returns_nonnull attribute (like libiberty's xmalloc).  Since
such functions cannot return null there's no portability issue
to worry/warn about.

When applied along with the patch to avoid calling alloca(0)
in GCC posted earlier today (link below) this version bootstraps
and passes all tests on x86_64.  It also builds Binutils 2.27 and
the Linux kernel with no new warnings.

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01838.html

Martin

On 11/16/2016 10:19 AM, Martin Sebor wrote:

Attached is an updated version of the patch that also adds attribute
alloc_size to the standard allocation built-ins (aligned_alloc,
alloca, malloc, calloc, and realloc) and handles alloca.

Besides that, I've renamed the option to -Walloc-size-larger-than
to make it less similar to -Walloca-larger-than.  It think the new
name works because the option works with the alloc_size attribute.
 Other suggestions are of course welcome.

I've left the alloc_max_size function in place until I receive some
feedback on it.

I've regression-tested the patch on x86_64 with a few issues.  The
biggest is that the -Walloc-zero option enabled by -Wextra causes
a number of errors during bootstrap due to invoking the XALLOCAVEC
macro with a zero argument.  The errors look valid to me (and I
got past them by temporarily changing the XALLOCAVEC macro to
always allocate at least one byte) but I haven't fixed the errors
yet.  I'll post a separate patch for those.   The other open issue
is that the new warning duplicates a small subset of the
-Walloca-larger-than warnings.  I expect removing the duplicates
to be straightforward.  I post this updated patch for review while
I work on the remaining issues.

Martin

On 11/13/2016 08:19 PM, Martin Sebor wrote:

Bug 77531 requests a new warning for calls to allocation functions
(those declared with attribute alloc_size(X, Y)) that overflow the
computation X * Z of the size of the allocated object.

Bug 78284 suggests that detecting and diagnosing other common errors
in calls to allocation functions, such as allocating more space than
SIZE_MAX / 2 bytes, would help prevent subsequent buffer overflows.

The attached patch adds two new warning options, -Walloc-zero and
-Walloc-larger-than=bytes that implement these two enhancements.
The patch is not 100% finished because, as it turns out, the GCC
allocation built-ins (malloc et al.) do not make use of the
attribute and so don't benefit from the warnings.  The tests are
also incomplete, and there's at least one bug in the implementation
I know about.

I'm posting the patch while stage 1 is still open and to give
a heads up on it and to get early feedback.  I expect completing
it will be straightforward.

Martin

PS The alloc_max_size function added in the patch handles sizes
specified using suffixes like KB, MB, etc.  I added that to make
it possible to specify sizes in excess of the maximum of INT_MAX
that (AFAIK) options that take integer arguments handle out of
the box.  It only belatedly occurred to me that the suffixes
are unnecessary if the option argument is handled using strtoull.
I can remove the suffix (as I suspect it will raise objections)
but I think that a general solution along these lines would be
useful to let users specify large byte sizes in other options
as well (such -Walloca-larger-than, -Wvla-larger-then).  Are
there any suggestions or preferences?





PR c/77531 - __attribute__((alloc_size(1,2))) could also warn on multiplication overflow
PR c/78284 - warn on malloc with very large arguments

gcc/c-family/ChangeLog:

	PR c/77531
	PR c/78284
	* c.opt (-Walloc-zero, -Walloc-size-larger-than): New options.

gcc/ChangeLog:

	PR c/77531
	PR c/78284
	* builtin-attrs.def (ATTR_ALLOC_SIZE): New identifier tree.
	(ATTR_MALLOC_SIZE_1_NOTHROW_LIST): New attribute list.
	(ATTR_MALLOC_SIZE_1_NOTHROW_LEAF_LIST): Same.
	(ATTR_MALLOC_SIZE_1_2_NOTHROW_LEAF_LIST): Same.
	(ATTR_ALLOC_SIZE_2_NOTHROW_LEAF_LIST): Same.
	* builtins.c (expand_builtin_alloca): Call
	maybe_warn_alloc_args_overflow.
	* builtins.def (akigned_alloc, alloca, calloc, malloc, realloc):
	Add attribute alloc_size.
	* calls.h (maybe_warn_alloc_args_overflow): Declare.
	* calls.c (alloc_max_size): New function.
	(maybe_warn_alloc_args_overflow): Define.
	(initialize_argument_information): Diagnose overflow in functions
	declared with attaribute alloc_size.
	* doc/invoke.texi (Warning Options): Document -Walloc-zero and
	-Walloc-size-larger-than.

gcc/testsuite/ChangeLog:

	PR c/77531
	PR c/78284
	* gcc.dg/attr-alloc_size-3.c: New test.
	* gcc.dg/attr-alloc_size-4.c: New test.
	* gcc.dg/attr-alloc_size-5.c: New test.
	* gcc.dg/attr-alloc_size-6.c: New test.
	* gcc.dg/attr-alloc_size-7.c: New test.
	* gcc.dg/attr-alloc_size-8.c: New test.
	* gcc.dg/attr-alloc_size-9.c: New test.
	* gcc/testsuite

[PR middle-end/38219] Don't test vrp47.c on m68k

2016-11-17 Thread Jeff Law


AFAICT the m68k was the last target needing twiddling to avoid 
unexpected failures on vrp47.c (from reading comments in the bz).


The m68k (like others already mentioned) breaks the conditionals down 
rather than combining them into a logical with single compare/test.


Fixed in the obvious way and installed on the trunk.

Jeff
commit 1d9211ddf7fcede312120eba696ef143d4f00015
Author: law 
Date:   Thu Nov 17 23:54:46 2016 +

PR middle-end/38219
* gcc.dg/tree-ssa/vrp47.c: Do not run on m68k.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242576 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 5dd6f56..c039eca 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2016-11-17  Jeff Law  
 
+   PR middle-end/38219
+   * gcc.dg/tree-ssa/vrp47.c: Do not run on m68k.
+
PR target/47192
* gcc.target/m68k/pr47192.c: New test.
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c 
b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
index beca5fa..28a8808 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c
@@ -3,7 +3,7 @@
 /* Skip on S/390.  Lower values in BRANCH_COST lead to two conditional
jumps when evaluating an && condition.  VRP is not able to optimize
this.  */
-/* { dg-do compile { target { ! { logical_op_short_circuit || { s390*-*-* 
mn10300-*-* hppa*-*-* } } } } } */
+/* { dg-do compile { target { ! { logical_op_short_circuit || { s390*-*-* 
mn10300-*-* hppa*-*-* m68k*-*-* } } } } } */
 /* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-dom2 -fdump-tree-vrp2" } */
 /* { dg-additional-options "-march=i586" { target { { i?86-*-* x86_64-*-* } && 
ia32 } } } */
 


[PR target/47192] Avoid unsafe code motion in m68k epilogue

2016-11-17 Thread Jeff Law


Like so many ports, the m68k port forgot to emit a barrier prior to 
deallocation of the stack in its epilogue.


As a result, a memory reference could schedule after the stack 
deallocation.  That memory reference might refer to an object in the 
just allocated stack.  If we get an interrupt between the deallocation 
and the memory reference, boom!


This can be seen in the code for pr47192 when using the appropriate options.

Fixed in the usual way.  Verified the m68k target tests pass. 
Installing on the trunk.


jeff

commit 6dcf231a5ddf1f8dfcf1a6ec59fcf56520666271
Author: law 
Date:   Thu Nov 17 23:39:08 2016 +

PR target/47192
* config/m68k/m68k.c (m68k_expand_epilogue): Emit a scheduling
barrier prior to deallocating the stack.

PR target/47192
* gcc.target/m68k/pr47192.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242575 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 24ed0f8..86a8fdf 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2016-11-17  Jeff Law  
+
+   PR target/47192
+   * config/m68k/m68k.c (m68k_expand_epilogue): Emit a scheduling
+   barrier prior to deallocating the stack.
+
 2016-11-17  Andrew Burgess  
 
* config/arc/arc.md (cmem bit/sign-extend peephole2): New peephole
diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c
index ce56692..346c2dc 100644
--- a/gcc/config/m68k/m68k.c
+++ b/gcc/config/m68k/m68k.c
@@ -1205,6 +1205,7 @@ m68k_expand_epilogue (bool sibcall_p)
 stack-based restore.  */
  emit_move_insn (gen_rtx_REG (Pmode, A1_REG),
  GEN_INT (-(current_frame.offset + fsize)));
+ emit_insn (gen_blockage ());
  emit_insn (gen_addsi3 (stack_pointer_rtx,
 gen_rtx_REG (Pmode, A1_REG),
 frame_pointer_rtx));
@@ -1306,6 +1307,7 @@ m68k_expand_epilogue (bool sibcall_p)
 current_frame.fpu_mask, false, false);
 }
 
+  emit_insn (gen_blockage ());
   if (frame_pointer_needed)
 emit_insn (gen_unlink (frame_pointer_rtx));
   else if (fsize_with_regs)
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 356d75a..5dd6f56 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2016-11-17  Jeff Law  
+
+   PR target/47192
+   * gcc.target/m68k/pr47192.c: New test.
+
 2016-11-17  Toma Tabacu  
 
* gcc.target/mips/branch-cost-1.c (dg-options): Use (HAS_MOVN)
diff --git a/gcc/testsuite/gcc.target/m68k/pr47192.c 
b/gcc/testsuite/gcc.target/m68k/pr47192.c
new file mode 100644
index 000..5bbef58
--- /dev/null
+++ b/gcc/testsuite/gcc.target/m68k/pr47192.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mcpu=51qe -fdisable-ipa-pure-const 
-fdump-rtl-pro_and_epilogue" } */
+/* { dg-final { scan-rtl-dump-times "unspec_volatile" 1 "pro_and_epilogue"} } 
*/
+
+
+char F(short *ty);
+
+short V(char cmd)
+{
+  static short st256;
+  static short stc;
+  short sc;
+  short scd;
+  short d;
+
+  F(&sc);
+
+  if (cmd == 4)
+  {
+st256 = 0;
+d = 0;
+  }
+  else
+  {
+scd = sc - stc;
+if (scd < -128)
+{
+  scd += 256;
+}
+d = st256 >> 8;
+st256 -= d << 8;
+  }
+  stc = sc;
+  return d;
+}
+


RE: [PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent redundant testing of branch-cost-1.c.

2016-11-17 Thread Moore, Catherine
> gcc/testsuite/ChangeLog:
> 
> 2016-11-15  Toma Tabacu  
> 
>   * gcc.target/mips/branch-cost-1.c (dg-options): Use
> (HAS_MOVN) instead
>   of isa>=4, in order to downgrade to R5.
> 
> diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> index 61c3029..7f7ebbe 100644
> --- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> +++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c
> @@ -1,4 +1,4 @@
> -/* { dg-options "-mbranch-cost=1 isa>=4" } */
> +/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */
>  /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
>  NOMIPS16 int
>  foo (int x, int y, int z, int k)

I committed this patch for you.  Have you requested  write access to the 
repository?
Catherine


Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Jeff Law

On 11/17/2016 03:03 PM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote:

On 11/17/2016 11:24 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:

--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, 
gfc_formal_arglist *formal,
  for (f = formal; f; f = f->next)
n++;

-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
+  /* Take care not to call alloca with a zero argument.  */
+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);

  for (i = 0; i < n; i++)
new_arg[i] = NULL;


Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
and we don't rely on those pointers being distinct from other pointers.

On systems where alloca was implemented on top of malloc, alloca (0) would
cause collection of alloca'd objects that had gone out of scope.


Ouch.  Do we support any such systems as hosts?  If yes, can't we just
define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1)
on those systems and leave the sane hosts as is?
I would guess they're all dead as hosts for building GCC.  I was most 
familiar with hpux, but I'm pretty sure there were others as emacs 
(IIRC) had a replacement alloca for systems without it as a builtin. 
They probably all fall into the "retro-computing" bucket these days.


Essentially those systems worked by recording all the allocations as 
well as the frame depth at which they occurred.  The next time alloca 
was called, anything at a deeper depth than the current frame was released.


So even if we called alloca (0) unexpectedly, it's not going to cause 
anything to break.  Failing to call alloca (0) could run the system out 
of heap memory.  It's left as an exercise to the reader to ponder how 
that might happen -- it can and did happen building GCC "in the old days".


The point is warning on an alloca (0) may not be as clear cut as it 
might seem.  It's probably a reasonable thing to do on the host, but on 
a target, which might be embedded and explicitly overriding the builtin 
alloca, warning on alloca (0) is less of a slam dunk.



jeff


Re: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads

2016-11-17 Thread Andrew Burgess
* Claudiu Zissulescu  [2016-11-17 13:23:44 
+]:

> 
> > > Note on tests: It will be nice to add a test where the added
> > > peephole kicks in. If you consider to add this test to the current
> > > patch, please resubmit it.
> > 
> > There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
> > fail for me without the peephole, and work with the peephole.
> > 
> > The code generated for L/E ARC is slightly different than the code
> > generated for B/E ARC due to how the structures are laid out in
> > memory, so, for now I've made parts of the test B/E only.
> > 
> > In order to get code that is as efficient for L/E as B/E I'd end up
> > adding a whole new peeophole, I'd rather not do that - it would be
> > extra code to maintain for a combination CMEM+L/E that is not used.  I
> > figure we can come back to that if/when that combination ever becomes
> > interesting.  I'm hoping you'll be fine with that.
> > 
> 
> Sure, please go ahead and apply your patch after having the spaces converted 
> ;) 
> Claudiu

Thanks, committed as r242572.

Andrew


C++ PATCH for c++/78193 (inherited ctor regression on sparc32)

2016-11-17 Thread Jason Merrill
The issue here was that we were still introducing extra temporaries of
an empty class in a CALL_FROM_THUNK_P call to the inherited
constructor.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 312678eec4a6dac8fe05e2ecfde60a717543ae76
Author: Jason Merrill 
Date:   Thu Nov 17 17:18:26 2016 -0500

PR c++/78193 - inherited ctor regressions on sparc32.

* call.c (build_over_call): Don't set CALL_FROM_THUNK_P here.
(build_call_a): Set it here, and don't insert EMPTY_CLASS_EXPR.
(convert_like_real) [ck_rvalue]: Also pass non-addressable
types along directly.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index d25e2e7..97003e5 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -375,10 +375,18 @@ build_call_a (tree function, int n, tree *argarray)
 
   TREE_HAS_CONSTRUCTOR (function) = (decl && DECL_CONSTRUCTOR_P (decl));
 
+  if (current_function_decl && decl
+  && flag_new_inheriting_ctors
+  && DECL_INHERITED_CTOR (current_function_decl)
+  && (DECL_INHERITED_CTOR (current_function_decl)
+ == DECL_CLONED_FUNCTION (decl)))
+/* Pass arguments directly to the inherited constructor.  */
+CALL_FROM_THUNK_P (function) = true;
+
   /* Don't pass empty class objects by value.  This is useful
  for tags in STL, which are used to control overload resolution.
  We don't need to handle other cases of copying empty classes.  */
-  if (! decl || ! DECL_BUILT_IN (decl))
+  else if (! decl || ! DECL_BUILT_IN (decl))
 for (i = 0; i < n; i++)
   {
tree arg = CALL_EXPR_ARG (function, i);
@@ -6844,8 +6852,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, 
int argnum,
 constructor.  */
   if (current_function_decl
  && flag_new_inheriting_ctors
- && DECL_INHERITED_CTOR (current_function_decl)
- && TREE_ADDRESSABLE (totype))
+ && DECL_INHERITED_CTOR (current_function_decl))
return expr;
 
   /* Fall through.  */
@@ -8094,13 +8101,6 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
   /* build_new_op_1 will clear this when appropriate.  */
   CALL_EXPR_ORDERED_ARGS (c) = true;
 }
-  if (current_function_decl
-  && flag_new_inheriting_ctors
-  && DECL_INHERITED_CTOR (current_function_decl)
-  && cand->num_convs)
-/* Don't introduce copies when passing arguments along to the inherited
-   constructor.  */
-CALL_FROM_THUNK_P (call) = true;
   return call;
 }
 


Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions

2016-11-17 Thread H.J. Lu
On Thu, Nov 17, 2016 at 4:20 AM, Andrew Senkevich
 wrote:
> 16 Ноя 2016 г. 19:21 пользователь "Bernd Schmidt" 
> написал:
>
>
>>
>> On 11/15/2016 05:31 PM, Andrew Senkevich wrote:
>>>
>>> 2016-11-15 17:56 GMT+03:00 Jeff Law :

 On 11/15/2016 05:55 AM, Andrew Senkevich wrote:
>
>
> 2016-11-11 14:16 GMT+03:00 Uros Bizjak :
>>
>>
>> --- a/gcc/genmodes.c
>> +++ b/gcc/genmodes.c
>> --- a/gcc/init-regs.c
>> +++ b/gcc/init-regs.c
>> --- a/gcc/machmode.h
>> +++ b/gcc/machmode.h
>>
>> These are middle-end changes, you will need a separate review for
>> these.
>
>
>
> Who could review these changes?


 I can.  I likely dropped the message because it looked x86 specific, so
 if
 you could resend it'd be appreciated.
>>>
>>>
>>> Attached (diff with previous only in fixed comments typos).
>>
>>
>> Next time please split middle-end changes out from target-related stuff
>> and send them separately.
>>
>> These ones are OK.
>>
>>
>> Bernd
>
> Hi HJ, could you please commit it?

Done.

-- 
H.J.


Re: [PATCH] Follow-up patch on enabling new AVX512 instructions

2016-11-17 Thread H.J. Lu
On Thu, Nov 17, 2016 at 4:26 AM, Andrew Senkevich
 wrote:
> 16 Ноя 2016 г. 23:45 пользователь "Uros Bizjak"  написал:
>>
>> On Tue, Nov 15, 2016 at 10:50 PM, Andrew Senkevich
>>  wrote:
>> > Hi,
>> >
>> > this is follow-up with tests for new __target__ attributes and
>> > __builtin_cpu_supports update.
>> >
>> > gcc/
>> > * config/i386/i386.c (processor_features): Add
>> > F_AVX5124VNNIW, F_AVX5124FMAPS.
>> > (isa_names_table): Handle new features.
>> > libgcc/
>> > * config/i386/cpuinfo.c (processor_features): Add
>> > FEATURE_AVX5124VNNIW, FEATURE_AVX5124FMAPS.
>> > gcc/testsuite/
>> > * gcc.target/i386/builtin_target.c: Handle new "avx5124vnniw",
>> > "avx5124fmaps".
>> > * gcc.target/i386/funcspec-56.inc: Test new attributes.
>>
>> OK.
>>
>> Thanks,
>> Uros.
>
> HJ, could you please commit this patch also?

Done.

-- 
H.J.


Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote:
> On 11/17/2016 11:24 AM, Jakub Jelinek wrote:
> >On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:
> >>--- a/gcc/fortran/interface.c
> >>+++ b/gcc/fortran/interface.c
> >>@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, 
> >>gfc_formal_arglist *formal,
> >>   for (f = formal; f; f = f->next)
> >> n++;
> >>
> >>-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
> >>+  /* Take care not to call alloca with a zero argument.  */
> >>+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);
> >>
> >>   for (i = 0; i < n; i++)
> >> new_arg[i] = NULL;
> >
> >Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
> >and we don't rely on those pointers being distinct from other pointers.
> On systems where alloca was implemented on top of malloc, alloca (0) would
> cause collection of alloca'd objects that had gone out of scope.

Ouch.  Do we support any such systems as hosts?  If yes, can't we just
define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1)
on those systems and leave the sane hosts as is?

Jakub


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-17 Thread Moore, Catherine


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Matthew Fortune
> Sent: Thursday, November 17, 2016 8:47 AM
> To: Toma Tabacu ; Andrew Bennett
> ; Moore, Catherine
> ; 'gcc-patches@gcc.gnu.org'  patc...@gcc.gnu.org>
> Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires
> standard library support check the sysroot supports the required test
> options.
> 
> Toma Tabacu  writes:
> > Hi,
> >
> > The patch below is a rebased version of Andrew's patch plus a few
> more changes
> > to test options.
> >
> > I have tested Andrew's patch by passing -msoft-float to the testsuite
> without
> > having a soft-float library available, and saw that the inline-memcpy-
> {1,2,3,4,5}.c
> > and memcpy-1.c tests were also failing to find standard library
> headers.
> > In the version below, I have added (REQUIRES_STDLIB) to them as
> well.
> >
> > However, I believe that the memcpy-1.c test was removed from the
> first version
> > of Andrew's patch in response to Matthew's comments, but I don't
> think it
> > should be.
> >
> > Tested with mips-img-linux-gnu and mips-mti-linux-gnu.
> 
> This looks OK to me but I then again I helped with the design for this.
> 
> I'd like to give Catherine a chance to take a look though as the feature
> is unusual.
> 
> One comment below.
> 
> > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> > b/gcc/testsuite/gcc.target/mips/mips.exp
> > index e22d782..ccd4ecb 100644
> > --- a/gcc/testsuite/gcc.target/mips/mips.exp
> > +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> > @@ -1420,6 +1426,22 @@ proc mips-dg-options { args } {
> > }
> >  }
> >
> > +# If the test is marked as requiring standard libraries check
> > +# that the sysroot has support for the current set of test options.
> > +if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } {
> > +   mips_push_test_options saved_options $extra_tool_flags
> > +   set output [mips_preprocess "" {
> > + #include 
> > +   } 1]
> > +   mips_pop_test_options saved_options
> > +
> > +   # If the preprocessing of the stdlib.h file produced errors mark
> > +   # the test as unsupported.
> > +   if { ![string equal $output ""] } {
> > +   set do_what [lreplace $do_what 1 1 "N"]
> 
> We should describe what we expect the format of do_what to be at
> the time
> of writing in case it ever changes. i.e. a comment to say what the
> second
> character means and therefore make it clear that setting it to 'n' makes
> the test unsupported.
> 

This patch looks okay to me after updating the comment as Matthew suggested.



Re: [PATCH, rs6000] Add built-in support for vector compare listed in the ABI

2016-11-17 Thread Segher Boessenkool
Hi Carl,

On Thu, Nov 17, 2016 at 01:19:57PM -0800, Carl E. Love wrote:
> 2016-11-15  Carl Love 

Two spaces between your name and email address.

> * gcc.target/powerpc/builtins-3.c : New file to test the new

No space before colon.

> built-ins for vecotr compare equal and vector compare not equal.

Typo ("vecotr").  Changelog is indented with tabs btw, not spaces.

>{ ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW,
>  RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, 
> RS6000_BTI_unsigned_V4SI, 0 },
>{ ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
> +RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
> +
> +  { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
>  RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
>{ ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
>  RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, 
> RS6000_BTI_unsigned_V2DI, 0 },

Why the blank line?

Otherwise looks fine.  Thanks,


Segher


C++ PATCH for c++/78124 (list-init and inherited ctor)

2016-11-17 Thread Jason Merrill
An inherited ctor makes the class non-aggregate.  We weren't catching
this before because we add the inheriting ctors after we set
CLASSTYPE_NON_AGGREGATE, but it's easy enough to set it earlier, here.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit d462f9acb6cdb05dcc49bee32f0741633de33aed
Author: Jason Merrill 
Date:   Thu Nov 17 16:23:09 2016 -0500

PR c++/78124 - list-initialization and inherited ctor

* name-lookup.c (do_class_using_decl): Set CLASSTYPE_NON_AGGREGATE.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index cb1b9fa..e5f9113 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5707,7 +5707,6 @@ extern bool type_has_user_nondefault_constructor (tree);
 extern tree in_class_defaulted_default_constructor (tree);
 extern bool user_provided_p(tree);
 extern bool type_has_user_provided_constructor  (tree);
-extern bool type_has_user_provided_or_explicit_constructor  (tree);
 extern bool type_has_non_user_provided_default_constructor (tree);
 extern bool vbase_has_user_provided_move_assign (tree);
 extern tree default_init_uninitialized_part (tree);
diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 7ad65b8..4f80e8d 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -3382,6 +3382,7 @@ do_class_using_decl (tree scope, tree name)
 {
   maybe_warn_cpp0x (CPP0X_INHERITING_CTORS);
   name = ctor_identifier;
+  CLASSTYPE_NON_AGGREGATE (current_class_type) = true;
 }
   if (constructor_name_p (name, current_class_type))
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C 
b/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C
new file mode 100644
index 000..1bb05a2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C
@@ -0,0 +1,16 @@
+// PR c++/78124
+// { dg-do compile { target c++11 } }
+
+struct base {
+explicit constexpr base(int&&) {}
+};
+
+struct derived: base {
+using base::base;
+};
+
+int main()
+{
+derived d { 0 };
+}
+


C++ PATCH for c++/68377 (parens in fold-expression)

2016-11-17 Thread Jason Merrill
finish_parenthesized_expr uses TREE_NO_WARNING to mark parenthesized
expressions, so I think it's reasonable to use that to recognize them
here.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 043b2830c4e296951a8209ade25b8d01394c7386
Author: Jason Merrill 
Date:   Thu Nov 17 15:25:00 2016 -0500

PR c++/68377 - parenthesized expr in fold-expression

* parser.c (cp_parser_fold_expression): Check TREE_NO_WARNING.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 9a5039f..3ab0b68 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -4679,7 +4679,9 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
   /* The operands of a fold-expression are cast-expressions, so binary or
  conditional expressions are not allowed.  We check this here to avoid
  tentative parsing.  */
-  if (is_binary_op (TREE_CODE (expr1)))
+  if (EXPR_P (expr1) && TREE_NO_WARNING (expr1))
+/* OK, the expression was parenthesized.  */;
+  else if (is_binary_op (TREE_CODE (expr1)))
 error_at (location_of (expr1),
  "binary expression in operand of fold-expression");
   else if (TREE_CODE (expr1) == COND_EXPR)
diff --git a/gcc/testsuite/g++.dg/cpp1z/fold8.C 
b/gcc/testsuite/g++.dg/cpp1z/fold8.C
new file mode 100644
index 000..e27db7a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/fold8.C
@@ -0,0 +1,15 @@
+// PR c++/68377
+// { dg-options -std=c++1z }
+
+struct Sink { } s;
+template  Sink& operator<<(Sink&, const T&);
+
+template 
+int f(Tx... xs) {
+  return ((xs+1) + ...);
+}
+
+int main() {
+  s << f(3,4,5) << "\n";
+  return 0;
+}


C++ PATCH for c++/78369 (ICE with {} default argument)

2016-11-17 Thread Jason Merrill
The C++17 copy elision code needed to handle CONSTRUCTORs.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 8b82502696188d92fc560cc55e0a564c507e6144
Author: Jason Merrill 
Date:   Thu Nov 17 15:37:56 2016 -0500

PR c++/78369 - {} as default argument

* call.c (build_special_member_call): Handle CONSTRUCTOR.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f6f4590..d25e2e7 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -8317,7 +8317,8 @@ build_special_member_call (tree instance, tree name, 
vec **args,
   if (!reference_related_p (class_type, TREE_TYPE (arg)))
arg = perform_implicit_conversion_flags (class_type, arg,
 tf_warning, flags);
-  if (TREE_CODE (arg) == TARGET_EXPR
+  if ((TREE_CODE (arg) == TARGET_EXPR
+  || TREE_CODE (arg) == CONSTRUCTOR)
  && (same_type_ignoring_top_level_qualifiers_p
  (class_type, TREE_TYPE (arg
{
diff --git a/gcc/testsuite/g++.dg/overload/defarg11.C 
b/gcc/testsuite/g++.dg/overload/defarg11.C
new file mode 100644
index 000..26fac6e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/defarg11.C
@@ -0,0 +1,11 @@
+// PR c++/78369
+// { dg-do compile { target c++11 } }
+
+struct A { };
+inline void f(struct A a = {})  {}
+
+int main()
+{
+  f();
+  return 0;
+}


[v3 PATCH] LWG 2766, LWG 2749

2016-11-17 Thread Ville Voutilainen
The first patch does everything but the container adaptor parts (which
are wrong in the p/r) and the tuple part (which is icky). The second
part does the tuple parts, and needs some serious RFC. I know
it's ugly as sin, but it works. I don't think we should try to teach
our tuple to give the right answer for is_constructible etc., because
the last attempt at it broke boost and would break reasonable existing
code in addition to just boost - such things are not Stage 3 material,
imnsho. We, imho, have two options: either skip the tuple parts
of LWG 2766 or tolerate the ugliness of that second patch. I can
tolerate that ugliness, it's not quite so hideous as I feared.

2016-11-18  Ville Voutilainen  

Implement LWG 2766,
Swapping non-swappable types and LWG 2749,
swappable traits for variants.
* include/bits/stl_pair.h (swap(pair<_T1, _T2>&, pair<_T1, _T2>&)):
Add a deleted overload.
* include/bits/unique_ptr.h
(swap(unique_ptr<_Tp, _Dp>&, unique_ptr<_Tp, _Dp>&)): Likewise.
* include/std/array
(swap(array<_Tp, _Nm>&, array<_Tp, _Nm>&)): Likewise.
* include/std/optional
(swap(optional<_Tp>&, optional<_Tp>&)): Likewise.
* include/std/variant
(swap(variant<_Types...>&, variant<_Types...>&)): Likewise.
* testsuite/20_util/optional/swap/2.cc: Add tests for disabled
swaps.
* testsuite/20_util/pair/swap_cxx17.cc: New.
* testsuite/20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc:
Likewise.
* testsuite/20_util/variant/compile.cc: Add tests for disabled
swaps.
* testsuite/23_containers/array/specialized_algorithms/swap_cxx17.cc:
New.
* testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust.
* testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc:
Likewise.

2016-11-18  Ville Voutilainen  

Implement LWG 2766 for tuple.
* include/bits/move.h (swap(_Tp&, _Tp&)): Constrain
with __is_tuple_like.
* include/std/tuple (__is_tuple_like_impl, __is_tuple_like):
Move to type_traits.
(swap(tuple<_Elements...>&, tuple<_Elements...>&)): Add a deleted
overload.
* include/std/type_traits (__is_tuple_like_impl, __is_tuple_like):
New.
(swap(_Tp&, _Tp&)): Constrain with __is_tuple_like.
* include/std/utility (__is_tuple_like_impl): Move to type_traits.
* testsuite/20_util/tuple/swap_cxx17.cc: New.
diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index ef52538..44568dd 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -478,6 +478,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 swap(pair<_T1, _T2>& __x, pair<_T1, _T2>& __y)
 noexcept(noexcept(__x.swap(__y)))
 { __x.swap(__y); }
+
+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template
+inline
+typename enable_if<__not_<__and_<__is_swappable<_T1>,
+__is_swappable<_T2>>>::value>::type
+swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete;
+#endif
 #endif // __cplusplus >= 201103L
 
   /**
diff --git a/libstdc++-v3/include/bits/unique_ptr.h 
b/libstdc++-v3/include/bits/unique_ptr.h
index f9ec60f..c6bf4b6 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -649,6 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 swap(unique_ptr<_Tp, _Dp>& __x,
 unique_ptr<_Tp, _Dp>& __y) noexcept
 { __x.swap(__y); }
+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template
+inline
+typename enable_if<__not_<__is_swappable<_Dp>>::value>::type
+swap(unique_ptr<_Tp, _Dp>&,
+unique_ptr<_Tp, _Dp>&) = delete;
+#endif
 
   template
diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array
index 3ab0355..c43e281 100644
--- a/libstdc++-v3/include/std/array
+++ b/libstdc++-v3/include/std/array
@@ -287,6 +287,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 swap(array<_Tp, _Nm>& __one, array<_Tp, _Nm>& __two)
 noexcept(noexcept(__one.swap(__two)))
 { __one.swap(__two); }
+#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11
+  template
+inline
+typename enable_if<__not_<
+  typename _GLIBCXX_STD_C::__array_traits<_Tp, _Nm>::_Is_swappable>
+  ::value>::type
+swap(array<_Tp, _Nm>&, array<_Tp, _Nm>&) = delete;
+#endif
 
   template
 constexpr _Tp&
diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index ea673cc..191d64b 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -930,6 +930,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { __lhs.swap(__rhs); }
 
   template
+inline enable_if_t && is_swappable_v<_Tp>)>
+swap(optional<_Tp>&, optional<_Tp>&) = delete;
+
+  template
 constexpr optional>
 make_optional(_Tp&& __t)
 { return optional> { std::forward<_Tp>(__t) }; }
diff --git a/libstdc++-v3/include/std/varia

Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Jeff Law

On 11/17/2016 11:24 AM, Jakub Jelinek wrote:

On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:

--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, 
gfc_formal_arglist *formal,
   for (f = formal; f; f = f->next)
 n++;

-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
+  /* Take care not to call alloca with a zero argument.  */
+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);

   for (i = 0; i < n; i++)
 new_arg[i] = NULL;


Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
and we don't rely on those pointers being distinct from other pointers.
On systems where alloca was implemented on top of malloc, alloca (0) 
would cause collection of alloca'd objects that had gone out of scope.


There was a time when you'd find alloca (0); calls sprinkled through GCC 
on purpose.


Jeff




Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker

2016-11-17 Thread Jeff Law

On 11/08/2016 05:49 PM, Iain Sandoe wrote:



On 8 Nov 2016, at 13:39, Mike Stump  wrote:

On Nov 8, 2016, at 1:05 PM, Iain Sandoe 
wrote:


Simple for the simple case is already part of my patch, but
capability for the expert and non-simple case is also present,


I'm trying to ask a specific question, what does the patch allow
that can't be done otherwise?  Kinda a trivial question, and I
don't see any answer.

I'm not looking for, it allows it to work, or it makes the expert
case work.  I'm look for the specific question, and the specific
information you want, and why ld -v doesn't get it.


ld -v gets it when you can execute ld. It doesn’t get it when the
$host ld is not executable on $build.

Exactly!


Providing the option to give the version allows that without
requiring the complexity of other (possibly valid) solutions.  If you
know that you’re building (my patched) ld64-253.9 for powerpc-darwin9
(crossed from x86-64-darwin14) it’s easy, just put —with-ld64=253.9
..

I think we’ve debated this enough - I’m OK with keeping my extra
facility locally and will resubmit the patch with it removed in due
course, Iain
Your call. But ISTM the ability to specify the linker version or even 
better, its behaviour is a notable improvement for these crosses.


jeff


[PATCH, rs6000] Add built-in support for vector compare listed in the ABI

2016-11-17 Thread Carl E. Love
GCC maintainers:

The Power ABI document lists a number of built-ins that it is supposed
to support.  There are still some missing.  This patch adds the built-in
support for the following built-ins:

vector bool char  vec_cmpeq vector bool char   vector bool char
vector bool int   vec_cmpeq vector bool intvector bool int
vector bool long long vec_cmpeq vector bool long long  vector bool long long
vector bool short vec_cmpeq vector bool short  vector bool short
vector bool char  vec_cmpne vector bool char   vector bool char
vector bool int   vec_cmpne vector bool intvector bool int
vector bool long long vec_cmpne vector bool long long  vector bool long long
vector bool short vec_cmpne vector bool short  vector bool short

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions. 

Is this ok for trunk?

 Carl Love

gcc/ChangeLog:

2016-11-15  Carl Love 

* config/rs6000/rs6000-c.c : Add built-in support for vector compare
equal and vector compare not equal.  The vector compares take two
arguments of type vector bool char, vector bool short, vector bool int,
vector bool long long with the same return type.

gcc/testsuite/ChangeLog:

2016-11-15  Carl Love 

* gcc.target/powerpc/builtins-3.c : New file to test the new
built-ins for vecotr compare equal and vector compare not equal.
---
 gcc/config/rs6000/rs6000-c.c  | 19 +++-
 gcc/testsuite/gcc.target/powerpc/builtins-3.c | 68 +++
 2 files changed, 85 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 4bba293..6566279 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1107,15 +1107,24 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUB,
 RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 
RS6000_BTI_unsigned_V16QI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUB,
+RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH,
+RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH,
 RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH,
 RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 
0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW,
+RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW,
 RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW,
 RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 
0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
+RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
+
+  { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
 RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD,
 RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 
0 },
@@ -4486,9 +4495,11 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI,
 RS6000_BTI_V16QI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB,
+RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI,
+RS6000_BTI_bool_V16QI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB,
 RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI,
 RS6000_BTI_unsigned_V16QI, 0 },
-
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEH,
 RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI,
 RS6000_BTI_bool_V8HI, 0 },
@@ -4508,7 +4519,11 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW,
 RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI,
 RS6000_BTI_unsigned_V4SI, 0 },
-
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB,
+RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI,
+RS6000_BTI_bool_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNED,
+RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEF,
 RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 },
   { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNED,
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3.c
new file mode 100644
index 000..8d4b63d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3.c
@@ -0,0 +1,68 @@
+

Re: [PATCH] libgcc/config/bfin: use the generic linker version information (PR74748)

2016-11-17 Thread Jeff Law

On 11/06/2016 09:39 AM, Waldemar Brodkorb wrote:

Hi,

This commit makes the Blackfin platform use the generic linker version
information, rather than a completely duplicated file, specific for the
Blackfin architecture.

This is made possible using the newly introduced skip_underscore
variable of the mkmap-symver script.

This also allows to get a correct linker version file, with symbol names
matching the ones found in libgcc. Thanks to this, the necessary symbols
are marked "GLOBAL" instead of "LOCAL", which makes them visible at link
time, and solves a large number of "undefined reference"
issues. Indeed, the Blackfin specific linker version script had an extra
underscore in front of all symbols, which meant none of them matched the
symbols in libgcc, and therefore all libgcc symbols were marked as
"LOCAL", making them invisible for linking.

Signed-off-by: Thomas Petazzoni 
Tested-by: Waldemar Brodkorb 

 2016-11-06  Thomas Petazzoni 

PR gcc/74748
 * libgcc/config/bfin/libgcc-glibc.ver, libgcc/config/bfin/t-linux:
 use generic linker version information on Blackfin.


This is fine once the prereqs are approved.

jeff




Re: [PATCH] libgcc/mkmap-symver: support skip_underscore (PR74748)

2016-11-17 Thread Jeff Law

On 11/06/2016 09:32 AM, Waldemar Brodkorb wrote:

Hi,

Some platforms, such as Blackfin, have a special prefix for assembly
symbols as opposed to C symbols. For this reason, a function named
"foo()" in C will in fact be visible as a symbol called "_foo" in the
ELF binary.

The current linker version script logic in libgcc doesn't take into
account this situation properly. The Blackfin specific
libgcc/config/bfin/libgcc-glibc.ver has an additional "_" in front of
every symbol so that it matches the output of "nm" (which gets parsed to
produce the final linker version script). But due to this additional
"_", ld no longer matches with the symbols since "ld" does the matching
with the original symbol name, not the one prefixed with "_".

Due to this, none of the symbols in libgcc/config/bfin/libgcc-glibc.ver
are actually matched with symbols in libgcc. This causes all libgcc
symbols to be left as "LOCAL", which causes lots of "undefined
reference" whenever some C or C++ code that calls a function of libgcc
is compiled.

To address this, this commit introduces a "skip_underscore" variable to
the mkmap-symver script. It tells mkmap-symver to ignore the leading
underscore from the "nm" output.

Note that this new argument is different from the existing
"leading_underscore" argument, which *adds* an additional underscore to
the generated linker version script.

Having this functionality paves the way to using the generic linker
version information for Blackfin, instead of using a custom one.

Signed-off-by: Thomas Petazzoni 
Tested-by: Waldemar Brodkorb 

2016-11-06  Thomas Petazzoni 

PR gcc/74748
* libgcc/mkmap-symver.awk: add support for skip_underscore
AFAICT this skips the first character regardless of whether or not it is 
an underscore when skip_underscore is in effect, right.  Is that 
intentional?


Jeff


Re: [PATCH, wwwdocs] document changes appropriate for bug-fix releases

2016-11-17 Thread Jeff Law

On 11/05/2016 04:04 PM, Martin Sebor wrote:

The patch below documents the rule of thumb for what changes are
appropriate for bug-fix release branches that we discussed in
the gcc-patches thread Re: relax rule for flexible array members
in 6.x (78039 - fails to compile glibc tests)
  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01818.html

Please let me know if it accurately reflects the convention in
place of if you have suggestions for changes.

This is fine.  Please install if you haven't already

jeff



[PATCH v3] bb-reorder: Improve compgotos pass (PR71785)

2016-11-17 Thread Segher Boessenkool
For code like the testcase in PR71785 GCC factors all the indirect branches
to a single dispatcher that then everything jumps to.  This is because
having many indirect branches with each many jump targets does not scale
in large parts of the compiler.  Very late in the pass pipeline (right
before peephole2) the indirect branches are then unfactored again, by
the duplicate_computed_gotos pass.

This pass works by replacing branches to such a common dispatcher by a
copy of the dispatcher.  For code like this testcase this does not work
so well: most cases do a single addition instruction right before the
dispatcher, but not all, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.

This patch rewrites the algorithm to deal with this.  It also makes it
simpler: it does not need the "candidates" array anymore, it does not
need RTL layout mode, it does not need cleanup_cfg, and it does not
need to keep track of what blocks it already visited.

Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?


Segher


2016-11-17  Segher Boessenkool  

PR rtl-optimization/71785
* bb-reorder.c (maybe_duplicate_computed_goto): New function.
(duplicate_computed_gotos): New function.
(pass_duplicate_computed_gotos::execute): Rewrite.

---
 gcc/bb-reorder.c | 214 ---
 1 file changed, 93 insertions(+), 121 deletions(-)

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 85bc569..90de2de 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -2587,11 +2587,100 @@ make_pass_reorder_blocks (gcc::context *ctxt)
   return new pass_reorder_blocks (ctxt);
 }
 
+/* Duplicate a block (that we already know ends in a computed jump) into its
+   predecessors, where possible.  Return whether anything is changed.  */
+static bool
+maybe_duplicate_computed_goto (basic_block bb, int max_size)
+{
+  if (single_pred_p (bb))
+return false;
+
+  /* Make sure that the block is small enough.  */
+  rtx_insn *insn;
+  FOR_BB_INSNS (bb, insn)
+if (INSN_P (insn))
+  {
+   max_size -= get_attr_min_length (insn);
+   if (max_size < 0)
+  return false;
+  }
+
+  bool changed = false;
+  edge e;
+  edge_iterator ei;
+  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
+{
+  basic_block pred = e->src;
+
+  /* Do not duplicate BB into PRED if that is the last predecessor, or if
+we cannot merge a copy of BB with PRED.  */
+  if (single_pred_p (bb)
+ || !single_succ_p (pred)
+ || e->flags & EDGE_COMPLEX
+ || pred->index < NUM_FIXED_BLOCKS
+ || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred
+   {
+ ei_next (&ei);
+ continue;
+   }
+
+  if (dump_file)
+   fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
+bb->index, e->src->index);
+
+  /* Remember if PRED can be duplicated; if so, the copy of BB merged
+with PRED can be duplicated as well.  */
+  bool can_dup_more = can_duplicate_block_p (pred);
+
+  /* Make a copy of BB, merge it into PRED.  */
+  basic_block copy = duplicate_block (bb, e, NULL);
+  emit_barrier_after_bb (copy);
+  reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred));
+  merge_blocks (pred, copy);
+
+  changed = true;
+
+  /* Try to merge the resulting merged PRED into further predecessors.  */
+  if (can_dup_more)
+   maybe_duplicate_computed_goto (pred, max_size);
+}
+
+  return changed;
+}
+
 /* Duplicate the blocks containing computed gotos.  This basically unfactors
computed gotos that were factored early on in the compilation process to
-   speed up edge based data flow.  We used to not unfactoring them again,
-   which can seriously pessimize code with many computed jumps in the source
-   code, such as interpreters.  See e.g. PR15242.  */
+   speed up edge based data flow.  We used to not unfactor them again, which
+   can seriously pessimize code with many computed jumps in the source code,
+   such as interpreters.  See e.g. PR15242.  */
+static void
+duplicate_computed_gotos (function *fun)
+{
+  /* We are estimating the length of uncond jump insn only once
+ since the code for getting the insn length always returns
+ the minimal length now.  */
+  if (uncond_jump_length == 0)
+uncond_jump_length = get_uncond_jump_length ();
+
+  /* Never copy a block larger than this.  */
+  int max_size
+= uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS);
+
+  bool changed = false;
+
+  /* Try to duplicate all blocks that end in a computed jump and that
+ can be duplicated at all.  */
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+if (computed_jump_p (BB_END (bb))

Re: [PATCH v2] bb-reorder: Improve compgotos pass (PR71785)

2016-11-17 Thread Segher Boessenkool
On Wed, Nov 16, 2016 at 11:52:23AM -0700, Jeff Law wrote:
> >2016-10-30  Segher Boessenkool  
> >
> > PR rtl-optimization/71785
> > * bb-reorder.c (duplicate_computed_gotos_find_candidates): New
> > function, factored out from pass_duplicate_computed_gotos::execute.
> > (duplicate_computed_gotos_do_duplicate): Ditto.  Don't use 
> > BB_VISITED.
> > (pass_duplicate_computed_gotos::execute): Rewrite.  Rerun the pass as
> > long as it makes changes.
> OK.  I'm just going to note for the record here that while we iterate 
> until nothing changes, the statement and block clamps should in practice 
> ensure we hit a point where nothing changes.

It is hard limited as well, the loop is not infinite.

Anyway, I said I'd revisit this in stage 3; posting the v3 patch now.


Segher


Re: [PATCH, GCC/ARM 2/2, ping3] Allow combination of aprofile and rmprofile multilibs

2016-11-17 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 24/10/16 09:07, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile ARM
architectures. This specific patch is concerned with the latter. The patch
works
by moving the bits shared by both aprofile and rmprofile multilib build
(variable initilization as well as ISA and float ABI to build multilib for)
to a
new t-multilib file. Then, based on which profile was requested in
--with-multilib-list option, that files includes t-aprofile and/or t-rmprofile
where the architecture and FPU to build the multilib for are specified.

Unfortunately the duplication of CPU to A profile architectures could not be
avoided because substitution due to MULTILIB_MATCHES are not transitive.
Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have
the expected effect. Two patches were written to allow this using 2 different
approaches but I decided against it because this is not the right solution IMO.
See caveats below for what I believe is the correct approach.


*** combined build caveats ***

As the documentation in this patch warns, there is a few caveats to using a
combined multilib build due to the way the multilib framework works.

1) For instance, when using only rmprofile the combination of options -mthumb
-march=armv7 -mfpu=neon the thumb/-march=armv7 multilib but in a combined
multilib build the default multilib would be used. This is because in the
rmprofile build -mfpu=neon is not specified in MULTILIB_OPTION and thus the
option is ignored when considering MULTILIB_REQUIRED entries.

2) Another issue is the fact that aprofile and rmprofile multilib build have
some conflicting requirements in terms of how to map options for which no
multilib is built to another option. (i) A first example of this is the
difference of CPU to architecture mapping mentionned above: rmprofile multilib
build needs A profile CPUs and architectures to be mapped down to ARMv7 so that
one of the v7-ar multilib gets chosen in such a case but aprofile needs A
profile architectures to stand on their own because multilibs are built for
several architectures.

(ii) Another example of this is that in aprofile multilib build no multilib is
built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16.
Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16.
However,
rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16.
This
has the consequence that when building for -mthumb -march=armv7e-m
-mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is
rewritten into -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard and
there
is no multilib for that.

Both of these issues could be handled by using MULTILIB_REUSE instead of
MULTILIB_MATCHES but this would require a large set of rules. I believe instead
the right approach is to create a new mechanism to inform GCC on how options
can
be down mapped _when no multilib can be found_ which would require a smaller
set
of rules and would make it explicit that the options are not equivalent. A
patch
will be posted to this effect at a later time.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow combinations of aprofile and rmprofile values for
--with-multilib-list.
* config/arm/t-multilib: New file.
* config/arm/t-aprofile: Remove initialization of MULTILIB_*
variables.  Remove setting of ISA and floating-point ABI in
MULTILIB_OPTIONS and MULTILIB_DIRNAMES.  Set architecture and FPU in
MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS
and MULTILIB_DIRNAMES respectively.  Add comment to introduce all
matches.  Add architecture matches for marvel-pj4 and generic-armv7-a
CPU options.
* config/arm/t-rmprofile: Likewise except for the matches changes.
* doc/install.texi (--with-multilib-list): Document the combination of
aprofile and rmprofile values and warn about pitfalls in doing that.


Testing:

* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile
and rmprofile,aprofile
* default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and
rmprofile,aprofile

* Difference in --print-multi-directory between aprofile or rmprofile and
aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU,
FPU and float ABI is as expected (best multilib for the combination is chosen),
modulo the caveat 

Re: [PATCH, GCC/ARM 1/2, ping3] Add multilib support for embedded bare-metal targets

2016-11-17 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 08/11/16 13:36, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 02/11/16 10:05, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 27/10/16 15:26, Thomas Preudhomme wrote:

Hi Kyrill,

On 27/10/16 10:45, Kyrill Tkachov wrote:

Hi Thomas,

On 24/10/16 09:06, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 13/10/16 16:35, Thomas Preudhomme wrote:

Hi ARM maintainers,

This patchset aims at adding multilib support for R and M profile ARM
architectures and allowing it to be built alongside multilib for A profile
ARM
architectures. This specific patch adds the t-rmprofile multilib Makefile
fragment for the former objective. Multilib are built for all M profile
architecture involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7
multilib is used for R profile architectures but also A profile
architectures.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme 

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile
value
for ARM.


Testing:

== aprofile ==
* "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the
patchset for both aprofile and rmprofile
* default spec (gcc -dumpspecs) is the same before and after the patchset for
aprofile
* No difference in --print-multi-directory between before and after the
patchset
for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU
and
float ABI

== rmprofile ==
* aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float
ABI) and directory naming
* Difference in --print-multi-directory between before [1] and after the
patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture,
CPU, FPU and float ABI modulo the name and directory structure changes

[1] as per patch applied in ARM embedded branches
https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup






== aprofile + rmprofile ==
* aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is
not supported


Is this ok for master branch?

Best regards,

Thomas


+# Arch Matches
+MULTILIB_MATCHES   += march?armv6s-m=march?armv6-m
+MULTILIB_MATCHES   += march?armv8-m.main=march?armv8-m.main+dsp
+MULTILIB_MATCHES   += march?armv7=march?armv7-r
+ifeq (,$(HAS_APROFILE))
+MULTILIB_MATCHES   += march?armv7=march?armv7-a
+MULTILIB_MATCHES   += march?armv7=march?armv7ve
+MULTILIB_MATCHES   += march?armv7=march?armv8-a
+MULTILIB_MATCHES   += march?armv7=march?armv8-a+crc
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a
+MULTILIB_MATCHES   += march?armv7=march?armv8.1-a+crc
+endif

I think you want to update the patch to handle -march=armv8.2-a and
armv8.2-a+fp16
Thanks,
Kyrill


Indeed. Please find updated ChangeLog and patch (attached):

*** gcc/ChangeLog ***

2016-10-03  Thomas Preud'homme  

* config.gcc: Allow new rmprofile value for configure option
--with-multilib-list.
* config/arm/t-rmprofile: New file.
* doc/install.texi (--with-multilib-list): Document new rmprofile value
for ARM.

Ok for trunk?

Best regards,

Thomas
diff --git a/gcc/config.gcc b/gcc/config.gcc
index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3739,6 +3739,16 @@ case "${target}" in
 # pragmatic.
 tmake_profile_file="arm/t-aprofile"
 ;;
+			rmprofile)
+# Note that arm/t-rmprofile is a
+# stand-alone make file fragment to be
+# used only with itself.  We do not
+# specifically use the
+# TM_MULTILIB_OPTION framework because
+# this shorthand is more
+# pragmatic.
+tmake_profile_file="arm/t-rmprofile"
+;;
 			default)
 ;;
 			*)
@@ -3748,9 +3758,10 @@ case "${target}" in
 			esac
 
 			if test "x${tmake_profile_file}" != x ; then
-# arm/t-aprofile is only designed to work
-# without any with-cpu, with-arch, with-mode,
-# with-fpu or with-float options.
+# arm/t-aprofile and arm/t-rmprofile are only
+# designed to work without any with-cpu,
+# with-arch, with-mode, with-fpu or with-float
+# options.
 if test "x$with_arch" != x \
 || test "x$with_cpu" != x \
 || test "x$with_float" != x \
diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile
new file mode 100644
index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c
--- /dev/null
+++ b/gcc/config/arm/t-rmprofile
@@ -0,0 +1,174 @@
+# Copyright (C) 2016 Free Software Foundation, Inc.
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either 

Re: [PATCH, ARM/testsuite 6/7, ping3] Force soft float in ARMv6-M and ARMv8-M Baseline options

2016-11-17 Thread Thomas Preudhomme

Ping?

Best regards,

Thomas

On 08/11/16 13:35, Thomas Preudhomme wrote:

Ping,

Best regards,

Thomas

On 02/11/16 10:04, Thomas Preudhomme wrote:

Ping?

Best regards,

Thomas

On 28/10/16 10:49, Thomas Preudhomme wrote:

On 22/09/16 16:47, Richard Earnshaw (lists) wrote:

On 22/09/16 15:51, Thomas Preudhomme wrote:

Sorry, noticed an error in the patch. It was not caught during testing
because GCC was built with --with-mode=thumb. Correct patch attached.

Best regards,

Thomas

On 22/09/16 14:49, Thomas Preudhomme wrote:

Hi,

ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the
arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like
-mthumb is
passed for architectures that only support Thumb instruction set. This
patch
adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add
options.
Patch is in attachment.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2016-07-15  Thomas Preud'homme  

* lib/target-supports.exp (add_options_for_arm_arch_v6m): Add
-mfloat-abi=soft option.
(add_options_for_arm_arch_v8m_base): Likewise.


Is this ok for trunk?

Best regards,

Thomas


6_softfloat_testing_v6m_v8m_baseline.patch


diff --git a/gcc/testsuite/lib/target-supports.exp
b/gcc/testsuite/lib/target-supports.exp
index
0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398


100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } {
 # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */
 #/* { dg-add-options arm_arch_v5 } */
 # /* { dg-require-effective-target arm_arch_v5_multilib } */
-foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__
- v4t "-march=armv4t" __ARM_ARCH_4T__
- v5 "-march=armv5 -marm" __ARM_ARCH_5__
- v5t "-march=armv5t" __ARM_ARCH_5T__
- v5te "-march=armv5te" __ARM_ARCH_5TE__
- v6 "-march=armv6" __ARM_ARCH_6__
- v6k "-march=armv6k" __ARM_ARCH_6K__
- v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
- v6z "-march=armv6z" __ARM_ARCH_6Z__
- v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__
- v7a "-march=armv7-a" __ARM_ARCH_7A__
- v7r "-march=armv7-r" __ARM_ARCH_7R__
- v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
- v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
- v8a "-march=armv8-a" __ARM_ARCH_8A__
- v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
- v8m_base "-march=armv8-m.base -mthumb"
__ARM_ARCH_8M_BASE__
- v8m_main "-march=armv8-m.main -mthumb"
__ARM_ARCH_8M_MAIN__ } {
+foreach { armfunc armflag armdef } {
+v4 "-march=armv4 -marm" __ARM_ARCH_4__
+v4t "-march=armv4t" __ARM_ARCH_4T__
+v5 "-march=armv5 -marm" __ARM_ARCH_5__
+v5t "-march=armv5t" __ARM_ARCH_5T__
+v5te "-march=armv5te" __ARM_ARCH_5TE__
+v6 "-march=armv6" __ARM_ARCH_6__
+v6k "-march=armv6k" __ARM_ARCH_6K__
+v6t2 "-march=armv6t2" __ARM_ARCH_6T2__
+v6z "-march=armv6z" __ARM_ARCH_6Z__
+v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__
+v7a "-march=armv7-a" __ARM_ARCH_7A__
+v7r "-march=armv7-r" __ARM_ARCH_7R__
+v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__
+v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__
+v8a "-march=armv8-a" __ARM_ARCH_8A__
+v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
+v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft"
__ARM_ARCH_8M_BASE__
+v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } {
 eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] {
 proc check_effective_target_arm_arch_FUNC_ok { } {
 if { [ string match "*-marm*" "FLAG" ] &&



I think if you're going to do this you need to also check that changing
the ABI in this way isn't incompatible with other aspects of how the
user has invoked dejagnu.


The reason this patch was made is that without it dg-require-effective-target
arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain
but then any testcase containing a function for such a target (such as the
atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not
support hard float ABI.

I see 2 ways to fix this:

1) the approach taken in this patch, ie saying that to select ARMv8-M baseline
architecture you need the right -march, -mthumb but also the right float ABI.

Note that the comment at the top of that procedure says:
# Creates a series of routines that return 1 if the given architecture
# can be selected and a routine to give the flags to select that architecture

2) Add a function to the assembly that is used to test support for the
architecture.

The reason I favor t

Re: [PATCH, GCC/ARM] Make arm_feature_set agree with type of FL_* macros

2016-11-17 Thread Thomas Preudhomme



On 17/11/16 09:10, Kyrill Tkachov wrote:


On 16/11/16 18:09, Thomas Preudhomme wrote:

Hi,

I've rebased the patch to make arm_feature_set agree with type of FL_* macros
on top of trunk rather than on top of the optional -mthumb patch. That
involved doing the changes to gcc/config/arm/arm-protos.h rather than
gcc/config/arm/arm-flags.h. I also took advantage of the fact that each line
is changed to change the indentation to tabs and add dots in comments missing
one.

For reference, please find below the original patch description:

Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array
of 2 unsigned long. However, the flags stored in these two entries are
(signed) int, being combinations of bits set via expression of the form 1 <<
bitno. This creates 3 issues:

1) undefined behavior when setting the msb (1 << 31)
2) undefined behavior when storing a flag with msb set (negative int) into one
of the unsigned array entries (positive int)
3) waste of space since the top 32 bits of each entry is not used

This patch changes the definition of FL_* macro to be unsigned int by using
the form 1U << bitno instead and changes the definition of arm_feature_set to
be an array of 2 unsigned (int) entries.

*** gcc/ChangeLog ***

2016-10-15  Thomas Preud'homme  

* config/arm/arm-protos.h (FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M,
FL_MODE26, FL_MODE32, FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED,
FL_STRONG, FL_ARCH5E, FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF,
FL_ARCH6K, FL_THUMB2, FL_NOTM, FL_THUMB_DIV, FL_VFPV3, FL_NEON,
FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV, FL_ARCH8, FL_CRC32, FL_SMALLMUL,
FL_NO_VOLATILE_CE, FL_IWMMXT, FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1,
FL2_ARCH8_2, FL2_FP16INST): Reindent comment, add final dot when
missing and make value unsigned.
(arm_feature_set): Use unsigned entries instead of unsigned long.


Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state.

Is this ok for trunk?

Best regards,

Thomas

On 14/11/16 18:56, Thomas Preudhomme wrote:

My apologize, I realized when trying to apply the patch that I wrote it on top
of the optional -mthumb patch instead of the reverse. I'll rebase it to not
screw up bisect.

Best regards,

Thomas

On 14/11/16 14:47, Kyrill Tkachov wrote:


On 14/11/16 14:07, Thomas Preudhomme wrote:

Hi,

Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array
of 2 unsigned long. However, the flags stored in these two entries are
(signed) int, being combinations of bits set via expression of the form 1 <<
bitno. This creates 3 issues:

1) undefined behavior when setting the msb (1 << 31)
2) undefined behavior when storing a flag with msb set (negative int) into one
of the unsigned array entries (positive int)
3) waste of space since the top 32 bits of each entry is not used

This patch changes the definition of FL_* macro to be unsigned int by using
the form 1U << bitno instead and changes the definition of arm_feature_set to
be an array of 2 unsigned (int) entries.

Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state.

Is this ok for trunk?



Ok.
Thanks,
Kyrill


Best regards,

Thomas




+#define FL_ARCH7  (1U << 22)/* Architecture 7.  */
+#define FL_ARM_DIV(1U << 23)/* Hardware divide (ARM mode).  */
+#define FL_ARCH8  (1U << 24)/* Architecture 8.  */
+#define FL_CRC32  (1U << 25)/* ARMv8 CRC32 instructions.  */
+#define FL_SMALLMUL   (1U << 26)/* Small multiply supported.  */
+#define FL_NO_VOLATILE_CE  (1U << 27)/* No volatile memory in IT block.  */
+
+#define FL_IWMMXT (1U << 29)/* XScale v2 or "Intel Wireless MMX
+   technology".  */
+#define FL_IWMMXT2(1U << 30)/* "Intel Wireless MMX2
+technology".  */
+#define FL_ARCH6KZ(1U << 31)/* ARMv6KZ architecture.  */
+
+#define FL2_ARCH8_1   (1U << 0)/* Architecture 8.1.  */
+#define FL2_ARCH8_2   (1U << 1)/* Architecture 8.2.  */
+#define FL2_FP16INST  (1U << 2)/* FP16 Instructions for ARMv8.2 and
+   later.  */

Since you're here can you please replace the "Architecture " by
"ARMv(7,8,8.1,8.2)-A"
in these entries.


That seems to make it less accurate though. For a start, most of them would also 
apply to the R profile. There is also a couple of them that would apply to the M 
profile: for instance FL_ARCH7 would be set for ARMv7-M and FL_ARCH8 would be 
set for ARMv8-M.


Best regards,

Thomas


Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")

2016-11-17 Thread Jason Merrill
OK.

On Thu, Nov 17, 2016 at 3:25 PM, Paolo Carlini  wrote:
> Hi,
>
> On 17/11/2016 18:14, Jason Merrill wrote:
>>
>> How about changing cp_parser_non_integral_constant_expression to issue
>> a pedwarn instead of error for NIC_FLOAT?
>
> Indeed. And a relevant observation is that we don't pass NIC_FLOAT from
> anywhere else in the parser, please correct me if I'm wrong. The below is
> finishing testing, all good so far.
>
> Thanks,
> Paolo.
>
> 


Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")

2016-11-17 Thread Paolo Carlini

Hi,

On 17/11/2016 18:14, Jason Merrill wrote:

How about changing cp_parser_non_integral_constant_expression to issue
a pedwarn instead of error for NIC_FLOAT?
Indeed. And a relevant observation is that we don't pass NIC_FLOAT from 
anywhere else in the parser, please correct me if I'm wrong. The below 
is finishing testing, all good so far.


Thanks,
Paolo.


Index: cp/parser.c
===
--- cp/parser.c (revision 242540)
+++ cp/parser.c (working copy)
@@ -3028,8 +3028,9 @@ cp_parser_non_integral_constant_expression (cp_par
  switch (thing)
{
  case NIC_FLOAT:
-   error ("floating-point literal "
-  "cannot appear in a constant-expression");
+   pedwarn (input_location, OPT_Wpedantic,
+"ISO C++ forbids using a floating-point literal "
+"in a constant-expression");
return true;
  case NIC_CAST:
error ("a cast to a type other than an integral or "
Index: testsuite/g++.dg/parse/pr55080.C
===
--- testsuite/g++.dg/parse/pr55080.C(revision 0)
+++ testsuite/g++.dg/parse/pr55080.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/55080
+// { dg-options "-std=c++98 -pedantic" }
+
+class B {
+ static const int c = 3.1415926; // { dg-warning "constant-expression" }
+};


Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr

2016-11-17 Thread Thomas Preudhomme

Hi Kyrill,

I've committed the following updated patch where the test is restricted to Thumb 
execution mode and skipping it if not possible since -mtpcs-leaf-frame is only 
available in Thumb mode. I've considered the change obvious.


*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
being live in the function and lr needing to be saved.  Distinguish
between already saved pushable registers and registers to push.
Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933-1.c: New test.
* gcc.target/arm/pr77933-2.c: Likewise.

Best regards,

Thomas

On 17/11/16 10:04, Kyrill Tkachov wrote:


On 09/11/16 16:41, Thomas Preudhomme wrote:

I've reworked the patch following comments from Wilco [1] (sorry could not
find it in my MUA for some reason).

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html


== Context ==

When saving registers, function thumb1_expand_prologue () aims at minimizing
the number of push instructions. One of the optimization it does is to push LR
alongside high register(s) (after having moved them to low register(s)) when
there is no low register to save. The way this is implemented is to add LR to
the pushable_regs mask if it is live just before pushing the registers in that
mask. The mask of live pushable registers which is used to detect whether LR
needs to be saved is then clear to ensure LR is only saved once.


== Problem ==

However beyond deciding what register to push pushable_regs is used to track
what pushable register can be used to move a high register before being
pushed, hence the name. That mask is cleared when all high registers have been
assigned a low register but the clearing assumes the high registers were
assigned to the registers with the biggest number in that mask. This is not
the case because LR is not considered when looking for a register in that
mask. Furthermore, LR might have been saved in the TARGET_BACKTRACE path above
yet the mask of live pushable registers is not cleared in that case.


== Solution ==

This patch changes the loop to iterate over register LR to r0 so as to both
fix the stack corruption reported in PR77933 and reuse lr to push some high
register when possible. This patch also introduce a new variable
lr_needs_saving to record whether LR (still) needs to be saved at a given
point in code and sets the variable accordingly throughout the code, thus
fixing the second issue. Finally, this patch create a new push_mask variable
to distinguish between the mask of registers to push and the mask of live
pushable registers.


== Note ==

Other bits could have been improved but have been left out to allow the patch
to be backported to stable branch:

(1) using argument registers that are not holding an argument
(2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask
(low register push) and push_mask
(3) the !l_mask case improved in TARGET_BACKTRACE since offset == 0
(4) rename l_mask to a more appropriate name (live_pushable_regs_mask?)

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr
being live in the function and lr needing to be saved. Distinguish
between already saved pushable registers and registers to push.
Check for LR being an available pushable register.


*** gcc/testsuite/ChangeLog ***

2016-11-08  Thomas Preud'homme  

PR target/77933
* gcc.target/arm/pr77933-1.c: New test.
* gcc.target/arm/pr77933-2.c: Likewise.


Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0

Is this ok for trunk?



Ok.
Thanks,
Kyrill


Best regards,

Thomas

On 02/11/16 17:08, Thomas Preudhomme wrote:

Hi,

When saving registers, function thumb1_expand_prologue () aims at minimizing the
number of push instructions. One of the optimization it does is to push lr
alongside high register(s) (after having moved them to low register(s)) when
there is no low register to save. The way this is implemented is to add lr to
the list of registers that can be pushed just before the push happens. This
would then push lr and allows it to be used for further push if there was not
enough registers to push all high registers to be pushed.

However, the logic that decides what register to move high registers to before
being pushed only looks at low registers (see for loop initialization). This
means not only that lr is not used for pushing high registers but also that lr
is not removed from the list of registers to be pushed when it's not used. This
extra lr push is not poped in epilogue leading in stack corruption.

This patch changes the loop to iterate over register r0 to lr so as

Re: [PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode

2016-11-17 Thread Thomas Preudhomme

Hi Christophe,

On 17/11/16 13:36, Christophe Lyon wrote:

On 16 November 2016 at 10:39, Kyrill Tkachov
 wrote:


On 09/11/16 16:19, Thomas Preudhomme wrote:


Hi,

This patch fixes the following ICE when building when compiling an empty
FIQ interrupt handler in ARM mode:

empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints:
 }
 ^

(insn/f 13 12 14 (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3}
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4])))
(nil)))

The ICE was provoked by missing an alternative to reflect that ARM mode
can do an add of general register into sp which is unpredictable in Thumb
mode add immediate.

ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* config/arm/arm.md (arm_addsi3): Add alternative for addition of
general register with general register or ARM constant into SP
register.


*** gcc/testsuite/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* gcc.target/arm/empty_fiq_handler.c: New test.


Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression.

Is this ok for trunk?



I see that "r" does not include the stack pointer (STACK_REG is separate
from GENERAL_REGs) so we are indeed missing
that constraint.

Ok for trunk.
Thanks,
Kyrill


Best regards,

Thomas





Hi Thomas,

The new test fails when compiled with -mthumb -march=armv5t:
gcc.target/arm/empty_fiq_handler.c: In function 'fiq_handler':
gcc.target/arm/empty_fiq_handler.c:11:1: error: interrupt Service
Routines cannot be coded in Thumb mode


Right, interrupt handler can only be compiled in the mode where the CPU boots. 
So for non Thumb-only targets it should be compiled with -marm. I'll push a 
patch tomorrow.


Best regards,

Thomas


Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts

2016-11-17 Thread Jason Merrill

On 11/17/2016 04:34 AM, Chung-Lin Tang wrote:

+ /* These clauses really need a positive integral constant expression,
+but we don't have a predicate for that (yet).  */


What do you mean?  Don't you check this in finish_omp_clauses after the 
tsubst?


Jason



Re: [PATCH] PR target/78101, Fix PowerPC power9-fusion support

2016-11-17 Thread Segher Boessenkool
Hi!

On Thu, Nov 17, 2016 at 01:55:25PM -0500, Michael Meissner wrote:
> Are these patches ok to change into the trunk?  Since the bug shows up in GCC
> 6.x, can I apply and submit these patches to the GCC 6.x branch?

This is okay for trunk.  Please watch that for regressions for a week
or so before backporting (unless it is urgent for 6?)

One question...

> +case DFmode:
> +  if ((!TARGET_POWERPC64 && !TARGET_DF_FPR) || !TARGET_P9_FUSION)
> + return 0;
> +  break;

>  case DFmode:
> -  if (!TARGET_DF_FPR)
> +  if (!TARGET_POWERPC64 && !TARGET_DF_FPR)
>   return 0;
>break;

These conditions aren't obvious, could you add a comment please?


Segher


[committed] strncmp builtin expansion improvement

2016-11-17 Thread Aaron Sawdey
Committed to trunk as 242556 after removing the use->clobber change
from cmpstrnsi and bootstrap/regtest.


gcc/ChangeLog
2016-11-17  Aaron Sawdey  

* config/i386/i386.md (cmpstrnsi): New test to bail out if neither
string input is a string constant.

2016-11-17  Aaron Sawdey  

* builtins.c (expand_builtin_strncmp): Attempt expansion of strncmp
via cmpstrnsi even if neither string is constant.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[PATCH] PR target/78101, Fix PowerPC power9-fusion support

2016-11-17 Thread Michael Meissner
This patch fixes the problem reported in PR 78101 where the power9-fusion
support generates an insn that isn't matched:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101

It also fixes the bug that Andrew Stubbs reported.
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01367.html

There were two bugs in the code:

1)  The fusion peephole for SFmode and DFmode was inconsistant about
whether those values were allowed in GPRs if software floating point is
used.

2)  The power9 fusion store insn had an early clobber in the match_scratch,
which prevented having an address that uses a register, does ADDIS to
add to the upper 16 bits, and then folds the lower ADDI into the store
operation would not work if the address used the scratch register.

In addition to the two bugs, the fusion code had been written much earlier than
the support for the new ISA 3.0 scalar d-form (register+offset) instructions,
and the fusion code did not match these types of stores.  I have fixed this, so
that those memory references can also be fused.

I have bootstraped the compiler with these changes and there were no
regressions on the following systems:
1)  Little endian power8
2)  Big endian power8 (no support for 32-bit libraries)
3)  Big endian power7 (support for 32-bit libraries)

I have also built and ran spec 2006 CPU tests with this option enabled, and
they run fine, with some minor performance changes on power8 using power9
fusion.

I have built the cam4_r and cam4_s benchmarks of the next generation Spec (kit
102) and they now compile fine with -mpower9-fusion (they were the source of
the bug 78101).

Are these patches ok to change into the trunk?  Since the bug shows up in GCC
6.x, can I apply and submit these patches to the GCC 6.x branch?

[gcc]
2016-11-17  Michael Meissner  

PR target/78101
* config/rs6000/predicates.md (fusion_addis_mem_combo_load): Add
the appropriate checks for SFmode/DFmode load/stores in GPR
registers.
(fusion_addis_mem_combo_store): Likewise.
* config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Rename
fusion_fpr_* to fusion_vsx_* and add in support for ISA 3.0 scalar
d-form instructions for traditional Altivec registers.
(emit_fusion_p9_load): Likewise.
(emit_fusion_p9_store): Likewise.
* config/rs6000/rs6000.md (p9 fusion store peephole2): Remove
early clobber from scratch register.  Do not match if the register
being stored is the scratch register.
(fusion_vsx___load): Rename fusion_fpr_*
to fusion_vsx_* and add in support for ISA 3.0 scalar d-form
instructions for traditional Altivec registers.
(fusion_fpr___load): Likewise.
(fusion_vsx___store): Likewise.
(fusion_fpr___store): Likewise.

[gcc/testsuite]
2016-11-17  Michael Meissner  

PR target/78101
* gcc.target/powerpc/fusion4.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md 
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revision 242456)
+++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy)
@@ -1844,7 +1844,7 @@ (define_predicate "fusion_gpr_mem_load"
 ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the
 ;; memory field with both the addis and the memory offset.  Sign extension
 ;; is not handled here, since lha and lwa are not fused.
-;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend
+;; With P9 fusion, also match a fpr/vector load and float_extend
 (define_predicate "fusion_addis_mem_combo_load"
   (match_code "mem,zero_extend,float_extend")
 {
@@ -1873,11 +1873,15 @@ (define_predicate "fusion_addis_mem_comb
   break;
 
 case SFmode:
-case DFmode:
   if (!TARGET_P9_FUSION)
return 0;
   break;
 
+case DFmode:
+  if ((!TARGET_POWERPC64 && !TARGET_DF_FPR) || !TARGET_P9_FUSION)
+   return 0;
+  break;
+
 default:
   return 0;
 }
@@ -1920,6 +1924,7 @@ (define_predicate "fusion_addis_mem_comb
 case QImode:
 case HImode:
 case SImode:
+case SFmode:
   break;
 
 case DImode:
@@ -1927,13 +1932,8 @@ (define_predicate "fusion_addis_mem_comb
return 0;
   break;
 
-case SFmode:
-  if (!TARGET_SF_FPR)
-   return 0;
-  break;
-
 case DFmode:
-  if (!TARGET_DF_FPR)
+  if (!TARGET_POWERPC64 && !TARGET_DF_FPR)
return 0;
   break;
 
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  
(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)
(revis

Re: [PATCH] avoid calling alloca(0)

2016-11-17 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote:
> --- a/gcc/fortran/interface.c
> +++ b/gcc/fortran/interface.c
> @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, 
> gfc_formal_arglist *formal,
>for (f = formal; f; f = f->next)
>  n++;
>  
> -  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
> +  /* Take care not to call alloca with a zero argument.  */
> +  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);
>  
>for (i = 0; i < n; i++)
>  new_arg[i] = NULL;

Ugh, that is just too ugly.  I don't see anything wrong on alloca (0),
and we don't rely on those pointers being distinct from other pointers.

Jakub


[PATCH] avoid calling alloca(0)

2016-11-17 Thread Martin Sebor

Testing my patch to warn on overflow in calls to allocation
functions (bugs 77531 and 78284):

  https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01672.html

exposed a number of instances of alloca(0) calls in GCC sources.
Besides this patch which enables the new warnings with -Wextra
(the patch is still in the review queue) the same code also
triggers a -Walloca-larger-than warning (which has to be
explicitly enabled).  It doesn't look to me like these calls
are bugs per se but in the interest of eating our own dog food
so to speak and being able to compile GCC with -Walloca-larger-than
(and not to have to disable the yet-to-be committed -Walloc-zero
warning) the attached patch makes sure that the alloca calls that
are subject to the warnings always request at least 1 byte.

Thanks
Martin

PS Alloca(0) is not necessarily undefined but the pointer the call
returns need not be distinct from other pointers returned from
other such calls in the same function (and there are compilers
that do not make it distinct).  Relying on the pointer being
distinct could be a source of subtle bugs, hence the warning.
gcc/fortran/ChangeLog:

	* interface.c (compare_actual_formal): Avoid calling alloca with
	an argument of zero.
	* symbol.c (do_traverse_symtree): Same.
	* trans-intrinsic.c (gfc_conv_intrinsic_ishftc): Same.

gcc/ChangeLog:

	* reg-stack.c (subst_asm_stack_regs): Same.
	* tree-ssa-threadedge.c
	(record_temporary_equivalences_from_stmts_at_dest): Same.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index a0cb0bb..2da51d7 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
   for (f = formal; f; f = f->next)
 n++;
 
-  new_arg = XALLOCAVEC (gfc_actual_arglist *, n);
+  /* Take care not to call alloca with a zero argument.  */
+  new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n);
 
   for (i = 0; i < n; i++)
 new_arg[i] = NULL;
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 0b711ca..cf403c4 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -3979,7 +3979,8 @@ do_traverse_symtree (gfc_symtree *st, void (*st_func) (gfc_symtree *),
 
   gcc_assert ((st_func && !sym_func) || (!st_func && sym_func));
   nodes = count_st_nodes (st);
-  st_vec = XALLOCAVEC (gfc_symtree *, nodes);
+  /* Take care not to call alloca with a zero argument.  */
+  st_vec = XALLOCAVEC (gfc_symtree *, nodes + !nodes);
   node_cntr = 0; 
   fill_st_vector (st, st_vec, node_cntr);
 
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index 463bb58..e4715f8 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -5525,7 +5525,8 @@ gfc_conv_intrinsic_ishftc (gfc_se * se, gfc_expr * expr)
   unsigned int num_args;
 
   num_args = gfc_intrinsic_argument_list_length (expr);
-  args = XALLOCAVEC (tree, num_args);
+  /* Take care not to call alloca with a zero argument.  */
+  args = XALLOCAVEC (tree, num_args + !num_args);
 
   gfc_conv_intrinsic_function_args (se, expr, args, num_args);
 
diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c
index 4e86fa9..7da5a5f 100644
--- a/gcc/reg-stack.c
+++ b/gcc/reg-stack.c
@@ -2050,9 +2050,10 @@ subst_asm_stack_regs (rtx_insn *insn, stack_ptr regstack)
   for (i = 0, note = REG_NOTES (insn); note; note = XEXP (note, 1))
 i++;
 
-  note_reg = XALLOCAVEC (rtx, i);
-  note_loc = XALLOCAVEC (rtx *, i);
-  note_kind = XALLOCAVEC (enum reg_note, i);
+  /* Take care not to call alloca with a zero argument.  */
+  note_reg = XALLOCAVEC (rtx, i + !i);
+  note_loc = XALLOCAVEC (rtx *, i + !i);
+  note_kind = XALLOCAVEC (enum reg_note, i + !i);
 
   n_notes = 0;
   for (note = REG_NOTES (insn); note; note = XEXP (note, 1))
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 534292c..0564e69 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -340,7 +340,8 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
 	  unsigned int num, i = 0;
 
 	  num = NUM_SSA_OPERANDS (stmt, SSA_OP_ALL_USES);
-	  copy = XALLOCAVEC (tree, num);
+	  /* Take care not to call alloca with a zero argument.  */
+	  copy = XALLOCAVEC (tree, num + !num);
 
 	  /* Make a copy of the uses & vuses into USES_COPY, then cprop into
 		 the operands.  */


Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-17 Thread Jeff Law

On 11/16/2016 03:12 PM, Andrew Burgess wrote:

* Mike Stump  [2016-11-16 12:59:53 -0800]:


On Nov 16, 2016, at 12:09 PM, Andrew Burgess  
wrote:

My only remaining concern is the new tests, I've tried to restrict
them to targets that I suspect they'll pass on with:

   /* { dg-final-use { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target 
*-*-linux* *-*-gnu* } } } */

but I'm still nervous that I'm going to introduce test failures.  Is
there any advice / guidance I should follow before I commit, or are
folk pretty relaxed so long as I've made a reasonable effort?


So, if you are worried about the way the line is constructed, I usually test it 
by misspelling the *-*-linux* *-*-gnu* part as *-*-linNOTux* *-*-gnNOTu* and 
see if the test then doesn't run on your machine.  If it doesn't then you can 
be pretty confident that only machines that match the target triplet can be 
impacted.  I usually do this type of testing by running the test case in 
isolation (not the full tests suite).  Anyway, do the best you can, and don't 
worry about t it too much, learn from the experience, even if it goes wrong in 
some way.  If it did go wrong, just be responsive (don't check it in just 
before a 6 week vacation) about fixing it, if you can.



Thanks for the feedback.

Change committed as revision 242519.  If anyone sees any issues just
let me know.

Thanks.  And thanks for seeing this through... I know it's a pain sometimes.

jeff


Re: Default associative containers constructors/destructor/assignment

2016-11-17 Thread Jonathan Wakely


On 28/10/16 21:42 +0200, François Dumont wrote:

  /**
   *  @brief  %Map move constructor.
-   *  @param  __x  A %map of identical element and allocator types.
   *
-   *  The newly-created %map contains the exact contents of @a __x.
-   *  The contents of @a __x are a valid, but unspecified %map.
+   *  The newly-created %map contains the exact contents of the moved
+   *  instance. The moved instance is a valid, but unspecified, %map.


This comment isn't true, because non-propagating or non-equal
allocators can force elements to be copied, but that's unrelated to
your patch.

There are no problems with the changes to the map and set containers,
but I have some comments on the _Rb_tree changes. Overall I like the
change.


diff --git a/libstdc++-v3/include/bits/stl_tree.h 
b/libstdc++-v3/include/bits/stl_tree.h
index 0bc5f4b..126087e 100644
--- a/libstdc++-v3/include/bits/stl_tree.h
+++ b/libstdc++-v3/include/bits/stl_tree.h
@@ -137,6 +137,81 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
  };

+  // Helper type offering value initialization guaranty on the compare functor.


Spelling: "guarantee"


+  template
+struct _Rb_tree_key_compare
+{
+  _Key_compare _M_key_compare;
+
+  _Rb_tree_key_compare()
+  _GLIBCXX_NOEXCEPT_IF(
+   is_nothrow_default_constructible<_Key_compare>::value)
+  : _M_key_compare()
+  { }
+
+  _Rb_tree_key_compare(const _Key_compare& __comp)
+  : _M_key_compare(__comp)
+  { }
+
+#if __cplusplus >= 201103L
+  _Rb_tree_key_compare(_Rb_tree_key_compare&& __x)
+   noexcept(is_nothrow_copy_constructible<_Key_compare>::value)
+  : _M_key_compare(__x._M_key_compare)
+  { }
+#endif


This constructor makes the type move-only (i.e.  non-copyable) in
C++11 and later. It's copyable in C++98. Is that what you want?

Adding defaulted copy operations would fix that. Even if we don't
actually need those copy operations, I'm uncomfortable with it being
copyable in C++98 and non-copyable otherwise.


+void
+_M_reset()
+{
+  _M_initialize();
+  _M_node_count = 0;
+}


This introduces a small change in behaviour, because _M_reset() now
does _M_header._M_color = _S_red, which it didn't do before. That
store is redundant. It could be avoided by just doing the three
assignments in _M_reset() instead of calling _M_initialize().

And we could remove _M_initialize() entirely, and remove the redundant
mem-initializers for _M_node_count (because it's set my _M_reset and
_M_move_data anyway):

   _Rb_tree_header() _GLIBCXX_NOEXCEPT
   {
 _M_reset();
 _M_header._M_color = _S_red;
   }

#if __cplusplus >= 201103L
   _Rb_tree_header(_Rb_tree_header&& __x) noexcept
   {
 if (__x._M_header._M_parent != nullptr)
  _M_move_data(__x);
 else
   {
 _M_reset();
 _M_header._M_color = _S_red;
   }
   }

   void
   _M_move_data(_Rb_tree_header& __x)
   {
 _M_header._M_parent = __x._M_header._M_parent;
 _M_header._M_left = __x._M_header._M_left;
 _M_header._M_right = __x._M_header._M_right;
 _M_header._M_parent->_M_parent = &_M_header;
 _M_node_count = __x._M_node_count;

 __x._M_reset();
   }
#endif

   void
   _M_reset()
   {
 _M_header._M_parent = 0;
 _M_header._M_left = &_M_header;
 _M_header._M_right = &_M_header;
 _M_node_count = 0;
   }



@@ -599,50 +674,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // Unused _Is_pod_comparator is kept as it is part of mangled name.
  template
-struct _Rb_tree_impl : public _Node_allocator
+struct _Rb_tree_impl
+   : public _Node_allocator
+   , public _Rb_tree_key_compare<_Key_compare>
+   , public _Rb_tree_header


Do these need to be base classes, rather than data members?

We derive from the allocator to benefit from the empty base-class
optimization, but that isn't relevant for the _Rb_tree_key_compare and
_Rb_tree_header types. It *could* be relevant for the comparison
function, but would be an ABI change. We could do that ABI change
conditionally, for gnu-versioned-namespace builds, but that's still
possible using data members (e.g. have a data member that derives from
the comparison function and contains the header node and/or node count
members).

Making them data members would simply mean restoring the function
_Rb_tree_impl::_M_reset() and making it call reset on the member:

void
_M_reset() { _M_header._M_reset(); }

The minor convenience of inheriting this function from a base class
doesn't seem worth the stronger coupling that comes from using
inheritance.

Am I missing some other reason that inheritance is used here?


- _Rb_tree_impl(const _Key_compare& __comp, const _Node_allocator& __a)
- : _Node_allocator(__a), _M_key_compare(__comp), _M_header(),
-   _M_node_count(0)
- { _M_initialize(); }


Please mention the removal of this constructor in the changelog.


@@ -1534,19 +1583,7 @@ _GLIBC

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Segher Boessenkool
On Thu, Nov 17, 2016 at 04:50:46PM +, Kyrill Tkachov wrote:
> >>Is loading/storing a pair as cheap as loading/storing a single register?
> >>In that case you could shrink-wrap per pair of registers instead.
> >
> >I suppose it can vary by microarchitecture. For the purposes of codegen 
> >I'd say
> >it's more expensive than load/storing a single register (as there's more 
> >memory bandwidth required after all)
> >but cheaper than two separate loads stores (alignment quirks 
> >notwithstanding).
> >Interesting idea. That could help with code size too. I'll try it out.
> 
> I'm encountering some difficulties implementing this idea.
> I want to still keep the per-register structures across the hooks but 
> basically restrict the number
> of components in a basic block to an even number of FPRs and GPRs. I tried 
> doing this in COMPONENTS_FOR_BB

So your COMPONENTS_FOR_BB returns both components in a pair whenever one
of those is needed?  That should work afaics.

> but apparently this ended up not saving/restoring some of the registers at 
> all because the components that were
> "filtered out" that way still made their way to the bitmap passed into 
> SET_HANDLED_COMPONENTS and so the normal
> prologue/epilogue didn't end up saving and restoring them.

I am not sure what this means?  "filtered out"?


Segher


Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")

2016-11-17 Thread Jason Merrill
How about changing cp_parser_non_integral_constant_expression to issue
a pedwarn instead of error for NIC_FLOAT?

Jason


Re: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-17 Thread Jeff Law

On 11/10/2016 10:00 AM, Jakub Jelinek wrote:

Hi!

On arm/aarch64 we ICE because some decls that make it here has non-NULL
DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that
doesn't fit into shwi would ICE similarly).  While it is arguably a FE bug
that it creates for VLA initialization from STRING_CST such a decl,
I believe we have some PRs about it already open.
I think it won't hurt to check for the large sizes properly even in this
function though, and punt on unexpected cases, or even extremely large ones.

Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
to test on arm and/or aarch64.  Ok for trunk if the testing there passes
too?

2016-11-10  Jakub Jelinek  

PR middle-end/78201
* varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
instead, return false if it is NULL, or doesn't fit into uhwi, or
is larger or equal to targetm.max_anchor_offset.

* g++.dg/opt/pr78201.C: New test.

OK.
jeff



Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-17 Thread Jeff Law

On 11/17/2016 09:56 AM, Ilya Verbin wrote:

2016-11-17 18:50 GMT+03:00 Rainer Orth :

Rainer Orth  writes:


I happened to notice that my libcilkrts SPARC port has been applied
upstream.  So to reach closure on this issue for the GCC 7 release, I'd
like to import upstream into mainline which seems to be covered by the
free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
specific and we have no listed maintainer.


I initially used Ilya's intel.com address, which bounced.  Now using the
current address listed in MAINTAINERS...


Yeah, I don't work for Intel anymore. And I'm not a libcilkrts
maintainer, so I can't approve it.
Do you want to be?  IIRC I was going to nominate you, but held off 
knowing your situation was going to change.


If you're interested in maintainer positions, I can certainly still 
nominate you.


jeff



Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-17 Thread Ilya Verbin
2016-11-17 18:50 GMT+03:00 Rainer Orth :
> Rainer Orth  writes:
>
>> I happened to notice that my libcilkrts SPARC port has been applied
>> upstream.  So to reach closure on this issue for the GCC 7 release, I'd
>> like to import upstream into mainline which seems to be covered by the
>> free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
>> though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
>> specific and we have no listed maintainer.
>
> I initially used Ilya's intel.com address, which bounced.  Now using the
> current address listed in MAINTAINERS...

Yeah, I don't work for Intel anymore. And I'm not a libcilkrts
maintainer, so I can't approve it.

  -- Ilya

>> The following patch has passed x86_64-pc-linux-gnu bootstrap without
>> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
>> are currently running.
>>
>> Ok for mainline if they pass?
>
> Both Solaris bootstraps have completed successfully now.
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+

2016-11-17 Thread Jonathan Wakely

On 03/11/16 15:11 +0100, Rainer Orth wrote:

Fortunately, this is all easily fixed by wrapping the affected templates
in a new macro.  That's what this patch does.  The new libstdc++
acinclude.m4 test may well need wording changes in comments etc. and can
perhaps be shortened a bit, bit it worked for me.


See below.


The fixincludes part passes make check.

The patch has been regtested (C++ only) on

* Solaris 11.2 SRU 13.6 (i.e. without both overloads and templates in
 ),

* Solaris 11.3 SRU 3.6 (i.e. with overloads only), and

* Solaris 12 Build 110 (i.e. with both overloads and templates, and the
 _GLIBCXX_USE_C99_MATH #undef's)

* Linux/x86_64

I've checked that __CORRECT_ISO_CPP11_MATH_H_PROTO[12] were defined in
config.h as expected, and there were no testsuite regressions.  On the
contrary, a couple of tests that before showed up as UNSUPPORTED due to
the _GLIBCXX_USE_C99_MATH* #undef's now PASS as they should.

Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches
after some soak time?


The change is OK in principle, but I'd prefer more meaningful names
for the macros ...


--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -2181,7 +2181,8 @@ AC_DEFUN([GLIBCXX_CHECK_STDIO_PROTO], [
])

dnl
-dnl Check whether required C++11 overloads are present in .
+dnl Check whether required C++11 overloads and templates are present
+dnl in .


The standard doesn't actually require templates to be used here, it
only requires "sufficient additional overloads" so that integral
arguments work. We happen to do that with templates, and apparently so
do the Solaris headers, but other implementations are possible. So a
more accurate description would be:

dnl Check whether required C++11 overloads for FP and integral types
dnl are present in .

And rather than PROTO1 and PROTO2 the macros could be called PROTO_FP
for the overloads for FP types, and PROTO_INT (or just PROTO_I) for
the overloads for integral types (which happen to be templates in our
header).


+  # Solaris 12 Build 90, Solaris 11.3 SRU 5.6, and Solaris 10 Patch
+  # 11996[67]-02 introduced the C++11  templates.
+  AC_MSG_CHECKING([for C++11  templates])
+  AC_CACHE_VAL(glibcxx_cv_math11_template, [
+   AC_COMPILE_IFELSE([AC_LANG_SOURCE(
+ [#include 
+  namespace std {
+struct __true_type { };
+struct __false_type { };


We don't need these structs.


+template
+  struct __is_integer
+  {
+enum { __value = 0 };
+typedef __false_type __type;
+  };


We don't need a definition for the primary template, this will work:

 template
   struct __is_integer;


+template<>
+  struct __is_integer
+  {
+enum { __value = 1 };
+typedef __true_type __type;


The __type typedef is not needed here, just __value.


+  };
+  }
+  namespace __gnu_cxx {
+template
+	   struct __enable_if 
+	   { };


Again, no need to define the primary template:

 template
   struct __enable_if;


+template
+  struct __enable_if
+  { typedef _Tp __type; };
+  }


The suggestions above would make it shorter, while still remaining
accurate to what the real code in the header does. It could be reduced
even further without altering the meaning for the purposes of this
test:


+  namespace std {
+template
+  constexpr typename __gnu_cxx::__enable_if
+<__is_integer<_Tp>::__value, double>::__type
+  log2(_Tp __x)
+  { return __builtin_log2(__x); }


 template
   struct __enable_if_int;

 template<>
   struct __enable_if_int
   { typedef double __type; };

 template
   constexpr typename __enable_if_int<_Tp>::__type
   log2(_Tp __x)
   { return __builtin_log2(__x); }

I don't mind whether you go with this or just remove the unnecessary
structs, typedef and primary template bodies. Either is OK.


+  }
+  int
+  main (void)
+  {
+int i = 1000;
+return std::log2(i);
+  }


Re: [PR target/78213] Do not ICE on non-empty -fself-test

2016-11-17 Thread Aldy Hernandez

On 11/14/2016 08:18 AM, Bernd Schmidt wrote:

On 11/11/2016 06:10 PM, Aldy Hernandez wrote:

The problem in this PR is that -fself-test is being run on a non empty
source file.  This causes init_emit() to run, which sets:

REG_POINTER (virtual_incoming_args_rtx) = 1;

Setting REG_POINTER on the virtual incoming args, causes /f to be
printed on some RTL dumps, causing the -fself-test machinery to fail at
matching the expected value.


How about always running init_emit and testing for the correct output?


I would prefer Jakub's suggestion of running in finish_options().

I assume there are other places throughout the self-tests that depend on 
NOT continuing the compilation process, and I'd hate to plug each one.


Would the attached patch be acceptable to both of you?

Aldy
commit 52b906160a109b84a26f5aa15a653f4aee612942
Author: Aldy Hernandez 
Date:   Fri Nov 11 06:32:08 2016 -0800

PR target/78213
* opts.c (finish_options): Set -fsyntax-only if running self
tests.

diff --git a/gcc/opts.c b/gcc/opts.c
index d2d6100..cb20154 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -744,6 +744,14 @@ finish_options (struct gcc_options *opts, struct 
gcc_options *opts_set,
   opts->x_flag_toplevel_reorder = 0;
 }
 
+  /* -fself-test depends on the state of the compiler prior to
+ compiling anything.  Ideally it should be run on an empty source
+ file.  However, in case we get run with actual source, assume
+ -fsyntax-only which will inhibit any compiler initialization
+ which may confuse the self tests.  */
+  if (opts->x_flag_self_test)
+opts->x_flag_syntax_only = 1;
+
   if (opts->x_flag_tm && opts->x_flag_non_call_exceptions)
 sorry ("transactional memory is not supported with non-call exceptions");
 
diff --git a/gcc/testsuite/gcc.dg/pr78213.c b/gcc/testsuite/gcc.dg/pr78213.c
new file mode 100644
index 000..e43c83c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr78213.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-fself-test" } */
+
+/* Verify that -fself-test does not fail on a non empty source.  */
+
+int i; 
 void bar();
 void foo()
+{
+  while (i--)
+bar();
+}
+/* { dg-message "fself\-test: " "-fself-test" { target *-*-* } 0 } */


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Kyrill Tkachov


On 17/11/16 14:55, Kyrill Tkachov wrote:


On 17/11/16 14:44, Segher Boessenkool wrote:

Hi Kyrill,

On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote:

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there
were
some interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%

After looking at the gobmk performance with performance counters it looks
like more icache pressure.
I see an increase in misses.
This looks to me like an effect of code size increase, though it is not
that large an increase (0.4% with SWS).

Right.  I don't see how to improve on this (but see below); ideas welcome :-)


Branch mispredicts also go up a bit but not as much as icache misses.

I don't see that happening -- for some testcases we get unlucky and have
more branch predictor aliasing, and for some we have less, it's pretty
random.  Some testcases are really sensitive to this.


Right, I don't think it's the branch prediction at fault in this case,
rather the above icache stuff.




I don't think there's anything we can do here, or at least that this patch
can do about it.
Overall, there's a slight improvement in SPECINT, even with the gobmk
regression and a slightly larger improvement
on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?


This patch does implement FP registers wrapping as well but not LR.
Though I remember seeing the improvement even when only GPRs were wrapped
in an earlier version of the patch.


Segher, one curious artifact I spotted while looking at codegen differences
in gobmk was a case where we fail
to emit load-pairs as effectively in the epilogue and its preceeding basic
block.
So before we had this epilogue:
.L43:
 ldpx21, x22, [sp, 16]
 ldpx23, x24, [sp, 32]
 ldpx25, x26, [sp, 48]
 ldpx27, x28, [sp, 64]
 ldrx30, [sp, 80]
 ldpx19, x20, [sp], 112
 ret

and I see this becoming (among numerous other changes in the function):

.L69:
 ldpx21, x22, [sp, 16]
 ldrx24, [sp, 40]
.L43:
 ldpx25, x26, [sp, 48]
 ldpx27, x28, [sp, 64]
 ldrx23, [sp, 32]
 ldrx30, [sp, 80]
 ldpx19, x20, [sp], 112
 ret

So this is better in the cases where we jump straight into .L43 because we
load fewer registers
but worse when we jump to or fallthrough to .L69 because x23 and x24 are
now restored using two loads
rather than a single load-pair. This hunk isn't critical to performance in
gobmk though.

Is loading/storing a pair as cheap as loading/storing a single register?
In that case you could shrink-wrap per pair of registers instead.


I suppose it can vary by microarchitecture. For the purposes of codegen I'd say
it's more expensive than load/storing a single register (as there's more memory 
bandwidth required after all)
but cheaper than two separate loads stores (alignment quirks notwithstanding).
Interesting idea. That could help with code size too. I'll try it out.


I'm encountering some difficulties implementing this idea.
I want to still keep the per-register structures across the hooks but basically 
restrict the number
of components in a basic block to an even number of FPRs and GPRs. I tried 
doing this in COMPONENTS_FOR_BB
but apparently this ended up not saving/restoring some of the registers at all 
because the components that were
"filtered out" that way still made their way to the bitmap passed into 
SET_HANDLED_COMPONENTS and so the normal
prologue/epilogue didn't end up saving and restoring them.

I don't want to do it in GET_SEPARATE_COMPONENTS as that doesn't see each basic 
block.
Is this something DISQUALIFY_COMPONENTS could be used for?

Thanks,
Kyrill


Thanks,
Kyrill



Segher






Patch ping: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)

2016-11-17 Thread Jakub Jelinek
Hi!

On Thu, Nov 10, 2016 at 06:00:29PM +0100, Jakub Jelinek wrote:
> On arm/aarch64 we ICE because some decls that make it here has non-NULL
> DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that
> doesn't fit into shwi would ICE similarly).  While it is arguably a FE bug
> that it creates for VLA initialization from STRING_CST such a decl,
> I believe we have some PRs about it already open.
> I think it won't hurt to check for the large sizes properly even in this
> function though, and punt on unexpected cases, or even extremely large ones.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going
> to test on arm and/or aarch64.  Ok for trunk if the testing there passes
> too?
> 
> 2016-11-10  Jakub Jelinek  
> 
>   PR middle-end/78201
>   * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo.
>   Don't test decl != NULL.  Don't look at DECL_SIZE, but DECL_SIZE_UNIT
>   instead, return false if it is NULL, or doesn't fit into uhwi, or
>   is larger or equal to targetm.max_anchor_offset.
> 
>   * g++.dg/opt/pr78201.C: New test.

I'd like to ping this patch, Yvan has successfully tested it on arm and
aarch64.

Jakub


Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions

2016-11-17 Thread James Greenhalgh
On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote:
> This patch is designed to improve code generation for "and" instructions with
> certain immediate operands.
> 
> For the following test case:
> 
> int f2(int x)
> {
>x &= 0x0ff8;
> 
>x &= 0xff001fff;
> 
>return x;
> }
> 
> the trunk aarch64 compiler generates:
> 
> mov   w1, 8184
> movk  w1, 0xf00, lsl 16
> and   w0, w0, w1
> 
> We can generate one fewer instruction if we recognize certain constants. With
> the attached patch the current trunk compiler generates:
> 
> and   w0, w0, 268435448
> and   w0, w0, -16769025
> 
> Bootstrapped and make check successfully completed with no regressions on
> aarch64-linux-gnu.
> 
> Okay for trunk?

Hi Michael,

I have some minor comments in line.

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 3cdd69b..7ef8cdf 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params;
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
> +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
> +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
> mode);
>  int aarch64_branch_cost (bool, bool);
>  enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx);
>  bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3e663eb..db82c5c 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, 
> machine_mode mode)
>return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26];
>  }
>  
> +/* Create mask of ones, covering the lowest to highest bits set in VAL_IN.  
> */
> +
> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm1 (HOST_WIDE_INT val_in)
> +{
> +  int first_bit_set = ctz_hwi (val_in);
> +  int last_bit_set = floor_log2 (val_in);
> +  gcc_assert (first_bit_set < last_bit_set);

This assert can never trigger (by definition) unless VAL_IN == 0 (in which
case floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case
you're testing for, then this assert would be more explicit as

 gcc_assert (val_in != 0)

I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first
and last are ambiguous (you have them the opposite way round from what
I'd consider first [highest, thus leading bits] and last
[lowest/trailing bits]).

> +
> +  return ((HOST_WIDE_INT_UC (2) << last_bit_set) -
> +   (HOST_WIDE_INT_1U << first_bit_set));
> +}
> +
> +/* Create constant with zero bits between first and last bit set and one
> +   bits elsewhere.  */

I can't parse this comment in relation to what the code below does.
Looking at the code, you're building a new constant where the bits between
the first and last bits are unmodified, and all other bits are set to one.

i.e. for  1010 1000 you'd generate  1010 .

> +unsigned HOST_WIDE_INT
> +aarch64_and_split_imm2 (HOST_WIDE_INT val_in)
> +{
> +  return val_in | ~aarch64_and_split_imm1 (val_in);
> +}
> +
> +/* Return true if VAL_IN is a valid 'and' bitmask immediate.  */
> +
> +bool
> +aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode)
> +{
> +  if (aarch64_bitmask_imm (val_in, mode))
> +return false;
> +
> +  if (aarch64_move_imm (val_in, mode))
> +return false;
> +
> +  unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in);
> +
> +  if (!aarch64_bitmask_imm (imm2, mode))
> +return false;
> +
> +  return true;

You can replace these four lines with:

  return aarch64_bitmask_imm (imm2, mode);

> +}
>  
>  /* Return true if val is an immediate that can be loaded into a
> register in a single instruction.  */

> diff --git a/gcc/testsuite/gcc.target/aarch64/and_const.c 
> b/gcc/testsuite/gcc.target/aarch64/and_const.c
> new file mode 100644
> index 000..9c377d7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/and_const.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int f2 (int x)
> +{
> +   x &= 0x0ff8;
> +
> +   x &= 0xff001fff;
> +
> +   return x;
> +}

It would be good to have a test for the DImode cases too (using long long).

Thanks,
James



Re: [patch, libfortran] Add AVX-specific matmul

2016-11-17 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 08:41:48AM +0100, Thomas Koenig wrote:
> Am 17.11.2016 um 00:20 schrieb Jakub Jelinek:
> >On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote:
> >>>Don't you need to test in configure if the assembler supports AVX?
> >>>Otherwise if somebody is bootstrapping gcc with older assembler, it will
> >>>just fail to bootstrap.
> >>
> >>That's a good point.  The AVX instructions were added in binutils 2.19,
> >>which was released in 2011. This could be put in the prerequisites.
> >>
> >>What should the test do?  Fail with an error message "you need newer
> >>binutils" or simply (and silently) not compile the AVX vesion?
> >
> >>From what I understood, you want those functions just to be implementation
> >details, not exported from libgfortran.so*.  Thus the test would do
> >something similar to what gcc/testsuite/lib/target-supports.exp 
> >(check_effective_target_avx)
> >does, but of course in autoconf way, not in tcl.
> 
> OK, that looks straightworward enough. I'll give it a shot.
> 
> >Also, from what I see, target_clones just use IFUNCs, so you probably also
> >need some configure test whether ifuncs are supported (the
> >gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something
> >similar again in configure.  But if so, then I have no idea why you use
> >a wrapper around the function, instead of using it on the exported APIs.
> 
> As you wrote above, I wanted this as an implementation detail. I also
> wanted the ability to be able to add new instruction sets without
> breaking the ABI.

But even exported IFUNC is an implementation detail.  For other
libraries/binaries IFUNC symbol is like any other symbol, they will have
SHN_UNDEF symbol pointing to that, and it matters only for the dynamic
linker during relocation processing.  Whether some function is IFUNC or not
is not an ABI change, you can change at any time a normal function into
IFUNC or vice versa, without breaking ABI.

> You're right - integer multiplication looks different.
> 
> Nobody I know cares about integer matrix multiplication
> speed, whereas real has gotten a _lot_ of attention over
> the decades.  So, putting in AVX will make the code run
> faster on more machines, while putting in AVX2 will
> (IMHO) bloat the library for no good reason.  However,
> I am willing to stand corrected on this. Putting in AVX512f
> makes sense.

Which is why I've been proposing to use avx2,default for the
matmul_i* files and avx,default for the others.
avx will not buy much for matmul_i*, while avx2 will.

> I have also been trying to get target_clones to work on POWER
> to get Altivec instructions, but to no avail. I also cannot
> find any examples in the testsuite.

Haven't checked, but maybe the target_clones attribute has been only
implemented for x86_64/i686 and not for other targets.
But power supports target attribute, so you e.g. have the option of
#including the routine multiple times in one TU, each time with different
name and target attribute, and then write the IFUNC routine for it by hand.
Or attempt to support target_clones on power, or ask power maintainers
to do that.

Jakub


Re: [PATCH] Fix PR78333

2016-11-17 Thread Christophe Lyon
On 17 November 2016 at 11:27, Christophe Lyon
 wrote:
> On 17 November 2016 at 10:53, Richard Biener  wrote:
>> On Thu, 17 Nov 2016, Rainer Orth wrote:
>>
>>> Hi Richard,
>>>
>>> >> Probably providing dummy implemenations as in the original testcase in
>>> >> the PR is enough?
>>> >
>>> > Maybe { dg-require weak } plus weak definitions of the cyg_profile funcs?
>>> > Or simply restrict the test to { target *-*-linux* }?
>>>
>>> the only existing dg-do run testcase (gcc.dg/20001117-1.c) just has
>>>
>>> void __attribute__((no_instrument_function))
>>> __cyg_profile_func_enter(void *this_fn, void *call_site)
>>> {
>>>   if (call_site == (void *)0)
>>> abort ();
>>> }
>>>
>>> void __attribute__((no_instrument_function))
>>> __cyg_profile_func_exit(void *this_fn, void *call_site)
>>> {
>>>   if (call_site == (void *)0)
>>> abort ();
>>> }
>>>
>>> In the case at hand, we could do with empty implementations.  This
>>> certainly works on Solaris.
>>
>> Ok.  Christophe, can you add the above and verify it works for you
>> (and then commit)?
>>
> OK, I'm taking a look.
>

I tested and committed (r242553) the attached patch (on arm-none-eabi and
arm-none-linux-gnueabihf). I used empty function as suggested by Rainer.

Thanks,

Christophe

>> Thanks,
>> Richard.
Fix PR78333 testcase for non-glibc systems.

2016-11-17  Christophe Lyon  

gcc/testsuite/

* gcc.dg/pr78333.c: Add empty implementations of
__cyg_profile_func_enter() and __cyg_profile_func_exit() to avoid
problems on non-glibc systems.


diff --git a/gcc/testsuite/gcc.dg/pr78333.c b/gcc/testsuite/gcc.dg/pr78333.c
index fd3669c..ca037e5 100644
--- a/gcc/testsuite/gcc.dg/pr78333.c
+++ b/gcc/testsuite/gcc.dg/pr78333.c
@@ -1,6 +1,19 @@
 /* { dg-do link } */
 /* { dg-options "-finstrument-functions" } */
 
+/* Add empty implementations of __cyg_profile_func_enter() and
+   __cyg_profile_func_exit() to avoid problems on non-glibc
+   systems.  */
+void __attribute__((no_instrument_function))
+__cyg_profile_func_enter(void *this_fn, void *call_site)
+{
+}
+
+void __attribute__((no_instrument_function))
+__cyg_profile_func_exit(void *this_fn, void *call_site)
+{
+}
+
 extern inline __attribute__((gnu_inline, always_inline)) int foo () { }
 int main()
 {


Fix PR rtl-optimization/78355

2016-11-17 Thread Eric Botcazou
As diagnosed by the submitter, LRA can generate unaligned accesses when 
SLOW_UNALIGNED_ACCESS is 1, for example when STRICT_ALIGNMENT is 1, because 
simplify_operand_subreg can wrongly reject aligned accesses when reloading 
SUBREGs of MEMs in favor of unaligned ones.  The fix is to add the missing 
guard on the alignment before invoking SLOW_UNALIGNED_ACCESS (as in every 
other use of SLOW_UNALIGNED_ACCESS in the compiler).

Tested on arm-eabi, approved by Vladimir, applied on the mainline.


2016-11-17  Pip Cet  
Eric Botcazou  

PR rtl-optimization/78355
* doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Document that the macro
only needs to deal with unaligned accesses.
* doc/tm.texi: Regenerate.
* lra-constraints.c (simplify_operand_subreg): Only invoke
SLOW_UNALIGNED_ACCESS on innermode if the MEM is not aligned enough.

-- 
Eric BotcazouIndex: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 242377)
+++ doc/tm.texi.in	(working copy)
@@ -4654,7 +4654,8 @@ other fields in the same word of the str
 Define this macro to be the value 1 if memory accesses described by the
 @var{mode} and @var{alignment} parameters have a cost many times greater
 than aligned accesses, for example if they are emulated in a trap
-handler.
+handler.  This macro is invoked only for unaligned accesses, i.e. when
+@code{@var{alignment} < GET_MODE_ALIGNMENT (@var{mode})}.
 
 When this macro is nonzero, the compiler will act as if
 @code{STRICT_ALIGNMENT} were nonzero when generating code for block
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 242377)
+++ lra-constraints.c	(working copy)
@@ -1486,9 +1486,10 @@ simplify_operand_subreg (int nop, machin
 	 equivalences in function lra_constraints) and because for spilled
 	 pseudos we allocate stack memory enough for the biggest
 	 corresponding paradoxical subreg.  */
-	  if (!SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))
-	  || SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg))
-	  || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode))
+	  if (!(MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (mode)
+		&& SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)))
+	  || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
+		  && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg
 	return true;
 
 	  /* INNERMODE is fast, MODE slow.  Reload the mem in INNERMODE.  */


Re: [PATCH 1/2] PR77822

2016-11-17 Thread Dominik Vogt
On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote:
> The following two patches fix PR 77822 on s390x for gcc-7.  As the
> macro doing the argument range checks can be used on other targets
> as well, I've put it in system.h (couldn't think of a better
> place; maybe rtl.h?).
> 
> Bootstrapped on s390x biarch, regression tested on s390x biarch
> and s390, all on a zEC12 with -march=zEC12.
> 
> Please check the commit messages for details.

Common code patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* system.h (SIZE_POS_IN_RANGE): New.
>From 44654374a0d62e7bc91356fc2189fac5cd99ae12 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 17 Nov 2016 14:49:18 +0100
Subject: [PATCH 1/2] PR target/77822: Add helper macro SIZE_POS_IN_RANGE to
 system.h.

The macro can be used to validate the arguments of zero_extract and
sign_extract to fix this problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822
---
 gcc/system.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/gcc/system.h b/gcc/system.h
index 8c6127c..cc68f07 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -316,6 +316,14 @@ extern int errno;
   ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \
<= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER))
 
+/* A convenience macro to determine whether a SIZE lies inclusively
+   within [1, RANGE], POS lies inclusively within between
+   [0, RANGE - 1] and the sum lies inclusively within [1, RANGE].  */
+#define SIZE_POS_IN_RANGE(SIZE, POS, RANGE) \
+  (IN_RANGE (POS, 0, (RANGE) - 1) \
+   && IN_RANGE (SIZE, 1, RANGE) \
+   && IN_RANGE ((SIZE) + (POS), 1, RANGE))
+
 /* Infrastructure for defining missing _MAX and _MIN macros.  Note that
macros defined with these cannot be used in #if.  */
 
-- 
2.3.0



[committed] Fix locations within raw strings

2016-11-17 Thread David Malcolm
Whilst investigating PR preprocessor/78324 I noticed that the
substring location code currently doesn't handle raw strings
correctly, by not skipping the 'R', opening quote, delimiter
and opening parenthesis.

For example, an attempt to underline chars 4-7 with caret at 6 of
this raw string yields this erroneous output:
   __emit_string_literal_range (R"foo(0123456789)foo",
~~^~

With the patch, the correct range/caret is printed:

   __emit_string_literal_range (R"foo(0123456789)foo",
  ~~^~

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

Committed to trunk as r242552.

gcc/ChangeLog:
* input.c (selftest::test_lexer_string_locations_long_line): New
function.
(selftest::test_lexer_string_locations_raw_string_multiline): New
function.
(selftest::input_c_tests): Call the new functions, via
for_each_line_table_case.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-string-literals-1.c
(test_raw_string_one_liner): New function.
(test_raw_string_multiline): New function.

libcpp/ChangeLog:
* charset.c (cpp_interpret_string_1): Skip locations from
loc_reader when advancing 'p' when handling raw strings.
---
 gcc/input.c| 74 ++
 .../plugin/diagnostic-test-string-literals-1.c | 33 ++
 libcpp/charset.c   | 13 +++-
 3 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/gcc/input.c b/gcc/input.c
index c2042e8..728f4dd 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -3156,6 +3156,78 @@ test_lexer_string_locations_long_line (const 
line_table_case &case_)
  i, 2, 7 + i, 7 + i);
 }
 
+/* Test of locations within a raw string that doesn't contain a newline.  */
+
+static void
+test_lexer_string_locations_raw_string_one_line (const line_table_case &case_)
+{
+  /* .00.0001122.
+ .12.3456789012345678901.  */
+  const char *content = ("R\"foo(0123456789)foo\"\n");
+  lexer_test test (case_, content, NULL);
+
+  /* Verify that we get the expected token back.  */
+  const cpp_token *tok = test.get_token ();
+  ASSERT_EQ (tok->type, CPP_STRING);
+
+  /* Verify that cpp_interpret_string works.  */
+  cpp_string dst_string;
+  const enum cpp_ttype type = CPP_STRING;
+  bool result = cpp_interpret_string (test.m_parser, &tok->val.str, 1,
+ &dst_string, type);
+  ASSERT_TRUE (result);
+  ASSERT_STREQ ("0123456789", (const char *)dst_string.text);
+  free (const_cast  (dst_string.text));
+
+  if (!should_have_column_data_p (line_table->highest_location))
+return;
+
+  /* 0-9, plus the nil terminator.  */
+  ASSERT_NUM_SUBSTRING_RANGES (test, tok->src_loc, CPP_STRING, 11);
+  for (int i = 0; i < 11; i++)
+ASSERT_CHAR_AT_RANGE (test, tok->src_loc, CPP_STRING,
+ i, 1, 7 + i, 7 + i);
+}
+
+/* Test of locations within a raw string that contains a newline.  */
+
+static void
+test_lexer_string_locations_raw_string_multiline (const line_table_case &case_)
+{
+  /* .00..
+ .12.3456.  */
+  const char *content = ("R\"foo(\n"
+  /* .0.
+ .12345.  */
+"hello\n"
+"world\n"
+  /* .0.
+ .12345.  */
+")foo\"\n");
+  lexer_test test (case_, content, NULL);
+
+  /* Verify that we get the expected token back.  */
+  const cpp_token *tok = test.get_token ();
+  ASSERT_EQ (tok->type, CPP_STRING);
+
+  /* Verify that cpp_interpret_string works.  */
+  cpp_string dst_string;
+  const enum cpp_ttype type = CPP_STRING;
+  bool result = cpp_interpret_string (test.m_parser, &tok->val.str, 1,
+ &dst_string, type);
+  ASSERT_TRUE (result);
+  ASSERT_STREQ ("\nhello\nworld\n", (const char *)dst_string.text);
+  free (const_cast  (dst_string.text));
+
+  if (!should_have_column_data_p (line_table->highest_location))
+return;
+
+  /* Currently we don't support locations within raw strings that
+ contain newlines.  */
+  ASSERT_HAS_NO_SUBSTRING_RANGES (test, tok->src_loc, tok->type,
+ "range endpoints are on different lines");
+}
+
 /* Test of lexing char constants.  */
 
 static void
@@ -3297,6 +3369,8 @@ input_c_tests ()
   for_each_line_table_case 
(test_lexer_string_locations_stringified_macro_argument);
   for_each_line_table_case (test_lexer_string_locations_non_string);
   for_each_line_table_case (test_lexer_string_locations_long_line);
+  for_each_line_table_case (test_lexer_string_locations_raw_string_one_line);
+  for_each_line_table_case (test_lexer_string_locations_raw_string_multiline);
   for_each_line_table_case (test_lexer_c

Re: [PATCH 2/2] PR77822

2016-11-17 Thread Dominik Vogt
On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote:
> The following two patches fix PR 77822 on s390x for gcc-7.  As the
> macro doing the argument range checks can be used on other targets
> as well, I've put it in system.h (couldn't think of a better
> place; maybe rtl.h?).
> 
> Bootstrapped on s390x biarch, regression tested on s390x biarch
> and s390, all on a zEC12 with -march=zEC12.
> 
> Please check the commit messages for details.

S390 backend patch.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.md ("extzv")
("*extzv")
("*extzvdi_lshiftrt")
("*_ior_and_sr_ze")
("*extract1bitdi")
("*insv", "*insv_rnsbg_noshift")
("*insv_rnsbg_srl", "*insv_mem_reg")
("*insvdi_mem_reghigh", "*insvdi_reg_imm"): Use SIZE_POS_IN_RANGE to
validate the arguments of zero_extract and sign_extract.
>From 2e20be51e86cde2e02f8f02e0367456d3fe8d92b Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Thu, 17 Nov 2016 14:49:48 +0100
Subject: [PATCH 2/2] PR target/77822: S390: Validate argument range of
 {zero,sign}_extract.

With some undefined code, combine generates patterns where the arguments to
*_extract are out of range, e.b. a negative bit position.  If the s390 backend
accepts these, they lead to not just undefined behaviour but invalid assembly
instructions (argument out of the allowed range).  So this patch makes sure
that the rtl expressions with out of range arguments are rejected.
---
 gcc/config/s390/s390.md | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index fedd6f9..d2dfb1e 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -3741,6 +3741,8 @@
  (clobber (reg:CC CC_REGNUM))])]
   "TARGET_Z10"
 {
+  if (! SIZE_POS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64))
+FAIL;
   /* Starting with zEC12 there is risbgn not clobbering CC.  */
   if (TARGET_ZEC12)
 {
@@ -3760,7 +3762,9 @@
 (match_operand 2 "const_int_operand" "")   ; size
 (match_operand 3 "const_int_operand" ""))) ; start
   ]
-  ""
+  "
+   && SIZE_POS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]),
+GET_MODE_BITSIZE (mode))"
   "\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, 
shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
@@ -3773,6 +3777,7 @@
(lshiftrt:DI (match_operand:DI 3 "register_operand" "d")
 (match_operand:DI 4 "nonzero_shift_count_operand" "")))]
   "
+   && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64)
&& 64 - UINTVAL (operands[4]) >= UINTVAL (operands[1])"
   "\t%0,%3,%2,%2+%1-1,128-%2-%1-%4"
   [(set_attr "op_type" "RIE")
@@ -3791,6 +3796,7 @@
  (match_operand 5 "const_int_operand" "")) ; start
 4)))]
   "
+   && SIZE_POS_IN_RANGE (INTVAL (operands[4]), INTVAL (operands[5]), 64)
&& UINTVAL (operands[2]) == (~(0ULL) << UINTVAL (operands[4]))"
   "\t%0,%3,64-%4,63,%4+%5"
   [(set_attr "op_type" "RIE")
@@ -3804,7 +3810,8 @@
(const_int 1)  ; size
(match_operand 2 "const_int_operand" "")) ; start
   (const_int 0)))]
-  ""
+  "
+   && SIZE_POS_IN_RANGE (1, INTVAL (operands[2]), 64)"
   "\t%0,%1,64-1,128+63,%2+1" ; dst, src, start, end, shift
   [(set_attr "op_type" "RIE")
(set_attr "z10prop" "z10_super_E1")])
@@ -3919,6 +3926,8 @@
  (match_operand 2 "const_int_operand""I")) ; pos
(match_operand:GPR 3 "nonimmediate_operand" "d"))]
   "
+   && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]),
+GET_MODE_BITSIZE (mode))
&& (INTVAL (operands[1]) + INTVAL (operands[2])) <= "
   "\t%0,%3,%2,%2+%1-1,-%2-%1"
   [(set_attr "op_type" "RIE")
@@ -4214,6 +4223,7 @@
  (match_operand:DI 3 "nonimmediate_operand" "d")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_Z10
+   && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64)
&& INTVAL (operands[1]) + INTVAL (operands[2]) == 64"
   "rnsbg\t%0,%3,%2,63,0"
   [(set_attr "op_type" "RIE")])
@@ -4230,6 +4240,7 @@
  (match_operand:DI 4 "nonimmediate_operand" "d")))
(clobber (reg:CC CC_REGNUM))]
   "TARGET_Z10
+   && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64)
&& INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL (operands[2])"
   "rnsbg\t%0,%4,%2,%2+%1-1,%3"
   [(set_attr "op_type" "RIE")])
@@ -4239,7 +4250,8 @@
(match_operand 1 "const_int_operand" "n,n")
(const_int 0))
(match_operand:W 2 "register_operand" "d,d"))]
-  "INTVAL (operands[1]) > 0
+  "SIZE_POS_IN_RANGE (INTVAL (operands[1]), 0, 64)
+   && INTVAL (operands[1]) > 0
&& INTVAL (operands[1]) <= GET_MODE_BITSIZE (SImode)
&& INTVAL (operands[1]) % BITS_PER_UNIT == 0"
 {
@@ -4260,6 +4272,

[PATCH 0/2] PR77822

2016-11-17 Thread Dominik Vogt
The following two patches fix PR 77822 on s390x for gcc-7.  As the
macro doing the argument range checks can be used on other targets
as well, I've put it in system.h (couldn't think of a better
place; maybe rtl.h?).

Bootstrapped on s390x biarch, regression tested on s390x biarch
and s390, all on a zEC12 with -march=zEC12.

Please check the commit messages for details.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany



Re: Import libcilkrts Build 4467 (PR target/68945)

2016-11-17 Thread Rainer Orth
Rainer Orth  writes:

> I happened to notice that my libcilkrts SPARC port has been applied
> upstream.  So to reach closure on this issue for the GCC 7 release, I'd
> like to import upstream into mainline which seems to be covered by the
> free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
> though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
> specific and we have no listed maintainer.

I initially used Ilya's intel.com address, which bounced.  Now using the
current address listed in MAINTAINERS...

> The following patch has passed x86_64-pc-linux-gnu bootstrap without
> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
> are currently running.
>
> Ok for mainline if they pass?

Both Solaris bootstraps have completed successfully now.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")

2016-11-17 Thread Paolo Carlini
... oops, forgot about parser->non_integral_constant_expression_p. The 
version below, which I'm currently regtesting again, seems more correct.


Thanks,
Paolo.

///
Index: cp/parser.c
===
--- cp/parser.c (revision 242540)
+++ cp/parser.c (working copy)
@@ -4841,7 +4841,13 @@ cp_parser_primary_expression (cp_parser *parser,
 checked at that point.  If we are not within a cast, then
 this code is invalid.  */
  if (!cast_p)
-   cp_parser_non_integral_constant_expression (parser, NIC_FLOAT);
+   {
+ parser->non_integral_constant_expression_p = true;
+ if (!parser->allow_non_integral_constant_expression_p)
+   pedwarn (input_location, OPT_Wpedantic,
+"ISO C++ forbids using a floating-point literal "
+"in a constant-expression");
+   }
}
   return cp_expr (token->u.value, token->location);
 
Index: testsuite/g++.dg/parse/pr55080.C
===
--- testsuite/g++.dg/parse/pr55080.C(revision 0)
+++ testsuite/g++.dg/parse/pr55080.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/55080
+// { dg-options "-std=c++98 -pedantic" }
+
+class B {
+ static const int c = 3.1415926; // { dg-warning "constant-expression" }
+};


Re: [PATCH][2/2] GIMPLE Frontend, middle-end changes

2016-11-17 Thread Jeff Law

On 11/17/2016 01:20 AM, Richard Biener wrote:

On Wed, 16 Nov 2016, Jeff Law wrote:


On 10/28/2016 05:51 AM, Richard Biener wrote:


These are the middle-end changes and additions to the testsuite.

They are pretty self-contained, I've organized the changelog
entries below in areas of changes:

 1) dump changes - we add a -gimple dump modifier that allows most
 function dumps to be directy fed back into the GIMPLE FE

 2) pass manager changes to implement the startwith("pass-name")
 feature which implements unit-testing for GIMPLE passes

 3) support for "SSA" input, a __PHI stmt that is lowered once the
 CFG is built, a facility to allow a specific SSA name to be allocated
 plus a small required change in the SSA rewriter to allow for
 pre-existing PHI arguments

Bootstrapped and tested on x86_64-unknown-linux-gnu (together with [1/2]).

I can approve all these changes myself but any comments are welcome.

My only worry would be ensuring that in the case where we're asking for a
particular SSA_NAME in make_ssa_name_fn that we assert the requested name is
available.

ISTM that if it's > the highest current version or in the freelist, then we
ought to be safe.   If it isn't safe then we should either issue an error, or
renumber the preexisting SSA_NAME (and determining if it's safe to renumber
the preexisting SSA_NAME may require more context than we have).


An alternative interface would be to return NULL rather than asserting.
OTOH the single caller requesting a specific version first does a lookup
if the SSA name already exists, so it is safe.
I have a slight preference for enforcing safety within the name manager, 
but I can live with doing the checking in the caller.


jeff


Re: Fix PR78154

2016-11-17 Thread Jeff Law

On 11/17/2016 01:48 AM, Richard Biener wrote:

On Wed, 16 Nov 2016, Jeff Law wrote:


On 11/16/2016 05:17 PM, Martin Sebor wrote:



(I've heard some noise in C++-land about making memcpy(0,0,0) valid, but
that may have just been noise)


We may have read the same discussion.  It would make some things
a little easier in C++ (and remove what most people view as yet
another unnecessary gotcha in the language).

And that may be a reasonable thing to do.

While GCC does take advantage of the non-null attribute when trying to prove
certain pointers must be non-null, it only does so when the magic flag is
turned on.  There was a sense that it was too aggressive and that time may be
necessary for folks to come to terms with what GCC was doing, particularly in
the the memcpy (*, *, 0) case -- but I've never gotten the sense that happened
and we've never turned that flag on by default.


We only have -f[no-]delete-null-pointer-checks and that's on by default.

So we _do_ take advantage of those.
Hmm, maybe it's the path isolation I'm thinking about where we have the 
additional flag to control whether or not we use the attributes to help 
identify erroneous paths.


jeff


Re: PR78319

2016-11-17 Thread Jeff Law

On 11/17/2016 01:52 AM, Richard Biener wrote:

On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote:


On 17 November 2016 at 03:20, Jeff Law  wrote:

On 11/16/2016 01:23 PM, Prathamesh Kulkarni wrote:


Hi,
As discussed in PR, this patch marks the test-case to xfail on
arm-none-eabi.
OK to commit ?


You might check if Aldy's change to the uninit code helps your case
(approved earlier today, so hopefully in the tree very soon).  I quickly
scanned the BZ.  There's some overlap, but it might be too complex for
Aldy's enhancements to catch.

Hi Jeff,
I tried Aldy's patch [1], but it didn't catch the case in PR78319.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00225.html


XFAILing is ok.

Agreed.

Thanks for checking Prathamesh.

Jeff


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Kyrill Tkachov


On 17/11/16 15:01, Segher Boessenkool wrote:

On Thu, Nov 17, 2016 at 02:55:20PM +, Kyrill Tkachov wrote:

Overall, there's a slight improvement in SPECINT, even with the gobmk
regression and a slightly larger improvement
on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?

This patch does implement FP registers wrapping as well but not LR.

I meant frame pointer, not floating point :-)


Ah :) HARD_FRAME_POINTER is wrapped with -fomit-frame-pointer but we don't
touch the stack pointer (SP) in any case.

Kyrill



Segher




Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Segher Boessenkool
On Thu, Nov 17, 2016 at 02:55:20PM +, Kyrill Tkachov wrote:
> >>Overall, there's a slight improvement in SPECINT, even with the gobmk
> >>regression and a slightly larger improvement
> >>on SPECFP due to povray.
> >And that is for only the "normal" GPRs, not LR or FP yet, right?
> 
> This patch does implement FP registers wrapping as well but not LR.

I meant frame pointer, not floating point :-)


Segher


Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Kyrill Tkachov


On 17/11/16 14:44, Segher Boessenkool wrote:

Hi Kyrill,

On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote:

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there
were
some interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%

After looking at the gobmk performance with performance counters it looks
like more icache pressure.
I see an increase in misses.
This looks to me like an effect of code size increase, though it is not
that large an increase (0.4% with SWS).

Right.  I don't see how to improve on this (but see below); ideas welcome :-)


Branch mispredicts also go up a bit but not as much as icache misses.

I don't see that happening -- for some testcases we get unlucky and have
more branch predictor aliasing, and for some we have less, it's pretty
random.  Some testcases are really sensitive to this.


Right, I don't think it's the branch prediction at fault in this case,
rather the above icache stuff.




I don't think there's anything we can do here, or at least that this patch
can do about it.
Overall, there's a slight improvement in SPECINT, even with the gobmk
regression and a slightly larger improvement
on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?


This patch does implement FP registers wrapping as well but not LR.
Though I remember seeing the improvement even when only GPRs were wrapped
in an earlier version of the patch.


Segher, one curious artifact I spotted while looking at codegen differences
in gobmk was a case where we fail
to emit load-pairs as effectively in the epilogue and its preceeding basic
block.
So before we had this epilogue:
.L43:
 ldpx21, x22, [sp, 16]
 ldpx23, x24, [sp, 32]
 ldpx25, x26, [sp, 48]
 ldpx27, x28, [sp, 64]
 ldrx30, [sp, 80]
 ldpx19, x20, [sp], 112
 ret

and I see this becoming (among numerous other changes in the function):

.L69:
 ldpx21, x22, [sp, 16]
 ldrx24, [sp, 40]
.L43:
 ldpx25, x26, [sp, 48]
 ldpx27, x28, [sp, 64]
 ldrx23, [sp, 32]
 ldrx30, [sp, 80]
 ldpx19, x20, [sp], 112
 ret

So this is better in the cases where we jump straight into .L43 because we
load fewer registers
but worse when we jump to or fallthrough to .L69 because x23 and x24 are
now restored using two loads
rather than a single load-pair. This hunk isn't critical to performance in
gobmk though.

Is loading/storing a pair as cheap as loading/storing a single register?
In that case you could shrink-wrap per pair of registers instead.


I suppose it can vary by microarchitecture. For the purposes of codegen I'd say
it's more expensive than load/storing a single register (as there's more memory 
bandwidth required after all)
but cheaper than two separate loads stores (alignment quirks notwithstanding).
Interesting idea. That could help with code size too. I'll try it out.
Thanks,
Kyrill



Segher




Re: [patch, libfortran] Add AVX-specific matmul

2016-11-17 Thread Thomas Koenig

Well, here is a newer version of the patch.

I wrote a few configure tests to check for AVX.
This version hast the advantage that, if anybody
uses 32-bit programs with AVX, they would also benefit.

Jakub, would you be OK with that patch?

I do not yet want to commit this because it needs more
testing on different platforms to see if it actually
performs better.

Regarding putting the blocked part into something separate:
Quite doable, but I would rather like to do this in a follow-up
patch, if we decide t do it.

Regards

Thomas

2016-11-17  Thomas Koenig  

PR fortran/78379
* acinclude.m4 (LIBGFOR_CHECK_AVX):  New test.
(LIBGFOR_CHECK_AVX2):  New test.
(LIBGFOR_CHECK_AVX512F):  New test.
* configure.ac:  Call LIBGFOR_CHECK_AVX, LIBGFOR_CHECK_AVX2
and LIBGFOR_CHECK_AVX512F.
* config.h.in: Regenerated.
* configure:  Regenerated.
* m4/matmul.m4: For AVX, AVX2 and AVX_512F, make the work function
for matmul static with target_clones for AVX and default, and
create a wrapper function to call it.
* generated/matmul_c10.c: Regenerated.
* generated/matmul_c16.c: Regenerated.
* generated/matmul_c4.c: Regenerated.
* generated/matmul_c8.c: Regenerated.
* generated/matmul_i1.c: Regenerated.
* generated/matmul_i16.c: Regenerated.
* generated/matmul_i2.c: Regenerated.
* generated/matmul_i4.c: Regenerated.
* generated/matmul_i8.c: Regenerated.
* generated/matmul_r10.c: Regenerated.
* generated/matmul_r16.c: Regenerated.
* generated/matmul_r4.c: Regenerated.
* generated/matmul_r8.c: Regenerated.


Index: acinclude.m4
===
--- acinclude.m4	(Revision 242477)
+++ acinclude.m4	(Arbeitskopie)
@@ -393,3 +393,54 @@ AC_DEFUN([LIBGFOR_CHECK_STRERROR_R], [
 		  [Define if strerror_r takes two arguments and is available in .]),)
   CFLAGS="$ac_save_CFLAGS"
 ])
+
+dnl Check for AVX
+
+AC_DEFUN([LIBGFOR_CHECK_AVX], [
+  ac_save_CFLAGS="$CFLAGS"
+  CFLAGS="-O2 -mavx"
+  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+  void _mm256_zeroall (void)
+{
+   __builtin_ia32_vzeroall ();
+}]], [[]])],
+	AC_DEFINE(HAVE_AVX, 1,
+	[Define if AVX instructions can be compiled.]),
+	[])
+  CFLAGS="$ac_save_CFLAGS"
+])
+
+dnl Check for AVX2
+
+AC_DEFUN([LIBGFOR_CHECK_AVX2], [
+  ac_save_CFLAGS="$CFLAGS"
+  CFLAGS="-O2 -mavx2"
+  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+  typedef long long __v4di __attribute__ ((__vector_size__ (32)));
+	__v4di
+	mm256_is32_andnotsi256  (__v4di __X, __v4di __Y)
+{
+	   return __builtin_ia32_andnotsi256 (__X, __Y);
+}]], [[]])],
+	AC_DEFINE(HAVE_AVX2, 1,
+	[Define if AVX2 instructions can be compiled.]),
+	[])
+  CFLAGS="$ac_save_CFLAGS"
+])
+
+dnl Check for AVX512f
+
+AC_DEFUN([LIBGFOR_CHECK_AVX512F], [
+  ac_save_CFLAGS="$CFLAGS"
+  CFLAGS="-O2 -mavx512f"
+  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+	typedef double __m512d __attribute__ ((__vector_size__ (64)));
+	__m512d _mm512_add (__m512d a)
+	{
+	  return __builtin_ia32_addpd512_mask (a, a, a, 1, 4);
+}]], [[]])],
+	AC_DEFINE(HAVE_AVX512F, 1,
+	[Define if AVX512f instructions can be compiled.]),
+	[])
+  CFLAGS="$ac_save_CFLAGS"
+])
Index: config.h.in
===
--- config.h.in	(Revision 242477)
+++ config.h.in	(Arbeitskopie)
@@ -78,6 +78,15 @@
 /* Define to 1 if the target supports __attribute__((visibility(...))). */
 #undef HAVE_ATTRIBUTE_VISIBILITY
 
+/* Define if AVX instructions can be compiled. */
+#undef HAVE_AVX
+
+/* Define if AVX2 instructions can be compiled. */
+#undef HAVE_AVX2
+
+/* Define if AVX512f instructions can be compiled. */
+#undef HAVE_AVX512F
+
 /* Define to 1 if you have the `cabs' function. */
 #undef HAVE_CABS
 
Index: configure
===
--- configure	(Revision 242477)
+++ configure	(Arbeitskopie)
@@ -26174,6 +26174,93 @@ $as_echo "#define HAVE_CRLF 1" >>confdefs.h
 
 fi
 
+# Check whether we support AVX extensions
+
+  ac_save_CFLAGS="$CFLAGS"
+  CFLAGS="-O2 -mavx"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+  void _mm256_zeroall (void)
+{
+   __builtin_ia32_vzeroall ();
+}
+int
+main ()
+{
+
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+
+$as_echo "#define HAVE_AVX 1" >>confdefs.h
+
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+  CFLAGS="$ac_save_CFLAGS"
+
+
+# Check wether we support AVX2 extensions
+
+  ac_save_CFLAGS="$CFLAGS"
+  CFLAGS="-O2 -mavx2"
+  cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+  typedef long long __v4di __attribute__ ((__vector_size__ (32)));
+	__v4di
+	mm256_is32_andnotsi256  (__v4di __X, __v4di __Y)
+{
+	   return __builtin_ia32_andnotsi256 (__X, __Y);
+}
+int

Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Segher Boessenkool
Hi Kyrill,

On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote:
> >>I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there 
> >>were
> >>some interesting swings.
> >>458.sjeng +1.45%
> >>471.omnetpp   +2.19%
> >>445.gobmk -2.01%
> >>
> >>On SPECFP:
> >>453.povray+7.00%

> After looking at the gobmk performance with performance counters it looks 
> like more icache pressure.
> I see an increase in misses.
> This looks to me like an effect of code size increase, though it is not 
> that large an increase (0.4% with SWS).

Right.  I don't see how to improve on this (but see below); ideas welcome :-)

> Branch mispredicts also go up a bit but not as much as icache misses.

I don't see that happening -- for some testcases we get unlucky and have
more branch predictor aliasing, and for some we have less, it's pretty
random.  Some testcases are really sensitive to this.

> I don't think there's anything we can do here, or at least that this patch 
> can do about it.
> Overall, there's a slight improvement in SPECINT, even with the gobmk 
> regression and a slightly larger improvement
> on SPECFP due to povray.

And that is for only the "normal" GPRs, not LR or FP yet, right?

> Segher, one curious artifact I spotted while looking at codegen differences 
> in gobmk was a case where we fail
> to emit load-pairs as effectively in the epilogue and its preceeding basic 
> block.
> So before we had this epilogue:
> .L43:
> ldpx21, x22, [sp, 16]
> ldpx23, x24, [sp, 32]
> ldpx25, x26, [sp, 48]
> ldpx27, x28, [sp, 64]
> ldrx30, [sp, 80]
> ldpx19, x20, [sp], 112
> ret
> 
> and I see this becoming (among numerous other changes in the function):
> 
> .L69:
> ldpx21, x22, [sp, 16]
> ldrx24, [sp, 40]
> .L43:
> ldpx25, x26, [sp, 48]
> ldpx27, x28, [sp, 64]
> ldrx23, [sp, 32]
> ldrx30, [sp, 80]
> ldpx19, x20, [sp], 112
> ret
> 
> So this is better in the cases where we jump straight into .L43 because we 
> load fewer registers
> but worse when we jump to or fallthrough to .L69 because x23 and x24 are 
> now restored using two loads
> rather than a single load-pair. This hunk isn't critical to performance in 
> gobmk though.

Is loading/storing a pair as cheap as loading/storing a single register?
In that case you could shrink-wrap per pair of registers instead.


Segher


Re: [PATCH] Fix PR77848

2016-11-17 Thread Bill Schmidt

> On Nov 17, 2016, at 3:42 AM, Richard Biener  
> wrote:
> 
> On Thu, Nov 17, 2016 at 4:14 AM, Bill Schmidt
>  wrote:
>> 
>> Yes, the new predicate works fine.  New patch below, bootstrapped and tested
>> on powerpc64le-unknown-linux-gnu, with only the bb-slp-cond-1.c regressions
>> previously discussed.  Is this ok for trunk?
> 
> Yes.  Please open a bug for the bb-slp-cond-1.c testsuite FAIL (so we have an
> excuse to either improve loop vectorization data dependence analysis or
> BB vectorization in stage3).

Fix committed as r242550, new bug open at 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78396.

Thanks!
Bill
> 
> Thanks,
> Richard.



Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation

2016-11-17 Thread Kyrill Tkachov


On 14/11/16 14:25, Kyrill Tkachov wrote:


On 11/11/16 15:31, Kyrill Tkachov wrote:


On 11/11/16 10:17, Kyrill Tkachov wrote:


On 10/11/16 23:39, Segher Boessenkool wrote:

On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote:

On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov

I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were
some interesting swings.
458.sjeng +1.45%
471.omnetpp   +2.19%
445.gobmk -2.01%

On SPECFP:
453.povray+7.00%


Wow, this looks really good.  Thank you for implementing this.  If I
get some time I am going to try it out on other processors than A72
but I doubt I have time any time soon.

I'd love to hear what causes the slowdown for gobmk as well, btw.


I haven't yet gotten a direct answer for that (through performance analysis 
tools)
but I have noticed that load/store pairs are not generated as aggressively as I 
hoped.
They are being merged by the sched fusion pass and peepholes (which runs after 
this)
but it still misses cases. I've hacked the SWS hooks to generate pairs 
explicitly and that
increases the number of pairs and helps code size to boot. It complicates the 
logic of
the hooks a bit but not too much.

I'll make those changes and re-benchmark, hopefully that
will help performance.



And here's a version that explicitly emits pairs. I've looked at assembly 
codegen on SPEC2006
and it generates quite a few more LDP/STP pairs than the original version.
I kicked off benchmarks over the weekend to see the effect.
Andrew, if you want to try it out (more benchmarking and testing always 
welcome) this is the
one to try.



And I discovered over the weekend that gamess and wrf have validation errors.
This version runs correctly.
SPECINT results were fine though and there is even a small overall gain due to
sjeng and omnetpp. However, gobmk still has the regression.
I'll rerun SPECFP with this patch (it's really just a small bugfix over the 
previous version)
and get on with analysing gobmk.



After looking at the gobmk performance with performance counters it looks like 
more icache pressure.
I see an increase in misses.
This looks to me like an effect of code size increase, though it is not that 
large an increase (0.4% with SWS).
Branch mispredicts also go up a bit but not as much as icache misses.
I don't think there's anything we can do here, or at least that this patch can 
do about it.
Overall, there's a slight improvement in SPECINT, even with the gobmk 
regression and a slightly larger improvement
on SPECFP due to povray.

Segher, one curious artifact I spotted while looking at codegen differences in 
gobmk was a case where we fail
to emit load-pairs as effectively in the epilogue and its preceeding basic 
block.
So before we had this epilogue:
.L43:
ldpx21, x22, [sp, 16]
ldpx23, x24, [sp, 32]
ldpx25, x26, [sp, 48]
ldpx27, x28, [sp, 64]
ldrx30, [sp, 80]
ldpx19, x20, [sp], 112
ret

and I see this becoming (among numerous other changes in the function):

.L69:
ldpx21, x22, [sp, 16]
ldrx24, [sp, 40]
.L43:
ldpx25, x26, [sp, 48]
ldpx27, x28, [sp, 64]
ldrx23, [sp, 32]
ldrx30, [sp, 80]
ldpx19, x20, [sp], 112
ret

So this is better in the cases where we jump straight into .L43 because we load 
fewer registers
but worse when we jump to or fallthrough to .L69 because x23 and x24 are now 
restored using two loads
rather than a single load-pair. This hunk isn't critical to performance in 
gobmk though.

Given that there is an overall gain is this ok for trunk?
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01352.html

Thanks,
Kyrill



2016-11-11  Kyrylo Tkachov  

* config/aarch64/aarch64.h (machine_function): Add
reg_is_wrapped_separately field.
* config/aarch64/aarch64.c (emit_set_insn): Change return type to
rtx_insn *.
(aarch64_save_callee_saves): Don't save registers that are wrapped
separately.
(aarch64_restore_callee_saves): Don't restore registers that are
wrapped separately.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
(aarch64_get_separate_components): New function.
(aarch64_get_next_set_bit): Likewise.
(aarch64_components_for_bb): Likewise.
(aarch64_disqualify_components): Likewise.
(aarch64_emit_prologue_components): Likewise.
(aarch64_emit_epilogue_components): Likewise.
(aarch64_set_handled_components): Likewise.
(TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.





Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f

2016-11-17 Thread Bin.Cheng
On Thu, Nov 17, 2016 at 10:53 AM, Richard Biener
 wrote:
> On Thu, Nov 17, 2016 at 11:26 AM, Bin.Cheng  wrote:
>> On Thu, Nov 17, 2016 at 8:32 AM, Richard Biener
>>  wrote:
>>> On Wed, Nov 16, 2016 at 6:20 PM, Bin Cheng  wrote:
 Hi,
 Currently test gfortran.dg/vect/fast-math-mgrid-resid.f checks all 
 predictive commoning opportunities for all possible loops.  This makes it 
 fragile because vectorizer may peel the loop differently, as well as may 
 choose different vector factors.  For example, on x86-solaris, vectorizer 
 doesn't peel for prologue loop; for -march=haswell, the case is long time 
 failed because vector factor is 4, while iteration distance of predictive 
 commoning opportunity is smaller than 4.  This patch refines it by only 
 checking if predictive commoning variable is created when vector factor is 
 2; or vectorization variable is created when factor is 4.  This works 
 since we have only one main loop, and only one vector factor can be used.
 Test result checked for various x64 targets.  Is it OK?
>>>
>>> I think that as you write the test is somewhat fragile.  But rather
>>> than adjusting the scanning like you do
>>> I'd add --param vect-max-peeling-for-alignment=0 and -mprefer-avx128
>> In this way, is it better to add "--param
>> vect-max-peeling-for-alignment=0" for all targets?  Otherwise we still
>> need to differentiate test string to handle different targets.  But I
>> have another question here: what if a target can't handle unaligned
>> access and vectorizer have to peel for alignment for it?
>
> You'd get versioning for alignment instead.
>
>> Also do you think it's ok to check predictive commoning PHI node as below?
>> # vectp_u.122__lsm0.158_94 = PHI 
>> In this way, we don't need to take possible prologue/epilogue loops
>> into consideration.
>
> I hoped w/o peeling we can simply scan for "Executing predictive commoning".
> But with versioning for alignment you'd still get two loops.
>
> So maybe checking for both "Executing predictive commoning" and looking
> for a vect_lsm PHI node is ok...
Understood.  Here is the update patch.  Test result checked on x86_64
and aarch64.  Is it OK?

Thanks,
bin

gcc/testsuite/ChangeLog
2016-11-15  Bin Cheng  

PR testsuite/78114
* gfortran.dg/vect/fast-math-mgrid-resid.f: Add additional
options.  Refine test by checking predictive commining PHI
nodes in vectorized loop wrto vector factor.

>
>>> as additional option on x86_64-*-* i?86-*-*.
>>>
>>> Your new pattern would fail with avx512 if vector (8) real would be used.
>>>
>>> What's the actual change that made the testcase fail btw?
>> There are two cases.
>> A) After vect_do_peeling change, vectorizer may only peel one
>> iteration for prologue loop (if vf == 2), below test string was added
>> for this reason:
>> ! { dg-final { scan-tree-dump-times "Loop iterates only 1 time,
>> nothing to do" 1 "pcom" } }
>> This fails on x86_64 solaris because prologue loop is not peeled at all.
>> B) Depending on ilp, I think below test strings fail for long time with 
>> haswell:
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 1 "pcom" { target lp64 } } }
>> ! { dg-final { scan-tree-dump-times "Executing predictive commoning
>> without unrolling" 2 "pcom" { target ia32 } } }
>> Because vectorizer choose vf==4 in this case, and there is no
>> predictive commoning opportunities at all.
>
> Yes.  I suggest -mprefer-avx128 for that.
>
>> Also the newly added test string fails in this case too because the
>> prolog peeled iterates more than 1 times.
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
 Thanks,
 bin

 gcc/testsuite/ChangeLog
 2016-11-16  Bin Cheng  

 PR testsuite/78114
 * gfortran.dg/vect/fast-math-mgrid-resid.f: Refine test by
 checking predictive commining variables in vectorized loop
 wrto vector factor.
diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f 
b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
index 88238f9..293cac9 100644
--- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
+++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f
@@ -1,7 +1,7 @@
 ! { dg-do compile }
 ! { dg-require-effective-target vect_double }
-! { dg-options "-O3 -fpredictive-commoning -fdump-tree-pcom-details" }
-
+! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 
-fpredictive-commoning -fdump-tree-pcom-details" }
+! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } 
}
 
 *** RESID COMPUTES THE RESIDUAL:  R = V - AU
 *
@@ -38,8 +38,8 @@ C
   RETURN
   END
 ! we want to check that predictive commoning did something on the
-! vectorized loop.
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without 
unrolling" 1 "pcom" { target lp64 } } }
-! { dg-final { scan-tree-dump-times "Executing predictive commoning without 
unrolling"

Re: [patch,testsuite] Support dg-require-effective-target label_offsets.

2016-11-17 Thread Senthil Kumar Selvaraj

Georg-Johann Lay writes:

> On 04.11.2016 06:18, Senthil Kumar Selvaraj wrote:
>>
>> Georg-Johann Lay writes:
>>> State of matters is that Binutils support is missing, and if I understand 
>>> you
>>> correctly, dg-require is not appropriate to factor out availability of such
>>> features?
>>
>> I'll take a stab at adding the missing binutils support for avr label diffs.
>
> Thanks for taking care of this.  I had a look into it but got stuck with a 
> test 
> case derived from gcc.c-torture/execute/pr70460.c
>
> int c;
>
> __attribute__((noinline, noclone)) void
> call (void)
> {
>__asm ("nop");
> }
>
> __attribute__((noinline, noclone)) void
> foo (int x)
> {
>static int b[] = { &&lab1 - &&lab0, &&lab2 - &&lab0 };
>void *a = &&lab0 + b[x];
>goto *a;
> lab1:
>c += 2;
>call();
> lab2:
>c++;
> lab0:
>;
> }
>
> int
> main ()
> {
>foo (0);
>if (c != 3)
>  __builtin_abort ();
>foo (1);
>if (c != 4)
>  __builtin_abort ();
>return 0;
> }
>
>
> The problem is when relaxation is turned on and the CALL is shortened to  
> RCALL 
> but respective literals are not fixed up.

That linker issue is fixed with 
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4cb771f214ed6a2102e37bce255c6be5d0642f3a

Regards
Senthil


[patch,avr,committed] More use of CONST_INT_P, SUBREG_P.

2016-11-17 Thread Georg-Johann Lay
This is a no-op code clean-up that makes more use of SUBREG_P and CONST_INT_P 
if possible.


Applied as obvious.

Johann

* config/avr/avr.c (avr_print_operand_address): Use CONST_INT_P if
appropriate.
(ashlqi3_out, ashlsi3_out, ashrqi3_out, ashrhi3_out): Same.
(ashrsi3_out, lshrqi3_out, lshrhi3_out, lshrsi3_out): Same.
(avr_rtx_costs_1, extra_constraint_Q): Same.
(avr_address_cost): Use SUBREG_P if appropriate.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 242541)
+++ config/avr/avr.c	(working copy)
@@ -2544,7 +2544,7 @@ avr_print_operand_address (FILE *file, m
   rtx x = addr;
   if (GET_CODE (x) == CONST)
 x = XEXP (x, 0);
-  if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x,1)) == CONST_INT)
+  if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x,1)))
 {
   /* Assembler gs() will implant word address.  Make offset
  a byte offset inside gs() for assembler.  This is
@@ -6083,7 +6083,7 @@ out_shift_with_cnt (const char *templ, r
 const char *
 ashlqi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
 
@@ -6180,7 +6180,7 @@ ashlqi3_out (rtx_insn *insn, rtx operand
 const char *
 ashlhi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL);
   int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]);
@@ -6500,7 +6500,7 @@ avr_out_ashlpsi3 (rtx_insn *insn, rtx *o
 const char *
 ashlsi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
   int *t = len;
@@ -6589,7 +6589,7 @@ ashlsi3_out (rtx_insn *insn, rtx operand
 const char *
 ashrqi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
 
@@ -6661,7 +6661,7 @@ ashrqi3_out (rtx_insn *insn, rtx operand
 const char *
 ashrhi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL);
   int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]);
@@ -6883,7 +6883,7 @@ avr_out_ashrpsi3 (rtx_insn *insn, rtx *o
 const char *
 ashrsi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
   int *t = len;
@@ -6980,7 +6980,7 @@ ashrsi3_out (rtx_insn *insn, rtx operand
 const char *
 lshrqi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
 
@@ -7075,7 +7075,7 @@ lshrqi3_out (rtx_insn *insn, rtx operand
 const char *
 lshrhi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL);
   int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]);
@@ -7386,7 +7386,7 @@ avr_out_lshrpsi3 (rtx_insn *insn, rtx *o
 const char *
 lshrsi3_out (rtx_insn *insn, rtx operands[], int *len)
 {
-  if (GET_CODE (operands[2]) == CONST_INT)
+  if (CONST_INT_P (operands[2]))
 {
   int k;
   int *t = len;
@@ -10549,7 +10549,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
   return true;
 }
 	  *total = COSTS_N_INSNS (1);
-	  if (GET_CODE (XEXP (x, 1)) != CONST_INT)
+	  if (!CONST_INT_P (XEXP (x, 1)))
 	*total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, speed);
 	  break;
 
@@ -10568,7 +10568,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
 *total = COSTS_N_INSNS (1) + *total;
   return true;
 }
-	  if (GET_CODE (XEXP (x, 1)) != CONST_INT)
+	  if (!CONST_INT_P (XEXP (x, 1)))
 	{
 	  *total = COSTS_N_INSNS (2);
 	  *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1,
@@ -10594,7 +10594,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
   break;
 
 	case SImode:
-	  if (GET_CODE (XEXP (x, 1)) != CONST_INT)
+	  if (!CONST_INT_P (XEXP (x, 1)))
 	{
 	  *total = COSTS_N_INSNS (4);
 	  *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1,
@@ -10645,7 +10645,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod
 case IOR:
   *total = COSTS_N_INSNS (GET_MODE_SIZE (mode));
   *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed);
-  if (GET_CODE (XEXP (x, 1)) != CONST_INT)
+  if (!CONST_INT_P (XEXP (x, 1)))
 	*total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, speed);
   return true;
 
@@ -10813,7 +10813,7 @@ avr_rtx_costs_1 (rtx x, mac

RE: [PATCH 2/2] [ARC] Update target specific tests.

2016-11-17 Thread Claudiu Zissulescu
> These entries should be going into the gcc/testsuite/ChangeLog file,
> and so don't need the "testsuite/" prefix.
> 
Ups :) Fix it.

> Otherwise I'm happy for this to be merged.  I've only skimmed the
> change, but assuming you've run the tests this all seems good.

Results of running arc.exp on LE/arc700 default cpu should be:

=== gcc Summary ===

# of expected passes151
# of unsupported tests  2

Committed, thank you for your review,
Claudiu


RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-17 Thread Matthew Fortune
Toma Tabacu  writes:
> Hi,
> 
> The patch below is a rebased version of Andrew's patch plus a few more changes
> to test options.
> 
> I have tested Andrew's patch by passing -msoft-float to the testsuite without
> having a soft-float library available, and saw that the 
> inline-memcpy-{1,2,3,4,5}.c
> and memcpy-1.c tests were also failing to find standard library headers.
> In the version below, I have added (REQUIRES_STDLIB) to them as well.
> 
> However, I believe that the memcpy-1.c test was removed from the first version
> of Andrew's patch in response to Matthew's comments, but I don't think it
> should be.
> 
> Tested with mips-img-linux-gnu and mips-mti-linux-gnu.

This looks OK to me but I then again I helped with the design for this.

I'd like to give Catherine a chance to take a look though as the feature is 
unusual.

One comment below.

> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index e22d782..ccd4ecb 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -1420,6 +1426,22 @@ proc mips-dg-options { args } {
>   }
>  }
> 
> +# If the test is marked as requiring standard libraries check
> +# that the sysroot has support for the current set of test options.
> +if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } {
> + mips_push_test_options saved_options $extra_tool_flags
> + set output [mips_preprocess "" {
> +   #include 
> + } 1]
> + mips_pop_test_options saved_options
> +
> + # If the preprocessing of the stdlib.h file produced errors mark
> + # the test as unsupported.
> + if { ![string equal $output ""] } {
> + set do_what [lreplace $do_what 1 1 "N"]

We should describe what we expect the format of do_what to be at the time
of writing in case it ever changes. i.e. a comment to say what the second
character means and therefore make it clear that setting it to 'n' makes
the test unsupported.

Thanks,
Matthew


RE: [PATCH] [ARC] Add support for QuarkSE processor.

2016-11-17 Thread Claudiu Zissulescu
> 
> Looks fine.
> 

Committed, thank you for your review,
Claudiu


RE: [PATCH 4/4] [ARC] Fix compilation issue in pr71872.

2016-11-17 Thread Claudiu Zissulescu
> Looks fine.
> 
> Thanks,
> Andrew
> 
Committed, thank you for your review,
Claudiu


Re: [PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode

2016-11-17 Thread Christophe Lyon
On 16 November 2016 at 10:39, Kyrill Tkachov
 wrote:
>
> On 09/11/16 16:19, Thomas Preudhomme wrote:
>>
>> Hi,
>>
>> This patch fixes the following ICE when building when compiling an empty
>> FIQ interrupt handler in ARM mode:
>>
>> empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints:
>>  }
>>  ^
>>
>> (insn/f 13 12 14 (set (reg/f:SI 13 sp)
>> (plus:SI (reg/f:SI 11 fp)
>> (const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3}
>>  (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
>> (plus:SI (reg/f:SI 11 fp)
>> (const_int 4 [0x4])))
>> (nil)))
>>
>> The ICE was provoked by missing an alternative to reflect that ARM mode
>> can do an add of general register into sp which is unpredictable in Thumb
>> mode add immediate.
>>
>> ChangeLog entries are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>> 2016-11-04  Thomas Preud'homme  
>>
>> * config/arm/arm.md (arm_addsi3): Add alternative for addition of
>> general register with general register or ARM constant into SP
>> register.
>>
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2016-11-04  Thomas Preud'homme  
>>
>> * gcc.target/arm/empty_fiq_handler.c: New test.
>>
>>
>> Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression.
>>
>> Is this ok for trunk?
>>
>
> I see that "r" does not include the stack pointer (STACK_REG is separate
> from GENERAL_REGs) so we are indeed missing
> that constraint.
>
> Ok for trunk.
> Thanks,
> Kyrill
>
>> Best regards,
>>
>> Thomas
>
>

Hi Thomas,

The new test fails when compiled with -mthumb -march=armv5t:
gcc.target/arm/empty_fiq_handler.c: In function 'fiq_handler':
gcc.target/arm/empty_fiq_handler.c:11:1: error: interrupt Service
Routines cannot be coded in Thumb mode

Christophe


RE: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads

2016-11-17 Thread Claudiu Zissulescu

> > Note on tests: It will be nice to add a test where the added
> > peephole kicks in. If you consider to add this test to the current
> > patch, please resubmit it.
> 
> There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
> fail for me without the peephole, and work with the peephole.
> 
> The code generated for L/E ARC is slightly different than the code
> generated for B/E ARC due to how the structures are laid out in
> memory, so, for now I've made parts of the test B/E only.
> 
> In order to get code that is as efficient for L/E as B/E I'd end up
> adding a whole new peeophole, I'd rather not do that - it would be
> extra code to maintain for a combination CMEM+L/E that is not used.  I
> figure we can come back to that if/when that combination ever becomes
> interesting.  I'm hoping you'll be fine with that.
> 

Sure, please go ahead and apply your patch after having the spaces converted ;) 
Claudiu


Re: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads

2016-11-17 Thread Andrew Burgess
* Claudiu Zissulescu  [2016-11-17 13:02:02 
+]:

> Hi Andrew,
> 
> Approved, please apply, but ...
> 
> > +(define_peephole2
> > +  [(set (match_operand:SI 0 "register_operand" "")
> > +(sign_extend:SI
> > +  (match_operand:QI 1 "any_mem_operand" "")))
> > +   (set (reg:CC_ZN CC_REG)
> > +   (compare:CC_ZN (match_dup 0)
> > +   (const_int 0)))
> > +   (set (pc)
> > +(if_then_else (match_operator 2 "ge_lt_comparison_operator"
> > +[(reg:CC_ZN CC_REG) (const_int 0)])
> > +  (match_operand 3 "" "")
> > +  (match_operand 4 "" "")))]
> > +  "TARGET_NPS_CMEM
> > +   && cmem_address (XEXP (operands[1], 0), SImode)
> > +   && peep2_reg_dead_p (2, operands[0])
> > +   && peep2_regno_dead_p (3, CC_REG)"
> > +  [(set (match_dup 0)
> > +(zero_extend:SI
> > +  (match_dup 1)))
> > +   (set (reg:CC_ZN CC_REG)
> > +   (compare:CC_ZN (zero_extract:SI
> > + (match_dup 0)
> > + (const_int 1)
> > + (const_int 7))
> > +   (const_int 0)))
> > +   (set (pc)
> > +(if_then_else (match_dup 2)
> > +  (match_dup 3)
> > +  (match_dup 4)))]
> > +  "if (GET_CODE (operands[2]) == GE)
> > + operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> > const0_rtx);
> > +   else
> > + operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> > const0_rtx);")
> > +
> 
> It seems to me you use spaces instead of tabs.

Ooops.  I'll fix.

> 
> Note on tests: It will be nice to add a test where the added
> peephole kicks in. If you consider to add this test to the current
> patch, please resubmit it.

There were cmem-bit-{1,2,3,4}.c added in that patch.  All of which
fail for me without the peephole, and work with the peephole.

The code generated for L/E ARC is slightly different than the code
generated for B/E ARC due to how the structures are laid out in
memory, so, for now I've made parts of the test B/E only.

In order to get code that is as efficient for L/E as B/E I'd end up
adding a whole new peeophole, I'd rather not do that - it would be
extra code to maintain for a combination CMEM+L/E that is not used.  I
figure we can come back to that if/when that combination ever becomes
interesting.  I'm hoping you'll be fine with that.

Thanks,
Andrew


Import libcilkrts Build 4467 (PR target/68945)

2016-11-17 Thread Rainer Orth
I happened to notice that my libcilkrts SPARC port has been applied
upstream.  So to reach closure on this issue for the GCC 7 release, I'd
like to import upstream into mainline which seems to be covered by the
free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even
though https://gcc.gnu.org/codingconventions.html#upstream lists nothing
specific and we have no listed maintainer.

A few issues are worth mention:

* Upstream still has a typo in the git URL in many files.  I've
  corrected that during the import to avoid a massive diff:

-#  https://bitbucket.org/intelcilkruntime/itnel-cilk-runtime.git are
+#  https://bitbucket.org/intelcilkruntime/intel-cilk-runtime.git are

* libcilkrts.spec.in is missing upstream.  I've no idea if this is
  intentional.

* A few of my changes have been lost and I can't tell if this is by
  accident:

** Lost whitespace:

--- libcilkrts-old/Makefile.am  2016-05-04 16:44:24.0 +
+++ libcilkrts-new/Makefile.am  2016-11-17 11:35:33.782987017 +
@@ -54,7 +54,7 @@ GENERAL_FLAGS = -I$(top_srcdir)/include
 # Enable Intel Cilk Plus extension
 GENERAL_FLAGS += -fcilkplus
 
-# Always generate unwind tables
+#Always generate unwind tables
 GENERAL_FLAGS += -funwind-tables
 
 AM_CFLAGS = $(XCFLAGS) $(GENERAL_FLAGS) -std=c99

** Lost alphabetical order of targets:

diff -rup libcilkrts-old/configure.ac libcilkrts-new/configure.ac
--- libcilkrts-old/configure.ac 2016-11-16 18:34:28.0 +
+++ libcilkrts-new/configure.ac 2016-11-17 11:35:33.800015570 +
@@ -143,14 +145,14 @@ esac
 # contains information on what's needed
 case "${target}" in
 
-  arm-*-*)
-config_dir="arm"
-;;
-
   i?86-*-* | x86_64-*-*)
 config_dir="x86"
 ;;
 
+  arm-*-*)
+config_dir="arm"
+;;
+
   sparc*-*-*)
 config_dir="sparc"
 ;;
diff -rup libcilkrts-old/configure.tgt libcilkrts-new/configure.tgt
--- libcilkrts-old/configure.tgt2016-11-16 18:34:28.0 +
+++ libcilkrts-new/configure.tgt2016-11-17 11:35:33.807873451 +
@@ -44,10 +44,10 @@
 
 # Disable Cilk Runtime library for unsupported architectures.
 case "${target}" in
-  arm-*-*)
-;;
   i?86-*-* | x86_64-*-*)
 ;;
+  arm-*-*)
+;;
   sparc*-*-*)
 ;;
   *-*-*)

  I've done nothing about those, just wanted to point them out.

The following patch has passed x86_64-pc-linux-gnu bootstrap without
regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps
are currently running.

Ok for mainline if they pass?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-11-17  Rainer Orth  

PR target/68945
Merge from upstream, version 2.0.4467.0.
Fix typo in git URL.
* aclocal.m4, configure, Makefile.in: Regenerate.

# HG changeset patch
# Parent  f20811cd2fde8357b5c51307decee8c877c1e16a
Import libcilkrts Build 4467 (PR target/68945)

diff --git a/libcilkrts/Makefile.am b/libcilkrts/Makefile.am
--- a/libcilkrts/Makefile.am
+++ b/libcilkrts/Makefile.am
@@ -54,7 +54,7 @@ GENERAL_FLAGS = -I$(top_srcdir)/include 
 # Enable Intel Cilk Plus extension
 GENERAL_FLAGS += -fcilkplus
 
-# Always generate unwind tables
+#Always generate unwind tables
 GENERAL_FLAGS += -funwind-tables
 
 AM_CFLAGS = $(XCFLAGS) $(GENERAL_FLAGS) -std=c99
diff --git a/libcilkrts/README b/libcilkrts/README
--- a/libcilkrts/README
+++ b/libcilkrts/README
@@ -1,14 +1,16 @@
-Intel(R) Cilk(TM) Plus runtime library
+Intel(R) Cilk(TM) Plus Runtime Library
 
 Index:
-1. BUILDING
-2. USING
-3. DOXYGEN DOCUMENTATION
-4. QUESTIONS OR BUGS
-5. CONTRIBUTIONS
+1. BUILDING WITH AUTOMAKE
+2. BUILDING WITH CMAKE
+3. INSTALLING TO VXWORKS
+4. USING
+5. DOXYGEN DOCUMENTATION
+6. QUESTIONS OR BUGS
+7. CONTRIBUTIONS
 
 #
-#  1. BUILDING:
+#  1. BUILDING WITH AUTOMAKE:
 #
 
 To distribute applications that use the Intel Cilk Plus language
@@ -40,22 +42,87 @@ configure script:
 
 % ./configure --prefix=/your/path/to/lib
 
-It is also possible to use CMake if the above method does not apply
-well in your environment. Instruction is available in CMakeLists.txt.
+#
+#  2. BUILDING WITH CMAKE:
+#
+
+To distribute applications that use the Intel Cilk Plus language
+extensions to non-development systems, you need to build the Intel
+Cilk Plus runtime library and distribute it with your application.
+This instruction describes the build process using CMake*, which
+supports Linux*, Windows*, and OS X*.  It is fine to use this process
+to build a Linux library, but it is highly recommended to use the
+more mature build process described above when building on Linux.
+
+You need the CMake tool and a C/C++ compiler that supports the Intel
+Cilk Plus language extensions, and the requirements for each operating
+systems are:
+
+Common:
+CMake 3.0.0 or later
+Make tools such as make (Linux, OS X) or nmake (Windows)
+Linux:
+GCC* 4.9.2 or later, or Intel(R) C+

RE: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads

2016-11-17 Thread Claudiu Zissulescu
Hi Andrew,

Approved, please apply, but ...

> +(define_peephole2
> +  [(set (match_operand:SI 0 "register_operand" "")
> +(sign_extend:SI
> +  (match_operand:QI 1 "any_mem_operand" "")))
> +   (set (reg:CC_ZN CC_REG)
> + (compare:CC_ZN (match_dup 0)
> +   (const_int 0)))
> +   (set (pc)
> +(if_then_else (match_operator 2 "ge_lt_comparison_operator"
> +[(reg:CC_ZN CC_REG) (const_int 0)])
> +  (match_operand 3 "" "")
> +  (match_operand 4 "" "")))]
> +  "TARGET_NPS_CMEM
> +   && cmem_address (XEXP (operands[1], 0), SImode)
> +   && peep2_reg_dead_p (2, operands[0])
> +   && peep2_regno_dead_p (3, CC_REG)"
> +  [(set (match_dup 0)
> +(zero_extend:SI
> +  (match_dup 1)))
> +   (set (reg:CC_ZN CC_REG)
> + (compare:CC_ZN (zero_extract:SI
> + (match_dup 0)
> + (const_int 1)
> + (const_int 7))
> +   (const_int 0)))
> +   (set (pc)
> +(if_then_else (match_dup 2)
> +  (match_dup 3)
> +  (match_dup 4)))]
> +  "if (GET_CODE (operands[2]) == GE)
> + operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> const0_rtx);
> +   else
> + operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61),
> const0_rtx);")
> +

It seems to me you use spaces instead of tabs.

Note on tests: It will be nice to add a test where the added peephole kicks in. 
If you consider to add this test to the current patch, please resubmit it.

Best,
Claudiu



RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.

2016-11-17 Thread Toma Tabacu
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Andrew Bennett
> Sent: 03 November 2016 11:33
> To: Matthew Fortune; 'Moore, Catherine'; 'gcc-patches@gcc.gnu.org'
> Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> library support check the sysroot supports the required test options.
> 
> Ping.
> 
> 
> Regards,
> 
> 
> 
> Andrew
> 
> > -Original Message-
> > From: Andrew Bennett
> > Sent: 28 August 2015 16:50
> > To: Matthew Fortune; Moore, Catherine; gcc-patches@gcc.gnu.org
> > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard
> > library support check the sysroot supports the required test options.
> >
> > > I had some comments on this that I hadn't got round to posting. The fix in
> > > this patch is not general enough as the missing header problem comes in
> > > two (related) forms:
> > >
> > > 1) Using the new MTI and IMG sysroot layout we can end up with GCC
> looking
> > >for headers in a sysroot that simply does not exist. The current patch
> > >handles this.
> > > 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible
> > >for the stdlib.h header to be found but the ABI dependent gnu-stubs
> > >header may not be installed depending on soft/hard nan1985/nan2008.
> > >
> > > The test for stdlib.h needs to therefore verify that preprocessing 
> > > succeeds
> > > rather than just testing for an error relating to stdlib.h. This could be
> > > done by adding a further option to mips_preprocess to indicate the
> processor
> > > output should go to a file and that the caller wants the messages emitted
> > > by the compiler instead.
> > >
> > > A second issue is that you have added (REQUIRES_STDLIB) to too many
> tests.
> > > You only need to add it to tests that request a compiler option (via
> > > dg-options) that could potentially lead to forcing soft/hard 
> > > nan1985/nan2008
> > > directly or indirectly. So -mips32r6 implies nan2008 so you need it -
> > mips32r5
> > > implies nan1985 so you need it. There are at least two tests which don't
> > > need the option but you need to check them all so we don't run the check
> > > needlessly.
> >
> > The updated patch and ChangeLog that addresses Matthew's comments is
> below.
> >
> > Ok to commit?
> >
> > Regards,
> >
> >
> > Andrew
> >
> >
> > testsuite/
> >
> > * gcc.target/mips/loongson-simd.c (dg-options): Add
> > (REQUIRES_STDLIB).
> > * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto
> > * gcc.target/mips/mips-3d-1.c: Ditto
> > * gcc.target/mips/mips-3d-2.c: Ditto
> > * gcc.target/mips/mips-3d-3.c: Ditto
> > * gcc.target/mips/mips-3d-4.c: Ditto
> > * gcc.target/mips/mips-3d-5.c: Ditto
> > * gcc.target/mips/mips-3d-6.c: Ditto
> > * gcc.target/mips/mips-3d-7.c: Ditto
> > * gcc.target/mips/mips-3d-8.c: Ditto
> > * gcc.target/mips/mips-3d-9.c: Ditto
> > * gcc.target/mips/mips-ps-1.c: Ditto
> > * gcc.target/mips/mips-ps-2.c: Ditto
> > * gcc.target/mips/mips-ps-3.c: Ditto
> > * gcc.target/mips/mips-ps-4.c: Ditto
> > * gcc.target/mips/mips-ps-6.c: Ditto
> > * gcc.target/mips/mips16-attributes.c: Ditto
> > * gcc.target/mips/mips32-dsp-run.c: Ditto
> > * gcc.target/mips/mips32-dsp.c: Ditto
> > * gcc.target/mips/save-restore-1.c: Ditto
> > * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib.
> > (mips_preprocess): Add ignore_output argument that when set
> > will not return the pre-processed output.
> > (mips_arch_info): Update arguments for the call to
> > mips_preprocess.
> > (mips-dg-init): Ditto.
> > (mips-dg-options): Check if a test having test option
> > (REQUIRES_STDLIB) has the required sysroot support for
> > the current test options.
> >
> >
> >
> > diff --git 
> > a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > index f57a18c..baed48c 100644
> > --- a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > +++ b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c
> > @@ -4,7 +4,7 @@
> >  /* loongson.h does not handle or check for MIPS16ness.  There doesn't
> > seem any good reason for it to, given that the Loongson processors
> > do not support MIPS16.  */
> > -/* { dg-options "isa=loongson -mhard-float -mno-mips16" } */
> > +/* { dg-options "isa=loongson -mhard-float -mno-mips16
> (REQUIRES_STDLIB)" }
> > */
> >  /* See PR 52155.  */
> >  /* { dg-options "isa=loongson -mhard-float -mno-mips16 -mlong64" { mips*-
> *-
> > elf* && ilp32 } } */
> >
> > diff --git a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > index 6d2ceb6..f263b43 100644
> > --- a/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > +++ b/gcc/testsuite/gcc.target/mips/loongson-simd.c
> > @@ -26,7 +26,7 @@ along with GCC; see the

RE: [PATCH 1/4] [ARC] Various fixes.

2016-11-17 Thread Claudiu Zissulescu
Hi Andrew,

Please ignore this patch for the time being. I would like to double check the 
add_n/sub_n patterns.

Sorry for this detour,
Claudiu

> -Original Message-
> From: Claudiu Zissulescu
> Sent: Wednesday, November 16, 2016 11:18 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Claudiu Zissulescu ;
> francois.bed...@synopsys.com; andrew.burg...@embecosm.com
> Subject: [PATCH 1/4] [ARC] Various fixes.
> 
> The ifconversion was failing because a move involving the lp_count was
> not match by movsi_ne.  This patch updates the constraints such that
> movsi_ne will match.  The failing test is dg-torture.exp=pr68955.c for
> archs and without small data.
> 
> gcc/
> 2016-07-11  Claudiu Zissulescu  
> 
>   * config/arc/arc.h (REG_CLASS_NAMES): Reorder.
>   * config/arc/arc.md (*add_n): Change.
>   (*sub_n): Likewise.
>   (movsi_ne): Update constraint to Rac.
> ---
>  gcc/config/arc/arc.h  |  2 +-
>  gcc/config/arc/arc.md | 18 +-
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
> index cd8b9f1..642bf73 100644
> --- a/gcc/config/arc/arc.h
> +++ b/gcc/config/arc/arc.h
> @@ -617,10 +617,10 @@ enum reg_class
>"MPY_WRITABLE_CORE_REGS",   \
>"WRITABLE_CORE_REGS",   \
>"CHEAP_CORE_REGS",   \
> +  "ALL_CORE_REGS", \
>"R0R3_CD_REGS", \
>"R0R1_CD_REGS", \
>"AC16_H_REGS", \
> -  "ALL_CORE_REGS", \
>"ALL_REGS"   \
>  }
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 5a7699f..217935c 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -2897,10 +2897,10 @@
> (set (match_dup 3) (match_dup 4))])
> 
>  (define_insn "*add_n"
> -  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w")
> - (plus:SI (mult:SI (match_operand:SI 1 "register_operand"
> "Rcqq,c,c,c,c,c")
> -   (match_operand:SI 2 "_2_4_8_operand" ""))
> -  (match_operand:SI 3 "nonmemory_operand"
> "0,0,c,?Cal,?c,??Cal")))]
> +  [(set (match_operand:SI 0 "dest_reg_operand"  "=Rcqq,Rcw,W,
> W,w,w")
> + (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "Rcqq,  c,c,
> c,c,c")
> +   (match_operand:SI 2 "_2_4_8_operand"   ""))
> +  (match_operand:SI 3 "nonmemory_operand""0,
> 0,c,Cal,c,Cal")))]
>""
>"add%z2%? %0,%3,%1%&"
>[(set_attr "type" "shift")
> @@ -2912,9 +2912,9 @@
>  ;; N.B. sub[123] has the operands of the MINUS in the opposite order from
>  ;; what synth_mult likes.
>  (define_insn "*sub_n"
> -  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w")
> - (minus:SI (match_operand:SI 1 "nonmemory_operand" "0,c,?Cal")
> -   (mult:SI (match_operand:SI 2 "register_operand" "c,c,c")
> +  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w")
> + (minus:SI (match_operand:SI 1 "nonmemory_operand"
> "0,c,?Cal")
> +   (mult:SI (match_operand:SI 2 "register_operand" "w,w,w")
>  (match_operand:SI 3 "_2_4_8_operand" ""]
>""
>"sub%z3%? %0,%1,%2"
> @@ -3570,8 +3570,8 @@
>  (define_insn "*movsi_ne"
>[(cond_exec
>   (ne (match_operand:CC_Z 2 "cc_use_register""Rcc,  Rcc,  
> Rcc,Rcc,Rcc")
> (const_int 0))
> - (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q,
> w,w")
> -   (match_operand:SI 1 "nonmemory_operand"   "C_0,h, ?Cal,
> Lc,?Cal")))]
> + (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q,
> w,w")
> +   (match_operand:SI 1 "nonmemory_operand"   "C_0,h,
> ?Cal,LRac,?Cal")))]
>""
>"@
>   * current_insn_predicate = 0; return \"sub%?.ne %0,%0,%0%&\";
> --
> 1.9.1



[PATCH] Remove -ftree-loop-if-convert-stores

2016-11-17 Thread Richard Biener

This removes a no longer used option, whether we if-convert stores
now depends on --param allow-store-data-races and/or the availability
of masked stores.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-11-17  Richard Biener  

* common.opt (ftree-loop-if-convert-stores): Mark as preserved for
backward compatibility.
* doc/invoke.texi (ftree-loop-if-convert-stores): Remove.
* tree-if-conv.c (pass_if_conversion::gate): Do not test
flag_tree_loop_if_convert_stores.
(pass_if_conversion::execute): Likewise.

diff --git a/gcc/common.opt b/gcc/common.opt
index 5e8d72d..1fa3629 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1526,8 +1526,8 @@ Common Report Var(flag_tree_loop_if_convert) Init(-1) 
Optimization
 Convert conditional jumps in innermost loops to branchless equivalents.
 
 ftree-loop-if-convert-stores
-Common Report Var(flag_tree_loop_if_convert_stores) Optimization
-Also if-convert conditional jumps containing memory writes.
+Common Ignore
+Does nothing. Preserved for backward compatibility.
 
 ; -finhibit-size-directive inhibits output of .size for ELF.
 ; This is used only for compiling crtstuff.c,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 620225c..e6c3dc2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
 -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting -ftree-loop-if-convert 
@gol
--ftree-loop-if-convert-stores -ftree-loop-im @gol
+-ftree-loop-im @gol
 -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
 -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
 -ftree-loop-vectorize @gol
@@ -8061,24 +8061,6 @@ the innermost loops in order to improve the ability of 
the
 vectorization pass to handle these loops.  This is enabled by default
 if vectorization is enabled.
 
-@item -ftree-loop-if-convert-stores
-@opindex ftree-loop-if-convert-stores
-Attempt to also if-convert conditional jumps containing memory writes.
-This transformation can be unsafe for multi-threaded programs as it
-transforms conditional memory writes into unconditional memory writes.
-For example,
-@smallexample
-for (i = 0; i < N; i++)
-  if (cond)
-A[i] = expr;
-@end smallexample
-is transformed to
-@smallexample
-for (i = 0; i < N; i++)
-  A[i] = cond ? expr : A[i];
-@end smallexample
-potentially producing data races.
-
 @item -ftree-loop-distribution
 @opindex ftree-loop-distribution
 Perform loop distribution.  This flag can improve cache performance on
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 46d6b34..dc97fc4 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -2872,8 +2872,7 @@ pass_if_conversion::gate (function *fun)
 {
   return (((flag_tree_loop_vectorize || fun->has_force_vectorize_loops)
   && flag_tree_loop_if_convert != 0)
- || flag_tree_loop_if_convert == 1
- || flag_tree_loop_if_convert_stores == 1);
+ || flag_tree_loop_if_convert == 1);
 }
 
 unsigned int
@@ -2887,7 +2886,6 @@ pass_if_conversion::execute (function *fun)
 
   FOR_EACH_LOOP (loop, 0)
 if (flag_tree_loop_if_convert == 1
-   || flag_tree_loop_if_convert_stores == 1
|| ((flag_tree_loop_vectorize || loop->force_vectorize)
&& !loop->dont_vectorize))
   todo |= tree_if_conversion (loop);


Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

2016-11-17 Thread James Greenhalgh
On Tue, Nov 01, 2016 at 03:21:29PM +, Kyrill Tkachov wrote:
> Here it is.
> I've confirmed that it emits to STRs for 4 byte aligned stores when 
> -mtune=thunderx
> and still generates STP for the other tunings, though now sched-fusion is 
> responsible for
> merging them, which is ok by me.
> 
> Bootstrapped and tested on aarch64.
> Ok for trunk?

OK.

Thanks,
James

> 2016-11-01  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.md (mov): Call
> aarch64_split_dimode_const_store on DImode constant stores.
> * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
> New prototype.
> * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
> function.
> 
> 2016-11-01  Kyrylo Tkachov  
> 
> * gcc.target/aarch64/store_repeating_constant_1.c: New test.
> * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.



[PATCH] Fix PR78383 middle-end ICE

2016-11-17 Thread Richard Biener

The following fixes the ICE part of PR78383.  We ICE when trying to
turn a computed goto to a CFG edge when we know the label but that
label is in another function (IPA propagation exposed this knowledge).

The transform shows that we miss a DECL_NONLOCAL on the label because
we happily miscompile the testcase by moving the label freely before
the call to the lambda.  This results in

_ZZ4mainENKUlPvE_clES_.isra.0.constprop.1:
main:
call_ZZ4mainENKUlPvE_clES_.isra.0.constprop.1

where you can see the goto has been optimized somehow on RTL as well.
Final GIMPLE is

main():: ()
{
  :
  goto &label;

}

int main() ()
{
label:
  main()_ZZ4mainENKUlPvE_clES_.isra.0.constprop ();

}

and initial RTL is already broken:

;; goto &label;

(insn 5 4 6 (set (reg:DI 87)
(label_ref/v:DI 0)) "t.ii":3 -1
 (nil))

(jump_insn 6 5 7 (set (pc)
(reg:DI 87)) "t.ii":3 -1
 (nil))

looks like it might be a bad idea to freely allow LABEL_DECL address
propagation but in the end it works fine for DECL_NONLOCAL labels
so I think this is what we miss (the label is not marked DECL_NONLOCAL).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Will commit w/o the testcase (because that is still miscompiled).

Richard.

2016-11-17  Richard Biener  

PR middle-end/78383
* tree-cfgcleanup.c (cleanup_control_flow_bb): Do not turn
non-local goto into CFG.

diff --git a/gcc/testsuite/g++.dg/torture/pr78383.C 
b/gcc/testsuite/g++.dg/torture/pr78383.C
new file mode 100644
index 000..2b35bc7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr78383.C
@@ -0,0 +1,11 @@
+// { dg-do run }
+
+int main()
+{
+  auto f = [](void* ptr) { goto *ptr; };
+
+  f(&&label);
+
+label:
+  return 0;
+}
diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c
index 21873f8..fe22ed3 100644
--- a/gcc/tree-cfgcleanup.c
+++ b/gcc/tree-cfgcleanup.c
@@ -230,6 +230,8 @@ cleanup_control_flow_bb (basic_block bb, bool first_p)
 edges which do not go to the right block.  For the one
 edge which goes to the right block, fix up its flags.  */
   label = TREE_OPERAND (gimple_goto_dest (stmt), 0);
+  if (DECL_CONTEXT (label) != cfun->decl)
+   return retval;
   target_block = label_to_block (label);
   for (ei = ei_start (bb->succs); (e = ei_safe_edge (ei)); )
{


[C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")

2016-11-17 Thread Paolo Carlini

Hi,

this rather old issue points out that, for the snippet below in the 
legacy -std=c++98 and -pedantic, we issue an hard error instead of a 
pedwarn - what we normally do, as clarified by Manuel. I guess we could 
imagine further clean-ups in this area but something as simple as the 
below seems correct to me and pretty conservative.


Tested x86_64-linux.

Thanks, Paolo.

///

/cp
2016-11-17  Paolo Carlini  

PR c++/55080
* parser.c (cp_parser_primary_expression): Issue a pedwarn instead
of an error floating-point literals in a constant-expression.

/testsuite
2016-11-17  Paolo Carlini  

PR c++/55080
* g++.dg/parse/pr55080.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 242540)
+++ cp/parser.c (working copy)
@@ -4840,8 +4840,10 @@ cp_parser_primary_expression (cp_parser *parser,
 cast is to an integral or enumeration type will be
 checked at that point.  If we are not within a cast, then
 this code is invalid.  */
- if (!cast_p)
-   cp_parser_non_integral_constant_expression (parser, NIC_FLOAT);
+ if (!cast_p && !parser->allow_non_integral_constant_expression_p)
+   pedwarn (input_location, OPT_Wpedantic, 
+"ISO C++ forbids using a floating-point literal in a "
+"constant-expression");
}
   return cp_expr (token->u.value, token->location);
 
Index: testsuite/g++.dg/parse/pr55080.C
===
--- testsuite/g++.dg/parse/pr55080.C(revision 0)
+++ testsuite/g++.dg/parse/pr55080.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/55080
+// { dg-options "-std=c++98 -pedantic" }
+
+class B {
+ static const int c = 3.1415926; // { dg-warning "constant-expression" }
+};


  1   2   >