Re: [PATCH] avoid calling alloca(0)
On Thu, Nov 17, 2016 at 05:32:31PM -0700, Martin Sebor wrote: > >The point is warning on an alloca (0) may not be as clear cut as it > >might seem. It's probably a reasonable thing to do on the host, but on > >a target, which might be embedded and explicitly overriding the builtin > >alloca, warning on alloca (0) is less of a slam dunk. > > I confess I haven't heard of such an implementation before but after > some searching I have managed to find a few examples of it online, > such as in QNX or in the BlackBerry SDK, or in the GCC shipped by > Texas Instruments. For instance: In the libiberty/alloca.c, I don't see anything special about alloca (0). Yes, alloca (0) reclaims garbage, but so does alloca (1) and alloca (4035). alloca (0) just returns NULL after it. The diffutils link is the same alloca.c as in libiberty. Returning NULL or returning a non-unique pointer for alloca (0) is just fine, it is the same thing as returning NULL or returning a non-unique pointer from malloc (0). We definitely don't want to warn for malloc (0), and IMNSHO don't want to warn for alloca (0), because it behaves the same. Jakub
Re: [PATCH][2/2] GIMPLE Frontend, middle-end changes
On Thu, 17 Nov 2016, Jeff Law wrote: > On 11/17/2016 01:20 AM, Richard Biener wrote: > > On Wed, 16 Nov 2016, Jeff Law wrote: > > > > > On 10/28/2016 05:51 AM, Richard Biener wrote: > > > > > > > > These are the middle-end changes and additions to the testsuite. > > > > > > > > They are pretty self-contained, I've organized the changelog > > > > entries below in areas of changes: > > > > > > > > 1) dump changes - we add a -gimple dump modifier that allows most > > > > function dumps to be directy fed back into the GIMPLE FE > > > > > > > > 2) pass manager changes to implement the startwith("pass-name") > > > > feature which implements unit-testing for GIMPLE passes > > > > > > > > 3) support for "SSA" input, a __PHI stmt that is lowered once the > > > > CFG is built, a facility to allow a specific SSA name to be allocated > > > > plus a small required change in the SSA rewriter to allow for > > > > pre-existing PHI arguments > > > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu (together with > > > > [1/2]). > > > > > > > > I can approve all these changes myself but any comments are welcome. > > > My only worry would be ensuring that in the case where we're asking for a > > > particular SSA_NAME in make_ssa_name_fn that we assert the requested name > > > is > > > available. > > > > > > ISTM that if it's > the highest current version or in the freelist, then > > > we > > > ought to be safe. If it isn't safe then we should either issue an error, > > > or > > > renumber the preexisting SSA_NAME (and determining if it's safe to > > > renumber > > > the preexisting SSA_NAME may require more context than we have). > > > > An alternative interface would be to return NULL rather than asserting. > > OTOH the single caller requesting a specific version first does a lookup > > if the SSA name already exists, so it is safe. > I have a slight preference for enforcing safety within the name manager, but I > can live with doing the checking in the caller. Actually safety is enforced by means of an assert ... the caller would have to check anyway (for a NULL result). IMHO doing tricks like re-numbering on already taken versions doesn't make sense as that doesn't guarantee earlier assigned versions stay the same. It's really a special thing for the parser which wants to preserve SSA name versions as in the source. Richard.
RE: [Aarch64][PATCH] Improve Logical And Immediate Expressions
James, I incorporated all your suggestions, and successfully bootstrapped and re-ran the testsuite. Okay for trunk? 2016-11-18 Michael Collison * config/aarch64/aarch64-protos.h (aarch64_and_split_imm1, aarch64_and_split_imm2) (aarch64_and_bitmask_imm): New prototypes * config/aarch64/aarch64.c (aarch64_and_split_imm1): New overloaded function to create bit mask covering the lowest to highest bits set. (aarch64_and_split_imm2): New overloaded functions to create bit mask of zeros between first and last bit set. (aarch64_and_bitmask_imm): New function to determine if a integer is a valid two instruction "and" operation. * config/aarch64/aarch64.md:(and3): New define_insn and _split allowing wider range of constants with "and" operations. * (ior3, xor3): Use new LOGICAL2 iterator to prevent "and" operator from matching restricted constant range used for ior and xor operators. * config/aarch64/constraints.md (UsO constraint): New SImode constraint for constants in "and" operantions. (UsP constraint): New DImode constraint for constants in "and" operations. * config/aarch64/iterators.md (lconst2): New mode iterator. (LOGICAL2): New code iterator. * config/aarch64/predicates.md (aarch64_logical_and_immediate): New predicate (aarch64_logical_and_operand): New predicate allowing extended constants for "and" operations. * testsuite/gcc.target/aarch64/and_const.c: New test to verify additional constants are recognized and fewer instructions generated. * testsuite/gcc.target/aarch64/and_const2.c: New test to verify additional constants are recognized and fewer instructions generated. -Original Message- From: James Greenhalgh [mailto:james.greenha...@arm.com] Sent: Thursday, November 17, 2016 9:26 AM To: Michael Collison Cc: gcc-patches@gcc.gnu.org; nd Subject: Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote: > This patch is designed to improve code generation for "and" > instructions with certain immediate operands. > > For the following test case: > > int f2(int x) > { >x &= 0x0ff8; > >x &= 0xff001fff; > >return x; > } > > the trunk aarch64 compiler generates: > > mov w1, 8184 > movk w1, 0xf00, lsl 16 > and w0, w0, w1 > > We can generate one fewer instruction if we recognize certain > constants. With the attached patch the current trunk compiler generates: > > and w0, w0, 268435448 > and w0, w0, -16769025 > > Bootstrapped and make check successfully completed with no regressions > on aarch64-linux-gnu. > > Okay for trunk? Hi Michael, I have some minor comments in line. > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 3cdd69b..7ef8cdf 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params; > HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); > int aarch64_get_condition_code (rtx); bool aarch64_bitmask_imm > (HOST_WIDE_INT val, machine_mode); > +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); > +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); > +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, > +machine_mode mode); > int aarch64_branch_cost (bool, bool); enum aarch64_symbol_type > aarch64_classify_symbolic_expression (rtx); bool > aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); diff --git > a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index > 3e663eb..db82c5c 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, > machine_mode mode) >return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26]; } > > +/* Create mask of ones, covering the lowest to highest bits set in > +VAL_IN. */ > + > +unsigned HOST_WIDE_INT > +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) { > + int first_bit_set = ctz_hwi (val_in); > + int last_bit_set = floor_log2 (val_in); > + gcc_assert (first_bit_set < last_bit_set); This assert can never trigger (by definition) unless VAL_IN == 0 (in which case floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case you're testing for, then this assert would be more explicit as gcc_assert (val_in != 0) I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first and last are ambiguous (you have them the opposite way round from what I'd consider first [highest, thus leading bits] and last [lowest/trailing bits]). > + > + return ((HOST_WIDE_INT_UC (2) << last_bit_set) - > + (HOST_WIDE_INT_1U << first_bit_set)); } > + > +/* Creat
Re: [PATCH] fix PR68468
Hi Jakub, Jakub Jelinek wrote, > On Wed, Nov 16, 2016 at 07:31:59AM +0100, Waldemar Brodkorb wrote: > > > On Wed, Nov 09, 2016 at 04:08:39PM +0100, Bernd Schmidt wrote: > > > > On 11/05/2016 06:14 PM, Waldemar Brodkorb wrote: > > > > >Hi, > > > > > > > > > >the following patch fixes PR68468. > > > > >Patch is used for a while in Buildroot without issues. > > > > > > > > > >2016-11-05 Waldemar Brodkorb > > > > > > Two spaces before < instead of just one. > > > > > > > > > > PR gcc/68468 > > > > > > PR libgcc/68468 > > > instead. > > > > > > > > * libgcc/unwind-dw2-fde-dip.c: fix build on FDPIC targets. > > > > > > Capital F in Fix. > > > No libgcc/ prefix for files in libgcc/ChangeLog. > > > > > > > This is ok. > > > > > > I think Waldemar does not have SVN write access, are you going to check it > > > in or who will do that? > > > > Should I resend the patch with the suggested fixes or will someone > > with write access fix it up for me? > > As nobody committed it yet, I've made the changes and committed it for you. Thanks! Waldemar
Re: PING [PATCH] enable -fprintf-return-value by default
On 11/16/2016 09:49 AM, Martin Sebor wrote: I'm looking for an approval of the attached patch. I've adjusted the documentation based on Sandra's input (i.e., documented the negative of the option rather than the positive; thank you for the review, btw.) On 11/08/2016 08:13 PM, Martin Sebor wrote: The -fprintf-return-value optimization has been disabled since the last time it caused a bootstrap failure on powerpc64le. With the underlying problems fixed GCC has bootstrapped fine on all of powerpc64, powerpc64le and x86_64 and tested with no regressions. I'd like to re-enable the option. The attached patch does that. [snip] Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 242500) +++ gcc/doc/invoke.texi (working copy) @@ -384,7 +384,7 @@ Objective-C and Objective-C++ Dialects}. -fno-toplevel-reorder -fno-trapping-math -fno-zero-initialized-in-bss @gol -fomit-frame-pointer -foptimize-sibling-calls @gol -fpartial-inlining -fpeel-loops -fpredictive-commoning @gol --fprefetch-loop-arrays -fprintf-return-value @gol +-fprefetch-loop-arrays -fno-printf-return-value @gol -fprofile-correction @gol -fprofile-use -fprofile-use=@var{path} -fprofile-values @gol -fprofile-reorder-functions @gol Please keep this list alphabetized -- the other "-fno-*" options are sorted as such. The documentation parts of the patch are OK with that fixed, but I can't approve changing the default for the option. -Sandra
[PATCH] [AArch64] Fix PR78382
Hi, Please find attached the patch that fixes PR78382. The "SYMBOL_SMALL_TLSGD" was not handled for ILP32. Hence it generates error when compiled for ILP32. The attached patch adds the support and handles it properly as expected for ILP32. Please review the patch and let me know if its okay? Regression tested on AArch64 with no regressions. Thanks, Naveen 2016-11-18 Naveen H.S * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Handle SYMBOL_SMALL_TLSGD for ILP32. * config/aarch64/aarch64.md : tlsgd_small modified into tlsgd_small_ to support SImode and DImode. *tlsgd_small modified into *tlsgd_small_ to support SImode and DImode. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 11d41cf..1688f0d 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1374,10 +1374,17 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, case SYMBOL_SMALL_TLSGD: { rtx_insn *insns; - rtx result = gen_rtx_REG (Pmode, R0_REGNUM); + rtx result; + if (TARGET_ILP32) + result = gen_rtx_REG (SImode, R0_REGNUM); + else + result = gen_rtx_REG (DImode, R0_REGNUM); start_sequence (); - aarch64_emit_call_insn (gen_tlsgd_small (result, imm)); + if (TARGET_ILP32) + aarch64_emit_call_insn (gen_tlsgd_small_si (result, imm)); + else + aarch64_emit_call_insn (gen_tlsgd_small_di (result, imm)); insns = get_insns (); end_sequence (); diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a652a7c..4833c7f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -5089,20 +5089,20 @@ ;; The TLS ABI specifically requires that the compiler does not schedule ;; instructions in the TLS stubs, in order to enable linker relaxation. ;; Therefore we treat the stubs as an atomic sequence. -(define_expand "tlsgd_small" +(define_expand "tlsgd_small_" [(parallel [(set (match_operand 0 "register_operand" "") (call (mem:DI (match_dup 2)) (const_int 1))) - (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS) + (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "")] UNSPEC_GOTSMALLTLS) (clobber (reg:DI LR_REGNUM))])] "" { operands[2] = aarch64_tls_get_addr (); }) -(define_insn "*tlsgd_small" +(define_insn "*tlsgd_small_" [(set (match_operand 0 "register_operand" "") (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1))) - (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS) + (unspec:DI [(match_operand:PTR 1 "aarch64_valid_symref" "S")] UNSPEC_GOTSMALLTLS) (clobber (reg:DI LR_REGNUM)) ] "" diff --git a/gcc/testsuite/gcc.target/aarch64/pr78382.c b/gcc/testsuite/gcc.target/aarch64/pr78382.c new file mode 100644 index 000..6c98e5e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr78382.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -fpic -mabi=ilp32 -mtls-dialect=trad" } */ + +__thread int abc; +void +foo () +{ + int *p; + p = &abc; +}
Go patch committed: Remove more diffs from gofrontend and GCC repos
This patch to gcc/go/gofrontend and libgo removes the remaining extraneous differences between the GCC repo and gofrontend repo. A few files were somehow not deleted from the GCC repo. The aclocal.m4 file was changed recently, and gcc/go/gofrontend/lex.cc was changed a while back. This patch simply restores them to the versions in the gofrontend repo. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/lex.cc === --- gcc/go/gofrontend/lex.cc(revision 242581) +++ gcc/go/gofrontend/lex.cc(working copy) @@ -882,7 +882,7 @@ Lex::gather_identifier() && (cc < '0' || cc > '9')) { // Check for an invalid character here, as we get better - // error behavior if we swallow them as part of the + // error behaviour if we swallow them as part of the // identifier we are building. if ((cc >= ' ' && cc < 0x7f) || cc == '\t' @@ -923,7 +923,7 @@ Lex::gather_identifier() { // There is no valid place for a non-ASCII character // other than an identifier, so we get better error - // handling behavior if we swallow this character after + // handling behaviour if we swallow this character after // giving an error. if (!issued_error) go_error_at(this->location(), Index: libgo/aclocal.m4 === --- libgo/aclocal.m4(revision 242581) +++ libgo/aclocal.m4(working copy) @@ -579,27 +579,6 @@ if test x"${install_sh}" != xset; then fi AC_SUBST(install_sh)]) -# Copyright (C) 2003, 2005 Free Software Foundation, Inc. -# -# This file is free software; the Free Software Foundation -# gives unlimited permission to copy and/or distribute it, -# with or without modifications, as long as this notice is preserved. - -# serial 2 - -# Check whether the underlying file-system supports filenames -# with a leading dot. For instance MS-DOS doesn't. -AC_DEFUN([AM_SET_LEADING_DOT], -[rm -rf .tst 2>/dev/null -mkdir .tst 2>/dev/null -if test -d .tst; then - am__leading_dot=. -else - am__leading_dot=_ -fi -rmdir .tst 2>/dev/null -AC_SUBST([am__leading_dot])]) - # Add --enable-maintainer-mode option to configure. -*- Autoconf -*- # From Jim Meyering @@ -764,74 +743,6 @@ case $mkdir_p in esac ]) -# Copyright (C) 1998, 1999, 2000, 2001, 2003, 2004, 2005, 2006, 2012 -# Free Software Foundation, Inc. -# -# This file is free software; the Free Software Foundation -# gives unlimited permission to copy and/or distribute it, -# with or without modifications, as long as this notice is preserved. - -# serial 6 - -# AM_ENABLE_MULTILIB([MAKEFILE], [REL-TO-TOP-SRCDIR]) -# --- -# Add --enable-multilib to configure. -AC_DEFUN([AM_ENABLE_MULTILIB], -[m4_warn([obsolete], [$0 will be removed from Automake core soon. -Files implementing the "multilib" feature are (and will remain) available -to the 'contrib/' directory in the Automake distribution.])]dnl -[# Default to --enable-multilib -AC_ARG_ENABLE(multilib, -[ --enable-multilib build many library versions (default)], -[case "$enableval" in - yes) multilib=yes ;; - no) multilib=no ;; - *) AC_MSG_ERROR([bad value $enableval for multilib option]) ;; - esac], - [multilib=yes]) - -# We may get other options which we leave undocumented: -# --with-target-subdir, --with-multisrctop, --with-multisubdir -# See config-ml.in if you want the gory details. - -if test "$srcdir" = "."; then - if test "$with_target_subdir" != "."; then -multi_basedir="$srcdir/$with_multisrctop../$2" - else -multi_basedir="$srcdir/$with_multisrctop$2" - fi -else - multi_basedir="$srcdir/$2" -fi -AC_SUBST(multi_basedir) - -# Even if the default multilib is not a cross compilation, -# it may be that some of the other multilibs are. -if test $cross_compiling = no && test $multilib = yes \ - && test "x${with_multisubdir}" != x ; then - cross_compiling=maybe -fi - -AC_OUTPUT_COMMANDS([ -# Only add multilib support code if we just rebuilt the top-level -# Makefile. -case " $CONFIG_FILES " in - *" ]m4_default([$1],Makefile)[ "*) - ac_file=]m4_default([$1],Makefile)[ . ${multi_basedir}/config-ml.in - ;; -esac], - [ -srcdir="$srcdir" -host="$host" -target="$target" -with_multisubdir="$with_multisubdir" -with_multisrctop="$with_multisrctop" -with_target_subdir="$with_target_subdir" -ac_configure_args="${multilib_arg} ${ac_configure_args}" -multi_basedir="$multi_basedir" -CONFIG_SHELL=${CONFIG_SHELL-/bin/sh} -CC="$CC"])])dnl - # Helper functions for option handling. -*- Autoconf -*- # Copyright (C) 2001, 2002, 2003, 2005, 2008, 2010 Free Software @@ -1077,6 +988,11 @@ AC_SUBST([am__tar]) AC_SUBST(
libgo patch committed: Update configure, a few binary files
This patch to libgo updates the configure script and a few binary files. The configure script was changed without going through the external repo in a way that somehow dropped the --with-system-libunwind argument and made a few other changes. This patch restores these changes. A few binary files were changed in the external repo in merges from the master repo, but the changes did not get reflected in the GCC repo. These changes are all irrelevant--a couple of "http" became "https" in files that appeared compressed but were not actually compressed, and a couple of timestamps were zeroed out in files that actually were compressed. Patch bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: libgo/configure === --- libgo/configure (revision 242581) +++ libgo/configure (working copy) @@ -810,6 +810,7 @@ enable_werror enable_version_specific_runtime_libs with_libffi with_libatomic +with_system_libunwind ' ac_precious_vars='build_alias host_alias @@ -1459,6 +1460,7 @@ Optional Packages: both] --without-libffidon't use libffi --without-libatomic don't use libatomic + --with-system-libunwind use installed libunwind Some influential environment variables: CC C compiler command @@ -2483,6 +2485,9 @@ ac_compiler_gnu=$ac_cv_c_compiler_gnu + + + ac_config_headers="$ac_config_headers config.h" @@ -3467,12 +3472,10 @@ done cat confdefs.h - <<_ACEOF >conftest.$ac_ext /* end confdefs.h. */ -#include + int main () { -FILE *f = fopen ("conftest.out", "w"); - return ferror (f) || fclose (f) != 0; ; return 0; @@ -11206,7 +11209,7 @@ else lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2 lt_status=$lt_dlunknown cat > conftest.$ac_ext <<_LT_EOF -#line 11207 "configure" +#line 11212 "configure" #include "confdefs.h" #if HAVE_DLFCN_H @@ -14679,7 +14682,42 @@ $as_echo "#define AC_APPLE_UNIVERSAL_BUI esac -GCC_CHECK_UNWIND_GETIPINFO + + +# Check whether --with-system-libunwind was given. +if test "${with_system_libunwind+set}" = set; then : + withval=$with_system_libunwind; +fi + + # If system-libunwind was not specifically set, pick a default setting. + if test x$with_system_libunwind = x; then +case ${target} in + ia64-*-hpux*) with_system_libunwind=yes ;; + *) with_system_libunwind=no ;; +esac + fi + # Based on system-libunwind and target, do we have ipinfo? + if test x$with_system_libunwind = xyes; then +case ${target} in + ia64-*-*) have_unwind_getipinfo=no ;; + *) have_unwind_getipinfo=yes ;; +esac + else +# Darwin before version 9 does not have _Unwind_GetIPInfo. + +case ${target} in + *-*-darwin[3-8]|*-*-darwin[3-8].*) have_unwind_getipinfo=no ;; + *) have_unwind_getipinfo=yes ;; +esac + + fi + + if test x$have_unwind_getipinfo = xyes; then + +$as_echo "#define HAVE_GETIPINFO 1" >>confdefs.h + + fi + for ac_header in port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h do : @@ -16269,6 +16307,8 @@ ac_configure_args="${multilib_arg} ${ac_ multi_basedir="$multi_basedir" CONFIG_SHELL=${CONFIG_SHELL-/bin/sh} CC="$CC" +CXX="$CXX" +GFORTRAN="$GFORTRAN" AMDEP_TRUE="$AMDEP_TRUE" ac_aux_dir="$ac_aux_dir" Index: libgo/go/archive/zip/testdata/readme.notzip === Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: libgo/go/archive/zip/testdata/readme.zip === Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: libgo/go/compress/gzip/testdata/issue6550.gz === Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: libgo/go/encoding/json/testdata/code.json.gz === Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 05:32 PM, Martin Sebor wrote: On 11/17/2016 03:52 PM, Jeff Law wrote: On 11/17/2016 03:03 PM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote: On 11/17/2016 11:24 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, for (f = formal; f; f = f->next) n++; - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); + /* Take care not to call alloca with a zero argument. */ + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); for (i = 0; i < n; i++) new_arg[i] = NULL; Ugh, that is just too ugly. I don't see anything wrong on alloca (0), and we don't rely on those pointers being distinct from other pointers. On systems where alloca was implemented on top of malloc, alloca (0) would cause collection of alloca'd objects that had gone out of scope. Ouch. Do we support any such systems as hosts? If yes, can't we just define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1) on those systems and leave the sane hosts as is? I would guess they're all dead as hosts for building GCC. I was most familiar with hpux, but I'm pretty sure there were others as emacs (IIRC) had a replacement alloca for systems without it as a builtin. They probably all fall into the "retro-computing" bucket these days. Essentially those systems worked by recording all the allocations as well as the frame depth at which they occurred. The next time alloca was called, anything at a deeper depth than the current frame was released. So even if we called alloca (0) unexpectedly, it's not going to cause anything to break. Failing to call alloca (0) could run the system out of heap memory. It's left as an exercise to the reader to ponder how that might happen -- it can and did happen building GCC "in the old days". The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html Their GCC-derived compilers apparently enable it with -fno-builtins. Other references include source code derived from an implementation with Doug Gwyn's name on it, such as this one: https://opensource.apple.com/source/gnudiff/gnudiff-10/diffutils/alloca.c There's also a copy in libiberty: https://gcc.gnu.org/onlinedocs/libiberty/Functions.html#index-alloca-58 https://gcc.gnu.org/viewcvs/gcc/trunk/libiberty/alloca.c?view=markup A comment at the top the file mentions the alloca(0) use case: As a special case, alloca(0) reclaims storage without allocating any. It is a good idea to use alloca(0) in your main control loop, etc. to force garbage collection. That said, I don't view this as reason to exclude the warning from -Wextra. The vendor-provided compilers are obviously customized versions of GCC with their own features, including their own set of options and warnings. It shouldn't stop us from enabling those we expect to be useful to the typical GCC user on typical targets, and especially those that can help expose sources of common bugs. Recognizing I'm preaching to choir here: Calling alloca with any argument is considered dangerous enough to be discouraged in most man pages. Alloca(0) is a potentially dangerous corner case of an already dangerous API. It seems at least as problematic as any of the warnings already in -Wextra, and I'd say as many in -Wall. Even on systems with this unusual alloca implementation, since alloca(0) is special (and expensive), warning on such calls would likely be useful. In fact, since calling alloca(0) in these environments is actually important, warning on them coukd help detect their unintended absence. The few such calls that are intended can be easily tweaked to suppress the warning. Martin
Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)
On Thu, Nov 17, 2016 at 09:10:51AM +0100, Jakub Jelinek wrote: > On Wed, Nov 16, 2016 at 04:29:05PM -0800, Marek Polacek wrote: > > Sorry. So consider the following: > > > > class S > > { > > virtual void foo () = 0; > > }; > > > > struct T { > > T &operator << (const char *s); > > }; > > > > T t; > > > > void > > S::foo () > > { > > t << "a" << "b" << "c"; > > } > > > > Before > > 1498 if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT)) > > 1499 ubsan_maybe_instrument_member_call (stmt, is_ctor); > > > > the stmt will be > > > > T::operator<< (T::operator<< (T::operator<< (&t, "a"), "b"), "c") > > > > after ubsan_maybe_instrument_member_call it will be > > > > T::operator<< (UBSAN_NULL (SAVE_EXPR > "a"), "b")>, 4B, 0);, SAVE_EXPR > "b")>;, "c") > > > > and that's what is saved into the hash table. Then another stmt will be the > > inner > > > > T::operator<< (T::operator<< (&t, "a"), "b") > > > > which we instrument and put into the hash table, and so on. But those > > SAVE_EXPRs aren't the same. So we have a T::operator<< call that has nested > > T::operator<< calls and we kind of recursively instrument all of them. > > But those SAVE_EXPRs are the same. If you put a breakpoint on the: > 489 if (op) > 490 CALL_EXPR_ARG (stmt, 0) = op; > line 490 in c-ubsan.c, on the above short testcase is called 2 times > (the T< not to be NULL), rather than 3 times. > When trying larger testcase like below I count ~ 120 calls (counted by hand, > so it is possible it was the right 119 times). > If this has not been linear, the testcase would be compiling for years. > Yes, there will be 119 SAVE_EXPRs, and when you -fdump-tree-original it, > it will be just insanely huge, but each SAVE_EXPR appears exactly twice > in its containing SAVE_EXPR and the second time cp_genericize_r sees > the SAVE_EXPR, it will do the > 1138/* Other than invisiref parms, don't walk the same tree twice. */ > 1139if (p_set->contains (stmt)) > 1140 { > 1141*walk_subtrees = 0; > 1142return NULL_TREE; > 1143 } > early exit. Clearly I misread things here. I blame the cold I've caught! In any case, I'm sorry for wasting your time and I'll just close the PR. Marek
[PR78365] ICE in determine_value_range, at tree-ssa-loo p-niter.c:413
Hi, I was relying on ipa_get_callee_param_type to get type of parameter and then convert arguments to this type while computing jump functions. However, in cases like shown in PR78365, ipa_get_callee_param_type, instead of giving up, would return the wrong type. I think the current uses of ipa_get_callee_param_type are fine with this. Attached patch now uses callee's DECL_ARGUMENTS to get the type. If it cannot be found, then I would give up and set the jump function to varying. Bootstrapped and regression tested on x86_64-linux-gnu with no new regressions. Is this OK for trunk? Thanks, Kugan gcc/testsuite/ChangeLog: 2016-11-18 Kugan Vivekanandarajah PR IPA/78365 * gcc.dg/torture/pr78365.c: New test. gcc/ChangeLog: 2016-11-18 Kugan Vivekanandarajah PR IPA/78365 * ipa-cp.c (propagate_constants_accross_call): get param type from callees DECL_ARGUMENTS if available. * ipa-prop.c (ipa_compute_jump_functions_for_edge): Likewise. (ipcp_update_vr): Remove now redundant conversion of precision for VR. * ipa-prop.h: Make ipa_get_callee_param_type local again. diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index 2ec671f..924c846 100644 --- a/gcc/ipa-cp.c +++ b/gcc/ipa-cp.c @@ -2200,6 +2200,9 @@ propagate_constants_accross_call (struct cgraph_edge *cs) struct ipa_edge_args *args; bool ret = false; int i, args_count, parms_count; + struct cgraph_node *calee = cs->callee; + tree fndecl = calee ? calee->decl : NULL_TREE; + tree parm = fndecl ? DECL_ARGUMENTS (fndecl) : NULL_TREE; callee = cs->callee->function_symbol (&availability); if (!callee->definition) @@ -2247,7 +2250,6 @@ propagate_constants_accross_call (struct cgraph_edge *cs) { struct ipa_jump_func *jump_func = ipa_get_ith_jump_func (args, i); struct ipcp_param_lattices *dest_plats; - tree param_type = ipa_get_callee_param_type (cs, i); dest_plats = ipa_get_parm_lattices (callee_info, i); if (availability == AVAIL_INTERPOSABLE) @@ -2265,10 +2267,12 @@ propagate_constants_accross_call (struct cgraph_edge *cs) if (opt_for_fn (callee->decl, flag_ipa_vrp)) ret |= propagate_vr_accross_jump_function (cs, jump_func, dest_plats, - param_type); + parm ? + TREE_TYPE (parm) : NULL_TREE); else ret |= dest_plats->m_value_range.set_to_bottom (); } + parm = parm ? DECL_CHAIN (parm) : NULL_TREE; } for (; i < parms_count; i++) ret |= set_all_contains_variable (ipa_get_parm_lattices (callee_info, i)); diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c index 6321fdd..0f102c6c 100644 --- a/gcc/ipa-prop.c +++ b/gcc/ipa-prop.c @@ -1651,7 +1651,7 @@ determine_locally_known_aggregate_parts (gcall *call, tree arg, /* Return the Ith param type of callee associated with call graph edge E. */ -tree +static tree ipa_get_callee_param_type (struct cgraph_edge *e, int i) { int n; @@ -1695,6 +1695,9 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, gcall *call = cs->call_stmt; int n, arg_num = gimple_call_num_args (call); bool useful_context = false; + struct cgraph_node *calee = cs->callee; + tree fndecl = calee ? calee->decl : NULL_TREE; + tree parm = fndecl ? DECL_ARGUMENTS (fndecl) : NULL_TREE; if (arg_num == 0 || args->jump_functions) return; @@ -1751,8 +1754,8 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, { wide_int min, max; value_range_type type; - if (TREE_CODE (arg) == SSA_NAME - && param_type + if (parm + && TREE_CODE (arg) == SSA_NAME && (type = get_range_info (arg, &min, &max)) && (type == VR_RANGE || type == VR_ANTI_RANGE)) { @@ -1764,7 +1767,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, vr.equiv = NULL; extract_range_from_unary_expr (&jfunc->m_vr, NOP_EXPR, -param_type, +TREE_TYPE (parm), &vr, TREE_TYPE (arg)); if (jfunc->m_vr.type == VR_RANGE || jfunc->m_vr.type == VR_ANTI_RANGE) @@ -1775,6 +1778,7 @@ ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi, else gcc_assert (!jfunc->vr_known); } + parm = parm ? DECL_CHAIN (parm) : NULL_TREE; if (INTEGRAL_TYPE_P (TREE_TYPE (arg)) && (TREE_CODE (arg) == SSA_NAME || TREE_CODE (arg) == INTEGER_CST)) @@ -5699,8 +5703,6 @@ ipcp_update_vr (struct cgraph_node *node) if (vr[i].known &
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 03:52 PM, Jeff Law wrote: On 11/17/2016 03:03 PM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote: On 11/17/2016 11:24 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, for (f = formal; f; f = f->next) n++; - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); + /* Take care not to call alloca with a zero argument. */ + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); for (i = 0; i < n; i++) new_arg[i] = NULL; Ugh, that is just too ugly. I don't see anything wrong on alloca (0), and we don't rely on those pointers being distinct from other pointers. On systems where alloca was implemented on top of malloc, alloca (0) would cause collection of alloca'd objects that had gone out of scope. Ouch. Do we support any such systems as hosts? If yes, can't we just define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1) on those systems and leave the sane hosts as is? I would guess they're all dead as hosts for building GCC. I was most familiar with hpux, but I'm pretty sure there were others as emacs (IIRC) had a replacement alloca for systems without it as a builtin. They probably all fall into the "retro-computing" bucket these days. Essentially those systems worked by recording all the allocations as well as the frame depth at which they occurred. The next time alloca was called, anything at a deeper depth than the current frame was released. So even if we called alloca (0) unexpectedly, it's not going to cause anything to break. Failing to call alloca (0) could run the system out of heap memory. It's left as an exercise to the reader to ponder how that might happen -- it can and did happen building GCC "in the old days". The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. I confess I haven't heard of such an implementation before but after some searching I have managed to find a few examples of it online, such as in QNX or in the BlackBerry SDK, or in the GCC shipped by Texas Instruments. For instance: http://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_lib_ref/a/alloca.html Their GCC-derived compilers apparently enable it with -fno-builtins. Other references include source code derived from an implementation with Doug Gwyn's name on it, such as this one: https://opensource.apple.com/source/gnudiff/gnudiff-10/diffutils/alloca.c A comment at the top the file mentions the alloca(0) use case: As a special case, alloca(0) reclaims storage without allocating any. It is a good idea to use alloca(0) in your main control loop, etc. to force garbage collection. That said, I don't view this as reason to exclude the warning from -Wextra. The vendor-provided compilers are obviously customized versions of GCC with their own features, including their own set of options and warnings. It shouldn't stop us from enabling those we expect to be useful to the typical GCC user on typical targets, and especially those that can help expose sources of common bugs. Recognizing I'm preaching to choir here: Calling alloca with any argument is considered dangerous enough to be discouraged in most man pages. Alloca(0) is a potentially dangerous corner case of an already dangerous API. It seems at least as problematic as any of the warnings already in -Wextra, and I'd say as many in -Wall. Even on systems with this unusual alloca implementation, since alloca(0) is special (and expensive), warning on such calls would likely be useful. In fact, since calling alloca(0) in these environments is actually important, warning on them coukd help detect their unintended absence. The few such calls that are intended can be easily tweaked to suppress the warning. Martin
libgo patch committed: Rewrite Go type to libffi type conversion in Go
This patch to libgo rewrites the code that converts from a Go type to a libffi type in Go. As we move toward the Go 1.7 garbage collector, it's essential that all allocation of values that can contain Go pointers be done using the correct type descriptor. That is simplest if we do all such allocation in Go code. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 242509) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d9189ebc139ff739af956094626ccc5eb92c3091 +bc5ad6d10092d6238495357468ee093f7caf39f9 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/Makefile.am === --- libgo/Makefile.am (revision 242509) +++ libgo/Makefile.am (working copy) @@ -1052,6 +1052,14 @@ runtime_internal_atomic_lo_check_GOCFLAG runtime_internal_sys_lo_GOCFLAGS = -fgo-compiling-runtime runtime_internal_sys_lo_check_GOCFLAGS = -fgo-compiling-runtime +# If libffi is supported (the normal case) use the ffi build tag for +# the runtime package. +if USE_LIBFFI +matchargs_runtime = --tag=libffi +else +matchargs_runtime = +endif + # At least for now, we need -static-libgo for this test, because # otherwise we can't get the line numbers. # Also use -fno-inline to get better results from the memory profiler. Index: libgo/configure.ac === --- libgo/configure.ac (revision 242072) +++ libgo/configure.ac (working copy) @@ -121,6 +121,7 @@ if test "$with_libffi" != no; then fi AC_SUBST(LIBFFI) AC_SUBST(LIBFFIINCS) +AM_CONDITIONAL(USE_LIBFFI, test "$with_liffi" != "no") # See if the user wants to configure without libatomic. This is useful if we are # on an architecture for which libgo does not need an atomic support library and Index: libgo/go/reflect/makefunc.go === --- libgo/go/reflect/makefunc.go(revision 241341) +++ libgo/go/reflect/makefunc.go(working copy) @@ -63,7 +63,7 @@ func MakeFunc(typ Type, fn func(args []V method: -1, } - makeFuncFFI(ftyp, unsafe.Pointer(impl)) + makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(impl)) return Value{t, unsafe.Pointer(&impl), flag(Func) | flagIndir} } @@ -102,7 +102,7 @@ func makeMethodValue(op string, v Value) rcvr: rcvr, } - makeFuncFFI(ftyp, unsafe.Pointer(fv)) + makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(fv)) return Value{ft, unsafe.Pointer(&fv), v.flag&flagRO | flag(Func) | flagIndir} } @@ -128,7 +128,7 @@ func makeValueMethod(v Value) Value { rcvr: v, } - makeFuncFFI(ftyp, unsafe.Pointer(impl)) + makeFuncFFI(makeCIF(ftyp), unsafe.Pointer(impl)) return Value{t, unsafe.Pointer(&impl), v.flag&flagRO | flag(Func) | flagIndir} } Index: libgo/go/reflect/makefunc_ffi.go === --- libgo/go/reflect/makefunc_ffi.go(revision 241341) +++ libgo/go/reflect/makefunc_ffi.go(working copy) @@ -10,7 +10,10 @@ import ( // The makeFuncFFI function, written in C, fills in an FFI closure. // It arranges for ffiCall to be invoked directly from FFI. -func makeFuncFFI(ftyp *funcType, impl unsafe.Pointer) +func makeFuncFFI(cif unsafe.Pointer, impl unsafe.Pointer) + +// The makeCIF function, implemented in the runtime package, allocates a CIF. +func makeCIF(ft *funcType) unsafe.Pointer // FFICallbackGo implements the Go side of the libffi callback. // It is exported so that C code can call it. Index: libgo/go/reflect/makefunc_ffi_c.c === --- libgo/go/reflect/makefunc_ffi_c.c (revision 241341) +++ libgo/go/reflect/makefunc_ffi_c.c (working copy) @@ -8,7 +8,7 @@ #ifdef USE_LIBFFI -#include "go-ffi.h" +#include "ffi.h" #if FFI_GO_CLOSURES #define USE_LIBFFI_CLOSURES @@ -18,7 +18,7 @@ /* Declare C functions with the names used to call from Go. */ -void makeFuncFFI(const struct __go_func_type *ftyp, void *impl) +void makeFuncFFI(void *cif, void *impl) __asm__ (GOSYM_PREFIX "reflect.makeFuncFFI"); #ifdef USE_LIBFFI_CLOSURES @@ -70,20 +70,15 @@ ffi_callback (ffi_cif* cif __attribute__ /* Allocate an FFI closure and arrange to call ffi_callback. */ void -makeFuncFFI(const struct __go_func_type *ftyp, void *impl) +makeFuncFFI(void *cif, void *impl) { - ffi_cif *cif; - - cif = (ffi_cif *) __go_alloc (sizeof (ffi_cif)); - __go_func_to_cif (ftyp, 0, 0, cif); - - ffi_prep_go_closure(impl, cif, ffi_callback); + ffi_prep_go_closure(impl, (ffi_cif*)cif, ffi_callback); } #else /* !defined(USE_LIBFFI_CLOSURES) */ void -mak
Re: [PATCH] warn on overflow in calls to allocation functions (bugs 77531 and 78284)
Attached is an update to the patch that avoids duplicating the -Walloca-larger-than warnings. This version also avoids warning for calls with zero allocation size to functions declared with the returns_nonnull attribute (like libiberty's xmalloc). Since such functions cannot return null there's no portability issue to worry/warn about. When applied along with the patch to avoid calling alloca(0) in GCC posted earlier today (link below) this version bootstraps and passes all tests on x86_64. It also builds Binutils 2.27 and the Linux kernel with no new warnings. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01838.html Martin On 11/16/2016 10:19 AM, Martin Sebor wrote: Attached is an updated version of the patch that also adds attribute alloc_size to the standard allocation built-ins (aligned_alloc, alloca, malloc, calloc, and realloc) and handles alloca. Besides that, I've renamed the option to -Walloc-size-larger-than to make it less similar to -Walloca-larger-than. It think the new name works because the option works with the alloc_size attribute. Other suggestions are of course welcome. I've left the alloc_max_size function in place until I receive some feedback on it. I've regression-tested the patch on x86_64 with a few issues. The biggest is that the -Walloc-zero option enabled by -Wextra causes a number of errors during bootstrap due to invoking the XALLOCAVEC macro with a zero argument. The errors look valid to me (and I got past them by temporarily changing the XALLOCAVEC macro to always allocate at least one byte) but I haven't fixed the errors yet. I'll post a separate patch for those. The other open issue is that the new warning duplicates a small subset of the -Walloca-larger-than warnings. I expect removing the duplicates to be straightforward. I post this updated patch for review while I work on the remaining issues. Martin On 11/13/2016 08:19 PM, Martin Sebor wrote: Bug 77531 requests a new warning for calls to allocation functions (those declared with attribute alloc_size(X, Y)) that overflow the computation X * Z of the size of the allocated object. Bug 78284 suggests that detecting and diagnosing other common errors in calls to allocation functions, such as allocating more space than SIZE_MAX / 2 bytes, would help prevent subsequent buffer overflows. The attached patch adds two new warning options, -Walloc-zero and -Walloc-larger-than=bytes that implement these two enhancements. The patch is not 100% finished because, as it turns out, the GCC allocation built-ins (malloc et al.) do not make use of the attribute and so don't benefit from the warnings. The tests are also incomplete, and there's at least one bug in the implementation I know about. I'm posting the patch while stage 1 is still open and to give a heads up on it and to get early feedback. I expect completing it will be straightforward. Martin PS The alloc_max_size function added in the patch handles sizes specified using suffixes like KB, MB, etc. I added that to make it possible to specify sizes in excess of the maximum of INT_MAX that (AFAIK) options that take integer arguments handle out of the box. It only belatedly occurred to me that the suffixes are unnecessary if the option argument is handled using strtoull. I can remove the suffix (as I suspect it will raise objections) but I think that a general solution along these lines would be useful to let users specify large byte sizes in other options as well (such -Walloca-larger-than, -Wvla-larger-then). Are there any suggestions or preferences? PR c/77531 - __attribute__((alloc_size(1,2))) could also warn on multiplication overflow PR c/78284 - warn on malloc with very large arguments gcc/c-family/ChangeLog: PR c/77531 PR c/78284 * c.opt (-Walloc-zero, -Walloc-size-larger-than): New options. gcc/ChangeLog: PR c/77531 PR c/78284 * builtin-attrs.def (ATTR_ALLOC_SIZE): New identifier tree. (ATTR_MALLOC_SIZE_1_NOTHROW_LIST): New attribute list. (ATTR_MALLOC_SIZE_1_NOTHROW_LEAF_LIST): Same. (ATTR_MALLOC_SIZE_1_2_NOTHROW_LEAF_LIST): Same. (ATTR_ALLOC_SIZE_2_NOTHROW_LEAF_LIST): Same. * builtins.c (expand_builtin_alloca): Call maybe_warn_alloc_args_overflow. * builtins.def (akigned_alloc, alloca, calloc, malloc, realloc): Add attribute alloc_size. * calls.h (maybe_warn_alloc_args_overflow): Declare. * calls.c (alloc_max_size): New function. (maybe_warn_alloc_args_overflow): Define. (initialize_argument_information): Diagnose overflow in functions declared with attaribute alloc_size. * doc/invoke.texi (Warning Options): Document -Walloc-zero and -Walloc-size-larger-than. gcc/testsuite/ChangeLog: PR c/77531 PR c/78284 * gcc.dg/attr-alloc_size-3.c: New test. * gcc.dg/attr-alloc_size-4.c: New test. * gcc.dg/attr-alloc_size-5.c: New test. * gcc.dg/attr-alloc_size-6.c: New test. * gcc.dg/attr-alloc_size-7.c: New test. * gcc.dg/attr-alloc_size-8.c: New test. * gcc.dg/attr-alloc_size-9.c: New test. * gcc/testsuite
[PR middle-end/38219] Don't test vrp47.c on m68k
AFAICT the m68k was the last target needing twiddling to avoid unexpected failures on vrp47.c (from reading comments in the bz). The m68k (like others already mentioned) breaks the conditionals down rather than combining them into a logical with single compare/test. Fixed in the obvious way and installed on the trunk. Jeff commit 1d9211ddf7fcede312120eba696ef143d4f00015 Author: law Date: Thu Nov 17 23:54:46 2016 + PR middle-end/38219 * gcc.dg/tree-ssa/vrp47.c: Do not run on m68k. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242576 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 5dd6f56..c039eca 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,5 +1,8 @@ 2016-11-17 Jeff Law + PR middle-end/38219 + * gcc.dg/tree-ssa/vrp47.c: Do not run on m68k. + PR target/47192 * gcc.target/m68k/pr47192.c: New test. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c index beca5fa..28a8808 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/vrp47.c @@ -3,7 +3,7 @@ /* Skip on S/390. Lower values in BRANCH_COST lead to two conditional jumps when evaluating an && condition. VRP is not able to optimize this. */ -/* { dg-do compile { target { ! { logical_op_short_circuit || { s390*-*-* mn10300-*-* hppa*-*-* } } } } } */ +/* { dg-do compile { target { ! { logical_op_short_circuit || { s390*-*-* mn10300-*-* hppa*-*-* m68k*-*-* } } } } } */ /* { dg-options "-O2 -fdump-tree-vrp1 -fdump-tree-dom2 -fdump-tree-vrp2" } */ /* { dg-additional-options "-march=i586" { target { { i?86-*-* x86_64-*-* } && ia32 } } } */
[PR target/47192] Avoid unsafe code motion in m68k epilogue
Like so many ports, the m68k port forgot to emit a barrier prior to deallocation of the stack in its epilogue. As a result, a memory reference could schedule after the stack deallocation. That memory reference might refer to an object in the just allocated stack. If we get an interrupt between the deallocation and the memory reference, boom! This can be seen in the code for pr47192 when using the appropriate options. Fixed in the usual way. Verified the m68k target tests pass. Installing on the trunk. jeff commit 6dcf231a5ddf1f8dfcf1a6ec59fcf56520666271 Author: law Date: Thu Nov 17 23:39:08 2016 + PR target/47192 * config/m68k/m68k.c (m68k_expand_epilogue): Emit a scheduling barrier prior to deallocating the stack. PR target/47192 * gcc.target/m68k/pr47192.c: New test. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@242575 138bc75d-0d04-0410-961f-82ee72b054a4 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 24ed0f8..86a8fdf 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2016-11-17 Jeff Law + + PR target/47192 + * config/m68k/m68k.c (m68k_expand_epilogue): Emit a scheduling + barrier prior to deallocating the stack. + 2016-11-17 Andrew Burgess * config/arc/arc.md (cmem bit/sign-extend peephole2): New peephole diff --git a/gcc/config/m68k/m68k.c b/gcc/config/m68k/m68k.c index ce56692..346c2dc 100644 --- a/gcc/config/m68k/m68k.c +++ b/gcc/config/m68k/m68k.c @@ -1205,6 +1205,7 @@ m68k_expand_epilogue (bool sibcall_p) stack-based restore. */ emit_move_insn (gen_rtx_REG (Pmode, A1_REG), GEN_INT (-(current_frame.offset + fsize))); + emit_insn (gen_blockage ()); emit_insn (gen_addsi3 (stack_pointer_rtx, gen_rtx_REG (Pmode, A1_REG), frame_pointer_rtx)); @@ -1306,6 +1307,7 @@ m68k_expand_epilogue (bool sibcall_p) current_frame.fpu_mask, false, false); } + emit_insn (gen_blockage ()); if (frame_pointer_needed) emit_insn (gen_unlink (frame_pointer_rtx)); else if (fsize_with_regs) diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 356d75a..5dd6f56 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2016-11-17 Jeff Law + + PR target/47192 + * gcc.target/m68k/pr47192.c: New test. + 2016-11-17 Toma Tabacu * gcc.target/mips/branch-cost-1.c (dg-options): Use (HAS_MOVN) diff --git a/gcc/testsuite/gcc.target/m68k/pr47192.c b/gcc/testsuite/gcc.target/m68k/pr47192.c new file mode 100644 index 000..5bbef58 --- /dev/null +++ b/gcc/testsuite/gcc.target/m68k/pr47192.c @@ -0,0 +1,36 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=51qe -fdisable-ipa-pure-const -fdump-rtl-pro_and_epilogue" } */ +/* { dg-final { scan-rtl-dump-times "unspec_volatile" 1 "pro_and_epilogue"} } */ + + +char F(short *ty); + +short V(char cmd) +{ + static short st256; + static short stc; + short sc; + short scd; + short d; + + F(&sc); + + if (cmd == 4) + { +st256 = 0; +d = 0; + } + else + { +scd = sc - stc; +if (scd < -128) +{ + scd += 256; +} +d = st256 >> 8; +st256 -= d << 8; + } + stc = sc; + return d; +} +
RE: [PATCH,testsuite] MIPS: Downgrade from R6 to R5 to prevent redundant testing of branch-cost-1.c.
> gcc/testsuite/ChangeLog: > > 2016-11-15 Toma Tabacu > > * gcc.target/mips/branch-cost-1.c (dg-options): Use > (HAS_MOVN) instead > of isa>=4, in order to downgrade to R5. > > diff --git a/gcc/testsuite/gcc.target/mips/branch-cost-1.c > b/gcc/testsuite/gcc.target/mips/branch-cost-1.c > index 61c3029..7f7ebbe 100644 > --- a/gcc/testsuite/gcc.target/mips/branch-cost-1.c > +++ b/gcc/testsuite/gcc.target/mips/branch-cost-1.c > @@ -1,4 +1,4 @@ > -/* { dg-options "-mbranch-cost=1 isa>=4" } */ > +/* { dg-options "-mbranch-cost=1 (HAS_MOVN)" } */ > /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */ > NOMIPS16 int > foo (int x, int y, int z, int k) I committed this patch for you. Have you requested write access to the repository? Catherine
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 03:03 PM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote: On 11/17/2016 11:24 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, for (f = formal; f; f = f->next) n++; - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); + /* Take care not to call alloca with a zero argument. */ + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); for (i = 0; i < n; i++) new_arg[i] = NULL; Ugh, that is just too ugly. I don't see anything wrong on alloca (0), and we don't rely on those pointers being distinct from other pointers. On systems where alloca was implemented on top of malloc, alloca (0) would cause collection of alloca'd objects that had gone out of scope. Ouch. Do we support any such systems as hosts? If yes, can't we just define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1) on those systems and leave the sane hosts as is? I would guess they're all dead as hosts for building GCC. I was most familiar with hpux, but I'm pretty sure there were others as emacs (IIRC) had a replacement alloca for systems without it as a builtin. They probably all fall into the "retro-computing" bucket these days. Essentially those systems worked by recording all the allocations as well as the frame depth at which they occurred. The next time alloca was called, anything at a deeper depth than the current frame was released. So even if we called alloca (0) unexpectedly, it's not going to cause anything to break. Failing to call alloca (0) could run the system out of heap memory. It's left as an exercise to the reader to ponder how that might happen -- it can and did happen building GCC "in the old days". The point is warning on an alloca (0) may not be as clear cut as it might seem. It's probably a reasonable thing to do on the host, but on a target, which might be embedded and explicitly overriding the builtin alloca, warning on alloca (0) is less of a slam dunk. jeff
Re: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads
* Claudiu Zissulescu [2016-11-17 13:23:44 +]: > > > > Note on tests: It will be nice to add a test where the added > > > peephole kicks in. If you consider to add this test to the current > > > patch, please resubmit it. > > > > There were cmem-bit-{1,2,3,4}.c added in that patch. All of which > > fail for me without the peephole, and work with the peephole. > > > > The code generated for L/E ARC is slightly different than the code > > generated for B/E ARC due to how the structures are laid out in > > memory, so, for now I've made parts of the test B/E only. > > > > In order to get code that is as efficient for L/E as B/E I'd end up > > adding a whole new peeophole, I'd rather not do that - it would be > > extra code to maintain for a combination CMEM+L/E that is not used. I > > figure we can come back to that if/when that combination ever becomes > > interesting. I'm hoping you'll be fine with that. > > > > Sure, please go ahead and apply your patch after having the spaces converted > ;) > Claudiu Thanks, committed as r242572. Andrew
C++ PATCH for c++/78193 (inherited ctor regression on sparc32)
The issue here was that we were still introducing extra temporaries of an empty class in a CALL_FROM_THUNK_P call to the inherited constructor. Tested x86_64-pc-linux-gnu, applying to trunk. commit 312678eec4a6dac8fe05e2ecfde60a717543ae76 Author: Jason Merrill Date: Thu Nov 17 17:18:26 2016 -0500 PR c++/78193 - inherited ctor regressions on sparc32. * call.c (build_over_call): Don't set CALL_FROM_THUNK_P here. (build_call_a): Set it here, and don't insert EMPTY_CLASS_EXPR. (convert_like_real) [ck_rvalue]: Also pass non-addressable types along directly. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index d25e2e7..97003e5 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -375,10 +375,18 @@ build_call_a (tree function, int n, tree *argarray) TREE_HAS_CONSTRUCTOR (function) = (decl && DECL_CONSTRUCTOR_P (decl)); + if (current_function_decl && decl + && flag_new_inheriting_ctors + && DECL_INHERITED_CTOR (current_function_decl) + && (DECL_INHERITED_CTOR (current_function_decl) + == DECL_CLONED_FUNCTION (decl))) +/* Pass arguments directly to the inherited constructor. */ +CALL_FROM_THUNK_P (function) = true; + /* Don't pass empty class objects by value. This is useful for tags in STL, which are used to control overload resolution. We don't need to handle other cases of copying empty classes. */ - if (! decl || ! DECL_BUILT_IN (decl)) + else if (! decl || ! DECL_BUILT_IN (decl)) for (i = 0; i < n; i++) { tree arg = CALL_EXPR_ARG (function, i); @@ -6844,8 +6852,7 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, constructor. */ if (current_function_decl && flag_new_inheriting_ctors - && DECL_INHERITED_CTOR (current_function_decl) - && TREE_ADDRESSABLE (totype)) + && DECL_INHERITED_CTOR (current_function_decl)) return expr; /* Fall through. */ @@ -8094,13 +8101,6 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain) /* build_new_op_1 will clear this when appropriate. */ CALL_EXPR_ORDERED_ARGS (c) = true; } - if (current_function_decl - && flag_new_inheriting_ctors - && DECL_INHERITED_CTOR (current_function_decl) - && cand->num_convs) -/* Don't introduce copies when passing arguments along to the inherited - constructor. */ -CALL_FROM_THUNK_P (call) = true; return call; }
Re: [PATCH] Enable Intel AVX512_4FMAPS and AVX512_4VNNIW instructions
On Thu, Nov 17, 2016 at 4:20 AM, Andrew Senkevich wrote: > 16 Ноя 2016 г. 19:21 пользователь "Bernd Schmidt" > написал: > > >> >> On 11/15/2016 05:31 PM, Andrew Senkevich wrote: >>> >>> 2016-11-15 17:56 GMT+03:00 Jeff Law : On 11/15/2016 05:55 AM, Andrew Senkevich wrote: > > > 2016-11-11 14:16 GMT+03:00 Uros Bizjak : >> >> >> --- a/gcc/genmodes.c >> +++ b/gcc/genmodes.c >> --- a/gcc/init-regs.c >> +++ b/gcc/init-regs.c >> --- a/gcc/machmode.h >> +++ b/gcc/machmode.h >> >> These are middle-end changes, you will need a separate review for >> these. > > > > Who could review these changes? I can. I likely dropped the message because it looked x86 specific, so if you could resend it'd be appreciated. >>> >>> >>> Attached (diff with previous only in fixed comments typos). >> >> >> Next time please split middle-end changes out from target-related stuff >> and send them separately. >> >> These ones are OK. >> >> >> Bernd > > Hi HJ, could you please commit it? Done. -- H.J.
Re: [PATCH] Follow-up patch on enabling new AVX512 instructions
On Thu, Nov 17, 2016 at 4:26 AM, Andrew Senkevich wrote: > 16 Ноя 2016 г. 23:45 пользователь "Uros Bizjak" написал: >> >> On Tue, Nov 15, 2016 at 10:50 PM, Andrew Senkevich >> wrote: >> > Hi, >> > >> > this is follow-up with tests for new __target__ attributes and >> > __builtin_cpu_supports update. >> > >> > gcc/ >> > * config/i386/i386.c (processor_features): Add >> > F_AVX5124VNNIW, F_AVX5124FMAPS. >> > (isa_names_table): Handle new features. >> > libgcc/ >> > * config/i386/cpuinfo.c (processor_features): Add >> > FEATURE_AVX5124VNNIW, FEATURE_AVX5124FMAPS. >> > gcc/testsuite/ >> > * gcc.target/i386/builtin_target.c: Handle new "avx5124vnniw", >> > "avx5124fmaps". >> > * gcc.target/i386/funcspec-56.inc: Test new attributes. >> >> OK. >> >> Thanks, >> Uros. > > HJ, could you please commit this patch also? Done. -- H.J.
Re: [PATCH] avoid calling alloca(0)
On Thu, Nov 17, 2016 at 01:56:03PM -0700, Jeff Law wrote: > On 11/17/2016 11:24 AM, Jakub Jelinek wrote: > >On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: > >>--- a/gcc/fortran/interface.c > >>+++ b/gcc/fortran/interface.c > >>@@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, > >>gfc_formal_arglist *formal, > >> for (f = formal; f; f = f->next) > >> n++; > >> > >>- new_arg = XALLOCAVEC (gfc_actual_arglist *, n); > >>+ /* Take care not to call alloca with a zero argument. */ > >>+ new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); > >> > >> for (i = 0; i < n; i++) > >> new_arg[i] = NULL; > > > >Ugh, that is just too ugly. I don't see anything wrong on alloca (0), > >and we don't rely on those pointers being distinct from other pointers. > On systems where alloca was implemented on top of malloc, alloca (0) would > cause collection of alloca'd objects that had gone out of scope. Ouch. Do we support any such systems as hosts? If yes, can't we just define XALLOCAVEC etc. to alloca (len + 1) or alloca (len ? len : 1) on those systems and leave the sane hosts as is? Jakub
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Matthew Fortune > Sent: Thursday, November 17, 2016 8:47 AM > To: Toma Tabacu ; Andrew Bennett > ; Moore, Catherine > ; 'gcc-patches@gcc.gnu.org' patc...@gcc.gnu.org> > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires > standard library support check the sysroot supports the required test > options. > > Toma Tabacu writes: > > Hi, > > > > The patch below is a rebased version of Andrew's patch plus a few > more changes > > to test options. > > > > I have tested Andrew's patch by passing -msoft-float to the testsuite > without > > having a soft-float library available, and saw that the inline-memcpy- > {1,2,3,4,5}.c > > and memcpy-1.c tests were also failing to find standard library > headers. > > In the version below, I have added (REQUIRES_STDLIB) to them as > well. > > > > However, I believe that the memcpy-1.c test was removed from the > first version > > of Andrew's patch in response to Matthew's comments, but I don't > think it > > should be. > > > > Tested with mips-img-linux-gnu and mips-mti-linux-gnu. > > This looks OK to me but I then again I helped with the design for this. > > I'd like to give Catherine a chance to take a look though as the feature > is unusual. > > One comment below. > > > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp > > b/gcc/testsuite/gcc.target/mips/mips.exp > > index e22d782..ccd4ecb 100644 > > --- a/gcc/testsuite/gcc.target/mips/mips.exp > > +++ b/gcc/testsuite/gcc.target/mips/mips.exp > > @@ -1420,6 +1426,22 @@ proc mips-dg-options { args } { > > } > > } > > > > +# If the test is marked as requiring standard libraries check > > +# that the sysroot has support for the current set of test options. > > +if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } { > > + mips_push_test_options saved_options $extra_tool_flags > > + set output [mips_preprocess "" { > > + #include > > + } 1] > > + mips_pop_test_options saved_options > > + > > + # If the preprocessing of the stdlib.h file produced errors mark > > + # the test as unsupported. > > + if { ![string equal $output ""] } { > > + set do_what [lreplace $do_what 1 1 "N"] > > We should describe what we expect the format of do_what to be at > the time > of writing in case it ever changes. i.e. a comment to say what the > second > character means and therefore make it clear that setting it to 'n' makes > the test unsupported. > This patch looks okay to me after updating the comment as Matthew suggested.
Re: [PATCH, rs6000] Add built-in support for vector compare listed in the ABI
Hi Carl, On Thu, Nov 17, 2016 at 01:19:57PM -0800, Carl E. Love wrote: > 2016-11-15 Carl Love Two spaces between your name and email address. > * gcc.target/powerpc/builtins-3.c : New file to test the new No space before colon. > built-ins for vecotr compare equal and vector compare not equal. Typo ("vecotr"). Changelog is indented with tabs btw, not spaces. >{ ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW, > RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, > RS6000_BTI_unsigned_V4SI, 0 }, >{ ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, > +RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, > + > + { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, > RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, >{ ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, > RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, > RS6000_BTI_unsigned_V2DI, 0 }, Why the blank line? Otherwise looks fine. Thanks, Segher
C++ PATCH for c++/78124 (list-init and inherited ctor)
An inherited ctor makes the class non-aggregate. We weren't catching this before because we add the inheriting ctors after we set CLASSTYPE_NON_AGGREGATE, but it's easy enough to set it earlier, here. Tested x86_64-pc-linux-gnu, applying to trunk. commit d462f9acb6cdb05dcc49bee32f0741633de33aed Author: Jason Merrill Date: Thu Nov 17 16:23:09 2016 -0500 PR c++/78124 - list-initialization and inherited ctor * name-lookup.c (do_class_using_decl): Set CLASSTYPE_NON_AGGREGATE. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index cb1b9fa..e5f9113 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5707,7 +5707,6 @@ extern bool type_has_user_nondefault_constructor (tree); extern tree in_class_defaulted_default_constructor (tree); extern bool user_provided_p(tree); extern bool type_has_user_provided_constructor (tree); -extern bool type_has_user_provided_or_explicit_constructor (tree); extern bool type_has_non_user_provided_default_constructor (tree); extern bool vbase_has_user_provided_move_assign (tree); extern tree default_init_uninitialized_part (tree); diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c index 7ad65b8..4f80e8d 100644 --- a/gcc/cp/name-lookup.c +++ b/gcc/cp/name-lookup.c @@ -3382,6 +3382,7 @@ do_class_using_decl (tree scope, tree name) { maybe_warn_cpp0x (CPP0X_INHERITING_CTORS); name = ctor_identifier; + CLASSTYPE_NON_AGGREGATE (current_class_type) = true; } if (constructor_name_p (name, current_class_type)) { diff --git a/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C b/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C new file mode 100644 index 000..1bb05a2 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/inh-ctor23.C @@ -0,0 +1,16 @@ +// PR c++/78124 +// { dg-do compile { target c++11 } } + +struct base { +explicit constexpr base(int&&) {} +}; + +struct derived: base { +using base::base; +}; + +int main() +{ +derived d { 0 }; +} +
C++ PATCH for c++/68377 (parens in fold-expression)
finish_parenthesized_expr uses TREE_NO_WARNING to mark parenthesized expressions, so I think it's reasonable to use that to recognize them here. Tested x86_64-pc-linux-gnu, applying to trunk. commit 043b2830c4e296951a8209ade25b8d01394c7386 Author: Jason Merrill Date: Thu Nov 17 15:25:00 2016 -0500 PR c++/68377 - parenthesized expr in fold-expression * parser.c (cp_parser_fold_expression): Check TREE_NO_WARNING. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 9a5039f..3ab0b68 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -4679,7 +4679,9 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1) /* The operands of a fold-expression are cast-expressions, so binary or conditional expressions are not allowed. We check this here to avoid tentative parsing. */ - if (is_binary_op (TREE_CODE (expr1))) + if (EXPR_P (expr1) && TREE_NO_WARNING (expr1)) +/* OK, the expression was parenthesized. */; + else if (is_binary_op (TREE_CODE (expr1))) error_at (location_of (expr1), "binary expression in operand of fold-expression"); else if (TREE_CODE (expr1) == COND_EXPR) diff --git a/gcc/testsuite/g++.dg/cpp1z/fold8.C b/gcc/testsuite/g++.dg/cpp1z/fold8.C new file mode 100644 index 000..e27db7a --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/fold8.C @@ -0,0 +1,15 @@ +// PR c++/68377 +// { dg-options -std=c++1z } + +struct Sink { } s; +template Sink& operator<<(Sink&, const T&); + +template +int f(Tx... xs) { + return ((xs+1) + ...); +} + +int main() { + s << f(3,4,5) << "\n"; + return 0; +}
C++ PATCH for c++/78369 (ICE with {} default argument)
The C++17 copy elision code needed to handle CONSTRUCTORs. Tested x86_64-pc-linux-gnu, applying to trunk. commit 8b82502696188d92fc560cc55e0a564c507e6144 Author: Jason Merrill Date: Thu Nov 17 15:37:56 2016 -0500 PR c++/78369 - {} as default argument * call.c (build_special_member_call): Handle CONSTRUCTOR. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index f6f4590..d25e2e7 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8317,7 +8317,8 @@ build_special_member_call (tree instance, tree name, vec **args, if (!reference_related_p (class_type, TREE_TYPE (arg))) arg = perform_implicit_conversion_flags (class_type, arg, tf_warning, flags); - if (TREE_CODE (arg) == TARGET_EXPR + if ((TREE_CODE (arg) == TARGET_EXPR + || TREE_CODE (arg) == CONSTRUCTOR) && (same_type_ignoring_top_level_qualifiers_p (class_type, TREE_TYPE (arg { diff --git a/gcc/testsuite/g++.dg/overload/defarg11.C b/gcc/testsuite/g++.dg/overload/defarg11.C new file mode 100644 index 000..26fac6e --- /dev/null +++ b/gcc/testsuite/g++.dg/overload/defarg11.C @@ -0,0 +1,11 @@ +// PR c++/78369 +// { dg-do compile { target c++11 } } + +struct A { }; +inline void f(struct A a = {}) {} + +int main() +{ + f(); + return 0; +}
[v3 PATCH] LWG 2766, LWG 2749
The first patch does everything but the container adaptor parts (which are wrong in the p/r) and the tuple part (which is icky). The second part does the tuple parts, and needs some serious RFC. I know it's ugly as sin, but it works. I don't think we should try to teach our tuple to give the right answer for is_constructible etc., because the last attempt at it broke boost and would break reasonable existing code in addition to just boost - such things are not Stage 3 material, imnsho. We, imho, have two options: either skip the tuple parts of LWG 2766 or tolerate the ugliness of that second patch. I can tolerate that ugliness, it's not quite so hideous as I feared. 2016-11-18 Ville Voutilainen Implement LWG 2766, Swapping non-swappable types and LWG 2749, swappable traits for variants. * include/bits/stl_pair.h (swap(pair<_T1, _T2>&, pair<_T1, _T2>&)): Add a deleted overload. * include/bits/unique_ptr.h (swap(unique_ptr<_Tp, _Dp>&, unique_ptr<_Tp, _Dp>&)): Likewise. * include/std/array (swap(array<_Tp, _Nm>&, array<_Tp, _Nm>&)): Likewise. * include/std/optional (swap(optional<_Tp>&, optional<_Tp>&)): Likewise. * include/std/variant (swap(variant<_Types...>&, variant<_Types...>&)): Likewise. * testsuite/20_util/optional/swap/2.cc: Add tests for disabled swaps. * testsuite/20_util/pair/swap_cxx17.cc: New. * testsuite/20_util/unique_ptr/specialized_algorithms/swap_cxx17.cc: Likewise. * testsuite/20_util/variant/compile.cc: Add tests for disabled swaps. * testsuite/23_containers/array/specialized_algorithms/swap_cxx17.cc: New. * testsuite/23_containers/array/tuple_interface/get_neg.cc: Adjust. * testsuite/23_containers/array/tuple_interface/tuple_element_neg.cc: Likewise. 2016-11-18 Ville Voutilainen Implement LWG 2766 for tuple. * include/bits/move.h (swap(_Tp&, _Tp&)): Constrain with __is_tuple_like. * include/std/tuple (__is_tuple_like_impl, __is_tuple_like): Move to type_traits. (swap(tuple<_Elements...>&, tuple<_Elements...>&)): Add a deleted overload. * include/std/type_traits (__is_tuple_like_impl, __is_tuple_like): New. (swap(_Tp&, _Tp&)): Constrain with __is_tuple_like. * include/std/utility (__is_tuple_like_impl): Move to type_traits. * testsuite/20_util/tuple/swap_cxx17.cc: New. diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h index ef52538..44568dd 100644 --- a/libstdc++-v3/include/bits/stl_pair.h +++ b/libstdc++-v3/include/bits/stl_pair.h @@ -478,6 +478,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION swap(pair<_T1, _T2>& __x, pair<_T1, _T2>& __y) noexcept(noexcept(__x.swap(__y))) { __x.swap(__y); } + +#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11 + template +inline +typename enable_if<__not_<__and_<__is_swappable<_T1>, +__is_swappable<_T2>>>::value>::type +swap(pair<_T1, _T2>&, pair<_T1, _T2>&) = delete; +#endif #endif // __cplusplus >= 201103L /** diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h index f9ec60f..c6bf4b6 100644 --- a/libstdc++-v3/include/bits/unique_ptr.h +++ b/libstdc++-v3/include/bits/unique_ptr.h @@ -649,6 +649,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION swap(unique_ptr<_Tp, _Dp>& __x, unique_ptr<_Tp, _Dp>& __y) noexcept { __x.swap(__y); } +#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11 + template +inline +typename enable_if<__not_<__is_swappable<_Dp>>::value>::type +swap(unique_ptr<_Tp, _Dp>&, +unique_ptr<_Tp, _Dp>&) = delete; +#endif template diff --git a/libstdc++-v3/include/std/array b/libstdc++-v3/include/std/array index 3ab0355..c43e281 100644 --- a/libstdc++-v3/include/std/array +++ b/libstdc++-v3/include/std/array @@ -287,6 +287,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER swap(array<_Tp, _Nm>& __one, array<_Tp, _Nm>& __two) noexcept(noexcept(__one.swap(__two))) { __one.swap(__two); } +#if __cplusplus > 201402L || !defined(__STRICT_ANSI__) // c++1z or gnu++11 + template +inline +typename enable_if<__not_< + typename _GLIBCXX_STD_C::__array_traits<_Tp, _Nm>::_Is_swappable> + ::value>::type +swap(array<_Tp, _Nm>&, array<_Tp, _Nm>&) = delete; +#endif template constexpr _Tp& diff --git a/libstdc++-v3/include/std/optional b/libstdc++-v3/include/std/optional index ea673cc..191d64b 100644 --- a/libstdc++-v3/include/std/optional +++ b/libstdc++-v3/include/std/optional @@ -930,6 +930,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { __lhs.swap(__rhs); } template +inline enable_if_t && is_swappable_v<_Tp>)> +swap(optional<_Tp>&, optional<_Tp>&) = delete; + + template constexpr optional> make_optional(_Tp&& __t) { return optional> { std::forward<_Tp>(__t) }; } diff --git a/libstdc++-v3/include/std/varia
Re: [PATCH] avoid calling alloca(0)
On 11/17/2016 11:24 AM, Jakub Jelinek wrote: On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, for (f = formal; f; f = f->next) n++; - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); + /* Take care not to call alloca with a zero argument. */ + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); for (i = 0; i < n; i++) new_arg[i] = NULL; Ugh, that is just too ugly. I don't see anything wrong on alloca (0), and we don't rely on those pointers being distinct from other pointers. On systems where alloca was implemented on top of malloc, alloca (0) would cause collection of alloca'd objects that had gone out of scope. There was a time when you'd find alloca (0); calls sprinkled through GCC on purpose. Jeff
Re: [PATCH fix PR71767 2/4 : Darwin configury] Arrange for ld64 to be detected as Darwin's linker
On 11/08/2016 05:49 PM, Iain Sandoe wrote: On 8 Nov 2016, at 13:39, Mike Stump wrote: On Nov 8, 2016, at 1:05 PM, Iain Sandoe wrote: Simple for the simple case is already part of my patch, but capability for the expert and non-simple case is also present, I'm trying to ask a specific question, what does the patch allow that can't be done otherwise? Kinda a trivial question, and I don't see any answer. I'm not looking for, it allows it to work, or it makes the expert case work. I'm look for the specific question, and the specific information you want, and why ld -v doesn't get it. ld -v gets it when you can execute ld. It doesn’t get it when the $host ld is not executable on $build. Exactly! Providing the option to give the version allows that without requiring the complexity of other (possibly valid) solutions. If you know that you’re building (my patched) ld64-253.9 for powerpc-darwin9 (crossed from x86-64-darwin14) it’s easy, just put —with-ld64=253.9 .. I think we’ve debated this enough - I’m OK with keeping my extra facility locally and will resubmit the patch with it removed in due course, Iain Your call. But ISTM the ability to specify the linker version or even better, its behaviour is a notable improvement for these crosses. jeff
[PATCH, rs6000] Add built-in support for vector compare listed in the ABI
GCC maintainers: The Power ABI document lists a number of built-ins that it is supposed to support. There are still some missing. This patch adds the built-in support for the following built-ins: vector bool char vec_cmpeq vector bool char vector bool char vector bool int vec_cmpeq vector bool intvector bool int vector bool long long vec_cmpeq vector bool long long vector bool long long vector bool short vec_cmpeq vector bool short vector bool short vector bool char vec_cmpne vector bool char vector bool char vector bool int vec_cmpne vector bool intvector bool int vector bool long long vec_cmpne vector bool long long vector bool long long vector bool short vec_cmpne vector bool short vector bool short Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Is this ok for trunk? Carl Love gcc/ChangeLog: 2016-11-15 Carl Love * config/rs6000/rs6000-c.c : Add built-in support for vector compare equal and vector compare not equal. The vector compares take two arguments of type vector bool char, vector bool short, vector bool int, vector bool long long with the same return type. gcc/testsuite/ChangeLog: 2016-11-15 Carl Love * gcc.target/powerpc/builtins-3.c : New file to test the new built-ins for vecotr compare equal and vector compare not equal. --- gcc/config/rs6000/rs6000-c.c | 19 +++- gcc/testsuite/gcc.target/powerpc/builtins-3.c | 68 +++ 2 files changed, 85 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3.c diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c index 4bba293..6566279 100644 --- a/gcc/config/rs6000/rs6000-c.c +++ b/gcc/config/rs6000/rs6000-c.c @@ -1107,15 +1107,24 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUB, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUB, +RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH, +RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH, RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUH, RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW, +RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, ALTIVEC_BUILTIN_VCMPEQUW, RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, +RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, + + { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPEQ, P8V_BUILTIN_VCMPEQUD, RS6000_BTI_bool_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, @@ -4486,9 +4495,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB, +RS6000_BTI_bool_V16QI, RS6000_BTI_bool_V16QI, +RS6000_BTI_bool_V16QI, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, - { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEH, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, RS6000_BTI_bool_V8HI, 0 }, @@ -4508,7 +4519,11 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = { { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEW, RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI, 0 }, - + { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEB, +RS6000_BTI_bool_V4SI, RS6000_BTI_bool_V4SI, +RS6000_BTI_bool_V4SI, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNED, +RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNEF, RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_CMPNE, P9V_BUILTIN_CMPNED, diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3.c b/gcc/testsuite/gcc.target/powerpc/builtins-3.c new file mode 100644 index 000..8d4b63d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/builtins-3.c @@ -0,0 +1,68 @@ +
Re: [PATCH] libgcc/config/bfin: use the generic linker version information (PR74748)
On 11/06/2016 09:39 AM, Waldemar Brodkorb wrote: Hi, This commit makes the Blackfin platform use the generic linker version information, rather than a completely duplicated file, specific for the Blackfin architecture. This is made possible using the newly introduced skip_underscore variable of the mkmap-symver script. This also allows to get a correct linker version file, with symbol names matching the ones found in libgcc. Thanks to this, the necessary symbols are marked "GLOBAL" instead of "LOCAL", which makes them visible at link time, and solves a large number of "undefined reference" issues. Indeed, the Blackfin specific linker version script had an extra underscore in front of all symbols, which meant none of them matched the symbols in libgcc, and therefore all libgcc symbols were marked as "LOCAL", making them invisible for linking. Signed-off-by: Thomas Petazzoni Tested-by: Waldemar Brodkorb 2016-11-06 Thomas Petazzoni PR gcc/74748 * libgcc/config/bfin/libgcc-glibc.ver, libgcc/config/bfin/t-linux: use generic linker version information on Blackfin. This is fine once the prereqs are approved. jeff
Re: [PATCH] libgcc/mkmap-symver: support skip_underscore (PR74748)
On 11/06/2016 09:32 AM, Waldemar Brodkorb wrote: Hi, Some platforms, such as Blackfin, have a special prefix for assembly symbols as opposed to C symbols. For this reason, a function named "foo()" in C will in fact be visible as a symbol called "_foo" in the ELF binary. The current linker version script logic in libgcc doesn't take into account this situation properly. The Blackfin specific libgcc/config/bfin/libgcc-glibc.ver has an additional "_" in front of every symbol so that it matches the output of "nm" (which gets parsed to produce the final linker version script). But due to this additional "_", ld no longer matches with the symbols since "ld" does the matching with the original symbol name, not the one prefixed with "_". Due to this, none of the symbols in libgcc/config/bfin/libgcc-glibc.ver are actually matched with symbols in libgcc. This causes all libgcc symbols to be left as "LOCAL", which causes lots of "undefined reference" whenever some C or C++ code that calls a function of libgcc is compiled. To address this, this commit introduces a "skip_underscore" variable to the mkmap-symver script. It tells mkmap-symver to ignore the leading underscore from the "nm" output. Note that this new argument is different from the existing "leading_underscore" argument, which *adds* an additional underscore to the generated linker version script. Having this functionality paves the way to using the generic linker version information for Blackfin, instead of using a custom one. Signed-off-by: Thomas Petazzoni Tested-by: Waldemar Brodkorb 2016-11-06 Thomas Petazzoni PR gcc/74748 * libgcc/mkmap-symver.awk: add support for skip_underscore AFAICT this skips the first character regardless of whether or not it is an underscore when skip_underscore is in effect, right. Is that intentional? Jeff
Re: [PATCH, wwwdocs] document changes appropriate for bug-fix releases
On 11/05/2016 04:04 PM, Martin Sebor wrote: The patch below documents the rule of thumb for what changes are appropriate for bug-fix release branches that we discussed in the gcc-patches thread Re: relax rule for flexible array members in 6.x (78039 - fails to compile glibc tests) https://gcc.gnu.org/ml/gcc-patches/2016-10/msg01818.html Please let me know if it accurately reflects the convention in place of if you have suggestions for changes. This is fine. Please install if you haven't already jeff
[PATCH v3] bb-reorder: Improve compgotos pass (PR71785)
For code like the testcase in PR71785 GCC factors all the indirect branches to a single dispatcher that then everything jumps to. This is because having many indirect branches with each many jump targets does not scale in large parts of the compiler. Very late in the pass pipeline (right before peephole2) the indirect branches are then unfactored again, by the duplicate_computed_gotos pass. This pass works by replacing branches to such a common dispatcher by a copy of the dispatcher. For code like this testcase this does not work so well: most cases do a single addition instruction right before the dispatcher, but not all, and we end up with only two indirect jumps: the one without the addition, and the one with the addition in its own basic block, and now everything else jumps _there_. This patch rewrites the algorithm to deal with this. It also makes it simpler: it does not need the "candidates" array anymore, it does not need RTL layout mode, it does not need cleanup_cfg, and it does not need to keep track of what blocks it already visited. Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version of the testcase that has 2000 cases instead of 4. Is this okay for trunk? Segher 2016-11-17 Segher Boessenkool PR rtl-optimization/71785 * bb-reorder.c (maybe_duplicate_computed_goto): New function. (duplicate_computed_gotos): New function. (pass_duplicate_computed_gotos::execute): Rewrite. --- gcc/bb-reorder.c | 214 --- 1 file changed, 93 insertions(+), 121 deletions(-) diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c index 85bc569..90de2de 100644 --- a/gcc/bb-reorder.c +++ b/gcc/bb-reorder.c @@ -2587,11 +2587,100 @@ make_pass_reorder_blocks (gcc::context *ctxt) return new pass_reorder_blocks (ctxt); } +/* Duplicate a block (that we already know ends in a computed jump) into its + predecessors, where possible. Return whether anything is changed. */ +static bool +maybe_duplicate_computed_goto (basic_block bb, int max_size) +{ + if (single_pred_p (bb)) +return false; + + /* Make sure that the block is small enough. */ + rtx_insn *insn; + FOR_BB_INSNS (bb, insn) +if (INSN_P (insn)) + { + max_size -= get_attr_min_length (insn); + if (max_size < 0) + return false; + } + + bool changed = false; + edge e; + edge_iterator ei; + for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); ) +{ + basic_block pred = e->src; + + /* Do not duplicate BB into PRED if that is the last predecessor, or if +we cannot merge a copy of BB with PRED. */ + if (single_pred_p (bb) + || !single_succ_p (pred) + || e->flags & EDGE_COMPLEX + || pred->index < NUM_FIXED_BLOCKS + || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred + { + ei_next (&ei); + continue; + } + + if (dump_file) + fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n", +bb->index, e->src->index); + + /* Remember if PRED can be duplicated; if so, the copy of BB merged +with PRED can be duplicated as well. */ + bool can_dup_more = can_duplicate_block_p (pred); + + /* Make a copy of BB, merge it into PRED. */ + basic_block copy = duplicate_block (bb, e, NULL); + emit_barrier_after_bb (copy); + reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred)); + merge_blocks (pred, copy); + + changed = true; + + /* Try to merge the resulting merged PRED into further predecessors. */ + if (can_dup_more) + maybe_duplicate_computed_goto (pred, max_size); +} + + return changed; +} + /* Duplicate the blocks containing computed gotos. This basically unfactors computed gotos that were factored early on in the compilation process to - speed up edge based data flow. We used to not unfactoring them again, - which can seriously pessimize code with many computed jumps in the source - code, such as interpreters. See e.g. PR15242. */ + speed up edge based data flow. We used to not unfactor them again, which + can seriously pessimize code with many computed jumps in the source code, + such as interpreters. See e.g. PR15242. */ +static void +duplicate_computed_gotos (function *fun) +{ + /* We are estimating the length of uncond jump insn only once + since the code for getting the insn length always returns + the minimal length now. */ + if (uncond_jump_length == 0) +uncond_jump_length = get_uncond_jump_length (); + + /* Never copy a block larger than this. */ + int max_size += uncond_jump_length * PARAM_VALUE (PARAM_MAX_GOTO_DUPLICATION_INSNS); + + bool changed = false; + + /* Try to duplicate all blocks that end in a computed jump and that + can be duplicated at all. */ + basic_block bb; + FOR_EACH_BB_FN (bb, fun) +if (computed_jump_p (BB_END (bb))
Re: [PATCH v2] bb-reorder: Improve compgotos pass (PR71785)
On Wed, Nov 16, 2016 at 11:52:23AM -0700, Jeff Law wrote: > >2016-10-30 Segher Boessenkool > > > > PR rtl-optimization/71785 > > * bb-reorder.c (duplicate_computed_gotos_find_candidates): New > > function, factored out from pass_duplicate_computed_gotos::execute. > > (duplicate_computed_gotos_do_duplicate): Ditto. Don't use > > BB_VISITED. > > (pass_duplicate_computed_gotos::execute): Rewrite. Rerun the pass as > > long as it makes changes. > OK. I'm just going to note for the record here that while we iterate > until nothing changes, the statement and block clamps should in practice > ensure we hit a point where nothing changes. It is hard limited as well, the loop is not infinite. Anyway, I said I'd revisit this in stage 3; posting the v3 patch now. Segher
Re: [PATCH, GCC/ARM 2/2, ping3] Allow combination of aprofile and rmprofile multilibs
Ping? Best regards, Thomas On 08/11/16 13:36, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 02/11/16 10:05, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 24/10/16 09:07, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 13/10/16 16:35, Thomas Preudhomme wrote: Hi ARM maintainers, This patchset aims at adding multilib support for R and M profile ARM architectures and allowing it to be built alongside multilib for A profile ARM architectures. This specific patch is concerned with the latter. The patch works by moving the bits shared by both aprofile and rmprofile multilib build (variable initilization as well as ISA and float ABI to build multilib for) to a new t-multilib file. Then, based on which profile was requested in --with-multilib-list option, that files includes t-aprofile and/or t-rmprofile where the architecture and FPU to build the multilib for are specified. Unfortunately the duplication of CPU to A profile architectures could not be avoided because substitution due to MULTILIB_MATCHES are not transitive. Therefore, mapping armv7-a to armv7 for rmprofile multilib build does not have the expected effect. Two patches were written to allow this using 2 different approaches but I decided against it because this is not the right solution IMO. See caveats below for what I believe is the correct approach. *** combined build caveats *** As the documentation in this patch warns, there is a few caveats to using a combined multilib build due to the way the multilib framework works. 1) For instance, when using only rmprofile the combination of options -mthumb -march=armv7 -mfpu=neon the thumb/-march=armv7 multilib but in a combined multilib build the default multilib would be used. This is because in the rmprofile build -mfpu=neon is not specified in MULTILIB_OPTION and thus the option is ignored when considering MULTILIB_REQUIRED entries. 2) Another issue is the fact that aprofile and rmprofile multilib build have some conflicting requirements in terms of how to map options for which no multilib is built to another option. (i) A first example of this is the difference of CPU to architecture mapping mentionned above: rmprofile multilib build needs A profile CPUs and architectures to be mapped down to ARMv7 so that one of the v7-ar multilib gets chosen in such a case but aprofile needs A profile architectures to stand on their own because multilibs are built for several architectures. (ii) Another example of this is that in aprofile multilib build no multilib is built with -mfpu=fpv5-d16 but some multilibs are built with -mfpu=fpv4-d16. Therefore, aprofile defines a match rule to map fpv5-d16 onto fpv4-d16. However, rmprofile multilib profile *does* build some multilibs with -mfpu=fpv5-d16. This has the consequence that when building for -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard the default multilib is chosen because this is rewritten into -mthumb -march=armv7e-m -mfpu=fpv5-d16 -mfloat-abi=hard and there is no multilib for that. Both of these issues could be handled by using MULTILIB_REUSE instead of MULTILIB_MATCHES but this would require a large set of rules. I believe instead the right approach is to create a new mechanism to inform GCC on how options can be down mapped _when no multilib can be found_ which would require a smaller set of rules and would make it explicit that the options are not equivalent. A patch will be posted to this effect at a later time. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow combinations of aprofile and rmprofile values for --with-multilib-list. * config/arm/t-multilib: New file. * config/arm/t-aprofile: Remove initialization of MULTILIB_* variables. Remove setting of ISA and floating-point ABI in MULTILIB_OPTIONS and MULTILIB_DIRNAMES. Set architecture and FPU in MULTI_ARCH_OPTS_A and MULTI_ARCH_DIRS_A rather than MULTILIB_OPTIONS and MULTILIB_DIRNAMES respectively. Add comment to introduce all matches. Add architecture matches for marvel-pj4 and generic-armv7-a CPU options. * config/arm/t-rmprofile: Likewise except for the matches changes. * doc/install.texi (--with-multilib-list): Document the combination of aprofile and rmprofile values and warn about pitfalls in doing that. Testing: * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the patchset for both aprofile and rmprofile * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same for aprofile,rmprofile and rmprofile,aprofile * default spec (gcc -dumpspecs) is the same for aprofile,rmprofile and rmprofile,aprofile * Difference in --print-multi-directory between aprofile or rmprofile and aprofile,rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI is as expected (best multilib for the combination is chosen), modulo the caveat
Re: [PATCH, GCC/ARM 1/2, ping3] Add multilib support for embedded bare-metal targets
Ping? Best regards, Thomas On 08/11/16 13:36, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 02/11/16 10:05, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 27/10/16 15:26, Thomas Preudhomme wrote: Hi Kyrill, On 27/10/16 10:45, Kyrill Tkachov wrote: Hi Thomas, On 24/10/16 09:06, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 13/10/16 16:35, Thomas Preudhomme wrote: Hi ARM maintainers, This patchset aims at adding multilib support for R and M profile ARM architectures and allowing it to be built alongside multilib for A profile ARM architectures. This specific patch adds the t-rmprofile multilib Makefile fragment for the former objective. Multilib are built for all M profile architecture involved: ARMv6S-M, ARMv7-M and ARMv7E-M as well as ARMv7. ARMv7 multilib is used for R profile architectures but also A profile architectures. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow new rmprofile value for configure option --with-multilib-list. * config/arm/t-rmprofile: New file. * doc/install.texi (--with-multilib-list): Document new rmprofile value for ARM. Testing: == aprofile == * "tree install/lib/gcc/arm-none-eabi/7.0.0" is the same before and after the patchset for both aprofile and rmprofile * default spec (gcc -dumpspecs) is the same before and after the patchset for aprofile * No difference in --print-multi-directory between before and after the patchset for aprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI == rmprofile == * aprofile and rmprofile use similar directory structure (ISA/arch/FPU/float ABI) and directory naming * Difference in --print-multi-directory between before [1] and after the patchset for rmprofile for all combination of ISA (ARM/Thumb), architecture, CPU, FPU and float ABI modulo the name and directory structure changes [1] as per patch applied in ARM embedded branches https://gcc.gnu.org/viewcvs/gcc/branches/ARM/embedded-5-branch/gcc/config/arm/t-baremetal?view=markup == aprofile + rmprofile == * aprofile,rmprofile and rmprofile,aprofile builds give an error saying it is not supported Is this ok for master branch? Best regards, Thomas +# Arch Matches +MULTILIB_MATCHES += march?armv6s-m=march?armv6-m +MULTILIB_MATCHES += march?armv8-m.main=march?armv8-m.main+dsp +MULTILIB_MATCHES += march?armv7=march?armv7-r +ifeq (,$(HAS_APROFILE)) +MULTILIB_MATCHES += march?armv7=march?armv7-a +MULTILIB_MATCHES += march?armv7=march?armv7ve +MULTILIB_MATCHES += march?armv7=march?armv8-a +MULTILIB_MATCHES += march?armv7=march?armv8-a+crc +MULTILIB_MATCHES += march?armv7=march?armv8.1-a +MULTILIB_MATCHES += march?armv7=march?armv8.1-a+crc +endif I think you want to update the patch to handle -march=armv8.2-a and armv8.2-a+fp16 Thanks, Kyrill Indeed. Please find updated ChangeLog and patch (attached): *** gcc/ChangeLog *** 2016-10-03 Thomas Preud'homme * config.gcc: Allow new rmprofile value for configure option --with-multilib-list. * config/arm/t-rmprofile: New file. * doc/install.texi (--with-multilib-list): Document new rmprofile value for ARM. Ok for trunk? Best regards, Thomas diff --git a/gcc/config.gcc b/gcc/config.gcc index d956da22ad60abfe9c6b4be0882f9e7dd64ac39f..15b662ad5449f8b91eb760b7fbe45f33d8cecb4b 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -3739,6 +3739,16 @@ case "${target}" in # pragmatic. tmake_profile_file="arm/t-aprofile" ;; + rmprofile) +# Note that arm/t-rmprofile is a +# stand-alone make file fragment to be +# used only with itself. We do not +# specifically use the +# TM_MULTILIB_OPTION framework because +# this shorthand is more +# pragmatic. +tmake_profile_file="arm/t-rmprofile" +;; default) ;; *) @@ -3748,9 +3758,10 @@ case "${target}" in esac if test "x${tmake_profile_file}" != x ; then -# arm/t-aprofile is only designed to work -# without any with-cpu, with-arch, with-mode, -# with-fpu or with-float options. +# arm/t-aprofile and arm/t-rmprofile are only +# designed to work without any with-cpu, +# with-arch, with-mode, with-fpu or with-float +# options. if test "x$with_arch" != x \ || test "x$with_cpu" != x \ || test "x$with_float" != x \ diff --git a/gcc/config/arm/t-rmprofile b/gcc/config/arm/t-rmprofile new file mode 100644 index ..c8b5c9cbd03694eea69855e20372afa3e97d6b4c --- /dev/null +++ b/gcc/config/arm/t-rmprofile @@ -0,0 +1,174 @@ +# Copyright (C) 2016 Free Software Foundation, Inc. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either
Re: [PATCH, ARM/testsuite 6/7, ping3] Force soft float in ARMv6-M and ARMv8-M Baseline options
Ping? Best regards, Thomas On 08/11/16 13:35, Thomas Preudhomme wrote: Ping, Best regards, Thomas On 02/11/16 10:04, Thomas Preudhomme wrote: Ping? Best regards, Thomas On 28/10/16 10:49, Thomas Preudhomme wrote: On 22/09/16 16:47, Richard Earnshaw (lists) wrote: On 22/09/16 15:51, Thomas Preudhomme wrote: Sorry, noticed an error in the patch. It was not caught during testing because GCC was built with --with-mode=thumb. Correct patch attached. Best regards, Thomas On 22/09/16 14:49, Thomas Preudhomme wrote: Hi, ARMv6-M and ARMv8-M Baseline only support soft float ABI. Therefore, the arm_arch_v8m_base add option should pass -mfloat-abi=soft, much like -mthumb is passed for architectures that only support Thumb instruction set. This patch adds -mfloat-abi=soft to both arm_arch_v6m and arm_arch_v8m_base add options. Patch is in attachment. ChangeLog entry is as follows: *** gcc/testsuite/ChangeLog *** 2016-07-15 Thomas Preud'homme * lib/target-supports.exp (add_options_for_arm_arch_v6m): Add -mfloat-abi=soft option. (add_options_for_arm_arch_v8m_base): Likewise. Is this ok for trunk? Best regards, Thomas 6_softfloat_testing_v6m_v8m_baseline.patch diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 0dabea0850124947a7fe333e0b94c4077434f278..b5d72f1283be6a6e4736a1d20936e169c1384398 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -3540,24 +3540,25 @@ proc check_effective_target_arm_fp16_hw { } { # Usage: /* { dg-require-effective-target arm_arch_v5_ok } */ #/* { dg-add-options arm_arch_v5 } */ # /* { dg-require-effective-target arm_arch_v5_multilib } */ -foreach { armfunc armflag armdef } { v4 "-march=armv4 -marm" __ARM_ARCH_4__ - v4t "-march=armv4t" __ARM_ARCH_4T__ - v5 "-march=armv5 -marm" __ARM_ARCH_5__ - v5t "-march=armv5t" __ARM_ARCH_5T__ - v5te "-march=armv5te" __ARM_ARCH_5TE__ - v6 "-march=armv6" __ARM_ARCH_6__ - v6k "-march=armv6k" __ARM_ARCH_6K__ - v6t2 "-march=armv6t2" __ARM_ARCH_6T2__ - v6z "-march=armv6z" __ARM_ARCH_6Z__ - v6m "-march=armv6-m -mthumb" __ARM_ARCH_6M__ - v7a "-march=armv7-a" __ARM_ARCH_7A__ - v7r "-march=armv7-r" __ARM_ARCH_7R__ - v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__ - v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__ - v8a "-march=armv8-a" __ARM_ARCH_8A__ - v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ - v8m_base "-march=armv8-m.base -mthumb" __ARM_ARCH_8M_BASE__ - v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } { +foreach { armfunc armflag armdef } { +v4 "-march=armv4 -marm" __ARM_ARCH_4__ +v4t "-march=armv4t" __ARM_ARCH_4T__ +v5 "-march=armv5 -marm" __ARM_ARCH_5__ +v5t "-march=armv5t" __ARM_ARCH_5T__ +v5te "-march=armv5te" __ARM_ARCH_5TE__ +v6 "-march=armv6" __ARM_ARCH_6__ +v6k "-march=armv6k" __ARM_ARCH_6K__ +v6t2 "-march=armv6t2" __ARM_ARCH_6T2__ +v6z "-march=armv6z" __ARM_ARCH_6Z__ +v6m "-march=armv6-m -mthumb -mfloat-abi=soft" __ARM_ARCH_6M__ +v7a "-march=armv7-a" __ARM_ARCH_7A__ +v7r "-march=armv7-r" __ARM_ARCH_7R__ +v7m "-march=armv7-m -mthumb" __ARM_ARCH_7M__ +v7em "-march=armv7e-m -mthumb" __ARM_ARCH_7EM__ +v8a "-march=armv8-a" __ARM_ARCH_8A__ +v8_1a "-march=armv8.1a" __ARM_ARCH_8A__ +v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft" __ARM_ARCH_8M_BASE__ +v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__ } { eval [string map [list FUNC $armfunc FLAG $armflag DEF $armdef ] { proc check_effective_target_arm_arch_FUNC_ok { } { if { [ string match "*-marm*" "FLAG" ] && I think if you're going to do this you need to also check that changing the ABI in this way isn't incompatible with other aspects of how the user has invoked dejagnu. The reason this patch was made is that without it dg-require-effective-target arm_arch_v8m_base_ok evaluates to true for an arm-none-linux-gnueabihf toolchain but then any testcase containing a function for such a target (such as the atomic-op-* in gcc.target/arm) will error out because ARMv8-M Baseline does not support hard float ABI. I see 2 ways to fix this: 1) the approach taken in this patch, ie saying that to select ARMv8-M baseline architecture you need the right -march, -mthumb but also the right float ABI. Note that the comment at the top of that procedure says: # Creates a series of routines that return 1 if the given architecture # can be selected and a routine to give the flags to select that architecture 2) Add a function to the assembly that is used to test support for the architecture. The reason I favor t
Re: [PATCH, GCC/ARM] Make arm_feature_set agree with type of FL_* macros
On 17/11/16 09:10, Kyrill Tkachov wrote: On 16/11/16 18:09, Thomas Preudhomme wrote: Hi, I've rebased the patch to make arm_feature_set agree with type of FL_* macros on top of trunk rather than on top of the optional -mthumb patch. That involved doing the changes to gcc/config/arm/arm-protos.h rather than gcc/config/arm/arm-flags.h. I also took advantage of the fact that each line is changed to change the indentation to tabs and add dots in comments missing one. For reference, please find below the original patch description: Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array of 2 unsigned long. However, the flags stored in these two entries are (signed) int, being combinations of bits set via expression of the form 1 << bitno. This creates 3 issues: 1) undefined behavior when setting the msb (1 << 31) 2) undefined behavior when storing a flag with msb set (negative int) into one of the unsigned array entries (positive int) 3) waste of space since the top 32 bits of each entry is not used This patch changes the definition of FL_* macro to be unsigned int by using the form 1U << bitno instead and changes the definition of arm_feature_set to be an array of 2 unsigned (int) entries. *** gcc/ChangeLog *** 2016-10-15 Thomas Preud'homme * config/arm/arm-protos.h (FL_NONE, FL_ANY, FL_CO_PROC, FL_ARCH3M, FL_MODE26, FL_MODE32, FL_ARCH4, FL_ARCH5, FL_THUMB, FL_LDSCHED, FL_STRONG, FL_ARCH5E, FL_XSCALE, FL_ARCH6, FL_VFPV2, FL_WBUF, FL_ARCH6K, FL_THUMB2, FL_NOTM, FL_THUMB_DIV, FL_VFPV3, FL_NEON, FL_ARCH7EM, FL_ARCH7, FL_ARM_DIV, FL_ARCH8, FL_CRC32, FL_SMALLMUL, FL_NO_VOLATILE_CE, FL_IWMMXT, FL_IWMMXT2, FL_ARCH6KZ, FL2_ARCH8_1, FL2_ARCH8_2, FL2_FP16INST): Reindent comment, add final dot when missing and make value unsigned. (arm_feature_set): Use unsigned entries instead of unsigned long. Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state. Is this ok for trunk? Best regards, Thomas On 14/11/16 18:56, Thomas Preudhomme wrote: My apologize, I realized when trying to apply the patch that I wrote it on top of the optional -mthumb patch instead of the reverse. I'll rebase it to not screw up bisect. Best regards, Thomas On 14/11/16 14:47, Kyrill Tkachov wrote: On 14/11/16 14:07, Thomas Preudhomme wrote: Hi, Currently arm_feature_set is defined in gcc/config/arm/arm-flags as an array of 2 unsigned long. However, the flags stored in these two entries are (signed) int, being combinations of bits set via expression of the form 1 << bitno. This creates 3 issues: 1) undefined behavior when setting the msb (1 << 31) 2) undefined behavior when storing a flag with msb set (negative int) into one of the unsigned array entries (positive int) 3) waste of space since the top 32 bits of each entry is not used This patch changes the definition of FL_* macro to be unsigned int by using the form 1U << bitno instead and changes the definition of arm_feature_set to be an array of 2 unsigned (int) entries. Bootstrapped on arm-linux-gnueabihf targeting Thumb-2 state. Is this ok for trunk? Ok. Thanks, Kyrill Best regards, Thomas +#define FL_ARCH7 (1U << 22)/* Architecture 7. */ +#define FL_ARM_DIV(1U << 23)/* Hardware divide (ARM mode). */ +#define FL_ARCH8 (1U << 24)/* Architecture 8. */ +#define FL_CRC32 (1U << 25)/* ARMv8 CRC32 instructions. */ +#define FL_SMALLMUL (1U << 26)/* Small multiply supported. */ +#define FL_NO_VOLATILE_CE (1U << 27)/* No volatile memory in IT block. */ + +#define FL_IWMMXT (1U << 29)/* XScale v2 or "Intel Wireless MMX + technology". */ +#define FL_IWMMXT2(1U << 30)/* "Intel Wireless MMX2 +technology". */ +#define FL_ARCH6KZ(1U << 31)/* ARMv6KZ architecture. */ + +#define FL2_ARCH8_1 (1U << 0)/* Architecture 8.1. */ +#define FL2_ARCH8_2 (1U << 1)/* Architecture 8.2. */ +#define FL2_FP16INST (1U << 2)/* FP16 Instructions for ARMv8.2 and + later. */ Since you're here can you please replace the "Architecture " by "ARMv(7,8,8.1,8.2)-A" in these entries. That seems to make it less accurate though. For a start, most of them would also apply to the R profile. There is also a couple of them that would apply to the M profile: for instance FL_ARCH7 would be set for ARMv7-M and FL_ARCH8 would be set for ARMv8-M. Best regards, Thomas
Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")
OK. On Thu, Nov 17, 2016 at 3:25 PM, Paolo Carlini wrote: > Hi, > > On 17/11/2016 18:14, Jason Merrill wrote: >> >> How about changing cp_parser_non_integral_constant_expression to issue >> a pedwarn instead of error for NIC_FLOAT? > > Indeed. And a relevant observation is that we don't pass NIC_FLOAT from > anywhere else in the parser, please correct me if I'm wrong. The below is > finishing testing, all good so far. > > Thanks, > Paolo. > >
Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")
Hi, On 17/11/2016 18:14, Jason Merrill wrote: How about changing cp_parser_non_integral_constant_expression to issue a pedwarn instead of error for NIC_FLOAT? Indeed. And a relevant observation is that we don't pass NIC_FLOAT from anywhere else in the parser, please correct me if I'm wrong. The below is finishing testing, all good so far. Thanks, Paolo. Index: cp/parser.c === --- cp/parser.c (revision 242540) +++ cp/parser.c (working copy) @@ -3028,8 +3028,9 @@ cp_parser_non_integral_constant_expression (cp_par switch (thing) { case NIC_FLOAT: - error ("floating-point literal " - "cannot appear in a constant-expression"); + pedwarn (input_location, OPT_Wpedantic, +"ISO C++ forbids using a floating-point literal " +"in a constant-expression"); return true; case NIC_CAST: error ("a cast to a type other than an integral or " Index: testsuite/g++.dg/parse/pr55080.C === --- testsuite/g++.dg/parse/pr55080.C(revision 0) +++ testsuite/g++.dg/parse/pr55080.C(working copy) @@ -0,0 +1,6 @@ +// PR c++/55080 +// { dg-options "-std=c++98 -pedantic" } + +class B { + static const int c = 3.1415926; // { dg-warning "constant-expression" } +};
Re: [PATCH, GCC/ARM] Fix PR77933: stack corruption on ARM when using high registers and lr
Hi Kyrill, I've committed the following updated patch where the test is restricted to Thumb execution mode and skipping it if not possible since -mtpcs-leaf-frame is only available in Thumb mode. I've considered the change obvious. *** gcc/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr being live in the function and lr needing to be saved. Distinguish between already saved pushable registers and registers to push. Check for LR being an available pushable register. *** gcc/testsuite/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * gcc.target/arm/pr77933-1.c: New test. * gcc.target/arm/pr77933-2.c: Likewise. Best regards, Thomas On 17/11/16 10:04, Kyrill Tkachov wrote: On 09/11/16 16:41, Thomas Preudhomme wrote: I've reworked the patch following comments from Wilco [1] (sorry could not find it in my MUA for some reason). [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00317.html == Context == When saving registers, function thumb1_expand_prologue () aims at minimizing the number of push instructions. One of the optimization it does is to push LR alongside high register(s) (after having moved them to low register(s)) when there is no low register to save. The way this is implemented is to add LR to the pushable_regs mask if it is live just before pushing the registers in that mask. The mask of live pushable registers which is used to detect whether LR needs to be saved is then clear to ensure LR is only saved once. == Problem == However beyond deciding what register to push pushable_regs is used to track what pushable register can be used to move a high register before being pushed, hence the name. That mask is cleared when all high registers have been assigned a low register but the clearing assumes the high registers were assigned to the registers with the biggest number in that mask. This is not the case because LR is not considered when looking for a register in that mask. Furthermore, LR might have been saved in the TARGET_BACKTRACE path above yet the mask of live pushable registers is not cleared in that case. == Solution == This patch changes the loop to iterate over register LR to r0 so as to both fix the stack corruption reported in PR77933 and reuse lr to push some high register when possible. This patch also introduce a new variable lr_needs_saving to record whether LR (still) needs to be saved at a given point in code and sets the variable accordingly throughout the code, thus fixing the second issue. Finally, this patch create a new push_mask variable to distinguish between the mask of registers to push and the mask of live pushable registers. == Note == Other bits could have been improved but have been left out to allow the patch to be backported to stable branch: (1) using argument registers that are not holding an argument (2) using push_mask consistently instead of l_mask (in TARGET_BACKTRACE), mask (low register push) and push_mask (3) the !l_mask case improved in TARGET_BACKTRACE since offset == 0 (4) rename l_mask to a more appropriate name (live_pushable_regs_mask?) ChangeLog entry are as follow: *** gcc/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * config/arm/arm.c (thumb1_expand_prologue): Distinguish between lr being live in the function and lr needing to be saved. Distinguish between already saved pushable registers and registers to push. Check for LR being an available pushable register. *** gcc/testsuite/ChangeLog *** 2016-11-08 Thomas Preud'homme PR target/77933 * gcc.target/arm/pr77933-1.c: New test. * gcc.target/arm/pr77933-2.c: Likewise. Testing: no regression on arm-none-eabi GCC cross-compiler targeting Cortex-M0 Is this ok for trunk? Ok. Thanks, Kyrill Best regards, Thomas On 02/11/16 17:08, Thomas Preudhomme wrote: Hi, When saving registers, function thumb1_expand_prologue () aims at minimizing the number of push instructions. One of the optimization it does is to push lr alongside high register(s) (after having moved them to low register(s)) when there is no low register to save. The way this is implemented is to add lr to the list of registers that can be pushed just before the push happens. This would then push lr and allows it to be used for further push if there was not enough registers to push all high registers to be pushed. However, the logic that decides what register to move high registers to before being pushed only looks at low registers (see for loop initialization). This means not only that lr is not used for pushing high registers but also that lr is not removed from the list of registers to be pushed when it's not used. This extra lr push is not poped in epilogue leading in stack corruption. This patch changes the loop to iterate over register r0 to lr so as
Re: [PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode
Hi Christophe, On 17/11/16 13:36, Christophe Lyon wrote: On 16 November 2016 at 10:39, Kyrill Tkachov wrote: On 09/11/16 16:19, Thomas Preudhomme wrote: Hi, This patch fixes the following ICE when building when compiling an empty FIQ interrupt handler in ARM mode: empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints: } ^ (insn/f 13 12 14 (set (reg/f:SI 13 sp) (plus:SI (reg/f:SI 11 fp) (const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3} (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp) (plus:SI (reg/f:SI 11 fp) (const_int 4 [0x4]))) (nil))) The ICE was provoked by missing an alternative to reflect that ARM mode can do an add of general register into sp which is unpredictable in Thumb mode add immediate. ChangeLog entries are as follow: *** gcc/ChangeLog *** 2016-11-04 Thomas Preud'homme * config/arm/arm.md (arm_addsi3): Add alternative for addition of general register with general register or ARM constant into SP register. *** gcc/testsuite/ChangeLog *** 2016-11-04 Thomas Preud'homme * gcc.target/arm/empty_fiq_handler.c: New test. Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression. Is this ok for trunk? I see that "r" does not include the stack pointer (STACK_REG is separate from GENERAL_REGs) so we are indeed missing that constraint. Ok for trunk. Thanks, Kyrill Best regards, Thomas Hi Thomas, The new test fails when compiled with -mthumb -march=armv5t: gcc.target/arm/empty_fiq_handler.c: In function 'fiq_handler': gcc.target/arm/empty_fiq_handler.c:11:1: error: interrupt Service Routines cannot be coded in Thumb mode Right, interrupt handler can only be compiled in the mode where the CPU boots. So for non Thumb-only targets it should be compiled with -marm. I'll push a patch tomorrow. Best regards, Thomas
Re: [Patch 3/5] OpenACC tile clause support, C/C++ front-end parts
On 11/17/2016 04:34 AM, Chung-Lin Tang wrote: + /* These clauses really need a positive integral constant expression, +but we don't have a predicate for that (yet). */ What do you mean? Don't you check this in finish_omp_clauses after the tsubst? Jason
Re: [PATCH] PR target/78101, Fix PowerPC power9-fusion support
Hi! On Thu, Nov 17, 2016 at 01:55:25PM -0500, Michael Meissner wrote: > Are these patches ok to change into the trunk? Since the bug shows up in GCC > 6.x, can I apply and submit these patches to the GCC 6.x branch? This is okay for trunk. Please watch that for regressions for a week or so before backporting (unless it is urgent for 6?) One question... > +case DFmode: > + if ((!TARGET_POWERPC64 && !TARGET_DF_FPR) || !TARGET_P9_FUSION) > + return 0; > + break; > case DFmode: > - if (!TARGET_DF_FPR) > + if (!TARGET_POWERPC64 && !TARGET_DF_FPR) > return 0; >break; These conditions aren't obvious, could you add a comment please? Segher
[committed] strncmp builtin expansion improvement
Committed to trunk as 242556 after removing the use->clobber change from cmpstrnsi and bootstrap/regtest. gcc/ChangeLog 2016-11-17 Aaron Sawdey * config/i386/i386.md (cmpstrnsi): New test to bail out if neither string input is a string constant. 2016-11-17 Aaron Sawdey * builtins.c (expand_builtin_strncmp): Attempt expansion of strncmp via cmpstrnsi even if neither string is constant. -- Aaron Sawdey, Ph.D. acsaw...@linux.vnet.ibm.com 050-2/C113 (507) 253-7520 home: 507/263-0782 IBM Linux Technology Center - PPC Toolchain
[PATCH] PR target/78101, Fix PowerPC power9-fusion support
This patch fixes the problem reported in PR 78101 where the power9-fusion support generates an insn that isn't matched: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101 It also fixes the bug that Andrew Stubbs reported. https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01367.html There were two bugs in the code: 1) The fusion peephole for SFmode and DFmode was inconsistant about whether those values were allowed in GPRs if software floating point is used. 2) The power9 fusion store insn had an early clobber in the match_scratch, which prevented having an address that uses a register, does ADDIS to add to the upper 16 bits, and then folds the lower ADDI into the store operation would not work if the address used the scratch register. In addition to the two bugs, the fusion code had been written much earlier than the support for the new ISA 3.0 scalar d-form (register+offset) instructions, and the fusion code did not match these types of stores. I have fixed this, so that those memory references can also be fused. I have bootstraped the compiler with these changes and there were no regressions on the following systems: 1) Little endian power8 2) Big endian power8 (no support for 32-bit libraries) 3) Big endian power7 (support for 32-bit libraries) I have also built and ran spec 2006 CPU tests with this option enabled, and they run fine, with some minor performance changes on power8 using power9 fusion. I have built the cam4_r and cam4_s benchmarks of the next generation Spec (kit 102) and they now compile fine with -mpower9-fusion (they were the source of the bug 78101). Are these patches ok to change into the trunk? Since the bug shows up in GCC 6.x, can I apply and submit these patches to the GCC 6.x branch? [gcc] 2016-11-17 Michael Meissner PR target/78101 * config/rs6000/predicates.md (fusion_addis_mem_combo_load): Add the appropriate checks for SFmode/DFmode load/stores in GPR registers. (fusion_addis_mem_combo_store): Likewise. * config/rs6000/rs6000.c (rs6000_init_hard_regno_mode_ok): Rename fusion_fpr_* to fusion_vsx_* and add in support for ISA 3.0 scalar d-form instructions for traditional Altivec registers. (emit_fusion_p9_load): Likewise. (emit_fusion_p9_store): Likewise. * config/rs6000/rs6000.md (p9 fusion store peephole2): Remove early clobber from scratch register. Do not match if the register being stored is the scratch register. (fusion_vsx___load): Rename fusion_fpr_* to fusion_vsx_* and add in support for ISA 3.0 scalar d-form instructions for traditional Altivec registers. (fusion_fpr___load): Likewise. (fusion_vsx___store): Likewise. (fusion_fpr___store): Likewise. [gcc/testsuite] 2016-11-17 Michael Meissner PR target/78101 * gcc.target/powerpc/fusion4.c: New test. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revision 242456) +++ gcc/config/rs6000/predicates.md (.../gcc/config/rs6000) (working copy) @@ -1844,7 +1844,7 @@ (define_predicate "fusion_gpr_mem_load" ;; Match a GPR load (lbz, lhz, lwz, ld) that uses a combined address in the ;; memory field with both the addis and the memory offset. Sign extension ;; is not handled here, since lha and lwa are not fused. -;; With extended fusion, also match a FPR load (lfd, lfs) and float_extend +;; With P9 fusion, also match a fpr/vector load and float_extend (define_predicate "fusion_addis_mem_combo_load" (match_code "mem,zero_extend,float_extend") { @@ -1873,11 +1873,15 @@ (define_predicate "fusion_addis_mem_comb break; case SFmode: -case DFmode: if (!TARGET_P9_FUSION) return 0; break; +case DFmode: + if ((!TARGET_POWERPC64 && !TARGET_DF_FPR) || !TARGET_P9_FUSION) + return 0; + break; + default: return 0; } @@ -1920,6 +1924,7 @@ (define_predicate "fusion_addis_mem_comb case QImode: case HImode: case SImode: +case SFmode: break; case DImode: @@ -1927,13 +1932,8 @@ (define_predicate "fusion_addis_mem_comb return 0; break; -case SFmode: - if (!TARGET_SF_FPR) - return 0; - break; - case DFmode: - if (!TARGET_DF_FPR) + if (!TARGET_POWERPC64 && !TARGET_DF_FPR) return 0; break; Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000) (revis
Re: [PATCH] avoid calling alloca(0)
On Thu, Nov 17, 2016 at 11:14:26AM -0700, Martin Sebor wrote: > --- a/gcc/fortran/interface.c > +++ b/gcc/fortran/interface.c > @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, > gfc_formal_arglist *formal, >for (f = formal; f; f = f->next) > n++; > > - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); > + /* Take care not to call alloca with a zero argument. */ > + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); > >for (i = 0; i < n; i++) > new_arg[i] = NULL; Ugh, that is just too ugly. I don't see anything wrong on alloca (0), and we don't rely on those pointers being distinct from other pointers. Jakub
[PATCH] avoid calling alloca(0)
Testing my patch to warn on overflow in calls to allocation functions (bugs 77531 and 78284): https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01672.html exposed a number of instances of alloca(0) calls in GCC sources. Besides this patch which enables the new warnings with -Wextra (the patch is still in the review queue) the same code also triggers a -Walloca-larger-than warning (which has to be explicitly enabled). It doesn't look to me like these calls are bugs per se but in the interest of eating our own dog food so to speak and being able to compile GCC with -Walloca-larger-than (and not to have to disable the yet-to-be committed -Walloc-zero warning) the attached patch makes sure that the alloca calls that are subject to the warnings always request at least 1 byte. Thanks Martin PS Alloca(0) is not necessarily undefined but the pointer the call returns need not be distinct from other pointers returned from other such calls in the same function (and there are compilers that do not make it distinct). Relying on the pointer being distinct could be a source of subtle bugs, hence the warning. gcc/fortran/ChangeLog: * interface.c (compare_actual_formal): Avoid calling alloca with an argument of zero. * symbol.c (do_traverse_symtree): Same. * trans-intrinsic.c (gfc_conv_intrinsic_ishftc): Same. gcc/ChangeLog: * reg-stack.c (subst_asm_stack_regs): Same. * tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Same. diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c index a0cb0bb..2da51d7 100644 --- a/gcc/fortran/interface.c +++ b/gcc/fortran/interface.c @@ -2821,7 +2821,8 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal, for (f = formal; f; f = f->next) n++; - new_arg = XALLOCAVEC (gfc_actual_arglist *, n); + /* Take care not to call alloca with a zero argument. */ + new_arg = XALLOCAVEC (gfc_actual_arglist *, n + !n); for (i = 0; i < n; i++) new_arg[i] = NULL; diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c index 0b711ca..cf403c4 100644 --- a/gcc/fortran/symbol.c +++ b/gcc/fortran/symbol.c @@ -3979,7 +3979,8 @@ do_traverse_symtree (gfc_symtree *st, void (*st_func) (gfc_symtree *), gcc_assert ((st_func && !sym_func) || (!st_func && sym_func)); nodes = count_st_nodes (st); - st_vec = XALLOCAVEC (gfc_symtree *, nodes); + /* Take care not to call alloca with a zero argument. */ + st_vec = XALLOCAVEC (gfc_symtree *, nodes + !nodes); node_cntr = 0; fill_st_vector (st, st_vec, node_cntr); diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c index 463bb58..e4715f8 100644 --- a/gcc/fortran/trans-intrinsic.c +++ b/gcc/fortran/trans-intrinsic.c @@ -5525,7 +5525,8 @@ gfc_conv_intrinsic_ishftc (gfc_se * se, gfc_expr * expr) unsigned int num_args; num_args = gfc_intrinsic_argument_list_length (expr); - args = XALLOCAVEC (tree, num_args); + /* Take care not to call alloca with a zero argument. */ + args = XALLOCAVEC (tree, num_args + !num_args); gfc_conv_intrinsic_function_args (se, expr, args, num_args); diff --git a/gcc/reg-stack.c b/gcc/reg-stack.c index 4e86fa9..7da5a5f 100644 --- a/gcc/reg-stack.c +++ b/gcc/reg-stack.c @@ -2050,9 +2050,10 @@ subst_asm_stack_regs (rtx_insn *insn, stack_ptr regstack) for (i = 0, note = REG_NOTES (insn); note; note = XEXP (note, 1)) i++; - note_reg = XALLOCAVEC (rtx, i); - note_loc = XALLOCAVEC (rtx *, i); - note_kind = XALLOCAVEC (enum reg_note, i); + /* Take care not to call alloca with a zero argument. */ + note_reg = XALLOCAVEC (rtx, i + !i); + note_loc = XALLOCAVEC (rtx *, i + !i); + note_kind = XALLOCAVEC (enum reg_note, i + !i); n_notes = 0; for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c index 534292c..0564e69 100644 --- a/gcc/tree-ssa-threadedge.c +++ b/gcc/tree-ssa-threadedge.c @@ -340,7 +340,8 @@ record_temporary_equivalences_from_stmts_at_dest (edge e, unsigned int num, i = 0; num = NUM_SSA_OPERANDS (stmt, SSA_OP_ALL_USES); - copy = XALLOCAVEC (tree, num); + /* Take care not to call alloca with a zero argument. */ + copy = XALLOCAVEC (tree, num + !num); /* Make a copy of the uses & vuses into USES_COPY, then cprop into the operands. */
Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.
On 11/16/2016 03:12 PM, Andrew Burgess wrote: * Mike Stump [2016-11-16 12:59:53 -0800]: On Nov 16, 2016, at 12:09 PM, Andrew Burgess wrote: My only remaining concern is the new tests, I've tried to restrict them to targets that I suspect they'll pass on with: /* { dg-final-use { scan-assembler "\.section\[\t \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target *-*-linux* *-*-gnu* } } } */ but I'm still nervous that I'm going to introduce test failures. Is there any advice / guidance I should follow before I commit, or are folk pretty relaxed so long as I've made a reasonable effort? So, if you are worried about the way the line is constructed, I usually test it by misspelling the *-*-linux* *-*-gnu* part as *-*-linNOTux* *-*-gnNOTu* and see if the test then doesn't run on your machine. If it doesn't then you can be pretty confident that only machines that match the target triplet can be impacted. I usually do this type of testing by running the test case in isolation (not the full tests suite). Anyway, do the best you can, and don't worry about t it too much, learn from the experience, even if it goes wrong in some way. If it did go wrong, just be responsive (don't check it in just before a 6 week vacation) about fixing it, if you can. Thanks for the feedback. Change committed as revision 242519. If anyone sees any issues just let me know. Thanks. And thanks for seeing this through... I know it's a pain sometimes. jeff
Re: Default associative containers constructors/destructor/assignment
On 28/10/16 21:42 +0200, François Dumont wrote: /** * @brief %Map move constructor. - * @param __x A %map of identical element and allocator types. * - * The newly-created %map contains the exact contents of @a __x. - * The contents of @a __x are a valid, but unspecified %map. + * The newly-created %map contains the exact contents of the moved + * instance. The moved instance is a valid, but unspecified, %map. This comment isn't true, because non-propagating or non-equal allocators can force elements to be copied, but that's unrelated to your patch. There are no problems with the changes to the map and set containers, but I have some comments on the _Rb_tree changes. Overall I like the change. diff --git a/libstdc++-v3/include/bits/stl_tree.h b/libstdc++-v3/include/bits/stl_tree.h index 0bc5f4b..126087e 100644 --- a/libstdc++-v3/include/bits/stl_tree.h +++ b/libstdc++-v3/include/bits/stl_tree.h @@ -137,6 +137,81 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } }; + // Helper type offering value initialization guaranty on the compare functor. Spelling: "guarantee" + template +struct _Rb_tree_key_compare +{ + _Key_compare _M_key_compare; + + _Rb_tree_key_compare() + _GLIBCXX_NOEXCEPT_IF( + is_nothrow_default_constructible<_Key_compare>::value) + : _M_key_compare() + { } + + _Rb_tree_key_compare(const _Key_compare& __comp) + : _M_key_compare(__comp) + { } + +#if __cplusplus >= 201103L + _Rb_tree_key_compare(_Rb_tree_key_compare&& __x) + noexcept(is_nothrow_copy_constructible<_Key_compare>::value) + : _M_key_compare(__x._M_key_compare) + { } +#endif This constructor makes the type move-only (i.e. non-copyable) in C++11 and later. It's copyable in C++98. Is that what you want? Adding defaulted copy operations would fix that. Even if we don't actually need those copy operations, I'm uncomfortable with it being copyable in C++98 and non-copyable otherwise. +void +_M_reset() +{ + _M_initialize(); + _M_node_count = 0; +} This introduces a small change in behaviour, because _M_reset() now does _M_header._M_color = _S_red, which it didn't do before. That store is redundant. It could be avoided by just doing the three assignments in _M_reset() instead of calling _M_initialize(). And we could remove _M_initialize() entirely, and remove the redundant mem-initializers for _M_node_count (because it's set my _M_reset and _M_move_data anyway): _Rb_tree_header() _GLIBCXX_NOEXCEPT { _M_reset(); _M_header._M_color = _S_red; } #if __cplusplus >= 201103L _Rb_tree_header(_Rb_tree_header&& __x) noexcept { if (__x._M_header._M_parent != nullptr) _M_move_data(__x); else { _M_reset(); _M_header._M_color = _S_red; } } void _M_move_data(_Rb_tree_header& __x) { _M_header._M_parent = __x._M_header._M_parent; _M_header._M_left = __x._M_header._M_left; _M_header._M_right = __x._M_header._M_right; _M_header._M_parent->_M_parent = &_M_header; _M_node_count = __x._M_node_count; __x._M_reset(); } #endif void _M_reset() { _M_header._M_parent = 0; _M_header._M_left = &_M_header; _M_header._M_right = &_M_header; _M_node_count = 0; } @@ -599,50 +674,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Unused _Is_pod_comparator is kept as it is part of mangled name. template -struct _Rb_tree_impl : public _Node_allocator +struct _Rb_tree_impl + : public _Node_allocator + , public _Rb_tree_key_compare<_Key_compare> + , public _Rb_tree_header Do these need to be base classes, rather than data members? We derive from the allocator to benefit from the empty base-class optimization, but that isn't relevant for the _Rb_tree_key_compare and _Rb_tree_header types. It *could* be relevant for the comparison function, but would be an ABI change. We could do that ABI change conditionally, for gnu-versioned-namespace builds, but that's still possible using data members (e.g. have a data member that derives from the comparison function and contains the header node and/or node count members). Making them data members would simply mean restoring the function _Rb_tree_impl::_M_reset() and making it call reset on the member: void _M_reset() { _M_header._M_reset(); } The minor convenience of inheriting this function from a base class doesn't seem worth the stronger coupling that comes from using inheritance. Am I missing some other reason that inheritance is used here? - _Rb_tree_impl(const _Key_compare& __comp, const _Node_allocator& __a) - : _Node_allocator(__a), _M_key_compare(__comp), _M_header(), - _M_node_count(0) - { _M_initialize(); } Please mention the removal of this constructor in the changelog. @@ -1534,19 +1583,7 @@ _GLIBC
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On Thu, Nov 17, 2016 at 04:50:46PM +, Kyrill Tkachov wrote: > >>Is loading/storing a pair as cheap as loading/storing a single register? > >>In that case you could shrink-wrap per pair of registers instead. > > > >I suppose it can vary by microarchitecture. For the purposes of codegen > >I'd say > >it's more expensive than load/storing a single register (as there's more > >memory bandwidth required after all) > >but cheaper than two separate loads stores (alignment quirks > >notwithstanding). > >Interesting idea. That could help with code size too. I'll try it out. > > I'm encountering some difficulties implementing this idea. > I want to still keep the per-register structures across the hooks but > basically restrict the number > of components in a basic block to an even number of FPRs and GPRs. I tried > doing this in COMPONENTS_FOR_BB So your COMPONENTS_FOR_BB returns both components in a pair whenever one of those is needed? That should work afaics. > but apparently this ended up not saving/restoring some of the registers at > all because the components that were > "filtered out" that way still made their way to the bitmap passed into > SET_HANDLED_COMPONENTS and so the normal > prologue/epilogue didn't end up saving and restoring them. I am not sure what this means? "filtered out"? Segher
Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")
How about changing cp_parser_non_integral_constant_expression to issue a pedwarn instead of error for NIC_FLOAT? Jason
Re: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)
On 11/10/2016 10:00 AM, Jakub Jelinek wrote: Hi! On arm/aarch64 we ICE because some decls that make it here has non-NULL DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that doesn't fit into shwi would ICE similarly). While it is arguably a FE bug that it creates for VLA initialization from STRING_CST such a decl, I believe we have some PRs about it already open. I think it won't hurt to check for the large sizes properly even in this function though, and punt on unexpected cases, or even extremely large ones. Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going to test on arm and/or aarch64. Ok for trunk if the testing there passes too? 2016-11-10 Jakub Jelinek PR middle-end/78201 * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo. Don't test decl != NULL. Don't look at DECL_SIZE, but DECL_SIZE_UNIT instead, return false if it is NULL, or doesn't fit into uhwi, or is larger or equal to targetm.max_anchor_offset. * g++.dg/opt/pr78201.C: New test. OK. jeff
Re: Import libcilkrts Build 4467 (PR target/68945)
On 11/17/2016 09:56 AM, Ilya Verbin wrote: 2016-11-17 18:50 GMT+03:00 Rainer Orth : Rainer Orth writes: I happened to notice that my libcilkrts SPARC port has been applied upstream. So to reach closure on this issue for the GCC 7 release, I'd like to import upstream into mainline which seems to be covered by the free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even though https://gcc.gnu.org/codingconventions.html#upstream lists nothing specific and we have no listed maintainer. I initially used Ilya's intel.com address, which bounced. Now using the current address listed in MAINTAINERS... Yeah, I don't work for Intel anymore. And I'm not a libcilkrts maintainer, so I can't approve it. Do you want to be? IIRC I was going to nominate you, but held off knowing your situation was going to change. If you're interested in maintainer positions, I can certainly still nominate you. jeff
Re: Import libcilkrts Build 4467 (PR target/68945)
2016-11-17 18:50 GMT+03:00 Rainer Orth : > Rainer Orth writes: > >> I happened to notice that my libcilkrts SPARC port has been applied >> upstream. So to reach closure on this issue for the GCC 7 release, I'd >> like to import upstream into mainline which seems to be covered by the >> free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even >> though https://gcc.gnu.org/codingconventions.html#upstream lists nothing >> specific and we have no listed maintainer. > > I initially used Ilya's intel.com address, which bounced. Now using the > current address listed in MAINTAINERS... Yeah, I don't work for Intel anymore. And I'm not a libcilkrts maintainer, so I can't approve it. -- Ilya >> The following patch has passed x86_64-pc-linux-gnu bootstrap without >> regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps >> are currently running. >> >> Ok for mainline if they pass? > > Both Solaris bootstraps have completed successfully now. > > Rainer > > -- > - > Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [fixincludes, v3] Don't define libstdc++-internal macros in Solaris 10+
On 03/11/16 15:11 +0100, Rainer Orth wrote: Fortunately, this is all easily fixed by wrapping the affected templates in a new macro. That's what this patch does. The new libstdc++ acinclude.m4 test may well need wording changes in comments etc. and can perhaps be shortened a bit, bit it worked for me. See below. The fixincludes part passes make check. The patch has been regtested (C++ only) on * Solaris 11.2 SRU 13.6 (i.e. without both overloads and templates in ), * Solaris 11.3 SRU 3.6 (i.e. with overloads only), and * Solaris 12 Build 110 (i.e. with both overloads and templates, and the _GLIBCXX_USE_C99_MATH #undef's) * Linux/x86_64 I've checked that __CORRECT_ISO_CPP11_MATH_H_PROTO[12] were defined in config.h as expected, and there were no testsuite regressions. On the contrary, a couple of tests that before showed up as UNSUPPORTED due to the _GLIBCXX_USE_C99_MATH* #undef's now PASS as they should. Ok for mainline now, and for backports to the gcc-6 and gcc-5 branches after some soak time? The change is OK in principle, but I'd prefer more meaningful names for the macros ... --- a/libstdc++-v3/acinclude.m4 +++ b/libstdc++-v3/acinclude.m4 @@ -2181,7 +2181,8 @@ AC_DEFUN([GLIBCXX_CHECK_STDIO_PROTO], [ ]) dnl -dnl Check whether required C++11 overloads are present in . +dnl Check whether required C++11 overloads and templates are present +dnl in . The standard doesn't actually require templates to be used here, it only requires "sufficient additional overloads" so that integral arguments work. We happen to do that with templates, and apparently so do the Solaris headers, but other implementations are possible. So a more accurate description would be: dnl Check whether required C++11 overloads for FP and integral types dnl are present in . And rather than PROTO1 and PROTO2 the macros could be called PROTO_FP for the overloads for FP types, and PROTO_INT (or just PROTO_I) for the overloads for integral types (which happen to be templates in our header). + # Solaris 12 Build 90, Solaris 11.3 SRU 5.6, and Solaris 10 Patch + # 11996[67]-02 introduced the C++11 templates. + AC_MSG_CHECKING([for C++11 templates]) + AC_CACHE_VAL(glibcxx_cv_math11_template, [ + AC_COMPILE_IFELSE([AC_LANG_SOURCE( + [#include + namespace std { +struct __true_type { }; +struct __false_type { }; We don't need these structs. +template + struct __is_integer + { +enum { __value = 0 }; +typedef __false_type __type; + }; We don't need a definition for the primary template, this will work: template struct __is_integer; +template<> + struct __is_integer + { +enum { __value = 1 }; +typedef __true_type __type; The __type typedef is not needed here, just __value. + }; + } + namespace __gnu_cxx { +template + struct __enable_if + { }; Again, no need to define the primary template: template struct __enable_if; +template + struct __enable_if + { typedef _Tp __type; }; + } The suggestions above would make it shorter, while still remaining accurate to what the real code in the header does. It could be reduced even further without altering the meaning for the purposes of this test: + namespace std { +template + constexpr typename __gnu_cxx::__enable_if +<__is_integer<_Tp>::__value, double>::__type + log2(_Tp __x) + { return __builtin_log2(__x); } template struct __enable_if_int; template<> struct __enable_if_int { typedef double __type; }; template constexpr typename __enable_if_int<_Tp>::__type log2(_Tp __x) { return __builtin_log2(__x); } I don't mind whether you go with this or just remove the unnecessary structs, typedef and primary template bodies. Either is OK. + } + int + main (void) + { +int i = 1000; +return std::log2(i); + }
Re: [PR target/78213] Do not ICE on non-empty -fself-test
On 11/14/2016 08:18 AM, Bernd Schmidt wrote: On 11/11/2016 06:10 PM, Aldy Hernandez wrote: The problem in this PR is that -fself-test is being run on a non empty source file. This causes init_emit() to run, which sets: REG_POINTER (virtual_incoming_args_rtx) = 1; Setting REG_POINTER on the virtual incoming args, causes /f to be printed on some RTL dumps, causing the -fself-test machinery to fail at matching the expected value. How about always running init_emit and testing for the correct output? I would prefer Jakub's suggestion of running in finish_options(). I assume there are other places throughout the self-tests that depend on NOT continuing the compilation process, and I'd hate to plug each one. Would the attached patch be acceptable to both of you? Aldy commit 52b906160a109b84a26f5aa15a653f4aee612942 Author: Aldy Hernandez Date: Fri Nov 11 06:32:08 2016 -0800 PR target/78213 * opts.c (finish_options): Set -fsyntax-only if running self tests. diff --git a/gcc/opts.c b/gcc/opts.c index d2d6100..cb20154 100644 --- a/gcc/opts.c +++ b/gcc/opts.c @@ -744,6 +744,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, opts->x_flag_toplevel_reorder = 0; } + /* -fself-test depends on the state of the compiler prior to + compiling anything. Ideally it should be run on an empty source + file. However, in case we get run with actual source, assume + -fsyntax-only which will inhibit any compiler initialization + which may confuse the self tests. */ + if (opts->x_flag_self_test) +opts->x_flag_syntax_only = 1; + if (opts->x_flag_tm && opts->x_flag_non_call_exceptions) sorry ("transactional memory is not supported with non-call exceptions"); diff --git a/gcc/testsuite/gcc.dg/pr78213.c b/gcc/testsuite/gcc.dg/pr78213.c new file mode 100644 index 000..e43c83c --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr78213.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-fself-test" } */ + +/* Verify that -fself-test does not fail on a non empty source. */ + +int i; void bar(); void foo() +{ + while (i--) +bar(); +} +/* { dg-message "fself\-test: " "-fself-test" { target *-*-* } 0 } */
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 17/11/16 14:55, Kyrill Tkachov wrote: On 17/11/16 14:44, Segher Boessenkool wrote: Hi Kyrill, On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote: I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were some interesting swings. 458.sjeng +1.45% 471.omnetpp +2.19% 445.gobmk -2.01% On SPECFP: 453.povray+7.00% After looking at the gobmk performance with performance counters it looks like more icache pressure. I see an increase in misses. This looks to me like an effect of code size increase, though it is not that large an increase (0.4% with SWS). Right. I don't see how to improve on this (but see below); ideas welcome :-) Branch mispredicts also go up a bit but not as much as icache misses. I don't see that happening -- for some testcases we get unlucky and have more branch predictor aliasing, and for some we have less, it's pretty random. Some testcases are really sensitive to this. Right, I don't think it's the branch prediction at fault in this case, rather the above icache stuff. I don't think there's anything we can do here, or at least that this patch can do about it. Overall, there's a slight improvement in SPECINT, even with the gobmk regression and a slightly larger improvement on SPECFP due to povray. And that is for only the "normal" GPRs, not LR or FP yet, right? This patch does implement FP registers wrapping as well but not LR. Though I remember seeing the improvement even when only GPRs were wrapped in an earlier version of the patch. Segher, one curious artifact I spotted while looking at codegen differences in gobmk was a case where we fail to emit load-pairs as effectively in the epilogue and its preceeding basic block. So before we had this epilogue: .L43: ldpx21, x22, [sp, 16] ldpx23, x24, [sp, 32] ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret and I see this becoming (among numerous other changes in the function): .L69: ldpx21, x22, [sp, 16] ldrx24, [sp, 40] .L43: ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx23, [sp, 32] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret So this is better in the cases where we jump straight into .L43 because we load fewer registers but worse when we jump to or fallthrough to .L69 because x23 and x24 are now restored using two loads rather than a single load-pair. This hunk isn't critical to performance in gobmk though. Is loading/storing a pair as cheap as loading/storing a single register? In that case you could shrink-wrap per pair of registers instead. I suppose it can vary by microarchitecture. For the purposes of codegen I'd say it's more expensive than load/storing a single register (as there's more memory bandwidth required after all) but cheaper than two separate loads stores (alignment quirks notwithstanding). Interesting idea. That could help with code size too. I'll try it out. I'm encountering some difficulties implementing this idea. I want to still keep the per-register structures across the hooks but basically restrict the number of components in a basic block to an even number of FPRs and GPRs. I tried doing this in COMPONENTS_FOR_BB but apparently this ended up not saving/restoring some of the registers at all because the components that were "filtered out" that way still made their way to the bitmap passed into SET_HANDLED_COMPONENTS and so the normal prologue/epilogue didn't end up saving and restoring them. I don't want to do it in GET_SEPARATE_COMPONENTS as that doesn't see each basic block. Is this something DISQUALIFY_COMPONENTS could be used for? Thanks, Kyrill Thanks, Kyrill Segher
Patch ping: [PATCH] Fix ICE in default_use_anchors_for_symbol_p (PR middle-end/78201)
Hi! On Thu, Nov 10, 2016 at 06:00:29PM +0100, Jakub Jelinek wrote: > On arm/aarch64 we ICE because some decls that make it here has non-NULL > DECL_SIZE, which is a VAR_DECL rather than CONST_INT (or DECL_SIZE that > doesn't fit into shwi would ICE similarly). While it is arguably a FE bug > that it creates for VLA initialization from STRING_CST such a decl, > I believe we have some PRs about it already open. > I think it won't hurt to check for the large sizes properly even in this > function though, and punt on unexpected cases, or even extremely large ones. > > Bootstrapped/regtested on x86_64-linux and i686-linux, Yvan said he is going > to test on arm and/or aarch64. Ok for trunk if the testing there passes > too? > > 2016-11-10 Jakub Jelinek > > PR middle-end/78201 > * varasm.c (default_use_anchors_for_symbol_p): Fix a comment typo. > Don't test decl != NULL. Don't look at DECL_SIZE, but DECL_SIZE_UNIT > instead, return false if it is NULL, or doesn't fit into uhwi, or > is larger or equal to targetm.max_anchor_offset. > > * g++.dg/opt/pr78201.C: New test. I'd like to ping this patch, Yvan has successfully tested it on arm and aarch64. Jakub
Re: [Aarch64][PATCH] Improve Logical And Immediate Expressions
On Thu, Oct 27, 2016 at 08:44:02PM +, Michael Collison wrote: > This patch is designed to improve code generation for "and" instructions with > certain immediate operands. > > For the following test case: > > int f2(int x) > { >x &= 0x0ff8; > >x &= 0xff001fff; > >return x; > } > > the trunk aarch64 compiler generates: > > mov w1, 8184 > movk w1, 0xf00, lsl 16 > and w0, w0, w1 > > We can generate one fewer instruction if we recognize certain constants. With > the attached patch the current trunk compiler generates: > > and w0, w0, 268435448 > and w0, w0, -16769025 > > Bootstrapped and make check successfully completed with no regressions on > aarch64-linux-gnu. > > Okay for trunk? Hi Michael, I have some minor comments in line. > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 3cdd69b..7ef8cdf 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -300,6 +300,9 @@ extern struct tune_params aarch64_tune_params; > HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); > int aarch64_get_condition_code (rtx); > bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode); > +unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in); > +unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in); > +bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode > mode); > int aarch64_branch_cost (bool, bool); > enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx); > bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 3e663eb..db82c5c 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -3600,6 +3600,46 @@ aarch64_bitmask_imm (HOST_WIDE_INT val_in, > machine_mode mode) >return val == mask * bitmask_imm_mul[__builtin_clz (bits) - 26]; > } > > +/* Create mask of ones, covering the lowest to highest bits set in VAL_IN. > */ > + > +unsigned HOST_WIDE_INT > +aarch64_and_split_imm1 (HOST_WIDE_INT val_in) > +{ > + int first_bit_set = ctz_hwi (val_in); > + int last_bit_set = floor_log2 (val_in); > + gcc_assert (first_bit_set < last_bit_set); This assert can never trigger (by definition) unless VAL_IN == 0 (in which case floor_log2 = -1, ctz_hwi = HOST_BITS_PER_WORD_SIZE). If that's the case you're testing for, then this assert would be more explicit as gcc_assert (val_in != 0) I'd suggest renaming these to lowest_bit_set and highest_bit_set, as first and last are ambiguous (you have them the opposite way round from what I'd consider first [highest, thus leading bits] and last [lowest/trailing bits]). > + > + return ((HOST_WIDE_INT_UC (2) << last_bit_set) - > + (HOST_WIDE_INT_1U << first_bit_set)); > +} > + > +/* Create constant with zero bits between first and last bit set and one > + bits elsewhere. */ I can't parse this comment in relation to what the code below does. Looking at the code, you're building a new constant where the bits between the first and last bits are unmodified, and all other bits are set to one. i.e. for 1010 1000 you'd generate 1010 . > +unsigned HOST_WIDE_INT > +aarch64_and_split_imm2 (HOST_WIDE_INT val_in) > +{ > + return val_in | ~aarch64_and_split_imm1 (val_in); > +} > + > +/* Return true if VAL_IN is a valid 'and' bitmask immediate. */ > + > +bool > +aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode mode) > +{ > + if (aarch64_bitmask_imm (val_in, mode)) > +return false; > + > + if (aarch64_move_imm (val_in, mode)) > +return false; > + > + unsigned HOST_WIDE_INT imm2 = aarch64_and_split_imm2 (val_in); > + > + if (!aarch64_bitmask_imm (imm2, mode)) > +return false; > + > + return true; You can replace these four lines with: return aarch64_bitmask_imm (imm2, mode); > +} > > /* Return true if val is an immediate that can be loaded into a > register in a single instruction. */ > diff --git a/gcc/testsuite/gcc.target/aarch64/and_const.c > b/gcc/testsuite/gcc.target/aarch64/and_const.c > new file mode 100644 > index 000..9c377d7 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/and_const.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +int f2 (int x) > +{ > + x &= 0x0ff8; > + > + x &= 0xff001fff; > + > + return x; > +} It would be good to have a test for the DImode cases too (using long long). Thanks, James
Re: [patch, libfortran] Add AVX-specific matmul
On Thu, Nov 17, 2016 at 08:41:48AM +0100, Thomas Koenig wrote: > Am 17.11.2016 um 00:20 schrieb Jakub Jelinek: > >On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote: > >>>Don't you need to test in configure if the assembler supports AVX? > >>>Otherwise if somebody is bootstrapping gcc with older assembler, it will > >>>just fail to bootstrap. > >> > >>That's a good point. The AVX instructions were added in binutils 2.19, > >>which was released in 2011. This could be put in the prerequisites. > >> > >>What should the test do? Fail with an error message "you need newer > >>binutils" or simply (and silently) not compile the AVX vesion? > > > >>From what I understood, you want those functions just to be implementation > >details, not exported from libgfortran.so*. Thus the test would do > >something similar to what gcc/testsuite/lib/target-supports.exp > >(check_effective_target_avx) > >does, but of course in autoconf way, not in tcl. > > OK, that looks straightworward enough. I'll give it a shot. > > >Also, from what I see, target_clones just use IFUNCs, so you probably also > >need some configure test whether ifuncs are supported (the > >gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something > >similar again in configure. But if so, then I have no idea why you use > >a wrapper around the function, instead of using it on the exported APIs. > > As you wrote above, I wanted this as an implementation detail. I also > wanted the ability to be able to add new instruction sets without > breaking the ABI. But even exported IFUNC is an implementation detail. For other libraries/binaries IFUNC symbol is like any other symbol, they will have SHN_UNDEF symbol pointing to that, and it matters only for the dynamic linker during relocation processing. Whether some function is IFUNC or not is not an ABI change, you can change at any time a normal function into IFUNC or vice versa, without breaking ABI. > You're right - integer multiplication looks different. > > Nobody I know cares about integer matrix multiplication > speed, whereas real has gotten a _lot_ of attention over > the decades. So, putting in AVX will make the code run > faster on more machines, while putting in AVX2 will > (IMHO) bloat the library for no good reason. However, > I am willing to stand corrected on this. Putting in AVX512f > makes sense. Which is why I've been proposing to use avx2,default for the matmul_i* files and avx,default for the others. avx will not buy much for matmul_i*, while avx2 will. > I have also been trying to get target_clones to work on POWER > to get Altivec instructions, but to no avail. I also cannot > find any examples in the testsuite. Haven't checked, but maybe the target_clones attribute has been only implemented for x86_64/i686 and not for other targets. But power supports target attribute, so you e.g. have the option of #including the routine multiple times in one TU, each time with different name and target attribute, and then write the IFUNC routine for it by hand. Or attempt to support target_clones on power, or ask power maintainers to do that. Jakub
Re: [PATCH] Fix PR78333
On 17 November 2016 at 11:27, Christophe Lyon wrote: > On 17 November 2016 at 10:53, Richard Biener wrote: >> On Thu, 17 Nov 2016, Rainer Orth wrote: >> >>> Hi Richard, >>> >>> >> Probably providing dummy implemenations as in the original testcase in >>> >> the PR is enough? >>> > >>> > Maybe { dg-require weak } plus weak definitions of the cyg_profile funcs? >>> > Or simply restrict the test to { target *-*-linux* }? >>> >>> the only existing dg-do run testcase (gcc.dg/20001117-1.c) just has >>> >>> void __attribute__((no_instrument_function)) >>> __cyg_profile_func_enter(void *this_fn, void *call_site) >>> { >>> if (call_site == (void *)0) >>> abort (); >>> } >>> >>> void __attribute__((no_instrument_function)) >>> __cyg_profile_func_exit(void *this_fn, void *call_site) >>> { >>> if (call_site == (void *)0) >>> abort (); >>> } >>> >>> In the case at hand, we could do with empty implementations. This >>> certainly works on Solaris. >> >> Ok. Christophe, can you add the above and verify it works for you >> (and then commit)? >> > OK, I'm taking a look. > I tested and committed (r242553) the attached patch (on arm-none-eabi and arm-none-linux-gnueabihf). I used empty function as suggested by Rainer. Thanks, Christophe >> Thanks, >> Richard. Fix PR78333 testcase for non-glibc systems. 2016-11-17 Christophe Lyon gcc/testsuite/ * gcc.dg/pr78333.c: Add empty implementations of __cyg_profile_func_enter() and __cyg_profile_func_exit() to avoid problems on non-glibc systems. diff --git a/gcc/testsuite/gcc.dg/pr78333.c b/gcc/testsuite/gcc.dg/pr78333.c index fd3669c..ca037e5 100644 --- a/gcc/testsuite/gcc.dg/pr78333.c +++ b/gcc/testsuite/gcc.dg/pr78333.c @@ -1,6 +1,19 @@ /* { dg-do link } */ /* { dg-options "-finstrument-functions" } */ +/* Add empty implementations of __cyg_profile_func_enter() and + __cyg_profile_func_exit() to avoid problems on non-glibc + systems. */ +void __attribute__((no_instrument_function)) +__cyg_profile_func_enter(void *this_fn, void *call_site) +{ +} + +void __attribute__((no_instrument_function)) +__cyg_profile_func_exit(void *this_fn, void *call_site) +{ +} + extern inline __attribute__((gnu_inline, always_inline)) int foo () { } int main() {
Fix PR rtl-optimization/78355
As diagnosed by the submitter, LRA can generate unaligned accesses when SLOW_UNALIGNED_ACCESS is 1, for example when STRICT_ALIGNMENT is 1, because simplify_operand_subreg can wrongly reject aligned accesses when reloading SUBREGs of MEMs in favor of unaligned ones. The fix is to add the missing guard on the alignment before invoking SLOW_UNALIGNED_ACCESS (as in every other use of SLOW_UNALIGNED_ACCESS in the compiler). Tested on arm-eabi, approved by Vladimir, applied on the mainline. 2016-11-17 Pip Cet Eric Botcazou PR rtl-optimization/78355 * doc/tm.texi.in (SLOW_UNALIGNED_ACCESS): Document that the macro only needs to deal with unaligned accesses. * doc/tm.texi: Regenerate. * lra-constraints.c (simplify_operand_subreg): Only invoke SLOW_UNALIGNED_ACCESS on innermode if the MEM is not aligned enough. -- Eric BotcazouIndex: doc/tm.texi.in === --- doc/tm.texi.in (revision 242377) +++ doc/tm.texi.in (working copy) @@ -4654,7 +4654,8 @@ other fields in the same word of the str Define this macro to be the value 1 if memory accesses described by the @var{mode} and @var{alignment} parameters have a cost many times greater than aligned accesses, for example if they are emulated in a trap -handler. +handler. This macro is invoked only for unaligned accesses, i.e. when +@code{@var{alignment} < GET_MODE_ALIGNMENT (@var{mode})}. When this macro is nonzero, the compiler will act as if @code{STRICT_ALIGNMENT} were nonzero when generating code for block Index: lra-constraints.c === --- lra-constraints.c (revision 242377) +++ lra-constraints.c (working copy) @@ -1486,9 +1486,10 @@ simplify_operand_subreg (int nop, machin equivalences in function lra_constraints) and because for spilled pseudos we allocate stack memory enough for the biggest corresponding paradoxical subreg. */ - if (!SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg)) - || SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)) - || MEM_ALIGN (reg) >= GET_MODE_ALIGNMENT (mode)) + if (!(MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (mode) + && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (reg))) + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode) + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg return true; /* INNERMODE is fast, MODE slow. Reload the mem in INNERMODE. */
Re: [PATCH 1/2] PR77822
On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote: > The following two patches fix PR 77822 on s390x for gcc-7. As the > macro doing the argument range checks can be used on other targets > as well, I've put it in system.h (couldn't think of a better > place; maybe rtl.h?). > > Bootstrapped on s390x biarch, regression tested on s390x biarch > and s390, all on a zEC12 with -march=zEC12. > > Please check the commit messages for details. Common code patch. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog * system.h (SIZE_POS_IN_RANGE): New. >From 44654374a0d62e7bc91356fc2189fac5cd99ae12 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Thu, 17 Nov 2016 14:49:18 +0100 Subject: [PATCH 1/2] PR target/77822: Add helper macro SIZE_POS_IN_RANGE to system.h. The macro can be used to validate the arguments of zero_extract and sign_extract to fix this problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77822 --- gcc/system.h | 8 1 file changed, 8 insertions(+) diff --git a/gcc/system.h b/gcc/system.h index 8c6127c..cc68f07 100644 --- a/gcc/system.h +++ b/gcc/system.h @@ -316,6 +316,14 @@ extern int errno; ((unsigned HOST_WIDE_INT) (VALUE) - (unsigned HOST_WIDE_INT) (LOWER) \ <= (unsigned HOST_WIDE_INT) (UPPER) - (unsigned HOST_WIDE_INT) (LOWER)) +/* A convenience macro to determine whether a SIZE lies inclusively + within [1, RANGE], POS lies inclusively within between + [0, RANGE - 1] and the sum lies inclusively within [1, RANGE]. */ +#define SIZE_POS_IN_RANGE(SIZE, POS, RANGE) \ + (IN_RANGE (POS, 0, (RANGE) - 1) \ + && IN_RANGE (SIZE, 1, RANGE) \ + && IN_RANGE ((SIZE) + (POS), 1, RANGE)) + /* Infrastructure for defining missing _MAX and _MIN macros. Note that macros defined with these cannot be used in #if. */ -- 2.3.0
[committed] Fix locations within raw strings
Whilst investigating PR preprocessor/78324 I noticed that the substring location code currently doesn't handle raw strings correctly, by not skipping the 'R', opening quote, delimiter and opening parenthesis. For example, an attempt to underline chars 4-7 with caret at 6 of this raw string yields this erroneous output: __emit_string_literal_range (R"foo(0123456789)foo", ~~^~ With the patch, the correct range/caret is printed: __emit_string_literal_range (R"foo(0123456789)foo", ~~^~ Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. Committed to trunk as r242552. gcc/ChangeLog: * input.c (selftest::test_lexer_string_locations_long_line): New function. (selftest::test_lexer_string_locations_raw_string_multiline): New function. (selftest::input_c_tests): Call the new functions, via for_each_line_table_case. gcc/testsuite/ChangeLog: * gcc.dg/plugin/diagnostic-test-string-literals-1.c (test_raw_string_one_liner): New function. (test_raw_string_multiline): New function. libcpp/ChangeLog: * charset.c (cpp_interpret_string_1): Skip locations from loc_reader when advancing 'p' when handling raw strings. --- gcc/input.c| 74 ++ .../plugin/diagnostic-test-string-literals-1.c | 33 ++ libcpp/charset.c | 13 +++- 3 files changed, 119 insertions(+), 1 deletion(-) diff --git a/gcc/input.c b/gcc/input.c index c2042e8..728f4dd 100644 --- a/gcc/input.c +++ b/gcc/input.c @@ -3156,6 +3156,78 @@ test_lexer_string_locations_long_line (const line_table_case &case_) i, 2, 7 + i, 7 + i); } +/* Test of locations within a raw string that doesn't contain a newline. */ + +static void +test_lexer_string_locations_raw_string_one_line (const line_table_case &case_) +{ + /* .00.0001122. + .12.3456789012345678901. */ + const char *content = ("R\"foo(0123456789)foo\"\n"); + lexer_test test (case_, content, NULL); + + /* Verify that we get the expected token back. */ + const cpp_token *tok = test.get_token (); + ASSERT_EQ (tok->type, CPP_STRING); + + /* Verify that cpp_interpret_string works. */ + cpp_string dst_string; + const enum cpp_ttype type = CPP_STRING; + bool result = cpp_interpret_string (test.m_parser, &tok->val.str, 1, + &dst_string, type); + ASSERT_TRUE (result); + ASSERT_STREQ ("0123456789", (const char *)dst_string.text); + free (const_cast (dst_string.text)); + + if (!should_have_column_data_p (line_table->highest_location)) +return; + + /* 0-9, plus the nil terminator. */ + ASSERT_NUM_SUBSTRING_RANGES (test, tok->src_loc, CPP_STRING, 11); + for (int i = 0; i < 11; i++) +ASSERT_CHAR_AT_RANGE (test, tok->src_loc, CPP_STRING, + i, 1, 7 + i, 7 + i); +} + +/* Test of locations within a raw string that contains a newline. */ + +static void +test_lexer_string_locations_raw_string_multiline (const line_table_case &case_) +{ + /* .00.. + .12.3456. */ + const char *content = ("R\"foo(\n" + /* .0. + .12345. */ +"hello\n" +"world\n" + /* .0. + .12345. */ +")foo\"\n"); + lexer_test test (case_, content, NULL); + + /* Verify that we get the expected token back. */ + const cpp_token *tok = test.get_token (); + ASSERT_EQ (tok->type, CPP_STRING); + + /* Verify that cpp_interpret_string works. */ + cpp_string dst_string; + const enum cpp_ttype type = CPP_STRING; + bool result = cpp_interpret_string (test.m_parser, &tok->val.str, 1, + &dst_string, type); + ASSERT_TRUE (result); + ASSERT_STREQ ("\nhello\nworld\n", (const char *)dst_string.text); + free (const_cast (dst_string.text)); + + if (!should_have_column_data_p (line_table->highest_location)) +return; + + /* Currently we don't support locations within raw strings that + contain newlines. */ + ASSERT_HAS_NO_SUBSTRING_RANGES (test, tok->src_loc, tok->type, + "range endpoints are on different lines"); +} + /* Test of lexing char constants. */ static void @@ -3297,6 +3369,8 @@ input_c_tests () for_each_line_table_case (test_lexer_string_locations_stringified_macro_argument); for_each_line_table_case (test_lexer_string_locations_non_string); for_each_line_table_case (test_lexer_string_locations_long_line); + for_each_line_table_case (test_lexer_string_locations_raw_string_one_line); + for_each_line_table_case (test_lexer_string_locations_raw_string_multiline); for_each_line_table_case (test_lexer_c
Re: [PATCH 2/2] PR77822
On Thu, Nov 17, 2016 at 04:53:03PM +0100, Dominik Vogt wrote: > The following two patches fix PR 77822 on s390x for gcc-7. As the > macro doing the argument range checks can be used on other targets > as well, I've put it in system.h (couldn't think of a better > place; maybe rtl.h?). > > Bootstrapped on s390x biarch, regression tested on s390x biarch > and s390, all on a zEC12 with -march=zEC12. > > Please check the commit messages for details. S390 backend patch. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/ChangeLog * config/s390/s390.md ("extzv") ("*extzv") ("*extzvdi_lshiftrt") ("*_ior_and_sr_ze") ("*extract1bitdi") ("*insv", "*insv_rnsbg_noshift") ("*insv_rnsbg_srl", "*insv_mem_reg") ("*insvdi_mem_reghigh", "*insvdi_reg_imm"): Use SIZE_POS_IN_RANGE to validate the arguments of zero_extract and sign_extract. >From 2e20be51e86cde2e02f8f02e0367456d3fe8d92b Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Thu, 17 Nov 2016 14:49:48 +0100 Subject: [PATCH 2/2] PR target/77822: S390: Validate argument range of {zero,sign}_extract. With some undefined code, combine generates patterns where the arguments to *_extract are out of range, e.b. a negative bit position. If the s390 backend accepts these, they lead to not just undefined behaviour but invalid assembly instructions (argument out of the allowed range). So this patch makes sure that the rtl expressions with out of range arguments are rejected. --- gcc/config/s390/s390.md | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md index fedd6f9..d2dfb1e 100644 --- a/gcc/config/s390/s390.md +++ b/gcc/config/s390/s390.md @@ -3741,6 +3741,8 @@ (clobber (reg:CC CC_REGNUM))])] "TARGET_Z10" { + if (! SIZE_POS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), 64)) +FAIL; /* Starting with zEC12 there is risbgn not clobbering CC. */ if (TARGET_ZEC12) { @@ -3760,7 +3762,9 @@ (match_operand 2 "const_int_operand" "") ; size (match_operand 3 "const_int_operand" ""))) ; start ] - "" + " + && SIZE_POS_IN_RANGE (INTVAL (operands[2]), INTVAL (operands[3]), +GET_MODE_BITSIZE (mode))" "\t%0,%1,64-%2,128+63,%3+%2" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -3773,6 +3777,7 @@ (lshiftrt:DI (match_operand:DI 3 "register_operand" "d") (match_operand:DI 4 "nonzero_shift_count_operand" "")))] " + && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64) && 64 - UINTVAL (operands[4]) >= UINTVAL (operands[1])" "\t%0,%3,%2,%2+%1-1,128-%2-%1-%4" [(set_attr "op_type" "RIE") @@ -3791,6 +3796,7 @@ (match_operand 5 "const_int_operand" "")) ; start 4)))] " + && SIZE_POS_IN_RANGE (INTVAL (operands[4]), INTVAL (operands[5]), 64) && UINTVAL (operands[2]) == (~(0ULL) << UINTVAL (operands[4]))" "\t%0,%3,64-%4,63,%4+%5" [(set_attr "op_type" "RIE") @@ -3804,7 +3810,8 @@ (const_int 1) ; size (match_operand 2 "const_int_operand" "")) ; start (const_int 0)))] - "" + " + && SIZE_POS_IN_RANGE (1, INTVAL (operands[2]), 64)" "\t%0,%1,64-1,128+63,%2+1" ; dst, src, start, end, shift [(set_attr "op_type" "RIE") (set_attr "z10prop" "z10_super_E1")]) @@ -3919,6 +3926,8 @@ (match_operand 2 "const_int_operand""I")) ; pos (match_operand:GPR 3 "nonimmediate_operand" "d"))] " + && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), +GET_MODE_BITSIZE (mode)) && (INTVAL (operands[1]) + INTVAL (operands[2])) <= " "\t%0,%3,%2,%2+%1-1,-%2-%1" [(set_attr "op_type" "RIE") @@ -4214,6 +4223,7 @@ (match_operand:DI 3 "nonimmediate_operand" "d"))) (clobber (reg:CC CC_REGNUM))] "TARGET_Z10 + && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64) && INTVAL (operands[1]) + INTVAL (operands[2]) == 64" "rnsbg\t%0,%3,%2,63,0" [(set_attr "op_type" "RIE")]) @@ -4230,6 +4240,7 @@ (match_operand:DI 4 "nonimmediate_operand" "d"))) (clobber (reg:CC CC_REGNUM))] "TARGET_Z10 + && SIZE_POS_IN_RANGE (INTVAL (operands[1]), INTVAL (operands[2]), 64) && INTVAL (operands[3]) == 64 - INTVAL (operands[1]) - INTVAL (operands[2])" "rnsbg\t%0,%4,%2,%2+%1-1,%3" [(set_attr "op_type" "RIE")]) @@ -4239,7 +4250,8 @@ (match_operand 1 "const_int_operand" "n,n") (const_int 0)) (match_operand:W 2 "register_operand" "d,d"))] - "INTVAL (operands[1]) > 0 + "SIZE_POS_IN_RANGE (INTVAL (operands[1]), 0, 64) + && INTVAL (operands[1]) > 0 && INTVAL (operands[1]) <= GET_MODE_BITSIZE (SImode) && INTVAL (operands[1]) % BITS_PER_UNIT == 0" { @@ -4260,6 +4272,
[PATCH 0/2] PR77822
The following two patches fix PR 77822 on s390x for gcc-7. As the macro doing the argument range checks can be used on other targets as well, I've put it in system.h (couldn't think of a better place; maybe rtl.h?). Bootstrapped on s390x biarch, regression tested on s390x biarch and s390, all on a zEC12 with -march=zEC12. Please check the commit messages for details. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: Import libcilkrts Build 4467 (PR target/68945)
Rainer Orth writes: > I happened to notice that my libcilkrts SPARC port has been applied > upstream. So to reach closure on this issue for the GCC 7 release, I'd > like to import upstream into mainline which seems to be covered by the > free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even > though https://gcc.gnu.org/codingconventions.html#upstream lists nothing > specific and we have no listed maintainer. I initially used Ilya's intel.com address, which bounced. Now using the current address listed in MAINTAINERS... > The following patch has passed x86_64-pc-linux-gnu bootstrap without > regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps > are currently running. > > Ok for mainline if they pass? Both Solaris bootstraps have completed successfully now. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")
... oops, forgot about parser->non_integral_constant_expression_p. The version below, which I'm currently regtesting again, seems more correct. Thanks, Paolo. /// Index: cp/parser.c === --- cp/parser.c (revision 242540) +++ cp/parser.c (working copy) @@ -4841,7 +4841,13 @@ cp_parser_primary_expression (cp_parser *parser, checked at that point. If we are not within a cast, then this code is invalid. */ if (!cast_p) - cp_parser_non_integral_constant_expression (parser, NIC_FLOAT); + { + parser->non_integral_constant_expression_p = true; + if (!parser->allow_non_integral_constant_expression_p) + pedwarn (input_location, OPT_Wpedantic, +"ISO C++ forbids using a floating-point literal " +"in a constant-expression"); + } } return cp_expr (token->u.value, token->location); Index: testsuite/g++.dg/parse/pr55080.C === --- testsuite/g++.dg/parse/pr55080.C(revision 0) +++ testsuite/g++.dg/parse/pr55080.C(working copy) @@ -0,0 +1,6 @@ +// PR c++/55080 +// { dg-options "-std=c++98 -pedantic" } + +class B { + static const int c = 3.1415926; // { dg-warning "constant-expression" } +};
Re: [PATCH][2/2] GIMPLE Frontend, middle-end changes
On 11/17/2016 01:20 AM, Richard Biener wrote: On Wed, 16 Nov 2016, Jeff Law wrote: On 10/28/2016 05:51 AM, Richard Biener wrote: These are the middle-end changes and additions to the testsuite. They are pretty self-contained, I've organized the changelog entries below in areas of changes: 1) dump changes - we add a -gimple dump modifier that allows most function dumps to be directy fed back into the GIMPLE FE 2) pass manager changes to implement the startwith("pass-name") feature which implements unit-testing for GIMPLE passes 3) support for "SSA" input, a __PHI stmt that is lowered once the CFG is built, a facility to allow a specific SSA name to be allocated plus a small required change in the SSA rewriter to allow for pre-existing PHI arguments Bootstrapped and tested on x86_64-unknown-linux-gnu (together with [1/2]). I can approve all these changes myself but any comments are welcome. My only worry would be ensuring that in the case where we're asking for a particular SSA_NAME in make_ssa_name_fn that we assert the requested name is available. ISTM that if it's > the highest current version or in the freelist, then we ought to be safe. If it isn't safe then we should either issue an error, or renumber the preexisting SSA_NAME (and determining if it's safe to renumber the preexisting SSA_NAME may require more context than we have). An alternative interface would be to return NULL rather than asserting. OTOH the single caller requesting a specific version first does a lookup if the SSA name already exists, so it is safe. I have a slight preference for enforcing safety within the name manager, but I can live with doing the checking in the caller. jeff
Re: Fix PR78154
On 11/17/2016 01:48 AM, Richard Biener wrote: On Wed, 16 Nov 2016, Jeff Law wrote: On 11/16/2016 05:17 PM, Martin Sebor wrote: (I've heard some noise in C++-land about making memcpy(0,0,0) valid, but that may have just been noise) We may have read the same discussion. It would make some things a little easier in C++ (and remove what most people view as yet another unnecessary gotcha in the language). And that may be a reasonable thing to do. While GCC does take advantage of the non-null attribute when trying to prove certain pointers must be non-null, it only does so when the magic flag is turned on. There was a sense that it was too aggressive and that time may be necessary for folks to come to terms with what GCC was doing, particularly in the the memcpy (*, *, 0) case -- but I've never gotten the sense that happened and we've never turned that flag on by default. We only have -f[no-]delete-null-pointer-checks and that's on by default. So we _do_ take advantage of those. Hmm, maybe it's the path isolation I'm thinking about where we have the additional flag to control whether or not we use the attributes to help identify erroneous paths. jeff
Re: PR78319
On 11/17/2016 01:52 AM, Richard Biener wrote: On Thu, 17 Nov 2016, Prathamesh Kulkarni wrote: On 17 November 2016 at 03:20, Jeff Law wrote: On 11/16/2016 01:23 PM, Prathamesh Kulkarni wrote: Hi, As discussed in PR, this patch marks the test-case to xfail on arm-none-eabi. OK to commit ? You might check if Aldy's change to the uninit code helps your case (approved earlier today, so hopefully in the tree very soon). I quickly scanned the BZ. There's some overlap, but it might be too complex for Aldy's enhancements to catch. Hi Jeff, I tried Aldy's patch [1], but it didn't catch the case in PR78319. [1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00225.html XFAILing is ok. Agreed. Thanks for checking Prathamesh. Jeff
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 17/11/16 15:01, Segher Boessenkool wrote: On Thu, Nov 17, 2016 at 02:55:20PM +, Kyrill Tkachov wrote: Overall, there's a slight improvement in SPECINT, even with the gobmk regression and a slightly larger improvement on SPECFP due to povray. And that is for only the "normal" GPRs, not LR or FP yet, right? This patch does implement FP registers wrapping as well but not LR. I meant frame pointer, not floating point :-) Ah :) HARD_FRAME_POINTER is wrapped with -fomit-frame-pointer but we don't touch the stack pointer (SP) in any case. Kyrill Segher
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On Thu, Nov 17, 2016 at 02:55:20PM +, Kyrill Tkachov wrote: > >>Overall, there's a slight improvement in SPECINT, even with the gobmk > >>regression and a slightly larger improvement > >>on SPECFP due to povray. > >And that is for only the "normal" GPRs, not LR or FP yet, right? > > This patch does implement FP registers wrapping as well but not LR. I meant frame pointer, not floating point :-) Segher
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 17/11/16 14:44, Segher Boessenkool wrote: Hi Kyrill, On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote: I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were some interesting swings. 458.sjeng +1.45% 471.omnetpp +2.19% 445.gobmk -2.01% On SPECFP: 453.povray+7.00% After looking at the gobmk performance with performance counters it looks like more icache pressure. I see an increase in misses. This looks to me like an effect of code size increase, though it is not that large an increase (0.4% with SWS). Right. I don't see how to improve on this (but see below); ideas welcome :-) Branch mispredicts also go up a bit but not as much as icache misses. I don't see that happening -- for some testcases we get unlucky and have more branch predictor aliasing, and for some we have less, it's pretty random. Some testcases are really sensitive to this. Right, I don't think it's the branch prediction at fault in this case, rather the above icache stuff. I don't think there's anything we can do here, or at least that this patch can do about it. Overall, there's a slight improvement in SPECINT, even with the gobmk regression and a slightly larger improvement on SPECFP due to povray. And that is for only the "normal" GPRs, not LR or FP yet, right? This patch does implement FP registers wrapping as well but not LR. Though I remember seeing the improvement even when only GPRs were wrapped in an earlier version of the patch. Segher, one curious artifact I spotted while looking at codegen differences in gobmk was a case where we fail to emit load-pairs as effectively in the epilogue and its preceeding basic block. So before we had this epilogue: .L43: ldpx21, x22, [sp, 16] ldpx23, x24, [sp, 32] ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret and I see this becoming (among numerous other changes in the function): .L69: ldpx21, x22, [sp, 16] ldrx24, [sp, 40] .L43: ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx23, [sp, 32] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret So this is better in the cases where we jump straight into .L43 because we load fewer registers but worse when we jump to or fallthrough to .L69 because x23 and x24 are now restored using two loads rather than a single load-pair. This hunk isn't critical to performance in gobmk though. Is loading/storing a pair as cheap as loading/storing a single register? In that case you could shrink-wrap per pair of registers instead. I suppose it can vary by microarchitecture. For the purposes of codegen I'd say it's more expensive than load/storing a single register (as there's more memory bandwidth required after all) but cheaper than two separate loads stores (alignment quirks notwithstanding). Interesting idea. That could help with code size too. I'll try it out. Thanks, Kyrill Segher
Re: [patch, libfortran] Add AVX-specific matmul
Well, here is a newer version of the patch. I wrote a few configure tests to check for AVX. This version hast the advantage that, if anybody uses 32-bit programs with AVX, they would also benefit. Jakub, would you be OK with that patch? I do not yet want to commit this because it needs more testing on different platforms to see if it actually performs better. Regarding putting the blocked part into something separate: Quite doable, but I would rather like to do this in a follow-up patch, if we decide t do it. Regards Thomas 2016-11-17 Thomas Koenig PR fortran/78379 * acinclude.m4 (LIBGFOR_CHECK_AVX): New test. (LIBGFOR_CHECK_AVX2): New test. (LIBGFOR_CHECK_AVX512F): New test. * configure.ac: Call LIBGFOR_CHECK_AVX, LIBGFOR_CHECK_AVX2 and LIBGFOR_CHECK_AVX512F. * config.h.in: Regenerated. * configure: Regenerated. * m4/matmul.m4: For AVX, AVX2 and AVX_512F, make the work function for matmul static with target_clones for AVX and default, and create a wrapper function to call it. * generated/matmul_c10.c: Regenerated. * generated/matmul_c16.c: Regenerated. * generated/matmul_c4.c: Regenerated. * generated/matmul_c8.c: Regenerated. * generated/matmul_i1.c: Regenerated. * generated/matmul_i16.c: Regenerated. * generated/matmul_i2.c: Regenerated. * generated/matmul_i4.c: Regenerated. * generated/matmul_i8.c: Regenerated. * generated/matmul_r10.c: Regenerated. * generated/matmul_r16.c: Regenerated. * generated/matmul_r4.c: Regenerated. * generated/matmul_r8.c: Regenerated. Index: acinclude.m4 === --- acinclude.m4 (Revision 242477) +++ acinclude.m4 (Arbeitskopie) @@ -393,3 +393,54 @@ AC_DEFUN([LIBGFOR_CHECK_STRERROR_R], [ [Define if strerror_r takes two arguments and is available in .]),) CFLAGS="$ac_save_CFLAGS" ]) + +dnl Check for AVX + +AC_DEFUN([LIBGFOR_CHECK_AVX], [ + ac_save_CFLAGS="$CFLAGS" + CFLAGS="-O2 -mavx" + AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ + void _mm256_zeroall (void) +{ + __builtin_ia32_vzeroall (); +}]], [[]])], + AC_DEFINE(HAVE_AVX, 1, + [Define if AVX instructions can be compiled.]), + []) + CFLAGS="$ac_save_CFLAGS" +]) + +dnl Check for AVX2 + +AC_DEFUN([LIBGFOR_CHECK_AVX2], [ + ac_save_CFLAGS="$CFLAGS" + CFLAGS="-O2 -mavx2" + AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ + typedef long long __v4di __attribute__ ((__vector_size__ (32))); + __v4di + mm256_is32_andnotsi256 (__v4di __X, __v4di __Y) +{ + return __builtin_ia32_andnotsi256 (__X, __Y); +}]], [[]])], + AC_DEFINE(HAVE_AVX2, 1, + [Define if AVX2 instructions can be compiled.]), + []) + CFLAGS="$ac_save_CFLAGS" +]) + +dnl Check for AVX512f + +AC_DEFUN([LIBGFOR_CHECK_AVX512F], [ + ac_save_CFLAGS="$CFLAGS" + CFLAGS="-O2 -mavx512f" + AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[ + typedef double __m512d __attribute__ ((__vector_size__ (64))); + __m512d _mm512_add (__m512d a) + { + return __builtin_ia32_addpd512_mask (a, a, a, 1, 4); +}]], [[]])], + AC_DEFINE(HAVE_AVX512F, 1, + [Define if AVX512f instructions can be compiled.]), + []) + CFLAGS="$ac_save_CFLAGS" +]) Index: config.h.in === --- config.h.in (Revision 242477) +++ config.h.in (Arbeitskopie) @@ -78,6 +78,15 @@ /* Define to 1 if the target supports __attribute__((visibility(...))). */ #undef HAVE_ATTRIBUTE_VISIBILITY +/* Define if AVX instructions can be compiled. */ +#undef HAVE_AVX + +/* Define if AVX2 instructions can be compiled. */ +#undef HAVE_AVX2 + +/* Define if AVX512f instructions can be compiled. */ +#undef HAVE_AVX512F + /* Define to 1 if you have the `cabs' function. */ #undef HAVE_CABS Index: configure === --- configure (Revision 242477) +++ configure (Arbeitskopie) @@ -26174,6 +26174,93 @@ $as_echo "#define HAVE_CRLF 1" >>confdefs.h fi +# Check whether we support AVX extensions + + ac_save_CFLAGS="$CFLAGS" + CFLAGS="-O2 -mavx" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + + void _mm256_zeroall (void) +{ + __builtin_ia32_vzeroall (); +} +int +main () +{ + + ; + return 0; +} +_ACEOF +if ac_fn_c_try_compile "$LINENO"; then : + +$as_echo "#define HAVE_AVX 1" >>confdefs.h + +fi +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext + CFLAGS="$ac_save_CFLAGS" + + +# Check wether we support AVX2 extensions + + ac_save_CFLAGS="$CFLAGS" + CFLAGS="-O2 -mavx2" + cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ + + typedef long long __v4di __attribute__ ((__vector_size__ (32))); + __v4di + mm256_is32_andnotsi256 (__v4di __X, __v4di __Y) +{ + return __builtin_ia32_andnotsi256 (__X, __Y); +} +int
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
Hi Kyrill, On Thu, Nov 17, 2016 at 02:22:08PM +, Kyrill Tkachov wrote: > >>I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there > >>were > >>some interesting swings. > >>458.sjeng +1.45% > >>471.omnetpp +2.19% > >>445.gobmk -2.01% > >> > >>On SPECFP: > >>453.povray+7.00% > After looking at the gobmk performance with performance counters it looks > like more icache pressure. > I see an increase in misses. > This looks to me like an effect of code size increase, though it is not > that large an increase (0.4% with SWS). Right. I don't see how to improve on this (but see below); ideas welcome :-) > Branch mispredicts also go up a bit but not as much as icache misses. I don't see that happening -- for some testcases we get unlucky and have more branch predictor aliasing, and for some we have less, it's pretty random. Some testcases are really sensitive to this. > I don't think there's anything we can do here, or at least that this patch > can do about it. > Overall, there's a slight improvement in SPECINT, even with the gobmk > regression and a slightly larger improvement > on SPECFP due to povray. And that is for only the "normal" GPRs, not LR or FP yet, right? > Segher, one curious artifact I spotted while looking at codegen differences > in gobmk was a case where we fail > to emit load-pairs as effectively in the epilogue and its preceeding basic > block. > So before we had this epilogue: > .L43: > ldpx21, x22, [sp, 16] > ldpx23, x24, [sp, 32] > ldpx25, x26, [sp, 48] > ldpx27, x28, [sp, 64] > ldrx30, [sp, 80] > ldpx19, x20, [sp], 112 > ret > > and I see this becoming (among numerous other changes in the function): > > .L69: > ldpx21, x22, [sp, 16] > ldrx24, [sp, 40] > .L43: > ldpx25, x26, [sp, 48] > ldpx27, x28, [sp, 64] > ldrx23, [sp, 32] > ldrx30, [sp, 80] > ldpx19, x20, [sp], 112 > ret > > So this is better in the cases where we jump straight into .L43 because we > load fewer registers > but worse when we jump to or fallthrough to .L69 because x23 and x24 are > now restored using two loads > rather than a single load-pair. This hunk isn't critical to performance in > gobmk though. Is loading/storing a pair as cheap as loading/storing a single register? In that case you could shrink-wrap per pair of registers instead. Segher
Re: [PATCH] Fix PR77848
> On Nov 17, 2016, at 3:42 AM, Richard Biener > wrote: > > On Thu, Nov 17, 2016 at 4:14 AM, Bill Schmidt > wrote: >> >> Yes, the new predicate works fine. New patch below, bootstrapped and tested >> on powerpc64le-unknown-linux-gnu, with only the bb-slp-cond-1.c regressions >> previously discussed. Is this ok for trunk? > > Yes. Please open a bug for the bb-slp-cond-1.c testsuite FAIL (so we have an > excuse to either improve loop vectorization data dependence analysis or > BB vectorization in stage3). Fix committed as r242550, new bug open at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78396. Thanks! Bill > > Thanks, > Richard.
Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
On 14/11/16 14:25, Kyrill Tkachov wrote: On 11/11/16 15:31, Kyrill Tkachov wrote: On 11/11/16 10:17, Kyrill Tkachov wrote: On 10/11/16 23:39, Segher Boessenkool wrote: On Thu, Nov 10, 2016 at 02:42:24PM -0800, Andrew Pinski wrote: On Thu, Nov 10, 2016 at 6:25 AM, Kyrill Tkachov I ran SPEC2006 on a Cortex-A72. Overall scores were neutral but there were some interesting swings. 458.sjeng +1.45% 471.omnetpp +2.19% 445.gobmk -2.01% On SPECFP: 453.povray+7.00% Wow, this looks really good. Thank you for implementing this. If I get some time I am going to try it out on other processors than A72 but I doubt I have time any time soon. I'd love to hear what causes the slowdown for gobmk as well, btw. I haven't yet gotten a direct answer for that (through performance analysis tools) but I have noticed that load/store pairs are not generated as aggressively as I hoped. They are being merged by the sched fusion pass and peepholes (which runs after this) but it still misses cases. I've hacked the SWS hooks to generate pairs explicitly and that increases the number of pairs and helps code size to boot. It complicates the logic of the hooks a bit but not too much. I'll make those changes and re-benchmark, hopefully that will help performance. And here's a version that explicitly emits pairs. I've looked at assembly codegen on SPEC2006 and it generates quite a few more LDP/STP pairs than the original version. I kicked off benchmarks over the weekend to see the effect. Andrew, if you want to try it out (more benchmarking and testing always welcome) this is the one to try. And I discovered over the weekend that gamess and wrf have validation errors. This version runs correctly. SPECINT results were fine though and there is even a small overall gain due to sjeng and omnetpp. However, gobmk still has the regression. I'll rerun SPECFP with this patch (it's really just a small bugfix over the previous version) and get on with analysing gobmk. After looking at the gobmk performance with performance counters it looks like more icache pressure. I see an increase in misses. This looks to me like an effect of code size increase, though it is not that large an increase (0.4% with SWS). Branch mispredicts also go up a bit but not as much as icache misses. I don't think there's anything we can do here, or at least that this patch can do about it. Overall, there's a slight improvement in SPECINT, even with the gobmk regression and a slightly larger improvement on SPECFP due to povray. Segher, one curious artifact I spotted while looking at codegen differences in gobmk was a case where we fail to emit load-pairs as effectively in the epilogue and its preceeding basic block. So before we had this epilogue: .L43: ldpx21, x22, [sp, 16] ldpx23, x24, [sp, 32] ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret and I see this becoming (among numerous other changes in the function): .L69: ldpx21, x22, [sp, 16] ldrx24, [sp, 40] .L43: ldpx25, x26, [sp, 48] ldpx27, x28, [sp, 64] ldrx23, [sp, 32] ldrx30, [sp, 80] ldpx19, x20, [sp], 112 ret So this is better in the cases where we jump straight into .L43 because we load fewer registers but worse when we jump to or fallthrough to .L69 because x23 and x24 are now restored using two loads rather than a single load-pair. This hunk isn't critical to performance in gobmk though. Given that there is an overall gain is this ok for trunk? https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01352.html Thanks, Kyrill 2016-11-11 Kyrylo Tkachov * config/aarch64/aarch64.h (machine_function): Add reg_is_wrapped_separately field. * config/aarch64/aarch64.c (emit_set_insn): Change return type to rtx_insn *. (aarch64_save_callee_saves): Don't save registers that are wrapped separately. (aarch64_restore_callee_saves): Don't restore registers that are wrapped separately. (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p, aarch64_offset_7bit_signed_scaled_p): Move earlier in the file. (aarch64_get_separate_components): New function. (aarch64_get_next_set_bit): Likewise. (aarch64_components_for_bb): Likewise. (aarch64_disqualify_components): Likewise. (aarch64_emit_prologue_components): Likewise. (aarch64_emit_epilogue_components): Likewise. (aarch64_set_handled_components): Likewise. (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS, TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS, TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.
Re: [PATCH PR78114]Refine gfortran.dg/vect/fast-math-mgrid-resid.f
On Thu, Nov 17, 2016 at 10:53 AM, Richard Biener wrote: > On Thu, Nov 17, 2016 at 11:26 AM, Bin.Cheng wrote: >> On Thu, Nov 17, 2016 at 8:32 AM, Richard Biener >> wrote: >>> On Wed, Nov 16, 2016 at 6:20 PM, Bin Cheng wrote: Hi, Currently test gfortran.dg/vect/fast-math-mgrid-resid.f checks all predictive commoning opportunities for all possible loops. This makes it fragile because vectorizer may peel the loop differently, as well as may choose different vector factors. For example, on x86-solaris, vectorizer doesn't peel for prologue loop; for -march=haswell, the case is long time failed because vector factor is 4, while iteration distance of predictive commoning opportunity is smaller than 4. This patch refines it by only checking if predictive commoning variable is created when vector factor is 2; or vectorization variable is created when factor is 4. This works since we have only one main loop, and only one vector factor can be used. Test result checked for various x64 targets. Is it OK? >>> >>> I think that as you write the test is somewhat fragile. But rather >>> than adjusting the scanning like you do >>> I'd add --param vect-max-peeling-for-alignment=0 and -mprefer-avx128 >> In this way, is it better to add "--param >> vect-max-peeling-for-alignment=0" for all targets? Otherwise we still >> need to differentiate test string to handle different targets. But I >> have another question here: what if a target can't handle unaligned >> access and vectorizer have to peel for alignment for it? > > You'd get versioning for alignment instead. > >> Also do you think it's ok to check predictive commoning PHI node as below? >> # vectp_u.122__lsm0.158_94 = PHI >> In this way, we don't need to take possible prologue/epilogue loops >> into consideration. > > I hoped w/o peeling we can simply scan for "Executing predictive commoning". > But with versioning for alignment you'd still get two loops. > > So maybe checking for both "Executing predictive commoning" and looking > for a vect_lsm PHI node is ok... Understood. Here is the update patch. Test result checked on x86_64 and aarch64. Is it OK? Thanks, bin gcc/testsuite/ChangeLog 2016-11-15 Bin Cheng PR testsuite/78114 * gfortran.dg/vect/fast-math-mgrid-resid.f: Add additional options. Refine test by checking predictive commining PHI nodes in vectorized loop wrto vector factor. > >>> as additional option on x86_64-*-* i?86-*-*. >>> >>> Your new pattern would fail with avx512 if vector (8) real would be used. >>> >>> What's the actual change that made the testcase fail btw? >> There are two cases. >> A) After vect_do_peeling change, vectorizer may only peel one >> iteration for prologue loop (if vf == 2), below test string was added >> for this reason: >> ! { dg-final { scan-tree-dump-times "Loop iterates only 1 time, >> nothing to do" 1 "pcom" } } >> This fails on x86_64 solaris because prologue loop is not peeled at all. >> B) Depending on ilp, I think below test strings fail for long time with >> haswell: >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 1 "pcom" { target lp64 } } } >> ! { dg-final { scan-tree-dump-times "Executing predictive commoning >> without unrolling" 2 "pcom" { target ia32 } } } >> Because vectorizer choose vf==4 in this case, and there is no >> predictive commoning opportunities at all. > > Yes. I suggest -mprefer-avx128 for that. > >> Also the newly added test string fails in this case too because the >> prolog peeled iterates more than 1 times. >> >> Thanks, >> bin >>> >>> Richard. >>> Thanks, bin gcc/testsuite/ChangeLog 2016-11-16 Bin Cheng PR testsuite/78114 * gfortran.dg/vect/fast-math-mgrid-resid.f: Refine test by checking predictive commining variables in vectorized loop wrto vector factor. diff --git a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f index 88238f9..293cac9 100644 --- a/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f +++ b/gcc/testsuite/gfortran.dg/vect/fast-math-mgrid-resid.f @@ -1,7 +1,7 @@ ! { dg-do compile } ! { dg-require-effective-target vect_double } -! { dg-options "-O3 -fpredictive-commoning -fdump-tree-pcom-details" } - +! { dg-options "-O3 --param vect-max-peeling-for-alignment=0 -fpredictive-commoning -fdump-tree-pcom-details" } +! { dg-additional-options "-mprefer-avx128" { target { i?86-*-* x86_64-*-* } } } *** RESID COMPUTES THE RESIDUAL: R = V - AU * @@ -38,8 +38,8 @@ C RETURN END ! we want to check that predictive commoning did something on the -! vectorized loop. -! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling" 1 "pcom" { target lp64 } } } -! { dg-final { scan-tree-dump-times "Executing predictive commoning without unrolling"
Re: [patch,testsuite] Support dg-require-effective-target label_offsets.
Georg-Johann Lay writes: > On 04.11.2016 06:18, Senthil Kumar Selvaraj wrote: >> >> Georg-Johann Lay writes: >>> State of matters is that Binutils support is missing, and if I understand >>> you >>> correctly, dg-require is not appropriate to factor out availability of such >>> features? >> >> I'll take a stab at adding the missing binutils support for avr label diffs. > > Thanks for taking care of this. I had a look into it but got stuck with a > test > case derived from gcc.c-torture/execute/pr70460.c > > int c; > > __attribute__((noinline, noclone)) void > call (void) > { >__asm ("nop"); > } > > __attribute__((noinline, noclone)) void > foo (int x) > { >static int b[] = { &&lab1 - &&lab0, &&lab2 - &&lab0 }; >void *a = &&lab0 + b[x]; >goto *a; > lab1: >c += 2; >call(); > lab2: >c++; > lab0: >; > } > > int > main () > { >foo (0); >if (c != 3) > __builtin_abort (); >foo (1); >if (c != 4) > __builtin_abort (); >return 0; > } > > > The problem is when relaxation is turned on and the CALL is shortened to > RCALL > but respective literals are not fixed up. That linker issue is fixed with https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4cb771f214ed6a2102e37bce255c6be5d0642f3a Regards Senthil
[patch,avr,committed] More use of CONST_INT_P, SUBREG_P.
This is a no-op code clean-up that makes more use of SUBREG_P and CONST_INT_P if possible. Applied as obvious. Johann * config/avr/avr.c (avr_print_operand_address): Use CONST_INT_P if appropriate. (ashlqi3_out, ashlsi3_out, ashrqi3_out, ashrhi3_out): Same. (ashrsi3_out, lshrqi3_out, lshrhi3_out, lshrsi3_out): Same. (avr_rtx_costs_1, extra_constraint_Q): Same. (avr_address_cost): Use SUBREG_P if appropriate. Index: config/avr/avr.c === --- config/avr/avr.c (revision 242541) +++ config/avr/avr.c (working copy) @@ -2544,7 +2544,7 @@ avr_print_operand_address (FILE *file, m rtx x = addr; if (GET_CODE (x) == CONST) x = XEXP (x, 0); - if (GET_CODE (x) == PLUS && GET_CODE (XEXP (x,1)) == CONST_INT) + if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x,1))) { /* Assembler gs() will implant word address. Make offset a byte offset inside gs() for assembler. This is @@ -6083,7 +6083,7 @@ out_shift_with_cnt (const char *templ, r const char * ashlqi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; @@ -6180,7 +6180,7 @@ ashlqi3_out (rtx_insn *insn, rtx operand const char * ashlhi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL); int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]); @@ -6500,7 +6500,7 @@ avr_out_ashlpsi3 (rtx_insn *insn, rtx *o const char * ashlsi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; int *t = len; @@ -6589,7 +6589,7 @@ ashlsi3_out (rtx_insn *insn, rtx operand const char * ashrqi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; @@ -6661,7 +6661,7 @@ ashrqi3_out (rtx_insn *insn, rtx operand const char * ashrhi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL); int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]); @@ -6883,7 +6883,7 @@ avr_out_ashrpsi3 (rtx_insn *insn, rtx *o const char * ashrsi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; int *t = len; @@ -6980,7 +6980,7 @@ ashrsi3_out (rtx_insn *insn, rtx operand const char * lshrqi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; @@ -7075,7 +7075,7 @@ lshrqi3_out (rtx_insn *insn, rtx operand const char * lshrhi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int scratch = (GET_CODE (PATTERN (insn)) == PARALLEL); int ldi_ok = test_hard_reg_class (LD_REGS, operands[0]); @@ -7386,7 +7386,7 @@ avr_out_lshrpsi3 (rtx_insn *insn, rtx *o const char * lshrsi3_out (rtx_insn *insn, rtx operands[], int *len) { - if (GET_CODE (operands[2]) == CONST_INT) + if (CONST_INT_P (operands[2])) { int k; int *t = len; @@ -10549,7 +10549,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod return true; } *total = COSTS_N_INSNS (1); - if (GET_CODE (XEXP (x, 1)) != CONST_INT) + if (!CONST_INT_P (XEXP (x, 1))) *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, speed); break; @@ -10568,7 +10568,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod *total = COSTS_N_INSNS (1) + *total; return true; } - if (GET_CODE (XEXP (x, 1)) != CONST_INT) + if (!CONST_INT_P (XEXP (x, 1))) { *total = COSTS_N_INSNS (2); *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, @@ -10594,7 +10594,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod break; case SImode: - if (GET_CODE (XEXP (x, 1)) != CONST_INT) + if (!CONST_INT_P (XEXP (x, 1))) { *total = COSTS_N_INSNS (4); *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, @@ -10645,7 +10645,7 @@ avr_rtx_costs_1 (rtx x, machine_mode mod case IOR: *total = COSTS_N_INSNS (GET_MODE_SIZE (mode)); *total += avr_operand_rtx_cost (XEXP (x, 0), mode, code, 0, speed); - if (GET_CODE (XEXP (x, 1)) != CONST_INT) + if (!CONST_INT_P (XEXP (x, 1))) *total += avr_operand_rtx_cost (XEXP (x, 1), mode, code, 1, speed); return true; @@ -10813,7 +10813,7 @@ avr_rtx_costs_1 (rtx x, mac
RE: [PATCH 2/2] [ARC] Update target specific tests.
> These entries should be going into the gcc/testsuite/ChangeLog file, > and so don't need the "testsuite/" prefix. > Ups :) Fix it. > Otherwise I'm happy for this to be merged. I've only skimmed the > change, but assuming you've run the tests this all seems good. Results of running arc.exp on LE/arc700 default cpu should be: === gcc Summary === # of expected passes151 # of unsupported tests 2 Committed, thank you for your review, Claudiu
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
Toma Tabacu writes: > Hi, > > The patch below is a rebased version of Andrew's patch plus a few more changes > to test options. > > I have tested Andrew's patch by passing -msoft-float to the testsuite without > having a soft-float library available, and saw that the > inline-memcpy-{1,2,3,4,5}.c > and memcpy-1.c tests were also failing to find standard library headers. > In the version below, I have added (REQUIRES_STDLIB) to them as well. > > However, I believe that the memcpy-1.c test was removed from the first version > of Andrew's patch in response to Matthew's comments, but I don't think it > should be. > > Tested with mips-img-linux-gnu and mips-mti-linux-gnu. This looks OK to me but I then again I helped with the design for this. I'd like to give Catherine a chance to take a look though as the feature is unusual. One comment below. > diff --git a/gcc/testsuite/gcc.target/mips/mips.exp > b/gcc/testsuite/gcc.target/mips/mips.exp > index e22d782..ccd4ecb 100644 > --- a/gcc/testsuite/gcc.target/mips/mips.exp > +++ b/gcc/testsuite/gcc.target/mips/mips.exp > @@ -1420,6 +1426,22 @@ proc mips-dg-options { args } { > } > } > > +# If the test is marked as requiring standard libraries check > +# that the sysroot has support for the current set of test options. > +if { [mips_have_test_option_p options "REQUIRES_STDLIB"] } { > + mips_push_test_options saved_options $extra_tool_flags > + set output [mips_preprocess "" { > + #include > + } 1] > + mips_pop_test_options saved_options > + > + # If the preprocessing of the stdlib.h file produced errors mark > + # the test as unsupported. > + if { ![string equal $output ""] } { > + set do_what [lreplace $do_what 1 1 "N"] We should describe what we expect the format of do_what to be at the time of writing in case it ever changes. i.e. a comment to say what the second character means and therefore make it clear that setting it to 'n' makes the test unsupported. Thanks, Matthew
RE: [PATCH] [ARC] Add support for QuarkSE processor.
> > Looks fine. > Committed, thank you for your review, Claudiu
RE: [PATCH 4/4] [ARC] Fix compilation issue in pr71872.
> Looks fine. > > Thanks, > Andrew > Committed, thank you for your review, Claudiu
Re: [PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode
On 16 November 2016 at 10:39, Kyrill Tkachov wrote: > > On 09/11/16 16:19, Thomas Preudhomme wrote: >> >> Hi, >> >> This patch fixes the following ICE when building when compiling an empty >> FIQ interrupt handler in ARM mode: >> >> empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints: >> } >> ^ >> >> (insn/f 13 12 14 (set (reg/f:SI 13 sp) >> (plus:SI (reg/f:SI 11 fp) >> (const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3} >> (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp) >> (plus:SI (reg/f:SI 11 fp) >> (const_int 4 [0x4]))) >> (nil))) >> >> The ICE was provoked by missing an alternative to reflect that ARM mode >> can do an add of general register into sp which is unpredictable in Thumb >> mode add immediate. >> >> ChangeLog entries are as follow: >> >> *** gcc/ChangeLog *** >> >> 2016-11-04 Thomas Preud'homme >> >> * config/arm/arm.md (arm_addsi3): Add alternative for addition of >> general register with general register or ARM constant into SP >> register. >> >> >> *** gcc/testsuite/ChangeLog *** >> >> 2016-11-04 Thomas Preud'homme >> >> * gcc.target/arm/empty_fiq_handler.c: New test. >> >> >> Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression. >> >> Is this ok for trunk? >> > > I see that "r" does not include the stack pointer (STACK_REG is separate > from GENERAL_REGs) so we are indeed missing > that constraint. > > Ok for trunk. > Thanks, > Kyrill > >> Best regards, >> >> Thomas > > Hi Thomas, The new test fails when compiled with -mthumb -march=armv5t: gcc.target/arm/empty_fiq_handler.c: In function 'fiq_handler': gcc.target/arm/empty_fiq_handler.c:11:1: error: interrupt Service Routines cannot be coded in Thumb mode Christophe
RE: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads
> > Note on tests: It will be nice to add a test where the added > > peephole kicks in. If you consider to add this test to the current > > patch, please resubmit it. > > There were cmem-bit-{1,2,3,4}.c added in that patch. All of which > fail for me without the peephole, and work with the peephole. > > The code generated for L/E ARC is slightly different than the code > generated for B/E ARC due to how the structures are laid out in > memory, so, for now I've made parts of the test B/E only. > > In order to get code that is as efficient for L/E as B/E I'd end up > adding a whole new peeophole, I'd rather not do that - it would be > extra code to maintain for a combination CMEM+L/E that is not used. I > figure we can come back to that if/when that combination ever becomes > interesting. I'm hoping you'll be fine with that. > Sure, please go ahead and apply your patch after having the spaces converted ;) Claudiu
Re: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads
* Claudiu Zissulescu [2016-11-17 13:02:02 +]: > Hi Andrew, > > Approved, please apply, but ... > > > +(define_peephole2 > > + [(set (match_operand:SI 0 "register_operand" "") > > +(sign_extend:SI > > + (match_operand:QI 1 "any_mem_operand" ""))) > > + (set (reg:CC_ZN CC_REG) > > + (compare:CC_ZN (match_dup 0) > > + (const_int 0))) > > + (set (pc) > > +(if_then_else (match_operator 2 "ge_lt_comparison_operator" > > +[(reg:CC_ZN CC_REG) (const_int 0)]) > > + (match_operand 3 "" "") > > + (match_operand 4 "" "")))] > > + "TARGET_NPS_CMEM > > + && cmem_address (XEXP (operands[1], 0), SImode) > > + && peep2_reg_dead_p (2, operands[0]) > > + && peep2_regno_dead_p (3, CC_REG)" > > + [(set (match_dup 0) > > +(zero_extend:SI > > + (match_dup 1))) > > + (set (reg:CC_ZN CC_REG) > > + (compare:CC_ZN (zero_extract:SI > > + (match_dup 0) > > + (const_int 1) > > + (const_int 7)) > > + (const_int 0))) > > + (set (pc) > > +(if_then_else (match_dup 2) > > + (match_dup 3) > > + (match_dup 4)))] > > + "if (GET_CODE (operands[2]) == GE) > > + operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), > > const0_rtx); > > + else > > + operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), > > const0_rtx);") > > + > > It seems to me you use spaces instead of tabs. Ooops. I'll fix. > > Note on tests: It will be nice to add a test where the added > peephole kicks in. If you consider to add this test to the current > patch, please resubmit it. There were cmem-bit-{1,2,3,4}.c added in that patch. All of which fail for me without the peephole, and work with the peephole. The code generated for L/E ARC is slightly different than the code generated for B/E ARC due to how the structures are laid out in memory, so, for now I've made parts of the test B/E only. In order to get code that is as efficient for L/E as B/E I'd end up adding a whole new peeophole, I'd rather not do that - it would be extra code to maintain for a combination CMEM+L/E that is not used. I figure we can come back to that if/when that combination ever becomes interesting. I'm hoping you'll be fine with that. Thanks, Andrew
Import libcilkrts Build 4467 (PR target/68945)
I happened to notice that my libcilkrts SPARC port has been applied upstream. So to reach closure on this issue for the GCC 7 release, I'd like to import upstream into mainline which seems to be covered by the free-for-all clause in https://gcc.gnu.org/svnwrite.html#policies, even though https://gcc.gnu.org/codingconventions.html#upstream lists nothing specific and we have no listed maintainer. A few issues are worth mention: * Upstream still has a typo in the git URL in many files. I've corrected that during the import to avoid a massive diff: -# https://bitbucket.org/intelcilkruntime/itnel-cilk-runtime.git are +# https://bitbucket.org/intelcilkruntime/intel-cilk-runtime.git are * libcilkrts.spec.in is missing upstream. I've no idea if this is intentional. * A few of my changes have been lost and I can't tell if this is by accident: ** Lost whitespace: --- libcilkrts-old/Makefile.am 2016-05-04 16:44:24.0 + +++ libcilkrts-new/Makefile.am 2016-11-17 11:35:33.782987017 + @@ -54,7 +54,7 @@ GENERAL_FLAGS = -I$(top_srcdir)/include # Enable Intel Cilk Plus extension GENERAL_FLAGS += -fcilkplus -# Always generate unwind tables +#Always generate unwind tables GENERAL_FLAGS += -funwind-tables AM_CFLAGS = $(XCFLAGS) $(GENERAL_FLAGS) -std=c99 ** Lost alphabetical order of targets: diff -rup libcilkrts-old/configure.ac libcilkrts-new/configure.ac --- libcilkrts-old/configure.ac 2016-11-16 18:34:28.0 + +++ libcilkrts-new/configure.ac 2016-11-17 11:35:33.800015570 + @@ -143,14 +145,14 @@ esac # contains information on what's needed case "${target}" in - arm-*-*) -config_dir="arm" -;; - i?86-*-* | x86_64-*-*) config_dir="x86" ;; + arm-*-*) +config_dir="arm" +;; + sparc*-*-*) config_dir="sparc" ;; diff -rup libcilkrts-old/configure.tgt libcilkrts-new/configure.tgt --- libcilkrts-old/configure.tgt2016-11-16 18:34:28.0 + +++ libcilkrts-new/configure.tgt2016-11-17 11:35:33.807873451 + @@ -44,10 +44,10 @@ # Disable Cilk Runtime library for unsupported architectures. case "${target}" in - arm-*-*) -;; i?86-*-* | x86_64-*-*) ;; + arm-*-*) +;; sparc*-*-*) ;; *-*-*) I've done nothing about those, just wanted to point them out. The following patch has passed x86_64-pc-linux-gnu bootstrap without regressions; i386-pc-solaris2.12 and sparc-sun-solaris2.12 bootstraps are currently running. Ok for mainline if they pass? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University 2016-11-17 Rainer Orth PR target/68945 Merge from upstream, version 2.0.4467.0. Fix typo in git URL. * aclocal.m4, configure, Makefile.in: Regenerate. # HG changeset patch # Parent f20811cd2fde8357b5c51307decee8c877c1e16a Import libcilkrts Build 4467 (PR target/68945) diff --git a/libcilkrts/Makefile.am b/libcilkrts/Makefile.am --- a/libcilkrts/Makefile.am +++ b/libcilkrts/Makefile.am @@ -54,7 +54,7 @@ GENERAL_FLAGS = -I$(top_srcdir)/include # Enable Intel Cilk Plus extension GENERAL_FLAGS += -fcilkplus -# Always generate unwind tables +#Always generate unwind tables GENERAL_FLAGS += -funwind-tables AM_CFLAGS = $(XCFLAGS) $(GENERAL_FLAGS) -std=c99 diff --git a/libcilkrts/README b/libcilkrts/README --- a/libcilkrts/README +++ b/libcilkrts/README @@ -1,14 +1,16 @@ -Intel(R) Cilk(TM) Plus runtime library +Intel(R) Cilk(TM) Plus Runtime Library Index: -1. BUILDING -2. USING -3. DOXYGEN DOCUMENTATION -4. QUESTIONS OR BUGS -5. CONTRIBUTIONS +1. BUILDING WITH AUTOMAKE +2. BUILDING WITH CMAKE +3. INSTALLING TO VXWORKS +4. USING +5. DOXYGEN DOCUMENTATION +6. QUESTIONS OR BUGS +7. CONTRIBUTIONS # -# 1. BUILDING: +# 1. BUILDING WITH AUTOMAKE: # To distribute applications that use the Intel Cilk Plus language @@ -40,22 +42,87 @@ configure script: % ./configure --prefix=/your/path/to/lib -It is also possible to use CMake if the above method does not apply -well in your environment. Instruction is available in CMakeLists.txt. +# +# 2. BUILDING WITH CMAKE: +# + +To distribute applications that use the Intel Cilk Plus language +extensions to non-development systems, you need to build the Intel +Cilk Plus runtime library and distribute it with your application. +This instruction describes the build process using CMake*, which +supports Linux*, Windows*, and OS X*. It is fine to use this process +to build a Linux library, but it is highly recommended to use the +more mature build process described above when building on Linux. + +You need the CMake tool and a C/C++ compiler that supports the Intel +Cilk Plus language extensions, and the requirements for each operating +systems are: + +Common: +CMake 3.0.0 or later +Make tools such as make (Linux, OS X) or nmake (Windows) +Linux: +GCC* 4.9.2 or later, or Intel(R) C+
RE: [PATCH] arc/nps400: New peephole2 pattern allow more cmem loads
Hi Andrew, Approved, please apply, but ... > +(define_peephole2 > + [(set (match_operand:SI 0 "register_operand" "") > +(sign_extend:SI > + (match_operand:QI 1 "any_mem_operand" ""))) > + (set (reg:CC_ZN CC_REG) > + (compare:CC_ZN (match_dup 0) > + (const_int 0))) > + (set (pc) > +(if_then_else (match_operator 2 "ge_lt_comparison_operator" > +[(reg:CC_ZN CC_REG) (const_int 0)]) > + (match_operand 3 "" "") > + (match_operand 4 "" "")))] > + "TARGET_NPS_CMEM > + && cmem_address (XEXP (operands[1], 0), SImode) > + && peep2_reg_dead_p (2, operands[0]) > + && peep2_regno_dead_p (3, CC_REG)" > + [(set (match_dup 0) > +(zero_extend:SI > + (match_dup 1))) > + (set (reg:CC_ZN CC_REG) > + (compare:CC_ZN (zero_extract:SI > + (match_dup 0) > + (const_int 1) > + (const_int 7)) > + (const_int 0))) > + (set (pc) > +(if_then_else (match_dup 2) > + (match_dup 3) > + (match_dup 4)))] > + "if (GET_CODE (operands[2]) == GE) > + operands[2] = gen_rtx_EQ (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), > const0_rtx); > + else > + operands[2] = gen_rtx_NE (VOIDmode, gen_rtx_REG (CC_ZNmode, 61), > const0_rtx);") > + It seems to me you use spaces instead of tabs. Note on tests: It will be nice to add a test where the added peephole kicks in. If you consider to add this test to the current patch, please resubmit it. Best, Claudiu
RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard library support check the sysroot supports the required test options.
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Andrew Bennett > Sent: 03 November 2016 11:33 > To: Matthew Fortune; 'Moore, Catherine'; 'gcc-patches@gcc.gnu.org' > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard > library support check the sysroot supports the required test options. > > Ping. > > > Regards, > > > > Andrew > > > -Original Message- > > From: Andrew Bennett > > Sent: 28 August 2015 16:50 > > To: Matthew Fortune; Moore, Catherine; gcc-patches@gcc.gnu.org > > Subject: RE: [PATCH] MIPS: If a test in the MIPS testsuite requires standard > > library support check the sysroot supports the required test options. > > > > > I had some comments on this that I hadn't got round to posting. The fix in > > > this patch is not general enough as the missing header problem comes in > > > two (related) forms: > > > > > > 1) Using the new MTI and IMG sysroot layout we can end up with GCC > looking > > >for headers in a sysroot that simply does not exist. The current patch > > >handles this. > > > 2) Using any sysroot layout (i.e. a simple mips-linux-gnu) it is possible > > >for the stdlib.h header to be found but the ABI dependent gnu-stubs > > >header may not be installed depending on soft/hard nan1985/nan2008. > > > > > > The test for stdlib.h needs to therefore verify that preprocessing > > > succeeds > > > rather than just testing for an error relating to stdlib.h. This could be > > > done by adding a further option to mips_preprocess to indicate the > processor > > > output should go to a file and that the caller wants the messages emitted > > > by the compiler instead. > > > > > > A second issue is that you have added (REQUIRES_STDLIB) to too many > tests. > > > You only need to add it to tests that request a compiler option (via > > > dg-options) that could potentially lead to forcing soft/hard > > > nan1985/nan2008 > > > directly or indirectly. So -mips32r6 implies nan2008 so you need it - > > mips32r5 > > > implies nan1985 so you need it. There are at least two tests which don't > > > need the option but you need to check them all so we don't run the check > > > needlessly. > > > > The updated patch and ChangeLog that addresses Matthew's comments is > below. > > > > Ok to commit? > > > > Regards, > > > > > > Andrew > > > > > > testsuite/ > > > > * gcc.target/mips/loongson-simd.c (dg-options): Add > > (REQUIRES_STDLIB). > > * gcc.target/mips/loongson-shift-count-truncated-1.c: Ditto > > * gcc.target/mips/mips-3d-1.c: Ditto > > * gcc.target/mips/mips-3d-2.c: Ditto > > * gcc.target/mips/mips-3d-3.c: Ditto > > * gcc.target/mips/mips-3d-4.c: Ditto > > * gcc.target/mips/mips-3d-5.c: Ditto > > * gcc.target/mips/mips-3d-6.c: Ditto > > * gcc.target/mips/mips-3d-7.c: Ditto > > * gcc.target/mips/mips-3d-8.c: Ditto > > * gcc.target/mips/mips-3d-9.c: Ditto > > * gcc.target/mips/mips-ps-1.c: Ditto > > * gcc.target/mips/mips-ps-2.c: Ditto > > * gcc.target/mips/mips-ps-3.c: Ditto > > * gcc.target/mips/mips-ps-4.c: Ditto > > * gcc.target/mips/mips-ps-6.c: Ditto > > * gcc.target/mips/mips16-attributes.c: Ditto > > * gcc.target/mips/mips32-dsp-run.c: Ditto > > * gcc.target/mips/mips32-dsp.c: Ditto > > * gcc.target/mips/save-restore-1.c: Ditto > > * gcc.target/mips/mips.exp (mips_option_groups): Add stdlib. > > (mips_preprocess): Add ignore_output argument that when set > > will not return the pre-processed output. > > (mips_arch_info): Update arguments for the call to > > mips_preprocess. > > (mips-dg-init): Ditto. > > (mips-dg-options): Check if a test having test option > > (REQUIRES_STDLIB) has the required sysroot support for > > the current test options. > > > > > > > > diff --git > > a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c > > b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c > > index f57a18c..baed48c 100644 > > --- a/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c > > +++ b/gcc/testsuite/gcc.target/mips/loongson-shift-count-truncated-1.c > > @@ -4,7 +4,7 @@ > > /* loongson.h does not handle or check for MIPS16ness. There doesn't > > seem any good reason for it to, given that the Loongson processors > > do not support MIPS16. */ > > -/* { dg-options "isa=loongson -mhard-float -mno-mips16" } */ > > +/* { dg-options "isa=loongson -mhard-float -mno-mips16 > (REQUIRES_STDLIB)" } > > */ > > /* See PR 52155. */ > > /* { dg-options "isa=loongson -mhard-float -mno-mips16 -mlong64" { mips*- > *- > > elf* && ilp32 } } */ > > > > diff --git a/gcc/testsuite/gcc.target/mips/loongson-simd.c > > b/gcc/testsuite/gcc.target/mips/loongson-simd.c > > index 6d2ceb6..f263b43 100644 > > --- a/gcc/testsuite/gcc.target/mips/loongson-simd.c > > +++ b/gcc/testsuite/gcc.target/mips/loongson-simd.c > > @@ -26,7 +26,7 @@ along with GCC; see the
RE: [PATCH 1/4] [ARC] Various fixes.
Hi Andrew, Please ignore this patch for the time being. I would like to double check the add_n/sub_n patterns. Sorry for this detour, Claudiu > -Original Message- > From: Claudiu Zissulescu > Sent: Wednesday, November 16, 2016 11:18 AM > To: gcc-patches@gcc.gnu.org > Cc: Claudiu Zissulescu ; > francois.bed...@synopsys.com; andrew.burg...@embecosm.com > Subject: [PATCH 1/4] [ARC] Various fixes. > > The ifconversion was failing because a move involving the lp_count was > not match by movsi_ne. This patch updates the constraints such that > movsi_ne will match. The failing test is dg-torture.exp=pr68955.c for > archs and without small data. > > gcc/ > 2016-07-11 Claudiu Zissulescu > > * config/arc/arc.h (REG_CLASS_NAMES): Reorder. > * config/arc/arc.md (*add_n): Change. > (*sub_n): Likewise. > (movsi_ne): Update constraint to Rac. > --- > gcc/config/arc/arc.h | 2 +- > gcc/config/arc/arc.md | 18 +- > 2 files changed, 10 insertions(+), 10 deletions(-) > > diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h > index cd8b9f1..642bf73 100644 > --- a/gcc/config/arc/arc.h > +++ b/gcc/config/arc/arc.h > @@ -617,10 +617,10 @@ enum reg_class >"MPY_WRITABLE_CORE_REGS", \ >"WRITABLE_CORE_REGS", \ >"CHEAP_CORE_REGS", \ > + "ALL_CORE_REGS", \ >"R0R3_CD_REGS", \ >"R0R1_CD_REGS", \ >"AC16_H_REGS", \ > - "ALL_CORE_REGS", \ >"ALL_REGS" \ > } > > diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md > index 5a7699f..217935c 100644 > --- a/gcc/config/arc/arc.md > +++ b/gcc/config/arc/arc.md > @@ -2897,10 +2897,10 @@ > (set (match_dup 3) (match_dup 4))]) > > (define_insn "*add_n" > - [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w") > - (plus:SI (mult:SI (match_operand:SI 1 "register_operand" > "Rcqq,c,c,c,c,c") > - (match_operand:SI 2 "_2_4_8_operand" "")) > - (match_operand:SI 3 "nonmemory_operand" > "0,0,c,?Cal,?c,??Cal")))] > + [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W, > W,w,w") > + (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "Rcqq, c,c, > c,c,c") > + (match_operand:SI 2 "_2_4_8_operand" "")) > + (match_operand:SI 3 "nonmemory_operand""0, > 0,c,Cal,c,Cal")))] >"" >"add%z2%? %0,%3,%1%&" >[(set_attr "type" "shift") > @@ -2912,9 +2912,9 @@ > ;; N.B. sub[123] has the operands of the MINUS in the opposite order from > ;; what synth_mult likes. > (define_insn "*sub_n" > - [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w") > - (minus:SI (match_operand:SI 1 "nonmemory_operand" "0,c,?Cal") > - (mult:SI (match_operand:SI 2 "register_operand" "c,c,c") > + [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w") > + (minus:SI (match_operand:SI 1 "nonmemory_operand" > "0,c,?Cal") > + (mult:SI (match_operand:SI 2 "register_operand" "w,w,w") > (match_operand:SI 3 "_2_4_8_operand" ""] >"" >"sub%z3%? %0,%1,%2" > @@ -3570,8 +3570,8 @@ > (define_insn "*movsi_ne" >[(cond_exec > (ne (match_operand:CC_Z 2 "cc_use_register""Rcc, Rcc, > Rcc,Rcc,Rcc") > (const_int 0)) > - (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q, > w,w") > - (match_operand:SI 1 "nonmemory_operand" "C_0,h, ?Cal, > Lc,?Cal")))] > + (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q, > w,w") > + (match_operand:SI 1 "nonmemory_operand" "C_0,h, > ?Cal,LRac,?Cal")))] >"" >"@ > * current_insn_predicate = 0; return \"sub%?.ne %0,%0,%0%&\"; > -- > 1.9.1
[PATCH] Remove -ftree-loop-if-convert-stores
This removes a no longer used option, whether we if-convert stores now depends on --param allow-store-data-races and/or the availability of masked stores. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2016-11-17 Richard Biener * common.opt (ftree-loop-if-convert-stores): Mark as preserved for backward compatibility. * doc/invoke.texi (ftree-loop-if-convert-stores): Remove. * tree-if-conv.c (pass_if_conversion::gate): Do not test flag_tree_loop_if_convert_stores. (pass_if_conversion::execute): Likewise. diff --git a/gcc/common.opt b/gcc/common.opt index 5e8d72d..1fa3629 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1526,8 +1526,8 @@ Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization Convert conditional jumps in innermost loops to branchless equivalents. ftree-loop-if-convert-stores -Common Report Var(flag_tree_loop_if_convert_stores) Optimization -Also if-convert conditional jumps containing memory writes. +Common Ignore +Does nothing. Preserved for backward compatibility. ; -finhibit-size-directive inhibits output of .size for ELF. ; This is used only for compiling crtstuff.c, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 620225c..e6c3dc2 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting -ftree-loop-if-convert @gol --ftree-loop-if-convert-stores -ftree-loop-im @gol +-ftree-loop-im @gol -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol -ftree-loop-vectorize @gol @@ -8061,24 +8061,6 @@ the innermost loops in order to improve the ability of the vectorization pass to handle these loops. This is enabled by default if vectorization is enabled. -@item -ftree-loop-if-convert-stores -@opindex ftree-loop-if-convert-stores -Attempt to also if-convert conditional jumps containing memory writes. -This transformation can be unsafe for multi-threaded programs as it -transforms conditional memory writes into unconditional memory writes. -For example, -@smallexample -for (i = 0; i < N; i++) - if (cond) -A[i] = expr; -@end smallexample -is transformed to -@smallexample -for (i = 0; i < N; i++) - A[i] = cond ? expr : A[i]; -@end smallexample -potentially producing data races. - @item -ftree-loop-distribution @opindex ftree-loop-distribution Perform loop distribution. This flag can improve cache performance on diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index 46d6b34..dc97fc4 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -2872,8 +2872,7 @@ pass_if_conversion::gate (function *fun) { return (((flag_tree_loop_vectorize || fun->has_force_vectorize_loops) && flag_tree_loop_if_convert != 0) - || flag_tree_loop_if_convert == 1 - || flag_tree_loop_if_convert_stores == 1); + || flag_tree_loop_if_convert == 1); } unsigned int @@ -2887,7 +2886,6 @@ pass_if_conversion::execute (function *fun) FOR_EACH_LOOP (loop, 0) if (flag_tree_loop_if_convert == 1 - || flag_tree_loop_if_convert_stores == 1 || ((flag_tree_loop_vectorize || loop->force_vectorize) && !loop->dont_vectorize)) todo |= tree_if_conversion (loop);
Re: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable
On Tue, Nov 01, 2016 at 03:21:29PM +, Kyrill Tkachov wrote: > Here it is. > I've confirmed that it emits to STRs for 4 byte aligned stores when > -mtune=thunderx > and still generates STP for the other tunings, though now sched-fusion is > responsible for > merging them, which is ok by me. > > Bootstrapped and tested on aarch64. > Ok for trunk? OK. Thanks, James > 2016-11-01 Kyrylo Tkachov > > * config/aarch64/aarch64.md (mov): Call > aarch64_split_dimode_const_store on DImode constant stores. > * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store): > New prototype. > * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New > function. > > 2016-11-01 Kyrylo Tkachov > > * gcc.target/aarch64/store_repeating_constant_1.c: New test. > * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
[PATCH] Fix PR78383 middle-end ICE
The following fixes the ICE part of PR78383. We ICE when trying to turn a computed goto to a CFG edge when we know the label but that label is in another function (IPA propagation exposed this knowledge). The transform shows that we miss a DECL_NONLOCAL on the label because we happily miscompile the testcase by moving the label freely before the call to the lambda. This results in _ZZ4mainENKUlPvE_clES_.isra.0.constprop.1: main: call_ZZ4mainENKUlPvE_clES_.isra.0.constprop.1 where you can see the goto has been optimized somehow on RTL as well. Final GIMPLE is main():: () { : goto &label; } int main() () { label: main()_ZZ4mainENKUlPvE_clES_.isra.0.constprop (); } and initial RTL is already broken: ;; goto &label; (insn 5 4 6 (set (reg:DI 87) (label_ref/v:DI 0)) "t.ii":3 -1 (nil)) (jump_insn 6 5 7 (set (pc) (reg:DI 87)) "t.ii":3 -1 (nil)) looks like it might be a bad idea to freely allow LABEL_DECL address propagation but in the end it works fine for DECL_NONLOCAL labels so I think this is what we miss (the label is not marked DECL_NONLOCAL). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Will commit w/o the testcase (because that is still miscompiled). Richard. 2016-11-17 Richard Biener PR middle-end/78383 * tree-cfgcleanup.c (cleanup_control_flow_bb): Do not turn non-local goto into CFG. diff --git a/gcc/testsuite/g++.dg/torture/pr78383.C b/gcc/testsuite/g++.dg/torture/pr78383.C new file mode 100644 index 000..2b35bc7 --- /dev/null +++ b/gcc/testsuite/g++.dg/torture/pr78383.C @@ -0,0 +1,11 @@ +// { dg-do run } + +int main() +{ + auto f = [](void* ptr) { goto *ptr; }; + + f(&&label); + +label: + return 0; +} diff --git a/gcc/tree-cfgcleanup.c b/gcc/tree-cfgcleanup.c index 21873f8..fe22ed3 100644 --- a/gcc/tree-cfgcleanup.c +++ b/gcc/tree-cfgcleanup.c @@ -230,6 +230,8 @@ cleanup_control_flow_bb (basic_block bb, bool first_p) edges which do not go to the right block. For the one edge which goes to the right block, fix up its flags. */ label = TREE_OPERAND (gimple_goto_dest (stmt), 0); + if (DECL_CONTEXT (label) != cfun->decl) + return retval; target_block = label_to_block (label); for (ei = ei_start (bb->succs); (e = ei_safe_edge (ei)); ) {
[C++ Patch] PR 55080 (" -pedantic produces error: floating-point literal cannot appear in a constant-expression")
Hi, this rather old issue points out that, for the snippet below in the legacy -std=c++98 and -pedantic, we issue an hard error instead of a pedwarn - what we normally do, as clarified by Manuel. I guess we could imagine further clean-ups in this area but something as simple as the below seems correct to me and pretty conservative. Tested x86_64-linux. Thanks, Paolo. /// /cp 2016-11-17 Paolo Carlini PR c++/55080 * parser.c (cp_parser_primary_expression): Issue a pedwarn instead of an error floating-point literals in a constant-expression. /testsuite 2016-11-17 Paolo Carlini PR c++/55080 * g++.dg/parse/pr55080.C: New. Index: cp/parser.c === --- cp/parser.c (revision 242540) +++ cp/parser.c (working copy) @@ -4840,8 +4840,10 @@ cp_parser_primary_expression (cp_parser *parser, cast is to an integral or enumeration type will be checked at that point. If we are not within a cast, then this code is invalid. */ - if (!cast_p) - cp_parser_non_integral_constant_expression (parser, NIC_FLOAT); + if (!cast_p && !parser->allow_non_integral_constant_expression_p) + pedwarn (input_location, OPT_Wpedantic, +"ISO C++ forbids using a floating-point literal in a " +"constant-expression"); } return cp_expr (token->u.value, token->location); Index: testsuite/g++.dg/parse/pr55080.C === --- testsuite/g++.dg/parse/pr55080.C(revision 0) +++ testsuite/g++.dg/parse/pr55080.C(working copy) @@ -0,0 +1,6 @@ +// PR c++/55080 +// { dg-options "-std=c++98 -pedantic" } + +class B { + static const int c = 3.1415926; // { dg-warning "constant-expression" } +};