[PATCH, cleanup] Remove PowerPC -mupper-regs-* options
The -mupper-regs-{df,di,sf} debug options that I added in previous
versions of GCC to control whether DFmode, DImode, or SFmode could go in
the traditional Altivec registers have outlived their usefulness.  This
patch deletes the options.  I fixed up the various tests that used the
options, and deleted several tests that make no sense now that the
options don't exist.

I have checked this with bootstrap builds and make check on a little
endian power8 system and a big endian power7 system.  There were no
regressions in any of the tests.

One thing that I did was continue to define __UPPER_REGS_{DF,SF,DI}__,
which were previously defined.  I can delete them if desired, and
perhaps poison the names so that any use is flagged.

Future patches will include removal of the TARGET_UPPER_REGS_* macros in
the various files.  I also plan to remove -mvsx-small-integer and the
various -mpower9-dform* options.

Is this ok for trunk?  At present, I do not intend to back port this to
GCC 7 (but if the maintainers want that, I can do it).

[gcc]
2017-07-22  Michael Meissner

	* config/rs6000/rs6000-cpus.def (ISA_2_6_MASKS_SERVER): Delete
	upper-regs options.
	(ISA_2_7_MASKS_SERVER): Likewise.
	(ISA_3_0_MASKS_IEEE): Likewise.
	(OTHER_P8_VECTOR_MASKS): Likewise.
	(OTHER_VSX_VECTOR_MASKS): Likewise.
	(POWERPC_MASKS): Likewise.
	(power7 cpu): Use ISA_2_6_MASKS_SERVER instead of using a
	duplicate list of options.
	* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Remove
	explicit -mupper-regs options.  Define __UPPER_REGS_*__ based on
	VSX or P8_VECTOR.
	* config/rs6000/rs6000.opt (-mvsx-scalar-memory): Delete
	-mupper-regs* options.  Delete -mvsx-scalar-memory, which was an
	alias for -mupper-regs-df.
	* config/rs6000/rs6000.c (rs6000_setup_reg_addr_masks): Likewise.
	(rs6000_init_hard_regno_mode_ok): Likewise.
	(rs6000_option_override_internal): Likewise.
	(rs6000_opt_masks): Likewise.
	* config/rs6000/rs6000.h (TARGET_UPPER_REGS_DF): Define upper
	regs options in terms of whether -mvsx or -mpower8-vector was
	used.
	(TARGET_UPPER_REGS_DI): Likewise.
	(TARGET_UPPER_REGS_SF): Likewise.
	* doc/invoke.texi (RS/6000 and PowerPC Options): Delete the
	-mupper-regs-* options.

[gcc/testsuite]
2017-07-22  Michael Meissner

	* gcc.target/powerpc/pr65849-1.c: Delete, test no longer valid
	since the upper-regs options have been deleted.
	* gcc.target/powerpc/pr65849-2.c: Likewise.
	* gcc.target/powerpc/pr80099-1.c: Likewise.
	* gcc.target/powerpc/pr80099-2.c: Likewise.
	* gcc.target/powerpc/pr80099-3.c: Likewise.
	* gcc.target/powerpc/pr80099-4.c: Likewise.
	* gcc.target/powerpc/pr80099-5.c: Likewise.
	* gcc.target/powerpc/builtins-2-p9-runnable.c: Update test to
	support removal of the upper-regs options.
	* gcc.target/powerpc/p8vector-fp.c: Likewise.
	* gcc.target/powerpc/p8vector-ldst.c: Likewise.
	* gcc.target/powerpc/p9-dimode1.c: Likewise.
	* gcc.target/powerpc/p9-dimode2.c: Likewise.
	* gcc.target/powerpc/ppc-fpconv-1.c: Likewise.
	* gcc.target/powerpc/ppc-fpconv-10.c: Likewise.
	* gcc.target/powerpc/ppc-fpconv-5.c: Likewise.
	* gcc.target/powerpc/ppc-fpconv-9.c: Likewise.
	* gcc.target/powerpc/ppc-round.c: Likewise.
	* gcc.target/powerpc/pr71720.c: Likewise.
	* gcc.target/powerpc/pr72853.c: Likewise.
	* gcc.target/powerpc/pr79907.c: Likewise.
	* gcc.target/powerpc/pr78953.c: Likewise.
	* gcc.target/powerpc/upper-regs-df.c: Likewise.
	* gcc.target/powerpc/upper-regs-sf.c: Likewise.
	* gcc.target/powerpc/vec-extract-1.c: Likewise.
	* gcc.target/powerpc/vec-init-3.c: Likewise.
	* gcc.target/powerpc/vec-init-6.c: Likewise.
	* gcc.target/powerpc/vec-init-7.c: Likewise.
	* gcc.target/powerpc/vec-set-char.c: Likewise.
	* gcc.target/powerpc/vec-set-int.c: Likewise.
	* gcc.target/powerpc/vec-set-short.c: Likewise.
--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def	(.../svn+ssh://meiss...@gcc.gnu.org/svn/gcc/trunk/gcc/config/rs6000)	(revision 250405)
+++ gcc/config/rs6000/rs6000-cpus.def	(.../gcc/config/rs6000)	(working copy)
@@ -44,9 +44,7 @@
 #define ISA_2_6_MASKS_SERVER	(ISA_2_5_MASKS_SERVER		\
				 | OPTION_MASK_POPCNTD		\
				 | OPTION_MASK_ALTIVEC		\
-				 | OPTION_MASK_VSX		\
-				 | OPTION_MASK_UPPER_RE
Re: Deprecate DBX/stabs?
On 07/21/2017 01:07 PM, Nathan Sidwell wrote:
> [darwin, cygwin, rx maintainers, you might have an opinion]
> Let's at least deprecate it.  I attach a patch to do so.  With the
> patch, you'll get a note about dbx being deprecated whenever you use
> stabs debugging on a system that prefers stabs (thus both -g and
> -gstabs will warn).  On systems where stabs is not preferred, -gstabs
> will not give you a warning.  The patch survives an x86_64-linux
> bootstrap.
>
> A config can choose to override this by defining 'DBX_DEBUG_OK' in the
> build defines.
>
> I did try build time CPP tricks, but config/rx.h and
> config/i386/darwin.h define it to be a conditional expression.
>
> AFAICT, the following include files are not used, and could probably
> be binned too:
>   config/dbx.h
>   config/dbxcoff.h
>   config/dbxelf.h
>   (+ config/i386/gstabs.h Jim found)
>
> It looks like DBX is the default for:
>   i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC
>   i386/darwin.h for 32-bit target
>   rx/rx.h when using AS100 syntax
>
> nathan

Cygwin GCC has been using --with-dwarf2 for some time now, so it
shouldn't be affected.
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On Fri, 2017-07-21 at 19:58 +0200, Volker Reichelt wrote:
> On 21 Jul, David Malcolm wrote:
> > On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote:
> >> Hi,
> >>
> >> the following patch introduces a new C++ warning option
> >> -Wduplicated-access-specifiers that warns about redundant
> >> access-specifiers in classes, e.g.
> >>
> >>   class B
> >>   {
> >>     public:
> >>       B();
> >>
> >>     private:
> >>       void foo();
> >>     private:
> >>       int i;
> >>   };
> >>
> >> test.cc:8:5: warning: duplicate 'private' access-specifier
> >> [-Wduplicated-access-specifiers]
> >>      private:
> >>      ^~~
> >> ---
> >> test.cc:6:5: note: access-specifier was previously given here
> >>      private:
> >>      ^~~
> >
> > Thanks for working on this.
> >
> > I'm not able to formally review, but you did CC me; various comments
> > below throughout.
> >
> >> The test is implemented by tracking the location of the last
> >> access-specifier together with the access-specifier itself.
> >> The location serves two purposes: the obvious one is to be able to
> >> print the location in the note that comes with the warning.
> >> The second purpose is to be able to distinguish an access-specifier
> >> given by the user from the default of the class type (i.e. public
> >> for 'struct' and private for 'class') where the location is set to
> >> UNKNOWN_LOCATION.  The warning is only issued if the user gives the
> >> access-specifier twice, i.e. if the previous location is not
> >> UNKNOWN_LOCATION.
> >
> > Presumably given
> >
> >   struct foo
> >   {
> >   public:
> >     /* ... */
> >   };
> >
> > we could issue something like:
> >
> >   warning: access-specifier 'public' is redundant within 'struct'
> >
> > for the first; similarly for:
> >
> >   class bar
> >   {
> >   private:
> >   };
> >
> > we could issue:
> >
> >   warning: access-specifier 'private' is redundant within 'class'
> >
> >> One could actually make this a two-level warning so that on the
> >> higher level also the default class-type settings are taken into
> >> account.  Would that be helpful?  I could prepare a second patch
> >> for that.
> >
> > I'm not sure how best to structure it.
> >
> > FWIW, when I first saw the patch, I wasn't a big fan of the warning,
> > as I like to use access-specifiers to break up the kinds of entities
> > within a class.
> >
> > For example, our coding guidelines
> >   https://gcc.gnu.org/codingconventions.html#Class_Form
> > recommend:
> >
> >   "first define all public types,
> >    then define all non-public types,
> >    then declare all public constructors,
> >    ...
> >    then declare all non-public member functions, and
> >    then declare all non-public member variables."
> >
> > I find it's useful to put a redundant "private:" between the last
> > two, e.g.:
> >
> >   class baz
> >   {
> >   public:
> >     ...
> >
> >   private:
> >     void some_private_member ();
> >
> >   private:
> >     int m_some_field;
> >   };
> >
> > to provide a subdivision between the private member functions and
> > the private data fields.
>
> That's what also can be seen in our libstdc++ to some extent.
> The other half of the warnings indicate redundant access-specifiers.
>
> It's up to the user to keep those duplicate access-specifiers as
> subdividers or to use something else (like comments) to do that
> and to switch on the warning for her/his code.
> Because the subdivider usage seems to be relatively common,
> I don't want to enable the warning by -Wall or -Wextra.
>
> > This might be a violation of our guidelines (though if so, I'm not
> > sure it's explicitly stated), but I find it useful, and the patch
> > would warn about it.
> >
> > Having said that, looking at e.g. the "jit" subdir, I see lots of
> > places where the warning would fire.  In some of these, the code has
> > a bit of a "smell", so maybe I like the warning after all... in that
> > it can be good for a new warning to draw attention to code that
> > might need work.
> >
> > Sorry that this is rambling; comments on the patch inline below.
> >
> >> Bootstrapped and regtested on x86_64-pc-linux-gnu.
> >> OK for trunk?
> >>
> >> Btw, there are about 50 places in our libstdc++ headers where this
> >> warning triggers.  I'll come up with a patch for this later.
> >>
> >> Regards,
> >> Volker
> >>
> >>
> >> 2017-07-20  Volker Reichelt
> >>
> >>	* doc/invoke.texi (-Wduplicated-access-specifiers): Document new
> >>	warning option.
> >>
> >> Index: gcc/doc/invoke.texi
> >> ===
> >> --- gcc/doc/invoke.texi (revision 250356)
> >> +++ gcc/doc/invoke.texi (working copy)
> >> @@ -275,7 +275,7 @@
> >>  -Wdisabled-optimization @gol
> >>  -Wno-discarded-qualifiers -Wno-discarded-array-qualifiers @gol
> >>  -Wno-div-by-zero -Wdouble-promotion @gol
> >> --Wduplicated-branches -Wduplicated-cond @gol
> >> +-Wduplicated-access-specifiers -Wduplicated-bran
Re: libgo patch committed: Call f?statfs64 on GNU/Linux
On Fri, Jul 21, 2017 at 1:58 PM, Andreas Schwab wrote:
> On Jul 21 2017, Ian Lance Taylor wrote:
>
>> In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
>> configure.ac, so we should unconditionally call the statfs64 and
>> fstatfs64 functions rather than statfs/fstatfs.
>
> That should happen automatically when building with
> _FILE_OFFSET_BITS=64.

For C code, yes, because of #include magic.  But these names are being
generated directly by the Go frontend.

>> These functions should be available on all versions of GNU/Linux
>> since 2.6.
>
> It's a property of glibc, not the kernel.

OK, but, in any case, a long time.

> Wrong patch?

Yes, sorry, not sure what happened.  Correct patch attached.

Ian

commit e1bd9ea4dc16e228164c92a12c5229ddf20f2b50
Author: Ian Lance Taylor
Date:   Fri Jul 21 11:51:58 2017 -0700

    syscall: call f?statfs64 on GNU/Linux

    We unconditionally set _FILE_OFFSET_BITS to 64 in configure.ac, so we
    should unconditionally call the statfs64 and fstatfs64 functions.
    These functions should be available on all versions of GNU/Linux
    since 2.6.  On 64-bit systems they are aliased to statfs/fstatfs, and
    on 32-bit systems they use the 64-bit data structures.

    Fixes golang/go#20922

    Change-Id: Ibcd1e53b03f3c8db91a3cd8825d06bcf8f43f443
    Reviewed-on: https://go-review.googlesource.com/50635
    Reviewed-by: Than McIntosh

diff --git a/libgo/go/syscall/libcall_linux.go b/libgo/go/syscall/libcall_linux.go
index b58b2ddd..5f477840 100644
--- a/libgo/go/syscall/libcall_linux.go
+++ b/libgo/go/syscall/libcall_linux.go
@@ -212,7 +212,7 @@ func Accept4(fd int, flags int) (nfd int, sa Sockaddr, err error) {
 //flock(fd _C_int, how _C_int) _C_int

 //sys	Fstatfs(fd int, buf *Statfs_t) (err error)
-//fstatfs(fd _C_int, buf *Statfs_t) _C_int
+//fstatfs64(fd _C_int, buf *Statfs_t) _C_int

 func Gettid() (tid int) {
 	r1, _, _ := Syscall(SYS_GETTID, 0, 0, 0)
@@ -360,7 +360,7 @@ func Splice(rfd int, roff *int64, wfd int, woff *int64, len int, flags int) (n i
 }

 //sys	Statfs(path string, buf *Statfs_t) (err error)
-//statfs(path *byte, buf *Statfs_t) _C_int
+//statfs64(path *byte, buf *Statfs_t) _C_int

 //sys	SyncFileRange(fd int, off int64, n int64, flags int) (err error)
 //sync_file_range(fd _C_int, off Offset_t, n Offset_t, flags _C_uint) _C_int
Re: [PATCH,rs6000] Fine-tune vec_construct direct move cost
Hi Bill,

On Fri, Jul 21, 2017 at 10:40:43AM -0500, Bill Schmidt wrote:
> In https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00924.html, I raised
> the vectorization cost for a vec_construct operation that requires
> direct moves between GPRs and VSRs.  The cost equation I substituted
> has since proven to be slightly more conservative than intended, and
> we're seeing some cases of SLP vectorization being avoided that should
> not be.  This patch adjusts the equation to reduce the cost somewhat.
>
> I've tested this to ensure the cases previously seen are now being
> vectorized again, and done some benchmark testing that shows no
> measurable result, positive or negative.  So this is just minor
> fine-tuning, but still important to get right.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?

Sure, thanks!


Segher
Re: [PATCH] scheduler bug fix for AArch64 insn fusing SCHED_GROUP usage
Ping.

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00779.html

On Thu, Jul 13, 2017 at 3:00 PM, Jim Wilson wrote:
> The AArch64 port uses SCHED_GROUP to mark instructions that get fused
> at issue time, to ensure that they will be issued together.  However,
> in the scheduler, use of a SCHED_GROUP forces all other instructions
> to issue in the next cycle.  This is wrong for AArch64 ports using
> insn fusing which can issue multiple insns per cycle, as aarch64
> SCHED_GROUP insns can all issue in the same cycle, and other insns can
> issue in the same cycle also.
>
> I put a testcase and some info in bug 81434.
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81434
>
> The attached patch fixes the problem.  The behavior in pass == 0 is
> the same as now.  All non sched group insns are ignored, and all sched
> group insns are checked to see if they need to be queued for a later
> cycle.  The difference is in the second pass, where non sched group
> insns are queued for a later cycle only if there is a sched group insn
> that got queued.  Since sched group insns always sort to the top of
> the list of insns to schedule, all sched group insns still get
> scheduled together as before.
>
> This has been tested with an AArch64 bootstrap and make check.
>
> OK?
>
> Jim
Re: [PATCH/AARCH64] Decimal floating point support for AARCH64
On 7/21/17 4:51 PM, Andrew Pinski wrote:
> So right now TFmode is handled on AARCH64 not as a pair of 64-bit
> registers but rather as one 128-bit register (the qN register; the
> floating point register and the SIMD register set on AARCH64 overlap
> already).  So handling TDmode like TFmode is more natural for AARCH64
> than for most arches, in that the DFP hw support for _Decimal128 on
> AARCH64 would take the values in the qN register rather than a pair
> of registers.

Ah, lucky you!  Then nevermind. :-)

Peter
Re: [PATCH/AARCH64] Decimal floating point support for AARCH64
On Fri, Jul 21, 2017 at 2:45 PM, Peter Bergner wrote:
> On 7/13/17 7:12 PM, Andrew Pinski wrote:
>> This patch adds Decimal floating point support to aarch64.  It is
>> the base support in that since there is no hardware support for DFP,
>> it just defines the ABI.  The ABI I chose is that _Decimal32 is
>> treated like float, _Decimal64 is treated like double and _Decimal128
>> is treated like long double.  In that they are passed via the
>> floating registers (sN, dN, qN).
>> Is this ok as an ABI?
>
> It depends on whether AARCH ever plans on implementing HW DFP.
> On POWER, we handle things similarly to what you mention above,
> except for one extra constraint for _Decimal128, and that is that
> they must live in even/odd register pairs.  This was due to how
> the instructions were implemented in the HW; they required even/odd
> reg pairs.
>
> If there's zero chance AARCH ever implements HW DFP, then you're
> probably fine with the above, but if you go with the above and HW DFP
> is eventually added, then the HW would need to handle even/odd and
> odd/even register pairs in its instructions... or you'd need to
> add potential prologue/epilogue code to move formal args into
> even/odd regs if the HW demands it.  If there is a non-zero chance,
> or you just want to be safe, you could enforce even/odd reg usage
> in the ABI upfront.

So right now TFmode is handled on AARCH64 not as a pair of 64-bit
registers but rather as one 128-bit register (the qN register; the
floating point register and the SIMD register set on AARCH64 overlap
already).  So handling TDmode like TFmode is more natural for AARCH64
than for most arches, in that the DFP hw support for _Decimal128 on
AARCH64 would take the values in the qN register rather than a pair of
registers.

Thanks,
Andrew Pinski

>
> Peter
Re: [PATCH/AARCH64] Decimal floating point support for AARCH64
On 7/13/17 7:12 PM, Andrew Pinski wrote:
> This patch adds Decimal floating point support to aarch64.  It is
> the base support in that since there is no hardware support for DFP,
> it just defines the ABI.  The ABI I chose is that _Decimal32 is
> treated like float, _Decimal64 is treated like double and _Decimal128
> is treated like long double.  In that they are passed via the floating
> registers (sN, dN, qN).
> Is this ok as an ABI?

It depends on whether AARCH ever plans on implementing HW DFP.
On POWER, we handle things similarly to what you mention above,
except for one extra constraint for _Decimal128, and that is that
they must live in even/odd register pairs.  This was due to how
the instructions were implemented in the HW; they required even/odd
reg pairs.

If there's zero chance AARCH ever implements HW DFP, then you're
probably fine with the above, but if you go with the above and HW DFP
is eventually added, then the HW would need to handle even/odd and
odd/even register pairs in its instructions... or you'd need to
add potential prologue/epilogue code to move formal args into
even/odd regs if the HW demands it.  If there is a non-zero chance,
or you just want to be safe, you could enforce even/odd reg usage
in the ABI upfront.

Peter
Re: [PATCH, rs6000] vec_mule and vec_mulo builtin fix
On Thu, Jul 20, 2017 at 03:56:01PM -0700, Carl Love wrote:
> The following patch is a reworked patch to fix the bugs in the
> vec_mule and vec_mulo patch that had to be reverted.  The reverted fix
> added the correct mule and mulo instructions with vectorization.  The
> vectorization support caused vectorization in several testcases to
> fail.  This patch adds the correct instructions for the vec_mule and
> vec_mulo builtins without vectorization.  The goal is to add the
> vectorization support again in a future patch.

Ah, as you explained offline, you removed vec_widen_umult_even_v4si and
friends.  Okay :-)

The patch is okay for trunk then.  Thanks!


Segher
Re: libgo patch committed: Call f?statfs64 on GNU/Linux
On Jul 21 2017, Ian Lance Taylor wrote:

> In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
> configure.ac, so we should unconditionally call the statfs64 and
> fstatfs64 functions rather than statfs/fstatfs.

That should happen automatically when building with
_FILE_OFFSET_BITS=64.

> These functions should be available on all versions of GNU/Linux
> since 2.6.

It's a property of glibc, not the kernel.

> Index: gcc/go/gofrontend/MERGE
> ===
> --- gcc/go/gofrontend/MERGE	(revision 250436)
> +++ gcc/go/gofrontend/MERGE	(working copy)
> @@ -1,4 +1,4 @@
> -a9f1aeced86691de891fbf2a8c97e848faf1962e
> +b712bacd939466e66972337744983e180849c535
>
>  The first line of this file holds the git revision number of the last
>  merge done from the gofrontend repository.
> Index: libgo/runtime/go-caller.c
> ===
> --- libgo/runtime/go-caller.c	(revision 250406)
> +++ libgo/runtime/go-caller.c	(working copy)
> @@ -74,7 +74,7 @@ static void *back_state;
>
>  /* A lock to control creating back_state.  */
>
> -static Lock back_state_lock;
> +static uint32 back_state_lock;
>
>  /* The program arguments.  */
>

Wrong patch?

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH, rs6000] vec_mule and vec_mulo builtin fix
On Thu, Jul 20, 2017 at 03:56:01PM -0700, Carl Love wrote:
> The following patch is a reworked patch to fix the bugs in the
> vec_mule and vec_mulo patch that had to be reverted.  The reverted fix
> added the correct mule and mulo instructions with vectorization.  The
> vectorization support caused vectorization in several testcases to
> fail.  This patch adds the correct instructions for the vec_mule and
> vec_mulo builtins without vectorization.  The goal is to add the
> vectorization support again in a future patch.

The patch looks fine to me, but so did the previous version...  What
has changed?  It's not clear to me.


Segher
[PATCH] Emit DWARF5 DW_AT_export_symbols for namespaces
Hi!

DWARF5 introduced DW_AT_export_symbols, which may be present on
DW_TAG_namespace or DW_TAG_{structure,union,class}_type to signal
inline or anonymous namespaces or anonymous
structures/unions/classes.  What we were emitting instead is an
implicit DW_TAG_imported_module in the outer namespace.

The following patch changes nothing for -gdwarf-4 and below with
-gstrict-dwarf.  For -gdwarf-4 and below with -gno-strict-dwarf it just
adds DW_AT_export_symbols to inline namespaces (so that interested
consumers can find out it is an inline namespace, but consumers not
knowing about DW_AT_export_symbols still have the implicit
DW_TAG_imported_module).  In that mode, no changes for anonymous
namespaces, because those are already easily detectable by the
consumers (missing DW_AT_name).  For -gdwarf-5 it emits
DW_AT_export_symbols on both inline namespaces and anonymous
namespaces and doesn't emit the implicit DW_TAG_imported_module, which
is shorter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

This patch doesn't do anything about anon struct/union/class.  I've
tried to handle it, but the problem is that the ANON_AGGR_TYPE_P flag
is set too late, after the debug hook adds the type DIE.  Any thoughts
on how to handle that?  And I wonder what is the counterpart of
ANON_AGGR_TYPE_P in the C FE; CCing Marek on that.

2017-07-21  Jakub Jelinek

	* debug.h (struct gcc_debug_hooks): Add IMPLICIT argument to
	imported_module_or_decl hook.
	(debug_nothing_tree_tree_tree_bool): Remove.
	(debug_nothing_tree_tree_tree_bool_bool): New declaration.
	* debug.c (do_nothing_debug_hooks): Use
	debug_nothing_tree_tree_tree_bool_bool instead of
	debug_nothing_tree_tree_tree_bool.
	* vmsdbgout.c (vmsdbg_debug_hooks): Likewise.
	* dbxout.c (dbx_debug_hooks, xcoff_debug_hooks): Likewise.
	* sdbout.c (sdb_debug_hooks): Likewise.
	* dwarf2out.c (dwarf2_lineno_debug_hooks): Likewise.
	(gen_namespace_die): Add DW_AT_export_symbols attribute if
	langhook wants it.
	(dwarf2out_imported_module_or_decl): Add IMPLICIT argument, if
	true, -gdwarf-5 and decl will have DW_AT_export_symbols
	attribute, don't add anything.
cp/
	* cp-objcp-common.c (cp_decl_dwarf_attribute): Handle
	DW_AT_export_symbols.
	* name-lookup.c (emit_debug_info_using_namespace): Add IMPLICIT
	argument, pass it through to the debug hook.
	(finish_namespace_using_directive): Adjust
	emit_debug_info_using_namespace caller.
	(push_namespace): Likewise.  Call it after setting
	DECL_NAMESPACE_INLINE_P.
	(cp_emit_debug_info_for_using): Pass false as new argument to
	the imported_module_or_decl debug hook.
fortran/
	* trans-decl.c (gfc_trans_use_stmts): Pass false as new argument
	to the imported_module_or_decl debug hook.
ada/
	* gcc-interface/utils.c (gnat_write_global_declarations): Pass
	false as new argument to the imported_module_or_decl debug hook.
testsuite/
	* g++.dg/debug/dwarf2/inline-ns-1.C: New test.
	* g++.dg/debug/dwarf2/inline-ns-2.C: New test.

--- gcc/debug.h.jj	2017-02-18 17:11:18.0 +0100
+++ gcc/debug.h	2017-07-21 12:19:09.486576092 +0200
@@ -145,7 +145,8 @@ struct gcc_debug_hooks

   /* Debug information for imported modules and declarations.  */
   void (* imported_module_or_decl) (tree decl, tree name,
-				    tree context, bool child);
+				    tree context, bool child,
+				    bool implicit);

   /* DECL is an inline function, whose body is present, but which is
      not being output at this point.  */
@@ -206,7 +207,8 @@ extern void debug_nothing_int_int (unsig
 extern void debug_nothing_tree (tree);
 extern void debug_nothing_tree_tree (tree, tree);
 extern void debug_nothing_tree_int (tree, int);
-extern void debug_nothing_tree_tree_tree_bool (tree, tree, tree, bool);
+extern void debug_nothing_tree_tree_tree_bool_bool (tree, tree, tree,
+						    bool, bool);
 extern bool debug_true_const_tree (const_tree);
 extern void debug_nothing_rtx_insn (rtx_insn *);
 extern void debug_nothing_rtx_code_label (rtx_code_label *);
--- gcc/debug.c.jj	2017-02-18 17:11:18.0 +0100
+++ gcc/debug.c	2017-07-21 11:53:45.452383785 +0200
@@ -47,7 +47,7 @@ const struct gcc_debug_hooks do_nothing_
   debug_nothing_tree,			 /* early_global_decl */
   debug_nothing_tree,			 /* late_global_decl */
   debug_nothing_tree_int,		 /* type_decl */
-  debug_nothing_tree_tree_tree_bool,	/* imported_module_or_decl */
+  debug_nothing_tree_tree_tree_bool_bool,/* imported_module_or_decl */
   debug_nothing_tree,			 /* deferred_inline_function */
   debug_nothing_tree,			 /* outlining_inline_function */
   debug_nothing_rtx_code_label,
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
On Thu, Jul 20, 2017 at 08:20:52AM -0600, Jeff Law wrote:
> > Can only combine-stack-adjustments do this?  It seems like something
> > many passes could do, and then your new note doesn't help.
> So far it's only been observed with c-s-a, but further auditing is
> certainly desirable here, particularly with the upcoming changes to
> the generic dynamic alloca handling.
>
> In the V2 patch only backends would emit unrolled inline alloca/probe
> sequences like what you see above, and only for prologues.  Thus there
> were a very limited number of passes to be concerned about.
>
> In the V3 patch we have unrolled inline probing for the dynamic space
> as well, so this kind of sequence is exposed to everything after
> gimple->rtl expansion.
>
> Unfortunately, the most thorough checker we have is x86, and on that
> target, because of stack alignment issues, we'll never see a constant
> size in the dynamic space and thus no unrolled inlined alloca/probe
> sequences.
>
> In reality I suspect that with the hard register references, most
> passes are going to leave those insns alone, but some auditing is
> necessary.

This is similar to what rs6000 uses stack_tie for.  You want to prevent
a store to the stack (the probe) from being moved after a later stack
pointer update.  By pretending (in the insn pattern) there is a store to
the stack with that stack pointer update, nothing can move stores after
it.


Segher
libgo patch committed: Call f?statfs64 on GNU/Linux
In libgo we unconditionally set _FILE_OFFSET_BITS to 64 in
configure.ac, so we should unconditionally call the statfs64 and
fstatfs64 functions rather than statfs/fstatfs.  These functions should
be available on all versions of GNU/Linux since 2.6.  On 64-bit systems
they are aliased to statfs/fstatfs, and on 32-bit systems they use the
64-bit data structures.  This fixes https://golang.org/issue/20922.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE	(revision 250436)
+++ gcc/go/gofrontend/MERGE	(working copy)
@@ -1,4 +1,4 @@
-a9f1aeced86691de891fbf2a8c97e848faf1962e
+b712bacd939466e66972337744983e180849c535

 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-caller.c
===
--- libgo/runtime/go-caller.c	(revision 250406)
+++ libgo/runtime/go-caller.c	(working copy)
@@ -74,7 +74,7 @@ static void *back_state;

 /* A lock to control creating back_state.  */

-static Lock back_state_lock;
+static uint32 back_state_lock;

 /* The program arguments.  */

@@ -85,7 +85,15 @@ extern Slice runtime_get_args(void);
 struct backtrace_state *
 __go_get_backtrace_state ()
 {
-  runtime_lock (&back_state_lock);
+  uint32 set;
+
+  /* We may not have a g here, so we can't use runtime_lock.  */
+  set = 0;
+  while (!__atomic_compare_exchange_n (&back_state_lock, &set, 1, false,
+				       __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+    {
+      runtime_osyield ();
+      set = 0;
+    }
   if (back_state == NULL)
     {
       Slice args;
@@ -113,7 +121,7 @@ __go_get_backtrace_state ()
       back_state = backtrace_create_state (filename, 1, error_callback, NULL);
     }
-  runtime_unlock (&back_state_lock);
+  __atomic_store_n (&back_state_lock, 0, __ATOMIC_RELEASE);
   return back_state;
 }
Re: Deprecate DBX/stabs?
> On 21 Jul 2017, at 20:54, Jim Wilson wrote:
>
> On Fri, Jul 21, 2017 at 12:44 PM, Iain Sandoe wrote:
>> It ought to be already, in fact anything (powerpc*/x86/x86-64) >=
>> Darwin9 (OS X 10.5) ought to be defaulting to DWARF already, will
>> check that sometime.
>
> Yes, they do default to dwarf2.  The comments say pre-darwin9 32-bit
> defaults to stabs.  The question is whether anyone cares about that
> anymore.

From my perspective, as per Mike's comments, I'd say "let's move on":

- Darwin8's DWARF support is good enough for C at least
- as per my other comments, there remain ways forward for someone who
  really wants to support it (TenFourFox seems still active and I do
  get a few queries per year from folks working with Darwin8).
- deprecation gives other folks a chance to shout if they care.

cheers
Iain

Iain Sandoe
CodeSourcery / Mentor Embedded
Re: Deprecate DBX/stabs?
On Fri, Jul 21, 2017 at 12:44 PM, Iain Sandoe wrote: > It ought to be already, in fact anything (powerpc*/x86/x86-64) >= Darwin9 (OS > X 10.5) ought to be defaulting to DWARF already, will check that sometime. Yes, they do default to dwarf2. The comments say pre-darwin9 32-bit defaults to stabs. The question is whether anyone cares about that anymore. Jim
Re: Deprecate DBX/stabs?
> On 21 Jul 2017, at 20:10, Mike Stump wrote:
>
> On Jul 21, 2017, at 6:07 AM, Nathan Sidwell wrote:
>>
>> [Darwin, cygwin, rx maintainers, you might have an opinion]
>
> darwin going forward is a DWARF platform, so, shouldn't be a heartache
> for real folks.

agreed, in fact the default for current assemblers doesn't support
stabs - and we've adjusted the specs to deal with that.

> For ancient machines, ancient compilers might be a requirement.
> Generally, I like keeping things; but cleanups and removals are a part
> of life, and eventually the old should be removed.  I'd not stand in
> the way of such removal.  If we were within 5 years of such a
> transition point, I'd argue to keep it for at least 5 years.  But, the
> switch for darwin was Oct 26th, 2007.  10 years I think is a nice
> cutover point for first tier things.  Beyond 10, and I'd say, you are
> dragging your feet.  If _all_ the code for DBX were in my port file,
> I'd be tempted to keep it indefinitely.  It's not, so, that's not
> possible/reasonable.
>
> Iain, do you still have the G5s? :-)  Do they run 8 or 9?  What do you
> think?  Seem reasonable?

I still have access to i686 and powerpc Darwin8, but testing is
extremely infrequent.

For the record; anyone wanting modern toolchains on Darwin8 most likely
has to face building a linker and more modern cctools anyway (the XCode
2.5 set was extremely flaky from at least 4.6/4.7).  I'd suspect that a
person serious in doing that would likely be willing to build GDB,
which does support x86 Darwin, at least.  FWIW I have a patch that
makes GDB (at least 7.x) work for PowerPC too.  If anyone cares,
ask ;-) (I doubt if I'll try modernising it across the transition to
C++ for GDB - since available time would prob be better spent
elsewhere).

Anyone wanting to build modern GCC on < Darwin8 is going to need to
build most of the supporting infra too - the (XCode 1.5)
linker/assembler there don't support enough weak stuff to be viable for
modern C++.
These very old platforms are long past the “config && make” stage; supporting them needs a degree of commitment and understanding. - My G5 (and most of my other ppc hardware) habitually runs 10.5 (Darwin9) and I’ve no motivation to reboot to 10.4 unless to test something more quickly than my really ancient G4s can manage. > The 64_BIT x86 darwin question likely can be flipped to default to dwarf; > so, even that isn't an issue. It ought to be already, in fact anything (powerpc*/x86/x86-64) >= Darwin9 (OS X 10.5) ought to be defaulting to DWARF already, will check that sometime. cheers, Iain Iain Sandoe CodeSourcery / Mentor Embedded
Re: PING^2: Fwd: SSA range class and removal of VR_ANTI_RANGEs
On Mon, Jul 17, 2017 at 6:23 AM, Richard Biener wrote: > On Mon, Jul 17, 2017 at 8:51 AM, Aldy Hernandez wrote: >> How does this look? > > It's a change that on its own doesn't look worthwhile to me. > > So please post the changes that will build on top of this. Like removing > anti-ranges from VRP or your on-demand value-range stuff. > > Thanks, > Richard. From the looks of it, we can have a variety of VRP ranges that are not representable at all with an integer range class. For instance, I see the following ranges built in set_value_range(): [INT, (nop_expr SSA)] [INT, (plus_expr SSA INT)] [(negate_expr SSA), (negate_expr SSA)] [(plus_expr (negate_expr SSA INT)), (plus_expr (negate_expr SSA) INT)] [SSA, SSA] So...I guess the first suggestion is out of the question ;-). Aldy
[C++ PATCH] merge various METHOD_VEC accessors
In the old days there was lookup_fnfields_1 which gave you an index. Then lazy special function creation happened and lookup_fnfields_idx_nolazy appeared. Then users wanted the overloads directly so lookup_fnfields_slot and lookup_fnfields_slot_nolazy were born. Now everybody uses the slot accessors, and the only users of the index accessors are the slot accessors themselves. This patch does the obvious merging, so we only have lookup_fnfields_slot{,_nolazy}. Much simples. More simples later ... nathan -- Nathan Sidwell 2017-07-21 Nathan Sidwell * search.c (lookup_conversion_operator): Return overloads. (lookup_fnfields_idx_nolazy): Absorb into ... (lookup_fnfields_slot_nolazy): ... here. (lookup_fnfields_1): Absorb into ... (lookup_fnfields_slot): ... here. Index: cp/search.c === --- cp/search.c (revision 250437) +++ cp/search.c (working copy) @@ -1530,62 +1530,53 @@ lookup_fnfields (tree xbasetype, tree na return rval; } -/* Return the index in the CLASSTYPE_METHOD_VEC for CLASS_TYPE - corresponding to "operator TYPE ()", or -1 if there is no such - operator. Only CLASS_TYPE itself is searched; this routine does - not scan the base classes of CLASS_TYPE. */ +/* Return the conversion operators in CLASS_TYPE corresponding to + "operator TYPE ()". Only CLASS_TYPE itself is searched; this + routine does not scan the base classes of CLASS_TYPE. */ -static int +static tree lookup_conversion_operator (tree class_type, tree type) { - int tpl_slot = -1; + tree tpls = NULL_TREE; if (TYPE_HAS_CONVERSION (class_type)) { - int i; - tree fn; + tree fns; vec *methods = CLASSTYPE_METHOD_VEC (class_type); - for (i = CLASSTYPE_FIRST_CONVERSION_SLOT; - vec_safe_iterate (methods, i, &fn); ++i) + for (int i = CLASSTYPE_FIRST_CONVERSION_SLOT; + vec_safe_iterate (methods, i, &fns); ++i) { /* All the conversion operators come near the beginning of the class. Therefore, if FN is not a conversion operator, there is no matching conversion operator in CLASS_TYPE. 
*/ - fn = OVL_FIRST (fn); + tree fn = OVL_FIRST (fns); if (!DECL_CONV_FN_P (fn)) break; if (TREE_CODE (fn) == TEMPLATE_DECL) /* All the templated conversion functions are on the same slot, so remember it. */ - tpl_slot = i; + tpls = fns; else if (same_type_p (DECL_CONV_FN_TYPE (fn), type)) - return i; + return fns; } } - return tpl_slot; + return tpls; } -/* TYPE is a class type. Return the index of the fields within - the method vector with name NAME, or -1 if no such field exists. - Does not lazily declare implicitly-declared member functions. */ +/* TYPE is a class type. Return the member functions in the method + vector with name NAME. Does not lazily declare implicitly-declared + member functions. */ -static int -lookup_fnfields_idx_nolazy (tree type, tree name) +tree +lookup_fnfields_slot_nolazy (tree type, tree name) { - vec *method_vec; - tree fn; - size_t i; - - if (!CLASS_TYPE_P (type)) -return -1; - - method_vec = CLASSTYPE_METHOD_VEC (type); + vec *method_vec = CLASSTYPE_METHOD_VEC (type); if (!method_vec) -return -1; +return NULL_TREE; if (GATHER_STATISTICS) n_calls_lookup_fnfields_1++; @@ -1594,10 +1585,12 @@ lookup_fnfields_idx_nolazy (tree type, t return lookup_conversion_operator (type, TREE_TYPE (name)); /* Skip the conversion operators. */ + int i; + tree fns; for (i = CLASSTYPE_FIRST_CONVERSION_SLOT; - vec_safe_iterate (method_vec, i, &fn); + vec_safe_iterate (method_vec, i, &fns); ++i) -if (!DECL_CONV_FN_P (OVL_FIRST (fn))) +if (!DECL_CONV_FN_P (OVL_FIRST (fns))) break; /* If the type is complete, use binary search. 
*/ @@ -1615,36 +1608,35 @@ lookup_fnfields_idx_nolazy (tree type, t if (GATHER_STATISTICS) n_outer_fields_searched++; - tree tmp = (*method_vec)[i]; - tmp = OVL_NAME (tmp); - if (tmp > name) + fns = (*method_vec)[i]; + tree fn_name = OVL_NAME (fns); + if (fn_name > name) hi = i; - else if (tmp < name) + else if (fn_name < name) lo = i + 1; else - return i; + return fns; } } else -for (; vec_safe_iterate (method_vec, i, &fn); ++i) +for (; vec_safe_iterate (method_vec, i, &fns); ++i) { if (GATHER_STATISTICS) n_outer_fields_searched++; - if (OVL_NAME (fn) == name) - return i; + if (OVL_NAME (fns) == name) + return fns; } - return -1; + return NULL_TREE; } -/* TYPE is a class type. Return the index of the fields within - the method vector with name NAME, or -1 if no such field exists. */ +/* TYPE is a class type. Return the overloads in + the method vector with name NAME. Lazily create ctors etc. */ -static int -lookup_fnfields_1 (tree type, tree name) +tree +lookup_fnfields_slot
[PATCH, RFC] Proposed PowerPC IEEE 128-bit floating point changes
This patch makes some changes to the way the PowerPC handles IEEE 128-bit floating point, and I wanted to point out where I would like to change the compiler, but be open to doing it in other ways. There are 3 changes in this patch, and presumably more work to do beyond this patch. The first change is to enable the C language to use the _Float128 keyword (but not __float128) without having to use the -mfloat128 option on power7-power9 systems. My question is: in the TR that introduced _Float128, is there any expectation that outside of the built-in functions we already provide, we need to provide runtime functions? Yes, glibc 2.26 will be coming along shortly, and it should provide most/all of the F128 functions, but distros won't pick this library up for some time. I would like to enable it, but I want to avoid the problem that we have with __float128 in that once the keyword is supported, it is assumed everything is supported. GCC and GLIBC run on different cycles (and different people work on it), and so you typically have to have GCC add support before GLIBC can use it. We've discovered that boost and libstdc++ both assume the full library support exists if the __float128 keyword is used. We will eventually need to tackle both of these libraries, but we need the full GLIBC support (or libquadmath) to provide this functionality. Part of the reason for wanting to enable _Float128 without library support or an explicit -mfloat128 is that the types and built-in functions have to be set up at compiler startup, and it makes the second change simpler. The second change is to allow #pragma GCC target and attribute((target(...))) to enable IEEE 128-bit __float128 on a per-module basis (PR 70589). If I don't enable _Float128 unconditionally, I've discovered that I couldn't find a way to allow _Float128 access after compiler startup. In the second change, I changed how we handle __float128 internally. 
Previously, we would create the keyword __float128 if -mfloat128, and we would create the same type with the undocumented __ieee128 keyword (you need a keyword to enable the type). Now, I always create the __ieee128 keyword, and if -mfloat128 (or pragma/attribute) is enabled, I add a define: #define __float128 __ieee128 It was a lot simpler to do it this way, rather than trying to keep track of whether the real keyword was __float128 or __ieee128, and do the appropriate define or undef. The third change is to add cpu names 'power7f', 'power8f', and 'power9f' that enable the appropriate power architecture but also do a -mfloat128. The motivation here is that you cannot use either #pragma GCC target anywhere or the target attribute on the function with a target_clones attribute declaration. The question is do people see the need for -mcpu=power9f? Or should those be dropped? Assuming we are keeping those, is there a name people prefer instead of just using an 'f' suffix. Note, the GCC infrastructure does not allow a hyphen in the name in creating configargs.h for --with-cpu=, etc. At the current time, you cannot use --with-cpu=power8f (it fails in configuring libstdc++, which says that GLIBC supports __float128), but as GLIBC 2.26 becomes more refined, hopefully it will work someday. In writing the tests, I discovered two of the float128 tests could have the target requirements/options tweaked, and I included those changes here. I also discovered we had parallel bits for setting up power7, and I used the ISA 2.06 mask in the power7 definition. So comments on how to proceed? [gcc] 2017-07-20 Michael Meissner PR target/70589 * config.gcc (powerpc*-*-*): Add support for -mcpu=power{7,8,9}f that combines -mcpu=power{7,8,9} with -mfloat128. * config/rs6000/rs6000-cpus.def (OTHER_IEEE_SW_MASKS): New mask of the float128 software/hardware options. (OTHER_IEEE_HW_MASKS): Likewise. (power7 cpu): Replace masks with ISA_2_6_MASKS_SERVER. 
(power7f): New cpu target that combines a particular machine with -mfloat128. (power8f): Likewise. (power9f): Likewise. * config/rs6000/rs6000-tables.opt: Regenerate. * config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Add support for allowing -mfloat128 to be enabled via target pragma or attribute. Always use __ieee128 for the IEEE 128-bit floating point internal type, and if -mfloat128, add a define from __float128 to __ieee128. (rs6000_cpu_cpp_builtins): Likewise. * config/rs6000/rs6000.c (rs6000_init_builtins): Enable _Float128 if -mfloat128-type is enabled (on by default for VSX systems) rather than -mfloat128 (which controls __float128). (rs6000_floatn_mode): Likewise. (rs6000_opt_masks): Allow float128 and float128-hardware target options to be set. * doc/invoke.texi (PowerPC options): Document -mcpu=power{7,8,9}f. [gcc/testsuite] 2017-07-20 Michael Meis
Re: Deprecate DBX/stabs?
On Jul 21, 2017, at 9:03 AM, Jim Wilson wrote: > There is also the matter of SDB_DEBUG which is still supported, and is > no longer used by anyone, as we have already deprecated all of the > COFF targets that were using it. SDB support for C++ is even worse > than the DBX support. This should be no problem to deprecate and > remove. We could perhaps even just drop it without bothering to > deprecate it first. I think that's reasonable. I haven't used adb and sdb in years. :-)
Re: Deprecate DBX/stabs?
On Jul 21, 2017, at 6:07 AM, Nathan Sidwell wrote: > > [darwin, cygwin, rx maintainers, you might have an opinion] darwin going forward is a DWARF platform, so, shouldn't be a heartache for real folks. For ancient machines, ancient compilers might be a requirement. Generally, I like keeping things; but cleanups and removals are a part of life, and eventually the old should be removed. I'd not stand in the way of such removal. If we were within 5 years of such a transition point, I'd argue to keep it for at least 5 years. But, the switch for darwin was Oct 26th, 2007. 10 years I think is a nice cutover point for first tier things. Beyond 10, and I'd say, you are dragging your feet. If _all_ the code for DBX were in my port file, I'd be tempted to keep it indefinitely. It's not, so, that's not possible/reasonable. Iain, do you still have the G5s? :-) Do they run 8 or 9? What do you think? Seem reasonable? The 64_BIT x86 darwin question likely can be flipped to default to dwarf; so, even that isn't an issue.
libgo patch committed: don't use runtime_lock in __go_get_backtrace_state
If getSiginfo in libgo does not know how to determine the PC, it will call runtime_callers. That can happen in a thread that was started by non-Go code, in which case the TLS variable g will not be set, in which case runtime_lock will crash. Avoid the problem by using atomic operations for the lock. This is OK since creating a backtrace state is fast and never blocks. The test case is TestCgoExternalThreadSIGPROF in the runtime package on a system that getSiginfo doesn't handle specially. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 250436) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -a9f1aeced86691de891fbf2a8c97e848faf1962e +b712bacd939466e66972337744983e180849c535 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/go-caller.c === --- libgo/runtime/go-caller.c (revision 250406) +++ libgo/runtime/go-caller.c (working copy) @@ -74,7 +74,7 @@ static void *back_state; /* A lock to control creating back_state. */ -static Lock back_state_lock; +static uint32 back_state_lock; /* The program arguments. */ @@ -85,7 +85,15 @@ extern Slice runtime_get_args(void); struct backtrace_state * __go_get_backtrace_state () { - runtime_lock (&back_state_lock); + uint32 set; + + /* We may not have a g here, so we can't use runtime_lock. */ + set = 0; + while (!__atomic_compare_exchange_n (&back_state_lock, &set, 1, false, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) +{ + runtime_osyield (); + set = 0; +} if (back_state == NULL) { Slice args; @@ -113,7 +121,7 @@ __go_get_backtrace_state () back_state = backtrace_create_state (filename, 1, error_callback, NULL); } - runtime_unlock (&back_state_lock); + __atomic_store_n (&back_state_lock, 0, __ATOMIC_RELEASE); return back_state; }
[PATCH] uninitialized memory access vs BIT_INSERT_EXPR
Hi, Due to the way bit-field access lowering is done, we can get an uninitialized memory load, and this causes the uninitialized warning to kick in. The case we have is: temp_1 = BIT_FIELD_REF temp_2 = BIT_INSERT BIT_FIELD_REF = temp_2 What this patch does is similar to what was done for the case of BIT_INSERT for uninitialized SSA names (default) but for memory accesses. Note also tested with the bit-field lowering pass added; this and the fold-const.c/tree-ssa-sccvn.c patch (which was just committed) are requirements for the lowering pass. OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. Thanks, Andrew Pinski * tree-ssa-uninit.c (warn_uninitialized_vars): Don't warn about memory accesses where the use is for the first operand of a BIT_INSERT. Index: tree-ssa-uninit.c === --- tree-ssa-uninit.c (revision 250430) +++ tree-ssa-uninit.c (working copy) @@ -273,6 +273,11 @@ warn_uninitialized_vars (bool warn_possi && gimple_has_location (stmt)) { tree rhs = gimple_assign_rhs1 (stmt); + tree lhs = gimple_assign_lhs (stmt); + bool has_bit_insert = false; + use_operand_p luse_p; + imm_use_iterator liter; + if (TREE_NO_WARNING (rhs)) continue; @@ -300,6 +305,26 @@ warn_uninitialized_vars (bool warn_possi ref.offset) <= 0))) continue; + /* Do not warn if the access is then used for a BIT_INSERT_EXPR. */ + if (TREE_CODE (lhs) == SSA_NAME) + FOR_EACH_IMM_USE_FAST (luse_p, liter, lhs) + { + gimple *use_stmt = USE_STMT (luse_p); +/* BIT_INSERT_EXPR first operand should not be considered + a use for the purpose of uninit warnings. */ + if (gassign *ass = dyn_cast (use_stmt)) + { + if (gimple_assign_rhs_code (ass) == BIT_INSERT_EXPR + && luse_p->use == gimple_assign_rhs1_ptr (ass)) + { + has_bit_insert = true; + break; + } + } + } + if (has_bit_insert) + continue; + /* Limit the walking to a constant number of stmts after we overcommit quadratic behavior for small functions and O(n) behavior. */
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 07/08 V2
Jeff Law wrote: > Examples please? We should be probing the outgoing args at the probe > interval once the total static frame is greater than 3k. The dynamic > space should be probed by generic code. OK, here are a few simple examples that enable a successful jump of the stack guard despite -fstack-clash-protection: int t1(int x) { char arr[3000]; return arr[x]; } int t2(int x) { char *p = __builtin_alloca (4050); x = t1 (x); return p[x]; } #define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X #define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X) void out1(ARG192(__int128)); int t3(int x) { if (x < 1000) return t1 (x) + 1; out1 (ARG192(0)); return 0; } This currently generates: t1: sub sp, sp, #3008 add x1, sp, 8 ldrb w0, [x1, w0, sxtw] add sp, sp, 3008 ret t2: stp x29, x30, [sp, -16]! mov x1, 4072 add x29, sp, 0 sub sp, sp, #4080 mov x2, sp str xzr, [sp, x1] bl t1 ldrb w0, [x2, w0, sxtw] add sp, x29, 0 ldp x29, x30, [sp], 16 ret t3: stp x29, x30, [sp, -16]! cmp w0, 999 add x29, sp, 0 sub sp, sp, #3008 bgt .L15 bl t1 add w0, w0, 1 .L14: add sp, x29, 0 ldp x29, x30, [sp], 16 ret As you can see, t2 allocates 4080 bytes on the stack but probes only the top, leaving 4072 bytes unprobed. When it calls t1, it drops the stack by another 3008 bytes without a probe (as it is allowed up to 3KB), so now we've got a distance of almost 7KB between 2 probes... Similarly functions with large outgoing arguments are not correctly probed. t3 creates 3008 bytes of outgoing area without a probe and then calls t1 which will decrement the stack by another 3008 bytes without a probe. Both t2 and t3 must probe SP+1024 to ensure the callee can adjust SP by up to 3KB before emitting a probe. > What we should be doing, per your request is emit an initial probe if we > know the function is going to require probing of any form. Then we emit > probes at 4k intervals. At least that's how I understood your > simplification. 
So for a 7k stack that's two probes -- one at *sp at > the start of the prologue then the second after the first 4k is allocated. There is no benefit in doing an initial probe that way. We don't have to probe the first 3KB even if the function has a larger stack. So if we have a 6KB stack, we can drop the stack by 3KB, probe, drop it by another 3KB, then push the callee-saves. If the stack is larger than 7KB we probe at 7KB, then 11KB etc. > > I don't understand what last_probe_offset is for, it should always be zero > > after saving the callee-saves (though currently it is not set correctly). > > And it > > has to be set to a fixed value to limit the maximum outgoing args when doing > > final_adjust. > It's supposed to catch the case where neither the initial adjustment nor > final adjustment in and of themselves require probes, but the > combination would. That's not possible - the 2 cases are completely independent given that the callee-saves always provide an implicit probe (you can't have outgoing arguments if there is no call, and you can't have a call without callee-saving LR). > My understanding was that you didn't want that level of tracking around > the callee saves. It's also not clear if that will work in the presence > of separate shrink wrapping. Shrinkwrapping will be safe once I ensure LR is at the bottom of the callee-saves. With this fix we know for sure that if there is a call, we've saved LR at SP+0 before adjusting SP by final_adjust. This means we don't need to try to figure out where the last probe was - it's always SP+0 if final_adjust != 0, simplifying things. > I have no idea what you mean by setting to a fixed value for limit the > maximum outgoing args. We don't limit the maximum outgoing args -- we > probe that space just like any other as we cross 4k boundaries. No that's not correct - remember we want to reserve 1024 bytes for the outgoing area. 
So we must probe if the outgoing area is > 1024, not 4KB (and we need to probe again at 5KB, 9KB etc). > We're clearly not communicating well. Example would probably help. My t3 example above shows how you can easily jump the stack guard using lots of outgoing args (but only if you call a different function with no outgoing args first!). Wilco
[C++ PATCH] no special cdtor slots
ctors and dtors had special slots in the METHOD_VEC, because they used not to have proper names. Now that they have unique identifiers, we can treat them just as any other function and look them up by [cd]tor_identifier. (the eventual goal is to replace separate METHOD_VEC and FIELD_VEC with a single map, so don't worry about this making looking for cdtors slower than an array access right now) Applied to trunk. nathan -- Nathan Sidwell 2017-07-21 Nathan Sidwell Remove special CDtor METHOD_VEC slots. * cp-tree.h (CLASSTYPE_CONSTRUCTOR_SLOT, CLASSTYPE_DESTRUCTOR_SLOT): Delete. (CLASSTYPE_CONSTRUCTORS): Use lookup_fnfields_slot_nolazy. (CLASSTYPE_DESTRUCTOR): Likewise. * class.c (add_method): Don't use special cdtor slots. * search.c (lookup_fnfields_idx_nolazy): Likewise. (look_for_overrides_here): Use lookup_fnfields_slot. * semantics.c (classtype_has_nothrow_assign_or_copy_p): Likewise. Index: class.c === --- class.c (revision 250426) +++ class.c (working copy) @@ -1039,50 +1039,39 @@ add_method (tree type, tree method, bool we're going to end up with an assignment operator at some point as well. */ vec_alloc (method_vec, 8); - /* Create slots for constructors and destructors. */ - method_vec->quick_push (NULL_TREE); - method_vec->quick_push (NULL_TREE); CLASSTYPE_METHOD_VEC (type) = method_vec; } /* Maintain TYPE_HAS_USER_CONSTRUCTOR, etc. */ grok_special_member_properties (method); - /* Constructors and destructors go in special slots. */ - if (DECL_MAYBE_IN_CHARGE_CONSTRUCTOR_P (method)) -slot = CLASSTYPE_CONSTRUCTOR_SLOT; - else if (DECL_MAYBE_IN_CHARGE_DESTRUCTOR_P (method)) -slot = CLASSTYPE_DESTRUCTOR_SLOT; - else -{ - tree m; + tree m; - insert_p = true; - /* See if we already have an entry with this name. */ - for (slot = CLASSTYPE_FIRST_CONVERSION_SLOT; - vec_safe_iterate (method_vec, slot, &m); - ++slot) + insert_p = true; + /* See if we already have an entry with this name. 
*/ + for (slot = CLASSTYPE_FIRST_CONVERSION_SLOT; + vec_safe_iterate (method_vec, slot, &m); + ++slot) +{ + m = OVL_FIRST (m); + if (template_conv_p) { - m = OVL_FIRST (m); - if (template_conv_p) - { - if (TREE_CODE (m) == TEMPLATE_DECL - && DECL_TEMPLATE_CONV_FN_P (m)) - insert_p = false; - break; - } - if (conv_p && !DECL_CONV_FN_P (m)) - break; - if (DECL_NAME (m) == DECL_NAME (method)) - { - insert_p = false; - break; - } - if (complete_p - && !DECL_CONV_FN_P (m) - && DECL_NAME (m) > DECL_NAME (method)) - break; + if (TREE_CODE (m) == TEMPLATE_DECL + && DECL_TEMPLATE_CONV_FN_P (m)) + insert_p = false; + break; + } + if (conv_p && !DECL_CONV_FN_P (m)) + break; + if (DECL_NAME (m) == DECL_NAME (method)) + { + insert_p = false; + break; } + if (complete_p + && !DECL_CONV_FN_P (m) + && DECL_NAME (m) > DECL_NAME (method)) + break; } current_fns = insert_p ? NULL_TREE : (*method_vec)[slot]; @@ -1256,7 +1245,7 @@ add_method (tree type, tree method, bool if (conv_p) TYPE_HAS_CONVERSION (type) = 1; - else if (slot >= CLASSTYPE_FIRST_CONVERSION_SLOT && !complete_p) + else if (!complete_p && !IDENTIFIER_CDTOR_P (DECL_NAME (method))) push_class_level_binding (DECL_NAME (method), current_fns); if (insert_p) Index: cp-tree.h === --- cp-tree.h (revision 250426) +++ cp-tree.h (working copy) @@ -2148,29 +2148,21 @@ struct GTY(()) lang_type { and the RECORD_TYPE for the class template otherwise. */ #define CLASSTYPE_DECL_LIST(NODE) (LANG_TYPE_CLASS_CHECK (NODE)->decl_list) -/* The slot in the CLASSTYPE_METHOD_VEC where constructors go. */ -#define CLASSTYPE_CONSTRUCTOR_SLOT 0 - -/* The slot in the CLASSTYPE_METHOD_VEC where destructors go. */ -#define CLASSTYPE_DESTRUCTOR_SLOT 1 - /* The first slot in the CLASSTYPE_METHOD_VEC where conversion operators can appear. */ -#define CLASSTYPE_FIRST_CONVERSION_SLOT 2 +#define CLASSTYPE_FIRST_CONVERSION_SLOT 0 /* A FUNCTION_DECL or OVERLOAD for the constructors for NODE. These are the constructors that take an in-charge parameter. 
*/ #define CLASSTYPE_CONSTRUCTORS(NODE) \ - ((*CLASSTYPE_METHOD_VEC (NODE))[CLASSTYPE_CONSTRUCTOR_SLOT]) + (lookup_fnfields_slot_nolazy (NODE, ctor_identifier)) /* A FUNCTION_DECL for the destructor for NODE. This is the destructors that take an in-charge parameter. If CLASSTYPE_LAZY_DESTRUCTOR is true, then this entry will be NULL until the destructor is created with lazily_declare_fn. */ #define CLASSTYPE_DESTRUCTOR(NODE) \ - (CLASSTYPE_METHOD_VEC (NODE) \ - ? (*CLASSTYPE_METHOD_VEC (NODE))[CLASSTYPE_DESTRUCTOR_SLOT] \ - : NULL_TREE) + (lookup_fnfields_slot_nolazy (NODE, dtor_ident
libgo patch committed: Recognize PPC PC in signal handler
This patch to libgo changes the signal handler for PPC (and PPC64) GNU/Linux to fetch the PC from the signal context. Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 250433) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -e34cb8dee6c1f215329e0eea79202b48cb83817c +a9f1aeced86691de891fbf2a8c97e848faf1962e The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/go-signal.c === --- libgo/runtime/go-signal.c (revision 250406) +++ libgo/runtime/go-signal.c (working copy) @@ -215,6 +215,11 @@ getSiginfo(siginfo_t *info, void *contex ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[REG_EIP]; #endif #endif +#ifdef __PPC__ + #ifdef __linux__ + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip; + #endif +#endif if (ret.sigpc == 0) { // Skip getSiginfo/sighandler/sigtrampgo/sigtramp/handler.
Re: [PATCH,AIX] Enable XCOFF in libbacktrace on AIX
On Mon, May 15, 2017 at 7:24 AM, REIX, Tony wrote: > Description: > * This patch enables libbacktrace to handle XCOFF on AIX. > > Tests: > * Fedora25/x86_64 + GCC v7.1.0 : Configure/Build: SUCCESS >- build made by means of a .spec file based on Fedora gcc-7.0.1-0.12 .spec > file > ../configure --enable-bootstrap > --enable-languages=c,c++,objc,obj-c++,fortran,go,lto --prefix=/usr > --mandir=/usr/share/man --infodir=/usr/share/info > --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared > --enable-threads=posix --enable-checking=release --enable-multilib > --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions > --enable-gnu-unique-object --enable-linker-build-id > --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin > --enable-initfini-array --with-isl --enable-libmpx > --enable-offload-targets=nvptx-none --without-cuda-driver > --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 > --build=x86_64-redhat-linux > > ChangeLog: > * libbacktrace/Makefile.am : Add xcoff.c > * libbacktrace/Makefile.in : Regenerated > * libbacktrace/configure.ac : Add XCOFF output file type > * libbacktrace/configure : Regenerated > * libbacktrace/fileline.c : Handle AIX procfs tree > * libbacktrace/filetype.awk : Add AIX XCOFF type detection > * libbacktrace/xcoff.c : New file for handling XCOFF format Thanks. Committed with some minor changes, as follows. Ian 2017-07-21 Tony Reix * filetype.awk: Add AIX XCOFF type detection. * configure.ac: Recognize xcoff format. * Makefile.am (FORMAT_FILES): Add xcoff.c. * fileline.c: Include . (fileline_initialize): Add case for AIX procfs. * xcoff.c: New file. * configure, Makefile.in: Rebuild. 
Index: Makefile.am === --- Makefile.am (revision 250406) +++ Makefile.am (working copy) @@ -57,7 +57,8 @@ BACKTRACE_FILES = \ FORMAT_FILES = \ elf.c \ pecoff.c \ - unknown.c + unknown.c \ + xcoff.c VIEW_FILES = \ read.c \ @@ -155,3 +156,5 @@ sort.lo: config.h backtrace.h internal.h stest.lo: config.h backtrace.h internal.h state.lo: config.h backtrace.h backtrace-supported.h internal.h unknown.lo: config.h backtrace.h internal.h +xcoff.lo: config.h backtrace.h internal.h + Index: configure.ac === --- configure.ac(revision 250406) +++ configure.ac(working copy) @@ -233,6 +233,9 @@ elf*) FORMAT_FILE="elf.lo" ;; pecoff) FORMAT_FILE="pecoff.lo" backtrace_supports_data=no ;; +xcoff) FORMAT_FILE="xcoff.lo" + backtrace_supports_data=no + ;; *) AC_MSG_WARN([could not determine output file type]) FORMAT_FILE="unknown.lo" backtrace_supported=no Index: fileline.c === --- fileline.c (revision 250406) +++ fileline.c (working copy) @@ -37,6 +37,7 @@ POSSIBILITY OF SUCH DAMAGE. */ #include #include #include +#include #include "backtrace.h" #include "internal.h" @@ -57,6 +58,7 @@ fileline_initialize (struct backtrace_st int pass; int called_error_callback; int descriptor; + char buf[64]; if (!state->threaded) failed = state->fileline_initialization_failed; @@ -80,7 +82,7 @@ fileline_initialize (struct backtrace_st descriptor = -1; called_error_callback = 0; - for (pass = 0; pass < 4; ++pass) + for (pass = 0; pass < 5; ++pass) { const char *filename; int does_not_exist; @@ -99,6 +101,10 @@ fileline_initialize (struct backtrace_st case 3: filename = "/proc/curproc/file"; break; + case 4: + snprintf (buf, sizeof (buf), "/proc/%d/object/a.out", getpid ()); + filename = buf; + break; default: abort (); } Index: filetype.awk === --- filetype.awk(revision 250406) +++ filetype.awk(working copy) @@ -3,3 +3,6 @@ /\177ELF\002/ { if (NR == 1) { print "elf64"; exit } } /\114\001/{ if (NR == 1) { print "pecoff"; exit } } /\144\206/{ if (NR == 1) { print "pecoff"; exit } } +/\001\337/{ if (NR == 
1) { print "xcoff"; exit } } /\001\367/{ if (NR == 1) { print "xcoff"; exit } } + Index: xcoff.c === --- xcoff.c (revision 0) +++ xcoff.c (working copy) @@ -0,0 +1,76 @@ +/* xcoff.c -- Get debug data from an XCOFF file for backtraces. + Copyright (C) 2017 Free Software Foundation, Inc. + +Redistribution and use in source and binary forms, with or without +modification, are permitted provided that the following conditions are +met: + +(1) Redistributions of source code must retain the above copyright +notice, this list of conditions and the following disclaimer. + +(2) Redistributions in binary form must reproduce
[C++ PATCH] small add_candidates cleanup
I wandered into add_candidates and got confused by !!. There were a few opportunities for other cleanup too, so committed the attached. nathan -- Nathan Sidwell 2017-07-21 Nathan Sidwell * call.c (add_candidates): Move decls to initialization. Don't use !!. Index: call.c === --- call.c (revision 250426) +++ call.c (working copy) @@ -5423,8 +5423,8 @@ add_candidates (tree fns, tree first_arg { tree ctype; const vec *non_static_args; - bool check_list_ctor; - bool check_converting; + bool check_list_ctor = false; + bool check_converting = false; unification_kind_t strict; if (!fns) @@ -5435,7 +5435,7 @@ add_candidates (tree fns, tree first_arg if (DECL_CONV_FN_P (fn)) { check_list_ctor = false; - check_converting = !!(flags & LOOKUP_ONLYCONVERTING); + check_converting = (flags & LOOKUP_ONLYCONVERTING) != 0; if (flags & LOOKUP_NO_CONVERSION) /* We're doing return_type(x). */ strict = DEDUCE_CONV; @@ -5452,18 +5452,13 @@ add_candidates (tree fns, tree first_arg { if (DECL_CONSTRUCTOR_P (fn)) { - check_list_ctor = !!(flags & LOOKUP_LIST_ONLY); + check_list_ctor = (flags & LOOKUP_LIST_ONLY) != 0; /* For list-initialization we consider explicit constructors and complain if one is chosen. */ check_converting = ((flags & (LOOKUP_ONLYCONVERTING|LOOKUP_LIST_INIT_CTOR)) == LOOKUP_ONLYCONVERTING); } - else - { - check_list_ctor = false; - check_converting = false; - } strict = DEDUCE_CALL; ctype = conversion_path ? BINFO_TYPE (conversion_path) : NULL_TREE; } @@ -5476,9 +5471,6 @@ add_candidates (tree fns, tree first_arg for (lkp_iterator iter (fns); iter; ++iter) { - tree fn_first_arg; - const vec *fn_args; - fn = *iter; if (check_converting && DECL_NONCONVERTING_P (fn)) @@ -5486,10 +5478,13 @@ add_candidates (tree fns, tree first_arg if (check_list_ctor && !is_list_ctor (fn)) continue; - /* Figure out which set of arguments to use. 
*/ + tree fn_first_arg = NULL_TREE; + const vec *fn_args = args; + if (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn)) { - /* If this function is a non-static member and we didn't get an + /* Figure out where the object arg comes from. If this + function is a non-static member and we didn't get an implicit object argument, move it out of args. */ if (first_arg == NULL_TREE) { @@ -5506,12 +5501,6 @@ add_candidates (tree fns, tree first_arg fn_first_arg = first_arg; fn_args = non_static_args; } - else - { - /* Otherwise, just use the list of arguments provided. */ - fn_first_arg = NULL_TREE; - fn_args = args; - } if (TREE_CODE (fn) == TEMPLATE_DECL) add_template_candidate (candidates,
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On 21 Jul, David Malcolm wrote: > On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote: >> Hi, >> >> the following patch introduces a new C++ warning option >> -Wduplicated-access-specifiers that warns about redundant >> access-specifiers in classes, e.g. >> >> class B >> { >> public: >> B(); >> >> private: >> void foo(); >> private: >> int i; >> }; >> >> test.cc:8:5: warning: duplicate 'private' access-specifier [ >> -Wduplicated-access-specifiers] >> private: >> ^~~ >> --- >> test.cc:6:5: note: access-specifier was previously given here >> private: >> ^~~ > > Thanks for working on this. > > I'm not able to formally review, but you did CC me; various comments below > throughout. > >> The test is implemented by tracking the location of the last >> access-specifier together with the access-specifier itself. >> The location serves two purposes: the obvious one is to be able to >> print the location in the note that comes with the warning. >> The second purpose is to be able to distinguish an access-specifier >> given by the user from the default of the class type (i.e. public for >> 'struct' and private for 'class') where the location is set to >> UNKNOWN_LOCATION. The warning is only issued if the user gives the >> access-specifier twice, i.e. if the previous location is not >> UNKNOWN_LOCATION. > > Presumably given > > struct foo > { > public: > /* ... * > }; > > we could issue something like: > > warning: access-specifier 'public' is redundant within 'struct' > > for the first; similarly for: > > class bar > { > private: > }; > > we could issue: > > warning: access-specifier 'private' is redundant within 'class' > > >> One could actually make this a two-level warning so that on the >> higher level also the default class-type settings are taken into >> account. Would that be helpful? I could prepare a second patch for >> that. > > I'm not sure how best to structure it. 
> > FWIW, when I first saw the patch, I wasn't a big fan of the warning, as I > like to use access-specifiers to break up the kinds of entities within a > class. > > For example, our coding guidelines > https://gcc.gnu.org/codingconventions.html#Class_Form > recommend: > > "first define all public types, > then define all non-public types, > then declare all public constructors, > ... > then declare all non-public member functions, and > then declare all non-public member variables." > > I find it's useful to put a redundant "private:" between the last two, > e.g.: > > class baz > { > public: > ... > > private: > void some_private_member (); > > private: > int m_some_field; > }; > > to provide a subdivision between the private member functions and the > private data fields. That can also be seen in our libstdc++ to some extent. The other half of the warnings indicate redundant access-specifiers. It's up to the user to keep those duplicate access-specifiers as subdividers, or to use something else (like comments) instead and switch on the warning for her/his code. Because the subdivider usage seems to be relatively common, I don't want to enable the warning by -Wall or -Wextra. > This might be a violation of our guidelines (though if so, I'm not sure > it's explicitly stated), but I find it useful, and the patch would warn > about it. > > Having said that, looking at e.g. the "jit" subdir, I see lots of > places where the warning would fire. In some of these, the code has a > bit of a "smell", so maybe I like the warning after all... in that it > can be good for a new warning to draw attention to code that might need > work. > > Sorry that this is rambling; comments on the patch inline below. > > Bootstrapped and regtested on x86_64-pc-linux-gnu. >> OK for trunk? >> >> Btw, there are about 50 places in our libstdc++ headers where this >> warning triggers. I'll come up with a patch for this later.
>> >> Regards, >> Volker >> >> >> 2017-07-20 Volker Reichelt >> >> * doc/invoke.texi (-Wduplicated-access-specifiers): Document >> new >> warning option. >> >> Index: gcc/doc/invoke.texi >> === >> --- gcc/doc/invoke.texi (revision 250356) >> +++ gcc/doc/invoke.texi (working copy) >> @@ -275,7 +275,7 @@ >> -Wdisabled-optimization @gol >> -Wno-discarded-qualifiers -Wno-discarded-array-qualifiers @gol >> -Wno-div-by-zero -Wdouble-promotion @gol >> --Wduplicated-branches -Wduplicated-cond @gol >> +-Wduplicated-access-specifiers -Wduplicated-branches -Wduplicated >> -cond @gol >> -Wempty-body -Wenum-compare -Wno-endif-labels -Wexpansion-to >> -defined @gol >> -Werror -Werror=* -Wextra-semi -Wfatal-errors @gol >> -Wfloat-equal -Wformat -Wformat=2 @gol >> @@ -5388,6 +5388,12 @@ >> >> This warning is enabled by @option{-Wall}. >> >> +@item -Wduplicated-access-specifiers >> +@opindex Wno-duplicated-access-sp
libgo patch committed: Use a larger stack size in CgoCallbackGC test
This libgo patch tweaks the CgoCallbackGC test to allocate more stack space. This makes it more likely to succeed on a system that does not support split stacks. This test is actually not very meaningful for gccgo at present, but it doesn't hurt to keep running it. Bootstrapped and ran Go tests on x86_64-pc-linux-gnu and powerpc64le-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 250406) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -6572f7e35f962bdb8a7c174920dbb70350b96874 +e34cb8dee6c1f215329e0eea79202b48cb83817c The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/go/runtime/testdata/testprogcgo/callback.go === --- libgo/go/runtime/testdata/testprogcgo/callback.go (revision 250406) +++ libgo/go/runtime/testdata/testprogcgo/callback.go (working copy) @@ -23,7 +23,9 @@ static void foo() { pthread_t th; pthread_attr_t attr; pthread_attr_init(&attr); -pthread_attr_setstacksize(&attr, 256 << 10); +// For gccgo use a stack size large enough for all the callbacks, +// in case we are on a platform that does not support -fsplit-stack. +pthread_attr_setstacksize(&attr, 512 << 10); pthread_create(&th, &attr, thr, 0); pthread_join(th, 0); }
[PING][PATCH][aarch64] Enable ifunc resolver attribute by default
This is a ping for my patch to enable the ifunc resolver by default for aarch64. https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00806.html Steve Ellcey sell...@cavium.com
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On 07/20/2017 10:35 AM, Volker Reichelt wrote: Hi, the following patch introduces a new C++ warning option -Wduplicated-access-specifiers that warns about redundant access-specifiers in classes, e.g. class B { public: B(); private: void foo(); private: int i; }; I'm very fond of warnings that help find real bugs, or even that provide an opportunity to review code for potential problems or inefficiencies and suggest a possibility to improve it in some way (make it clearer, or easier for humans or compilers to find real bugs in, or faster, etc.), even if the code isn't strictly incorrect. In this case I'm having trouble coming up with an example where the warning would have this effect. What do you have in mind? (Duplicate access specifiers tend to crop up in large classes and/or as a result of preprocessor conditionals.) Btw., there are examples of poor coding practices where I can imagine a warning focused on access specifiers being helpful. For instance, a class whose member functions maintain non-trivial invariants among its data members should declare the data private to prevent clients from inadvertently breaking those invariants. A specific example might be a container class (like string or vector) that owns the block of memory it allocates. A warning that detected when such members are publicly accessible could help improve encapsulation. I suspect this would be challenging to implement and would likely require some non-trivial analysis in the middle end. Another example is a class that defines an inaccessible member that isn't used (e.g., class A { int i; }; that Clang detects with -Wunused-private-field; or class A { void f () { } };). I would expect this to be doable in the front end. Martin
Re: [PATCH, committed] Fix PR81162
Hi Richard, I would like to backport this fix to GCC 5, 6, and 7. All have passed regstrap on powerpc64le-linux-gnu (with the test case moved to gcc.dg/ubsan, of course). Is this ok? Thanks! -- Bill > On Jul 14, 2017, at 1:05 PM, Bill Schmidt wrote: > > Hi, > > PR81162 identifies a bug in SLSR involving overflow that occurs when > replacing a NEGATE_EXPR with a PLUS_EXPR. This is another example > of an unprofitable transformation that should be skipped anyway, > hence this simple patch. Bootstrapped and tested on > powerpc64le-unknown-linux-gnu, committed. Test case provided from > the bug report. > > Thanks, > Bill > > > [gcc] > > 2016-07-14 Bill Schmidt > > PR tree-optimization/81162 > * gimple-ssa-strength-reduction.c (replace_mult_candidate): Don't > replace a negate with an add. > > [gcc/testsuite] > > 2016-07-14 Bill Schmidt > > PR tree-optimization/81162 > * gcc.dg/pr81162.c: New file. > > > Index: gcc/gimple-ssa-strength-reduction.c > === > --- gcc/gimple-ssa-strength-reduction.c (revision 250189) > +++ gcc/gimple-ssa-strength-reduction.c (working copy) > @@ -2082,13 +2082,14 @@ replace_mult_candidate (slsr_cand_t c, tree basis_ > types but allows for safe negation without twisted logic. */ > if (wi::fits_shwi_p (bump) > && bump.to_shwi () != HOST_WIDE_INT_MIN > - /* It is not useful to replace casts, copies, or adds of > + /* It is not useful to replace casts, copies, negates, or adds of >an SSA name and a constant. 
*/ > && cand_code != SSA_NAME > && !CONVERT_EXPR_CODE_P (cand_code) > && cand_code != PLUS_EXPR > && cand_code != POINTER_PLUS_EXPR > - && cand_code != MINUS_EXPR) > + && cand_code != MINUS_EXPR > + && cand_code != NEGATE_EXPR) > { > enum tree_code code = PLUS_EXPR; > tree bump_tree; > Index: gcc/testsuite/gcc.dg/pr81162.c > === > --- gcc/testsuite/gcc.dg/pr81162.c(nonexistent) > +++ gcc/testsuite/gcc.dg/pr81162.c(working copy) > @@ -0,0 +1,17 @@ > +/* PR tree-optimization/81162 */ > +/* { dg-do run } */ > +/* { dg-options "-fsanitize=undefined -O2" } */ > + > +short s; > +int i1 = 1; > +int i2 = 1; > +unsigned char uc = 147; > + > +int main() { > + s = (-uc + 2147483647) << 0; > + if (9031239389974324562ULL >= (-((i1 && i2) + uc) ^ -21096) ) { > +return 0; > + } else { > +return -1; > + } > +} >
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 V2
On 07/20/2017 07:23 AM, Segher Boessenkool wrote: > Hi Jeff, > > On Tue, Jul 18, 2017 at 11:17:19PM -0600, Jeff Law wrote: >> >> The biggest change in this update to patch 01/08 is moving of stack >> clash protection out of -fstack-check= and into its own option, >> -fstack-clash-protection. I believe other issues raised by reviewers >> have been addressed as well. > > I think the documentation for the new option should say this only > provides partial protection on targets that do not have fuller protection > enabled. Is there some way for users to find out which targets those > are / if their target is one of those? Agreed. I've added a note in the documentation. Sadly, there's no way short of listing them and keeping that list up-to-date over time by hand. If we want to do that, I would suggest we note the processors with full support as well as those with partial support using the -fstack-check=specific prologues. I think we ought to provide that target specific information in the documentation, though I worry it will get out of date. Thoughts? jeff
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On 21 Jul, David Malcolm wrote: > On Fri, 2017-07-21 at 11:56 -0400, David Malcolm wrote: >> On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote: >> > Hi, >> > >> > the following patch introduces a new C++ warning option >> > -Wduplicated-access-specifiers that warns about redundant >> > access-specifiers in classes, e.g. >> > >> > class B >> > { >> > public: >> > B(); >> > >> > private: >> > void foo(); >> > private: >> > int i; >> > }; >> > >> > test.cc:8:5: warning: duplicate 'private' access-specifier [ >> > -Wduplicated-access-specifiers] >> > private: >> > ^~~ >> > --- >> > test.cc:6:5: note: access-specifier was previously given here >> > private: >> > ^~~ >> >> Thanks for working on this. >> >> I'm not able to formally review, but you did CC me; various comments >> below throughout. >> >> > The test is implemented by tracking the location of the last >> > access-specifier together with the access-specifier itself. >> > The location serves two purposes: the obvious one is to be able to >> > print the location in the note that comes with the warning. >> > The second purpose is to be able to distinguish an access-specifier >> > given by the user from the default of the class type (i.e. public >> > for >> > 'struct' and private for 'class') where the location is set to >> > UNKNOWN_LOCATION. The warning is only issued if the user gives the >> > access-specifier twice, i.e. if the previous location is not >> > UNKNOWN_LOCATION. >> >> Presumably given >> >> struct foo >> { >> public: >> /* ... * >> }; >> >> we could issue something like: >> >> warning: access-specifier 'public' is redundant within 'struct' >> >> for the first; similarly for: >> >> class bar >> { >> private: >> }; >> >> we could issue: >> >> warning: access-specifier 'private' is redundant within 'class' >> >> >> > One could actually make this a two-level warning so that on the >> > higher level also the default class-type settings are taken into >> > account. Would that be helpful? 
I could prepare a second patch for >> > that. >> >> I'm not sure how best to structure it. > > Maybe combine the two ideas, and call it > -Wredundant-access-specifiers > ? > > Or just "-Waccess-specifiers" ? > > [...snip...] > > Dave Thanks for having a look at this! My favorite version would be to use '-Waccess-specifiers=1' for the original warning and '-Waccess-specifiers=2' for the stricter version that also checks against the class-type default. We could then let '-Waccess-specifiers' default to either of those two. I'm afraid that level 2 will fire way too often for legacy code (and there might be coding conventions that require an access specifier at the beginning of a class/struct). So I'd vote for level 1 as default. The last argument is also the reason why I'd like to see a two-level warning instead of just different error messages (although these are very helpful for seeing what's really going on with your code). What do you think? Regards, Volker
Re: [RFC/SCCVN] Handle BIT_INSERT_EXPR in vn_nary_op_eq
On Wed, Jul 19, 2017 at 11:13 AM, Richard Biener wrote: > On July 19, 2017 6:10:28 PM GMT+02:00, Andrew Pinski > wrote: >>On Mon, Jul 17, 2017 at 3:02 AM, Richard Biener >> wrote: >>> On Thu, Jul 13, 2017 at 6:18 AM, Andrew Pinski >>wrote: On Wed, Jul 12, 2017 at 9:10 PM, Marc Glisse >>wrote: > On Wed, 12 Jul 2017, Andrew Pinski wrote: > >> Hi, >> Unlike most other expressions, BIT_INSERT_EXPR has an implicit >> operand of the precision/size of the second operand. This means >>if we >> have an integer constant for the second operand and that compares >>to >> the same constant value, vn_nary_op_eq would return that these two >> expressions are the same. But in the case I was looking into the >> integer constants had different types, one with 1 bit precision >>and >> the other with 2 bit precision which means the BIT_INSERT_EXPR >>were >> not equal at all. >> >> This patches the problem by checking to see if BIT_INSERT_EXPR's >> operand 1's (second operand) type has different precision to >>return >> false. >> >> Is this the correct location or should we be checking for this >> differently? If this is the correct location, is the patch ok? >> Bootstrapped and tested on aarch64-linux-gnu with no regressions >>(and >> also tested with a few extra patches to expose BIT_INSERT_EXPR). >> >> Thanks, >> Andrew Pinski >> >> ChangeLog: >> * tree-ssa-sccvn.c (vn_nary_op_eq): Check BIT_INSERT_EXPR's >>operand 1 >> to see if the types precision matches. > > > Hello, > > since BIT_INSERT_EXPR is implicitly an expression with 4 arguments, >>it makes > sense that we may need a few such special cases. But shouldn't the >>hash > function be in sync with the equality comparator? Does >>operand_equal_p need > the same? The hash function does not need to be exactly the same. The only requirement there is if vn_nary_op_eq returns true then the hash has to be the same. Now we could improve the hash by using the >>precision which will allow us not to compare as much in some cases. 
Yes operand_equal_p needs the same handling; I did not notice that until you mention it.. Right now it does: case BIT_INSERT_EXPR: return OP_SAME (0) && OP_SAME (1) && OP_SAME (2); >>> >>> Aww. The issue is that operand_equal_p treats INTEGER_CSTs of >>different >>> type/precision but the same value as equal. >>> >>> Revisiting that, while a good idea, shouldn't block a fix here. So >>... >>> >>> Index: tree-ssa-sccvn.c >>> === >>> --- tree-ssa-sccvn.c(revision 250159) >>> +++ tree-ssa-sccvn.c(working copy) >>> @@ -2636,6 +2636,14 @@ vn_nary_op_eq (const_vn_nary_op_t const >>> if (!expressions_equal_p (vno1->op[i], vno2->op[i])) >>>return false; >>> >>> + /* BIT_INSERT_EXPR has an implict operand as the type precision >>> + of op1. Need to check to make sure they are the same. */ >>> + if (vno1->opcode == BIT_INSERT_EXPR) >>> +if (INTEGRAL_TYPE_P (TREE_TYPE (vno1->op[0])) >>> + && TYPE_PRECISION (TREE_TYPE (vno1->op[1])) >>> + != TYPE_PRECISION (TREE_TYPE (vno2->op[1]))) >>> + return false; >>> + >>> >>> the case can be restricted to INTEGER_CST vno1->op[0] I think: >>> >>> if (vno1->opcode == BIT_INSERT_EXPR >>> && TREE_CODE (vno1->op[0]) == INTEGER_CST >>> && TYPE_PRECISION ( >>> >>> and yes, operand_equal_p needs a similar fix. Can you re-post with >>that added? >> >>Here is that with the changes you requested too. >> >>> Do you have a testcase? >> >>I don't have one which fails with the trunk. With lowering of >>bit-fields accesses (which I hope to submit soon; just getting in the >>required patches first), many testcases fail (bootstrap fails for the >>same reason too). >> >>OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. > > In the fold-const.c hunk you need to verify arg1 op0 is an INTEGER_CST. This > is not necessary in sccvn because we passed operand_equal_p first. > > OK with that change. This is what I applied in the end. It is not op0 that needs to be changed here but rather op1; I had a typo in the fold-const.c check. 
Thanks, Andrew > > Richard. > >>Thanks, >>Andrew >> >>ChangeLog: >>* tree-ssa-sccvn.c (vn_nary_op_eq): Check BIT_INSERT_EXPR's operand 1 >>to see if the types precision matches. >>* fold-const.c (operand_equal_p): Likewise, >> >> >>> >>> Thanks, >>> Richard. >>> Thanks, Andrew Pinski > > -- > Marc Glisse > Index: fold-const.c === --- fold-const.c(revision 250430) +++ fold-const.c(working copy) @@ -3184,9 +3184,18 @@ operand_equal_p (const_tree arg0, const_ flags &= ~OEP_ADDRESS_OF; return OP_SAME (0); + ca
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 08/08 V2
On 07/21/2017 07:23 AM, Andreas Krebbel wrote: > Hi, > > I've used your patch as the base and applied my changes on top. The > attached patch is the result, so it is supposed to replace your > version. It now also supports emitting a runtime loop. Thanks a ton! I'll roll your changes into the V3 patch. > > It bootstraps fine but unfortunately I see an Ada regression which I > haven't tracked down yet. > >> FAIL: cb1010a >> FAIL: gnat.dg/stack_check1.adb execution test >> FAIL: gnat.dg/stack_check1.adb execution test FWIW, due to time constraints I haven't been testing Ada. An educated guess is that this is related to the #define STACK_CHECK_STATIC_BUILTIN which was never right for s390, but was useful temporarily. I bet pulling that out should fix the Ada regression. One of the design goals of this work is we're not supposed to affect Ada code generation at all. There's one slight tweak I think we'll want to make as part of the V3 patch I'm about ready to post. Specifically I have introduced PARAMS to control the size of the guard and the size of the probe interval. The size of the guard affects if we're going to need static probes. For s390 I think that we just replace PROBE_INTERVAL with the PARAM_VALUE for the guard size when we initialize last_probe_offset. The remainder of PROBE_INTERVAL uses turn into the PARAM_VALUE for the probe interval size. I can cobble those together easily I think. Thanks again! Jeff
std::list optimizations
Hi Here is a proposal for 2 optimizations in the std::list implementation. Optimization on the move constructor taking an allocator for always equal allocators. Compared to the version in my previous std::list patch, I am now doing it at the std::list level rather than at the _List_base level. This way we won't instantiate the insert call and we won't check for an empty list when the allocators always compare equal. 2nd optimization: I replace the _S_distance method by the std::distance algo, which benefits from the nice [begin(), end()) range optimization when the cxx11 ABI is being used. Note that I am proposing the 2 changes in 1 patch to save some review time but I can commit those separately. Tested under x86_64 Linux normal mode. * include/bits/stl_list.h (_List_base<>::_S_distance): Remove. (_List_impl(_List_impl&&, _Node_alloc_type&&)): New. (_List_base(_List_base&&, _Node_alloc_type&&)): Use latter. (_List_base(_Node_alloc_type&&)): New. (_List_base<>::_M_distance, _List_base<>::_M_node_count): Move... (list<>::_M_distance, list<>::_M_node_count): ...here. Replace calls to _S_distance with calls to std::distance. (list(list&&, const allocator_type&, true_type)): New. (list(list&&, const allocator_type&, false_type)): New. (list(list&&, const allocator_type&)): Adapt to call the latter. Ok to commit ?
François diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h index cef94f7..aaff500 100644 --- a/libstdc++-v3/include/bits/stl_list.h +++ b/libstdc++-v3/include/bits/stl_list.h @@ -364,19 +364,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 rebind<_List_node<_Tp> >::other _Node_alloc_type; typedef __gnu_cxx::__alloc_traits<_Node_alloc_type> _Node_alloc_traits; - static size_t - _S_distance(const __detail::_List_node_base* __first, - const __detail::_List_node_base* __last) - { - size_t __n = 0; - while (__first != __last) - { - __first = __first->_M_next; - ++__n; - } - return __n; - } - struct _List_impl : public _Node_alloc_type { @@ -393,6 +380,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 #if __cplusplus >= 201103L _List_impl(_List_impl&&) = default; + _List_impl(_List_impl&& __x, _Node_alloc_type&& __a) + : _Node_alloc_type(std::move(__a)), _M_node(std::move(__x._M_node)) + { } + _List_impl(_Node_alloc_type&& __a) noexcept : _Node_alloc_type(std::move(__a)) { } @@ -409,28 +400,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 void _M_inc_size(size_t __n) { _M_impl._M_node._M_size += __n; } void _M_dec_size(size_t __n) { _M_impl._M_node._M_size -= __n; } - - size_t - _M_distance(const __detail::_List_node_base* __first, - const __detail::_List_node_base* __last) const - { return _S_distance(__first, __last); } - - // return the stored size - size_t _M_node_count() const { return _M_get_size(); } #else // dummy implementations used when the size is not stored size_t _M_get_size() const { return 0; } void _M_set_size(size_t) { } void _M_inc_size(size_t) { } void _M_dec_size(size_t) { } - size_t _M_distance(const void*, const void*) const { return 0; } - - // count the number of nodes - size_t _M_node_count() const - { - return _S_distance(_M_impl._M_node._M_next, - std::__addressof(_M_impl._M_node)); - } #endif typename _Node_alloc_traits::pointer @@ -466,12 +441,12 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 _List_base(_List_base&&) = default; _List_base(_List_base&& 
__x, _Node_alloc_type&& __a) + : _M_impl(std::move(__x), std::move(__a)) + { } + + _List_base(_Node_alloc_type&& __a) : _M_impl(std::move(__a)) - { - if (__x._M_get_Node_allocator() == _M_get_Node_allocator()) - _M_move_nodes(std::move(__x)); - // else caller must move individual elements. - } + { } void _M_move_nodes(_List_base&& __x) @@ -616,6 +591,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 } #endif +#if _GLIBCXX_USE_CXX11_ABI + size_t + _M_distance(const_iterator __first, const_iterator __last) const + { return std::distance(__first, __last); } + + // return the stored size + size_t + _M_node_count() const + { return this->_M_get_size(); } +#else + // dummy implementations used when the size is not stored + size_t _M_distance(const_iterator, const_iterator) const { return 0; } + + // count the number of nodes + size_t + _M_node_count() const + { return std::distance(begin(), end()); } +#endif + public: // [23.2.2.1] construct/copy/destroy // (assign() and get_allocator() are also listed in this section) @@ -718,15 +712,27 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 : _Base(_Node_alloc_type(__a)) { _M_initialize_dispatch(__x.begin(), __x.end(), __false_type()); } - list(list&& __x, const allocator_type& __a) - noexcept(_Nod
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On Fri, 2017-07-21 at 11:56 -0400, David Malcolm wrote: > On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote: > > Hi, > > > > the following patch introduces a new C++ warning option > > -Wduplicated-access-specifiers that warns about redundant > > access-specifiers in classes, e.g. > > > > class B > > { > > public: > > B(); > > > > private: > > void foo(); > > private: > > int i; > > }; > > > > test.cc:8:5: warning: duplicate 'private' access-specifier [ > > -Wduplicated-access-specifiers] > > private: > > ^~~ > > --- > > test.cc:6:5: note: access-specifier was previously given here > > private: > > ^~~ > > Thanks for working on this. > > I'm not able to formally review, but you did CC me; various comments > below throughout. > > > The test is implemented by tracking the location of the last > > access-specifier together with the access-specifier itself. > > The location serves two purposes: the obvious one is to be able to > > print the location in the note that comes with the warning. > > The second purpose is to be able to distinguish an access-specifier > > given by the user from the default of the class type (i.e. public > > for > > 'struct' and private for 'class') where the location is set to > > UNKNOWN_LOCATION. The warning is only issued if the user gives the > > access-specifier twice, i.e. if the previous location is not > > UNKNOWN_LOCATION. > > Presumably given > > struct foo > { > public: > /* ... * > }; > > we could issue something like: > > warning: access-specifier 'public' is redundant within 'struct' > > for the first; similarly for: > > class bar > { > private: > }; > > we could issue: > > warning: access-specifier 'private' is redundant within 'class' > > > > One could actually make this a two-level warning so that on the > > higher level also the default class-type settings are taken into > > account. Would that be helpful? I could prepare a second patch for > > that. > > I'm not sure how best to structure it. 
Maybe combine the two ideas, and call it -Wredundant-access-specifiers ? Or just "-Waccess-specifiers" ? [...snip...] Dave
Re: [PATCH][AArch64] vec_pack_trunc_ should split after register allocator
On Thu, Apr 27, 2017 at 05:08:38AM +, Hurugalawadi, Naveen wrote: > Hi, > > The instruction "vec_pack_trunc_" should be split after register > allocator for scheduling reasons. Currently the instruction is marked as type > multiple which means it will be scheduled as single issued. However, nothing can > be scheduled with either xtn/xtn2 which is a problem in some cases. What's the reason for splitting this only after reload? I think we can split this whenever we like, and that there isn't any benefit in keeping the pair together? Am I missing something? Thanks, James > > The patch splits the instruction and fixes the issue. > > Please review the patch and let me know if it's okay. > Bootstrapped and Regression tested on aarch64-thunder-linux. > > 2017-04-27 Naveen H.S > > * config/aarch64/aarch64-simd.md > (aarch64_simd_vec_pack_trunc_hi_): New pattern. > (vec_pack_trunc_): Split the instruction pattern.
Re: C PATCH to fix bogus warning with -Wmultistatement-macros (PR c/81364)
On Fri, 2017-07-14 at 15:38 +0200, Marek Polacek wrote: > I think David might be able to approve this one, so CCing. It's something of a stretch for my "diagnostic messages" maintainer hat, but I've looked over these changes and they look good to me. Dave > On Tue, Jul 11, 2017 at 03:23:16PM +0200, Marek Polacek wrote: > > This patch fixes a bogus -Wmultistatement-macros warning. The code > > didn't > > notice that what came after a guard such as else was actually > > wrapped in { } > > which is a correct use. This bogus warning only triggered when the > > body of > > a conditional was coming from a different expansion than the > > conditional > > itself. > > > > Bootstrapped/regtested on x86_64-linux, ok for trunk? > > > > 2017-07-11 Marek Polacek > > > > PR c/81364 > > * c-parser.c (c_parser_else_body): Don't warn about > > multistatement > > macro expansion if the body is in { }. > > (c_parser_while_statement): Likewise. > > (c_parser_for_statement): Likewise. > > > > * Wmultistatement-macros-12.c: New test. 
> > > > diff --git gcc/c/c-parser.c gcc/c/c-parser.c > > index f8fbc92..7524a73 100644 > > --- gcc/c/c-parser.c > > +++ gcc/c/c-parser.c > > @@ -5557,7 +5557,8 @@ c_parser_else_body (c_parser *parser, const > > token_indent_info &else_tinfo, > > } > >else > > { > > - body_loc_after_labels = c_parser_peek_token (parser) > > ->location; > > + if (!c_parser_next_token_is (parser, CPP_OPEN_BRACE)) > > + body_loc_after_labels = c_parser_peek_token (parser) > > ->location; > >c_parser_statement_after_labels (parser, NULL, chain); > > } > > > > @@ -5811,6 +5812,7 @@ c_parser_while_statement (c_parser *parser, > > bool ivdep, bool *if_p) > > = get_token_indent_info (c_parser_peek_token (parser)); > > > >location_t loc_after_labels; > > + bool open_brace = c_parser_next_token_is (parser, > > CPP_OPEN_BRACE); > >body = c_parser_c99_block_statement (parser, if_p, > > &loc_after_labels); > >c_finish_loop (loc, cond, NULL, body, c_break_label, > > c_cont_label, true); > >add_stmt (c_end_compound_stmt (loc, block, flag_isoc99)); > > @@ -5820,7 +5822,7 @@ c_parser_while_statement (c_parser *parser, > > bool ivdep, bool *if_p) > > = get_token_indent_info (c_parser_peek_token (parser)); > >warn_for_misleading_indentation (while_tinfo, body_tinfo, > > next_tinfo); > > > > - if (next_tinfo.type != CPP_SEMICOLON) > > + if (next_tinfo.type != CPP_SEMICOLON && !open_brace) > > warn_for_multistatement_macros (loc_after_labels, > > next_tinfo.location, > > while_tinfo.location, > > RID_WHILE); > > > > @@ -6109,6 +6111,7 @@ c_parser_for_statement (c_parser *parser, > > bool ivdep, bool *if_p) > > = get_token_indent_info (c_parser_peek_token (parser)); > > > >location_t loc_after_labels; > > + bool open_brace = c_parser_next_token_is (parser, > > CPP_OPEN_BRACE); > >body = c_parser_c99_block_statement (parser, if_p, > > &loc_after_labels); > > > >if (is_foreach_statement) > > @@ -6122,7 +6125,7 @@ c_parser_for_statement (c_parser *parser, > > bool ivdep, bool *if_p) > > = 
get_token_indent_info (c_parser_peek_token (parser)); > >warn_for_misleading_indentation (for_tinfo, body_tinfo, > > next_tinfo); > > > > - if (next_tinfo.type != CPP_SEMICOLON) > > + if (next_tinfo.type != CPP_SEMICOLON && !open_brace) > > warn_for_multistatement_macros (loc_after_labels, > > next_tinfo.location, > > for_tinfo.location, RID_FOR); > > > > diff --git gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c > > gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c > > index e69de29..ac8915c 100644 > > --- gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c > > +++ gcc/testsuite/c-c++-common/Wmultistatement-macros-12.c > > @@ -0,0 +1,43 @@ > > +/* PR c/81364 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-Wmultistatement-macros" } */ > > + > > +#define FOO0 if (1) { } else > > +#define TST0 \ > > +void bar0 (void) \ > > +{ \ > > + FOO0 { } /* { dg-bogus "macro expands to multiple statements" } > > */ \ > > +} > > +TST0 > > + > > +#define FOO1 for (;;) > > +#define TST1 \ > > +void bar1 (void) \ > > +{ \ > > + FOO1 { } /* { dg-bogus "macro expands to multiple statements" } > > */ \ > > +} > > +TST1 > > + > > +#define FOO2 while (1) > > +#define TST2 \ > > +void bar2 (void) \ > > +{ \ > > + FOO2 { } /* { dg-bogus "macro expands to multiple statements" } > > */ \ > > +} > > +TST2 > > + > > +#define FOO3 switch (1) > > +#define TST3 \ > > +void bar3 (void) \ > > +{ \ > > + FOO3 { } /* { dg-bogus "macro expands to multiple statements" } > > */ \ > > +} > > +TST3 > > + > > +#define FOO4 if (1) > > +#define TST4 \ > > +void bar4 (void) \ > > +{ \ > > + FOO4 { } /* { dg-bogus "macro expands to multiple statements" } > > */ \ > > +} > > +TST4 > > > > Marek > > Marek
[PATCH 2/3] Use BUILD_PATH_PREFIX_MAP envvar to transform __FILE__
Use the BUILD_PATH_PREFIX_MAP environment variable when expanding the __FILE__ macro, in the same way that debug-prefix-map works for debugging symbol paths. This patch follows similar lines to the earlier patch for SOURCE_DATE_EPOCH. Specifically, we read the environment variable not in libcpp but via a hook whose implementation is defined in gcc/c-family. However, achieving this is more complex than the earlier patch: we need to share the prefix_map data structure and associated functions between libcpp and c-family, and therefore we need to move these to libiberty. (For comparison, the SOURCE_DATE_EPOCH patch did not need this because time_t et al. are in the standard C library.)

Acknowledgements
================

Dhole, who wrote the earlier patch for SOURCE_DATE_EPOCH, which saved me a lot of time figuring out what to edit.

ChangeLogs
----------

gcc/ChangeLog:

2017-07-21  Ximin Luo

	* doc/invoke.texi (Environment Variables): Document form and
	behaviour of BUILD_PATH_PREFIX_MAP.

gcc/c-family/ChangeLog:

2017-07-21  Ximin Luo

	* c-common.c (cb_get_build_path_prefix_map): Define new call target.
	* c-common.h (cb_get_build_path_prefix_map): Declare call target.
	* c-lex.c (init_c_lex): Set the get_build_path_prefix_map callback.

libcpp/ChangeLog:

2017-07-21  Ximin Luo

	* include/cpplib.h (cpp_callbacks): Add get_build_path_prefix_map
	callback.
	* init.c (cpp_create_reader): Initialise build_path_prefix_map field.
	* internal.h (cpp_reader): Add new field build_path_prefix_map.
	* macro.c (_cpp_builtin_macro_text): Set the build_path_prefix_map
	field if unset, and apply it when expanding __FILE__ macros.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo

	* gcc.dg/cpp/build_path_prefix_map-1.c: New test.
	* gcc.dg/cpp/build_path_prefix_map-2.c: New test.

Index: gcc-8-20170716/gcc/doc/invoke.texi
===
--- gcc-8-20170716.orig/gcc/doc/invoke.texi
+++ gcc-8-20170716/gcc/doc/invoke.texi
@@ -27197,6 +27197,26 @@ Recognize EUCJP characters.
If @env{LANG} is not defined, or if it has some other value, then the compiler uses @code{mblen} and @code{mbtowc} as defined by the default locale to recognize and translate multibyte characters. + +@item BUILD_PATH_PREFIX_MAP +@findex BUILD_PATH_PREFIX_MAP +If this variable is set, it specifies an ordered map used to transform +filepaths output in debugging symbols and expansions of the @code{__FILE__} +macro. This may be used to achieve fully reproducible output. In the context +of running GCC within a higher-level build tool, it is typically more reliable +than setting command line arguments such as @option{-fdebug-prefix-map} or +common environment variables such as @env{CFLAGS}, since the build tool may +save these latter values into other output outside of GCC's control. + +The value is of the form +@samp{@var{dst@r{[0]}}=@var{src@r{[0]}}:@var{dst@r{[1]}}=@var{src@r{[1]}}@r{@dots{}}}. +If any @var{dst@r{[}i@r{]}} or @var{src@r{[}i@r{]}} contains @code{%}, @code{=} +or @code{:} characters, they must be replaced with @code{%#}, @code{%+}, and +@code{%.} respectively. + +Whenever GCC emits a filepath that starts with a whole path component matching +@var{src@r{[}i@r{]}} for some @var{i}, with rightmost @var{i} taking priority, +the matching part is replaced with @var{dst@r{[}i@r{]}} in the final output. @end table @noindent Index: gcc-8-20170716/gcc/c-family/c-common.c === --- gcc-8-20170716.orig/gcc/c-family/c-common.c +++ gcc-8-20170716/gcc/c-family/c-common.c @@ -21,6 +21,7 @@ along with GCC; see the file COPYING3. #include "config.h" #include "system.h" +#include "prefix-map.h" #include "coretypes.h" #include "target.h" #include "function.h" @@ -7905,6 +7906,25 @@ cb_get_source_date_epoch (cpp_reader *pf return (time_t) epoch; } +/* Read BUILD_PATH_PREFIX_MAP from environment to have deterministic relative + paths to replace embedded absolute paths to get reproducible results. + Returns NULL if BUILD_PATH_PREFIX_MAP is badly formed. 
*/ + +prefix_map ** +cb_get_build_path_prefix_map (cpp_reader *pfile ATTRIBUTE_UNUSED) +{ + prefix_map **map = XCNEW (prefix_map *); + + const char *arg = getenv ("BUILD_PATH_PREFIX_MAP"); + if (!arg || prefix_map_parse (map, arg)) +return map; + + free (map); + error_at (input_location, "environment variable BUILD_PATH_PREFIX_MAP is " + "not well formed; see the GCC documentation for more details."); + return NULL; +} + /* Callback for libcpp for offering spelling suggestions for misspelled directives. GOAL is an unrecognized string; CANDIDATES is a NULL-terminated array of candidate strings. Return the closest Index: gcc-8-20170716/gcc/c-family/c-common.h === --- gcc-8-20170
[PATCH 1/3] Use BUILD_PATH_PREFIX_MAP envvar for debug-prefix-map
Define the BUILD_PATH_PREFIX_MAP environment variable, and treat it as implicit -fdebug-prefix-map CLI options specified before any explicit such options. Much of the generic code for applying and parsing prefix-maps is implemented in libiberty instead of the dwarf2 parts of the code, in order to make subsequent patches unrelated to debuginfo easier.

Acknowledgements
================

Daniel Kahn Gillmor, who wrote the patch for r231835, which saved me a lot of time figuring out what to edit.

HW42, for discussion on the details of the proposal, and for suggesting that we retain the ability to map the prefix to something other than ".".

Other contributors to the BUILD_PATH_PREFIX_MAP specification, see https://reproducible-builds.org/specs/build-path-prefix-map/

ChangeLogs
----------

include/ChangeLog:

2017-07-21  Ximin Luo

	* prefix-map.h: New file implementing the BUILD_PATH_PREFIX_MAP
	specification; includes code from gcc/final.c and code adapted from
	examples attached to the specification.

libiberty/ChangeLog:

2017-07-21  Ximin Luo

	* prefix-map.c: New file implementing the BUILD_PATH_PREFIX_MAP
	specification; includes code from gcc/final.c and code adapted from
	examples attached to the specification.
	* Makefile.in: Update for new files.

gcc/ChangeLog:

2017-07-21  Ximin Luo

	* debug.h: Declare add_debug_prefix_map_from_envvar.
	* final.c: Define add_debug_prefix_map_from_envvar, and refactor
	prefix-map utilities to use equivalent code from libiberty instead.
	* opts-global.c (handle_common_deferred_options): Call
	add_debug_prefix_map_from_envvar before processing options.

gcc/testsuite/ChangeLog:

2017-07-21  Ximin Luo

	* lib/gcc-dg.exp: Allow dg-set-compiler-env-var to take only one
	argument, in which case it unsets the given env var.
	* gcc.dg/debug/dwarf2/build_path_prefix_map-1.c: New test.
	* gcc.dg/debug/dwarf2/build_path_prefix_map-2.c: New test.
Index: gcc-8-20170716/include/prefix-map.h
===
--- /dev/null
+++ gcc-8-20170716/include/prefix-map.h
@@ -0,0 +1,94 @@
+/* Declarations for manipulating filename prefixes.
+   Written 2017 by Ximin Luo
+   This code is in the public domain. */
+
+#ifndef _PREFIX_MAP_H
+#define _PREFIX_MAP_H
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+/* Linked-list of mappings from old prefixes to new prefixes. */
+
+struct prefix_map
+{
+  const char *old_prefix;
+  const char *new_prefix;
+  size_t old_len;
+  size_t new_len;
+  struct prefix_map *next;
+};
+
+
+/* Find a mapping suitable for the given OLD_NAME in the linked list MAP.
+
+   If a mapping is found, writes a pointer to the non-matching suffix part of
+   OLD_NAME in SUFFIX, and its length in SUF_LEN.
+
+   Returns NULL if there was no suitable mapping. */
+struct prefix_map *
+prefix_map_find (struct prefix_map *map, const char *old_name,
+		 const char **suffix, size_t *suf_len);
+
+/* Prepend a prefix map before a given SUFFIX.
+
+   The remapped name is written to NEW_NAME and returned as a const pointer.
+   No allocations are performed; the caller must ensure it can hold at least
+   MAP->NEW_LEN + SUF_LEN + 1 characters. */
+const char *
+prefix_map_prepend (struct prefix_map *map, char *new_name,
+		    const char *suffix, size_t suf_len);
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   pointer to newly-allocated memory for the remapped filename.  The memory is
+   allocated by the given ALLOC function, which also determines who is
+   responsible for freeing it. */
+#define prefix_map_remap_alloc_(map_head, old_name, alloc)		\
+  __extension__								\
+  ({									\
+    const char *__suffix;						\
+    size_t __suf_len;							\
+    struct prefix_map *__map;						\
+    (__map = prefix_map_find ((map_head), (old_name), &__suffix, &__suf_len)) \
+      ? prefix_map_prepend (__map,					\
+			    (char *) alloc (__map->new_len + __suf_len + 1), \
+			    __suffix, __suf_len)			\
+      : (old_name);							\
+  })
+
+/* Remap a filename.
+
+   Returns OLD_NAME unchanged if there was no remapping, otherwise returns a
+   stack-allocated pointer to the newly-remapped filename. */
+#define prefix_map_remap_alloca(map_head, old_name) \
+  prefix_map_remap_alloc_ (map_head, old_name, alloca)
+
+
+/* Parse prefix-maps acc
[PATCH 3/3] When remapping paths, only match whole path components
Change the remapping algorithm so that each old_prefix only matches paths that have old_prefix as a whole path component prefix. (A whole path component is a part of a path that begins and ends at a directory separator or at either end of the path string.)

This remapping algorithm is more predictable than the old one, because there is no chance of mappings for one directory interfering with mappings for other directories. It has fewer corner cases, so it is easier for users to figure out how to set the mapping appropriately. Therefore, I believe it is better as a standardised algorithm that other build tools might like to adopt, and so in our BUILD_PATH_PREFIX_MAP specification we recommend this algorithm - though we allow others, and explicitly mention GCC's current algorithm. But it would be good for GCC to adopt this newer and cleaner one. (The original idea came from discussions with rustc developers on this topic.)

This does technically break backwards compatibility, but I was under the impression that this option was not seen as such a critical feature that the compatibility break would be a serious concern. Nevertheless, this part is totally independent from the other patches and may be included or excluded as the GCC maintainers desire.

Acknowledgements
================

Discussions with Michael Woerister and other members of the Rust compiler team on GitHub, and discussions with Daniel Shahaf on the rb-general@ mailing list on lists.reproducible-builds.org.

ChangeLogs
----------

libiberty/ChangeLog:

2017-07-21  Ximin Luo

	* prefix-map.c: When remapping paths, only match whole path
	components.
Index: gcc-8-20170716/libiberty/prefix-map.c === --- gcc-8-20170716.orig/libiberty/prefix-map.c +++ gcc-8-20170716/libiberty/prefix-map.c @@ -87,12 +87,22 @@ struct prefix_map * prefix_map_find (struct prefix_map *map, const char *old_name, const char **suffix, size_t *suf_len) { + size_t len; + for (; map; map = map->next) -if (filename_ncmp (old_name, map->old_prefix, map->old_len) == 0) - { - *suf_len = strlen (*suffix = old_name + map->old_len); - break; - } +{ + len = map->old_len; + /* Ignore trailing path separators at the end of old_prefix */ + while (len > 0 && IS_DIR_SEPARATOR (map->old_prefix[len-1])) len--; + /* Check if old_name matches old_prefix at a path component boundary */ + if (! filename_ncmp (old_name, map->old_prefix, len) + && (IS_DIR_SEPARATOR (old_name[len]) + || old_name[len] == '\0')) + { + *suf_len = strlen (*suffix = old_name + len); + break; + } +} return map; }
[PING^4][PATCH v2] Generate reproducible output independently of the build-path
(Please keep me on CC, I am not subscribed)

Proposal
========

This patch series adds a new environment variable BUILD_PATH_PREFIX_MAP. When this is set, GCC will treat it as extra implicit "-fdebug-prefix-map=$value" command-line arguments that precede any explicit ones. This makes the final binary output reproducible, and also hides the unreproducible value (the source path prefixes) from CFLAGS et al., which many build tools (understandably) embed as-is into their build output.

This environment variable also acts on the __FILE__ macro, mapping it in the same way that debug-prefix-map works for debug symbols. We have seen that __FILE__ is also a very large source of unreproducibility, and is represented quite heavily in the 3k+ figure given earlier.

Finally, we tweak the mapping algorithm so that it applies only to whole path components when matching prefixes. This is justified in further detail in the patch header. It is an optional part of the patch series and could be dropped if the GCC maintainers are not convinced by our arguments there.

Background
==========

We have prepared a document that describes how this works in detail, so that projects can be confident that they are interoperable:

https://reproducible-builds.org/specs/build-path-prefix-map/

The specification is currently in DRAFT status, awaiting some final feedback, including what the GCC maintainers think about it.

We have written up some more detailed discussions on the topic, including a thorough justification of why we chose the mechanism of environment variables:

https://wiki.debian.org/ReproducibleBuilds/StandardEnvironmentVariables

The previous iteration of the patch series, essentially the same as the current re-submission, is here:

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00513.html

An older version, that explains some GCC-specific background, is here:

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00182.html

The current patch series applies cleanly to GCC-8 snapshot 20170716.
Reproducibility testing
=======================

Over the past 3 months, we have tested this patch backported to Debian GCC-6. Together with a patched dpkg that sets the environment variable appropriately, it allows us to reproduce ~1800 extra packages. This is about 6.8% of ~26400 Debian source packages, and just over half of the ones whose irreproducibility is due to build-path issues.

https://tests.reproducible-builds.org/debian/issues/unstable/gcc_captures_build_path_issue.html
https://tests.reproducible-builds.org/debian/unstable/index_suite_amd64_stats.html

The first major increase around 2017-04 is due to us deploying this patch. The next major increase later in 2017-04 is unrelated, due to us deploying a patch for R. The dip during the last part of 2017-06 is due to unpatched and patched packages getting out of sync, partly because of extra admin work around the Debian stretch release; we believe that the green line will soon return to its previous high after this situation settles.

Unit testing
============

I've tested these patches on a Debian unstable x86_64-linux-gnu schroot running inside a Debian jessie system, on a full-bootstrap build.
The output of contrib/compare_tests is as follows:

gcc-8-20170716$ contrib/compare_tests ../gcc-build-{0,1}
# Comparing directories
## Dir1=../gcc-build-0: 8 sum files
## Dir2=../gcc-build-1: 8 sum files
# Comparing 8 common sum files
## /bin/sh contrib/compare_tests /tmp/gxx-sum1.13468 /tmp/gxx-sum2.13468
New tests that PASS:
gcc.dg/cpp/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-1.c execution test
gcc.dg/cpp/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/cpp/build_path_prefix_map-2.c execution test
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-1.c scan-assembler DW_AT_comp_dir: "DWARF2TEST/gcc
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c (test for excess errors)
gcc.dg/debug/dwarf2/build_path_prefix_map-2.c scan-assembler DW_AT_comp_dir: "/
# No differences found in 8 common sum files

I can also provide the full logs on request.

Fuzzing
=======

I've also fuzzed the prefix-map code using AFL with ASAN enabled. Due to how AFL works, I did not fuzz this patch directly but a smaller program with just the parser and remapper, available here:

https://anonscm.debian.org/cgit/reproducible/build-path-prefix-map-spec.git/tree/consume

Over the course of about ~4k cycles, no crashes were found. To reproduce, you could run something like:

$ echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
$ make CC=afl-gcc clean reset-fuzz-pecsplit.c fuzz-pecsplit.c

Copyright disclaimer
====================

I've signed a copyright disclaimer and the FSF has this on record. (RT #1209764)
Re: [PATCH] Add AddressSanitizer annotations to std::vector
On 05/07/17 21:24 +0100, Jonathan Wakely wrote:
On 05/07/17 20:44 +0100, Yuri Gribov wrote:
On Wed, Jul 5, 2017 at 8:00 PM, Jonathan Wakely wrote:

This patch adds AddressSanitizer annotations to std::vector, so that ASan can detect out-of-bounds accesses to the unused capacity of a vector, e.g.

std::vector<int> v(2);
int* p = v.data();
v.pop_back();
return p[1]; // ERROR

This cannot be detected by Debug Mode, but with these annotations ASan knows that only v.data()[0] is valid and will give an error.

The annotations are only enabled for std::vector<T, std::allocator<T>> and only when std::allocator's base class is either malloc_allocator or new_allocator. For other allocators the memory might not come from the freestore and so isn't tracked by ASan.

One important issue with enabling this by default is that it may (will?) break separate sanitization (which is an extremely important feature in practice). If one part of an application is sanitized but the other isn't, and some poor std::vector is push_back'ed in the latter and then accessed in the former, we'll get a false positive because push_back wouldn't properly annotate memory.

Good point. Perhaps hide this under a compilation flag (disabled by default)?

If you define _GLIBCXX_SANITIZE_STD_ALLOCATOR to 0, the annotations are disabled. To make them disabled by default would need some changes, to use separate macros for "the std::allocator base class can be sanitized" and "the user wants std::vector to be sanitized". I'll do that before committing.

Here's what I've committed. std::vector<T, std::allocator<T>> operations are not annotated unless _GLIBCXX_SANITIZE_VECTOR is defined. Tested powerpc64le-linux and x86_64-linux, committed to trunk.

commit 4212dcf491c1187be79c9c905d57eaf4bc17fece
Author: Jonathan Wakely
Date: Wed Jul 5 13:25:21 2017 +0100

    Add AddressSanitizer annotations to std::vector

    * config/allocator/malloc_allocator_base.h [__SANITIZE_ADDRESS__]
    (_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define.
* config/allocator/new_allocator_base.h [__SANITIZE_ADDRESS__] (_GLIBCXX_SANITIZE_STD_ALLOCATOR): Define. * doc/xml/manual/using.xml (_GLIBCXX_SANITIZE_VECTOR): Document macro. * include/bits/stl_vector.h [_GLIBCXX_SANITIZE_VECTOR] (_Vector_impl::_Asan, _Vector_impl::_Asan::_Reinit) (_Vector_impl::_Asan::_Grow, _GLIBCXX_ASAN_ANNOTATE_REINIT) (_GLIBCXX_ASAN_ANNOTATE_GROW, _GLIBCXX_ASAN_ANNOTATE_GREW) (_GLIBCXX_ASAN_ANNOTATE_SHRINK, _GLIBCXX_ASAN_ANNOTATE_BEFORE_DEALLOC): Define annotation helper types and macros. (vector::~vector, vector::push_back, vector::pop_back) (vector::_M_erase_at_end): Add annotations. * include/bits/vector.tcc (vector::reserve, vector::emplace_back) (vector::insert, vector::_M_erase, vector::operator=) (vector::_M_fill_assign, vector::_M_assign_aux) (vector::_M_insert_rval, vector::_M_emplace_aux) (vector::_M_insert_aux, vector::_M_realloc_insert) (vector::_M_fill_insert, vector::_M_default_append) (vector::_M_shrink_to_fit, vector::_M_range_insert): Annotate. diff --git a/libstdc++-v3/config/allocator/malloc_allocator_base.h b/libstdc++-v3/config/allocator/malloc_allocator_base.h index b091bbc..54e0837 100644 --- a/libstdc++-v3/config/allocator/malloc_allocator_base.h +++ b/libstdc++-v3/config/allocator/malloc_allocator_base.h @@ -52,4 +52,8 @@ namespace std # define __allocator_base __gnu_cxx::malloc_allocator #endif +#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR) +# define _GLIBCXX_SANITIZE_STD_ALLOCATOR 1 +#endif + #endif diff --git a/libstdc++-v3/config/allocator/new_allocator_base.h b/libstdc++-v3/config/allocator/new_allocator_base.h index 3d2bb67..e776ed3 100644 --- a/libstdc++-v3/config/allocator/new_allocator_base.h +++ b/libstdc++-v3/config/allocator/new_allocator_base.h @@ -52,4 +52,8 @@ namespace std # define __allocator_base __gnu_cxx::new_allocator #endif +#if defined(__SANITIZE_ADDRESS__) && !defined(_GLIBCXX_SANITIZE_STD_ALLOCATOR) +# define _GLIBCXX_SANITIZE_STD_ALLOCATOR 1 +#endif + #endif 
diff --git a/libstdc++-v3/doc/xml/manual/using.xml b/libstdc++-v3/doc/xml/manual/using.xml index 5c0e1b9..6ce29fd 100644 --- a/libstdc++-v3/doc/xml/manual/using.xml +++ b/libstdc++-v3/doc/xml/manual/using.xml @@ -991,6 +991,24 @@ g++ -Winvalid-pch -I. -include stdc++.h -H -g -O2 hello.cc -o test.exe +_GLIBCXX_SANITIZE_VECTOR + + + Undefined by default. When defined, std::vector +operations will be annotated so that AddressSanitizer can detect +invalid accesses to the unused capacity of a +std::vector. These annotations are only +enabled for +std::vector> +and only when std::allocator is derived from +new_allocator +or malloc_allocator. The annotations +must be present on all vector operations or none, so this macro must +be defined to the same value for all translation units that create,
Re: Deprecate DBX/stabs?
On Fri, Jul 21, 2017 at 7:15 AM, David Edelsohn wrote: > AIX still uses DBX as the primary debugging format. AIX supports > DWARF but the AIX toolchain does not fully interoperate with DWARF > generated by GCC. We could still deprecate DBX_DEBUG while leaving XCOFF_DEBUG alone for now. This would encourage people to migrate to DWARF2. We won't be able to drop dbxout.c until both DBX_DEBUG and XCOFF_DEBUG are dropped which could be a while, but we can perhaps avoid any new users of stabs. I see that avr-*, *-lynx, pre-darwin9 32-bit i686-darwin, *-openbsd, pdp11-*, vax-*, and cygwin/mingw32 with obsolete assemblers still default to DBX_DEBUG. Some of those can be dropped, and the others can migrate to dwarf. There is also the matter of SDB_DEBUG which is still supported, and is no longer used by anyone, as we have already deprecated all of the COFF targets that were using it. SDB support for C++ is even worse than the DBX support. This should be no problem to deprecate and remove. We could perhaps even just drop it without bothering to deprecate it first. Jim
Re: [PATCH] New C++ warning option '-Wduplicated-access-specifiers'
On Thu, 2017-07-20 at 18:35 +0200, Volker Reichelt wrote: > Hi, > > the following patch introduces a new C++ warning option > -Wduplicated-access-specifiers that warns about redundant > access-specifiers in classes, e.g. > > class B > { > public: > B(); > > private: > void foo(); > private: > int i; > }; > > test.cc:8:5: warning: duplicate 'private' access-specifier [ > -Wduplicated-access-specifiers] > private: > ^~~ > --- > test.cc:6:5: note: access-specifier was previously given here > private: > ^~~ Thanks for working on this. I'm not able to formally review, but you did CC me; various comments below throughout. > The test is implemented by tracking the location of the last > access-specifier together with the access-specifier itself. > The location serves two purposes: the obvious one is to be able to > print the location in the note that comes with the warning. > The second purpose is to be able to distinguish an access-specifier > given by the user from the default of the class type (i.e. public for > 'struct' and private for 'class') where the location is set to > UNKNOWN_LOCATION. The warning is only issued if the user gives the > access-specifier twice, i.e. if the previous location is not > UNKNOWN_LOCATION. Presumably given struct foo { public: /* ... * }; we could issue something like: warning: access-specifier 'public' is redundant within 'struct' for the first; similarly for: class bar { private: }; we could issue: warning: access-specifier 'private' is redundant within 'class' > One could actually make this a two-level warning so that on the > higher level also the default class-type settings are taken into > account. Would that be helpful? I could prepare a second patch for > that. I'm not sure how best to structure it. FWIW, when I first saw the patch, I wasn't a big fan of the warning, as I like to use access-specifiers to break up the kinds of entities within a class. 
For example, our coding guidelines https://gcc.gnu.org/codingconventions.html#Class_Form recommend: "first define all public types, then define all non-public types, then declare all public constructors, ... then declare all non-public member functions, and then declare all non-public member variables." I find it's useful to put a redundant "private:" between the last two, e.g.: class baz { public: ... private: void some_private_member (); private: int m_some_field; }; to provide a subdivision between the private member functions and the private data fields. This might be a violation of our guidelines (though if so, I'm not sure it's explicitly stated), but I find it useful, and the patch would warn about it. Having said that, looking at e.g. the "jit" subdir, I see lots of places where the warning would fire. In some of these, the code has a bit of a "smell", so maybe I like the warning after all... in that it can be good for a new warning to draw attention to code that might need work. Sorry that this is rambling; comments on the patch inline below. Bootstrapped and regtested on x86_64-pc-linux-gnu. > OK for trunk? > > Btw, there are about 50 places in our libstdc++ headers where this > warning triggers. I'll come up with a patch for this later. > > Regards, > Volker > > > 2017-07-20 Volker Reichelt > > * doc/invoke.texi (-Wduplicated-access-specifiers): Document > new > warning option. 
> > Index: gcc/doc/invoke.texi > === > --- gcc/doc/invoke.texi (revision 250356) > +++ gcc/doc/invoke.texi (working copy) > @@ -275,7 +275,7 @@ > -Wdisabled-optimization @gol > -Wno-discarded-qualifiers -Wno-discarded-array-qualifiers @gol > -Wno-div-by-zero -Wdouble-promotion @gol > --Wduplicated-branches -Wduplicated-cond @gol > +-Wduplicated-access-specifiers -Wduplicated-branches -Wduplicated > -cond @gol > -Wempty-body -Wenum-compare -Wno-endif-labels -Wexpansion-to > -defined @gol > -Werror -Werror=* -Wextra-semi -Wfatal-errors @gol > -Wfloat-equal -Wformat -Wformat=2 @gol > @@ -5388,6 +5388,12 @@ > > This warning is enabled by @option{-Wall}. > > +@item -Wduplicated-access-specifiers > +@opindex Wno-duplicated-access-specifiers > +@opindex Wduplicated-access-specifiers > +Warn when an access-specifier is redundant because it was already > given > +before. Presumably this should be marked as C++-specific. I think it's best to give an example for any warning, though we don't do that currently. > @item -Wduplicated-branches > @opindex Wno-duplicated-branches > @opindex Wduplicated-branches > === > > > 2017-07-20 Volker Reichelt > > * c.opt (Wduplicated-access-specifiers): New C++ warning > flag. > > Index: gcc/c-family/c.opt > === > --- gcc/c-family/c.opt
[PATCH v2 2/2] combine successive multiplications by constants
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg01090.html Reassociate X * CST1 * CST2 to X * (CST1 * CST2). Changed in this revision: - remove the check for @2 being 0 or -1 * match.pd ((X * CST1) * CST2): Simplify to X * (CST1 * CST2). testsuite: * gcc.dg/tree-ssa/assoc-2.c: Enhance. * gcc.dg/tree-ssa/slsr-4.c: Adjust. --- gcc/match.pd| 13 + gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 13 - gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c | 8 ++-- 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/gcc/match.pd b/gcc/match.pd index 39e1e5c..732b80c 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -284,6 +284,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || mul != wi::min_value (TYPE_PRECISION (type), SIGNED)) { build_zero_cst (type); }) +/* Combine successive multiplications. Similar to above, but handling + overflow is different. */ +(simplify + (mult (mult @0 INTEGER_CST@1) INTEGER_CST@2) + (with { + bool overflow_p; + wide_int mul = wi::mul (@1, @2, TYPE_SIGN (type), &overflow_p); + } + /* Skip folding on overflow: the only special case is @1 * @2 == -INT_MIN, + otherwise undefined overflow implies that @0 must be zero. */ + (if (!overflow_p || TYPE_OVERFLOW_WRAPS (type)) + (mult @0 { wide_int_to_tree (type, mul); } + /* Optimize A / A to 1.0 if we don't care about NaNs or Infinities. 
*/ (simplify diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c index a92c882..cc0e9d4 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c @@ -5,4 +5,15 @@ int f0(int a, int b){ return a * 33 * b * 55; } -/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */ +int f1(int a){ + a *= 33; + return a * 55; +} + +int f2(int a, int b){ + a *= 33; + return a * b * 55; +} + +/* { dg-final { scan-tree-dump-times "mult_expr" 7 "gimple" } } */ +/* { dg-final { scan-tree-dump-times "mult_expr" 5 "optimized" } } */ diff --git a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c index 17d7b4c..1e943b7 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/slsr-4.c @@ -23,13 +23,9 @@ f (int i) foo (y); } -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "slsr" } } */ -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "slsr" } } */ -/* { dg-final { scan-tree-dump-times "\\+ 20;" 1 "slsr" } } */ +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "slsr" } } */ /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "slsr" } } */ -/* { dg-final { scan-tree-dump-times "\\- 16;" 1 "slsr" } } */ /* { dg-final { scan-tree-dump-times "\\- 160" 1 "slsr" } } */ -/* { dg-final { scan-tree-dump-times "\\* 4" 1 "optimized" } } */ -/* { dg-final { scan-tree-dump-times "\\* 10" 1 "optimized" } } */ +/* { dg-final { scan-tree-dump-times "\\* 40" 1 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\\+ 200" 1 "optimized" } } */ /* { dg-final { scan-tree-dump-times "\\+ 40" 1 "optimized" } } */ -- 1.8.3.1
[PATCH v2 1/2] match.pd: reassociate multiplications
Previous revision here: https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00889.html

Reassociate (X * CST) * Y to (X * Y) * CST; this pushes constants in multiplication chains to the outermost factors, where they can be combined.

Changed in this revision:

- remove !TYPE_OVERFLOW_SANITIZED and !TYPE_SATURATING checks (in the previous discussion Richard indicated that introducing false negatives in UBSAN by concealing signed overflow is not a concern, and saturating types shouldn't appear here because the constant operand should be FIXED_CST)

The checks for @1 being 0 or -1 remain, as they are required for correctness; but since this rule is ordered after the simpler rules that fold X * {0, -1}, those checks are always false at runtime.

	* match.pd ((X * CST) * Y): Reassociate to (X * Y) * CST.

testsuite/
	* gcc.dg/tree-ssa/assoc-2.c: New testcase.
---
 gcc/match.pd                            | 8 ++++++++
 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c | 8 ++++++++
 2 files changed, 16 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7f5807c..39e1e5c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2213,6 +2213,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (mult @0 integer_minus_onep)
  (negate @0))
 
+/* Reassociate (X * CST) * Y to (X * Y) * CST.  This does not introduce
+   signed overflow for CST != 0 && CST != -1.  */
+(simplify
+ (mult:c (mult:s @0 INTEGER_CST@1) @2)
+ (if (TREE_CODE (@2) != INTEGER_CST
+      && !integer_zerop (@1) && !integer_minus_onep (@1))
+  (mult (mult @0 @2) @1)))
+
 /* True if we can easily extract the real and imaginary parts of a complex
    number.  */
 (match compositional_complex

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
new file mode 100644
index 000..a92c882
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/assoc-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-gimple-raw -fdump-tree-optimized-raw" } */
+
+int f0(int a, int b){
+  return a * 33 * b * 55;
+}
+
+/* { dg-final { scan-tree-dump-times "mult_expr" 2 "gimple" } } */
-- 
1.8.3.1
Re: [patch,lto] Fix PR81487
On 21.07.2017 13:41, Richard Biener wrote:
> On Thu, Jul 20, 2017 at 3:18 PM, Georg-Johann Lay wrote:
>> Hi, this patch fixes some minor problems in lto-plugin:
>>
>> Some older mingw32 host environments have broken asprintf.  As far as
>> I can tell, the problem is that the mingw asprintf implementation
>> calls _vsnprintf (NULL, 0, ...), which always returns -1 as the length
>> on the host.  The patch fixes this by using xasprintf from libiberty,
>> with the additional benefit of easier use and unified error reporting
>> in the case when there is actually no more memory available, i.e. it
>> will then use the same error messages as xmalloc.
>>
>> Moreover, the old implementation calls asprintf as
>> asprintf (&objname, "%s@0x%x%08x", file->name, lo, hi)
>> for large archives when the high 32 bits (hi) are non-zero.  This
>> looks like a typo: the high part must come first to yield a proper
>> 64-bit value.
>>
>> Bootstrapped & re-tested on x86_64-linux-gnu and also on mingw32,
>> which had triggered the "ld.exe: asprintf failed" error.
>>
>> Ok for trunk?
>
> Ok.
>
> I wonder if any of the other plain asprintf calls in GCC are
> problematic.  From a quick look they are all inside code only
> exercised when dumping/debugging.  But maybe we should replace those
> as well.
>
> Richard.

Maybe I just didn't run into such spots yet.

I just started using a different cross-toolchain for canadian cross builds.  With my old i386-mingw32 (3.4.5) I never saw such problems, but the new v8 (maybe also v7) no longer gave sane build results, i.e. the generated cross-compiler to run under mingw just hangs.  Must be somewhere around tree dump no. 62, but I couldn't even find out which pass actually hangs.  So I used a 4.9.3 x64 -> mingw cross compiler in the hope that it works better than good old 3.4.5, which did a very good job for a really long time.

Johann

lto-plugin/
	PR lto/81487
	* lto-plugin.c (claim_file_handler): Use xasprintf instead of
	asprintf.
	[hi!=0]: Swap hi and lo arguments supplied to xasprintf.
[PATCH,rs6000] Fine-tune vec_construct direct move cost
Hi,

In https://gcc.gnu.org/ml/gcc-patches/2017-05/msg00924.html, I raised the vectorization cost for a vec_construct operation that requires direct moves between GPRs and VSRs.  The cost equation I substituted has since proven to be slightly more conservative than intended, and we're seeing some cases of SLP vectorization being avoided that should not be.  This patch adjusts the equation to reduce the cost somewhat.

I've tested this to ensure the cases previously seen are now being vectorized again, and done some benchmark testing that shows no measurable result, positive or negative.  So this is just minor fine-tuning, but still important to get right.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.  Is this ok for trunk?

Thanks,
Bill

2017-07-21  Bill Schmidt

	PR target/80695
	* config/rs6000/rs6000.c (rs6000_builtin_vectorization_cost):
	Reduce cost estimate for direct moves.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 250426)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -5757,7 +5757,7 @@ rs6000_builtin_vectorization_cost (enum vect_cost_
 	if (TARGET_P9_VECTOR)
 	  return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 2;
 	else
-	  return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 11;
+	  return TYPE_VECTOR_SUBPARTS (vectype) - 1 + 5;
       }
     else
       /* V2DFmode doesn't need a direct move.  */
Re: [PATCH] Add RDMA support to Falkor.
On 29/06/17 21:53, Jim Wilson wrote: > Falkor is an ARMV8-A part, but also includes the RDMA extension from > ARMV8.1-A. > I'd like to enable support for the RDMA instructions when -mcpu=falkor is > used, > and also make the RDMA intrinsics available. To do that, I need to add rdma > as an architecture extension, and modify a few things to use it. Binutils > already supports rdma as an architecture extension. > > I only did the aarch64 port, and not the arm port. There are no supported > targets that have the RDMA instructions and also aarch32 support. There are > also no aarch32 RDMA testcases. So there is no way to test it. It wasn't > clear whether it was better to add something untested or leave it out. I > chose > to leave it out for now. > > I also needed a few testcase changes. There were redundant options being > added for the RDMA tests that I had to remove as they are now wrong. Also > the fact that I only did aarch64 means we need to check both armv8-a+rdma and > armv8.1-a for the rdma support. > > This was tested with an aarch64 bootstrap and make check. There were no > regressions. > > OK? OK. R. > > Jim > > gcc/ > * config/aarch64/aarch64-cores.def (falkor): Add AARCH64_FL_RDMA. > (qdf24xx): Likewise. > * config/aarch64/aarch64-option-extensions.def (rdma): New. > * config/aarch64/aarch64.h (AARCH64_FL_RDMA): New. > (AARCH64_FL_V8_1): Renumber. > (AARCH64_FL_FOR_ARCH8_1): Add AARCH64_FL_RDMA. > (AARCH64_ISA_RDMA): Use AARCH64_FL_RDMA. > * config/aarch64/arm_neon.h: Use +rdma instead of arch=armv8.1-a. > * doc/invoke.texi (AArch64 Options): Mention +rdma in -march docs. Add > rdma to feature modifiers list. > > gcc/testsuite/ > * lib/target-supports.exp (add_options_for_arm_v8_1a_neon): Delete > redundant -march option. > (check_effective_target_arm_v8_1a_neon_ok_nocache): Try armv8-a+rdma > in addition to armv8.1-a. 
> --- > gcc/config/aarch64/aarch64-cores.def | 4 ++-- > gcc/config/aarch64/aarch64-option-extensions.def | 4 > gcc/config/aarch64/aarch64.h | 8 +--- > gcc/config/aarch64/arm_neon.h| 2 +- > gcc/doc/invoke.texi | 5 - > gcc/testsuite/lib/target-supports.exp| 18 ++ > 6 files changed, 26 insertions(+), 15 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-cores.def > b/gcc/config/aarch64/aarch64-cores.def > index f8342ca..b8d0ba6 100644 > --- a/gcc/config/aarch64/aarch64-cores.def > +++ b/gcc/config/aarch64/aarch64-cores.def > @@ -65,8 +65,8 @@ AARCH64_CORE("thunderxt83", thunderxt83, thunderx, 8A, > AARCH64_FL_FOR_ARCH > AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, > xgene1, 0x50, 0x000, -1) > > /* Qualcomm ('Q') cores. */ > -AARCH64_CORE("falkor", falkor,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) > -AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx, 0x51, 0xC00, -1) > +AARCH64_CORE("falkor", falkor,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx, 0x51, > 0xC00, -1) > +AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_RDMA, qdf24xx, 0x51, > 0xC00, -1) > > /* Samsung ('S') cores. */ > AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 > | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, 0x53, 0x001, -1) > diff --git a/gcc/config/aarch64/aarch64-option-extensions.def > b/gcc/config/aarch64/aarch64-option-extensions.def > index c0752ce..c4f059a 100644 > --- a/gcc/config/aarch64/aarch64-option-extensions.def > +++ b/gcc/config/aarch64/aarch64-option-extensions.def > @@ -63,4 +63,8 @@ AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, > AARCH64_FL_FP, 0, "fphp asimdhp") > /* Enabling or disabling "rcpc" only changes "rcpc". 
*/ > AARCH64_OPT_EXTENSION("rcpc", AARCH64_FL_RCPC, 0, 0, "lrcpc") > > +/* Enabling "rdma" also enables "fp", "simd". > + Disabling "rdma" just disables "rdma". */ > +AARCH64_OPT_EXTENSION("rdma", AARCH64_FL_RDMA, AARCH64_FL_FP | > AARCH64_FL_SIMD, 0, "rdma") > + > #undef AARCH64_OPT_EXTENSION > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 106cf3a..7f91edb 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -144,7 +144,8 @@ extern unsigned aarch64_architecture_version; > #define AARCH64_FL_CRC(1 << 3) /* Has CRC. */ > /* ARMv8.1-A architecture extensions. */ > #define AARCH64_FL_LSE (1 << 4) /* Has Large System Extensions. > */ > -#define AARCH64_FL_V8_1(1 << 5) /* Has ARMv8.1-A extensions. */ > +#define AARCH64_FL_RDMA
[PATCH 7/6] fortran: fix pair_cmp qsort comparator
Hello,

The final tie-breaker in the pair_cmp comparator looks strange: it correctly yields zero for equal expr->symtree->n.sym values, but for unequal values it produces 0 or 1.  This would be correct for C++ STL-style comparators, which require a "less-than" predicate to be computed, but not for C qsort.  The comment before the function seems to confirm that the intent was indeed to sort in ascending gfc_symbol order, but the code is doing mostly the opposite.

Make the comparator properly anti-commutative by returning -1 in the last tie-breaker when appropriate.

Bootstrapped and regtested on x86-64, OK for trunk?

	* interface.c (pair_cmp): Fix gfc_symbol comparison.  Adjust comment.
---
 gcc/fortran/interface.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 6fe0647..13e2bdd 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -3294,7 +3294,7 @@ argpair;
    order:
     - p->a->expr == NULL
     - p->a->expr->expr_type != EXPR_VARIABLE
-    - growing p->a->expr->symbol.  */
+    - by gfc_symbol pointer value (larger first).  */
 
 static int
 pair_cmp (const void *p1, const void *p2)
@@ -3320,6 +3320,8 @@ pair_cmp (const void *p1, const void *p2)
     }
   if (a2->expr->expr_type != EXPR_VARIABLE)
     return 1;
+  if (a1->expr->symtree->n.sym > a2->expr->symtree->n.sym)
+    return -1;
   return a1->expr->symtree->n.sym < a2->expr->symtree->n.sym;
 }
-- 
1.8.3.1
Re: Deprecate DBX/stabs?
> Nathan Sidwell writes:
> Let's at least deprecate it.  I attach a patch to do so.  With the
> patch, you'll get a note about dbx being deprecated whenever you use
> stabs debugging on a system that prefers stabs (thus both -g and -gstabs
> will warn).  On systems where stabs is not preferred, -gstabs will not
> give you a warning.  The patch survives an x86_64-linux bootstrap.

Absolutely not.  AIX still uses DBX as the primary debugging format.  AIX supports DWARF, but the AIX toolchain does not fully interoperate with DWARF generated by GCC.

With the extensive use of DBX by AIX and regular patches from me to fix xcoff stabs debugging, omitting me from the cc list implies that you really haven't done your homework.

Thanks, David
Re: Deprecate DBX/stabs?
On 07/21/2017 09:16 AM, Richard Biener wrote: On Fri, Jul 21, 2017 at 3:07 PM, Nathan Sidwell wrote: +#ifndef DBX_DEBBUG_OK ^^^ typo? The patch doesn't define this anywhere - I suggest to add it to defaults.h as 0 and use #if? Also would need documenting if this is supposed to be a target macro. Like this? I've now included XCOFF, as it's a subset of DBX. Nothing appears to default to it. nathan -- Nathan Sidwell 2017-07-21 Nathan Sidwell * defaults.h (DBX_DEBUG_DEPRECATED): New. * toplev.c (process_options): Warn about DBX/SDB being deprecated. * doc/tm.texi.in (DBX_DEBUG_DEPRECATED): Document. * doc/tm.texi: Updated. Index: defaults.h === --- defaults.h (revision 250426) +++ defaults.h (working copy) @@ -889,6 +889,12 @@ see the files COPYING3 and COPYING.RUNTI #define SDB_DEBUGGING_INFO 0 #endif +/* DBX debugging is deprecated, and will generate a note if you + default to it. */ +#ifndef DBX_DEBUG_DEPRECATED +#define DBX_DEBUG_DEPRECATED 1 +#endif + /* If more than one debugging type is supported, you must define PREFERRED_DEBUGGING_TYPE to choose the default. */ Index: doc/tm.texi === --- doc/tm.texi (revision 250426) +++ doc/tm.texi (working copy) @@ -9553,6 +9553,14 @@ user can always get a specific type of o @c prevent bad page break with this line These are specific options for DBX output. +DBX debug data is deprecated and is expected to be removed. + +@defmac DBX_DEBUG_DEPRECATED +Defined this macro to 1 if GCC should not warn about defaulting to DBX +or XCOFF debug output. This is intended to give maintainers notice of +deprecation, but not be unnecessarily invasive. Defining this macro is +a short-term measure. You need to plan for DBX's removal. 
+@end defmac @defmac DBX_DEBUGGING_INFO Define this macro if GCC should produce debugging output for DBX Index: doc/tm.texi.in === --- doc/tm.texi.in (revision 250426) +++ doc/tm.texi.in (working copy) @@ -6842,6 +6842,14 @@ user can always get a specific type of o @c prevent bad page break with this line These are specific options for DBX output. +DBX debug data is deprecated and is expected to be removed. + +@defmac DBX_DEBUG_DEPRECATED +Defined this macro to 1 if GCC should not warn about defaulting to DBX +or XCOFF debug output. This is intended to give maintainers notice of +deprecation, but not be unnecessarily invasive. Defining this macro is +a short-term measure. You need to plan for DBX's removal. +@end defmac @defmac DBX_DEBUGGING_INFO Define this macro if GCC should produce debugging output for DBX Index: toplev.c === --- toplev.c (revision 250426) +++ toplev.c (working copy) @@ -1413,6 +1413,12 @@ process_options (void) debug_info_level = DINFO_LEVEL_NONE; } + if (DBX_DEBUG_DEPRECATED + && write_symbols == PREFERRED_DEBUGGING_TYPE + && (PREFERRED_DEBUGGING_TYPE == DBX_DEBUG + || PREFERRED_DEBUGGING_TYPE == XCOFF_DEBUG)) +inform (UNKNOWN_LOCATION, "DBX/XCOFF (stabs) debugging is deprecated"); + if (flag_dump_final_insns && !flag_syntax_only && !no_backend) { FILE *final_output = fopen (flag_dump_final_insns, "w");
Re: [PATCH, PR81430] Use finalize_options in lto1
On 07/21/2017 11:41 AM, Richard Biener wrote: On Thu, 20 Jul 2017, Tom de Vries wrote: On 07/20/2017 12:10 PM, Richard Biener wrote: On Thu, 20 Jul 2017, Tom de Vries wrote: Hi, this patch fixes PR81430, an ICE in the libgomp testsuite for both openmp and openacc test-cases for x86_64 with nvptx accelerator. The scenario how we hit the ICE is as follows: - a testcase is compiled with -O2 - ix86_option_optimization_table enables OPT_freorder_blocks_and_partition at -O2 - cc1 writes out the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION - lto1 reads in the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION - lto1 uses the flag, and runs pass_partition_blocks - pass_partition_blocks ICEs, because it generates code that is not supported by the nvptx target. Note that for standalone compilation for single-thread ptx execution, we don't attempt to run pass_partition_blocks. This is because for nvptx, TARGET_HAVE_NAMED_SECTIONS is set to false, and this bit in finish_options switches off pass_partition_blocks: ... /* If the target requested unwind info, then turn off the partitioning optimization with a different message. Likewise, if the target does not support named sections. */ if (opts->x_flag_reorder_blocks_and_partition && (!targetm_common.have_named_sections || (opts->x_flag_unwind_tables && targetm_common.unwind_tables_default && (ui_except == UI_SJLJ || ui_except >= UI_TARGET { if (opts_set->x_flag_reorder_blocks_and_partition) inform (loc, "-freorder-blocks-and-partition does not work " "on this architecture"); opts->x_flag_reorder_blocks_and_partition = 0; opts->x_flag_reorder_blocks = 1; } ... The patch fixes this by calling finish_options in lto1 after cl_optimization_restore. Points for review: 1. I'm uncertain though about the placement of the call. Perhaps it should be in cl_optimization_restore, before targetm.override_options_after_change? 2. 
I think that this is offloading specific, so perhaps this should be guarded with lto_stream_offload_p or #ifdef ACCEL_COMPILER or some such. Hmm, I agree with #2. I think it conceptually is a LTO stream adjustment and thus we should do this at the time we stream in the optimization/target nodes (like we remap modes for example). Not sure if it's possible to do this at that point, but it looks like finish_options takes two option structs and thus we should be able to call it. With what parameters? Patch below tries with same option struct, but ... Do you get the inform note? I suppose we don't really want that, no? ... I think that way we'll get the inform note (while the previous solution did not). I've also tried with a tmp2 memset to 0, but that ran into problems when doing a maybe_set_param_value. Use global_options_set? That should do what the other patch did. I managed to get it working now. The variable tmp was only partly initialized, which caused the problems when calling maybe_set_param_value. I'm now using init_options_struct. There's no note when using -O2 or "-O2 -freorder-blocks-and-partition". But when I do "-O2 -foffload=-freorder-blocks-and-partition" I get: ... lto1: note: '-freorder-blocks-and-partition' does not work on this architecture lto1: note: '-freorder-blocks-and-partition' does not support unwind info on this architecture ... And for "-O0 -foffload=-freorder-blocks-and-partition" I just get: ... lto1: note: '-freorder-blocks-and-partition' does not work on this architecture ... Thanks, - Tom Call finish_options in lto1 --- gcc/tree-streamer-in.c | 16 1 file changed, 16 insertions(+) diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c index d7b6d22..eb41e75 100644 --- a/gcc/tree-streamer-in.c +++ b/gcc/tree-streamer-in.c @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. 
If not see #include "ipa-chkp.h" #include "gomp-constants.h" #include "asan.h" +#include "opts.h" /* Read a STRING_CST from the string table in DATA_IN using input @@ -769,6 +770,21 @@ lto_input_ts_function_decl_tree_pointers (struct lto_input_block *ib, DECL_FUNCTION_SPECIFIC_TARGET (expr) = stream_read_tree (ib, data_in); #endif DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = stream_read_tree (ib, data_in); +#ifdef ACCEL_COMPILER + { +tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr); +if (opts) + { + struct gcc_options tmp; + init_options_struct (&tmp, NULL); + cl_optimization_restore (&tmp, TREE_OPTIMIZATION (opts)); + finish_options (&tmp, &global_options_set, UNKNOWN_LOCATION); + opts = build_optimization_node (&tmp); + finalize_options_struct (&tmp); + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = opts; + } + } +#endif /* If the file contains a function with an EH personality set, then it was compiled with -fexceptions. In that case, initialize
Re: [PATCH][RFA/RFC] Stack clash mitigation patch 08/08 V2
Hi, I've used your patch as the base and applied my changes on top. The attached patch is the result, so it is supposed to replace your version. It now also supports emitting a runtime loop. It bootstraps fine but unfortunately I see an Ada regression which I haven't tracked down yet. > FAIL: cb1010a > FAIL: gnat.dg/stack_check1.adb execution test > FAIL: gnat.dg/stack_check1.adb execution test Bye, -Andreas- diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c index bbae89b..796ca76 100644 --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -11040,6 +11040,179 @@ pass_s390_early_mach::execute (function *fun) } // anon namespace +/* Calculate TARGET = REG + OFFSET as s390_emit_prologue would do it. + - push too big immediates to the literal pool and annotate the refs + - emit frame related notes for stack pointer changes. */ + +static rtx +s390_prologue_plus_offset (rtx target, rtx reg, rtx offset, bool frame_related_p) +{ + rtx insn; + rtx orig_offset = offset; + + gcc_assert (REG_P (target)); + gcc_assert (REG_P (reg)); + gcc_assert (CONST_INT_P (offset)); + + if (offset == const0_rtx) /* lr/lgr */ +{ + insn = emit_move_insn (target, reg); +} + else if (DISP_IN_RANGE (INTVAL (offset))) /* la */ +{ + insn = emit_move_insn (target, gen_rtx_PLUS (Pmode, reg, + offset)); +} + else +{ + if (!satisfies_constraint_K (offset)/* ahi/aghi */ + && (!TARGET_EXTIMM + || (!satisfies_constraint_Op (offset) /* alfi/algfi */ + && !satisfies_constraint_On (offset /* slfi/slgfi */ + offset = force_const_mem (Pmode, offset); + + if (target != reg) + { + insn = emit_move_insn (target, reg); + RTX_FRAME_RELATED_P (insn) = frame_related_p ? 
1 : 0; + } + + insn = emit_insn (gen_add2_insn (target, offset)); + + if (!CONST_INT_P (offset)) + { + annotate_constant_pool_refs (&PATTERN (insn)); + + if (frame_related_p) + add_reg_note (insn, REG_FRAME_RELATED_EXPR, + gen_rtx_SET (target, + gen_rtx_PLUS (Pmode, target, +orig_offset))); + } +} + + RTX_FRAME_RELATED_P (insn) = frame_related_p ? 1 : 0; + + return insn; +} + +/* Emit a compare instruction with a volatile memory access as stack + probe. It does not waste store tags and does not clobber any + registers apart from the condition code. */ +static void +s390_emit_stack_probe (rtx addr) +{ + rtx tmp = gen_rtx_MEM (Pmode, addr); + MEM_VOLATILE_P (tmp) = 1; + s390_emit_compare (EQ, gen_rtx_REG (Pmode, 0), tmp); +} + +#define PROBE_INTERVAL (1 << STACK_CHECK_PROBE_INTERVAL_EXP) +#if (PROBE_INTERVAL - 4 > 4095) +#error "S/390: stack probe offset must fit into short discplacement." +#endif + +/* Use a runtime loop if we have to emit more probes than this. */ +#define MIN_UNROLL_PROBES 3 + +/* Allocate SIZE bytes of stack space, using TEMP_REG as a temporary + if necessary. LAST_PROBE_OFFSET contains the offset of the closest + probe relative to the stack pointer. + + Note that SIZE is negative. + + The return value is true if TEMP_REG has been clobbered. */ +static bool +allocate_stack_space (rtx size, HOST_WIDE_INT last_probe_offset, + rtx temp_reg) +{ + bool temp_reg_clobbered_p = false; + + if (flag_stack_clash_protection) +{ + if (last_probe_offset + -INTVAL (size) < PROBE_INTERVAL) + dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true); + else + { + rtx offset = GEN_INT (PROBE_INTERVAL - UNITS_PER_LONG); + HOST_WIDE_INT rounded_size = -INTVAL (size) & -PROBE_INTERVAL; + HOST_WIDE_INT num_probes = rounded_size / PROBE_INTERVAL; + HOST_WIDE_INT residual = -INTVAL (size) - rounded_size; + + if (num_probes < MIN_UNROLL_PROBES) + { + /* Emit unrolled probe statements. 
*/ + + for (unsigned int i = 0; i < num_probes; i++) + { + s390_prologue_plus_offset (stack_pointer_rtx, +stack_pointer_rtx, +GEN_INT (-PROBE_INTERVAL), true); + s390_emit_stack_probe (gen_rtx_PLUS (Pmode, + stack_pointer_rtx, + offset)); + } + dump_stack_clash_frame_info (PROBE_INLINE, residual != 0); + } + else + { + /* Emit a loop probing the pages. */ + + rtx_code_label *loop_start_label = gen_label_rtx (); + + /* From now on temp_reg will be the CFA register. */ + s390_prologue_plus_offset (temp_reg, st
Re: [Arm] Obsoleting Command line option -mstructure-size-boundary in eabi configurations
On 13/07/17 11:26, Michael Collison wrote: > Updated per Richard's comments and suggestions. > > Okay for trunk? OK. Please can you write up an entry for the release notes (wwwdocs). R. > > 2017-07-10 Michael Collison > > * config/arm/arm.c (arm_option_override): Deprecate > use of -mstructure-size-boundary. > * config/arm/arm.opt: Deprecate -mstructure-size-boundary. > * doc/invoke.texi: Deprecate -mstructure-size-boundary. > > -Original Message- > From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] > Sent: Thursday, July 6, 2017 3:17 AM > To: Michael Collison ; GCC Patches > > Cc: nd > Subject: Re: [Arm] Obsoleting Command line option -mstructure-size-boundary > in eabi configurations > > On 06/07/17 06:46, Michael Collison wrote: >> NetBSD/Arm requires that DEFAULT_STRUCTURE_SIZE_BOUNDARY (see >> config/arm/netbsd-elf.h for details). This patch disallows >> -mstructure-size-boundary on netbsd if the value is not equal to the >> DEFAULT_STRUCTURE_SIZE_BOUNDARY. >> >> Okay for trunk? >> >> 2017-07-05 Michael Collison >> >> * config/arm/arm.c (arm_option_override): Disallow >> -mstructure-size-boundary on netbsd if value is not >> DEFAULT_STRUCTURE_SIZE_BOUNDARY. >> >> > > Frankly, I'd rather we moved towards obsoleting this option entirely. > The origins are from the days of the APCS (note, not AAPCS) when the default > was 32 when most of the world expected 8. > > Now that the AAPCS is widely adopted, APCS is obsolete (NetBSD uses > ATPCS) and NetBSD (the only port not based on AAPCS these days) defaults to 8 > I can't see why anybody now would be interested in using a different value. > > So let's just mark this option as deprecated (emit a warning if > > global_options_set.x_arm_structure_size_boundary > > is ever set by the user, regardless of value). Then in GCC 9 we can perhaps > remove this code entirely. > > Documentation and release notes will need corresponding updates as well. > > R. 
> >> pr1556.patch >> >> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index >> bc1e607..911c272 100644 >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -3471,7 +3471,18 @@ arm_option_override (void) >> } >>else >> { >> - if (arm_structure_size_boundary != 8 >> + /* Do not allow structure size boundary to be overridden for >> + netbsd. */ >> + >> + if ((arm_abi == ARM_ABI_ATPCS) >> + && (arm_structure_size_boundary != DEFAULT_STRUCTURE_SIZE_BOUNDARY)) >> +{ >> + warning (0, >> + "option %<-mstructure-size-boundary%> is deprecated for >> netbsd; " >> + "defaulting to %d", >> + DEFAULT_STRUCTURE_SIZE_BOUNDARY); >> + arm_structure_size_boundary = DEFAULT_STRUCTURE_SIZE_BOUNDARY; >> +} >> + else if (arm_structure_size_boundary != 8 >>&& arm_structure_size_boundary != 32 >>&& !(ARM_DOUBLEWORD_ALIGN && arm_structure_size_boundary == 64)) >> { >> > > > pr1556v2.patch > > > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c > index c6101ef..b5dbfeb 100644 > --- a/gcc/config/arm/arm.c > +++ b/gcc/config/arm/arm.c > @@ -3489,6 +3489,8 @@ arm_option_override (void) > } >else > { > + warning (0, "option %<-mstructure-size-boundary%> is deprecated"); > + >if (arm_structure_size_boundary != 8 > && arm_structure_size_boundary != 32 > && !(ARM_DOUBLEWORD_ALIGN && arm_structure_size_boundary == 64)) > diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt > index b6c707b..6060516 100644 > --- a/gcc/config/arm/arm.opt > +++ b/gcc/config/arm/arm.opt > @@ -192,7 +192,7 @@ Target RejectNegative Alias(mfloat-abi=, soft) > Undocumented > > mstructure-size-boundary= > Target RejectNegative Joined UInteger Var(arm_structure_size_boundary) > Init(DEFAULT_STRUCTURE_SIZE_BOUNDARY) > -Specify the minimum bit alignment of structures. > +Specify the minimum bit alignment of structures. (Deprecated). 
> > mthumb > Target Report RejectNegative Negative(marm) Mask(THUMB) Save > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 3e5cee8..cfe5985 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -15685,6 +15685,8 @@ incompatible. Code compiled with one value cannot > necessarily expect to > work with code or libraries compiled with another value, if they exchange > information using structures or unions. > > +This option is deprecated. > + > @item -mabort-on-noreturn > @opindex mabort-on-noreturn > Generate a call to the function @code{abort} at the end of a >
Re: Deprecate DBX/stabs?
On Fri, Jul 21, 2017 at 3:07 PM, Nathan Sidwell wrote: > [darwin, cygwin, rx maintainers, you might have an opinion] > > On 07/21/2017 01:11 AM, Richard Biener wrote: >> >> On July 21, 2017 12:03:58 AM GMT+02:00, Jim Wilson >> wrote: >>> >>> On Thu, Jul 20, 2017 at 2:00 PM, Nathan Sidwell wrote: With this patch the gdb stabs test results are still awful, but they >>> >>> are unchanged awfulness. >>> >>> >>> Anyways, your new dbxout.c patch looks good. And maybe we should >>> think about deprecating the stabs support some day. >> >> >> I've suggested that multiple times also to be able to get rid of the debug >> hook interfacing in GCC and emit dwarf directly from FEs where that makes >> sense. DWARF should be a superset of other debug formats so it should be >> possible to build stabs from dwarf. > > > Let's at least deprecate it. I attach a patch to do so. With the patch, > you'll get a note about dbx being deprecated whenever you use stabs > debugging on a system that prefers stabs (thus both -g and -gstabs will > warn). On systems where stabs is not preferred, -gstabs will not give you a > warning. The patch survices an x86_64-linux bootstrap. > > A config can chose to override thus by defining 'DBX_DEBUG_OK' in the build > defines. > > I did try build time CPP tricks, but config/rx.h and config/i386/darwin.h > define it to be a conditional expression. > > AFAICT, the following include files are not used, and could probably be > binned too: > config/dbx.h > config/dbxcoff.h > config/dbxelf.h > (+ configi386/gstabs.h Jim found) > > It looks like DBX is the default for: > i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC > i386/darwin.h for 32-bit target > rx/rx.h when using AS100 syntax Index: toplev.c === --- toplev.c(revision 250424) +++ toplev.c(working copy) @@ -1413,6 +1413,12 @@ process_options (void) debug_info_level = DINFO_LEVEL_NONE; } +#ifndef DBX_DEBBUG_OK ^^^ typo? 
The patch doesn't define this anywhere - I suggest to add it to defaults.h as 0 and use #if? Also would need documenting if this is supposed to be a target macro. Thanks, Richard. > nathan > -- > Nathan Sidwell
Re: Add support to trace comparison instructions and switch statements
On Fri, Jul 21, 2017 at 1:38 AM, 吴潍浠(此彼) wrote: > Hi Jeff > > I have signed the copyright assignment, and used the name 'Wish Wu' . > Should I send you a copy of my assignment ? Your assignment now is on file in the FSF Copyright Assignment list where Jeff, I and other maintainers can see it. We cannot accept assurances from developers; please do not send copies of copyright assignments. Thanks, David P.S. It normally is unnecessary to send email to both GCC and GCC Patches mailing lists.
[PATCH][3/n] Fix PR81303
When forcing versioning for the vector profitability check I noticed we use different niters for the peeling check than for the versioning one leading to two branches, the 2nd being redundant. The following makes that consistent, also using the known not overflown number of latch execution count. Bootstrap / regtest pending on x86_64-unknown-linux-gnu. Richard. 2017-06-21 Richard Biener PR tree-optimization/81303 * tree-vect-loop-manip.c (vect_loop_versioning): Build profitability check against LOOP_VINFO_NITERSM1. Index: gcc/tree-vect-loop-manip.c === --- gcc/tree-vect-loop-manip.c (revision 250386) +++ gcc/tree-vect-loop-manip.c (working copy) @@ -2136,7 +2136,7 @@ vect_loop_versioning (loop_vec_info loop tree arg; profile_probability prob = profile_probability::likely (); gimple_seq gimplify_stmt_list = NULL; - tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo); + tree scalar_loop_iters = LOOP_VINFO_NITERSM1 (loop_vinfo); bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo); bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo); bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo); @@ -2144,7 +2144,7 @@ vect_loop_versioning (loop_vec_info loop if (check_profitability) cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters, build_int_cst (TREE_TYPE (scalar_loop_iters), - th)); + th - 1)); if (version_niter) vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
Deprecate DBX/stabs?
[darwin, cygwin, rx maintainers, you might have an opinion] On 07/21/2017 01:11 AM, Richard Biener wrote: On July 21, 2017 12:03:58 AM GMT+02:00, Jim Wilson wrote: On Thu, Jul 20, 2017 at 2:00 PM, Nathan Sidwell wrote: With this patch the gdb stabs test results are still awful, but they are unchanged awfulness. Anyways, your new dbxout.c patch looks good. And maybe we should think about deprecating the stabs support some day. I've suggested that multiple times also to be able to get rid of the debug hook interfacing in GCC and emit dwarf directly from FEs where that makes sense. DWARF should be a superset of other debug formats so it should be possible to build stabs from dwarf. Let's at least deprecate it. I attach a patch to do so. With the patch, you'll get a note about dbx being deprecated whenever you use stabs debugging on a system that prefers stabs (thus both -g and -gstabs will warn). On systems where stabs is not preferred, -gstabs will not give you a warning. The patch survices an x86_64-linux bootstrap. A config can chose to override thus by defining 'DBX_DEBUG_OK' in the build defines. I did try build time CPP tricks, but config/rx.h and config/i386/darwin.h define it to be a conditional expression. AFAICT, the following include files are not used, and could probably be binned too: config/dbx.h config/dbxcoff.h config/dbxelf.h (+ configi386/gstabs.h Jim found) It looks like DBX is the default for: i386/cygming configured for 32-bit or lacking PE_SECREL32_RELOC i386/darwin.h for 32-bit target rx/rx.h when using AS100 syntax nathan -- Nathan Sidwell 2017-07-21 Nathan Sidwell * toplev.c (process_options): Warn about DBX being deprecated. 
Index: toplev.c === --- toplev.c (revision 250424) +++ toplev.c (working copy) @@ -1413,6 +1413,12 @@ process_options (void) debug_info_level = DINFO_LEVEL_NONE; } +#ifndef DBX_DEBUG_OK + if (PREFERRED_DEBUGGING_TYPE == DBX_DEBUG + && write_symbols == DBX_DEBUG) +inform (UNKNOWN_LOCATION, "DBX (stabs) debugging is deprecated"); +#endif + if (flag_dump_final_insns && !flag_syntax_only && !no_backend) { FILE *final_output = fopen (flag_dump_final_insns, "w");
[libgomp] Doc update - TASKLOOP/GOMP_doacross_ description
Hi! This patch adds an Implementing-TASKLOOP-construct section as well as a description of the GOMP_doacross_ family of runtime routines to the Implementing-FOR-Construct section. (I checked that 'make info' and 'make html' produce correct doc output) 2017-07-21 Igor Venevtsev * libgomp.texi (Implementing-TASKLOOP-Construct): New section (Implementing-FOR-Construct): Add documentation for GOMP_doacross_ runtime routines family diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 230720f..1f17014 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -3096,6 +3096,7 @@ presented by libgomp. Only maintainers should need them. * Implementing SECTIONS construct:: * Implementing SINGLE construct:: * Implementing OpenACC's PARALLEL construct:: +* Implementing TASKLOOP construct:: @end menu @@ -3354,7 +3355,7 @@ Note that while it looks like there is trickiness to propagating a non-constant STEP, there isn't really. We're explicitly allowed to evaluate it as many times as we want, and any variables involved should automatically be handled as PRIVATE or SHARED like any other -variables. So the expression should remain evaluable in the +variables. So the expression should remain evaluable in the subfunction. We can also pull it into a local variable if we like, but since its supposed to remain unchanged, we can also not if we like. @@ -3367,7 +3368,197 @@ of these routines. There are separate routines for handling loops with an ORDERED clause. Bookkeeping for that is non-trivial... +...Yep! But let's try! +Loops with ORDERED clause @strong{with param} '#pragma omp for ordered @strong{(N)}' present so-called DOACROSS parallelism +and are expanded to the @strong{GOMP_doacross_...} family of runtime routines. +Loops with ORDERED clause @strong{without param} '#pragma omp for ordered' are expanded to the @strong{GOMP_loop_ordered_...} routine family. +This section describes only the @strong{GOMP_doacross_} family so far.
+ +A brief introduction to do-across parallelism terms and the synchronisation model can be found here: +@uref{https://en.wikipedia.org/wiki/Loop-level_parallelism#DOACROSS_parallelism} + +Another explanation of doacross parallelism from libgomp author Jakub Jelinek can be found here: +@uref{https://developers.redhat.com/blog/2016/03/22/what-is-new-in-openmp-4-5-3/} + +OMP v4.5 expresses wait() and post() operations by the @strong{#pragma omp ordered depend(sink:...)} and @strong{#pragma omp ordered depend(source)} +constructs. These constructs are lowered to GOMP_doacross_wait() and GOMP_doacross_post() respectively; see the example below. + +Here is a DOACROSS loop example from the official OpenMP v4.5 examples (@uref{https://github.com/OpenMP/Examples}). +@*Note that the ordered clause has a parameter, '#pragma omp for ordered@strong{(2)}'. + + +@smallexample +/* +* @@name: doacross.2.c +* @@type: C +* @@compilable: yes +* @@linkable: no +* @@expect: success +*/ +float foo(int i, int j); +float bar(float a, float b, float c); +float baz(float b); + +void work( int N, int M, float **A, float **B, float **C ) +@{ + int i, j; + + #pragma omp for ordered(2) + for (i=1; i 1, then GOMP_doacross_static_start still has +all the collapsed loops as a single counts element (the 0th) and istart/iend is again from 0 to that count, +so the compiler-generated code needs to compute all the collapsed loop iterators from that after GOMP_doacross_static_@{start,next@}. +@*@strong{bool +GOMP_loop_doacross_static_start (unsigned ncounts, long *counts, +long chunk_size, long *istart, long *iend);} + +The *_next routines are called when the thread completes processing of +the iteration block currently assigned to it. If the work-share +construct is bound directly to a parallel construct, then the iteration +bounds may have been set up before the parallel. In which case, this +may be the first iteration for the thread.
+Returns true if there is work remaining to be performed; *ISTART and +*IEND are filled with a new iteration block. Returns false if all work +has been assigned. +@*@strong{bool +GOMP_loop_doacross_static_next (long *istart, long *iend)}; + +The following routines have the same semantics and signatures as the *_doacross_static_* routines described above. +The routine type (static, dynamic, auto, guided) is selected according to the SCHEDULE(modifier) clause. The default is static. +@*@strong{bool +GOMP_loop_doacross_dynamic_start (unsigned ncounts, long *counts, +long chunk_size, long *istart, long *iend);} +@*@strong{bool +GOMP_loop_doacross_guided_start (unsigned ncounts, long *counts, +long chunk_size, long *istart, long *iend);} + +This routine chooses the loop scheduling type (static, dynamic or guided) at runtime according to the OMP_SCHEDULE ICV value. +@*@strong{bool +GOMP_loop_doacross_runtime_start (unsigned ncounts, long *counts, +
[AArch64, Patch] Generate MLA when multiply + add vector by scalar
Hi all, This merges vector multiplies and adds into a single mla instruction when the multiplication is done by a scalar. Currently, for the following: typedef int __attribute__((vector_size(16))) vec; vec mla0(vec v0, vec v1, vec v2) { return v0 + v1 * v2[0]; } vec mla1(vec v0, vec v1, int v2) { return v0 + v1 * v2; } The function `mla0` outputs a multiply accumulate by element instruction. `mla1` outputs two vector operations (multiply followed by add). That is, we currently have: mla0: mla v0.4s, v1.4s, v2.s[0] ret mla1: fmov s2, w0 mul v1.4s, v1.4s, v2.s[0] add v0.4s, v1.4s, v0.4s ret This patch replaces this with something similar to `mla0`: mla1: fmov s2, w0 mla v0.4s, v1.4s, v2.s[0] This is also done for the analogous case of a multiply followed by a subtract of vectors with an integer operand on the multiply. Also add testcases for this. Bootstrap and testsuite run on aarch64. OK for trunk? Jackson Changelog entry: gcc/ 2017-06-06 Jackson Woodruff * config/aarch64/aarch64-simd.md (aarch64_mla_elt_merge, aarch64_mls_elt_merge, aarch64_fma4_elt_merge, aarch64_fnma_elt_merge): New define_insns to generate multiply accumulate instructions for unmerged multiply add vector instructions. gcc/testsuite/ 2017-06-06 Jackson Woodruff * gcc.target/aarch64/simd/vmla_elem_1.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 1cb6eeb318716aadacb84a44aa2062d486e0186b..ab1aa5ab84577b3cbddd1eb0e40d29e9b2aa4b42 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1033,6 +1033,18 @@ [(set_attr "type" "neon_mla_<Vetype>_scalar")] ) +(define_insn "*aarch64_mla_elt_merge<mode>" + [(set (match_operand:VDQHS 0 "register_operand" "=w") + (plus:VDQHS + (mult:VDQHS (vec_duplicate:VDQHS + (match_operand:<VEL> 1 "register_operand" "w")) + (match_operand:VDQHS 2 "register_operand" "w")) + (match_operand:VDQHS 3 "register_operand" "0")))] + "TARGET_SIMD" + "mla\t%0.<Vtype>, %2.<Vtype>, %1.<Vtype>[0]" + [(set_attr "type" "neon_mla_<Vetype>_scalar")] +) + (define_insn "aarch64_mls<mode>" [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w") (minus:VDQ_BHSI (match_operand:VDQ_BHSI 1 "register_operand" "0") @@ -1080,6 +1092,18 @@ [(set_attr "type" "neon_mla_<Vetype>_scalar")] ) +(define_insn "*aarch64_mls_elt_merge<mode>" + [(set (match_operand:VDQHS 0 "register_operand" "=w") + (minus:VDQHS + (match_operand:VDQHS 1 "register_operand" "0") + (mult:VDQHS (vec_duplicate:VDQHS + (match_operand:<VEL> 2 "register_operand" "w")) + (match_operand:VDQHS 3 "register_operand" "w"))))] + "TARGET_SIMD" + "mls\t%0.<Vtype>, %3.<Vtype>, %2.<Vtype>[0]" + [(set_attr "type" "neon_mla_<Vetype>_scalar")] +) + ;; Max/Min operations.
(define_insn "3" [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w") diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..df777581ab43b9b9e20b61f3f8d46193bdfda5fb 100644 --- a/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c +++ b/gcc/testsuite/gcc.target/aarch64/simd/vmla_elem_1.c @@ -0,0 +1,67 @@ +/* { dg-do compile } */ +/* { dg-options "-O3" } */ + +typedef short int __attribute__ ((vector_size (16))) v8hi; + +v8hi +mla8hi (v8hi v0, v8hi v1, short int v2) +{ + /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.8h, v\[0-9\]\+\\.8h, v\[0-9\]\+\\.h\\\[0\\\]" } } */ + return v0 + v1 * v2; +} + + +v8hi +mls8hi (v8hi v0, v8hi v1, short int v2) +{ + /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.8h, v\[0-9\]\+\\.8h, v\[0-9\]\+\\.h\\\[0\\\]" } } */ + return v0 - v1 * v2; +} + +typedef short int __attribute__ ((vector_size (8))) v4hi; + +v4hi +mla4hi (v4hi v0, v4hi v1, short int v2) +{ + /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.4h, v\[0-9\]\+\\.4h, v\[0-9\]\+\\.h\\\[0\\\]" } } */ + return v0 + v1 * v2; +} + +v4hi +mls4hi (v4hi v0, v4hi v1, short int v2) +{ + /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.4h, v\[0-9\]\+\\.4h, v\[0-9\]\+\\.h\\\[0\\\]" } } */ + return v0 - v1 * v2; +} + +typedef int __attribute__ ((vector_size (16))) v4si; + +v4si +mla4si (v4si v0, v4si v1, int v2) +{ + /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.4s, v\[0-9\]\+\\.4s, v\[0-9\]\+\\.s\\\[0\\\]" } } */ + return v0 + v1 * v2; +} + +v4si +mls4si (v4si v0, v4si v1, int v2) +{ + /* { dg-final { scan-assembler "mls\\tv\[0-9\]\+\\.4s, v\[0-9\]\+\\.4s, v\[0-9\]\+\\.s\\\[0\\\]" } } */ + return v0 - v1 * v2; +} + +typedef int __attribute__((vector_size (8))) v2si; + +v2si +mla2si (v2si v0, v2si v1, int v2) +{ + /* { dg-final { scan-assembler "mla\\tv\[0-9\]\+\\.2s, v\[0-9\]\+\\.2s, v\[0-9\]\+\\.s\\\[0\\\]" } } */ + return v0 + v1 * v2; +} + +v2si
Re: [patch,lto] Fix PR81487
On Thu, Jul 20, 2017 at 3:18 PM, Georg-Johann Lay wrote: > Hi, this patch fixes some minor problems in lto-plugin: > > Some older mingw32 host environments have broken asprintf. > As far as I can tell, the problem is that the mingw asprintf > implementation calls _vsnprintf (NULL, 0, ...) which always > returns -1 as length on the host. > > The patch fixes this by using xasprintf from libiberty with > the additional benefit of easier use and unified error reporting > in the case when there is actually no more memory available, i.e. > it will use the same error messages as xmalloc then. > > Moreover, the old implementation calls asprintf as > >> asprintf (&objname, "%s@0x%x%08x", file->name, lo, hi) > > for large archives when the high 32 bits (hi) are non-zero. > This looks like a typo: the high part must come first to yield > a proper 64-bit value. > > Bootstrapped & re-tested on x86_64-linux-gnu and also on > mingw32 which popped the "ld.exe: asprintf failed". > > Ok for trunk? Ok. I wonder if any of the other plain asprintf calls in GCC are problematic. From a quick look they are all inside code only exercised when dumping/debugging. But maybe we should replace those as well. Richard. > Johann > > > lto-plugin/ > PR lto/81487 > * lto-plugin.c (claim_file_handler): Use xasprintf instead of > asprintf. > [hi!=0]: Swap hi and lo arguments supplied to xasprintf.
[PATCH] Fix PR81500
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2017-06-21 Richard Biener PR tree-optimization/81500 * tree-vect-loop.c (vect_is_simple_reduction): Properly fail if we didn't identify a reduction path. * gcc.dg/torture/pr81500.c: New testcase. Index: gcc/tree-vect-loop.c === --- gcc/tree-vect-loop.c(revision 250386) +++ gcc/tree-vect-loop.c(working copy) @@ -3243,7 +3243,7 @@ pop: } /* Check whether the reduction path detected is valid. */ - bool fail = false; + bool fail = path.length () == 0; bool neg = false; for (unsigned i = 1; i < path.length (); ++i) { @@ -3276,9 +3276,7 @@ pop: if (dump_enabled_p ()) { - report_vect_op (MSG_MISSED_OPTIMIZATION, - SSA_NAME_DEF_STMT - (USE_FROM_PTR (path[path.length ()-1].second)), + report_vect_op (MSG_MISSED_OPTIMIZATION, def_stmt, "reduction: unknown pattern: "); } Index: gcc/testsuite/gcc.dg/torture/pr81500.c === --- gcc/testsuite/gcc.dg/torture/pr81500.c (nonexistent) +++ gcc/testsuite/gcc.dg/torture/pr81500.c (working copy) @@ -0,0 +1,13 @@ +/* { dg-do compile } */ + +typedef int a; +void c(int *b) +{ + int d; + a e, f, *g, *h = b; + for (; d; d--) { + f = *g & 1; + *h-- = *g-- | e; + e = f; + } +}
Re: [RFC][PATCH][AArch64] Cleanup frame pointer usage
ping Wilco Dijkstra wrote: > James Greenhalgh wrote: > > > I note this is still marked as an RFC, are you now proposing it as a > > patch to be merged to trunk? > > Absolutely. It was marked as an RFC to get some comments - I thought it > may be controversial to separate the frame pointer and frame chain concept. > And this fixes the long-standing bugs caused by changing the global frame > pointer option to an incorrect value for the leaf function optimization. Here is a rebased version that should patch without merge issues: Cleanup frame pointer usage. Introduce a boolean emit_frame_chain which determines whether to store FP and LR and set up FP to point at this record. When the frame pointer is enabled but not strictly required (eg. no use of alloca), we emit a frame chain in non-leaf functions, but don't use the frame pointer to access locals. This results in smaller code and unwind info. Simplify the logic in aarch64_override_options_after_change_1 () and compute whether the frame chain is required in aarch64_layout_frame () instead. As a result aarch64_frame_pointer_required is now redundant. Convert all callee save/restore functions to use gen_frame_mem. Bootstrap OK. ChangeLog: 2017-06-15 Wilco Dijkstra gcc/ PR middle-end/60580 * config/aarch64/aarch64.h (aarch64_frame): Add emit_frame_chain boolean. * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Remove. (aarch64_layout_frame): Initialise emit_frame_chain. (aarch64_pushwb_single_reg): Use gen_frame_mem. (aarch64_pop_regs): Likewise. (aarch64_gen_load_pair): Likewise. (aarch64_save_callee_saves): Likewise. (aarch64_restore_callee_saves): Likewise. (aarch64_expand_prologue): Use emit_frame_chain. (aarch64_can_eliminate): Simplify. When FP needed or outgoing arguments are large, eliminate to FP, otherwise SP. (aarch64_override_options_after_change_1): Simplify. (TARGET_FRAME_POINTER_REQUIRED): Remove define.
-- diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 08acdeb52d4083f50a4b44f43fb98009cdcc041f..722c39cfc4d57280d621fb6130e4d9f4d59d1e72 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -591,6 +591,9 @@ struct GTY (()) aarch64_frame /* The size of the stack adjustment after saving callee-saves. */ HOST_WIDE_INT final_adjust; + /* Store FP,LR and setup a frame pointer. */ + bool emit_frame_chain; + unsigned wb_candidate1; unsigned wb_candidate2; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index fd3005d8056e65cb32c92bbd5eb752c977c885a5..a97b4bbe9dc0f7bccc90a9337519038041241531 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -2761,24 +2761,6 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2) return ""; } -static bool -aarch64_frame_pointer_required (void) -{ - /* In aarch64_override_options_after_change - flag_omit_leaf_frame_pointer turns off the frame pointer by - default. Turn it back on now if we've not got a leaf - function. */ - if (flag_omit_leaf_frame_pointer - && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM))) - return true; - - /* Force a frame pointer for EH returns so the return address is at FP+8. */ - if (crtl->calls_eh_return) - return true; - - return false; -} - /* Mark the registers that need to be saved by the callee and calculate the size of the callee-saved registers area and frame record (both FP and LR may be omitted). */ @@ -2791,6 +2773,18 @@ aarch64_layout_frame (void) if (reload_completed && cfun->machine->frame.laid_out) return; + /* Force a frame chain for EH returns so the return address is at FP+8. */ + cfun->machine->frame.emit_frame_chain + = frame_pointer_needed || crtl->calls_eh_return; + + /* Emit a frame chain if the frame pointer is enabled. + If -momit-leaf-frame-pointer is used, do not use a frame chain + in leaf functions which do not use LR. 
*/ + if (flag_omit_frame_pointer == 2 + && !(flag_omit_leaf_frame_pointer && crtl->is_leaf + && !df_regs_ever_live_p (LR_REGNUM))) + cfun->machine->frame.emit_frame_chain = true; + #define SLOT_NOT_REQUIRED (-2) #define SLOT_REQUIRED (-1) @@ -2825,7 +2819,7 @@ aarch64_layout_frame (void) last_fp_reg = regno; } - if (frame_pointer_needed) + if (cfun->machine->frame.emit_frame_chain) { /* FP and LR are placed in the linkage record. */ cfun->machine->frame.reg_offset[R29_REGNUM] = 0; @@ -2997,7 +2991,7 @@ aarch64_pushwb_single_reg (machine_mode mode, unsigned regno, reg = gen_rtx_REG (mode, regno); mem = gen_rtx_PRE_MODIFY (Pmode, base_rtx, plus_constant (Pmode, base_rtx, -adjustment)); - mem = gen_rtx_MEM (mode, mem); + mem = gen_frame_mem (mode, mem); insn = emit_move_insn
Re: [PATCH v3][AArch64] Fix symbol offset limit
ping From: Wilco Dijkstra Sent: 17 January 2017 15:14 To: Richard Earnshaw; GCC Patches; James Greenhalgh Cc: nd Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a declaration is an integer. So the question is whether we should allow largish offsets outside of the bounds of symbols (v1), no offsets (this version), or small offsets (small negative and positive offsets just outside a symbol are common). The only thing we can't allow is any offset like we currently do... In aarch64_classify_symbol symbols are allowed full-range offsets on relocations. This means the offset can use all of the +/-4GB offset, leaving no offset available for the symbol itself. This results in relocation overflow and link-time errors for simple expressions like &global_char + 0xff00. To avoid this, limit the offset to +/-1GB so that the symbol needs to be within a 3GB offset from its references. For the tiny code model use a 64KB offset, allowing most of the 1MB range for code/data between the symbol and its references. For symbols with a defined size, limit the offset to be within the size of the symbol. ChangeLog: 2017-01-17 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.target/aarch64/symbol-range.c (foo): Set new limit. * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise. 
-- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset) if (aarch64_tls_symbol_p (x)) return aarch64_classify_tls_symbol (x); + const_tree decl = SYMBOL_REF_DECL (x); + switch (aarch64_cmodel) { case AARCH64_CMODEL_TINY: @@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset) we have no way of knowing the address of symbol at compile time so we can't accurately say if the distance between the PC and symbol + offset is outside the addressible range of +/-1M in the - TINY code model. So we rely on images not being greater than - 1M and cap the offset at 1M and anything beyond 1M will have to - be loaded using an alternative mechanism. Furthermore if the - symbol is a weak reference to something that isn't known to - resolve to a symbol in this module, then force to memory. */ + TINY code model. So we limit the maximum offset to +/-64KB and + assume the offset to the symbol is not larger than +/-(1M - 64KB). + Furthermore force to memory if the symbol is a weak reference to + something that doesn't resolve to a symbol in this module. */ if ((SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x)) - || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575) + || !IN_RANGE (INTVAL (offset), -0x10000, 0x10000)) return SYMBOL_FORCE_TO_MEM; + + /* Limit offset to within the size of a declaration if available. */ + if (decl && DECL_P (decl)) + { + const_tree decl_size = DECL_SIZE (decl); + + if (tree_fits_uhwi_p (decl_size) + && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size))) + return SYMBOL_FORCE_TO_MEM; + } + return SYMBOL_TINY_ABSOLUTE; case AARCH64_CMODEL_SMALL: /* Same reasoning as the tiny code model, but the offset cap here is - 4G. */ + 1G, allowing +/-3G for the offset to the symbol.
*/ if ((SYMBOL_REF_WEAK (x) && !aarch64_symbol_binds_local_p (x)) - || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263), - HOST_WIDE_INT_C (4294967264))) + || !IN_RANGE (INTVAL (offset), -0x40000000, 0x40000000)) return SYMBOL_FORCE_TO_MEM; + + /* Limit offset to within the size of a declaration if available. */ + if (decl && DECL_P (decl)) + { + const_tree decl_size = DECL_SIZE (decl); + + if (tree_fits_uhwi_p (decl_size) + && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size))) + return SYMBOL_FORCE_TO_MEM; + } + return SYMBOL_SMALL_ABSOLUTE; case AARCH64_CMODEL_TINY_PIC: diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c index d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5 100644 --- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
Re: [PATCH][AArch64] Improve aarch64_legitimate_constant_p
ping This patch further improves aarch64_legitimate_constant_p. Allow all integer, floating point and vector constants. Allow label references and non-anchor symbols with an immediate offset. This allows such constants to be rematerialized, resulting in smaller code and fewer stack spills. SPEC2006 codesize reduces by 0.08%, SPEC2017 by 0.13%. Bootstrap OK, OK for commit? ChangeLog: 2017-07-07 Wilco Dijkstra * config/aarch64/aarch64.c (aarch64_legitimate_constant_p): Return true for more constants, symbols and label references. (aarch64_valid_floating_const): Remove unused function. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index a2eca64a9c13e44d223b5552c079ef4e09659e84..810c17416db01681e99a9eb8cc9f5af137ed2054 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10173,49 +10173,46 @@ aarch64_legitimate_pic_operand_p (rtx x) return true; } -/* Return true if X holds either a quarter-precision or - floating-point +0.0 constant. */ -static bool -aarch64_valid_floating_const (machine_mode mode, rtx x) -{ - if (!CONST_DOUBLE_P (x)) - return false; - - if (aarch64_float_const_zero_rtx_p (x)) - return true; - - /* We only handle moving 0.0 to a TFmode register. */ - if (!(mode == SFmode || mode == DFmode)) - return false; - - return aarch64_float_const_representable_p (x); -} +/* Implement TARGET_LEGITIMATE_CONSTANT_P hook. Return true for constants + that should be rematerialized rather than spilled. */ static bool aarch64_legitimate_constant_p (machine_mode mode, rtx x) { + /* Support CSE and rematerialization of common constants. */ + if (CONST_INT_P (x) || CONST_DOUBLE_P (x) || GET_CODE (x) == CONST_VECTOR) + return true; + /* Do not allow vector struct mode constants. We could support 0 and -1 easily, but they need support in aarch64-simd.md. 
*/ - if (TARGET_SIMD && aarch64_vect_struct_mode_p (mode)) + if (aarch64_vect_struct_mode_p (mode)) return false; - /* This could probably go away because - we now decompose CONST_INTs according to expand_mov_immediate. */ - if ((GET_CODE (x) == CONST_VECTOR - && aarch64_simd_valid_immediate (x, mode, false, NULL)) - || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x)) - return !targetm.cannot_force_const_mem (mode, x); + /* Do not allow wide int constants - this requires support in movti. */ + if (CONST_WIDE_INT_P (x)) + return false; - if (GET_CODE (x) == HIGH - && aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0)))) - return true; + /* Do not allow const (plus (anchor_symbol, const_int)). */ + if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == PLUS) + { + x = XEXP (XEXP (x, 0), 0); + if (SYMBOL_REF_P (x) && SYMBOL_REF_ANCHOR_P (x)) + return false; + } + + if (GET_CODE (x) == HIGH) + x = XEXP (x, 0); /* Treat symbols as constants. Avoid TLS symbols as they are complex, so spilling them is better than rematerialization. */ if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x)) return true; - return aarch64_constant_address_p (x); + /* Label references are always constant. */ + if (GET_CODE (x) == LABEL_REF) + return true; + + return false; } rtx
Re: [PATCH][AArch64] Fix PR79041
ping As described in PR79041, -mcmodel=large -mpc-relative-literal-loads may be used to avoid generating ADRP/ADD or ADRP/LDR. However both trunk and GCC7 may still emit ADRP for some constant pool literals. Fix this by adding a aarch64_pcrelative_literal_loads check. OK for trunk/GCC7 backport? ChangeLog: 2017-06-27 Wilco Dijkstra PR target/79041 * config/aarch64/aarch64.c (aarch64_classify_symbol): Avoid SYMBOL_SMALL_ABSOLUTE . * testsuite/gcc.target/aarch64/pr79041-2.c: New test. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 060cd8476d2954119daac495ecb059c9be73edbe..329d244e9cf16dbdf849e5dd02b3999caf0cd5a7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -10042,7 +10042,7 @@ aarch64_classify_symbol (rtx x, rtx offset) /* This is alright even in PIC code as the constant pool reference is always PC relative and within the same translation unit. */ - if (CONSTANT_POOL_ADDRESS_P (x)) + if (CONSTANT_POOL_ADDRESS_P (x) && !aarch64_pcrelative_literal_loads) return SYMBOL_SMALL_ABSOLUTE; else return SYMBOL_FORCE_TO_MEM; diff --git a/gcc/testsuite/gcc.target/aarch64/pr79041-2.c b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c new file mode 100644 index ..e7899725bad2b770f8488a07f99792113275bdf2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr79041-2.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcmodel=large -mpc-relative-literal-loads" } */ + +__int128 +t (void) +{ + return (__int128)1 << 80; +} + +/* { dg-final { scan-assembler "adr" } } */
Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_843419
Ping ? Thanks, Kugan On 27 June 2017 at 11:20, Kugan Vivekanandarajah wrote: > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html added this > workaround to get kernel building when TARGET_FIX_ERR_A53_843419 > is enabled. > > This was added to support building kernel loadable modules. In kernel, > when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed > for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and > R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid > loading objects with possibly offending sequence). Thus, it could only > support pc relative literal loads. > > However, the following patch was posted to kernel to add > -mpc-relative-literal-loads > http://www.spinics.net/lists/arm-kernel/msg476149.html > > -mpc-relative-literal-loads is unconditionally added to the kernel > build as can be seen from: > https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile > > Therefore this patch removes the hunk so that applications like > SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without > -mno-pc-relative-literal-loads > > Bootstrapped and regression tested on aarch64-linux-gnu with no new > regressions. > > Is this OK for trunk? > > Thanks, > Kugan > > gcc/testsuite/ChangeLog: > > 2017-06-27 Kugan Vivekanandarajah > > * gcc.target/aarch64/pr63304_1.c: Remove -mno-fix-cortex-a53-843419. > > gcc/ChangeLog: > > 2017-06-27 Kugan Vivekanandarajah > > * config/aarch64/aarch64.c (aarch64_override_options_after_change_1): > Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_843419 > for default.
[committed, nvptx] Add nvptx_override_options_after_change
Hi, this patch adds nvptx_override_options_after_change, containing a workaround for PR81430. Tested on x86_64 with nvptx accelerator. Committed. Thanks, - Tom Add nvptx_override_options_after_change 2017-07-21 Tom de Vries PR lto/81430 * config/nvptx/nvptx.c (nvptx_override_options_after_change): New function. (TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE): Define to nvptx_override_options_after_change. --- gcc/config/nvptx/nvptx.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c index d0aa054..a718054 100644 --- a/gcc/config/nvptx/nvptx.c +++ b/gcc/config/nvptx/nvptx.c @@ -207,6 +207,17 @@ nvptx_option_override (void) target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT; } +/* Implement TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE. */ + +static void +nvptx_override_options_after_change (void) +{ + /* This is a workaround for PR81430 - nvptx acceleration compilation broken + because of running pass_partition_blocks. This should be dealt with in the + common code, not in the target. */ + flag_reorder_blocks_and_partition = 0; +} + /* Return a ptx type for MODE. If PROMOTE, then use .u32 for QImode to deal with ptx ideosyncracies. */ @@ -5505,6 +5516,9 @@ nvptx_data_alignment (const_tree type, unsigned int basic_align) #undef TARGET_OPTION_OVERRIDE #define TARGET_OPTION_OVERRIDE nvptx_option_override +#undef TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE +#define TARGET_OVERRIDE_OPTIONS_AFTER_CHANGE nvptx_override_options_after_change + #undef TARGET_ATTRIBUTE_TABLE #define TARGET_ATTRIBUTE_TABLE nvptx_attribute_table
Re: [PATCH, PR81430] Use finalize_options in lto1
On Thu, 20 Jul 2017, Tom de Vries wrote: > On 07/20/2017 12:10 PM, Richard Biener wrote: > > On Thu, 20 Jul 2017, Tom de Vries wrote: > > > > > Hi, > > > > > > this patch fixes PR81430, an ICE in the libgomp testsuite for both openmp > > > and > > > openacc test-cases for x86_64 with nvptx accelerator. > > > > > > The scenario how we hit the ICE is as follows: > > > - a testcase is compiled with -O2 > > > - ix86_option_optimization_table enables > > >OPT_freorder_blocks_and_partition at -O2 > > > - cc1 writes out the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION > > > - lto1 reads in the flag as part of DECL_FUNCTION_SPECIFIC_OPTIMIZATION > > > - lto1 uses the flag, and runs pass_partition_blocks > > > - pass_partition_blocks ICEs, because it generates code that is not > > >supported by the nvptx target. > > > > > > Note that for standalone compilation for single-thread ptx execution, we > > > don't > > > attempt to run pass_partition_blocks. This is because for nvptx, > > > TARGET_HAVE_NAMED_SECTIONS is set to false, and this bit in finish_options > > > switches off pass_partition_blocks: > > > ... > > > /* If the target requested unwind info, then turn off the > > >partitioning optimization with a different message. Likewise, if > > >the target does not support named sections. */ > > > > > >if (opts->x_flag_reorder_blocks_and_partition > > >&& (!targetm_common.have_named_sections > > >|| (opts->x_flag_unwind_tables > > >&& targetm_common.unwind_tables_default > > >&& (ui_except == UI_SJLJ || ui_except >= UI_TARGET > > > { > > >if (opts_set->x_flag_reorder_blocks_and_partition) > > > inform (loc, > > > "-freorder-blocks-and-partition does not work " > > > "on this architecture"); > > >opts->x_flag_reorder_blocks_and_partition = 0; > > >opts->x_flag_reorder_blocks = 1; > > > } > > > ... > > > > > > The patch fixes this by calling finish_options in lto1 after > > > cl_optimization_restore. > > > > > > Points for review: > > > 1. 
I'm uncertain though about the placement of the call. Perhaps it should > > > be > > > in cl_optimization_restore, before targetm.override_options_after_change? > > > > > > 2. I think that this is offloading specific, so perhaps this should be > > > guarded > > > with lto_stream_offload_p or #ifdef ACCEL_COMPILER or some such. > > > > Hmm, I agree with #2. I think it conceptually is a LTO stream adjustment > > and thus we should do this at the time we stream in the > > optimization/target nodes (like we remap modes for example). Not > > sure if it's possible to do this at that point, but it looks like > > finish_options takes two option structs and thus we should be able to > > call it. > > > > With what parameters? Patch below tries with same option struct, but ... > > > Do you get the inform note? I suppose we don't really want that, no? > > > > ... I think that way we'll get the inform note (while the previous solution > did not). > > I've also tried with a tmp2 memset to 0, but that ran into problems when doing > a maybe_set_param_value. Use global_options_set? That should do what the other patch did. > Thanks, > - Tom > > diff --git a/gcc/tree-streamer-in.c b/gcc/tree-streamer-in.c > index 7f7ea7f..e0e792b 100644 > --- a/gcc/tree-streamer-in.c > +++ b/gcc/tree-streamer-in.c > @@ -33,6 +33,7 @@ along with GCC; see the file COPYING3. 
If not see
> #include "ipa-chkp.h"
> #include "gomp-constants.h"
> #include "asan.h"
> +#include "opts.h"
>
>
> /* Read a STRING_CST from the string table in DATA_IN using input
> @@ -769,6 +770,20 @@ lto_input_ts_function_decl_tree_pointers (struct lto_input_block *ib,
>    DECL_FUNCTION_SPECIFIC_TARGET (expr) = stream_read_tree (ib, data_in);
> #endif
>    DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = stream_read_tree (ib, data_in);
> +#ifdef ACCEL_COMPILER
> +  {
> +    tree opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr);
> +    if (opts)
> +      {
> +	struct gcc_options tmp;
> +	cl_optimization_restore (&tmp, TREE_OPTIMIZATION (opts));
> +	finish_options (&tmp, &tmp, DECL_SOURCE_LOCATION (expr));
> +	opts = build_optimization_node (&tmp);
> +
> +	DECL_FUNCTION_SPECIFIC_OPTIMIZATION (expr) = opts;
> +      }
> +  }
> +#endif
>
>    /* If the file contains a function with an EH personality set,
>       then it was compiled with -fexceptions.  In that case, initialize

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Re: [PATCH] update edge profile info in nvptx.c
On 07/20/2017 03:04 PM, Tom de Vries wrote:
On 07/13/2017 06:53 PM, Cesar Philippidis wrote:
Similarly, for nvptx vector reductions, when it comes time to initialize the reduction variable, the nvptx BE constructs a branch so that only vector lanes 1 to vector_length-1 are initialized to the default value for a given reduction type, where vector lane 0 retains the original value of the reduction variable.  For similar reasons to the gang and worker reductions, I set the probability of the new edge introduced for the vector reduction to even.

Hi,

The problem that you describe in abstract terms looks like this concretely:

(gdb) call debug_bb_n (4)
;; basic block 4, loop depth 0, freq 662, maybe hot
;;  prev block 3, next block 16, flags: (VISITED)
;;  pred:       3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
  # VUSE <.MEM_61>
  # PT = nonlocal unit-escaped null
  _18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D) clique 1 base 1].s2D.1519;
  # VUSE <.MEMD.1540>
  # USE = anything
  _72 = GOACC_DIM_POS (2);
  if (_72 != 0) goto <bb 16>; [100.00%] [count: INV] else goto <bb 17>; [INV] [count: INV]
;;  succ:       16 [always]  (TRUE_VALUE)
;;              17 (FALSE_VALUE)
...

The edge to bb16 has probability 100%.  The edge to bb17 has no probability set.

Hi,

I. The patch below fixes the probabilities on the outgoing edges, setting them to even:
...
(gdb) call debug_bb_n (4)
;; basic block 4, loop depth 0, freq 662, maybe hot
;;  prev block 3, next block 16, flags: (VISITED)
;;  pred:       3 [always (guessed)]  (FALLTHRU,EXECUTABLE)
  # VUSE <.MEM_61>
  # PT = nonlocal unit-escaped null
  _18 = MEM[(const struct .omp_data_t.33D.1518 &).omp_data_i_9(D) clique 1 base 1].s2D.1519;
  # VUSE <.MEMD.1540>
  # USE = anything
  _72 = GOACC_DIM_POS (2);
  if (_72 != 0) goto <bb 16>; [50.00%] [count: INV] else goto <bb 17>; [50.00%] [count: INV]
;;  succ:       16 [50.0% (adjusted)]  (TRUE_VALUE)
;;              17 [50.0% (adjusted)]  (FALSE_VALUE)
...

II. The quality is 'adjusted'.
[ Even() first calls always(), which has quality 'precise', and then applies scale(), which downgrades the quality from 'precise' to 'adjusted'. ]

The reason for that is explained in this comment, AFAIU:
...
Named probabilities except for never/always are assumed to be statically guessed and thus not necessarily accurate.
...

When I look at the definitions of 'adjusted' and 'precise':
...
  /* Profile was originally based on feedback but it was adjusted
     by code duplicating optimization.  It may not precisely reflect the
     particular code path.  */
  profile_adjusted = 2,
  /* Profile was read from profile feedback or determined by accurate static
     method.  */
  profile_precise = 3
...
I wonder: there seem to be two situations in which 'precise' is possible:
- Profile was read from profile feedback
- Profile was determined by accurate static method
But there is only one situation where 'adjusted' is possible:
- Profile was originally based on feedback but it was adjusted by code duplicating optimization.

I can imagine as well that we originally have a static method giving precise information, and that this information is downgraded by a code duplication optimization.  So, should this be instead:
...
  /* Profile was originally based on feedback or accurate static method, but
     it was adjusted by code duplicating optimization.  It may not precisely
     reflect the particular code path.  */
  profile_adjusted = 2,
...
?

III. In this particular case, we insert a conditional jump based on a special machine register, which gives us the knowledge that both the true and false edges are equally likely (in fact, both are executed, with different warp enabling masks).  So I think the most accurate representation would be 50/50 'precise', but that does not fit with the assumption above.

I'll commit the patch below for now.  But I'm curious if you have any comments.
Thanks,
- Tom

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 6314653..2c427fa 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5149,6 +5149,7 @@ nvptx_goacc_reduction_init (gcall *call)

       /* Fixup flags from call_bb to init_bb.  */
       init_edge->flags ^= EDGE_FALLTHRU | EDGE_TRUE_VALUE;
+      init_edge->probability = profile_probability::even ();

       /* Set the initialization stmts.  */
       gimple_seq init_seq = NULL;
@@ -5164,6 +5165,7 @@ nvptx_goacc_reduction_init (gcall *call)

       /* Create false edge from call_bb to dst_bb.  */
       edge nop_edge = make_edge (call_bb, dst_bb, EDGE_FALSE_VALUE);
+      nop_edge->probability = profile_probability::even ();

       /* Create phi node in dst block.  */
       gphi *phi = create_phi_node (lhs, dst_bb);
Re: [PATCH][1/n] Fix PR81303
On Fri, 21 Jul 2017, Bin.Cheng wrote:
> On Fri, Jul 21, 2017 at 8:12 AM, Richard Biener wrote:
> >
> > The following is sth I noticed when looking at a way to fix PR81303.
> > We happily compute a runtime cost model threshold that executes the
> > vectorized variant even though no vector iteration takes place due
> > to the number of prologue/epilogue iterations.  The following fixes
> > that -- note that if we do not know the prologue/epilogue counts
> > statically they are estimated at vf/2, which means there's still the
> > chance the vector iteration won't execute.  To fix that we'd have to
> > estimate those as vf-1 instead, sth we might consider doing anyway
> > given that we regularly completely peel the epilogues vf-1 times
> > in that case.  Maybe as followup.
> Hi,
> Do we consider disabling epilogue peeling if # of iters is unknown at
> compilation time?

Well, we peel, which if size doesn't explode should be profitable because of better branch prediction.

> When loop is versioned, could we share epilogue
> with versioned loop?  Epilogue was shared before?

When not versioning we share the epilogue, yes.  With versioning we don't because we use the versioning code.  It looks like sharing the epilogue makes the RA's job harder though.

Richard.

> Thanks,
> bin
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
> >
> > Richard.
> >
> > 2017-07-21  Richard Biener
> >
> >	PR tree-optimization/81303
> >	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
> >	into account prologue and epilogue iterations when raising
> >	min_profitable_iters to sth at least covering one vector iteration.
> >
> > Index: gcc/tree-vect-loop.c
> > ===================================================================
> > --- gcc/tree-vect-loop.c	(revision 250384)
> > +++ gcc/tree-vect-loop.c	(working copy)
> > @@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
> >		     "  Calculated minimum iters for profitability: %d\n",
> >		     min_profitable_iters);
> >
> > -  min_profitable_iters =
> > -	min_profitable_iters < vf ? vf : min_profitable_iters;
> > +  /* We want the vectorized loop to execute at least once.  */
> > +  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
> > +    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
> >
> >    if (dump_enabled_p ())
> >      dump_printf_loc (MSG_NOTE, vect_location,

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
[PATCH][2/n] Fix PR81303
This fixes a mismatched cost computation in the peeling cost estimation by using the same routine also for costing no peeling (the original implementation failed to handle the same special cases, such as groups, and thus always over-estimated the no-peeling cost...).  It doesn't disable peeling for bwaves for me because we still improve (as expected) cost by 1 (but outside cost increases significantly).  Factoring in outside cost looks difficult, so maybe instead of a hard compare (13 vs. 14 in this case) we should require a percentage improvement.  Anyway, this fixes a bug.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2017-07-21  Richard Biener

	PR tree-optimization/81303
	* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs): Pass
	in datarefs vector.  Allow NULL dr0 for no peeling cost estimate.
	(vect_peeling_hash_get_lowest_cost): Adjust.
	(vect_enhance_data_refs_alignment): Likewise.  Use
	vect_get_peeling_costs_all_drs to compute the penalty for no
	peeling to match up costs.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	(revision 250386)
+++ gcc/tree-vect-data-refs.c	(working copy)
@@ -1159,25 +1159,21 @@ vect_peeling_hash_get_most_frequent (_ve
    misalignment will be zero after peeling.  */

 static void
-vect_get_peeling_costs_all_drs (struct data_reference *dr0,
+vect_get_peeling_costs_all_drs (vec<data_reference_p> datarefs,
+				struct data_reference *dr0,
				unsigned int *inside_cost,
				unsigned int *outside_cost,
				stmt_vector_for_cost *body_cost_vec,
				unsigned int npeel,
				bool unknown_misalignment)
 {
-  gimple *stmt = DR_STMT (dr0);
-  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
-  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  vec<data_reference_p> datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
-
   unsigned i;
   data_reference *dr;

   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
-      stmt = DR_STMT (dr);
-      stmt_info = vinfo_for_stmt (stmt);
+      gimple *stmt = DR_STMT (dr);
+      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
       /* For interleaving, only the alignment of the first access
	 matters.  */
       if (STMT_VINFO_GROUPED_ACCESS (stmt_info)
@@ -1192,7 +1188,9 @@ vect_get_peeling_costs_all_drs (struct d
       int save_misalignment;
       save_misalignment = DR_MISALIGNMENT (dr);
-      if (unknown_misalignment && dr == dr0)
+      if (npeel == 0)
+	;
+      else if (unknown_misalignment && dr == dr0)
	SET_DR_MISALIGNMENT (dr, 0);
       else
	vect_update_misalignment_for_peel (dr, dr0, npeel);
@@ -1222,7 +1220,8 @@ vect_peeling_hash_get_lowest_cost (_vect
   body_cost_vec.create (2);
   epilogue_cost_vec.create (2);

-  vect_get_peeling_costs_all_drs (elem->dr, &inside_cost, &outside_cost,
+  vect_get_peeling_costs_all_drs (LOOP_VINFO_DATAREFS (loop_vinfo),
+				  elem->dr, &inside_cost, &outside_cost,
				  &body_cost_vec, elem->npeel, false);

   body_cost_vec.release ();
@@ -1651,7 +1650,7 @@ vect_enhance_data_refs_alignment (loop_v
       stmt_vector_for_cost dummy;
       dummy.create (2);
-      vect_get_peeling_costs_all_drs (dr0,
+      vect_get_peeling_costs_all_drs (datarefs, dr0,
				      &load_inside_cost,
				      &load_outside_cost,
				      &dummy, vf / 2, true);
@@ -1660,7 +1659,7 @@ vect_enhance_data_refs_alignment (loop_v
       if (first_store)
	{
	  dummy.create (2);
-	  vect_get_peeling_costs_all_drs (first_store,
+	  vect_get_peeling_costs_all_drs (datarefs, first_store,
					  &store_inside_cost,
					  &store_outside_cost,
					  &dummy, vf / 2, true);
@@ -1744,18 +1743,15 @@ vect_enhance_data_refs_alignment (loop_v
	dr0 = unsupportable_dr;
      else if (do_peeling)
	{
-	  /* Calculate the penalty for no peeling, i.e. leaving everything
-	     unaligned.
-	     TODO: Adapt vect_get_peeling_costs_all_drs and use here.
+	  /* Calculate the penalty for no peeling, i.e. leaving everything as-is.
	     TODO: Use nopeel_outside_cost or get rid of it? */
	  unsigned nopeel_inside_cost = 0;
	  unsigned nopeel_outside_cost = 0;

	  stmt_vector_for_cost dummy;
	  dummy.create (2);
-	  FOR_EACH_VEC_ELT (datarefs, i, dr)
-	    vect_get_data_access_cost (dr, &nopeel_inside_cost,
-				       &nopeel_outside_cost, &dummy);
+	  vect_get_peeling_costs_all_drs (datarefs, NULL, &nopeel_inside_cost,
+					  &nopeel_outside_cost, &dummy, 0, false
Re: [PATCH] trivial cleanup in dwarf2out.c
On Fri, Jul 21, 2017 at 10:36:46AM +0200, Ulrich Drepper wrote:
> While looking through dwarf2out.c I came across this if expression where
> supposedly, in case DWARF before 5 is used, the LEB128 encoding is used.
> This of course cannot be the case.  There isn't really a deeper problem,
> since the entire block is guarded by a test for at least DWARF 5.
>
> I propose the following patch.
>
> [gcc/ChangeLog]
>
> 2017-07-21  Ulrich Drepper
>
>	* dwarf2out.c (output_file_names): Avoid double testing for
>	dwarf_version >= 5.

Ok, thanks.

> --- gcc/dwarf2out.c	2017-07-21 06:15:26.993826963 +0200
> +++ gcc/dwarf2out.c-new	2017-07-21 10:29:03.382742797 +0200
> @@ -11697,7 +11697,7 @@ output_file_names (void)
>        output_line_string (str_form, filename0, "File Entry", 0);
>
>        /* Include directory index.  */
> -      if (dwarf_version >= 5 && idx_form != DW_FORM_udata)
> +      if (idx_form != DW_FORM_udata)
>	  dw2_asm_output_data (idx_form == DW_FORM_data1 ? 1 : 2,
>			       0, NULL);
>        else

	Jakub
[PATCH] trivial cleanup in dwarf2out.c
While looking through dwarf2out.c I came across this if expression where supposedly, in case DWARF before 5 is used, the LEB128 encoding is used.  This of course cannot be the case.  There isn't really a deeper problem, since the entire block is guarded by a test for at least DWARF 5.

I propose the following patch.

[gcc/ChangeLog]

2017-07-21  Ulrich Drepper

	* dwarf2out.c (output_file_names): Avoid double testing for
	dwarf_version >= 5.

--- gcc/dwarf2out.c	2017-07-21 06:15:26.993826963 +0200
+++ gcc/dwarf2out.c-new	2017-07-21 10:29:03.382742797 +0200
@@ -11697,7 +11697,7 @@ output_file_names (void)
       output_line_string (str_form, filename0, "File Entry", 0);

       /* Include directory index.  */
-      if (dwarf_version >= 5 && idx_form != DW_FORM_udata)
+      if (idx_form != DW_FORM_udata)
	dw2_asm_output_data (idx_form == DW_FORM_data1 ? 1 : 2,
			     0, NULL);
       else
Re: Enable crossjumping with bb-reorering
Hi Honza,

Just a couple of typo nits.

Cheers,
Kyrill

On 21/07/17 00:38, Jan Hubicka wrote:
Hello,
Caroline disabled crossjumping with bb-reordering in the initial patch implementing this pass.  According to discussion with Rth, it was only because she was unsure how to prevent crossjumping from mixing up crossing and non-crossing jumps.  This is easy to do - we only need to avoid merging code across sections, as done by this patch.

Bootstrapped/regtested x86_64-linux, will commit it tomorrow.

	* cfgcleanup.c (flow_find_cross_jump): Do not crossjump across
	hot/cold regions.
	(try_crossjump_to_edge): Do not punt on partitioned functions.

Index: cfgcleanup.c
===================================================================
--- cfgcleanup.c	(revision 250378)
+++ cfgcleanup.c	(working copy)
@@ -1435,6 +1435,13 @@ flow_find_cross_jump (basic_block bb1, b
       if (i1 == BB_HEAD (bb1) || i2 == BB_HEAD (bb2))
	break;

+      /* Do not turn corssing edge to non-crossing or vice versa after

"crossing"

+	 reload.  */
+      if (BB_PARTITION (BLOCK_FOR_INSN (i1))
+	  != BB_PARTITION (BLOCK_FOR_INSN (i2))
+	  && reload_completed)
+	break;
+
       dir = merge_dir (dir, old_insns_match_p (0, i1, i2));
       if (dir == dir_none || (!dir_p && dir != dir_both))
	break;
@@ -1958,18 +1965,6 @@ try_crossjump_to_edge (int mode, edge e1
   newpos1 = newpos2 = NULL;

-  /* If we have partitioned hot/cold basic blocks, it is a bad idea
-     to try this optimization.
-
-     Basic block partitioning may result in some jumps that appear to
-     be optimizable (or blocks that appear to be mergeable), but which really
-     must be left untouched (they are required to make it safely across
-     partition boundaries).  See the comments at the top of
-     bb-reorder.c:partition_hot_cold_basic_blocks for complete details.  */
-
-  if (crtl->has_bb_partition && reload_completed)
-    return false;
-
   /* Search backward through forwarder blocks.  We don't need to worry
      about multiple entry or chained forwarders, as they will be optimized
      away.
      We do this to look past the unconditional jump following a
@@ -2003,6 +1998,11 @@ try_crossjump_to_edge (int mode, edge e1
   if (EDGE_COUNT (src1->preds) == 0 || EDGE_COUNT (src2->preds) == 0)
     return false;

+  /* Do not turn corssing edge to non-crossing or vice versa after reload. */

Likewise.

+  if (BB_PARTITION (src1) != BB_PARTITION (src2)
+      && reload_completed)
+    return false;
+
   /* Look for the common insn sequence, part the first ... */
   if (!outgoing_edges_match (mode, src1, src2))
     return false;
@@ -2024,12 +2024,10 @@ try_crossjump_to_edge (int mode, edge e1

   if (dir == dir_backward)
     {
-#define SWAP(T, X, Y) do { T tmp = (X); (X) = (Y); (Y) = tmp; } while (0)
-      SWAP (basic_block, osrc1, osrc2);
-      SWAP (basic_block, src1, src2);
-      SWAP (edge, e1, e2);
-      SWAP (rtx_insn *, newpos1, newpos2);
-#undef SWAP
+      std::swap (osrc1, osrc2);
+      std::swap (src1, src2);
+      std::swap (e1, e2);
+      std::swap (newpos1, newpos2);
     }

   /* Don't proceed with the crossjump unless we found a sufficient number
Re: [PATCH][1/n] Fix PR81303
On Fri, Jul 21, 2017 at 8:12 AM, Richard Biener wrote:
>
> The following is sth I noticed when looking at a way to fix PR81303.
> We happily compute a runtime cost model threshold that executes the
> vectorized variant even though no vector iteration takes place due
> to the number of prologue/epilogue iterations.  The following fixes
> that -- note that if we do not know the prologue/epilogue counts
> statically they are estimated at vf/2, which means there's still the
> chance the vector iteration won't execute.  To fix that we'd have to
> estimate those as vf-1 instead, sth we might consider doing anyway
> given that we regularly completely peel the epilogues vf-1 times
> in that case.  Maybe as followup.
Hi,
Do we consider disabling epilogue peeling if # of iters is unknown at compilation time?  When loop is versioned, could we share epilogue with versioned loop?  Epilogue was shared before?

Thanks,
bin
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.
>
> Richard.
>
> 2017-07-21  Richard Biener
>
>	PR tree-optimization/81303
>	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
>	into account prologue and epilogue iterations when raising
>	min_profitable_iters to sth at least covering one vector iteration.
>
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	(revision 250384)
> +++ gcc/tree-vect-loop.c	(working copy)
> @@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
>		     "  Calculated minimum iters for profitability: %d\n",
>		     min_profitable_iters);
>
> -  min_profitable_iters =
> -	min_profitable_iters < vf ? vf : min_profitable_iters;
> +  /* We want the vectorized loop to execute at least once.  */
> +  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
> +    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;
>
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
[PATCH][1/n] Fix PR81303
The following is sth I noticed when looking at a way to fix PR81303.  We happily compute a runtime cost model threshold that executes the vectorized variant even though no vector iteration takes place due to the number of prologue/epilogue iterations.  The following fixes that -- note that if we do not know the prologue/epilogue counts statically they are estimated at vf/2, which means there's still the chance the vector iteration won't execute.  To fix that we'd have to estimate those as vf-1 instead, sth we might consider doing anyway given that we regularly completely peel the epilogues vf-1 times in that case.  Maybe as followup.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-07-21  Richard Biener

	PR tree-optimization/81303
	* tree-vect-loop.c (vect_estimate_min_profitable_iters): Take
	into account prologue and epilogue iterations when raising
	min_profitable_iters to sth at least covering one vector iteration.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 250384)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -3702,8 +3702,9 @@ vect_estimate_min_profitable_iters (loop
		     "  Calculated minimum iters for profitability: %d\n",
		     min_profitable_iters);

-  min_profitable_iters =
-	min_profitable_iters < vf ? vf : min_profitable_iters;
+  /* We want the vectorized loop to execute at least once.  */
+  if (min_profitable_iters < (vf + peel_iters_prologue + peel_iters_epilogue))
+    min_profitable_iters = vf + peel_iters_prologue + peel_iters_epilogue;

   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,