Re: [PATCH, PR 60419] Clear thunk flag of zombie nodes
On Thu, 20 Mar 2014, Martin Jambor wrote:

> Hi,
>
> in PR 60419 we end up with a call graph node for a thunk that has no
> callee because symtab_remove_unreachable_nodes has determined its body
> is not needed although its declaration is still reachable (more
> details in comment 11 in bugzilla) and removal of callees is a part of
> the zombification process that such nodes undergo.  Later on, the last
> stage of inlining that runs after that connects the thunk to the call
> graph and we segfault because we expect thunks to have a callee.
>
> So we can either keep thunk targets alive or clear the thunk flag.
> Thunks and aliases are quite similar, and
> symtab_remove_unreachable_nodes does clear the alias flag; the
> in-border nodes are referred to but not output and are thus just
> another symbol.  Therefore I believe it is correct and much simpler
> to remove the thunk flag as well.
>
> Bootstrapped and tested on x86_64, I have also built Mozilla Firefox
> with the patch (without LTO, partly on purpose, partly because again
> I'm having issues with LTO after updating FF).  OK for trunk?

Ok.

Thanks,
Richard.

> There is the same issue on the 4.8 branch, but the patch does not
> apply, I'm in the process of preparing it.
>
> Thanks,
>
> Martin
>
>
> 2014-03-20  Martin Jambor  mjam...@suse.cz
>
>     PR ipa/60419
>     * ipa.c (symtab_remove_unreachable_nodes): Clear thunk flag of nodes
>     in the border.
>
> testsuite/
>     * g++.dg/ipa/pr60419.C: New test.
>
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 572dba1..164de0d 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -488,6 +488,7 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
>                node->definition = false;
>                node->cpp_implicit_alias = false;
>                node->alias = false;
> +              node->thunk.thunk_p = false;
>                node->weakref = false;
>                if (!node->in_other_partition)
>                  node->local.local = false;
> diff --git a/gcc/testsuite/g++.dg/ipa/pr60419.C b/gcc/testsuite/g++.dg/ipa/pr60419.C
> new file mode 100644
> index 000..84461f3
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ipa/pr60419.C
> @@ -0,0 +1,80 @@
> +// PR middle-end/60419
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +struct C
> +{
> +};
> +
> +struct I : C
> +{
> +  I ();
> +};
> +
> +struct J
> +{
> +  void foo ();
> +  J ();
> +  virtual void foo (int &, int);
> +};
> +
> +template <class>
> +struct D
> +{
> +  virtual void foo (I &) const;
> +  void bar ()
> +  {
> +    I p;
> +    foo (p);
> +  }
> +};
> +
> +struct K : J, public D<int>
> +{
> +};
> +
> +struct F
> +{
> +  K *operator->();
> +};
> +
> +struct N : public K
> +{
> +  void foo (int &, int);
> +  I n;
> +  void foo (I &) const {}
> +};
> +
> +struct L : J
> +{
> +  F l;
> +};
> +
> +struct M : F
> +{
> +  L *operator->();
> +};
> +
> +struct G
> +{
> +  G ();
> +};
> +
> +M h;
> +
> +G::G ()
> +try
> +{
> +  N f;
> +  f.bar ();
> +  throw;
> +}
> +catch (int)
> +{
> +}
> +
> +void
> +baz ()
> +{
> +  h->l->bar ();
> +}

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
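
For readers who do not have the cgraph data structures in their head, the failure mode and the fix discussed in this thread can be modelled roughly as follows. This is only a sketch under stated assumptions: `toy_node`, `toy_remove_body` and `toy_function_or_thunk_target` are hypothetical stand-ins invented for illustration, not GCC's actual types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a call graph node.
   The real cgraph_node carries far more state.  */
struct toy_node
{
  bool definition;
  bool alias;
  bool thunk_p;
  struct toy_node *callee;   /* a thunk is expected to have a callee */
};

/* Toy model of "zombification": the body goes away, the declaration
   stays.  Callees are dropped together with the body.  */
static void
toy_remove_body (struct toy_node *node)
{
  node->definition = false;
  node->alias = false;
  node->thunk_p = false;     /* models the one-line fix in the patch */
  node->callee = NULL;
}

/* Toy model of a later consumer that walks through thunks.  It
   dereferences the callee whenever the thunk flag is set, so a node
   with thunk_p set and no callee would crash here.  */
static struct toy_node *
toy_function_or_thunk_target (struct toy_node *node)
{
  while (node->thunk_p)
    node = node->callee;
  return node;
}
```

With the flag cleared, a zombie node simply resolves to itself; without that line the loop would follow a NULL callee.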
Re: [PATCH, PR 59176] Mark zombie call graph nodes to remove verifier false positive
On Thu, 20 Mar 2014, Martin Jambor wrote:

> Hi,
>
> On Thu, Mar 20, 2014 at 07:40:56PM +0100, Jakub Jelinek wrote:
> > On Thu, Mar 20, 2014 at 05:07:32PM +0100, Martin Jambor wrote:
> > > in the PR, verifier claims an edge is pointing to a wrong declaration
> > > even though it has successfully verified the edge multiple times
> > > before.  The reason is that symtab_remove_unreachable_nodes decides to
> > > remove the body of a node and also clear any information that it is
> > > an alias of another in the process (more detailed analysis in comment
> > > #9 of the bug).
> > >
> > > In bugzilla Honza wrote that silencing the verifier is the way to go.
> > > Either we can dedicate a new flag in each cgraph_node or symtab_node
> > > just for the purpose of verification or do something more hackish
> > > like the patch below which re-uses the former_clone_of field for this
> > > purpose.  Since clones are always private nodes, they should always
> > > either survive removal of unreachable nodes or be completely killed
> > > by it and should never enter the in_border zombie state.  Therefore
> > > their former_clone_of must always be NULL.  So I added a new special
> > > value, error_mark_node, to mark this zombie state and taught the
> > > verifier to be happy with such nodes.
> > >
> > > Bootstrapped and tested on x86_64-linux.  What do you think?
> >
> > Don't we have like 22 spare bits in cgraph_node and 20 spare bits in
> > symtab_node?  I'd find it clearer if you just used a new flag to mark
> > the zombie nodes.  Though, I'll let Richard or Honza to decide, don't
> > feel strongly about it.
>
> I guess you are right, here is the proper version which is currently
> undergoing bootstrap and testing.

I agree with Jakub, the following variant is ok.

Thanks,
Richard.

> Thanks,
>
> Martin
>
>
> 2014-03-20  Martin Jambor  mjam...@suse.cz
>
>     PR ipa/59176
>     * cgraph.h (symtab_node): New flag body_removed.
>     * ipa.c (symtab_remove_unreachable_nodes): Set body_removed flag
>     when removing bodies.
>     * symtab.c (dump_symtab_base): Dump body_removed flag.
>     * cgraph.c (verify_edge_corresponds_to_fndecl): Skip nodes which
>     had their bodies removed.
>
> testsuite/
>     * g++.dg/torture/pr59176.C: New test.
>
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index a15b6bc..fb6880c 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2601,8 +2601,13 @@ verify_edge_corresponds_to_fndecl (struct cgraph_edge *e, tree decl)
>    node = cgraph_get_node (decl);
>
>    /* We do not know if a node from a different partition is an alias or what it
> -     aliases and therefore cannot do the former_clone_of check reliably.  */
> -  if (!node || node->in_other_partition || e->callee->in_other_partition)
> +     aliases and therefore cannot do the former_clone_of check reliably.  When
> +     body_removed is set, we have lost all information about what was alias or
> +     thunk of and also cannot proceed.  */
> +  if (!node
> +      || node->body_removed
> +      || node->in_other_partition
> +      || e->callee->in_other_partition)
>      return false;
>    node = cgraph_function_or_thunk_node (node, NULL);
>
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 32b1ee1..59d9ce6 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -91,7 +91,9 @@ public:
>    unsigned forced_by_abi : 1;
>    /* True when the name is known to be unique and thus it does not need mangling.  */
>    unsigned unique_name : 1;
> -
> +  /* True when body and other characteristics have been removed by
> +     symtab_remove_unreachable_nodes. */
> +  unsigned body_removed : 1;
>
>    /*** WHOPR Partitioning flags.
>        These flags are used at ltrans stage when only part of the callgraph is
> diff --git a/gcc/ipa.c b/gcc/ipa.c
> index 572dba1..4a8c6b7 100644
> --- a/gcc/ipa.c
> +++ b/gcc/ipa.c
> @@ -484,6 +484,7 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
>              {
>                if (file)
>                  fprintf (file, " %s", node->name ());
> +              node->body_removed = true;
>                node->analyzed = false;
>                node->definition = false;
>                node->cpp_implicit_alias = false;
> @@ -542,6 +543,7 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
>              fprintf (file, " %s", vnode->name ());
>            changed = true;
>          }
> +      vnode->body_removed = true;
>        vnode->definition = false;
>        vnode->analyzed = false;
>        vnode->aux = NULL;
> diff --git a/gcc/symtab.c b/gcc/symtab.c
> index 5d69803..0ce8e8e 100644
> --- a/gcc/symtab.c
> +++ b/gcc/symtab.c
> @@ -601,6 +601,8 @@ dump_symtab_base (FILE *f, symtab_node *node)
>               ? IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->alias_target))
>               : IDENTIFIER_POINTER (node->alias_target));
>
> +  if (node->body_removed)
> +    fprintf (f, "\n  Body removed by symtab_remove_unreachable_nodes");
>    fprintf (f, "\n  Visibility:");
>    if (node->in_other_partition)
>      fprintf (f, " in_other_partition");
> diff --git
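
The verifier false positive and the effect of the new flag can be illustrated with a toy model. The sketch below is not the GCC code: `toy_symbol` and `toy_edge_mismatch` are hypothetical names, and alias resolution is reduced to following a single pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified symbol table node.  */
struct toy_symbol
{
  bool body_removed;
  bool in_other_partition;
  struct toy_symbol *alias_target;   /* NULL when not an alias */
};

/* Returns true when the edge's callee does not correspond to the node
   of the declaration named in the call, i.e. when the verifier would
   complain.  Nodes whose body was removed are skipped because the
   alias information needed for the check is gone.  */
static bool
toy_edge_mismatch (const struct toy_symbol *callee,
                   const struct toy_symbol *decl_node)
{
  if (!decl_node
      || decl_node->body_removed
      || decl_node->in_other_partition
      || callee->in_other_partition)
    return false;

  /* Resolve aliases before comparing.  */
  while (decl_node->alias_target)
    decl_node = decl_node->alias_target;

  return decl_node != callee;
}
```

In this model an alias whose `alias_target` was cleared without setting `body_removed` is reported as a mismatch even though nothing is wrong, which is the false positive described in the PR.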
Re: [patch] Fix PR59295 -- remove useless warning
2014-03-21 1:33 GMT+01:00 Paul Pluzhnikov ppluzhni...@google.com:

> Greetings,
>
> Attached patch deletes code to warn about repeated friend declaration.

Why not make this warning suppressible instead of removing it?
Shouldn't it fall under -W(no)-redundant-decls?

--
Fabien
[wwwdocs] gcc-4.9/changes.html: Mention that LTO now generates slim objects
This patch mentions that -flto now generates slim objects. That's
especially relevant for static libraries as one can there run into
surprises, if one does not know about gcc-ar.

OK - or do you have a better suggestion?

Tobias

--- changes.html    8 Mar 2014 20:45:54 -    1.63
+++ changes.html    21 Mar 2014 09:10:32 -
@@ -65,6 +65,13 @@
       <li>Function bodies are now loaded on-demand and released early improving
       overall memory usage at link time.</li>
       <li>C++ hidden keyed methods can now be optimized out.</li>
+      <li>By default, compiling with the <code>-flto</code> option now generates
+      slim objects files (<code>.o</code>) which only contain intermediate
+      language representation for LTO. Use <code>-ffat-lto-objects</code> to
+      create files which contain additionally the object code.  To generate
+      static libraries suitable for LTO processing, use <code>gcc-ar</code>
+      and <code>gcc-ranlib</code> (requires that <code>ar</code> and
+      <code>ranlib</code> have been compiled with plugin support).</li>
     </ul>
     Memory usage building Firefox with debug enabled was reduced from 15GB to
     3.5GB; link time from 1700 seconds to 350 seconds.
Re: [wwwdocs] gcc-4.9/changes.html: Mention that LTO now generates slim objects
On Fri, 21 Mar 2014, Tobias Burnus wrote:

> This patch mentions that -flto now generates slim objects. That's
> especially relevant for static libraries as one can there run into
> surprises, if one does not know about gcc-ar.
>
> OK - or do you have a better suggestion?

Please adjust slightly to

  When using a linker-plugin, compiling with the <code>-flto</code>
  option now generates slim object files by default ...

as the change doesn't affect _all_ configurations but only those
with HAVE_LTO_PLUGIN set (thus -fno-use-linker-plugin overrides
the default as well).

I'd also mention gcc-nm which needs to be used to inspect archives
with thin LTO files.

Richard.

> Tobias
>
> --- changes.html    8 Mar 2014 20:45:54 -    1.63
> +++ changes.html    21 Mar 2014 09:10:32 -
> @@ -65,6 +65,13 @@
>        <li>Function bodies are now loaded on-demand and released early improving
>        overall memory usage at link time.</li>
>        <li>C++ hidden keyed methods can now be optimized out.</li>
> +      <li>By default, compiling with the <code>-flto</code> option now generates
> +      slim objects files (<code>.o</code>) which only contain intermediate
> +      language representation for LTO. Use <code>-ffat-lto-objects</code> to
> +      create files which contain additionally the object code.  To generate
> +      static libraries suitable for LTO processing, use <code>gcc-ar</code>
> +      and <code>gcc-ranlib</code> (requires that <code>ar</code> and
> +      <code>ranlib</code> have been compiled with plugin support).</li>
>      </ul>
>      Memory usage building Firefox with debug enabled was reduced from 15GB to
>      3.5GB; link time from 1700 seconds to 350 seconds.
[PATCH] Fix PR60577
This fixes a missed optimization regarding coverage counters.
In the past we failed to apply proper may-alias analysis and
disregarded those counters as being possibly aliased by pointers
by accident.  The following patch adds a flag so we can mark
VAR_DECLs as not to be considered aliased by pointers, so we can
make that an informed decision in this case (even though strictly
wrong - the address of those counters escapes into the gcov info
struct we build and pass to __gcov_init).

Profile-bootstrapped and tested on x86_64-unknown-linux-gnu, ok
for trunk?  Do we eventually want something like this for the
branch(es)?

Thanks,
Richard.

2014-03-21  Richard Biener  rguent...@suse.de

    PR tree-optimization/60577
    * tree-core.h (struct tree_base): Document nothrow_flag use
    in VAR_DECL_NONALIASED.
    * tree.h (DECL_NONALIASED): New.
    (may_be_aliased): Adjust.
    * coverage.c (build_var): Set DECL_NONALIASED.

    * gcc.dg/tree-ssa/ssa-lim-11.c: New testcase.

Index: gcc/tree-core.h
===================================================================
*** gcc/tree-core.h    (revision 208693)
--- gcc/tree-core.h    (working copy)
*************** struct GTY(()) tree_base {
*** 987,992 ****
--- 987,995 ----
             SSA_NAME_IN_FREELIST in
                SSA_NAME

+            VAR_DECL_NONALIASED in
+               VAR_DECL
+
     deprecated_flag:

         TREE_DEPRECATED in
Index: gcc/tree.h
===================================================================
*** gcc/tree.h    (revision 208693)
--- gcc/tree.h    (working copy)
*************** extern void decl_fini_priority_insert (t
*** 2441,2446 ****
--- 2441,2450 ----
  #define DECL_NONLOCAL_FRAME(NODE)  \
    (VAR_DECL_CHECK (NODE)->base.default_def_flag)

+ /* In a VAR_DECL, nonzero if this variable is not aliased by any pointer.  */
+ #define DECL_NONALIASED(NODE) \
+   (VAR_DECL_CHECK (NODE)->base.nothrow_flag)
+
  /* This field is used to reference anything in decl.result and is meant only
     for use by the garbage collector.  */
  #define DECL_RESULT_FLD(NODE) \
*************** static inline bool
*** 4462,4473 ****
  may_be_aliased (const_tree var)
  {
    return (TREE_CODE (var) != CONST_DECL
-           && !((TREE_STATIC (var) || TREE_PUBLIC (var) || DECL_EXTERNAL (var))
-                && TREE_READONLY (var)
-                && !TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (var)))
            && (TREE_PUBLIC (var)
                || DECL_EXTERNAL (var)
!               || TREE_ADDRESSABLE (var)));
  }

  /* Return pointer to optimization flags of FNDECL.  */
--- 4466,4479 ----
  may_be_aliased (const_tree var)
  {
    return (TREE_CODE (var) != CONST_DECL
            && (TREE_PUBLIC (var)
                || DECL_EXTERNAL (var)
!               || TREE_ADDRESSABLE (var))
!           && !((TREE_STATIC (var) || TREE_PUBLIC (var) || DECL_EXTERNAL (var))
!                && ((TREE_READONLY (var)
!                     && !TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (var)))
!                    || (TREE_CODE (var) == VAR_DECL
!                        && DECL_NONALIASED (var)))));
  }

  /* Return pointer to optimization flags of FNDECL.  */
Index: gcc/coverage.c
===================================================================
*** gcc/coverage.c    (revision 208693)
--- gcc/coverage.c    (working copy)
*************** build_var (tree fn_decl, tree type, int
*** 721,726 ****
--- 721,727 ----
    DECL_NAME (var) = get_identifier (buf);
    TREE_STATIC (var) = 1;
    TREE_ADDRESSABLE (var) = 1;
+   DECL_NONALIASED (var) = 1;
    DECL_ALIGN (var) = TYPE_ALIGN (type);

    return var;
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c
===================================================================
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c    (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c    (working copy)
***************
*** 0 ****
--- 1,25 ----
+ /* { dg-do compile } */
+ /* { dg-options "-O -fprofile-arcs -fdump-tree-lim1-details" } */
+
+ struct thread_param
+ {
+   long* buf;
+   long iterations;
+   long accesses;
+ } param;
+
+ void access_buf(struct thread_param* p)
+ {
+   long i,j;
+   long iterations = p->iterations;
+   long accesses = p->accesses;
+   for (i=0; i<iterations; i++)
+     {
+       long* pbuf = p->buf;
+       for (j=0; j<accesses; j++)
+         pbuf[j] += 1;
+     }
+ }
+
+ /* { dg-final { scan-tree-dump-times "Executing store motion of __gcov0.access_buf\\\[\[01\]\\\] from loop 1" 2 "lim1" } } */
+ /* { dg-final { cleanup-tree-dump "lim1" } } */
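
The adjusted predicate is easier to follow when the tree accessors are replaced by plain booleans. The following is only a sketch of the boolean structure of the patched may_be_aliased: `toy_var` and its field names are hypothetical and merely stand in for the corresponding tree flags.

```c
#include <stdbool.h>

/* Hypothetical flat view of the flags the predicate looks at.  */
struct toy_var
{
  bool is_const_decl;        /* TREE_CODE (var) == CONST_DECL */
  bool is_var_decl;          /* TREE_CODE (var) == VAR_DECL */
  bool is_static;            /* TREE_STATIC */
  bool is_public;            /* TREE_PUBLIC */
  bool is_external;          /* DECL_EXTERNAL */
  bool addressable;          /* TREE_ADDRESSABLE */
  bool readonly;             /* TREE_READONLY */
  bool needs_constructing;   /* TYPE_NEEDS_CONSTRUCTING of the type */
  bool nonaliased;           /* the new DECL_NONALIASED bit */
};

/* Same boolean shape as the patched predicate: a variable may be
   aliased if it is visible or addressable, unless it is a global that
   is either a plain read-only object or explicitly marked as not
   aliased by any pointer.  */
static bool
toy_may_be_aliased (const struct toy_var *v)
{
  return (!v->is_const_decl
          && (v->is_public || v->is_external || v->addressable)
          && !((v->is_static || v->is_public || v->is_external)
               && ((v->readonly && !v->needs_constructing)
                   || (v->is_var_decl && v->nonaliased))));
}
```

A coverage counter as created by build_var (static, addressable, a VAR_DECL) is reported as possibly aliased until the new bit is set, which is what allows store motion in the new testcase.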
Re: [PATCH] Fix PR60577
On Fri, Mar 21, 2014 at 11:28:14AM +0100, Richard Biener wrote:
> This fixes a missed optimization regarding coverage counters.
> In the past we failed to apply proper may-alias analysis and
> disregarded those counters as being possibly aliased by pointers
> by accident.  The following patch adds a flag so we can mark
> VAR_DECLs as not to be considered aliased by pointers, so we can
> make that an informed decision in this case (even though strictly
> wrong - the address of those counters escapes into the gcov info
> struct we build and pass to __gcov_init).
>
> Profile-bootstrapped and tested on x86_64-unknown-linux-gnu, ok
> for trunk?

Ok.

> Do we eventually want something like this for the branch(es)?

I'd say no, or at least not immediately.

    Jakub
Re: Fix PR59586
Hi Mircea,

sorry for making you wait.

--
Roman Gareev

[Attachments: patch (binary data), ChangeLog_entry (binary data)]
Re: [PATCH] dwarf2out: Represent bound_info with normal constant values if possible.
> So the patch below makes it so that if HOST_WIDE_INT is wide enough
> then, depending on whether the range type is signed or not,
> add_AT_unsigned or add_AT_int is used. This is more efficient for
> small ranges. And makes it so that the value can be deduced from the
> DW_FORM by the consumer (which can assume again that DW_FORM_data[1248]
> are simply zero-extended and that negative constant values are
> represented by DW_FORM_sdata).

FWIW this looks like an improvement to me.

> I tested this on x86_64 with
> --enable-languages=c,ada,c++,fortran,java,objc without regressions.
> I also made sure that the example ada program range is recognized
> correctly by gdb with this patch.
>
> A couple of questions:
>
> - Are there more ada DWARF tests? Something like guality used for
>   c/fortran?

In the compiler proper no, but there is (of course) the GDB testsuite.

> - What values of HOST_BITS_PER_WIDE_INT are actually supported in GCC?
>   The dwarf2out.c code tries to handle 8, 16, 32 and 64 bits for
>   dw_val_class_const_double.

32 and 64

> - Which setups use 32bit (or lower?) HOST_BITS_PER_WIDE_INT?
>   i686 seems to require 64BIT HOST_WIDE_INTs too these days.

Right, pure 32-bit hosted compilers are an endangered species and GNAT is
probably not fully functional for these architectures.  How did you run into
the problem?  Can't you conduct some minimal testing on 64-bit platforms by
using 128-bit integers (not in Ada unfortunately)?

--
Eric Botcazou
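
To make the consumer-side argument concrete, here is a rough sketch of the selection the quoted description implies. It is not dwarf2out.c code: `toy_form` and `toy_bound_form` are hypothetical, and the mapping (signed range types get an sdata-style form, unsigned ones the smallest fixed-size data form) is an assumption read off the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for DW_FORM_data1/2/4/8 and DW_FORM_sdata.  */
enum toy_form { TOY_DATA1, TOY_DATA2, TOY_DATA4, TOY_DATA8, TOY_SDATA };

/* Pick a form for a bound that fits in a 64-bit host integer.
   Signed range types use the signed form so that negative values are
   unambiguous; unsigned ones use the smallest fixed-size form, which a
   consumer can simply zero-extend.  */
static enum toy_form
toy_bound_form (int64_t value, bool type_is_signed)
{
  if (type_is_signed)
    return TOY_SDATA;

  uint64_t u = (uint64_t) value;
  if (u <= UINT8_MAX)
    return TOY_DATA1;
  if (u <= UINT16_MAX)
    return TOY_DATA2;
  if (u <= UINT32_MAX)
    return TOY_DATA4;
  return TOY_DATA8;
}
```

Under these assumptions an upper bound of 200 in an unsigned range takes a single byte, while a bound of -1 in a signed range is never confused with 255.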
Re: [wwwdocs] gcc-4.9/changes.html: Mention that LTO now generates slim objects
On Fri, Mar 21, 2014 at 10:40:26AM +0100, Richard Biener wrote:
> On Fri, 21 Mar 2014, Tobias Burnus wrote:
> > This patch mentions that -flto now generates slim objects. That's
> > especially relevant for static libraries as one can there run into
> > surprises, if one does not know about gcc-ar.
>
> Please adjust slightly to
>
>   When using a linker-plugin, compiling with the <code>-flto</code>
>   option now generates slim object files by default ...
>
> as the change doesn't affect _all_ configurations but only those
> with HAVE_LTO_PLUGIN set (thus -fno-use-linker-plugin overrides
> the default as well).
>
> I'd also mention gcc-nm which needs to be used to inspect archives
> with thin LTO files.

Thanks for the suggestions - updated patch below.

Tobias

--- changes.html    8 Mar 2014 20:45:54 -    1.63
+++ changes.html    21 Mar 2014 12:43:44 -
@@ -65,6 +65,16 @@
       <li>Function bodies are now loaded on-demand and released early improving
       overall memory usage at link time.</li>
       <li>C++ hidden keyed methods can now be optimized out.</li>
+      <li>When using a linker plugin, compiling with the <code>-flto</code>
+      option now generates slim objects files (<code>.o</code>) which only
+      contain intermediate language representation for LTO. Use
+      <code>-ffat-lto-objects</code> to create files which contain
+      additionally the object code.  To generate static libraries suitable
+      for LTO processing, use <code>gcc-ar</code> and
+      <code>gcc-ranlib</code>; to list symbols from a slim object file use
+      <code>gcc-nm</code>. (Requires that <code>ar</code>,
+      <code>ranlib</code> and <code>nm</code> have been compiled with
+      plugin support.)</li>
     </ul>
     Memory usage building Firefox with debug enabled was reduced from 15GB to
     3.5GB; link time from 1700 seconds to 350 seconds.
Re: [wwwdocs] gcc-4.9/changes.html: Mention that LTO now generates slim objects
On Fri, 21 Mar 2014, Tobias Burnus wrote:

> On Fri, Mar 21, 2014 at 10:40:26AM +0100, Richard Biener wrote:
> > On Fri, 21 Mar 2014, Tobias Burnus wrote:
> > > This patch mentions that -flto now generates slim objects. That's
> > > especially relevant for static libraries as one can there run into
> > > surprises, if one does not know about gcc-ar.
> >
> > Please adjust slightly to
> >
> >   When using a linker-plugin, compiling with the <code>-flto</code>
> >   option now generates slim object files by default ...
> >
> > as the change doesn't affect _all_ configurations but only those
> > with HAVE_LTO_PLUGIN set (thus -fno-use-linker-plugin overrides
> > the default as well).
> >
> > I'd also mention gcc-nm which needs to be used to inspect archives
> > with thin LTO files.
>
> Thanks for the suggestions - updated patch below.

Ok.

Thanks,
Richard.

> Tobias
>
> --- changes.html    8 Mar 2014 20:45:54 -    1.63
> +++ changes.html    21 Mar 2014 12:43:44 -
> @@ -65,6 +65,16 @@
>        <li>Function bodies are now loaded on-demand and released early improving
>        overall memory usage at link time.</li>
>        <li>C++ hidden keyed methods can now be optimized out.</li>
> +      <li>When using a linker plugin, compiling with the <code>-flto</code>
> +      option now generates slim objects files (<code>.o</code>) which only
> +      contain intermediate language representation for LTO. Use
> +      <code>-ffat-lto-objects</code> to create files which contain
> +      additionally the object code.  To generate static libraries suitable
> +      for LTO processing, use <code>gcc-ar</code> and
> +      <code>gcc-ranlib</code>; to list symbols from a slim object file use
> +      <code>gcc-nm</code>. (Requires that <code>ar</code>,
> +      <code>ranlib</code> and <code>nm</code> have been compiled with
> +      plugin support.)</li>
>      </ul>
>      Memory usage building Firefox with debug enabled was reduced from 15GB to
>      3.5GB; link time from 1700 seconds to 350 seconds.

--
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
Re: [PATCH, PR 60419] Clear thunk flag of zombie nodes
Hi,

On Fri, Mar 21, 2014 at 09:41:24AM +0100, Richard Biener wrote:
> On Thu, 20 Mar 2014, Martin Jambor wrote:
>
> > Hi,
> >
> > in PR 60419 we end up with a call graph node for a thunk that has no
> > callee because symtab_remove_unreachable_nodes has determined its body
> > is not needed although its declaration is still reachable (more
> > details in comment 11 in bugzilla) and removal of callees is a part of
> > the zombification process that such nodes undergo.  Later on, the last
> > stage of inlining that runs after that connects the thunk to the call
> > graph and we segfault because we expect thunks to have a callee.
> >
> > So we can either keep thunk targets alive or clear the thunk flag.
> > Thunks and aliases are quite similar, and
> > symtab_remove_unreachable_nodes does clear the alias flag; the
> > in-border nodes are referred to but not output and are thus just
> > another symbol.  Therefore I believe it is correct and much simpler
> > to remove the thunk flag as well.
> >
> > Bootstrapped and tested on x86_64, I have also built Mozilla Firefox
> > with the patch (without LTO, partly on purpose, partly because again
> > I'm having issues with LTO after updating FF).  OK for trunk?
>
> Ok.

Thanks, I have just committed the trunk patch.  A proposed 4.8 variant
is below.  It does the same thing at the same spot, although there are
a number of minor differences between the branches.  One of them is
that symtab_remove_unreachable_nodes does not clear any flags there in
4.8, and therefore I have also added clearing of the alias flag
because (although I do not have a testcase) just like there should not
be any thunks without callees, there also should not be any aliases
without references; cgraph_function_node would choke on them too.

Bootstrapped and tested on the 4.8 branch on x86_64-linux.  OK for the
branch?

Thanks,

Martin


2014-03-20  Martin Jambor  mjam...@suse.cz

    PR ipa/60419
    * ipa.c (symtab_remove_unreachable_nodes): Clear thunk and
    alias flags of nodes in the border.

testsuite/
    * g++.dg/ipa/pr60419.C: New test.

diff --git a/gcc/ipa.c b/gcc/ipa.c
index a9b8fb4..d73d105 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -359,6 +359,8 @@ symtab_remove_unreachable_nodes (bool before_inlining_p, FILE *file)
             {
               if (file)
                 fprintf (file, " %s", cgraph_node_name (node));
+              node->alias = false;
+              node->thunk.thunk_p = false;
               cgraph_node_remove_callees (node);
               ipa_remove_all_references (&node->symbol.ref_list);
               changed = true;
diff --git a/gcc/testsuite/g++.dg/ipa/pr60419.C b/gcc/testsuite/g++.dg/ipa/pr60419.C
new file mode 100644
index 000..84461f3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr60419.C
@@ -0,0 +1,80 @@
+// PR middle-end/60419
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct C
+{
+};
+
+struct I : C
+{
+  I ();
+};
+
+struct J
+{
+  void foo ();
+  J ();
+  virtual void foo (int &, int);
+};
+
+template <class>
+struct D
+{
+  virtual void foo (I &) const;
+  void bar ()
+  {
+    I p;
+    foo (p);
+  }
+};
+
+struct K : J, public D<int>
+{
+};
+
+struct F
+{
+  K *operator->();
+};
+
+struct N : public K
+{
+  void foo (int &, int);
+  I n;
+  void foo (I &) const {}
+};
+
+struct L : J
+{
+  F l;
+};
+
+struct M : F
+{
+  L *operator->();
+};
+
+struct G
+{
+  G ();
+};
+
+M h;
+
+G::G ()
+try
+{
+  N f;
+  f.bar ();
+  throw;
+}
+catch (int)
+{
+}
+
+void
+baz ()
+{
+  h->l->bar ();
+}
Re: [PATCH, rs6000] More efficient vector permute for little endian
On Thu, Mar 20, 2014 at 9:38 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote:
> Hi,
>
> The original workaround for vector permute on a little endian platform
> includes subtracting each element of the permute control vector from 31.
> Because the upper 3 bits of each element are unimportant, this was
> implemented as subtracting the whole vector from a splat of -1.  On
> reflection this can be done more efficiently with a vector nor
> operation.  This patch makes that change.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions.  Is this ok for trunk?
>
> Thanks,
> Bill
>
>
> 2014-03-20  Bill Schmidt  wschm...@linux.vnet.ibm.com
>
>     * config/rs6000/rs6000.c (rs6000_expand_vector_set): Generate a
>     pattern for vector nor instead of subtract from splat(-1).
>     (altivec_expand_vec_perm_const_le): Likewise.

Okay.

Thanks, David
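
The equivalence the patch relies on can be checked per byte in plain C. This is a scalar sketch rather than AltiVec code, and the helper names are made up for illustration; it only assumes, as stated above, that the permute looks at the low 5 bits of each control byte.

```c
#include <stdint.h>

/* Element index selected by one permute control byte: only the low
   5 bits matter, the upper 3 are ignored.  */
static unsigned
toy_perm_index (uint8_t control)
{
  return control & 31u;
}

/* The adjustment as originally described: subtract from 31.  */
static uint8_t
toy_sub_from_31 (uint8_t c)
{
  return (uint8_t) (31 - c);
}

/* The earlier implementation: subtract from a splat of -1 (0xFF).  */
static uint8_t
toy_sub_from_splat_m1 (uint8_t c)
{
  return (uint8_t) (0xFF - c);
}

/* The replacement: nor of the value with itself, i.e. bitwise not.  */
static uint8_t
toy_nor (uint8_t c)
{
  return (uint8_t) ~(c | c);
}
```

For every byte value, `0xFF - c` and `~c` are identical, and both agree with `31 - c` once the result is reduced to its low 5 bits, so the nor form selects exactly the same elements.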
Re: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
On 19/03/14 16:53, Ian Bolton wrote:
> This is a follow-on patch to one already committed:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html
>
> It implements patterns to simplify our RTL as follows:
>
> OR (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>   -- the top half can be done with a MVN
>
> AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>   -- the top half becomes zero.
>
> I've added test cases for both of these and also the existing
> anddi_notdi patterns.  The tests all pass.
>
> Full regression runs passed.
>
> OK for stage 1?
>
> Cheers,
> Ian
>
>
> 2014-03-19  Ian Bolton  ian.bol...@arm.com
>
> gcc/
>     * config/arm/arm.md (*anddi_notdi_zesidi): New pattern.
>     * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
>
> testsuite/
>     * gcc.target/arm/anddi_notdi-1.c: New test.
>     * gcc.target/arm/iordi_notdi-1.c: New test case.
>
>
> arm-and-ior-notdi-zeroextend-patch-v1.txt
>
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 2ddda02..d2d85ee 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -2962,6 +2962,28 @@
>     (set_attr "type" "multiple")]
>  )
>
> +(define_insn_and_split "*anddi_notdi_zesidi"
> +  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
> +        (and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
> +                (zero_extend:DI
> +                 (match_operand:SI 1 "s_register_operand" "r,r"))))]

The early clobber and register tying here is unnecessary.  All of the
input operands are consumed in the first instruction, so you can
eliminate the ties and the restriction on the overlap.  Something like
(untested):

+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+        (and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+                (zero_extend:DI
+                 (match_operand:SI 1 "s_register_operand" "r"))))]

Ok for stage-1 with that change (though I'd recommend another test run
to validate the above).

R.

> +  "TARGET_32BIT"
> +  "#"
> +  "TARGET_32BIT && reload_completed"
> +  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
> +   (set (match_dup 3) (const_int 0))]
> +  "
> +  {
> +    operands[3] = gen_highpart (SImode, operands[0]);
> +    operands[0] = gen_lowpart (SImode, operands[0]);
> +    operands[2] = gen_lowpart (SImode, operands[2]);
> +  }"
> +  [(set_attr "length" "8")
> +   (set_attr "predicable" "yes")
> +   (set_attr "predicable_short_it" "no")
> +   (set_attr "type" "multiple")]
> +)
> +
>  (define_insn_and_split "*anddi_notsesidi_di"
>    [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
>          (and:DI (not:DI (sign_extend:DI
> diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
> index 467c619..10bc8b1 100644
> --- a/gcc/config/arm/thumb2.md
> +++ b/gcc/config/arm/thumb2.md
> @@ -1418,6 +1418,30 @@
>     (set_attr "type" "multiple")]
>  )
>
> +(define_insn_and_split "*iordi_notdi_zesidi"
> +  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
> +        (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
> +                (zero_extend:DI
> +                 (match_operand:SI 1 "s_register_operand" "r,r"))))]
> +  "TARGET_THUMB2"
> +  "#"
> +  "TARGET_THUMB2 && reload_completed"
> +  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
> +   (set (match_dup 3) (not:SI (match_dup 4)))]
> +  "
> +  {
> +    operands[3] = gen_highpart (SImode, operands[0]);
> +    operands[0] = gen_lowpart (SImode, operands[0]);
> +    operands[1] = gen_lowpart (SImode, operands[1]);
> +    operands[4] = gen_highpart (SImode, operands[2]);
> +    operands[2] = gen_lowpart (SImode, operands[2]);
> +  }"
> +  [(set_attr "length" "8")
> +   (set_attr "predicable" "yes")
> +   (set_attr "predicable_short_it" "no")
> +   (set_attr "type" "multiple")]
> +)
> +
>  (define_insn_and_split "*iordi_notsesidi_di"
>    [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
>          (ior:DI (not:DI (sign_extend:DI
> diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
> new file mode 100644
> index 000..cfb33fc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
> @@ -0,0 +1,65 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-inline --save-temps" } */
> +
> +extern void abort (void);
> +
> +typedef long long s64int;
> +typedef int s32int;
> +typedef unsigned long long u64int;
> +typedef unsigned int u32int;
> +
> +s64int
> +anddi_di_notdi (s64int a, s64int b)
> +{
> +  return (a & ~b);
> +}
> +
> +s64int
> +anddi_di_notzesidi (s64int a, u32int b)
> +{
> +  return (a & ~(u64int) b);
> +}
> +
> +s64int
> +anddi_notdi_zesidi (s64int a, u32int b)
> +{
> +  return (~a & (u64int) b);
> +}
> +
> +s64int
> +anddi_di_notsesidi (s64int a, s32int b)
> +{
> +  return (a & ~(s64int) b);
> +}
> +
> +int main ()
> +{
> +  s64int a64 = 0xdeadbeefll;
> +  s64int b64 = 0x5f470112ll;
> +  s64int c64 = 0xdeadbeef300fll;
> +
> +  u32int c32 = 0x01124f4f;
> +  s32int d32 = 0xabbaface;
> +
> +  s64int z = anddi_di_notdi (c64, b64);
> +  if (z != 0xdeadbeef2008ll)
> +    abort ();
> +
> +  z = anddi_di_notzesidi (a64, c32);
> +  if (z !=
Re: [patch] Fix PR59295 -- remove useless warning
On Fri, Mar 21, 2014 at 1:58 AM, Fabien Chêne fabien.ch...@gmail.com wrote:

> Why not make this warning suppressible instead of removing it?
> Shouldn't it fall under -W(no)-redundant-decls?

Thanks. I'll revise the patch to do that.

--
Paul Pluzhnikov
[PATCH AArch64] Fix aarch64_simd_valid_immediate for Bigendian
This patch fixes a bug whereby a vector like V8QImode {1,0,1,0,1,0,1,0} can
result in an instruction like

  movi v1.4h, 0x1

whereas on big-endian this should be

  movi v1.4h, 0x1, lsl 8

Regression tested on aarch64_be-none-elf: no changes in libstdc++, newlib; no
regressions in gcc or g++ and FAIL->PASS as listed below.

Ok for trunk (stage 4) ?

Cheers, Alan

gcc/ChangeLog:

2014-03-21  Alan Lawrence  alan.lawre...@arm.com

    * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Reverse
    order of elements for big-endian.

FAIL->PASS in gcc testsuite:

c-c++-common/cilk-plus/PS/reduction-1.c  -ftree-vectorize -fcilkplus -std=c99 execution test
gcc.c-torture/execute/2112-1.c execution,  -O0
gcc.c-torture/execute/900409-1.c execution,  -O0
gcc.c-torture/execute/p18298.c execution,  -O0
gcc.c-torture/execute/pr53645-2.c execution,  -O1
gcc.c-torture/execute/pr53645-2.c execution,  -O2
gcc.c-torture/execute/pr53645-2.c execution,  -O2 -flto
gcc.c-torture/execute/pr53645-2.c execution,  -O2 -flto -flto-partition=none
gcc.c-torture/execute/pr53645-2.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
gcc.c-torture/execute/pr53645-2.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
gcc.c-torture/execute/pr53645-2.c execution,  -O3 -fomit-frame-pointer
gcc.c-torture/execute/pr53645-2.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
gcc.c-torture/execute/pr53645-2.c execution,  -O3 -fomit-frame-pointer -funroll-loops
gcc.c-torture/execute/pr53645-2.c execution,  -O3 -g
gcc.c-torture/execute/pr53645-2.c execution,  -Og -g
gcc.c-torture/execute/pr53645-2.c execution,  -Os
gcc.c-torture/execute/pr53645.c execution,  -O1
gcc.c-torture/execute/pr53645.c execution,  -O2
gcc.c-torture/execute/pr53645.c execution,  -O2 -flto
gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -flto-partition=none
gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
gcc.c-torture/execute/pr53645.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer
gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
gcc.c-torture/execute/pr53645.c execution,  -O3 -fomit-frame-pointer -funroll-loops
gcc.c-torture/execute/pr53645.c execution,  -O3 -g
gcc.c-torture/execute/pr53645.c execution,  -Og -g

FAIL->PASS in g++ testsuite:

g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer  execution test
g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer -funroll-loops  execution test
g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
g++.dg/torture/pr37922.C  -O3 -g  execution test
g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer  execution test
g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer -funroll-loops  execution test
g++.dg/torture/pr37922.C  -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions  execution test
g++.dg/torture/pr37922.C  -O3 -g  execution test

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f24b248..3166ebd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -6563,7 +6563,9 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse,
   /* Splat vector constant out into a byte vector.  */
   for (i = 0; i < n_elts; i++)
     {
-      rtx el = CONST_VECTOR_ELT (op, i);
+      /* The vector is provided in gcc endian-neutral fashion.  For aarch64_be,
+         it must be laid out in the vector register in reverse order.  */
+      rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i);
       unsigned HOST_WIDE_INT elpart;
       unsigned int part, parts;
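
The effect of the index reversal on the example from the description can be reproduced with a small stand-alone model. This is a sketch, not the aarch64.c code: `toy_splat_bytes` and `toy_halfword` are hypothetical helpers, and reading each 16-bit lane with the lower-numbered byte as least significant is an assumption about how the splatted byte array is interpreted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Splat N_ELTS byte-sized elements into BYTES, walking the elements in
   reverse order for big-endian, as the patched loop does.  */
static void
toy_splat_bytes (const uint8_t *elts, size_t n_elts, bool big_endian,
                 uint8_t *bytes)
{
  for (size_t i = 0; i < n_elts; i++)
    bytes[i] = elts[big_endian ? n_elts - 1 - i : i];
}

/* Read 16-bit lane LANE from the byte array, treating the
   lower-numbered byte as the least significant one.  */
static uint16_t
toy_halfword (const uint8_t *bytes, size_t lane)
{
  return (uint16_t) (bytes[2 * lane] | (bytes[2 * lane + 1] << 8));
}
```

For {1,0,1,0,1,0,1,0} the little-endian walk yields 0x0001 per halfword (movi ..., 0x1), while the reversed walk yields 0x0100, matching the `0x1, lsl 8` form quoted above.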
Re: [gomp4] Add tables generation
On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthesized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. Bernd Index: gomp-4_0-branch/gcc/lto-wrapper.c === --- gomp-4_0-branch.orig/gcc/lto-wrapper.c +++ gomp-4_0-branch/gcc/lto-wrapper.c @@ -67,6 +67,7 @@ static unsigned int nr; static char **input_names; static char **output_names; static char **offload_names; +static const char *ompbegin, *ompend; static char *makefile; const char tool_name[] = "lto-wrapper"; @@ -479,6 +480,61 @@ compile_images_for_openmp_targets (unsig free_array_of_ptrs ((void**) names, num_targets); } +/* Copy a file from SRC to DEST. */ +static void +copy_file (const char *dest, const char *src) +{ + FILE *d = fopen (dest, "wb"); + FILE *s = fopen (src, "rb"); + char buffer[512]; + while (!feof (s)) +{ + size_t len = fread (buffer, 1, 512, s); + if (ferror (s) != 0) + fatal ("reading input file"); + if (len > 0) + { + fwrite (buffer, 1, len, d); + if (ferror (d) != 0) + fatal ("writing output file"); + } +} +} + +/* Find the omp_begin.o and omp_end.o files in LIBRARY_PATH, make copies + and store the names of the copies in ompbegin and ompend. 
*/ + +static void +find_ompbeginend (void) +{ + char **paths; + const char *library_path = getenv ("LIBRARY_PATH"); + if (library_path == NULL) +return; + int n_paths = parse_env_var (library_path, &paths, "/ompbegin.o"); + + for (int i = 0; i < n_paths; i++) +if (access_check (paths[i], R_OK) == 0) + { + size_t len = strlen (paths[i]); + char *tmp = xstrdup (paths[i]); + strcpy (paths[i] + len - 7, "end.o"); + if (access_check (paths[i], R_OK) != 0) + fatal ("installation error, can't find ompend.o"); + /* The linker will delete the filenames we give it, so make + copies. */ + const char *omptmp1 = make_temp_file (".o"); + const char *omptmp2 = make_temp_file (".o"); + copy_file (omptmp1, tmp); + ompbegin = omptmp1; + copy_file (omptmp2, paths[i]); + ompend = omptmp2; + free (tmp); + break; + } + + free_array_of_ptrs ((void**) paths, n_paths); +} /* Execute gcc. ARGC is the number of arguments. ARGV contains the arguments. */ @@ -964,6 +1020,7 @@ cont: compile_images_for_openmp_targets (argc, argv); if (offload_names) { + find_ompbeginend (); for (i = 0; offload_names[i]; i++) { fputs (offload_names[i], stdout); @@ -972,12 +1029,23 @@ cont: free_array_of_ptrs ((void **)offload_names, i); } } + if (ompbegin) + { + fputs (ompbegin, stdout); + putc ('\n', stdout); + } + for (i = 0; i < nr; ++i) { fputs (output_names[i], stdout); putc ('\n', stdout); free (input_names[i]); } + if (ompend) + { + fputs (ompend, stdout); + putc ('\n', stdout); + } nr = 0; free (output_names); free (input_names); Index: gomp-4_0-branch/libgcc/configure === --- gomp-4_0-branch.orig/libgcc/configure +++ gomp-4_0-branch/libgcc/configure @@ -566,6 +566,7 @@ sfp_machine_header set_use_emutls set_have_cc_tls vis_hide +enable_accelerator fixed_point enable_decimal_float decimal_float @@ -664,6 +665,8 @@ with_build_libsubdir enable_decimal_float with_system_libunwind enable_sjlj_exceptions +enable_accelerator +enable_offload_targets enable_tls ' ac_precious_vars='build_alias @@ -1301,6 +1304,9 @@ Optional 
Features: to use --enable-sjlj-exceptions force use of builtin_setjmp for exceptions + --enable-acceleratorbuild accelerator [ARG={no,device-triplet}] + --enable-offload-targets=LIST + enable offloading to devices from LIST --enable-tlsUse thread-local storage [default=yes] Optional Packages: @@ -4357,6 +4363,43 @@ esac # Collect host-machine-specific information. . ${srcdir}/config.host +offload_targets= +# Check whether --enable-accelerator was given. +if test ${enable_accelerator+set} = set; then : + enableval=$enable_accelerator; + case $enable_accelerator in + no) ;; + *) +offload_targets=$enable_accelerator +;; + esac + +fi + + + +# Check whether --enable-offload-targets was given. +if test ${enable_offload_targets+set} = set; then : + enableval=$enable_offload_targets; + if test x$enable_offload_targets = x; then +as_fn_error no offload targets specified $LINENO 5 + else +if test x$offload_targets = x; then + offload_targets=$enable_offload_targets +else +
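As it appears in the lto-wrapper.c part of this message, copy_file does not check the results of fopen and never closes either stream. A more defensive variant might look like the sketch below. It is only an illustration: `copy_file_checked` is a hypothetical name, and it returns a status code instead of calling lto-wrapper's fatal so that it can stand alone.

```c
#include <assert.h>
#include <stdio.h>

/* Copy the file SRC to DEST.  Return 0 on success and -1 on any open,
   read, write or close error.  Hypothetical stand-alone variant of the
   copy_file helper in the patch; not part of GCC.  */
static int
copy_file_checked (const char *dest, const char *src)
{
  assert (dest != NULL && src != NULL);

  FILE *s = fopen (src, "rb");
  if (s == NULL)
    return -1;

  FILE *d = fopen (dest, "wb");
  if (d == NULL)
    {
      fclose (s);
      return -1;
    }

  char buffer[512];
  size_t len;
  int status = 0;

  /* fread returns 0 at end of file or on error; ferror tells them apart.  */
  while ((len = fread (buffer, 1, sizeof buffer, s)) > 0)
    if (fwrite (buffer, 1, len, d) != len)
      {
        status = -1;
        break;
      }
  if (ferror (s))
    status = -1;

  fclose (s);
  /* Closing the output flushes it, so a failure here is a write error.  */
  if (fclose (d) != 0)
    status = -1;
  return status;
}
```

Looping on the return value of fread rather than on feof avoids one extra pass through the loop body after the last block has been read.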
Re: [C++ Patch] PR 60384
Let's assert errorcount|sorrycount before returning in the !identifier case. OK with that change. Jason
Re: [gomp4] Add tables generation
On Fri, Mar 21, 2014 at 04:13:45PM +0100, Bernd Schmidt wrote: On 03/20/2014 07:56 PM, Jakub Jelinek wrote: When we were discussing the design last year, my strong preference was that either this lives in some other crt object that mkoffload/linker plugin adds to link, or that it would be completely mkoffload synthetized. mkoffload is only concerned with generating target images. These fragments are for the host tables. How's this? It moves everything to ompbegin.o/ompend.o and only links in these files if we have produced at least one target offload image. I'd call the files crtompbegin.o/crtompend.o instead. And, what is the exact reason why you are using protected visibility rather than hidden? Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. As for the __OPENMP_TARGET__ header format, that can be certainly resolved later on. Jakub
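The linker behaviour Jakub refers to can be sketched as follows. This assumes GNU ld (or a compatible linker) on an ELF target, where a section whose name is a valid C identifier gets `__start_<name>` and `__stop_<name>` symbols defined automatically. The section name `gnu_offload_demo`, the `offload_entry` layout and the helper names are made up for the illustration; the thread does not fix the real table format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry; the real offload table layout is not
   specified in this thread.  */
struct offload_entry
{
  const char *name;
  int id;
};

/* Place one entry in a section whose name contains no '.', so that the
   linker defines the __start_/__stop_ symbols for it.  */
#define OFFLOAD_ENTRY(sym, ident)                                    \
  static const struct offload_entry sym                              \
    __attribute__ ((used, section ("gnu_offload_demo"))) = { #sym, ident }

OFFLOAD_ENTRY (entry_a, 1);
OFFLOAD_ENTRY (entry_b, 2);

/* Provided by the linker (assumption: GNU ld on ELF); no begin/end
   objects are needed to bracket the section.  */
extern const struct offload_entry __start_gnu_offload_demo[];
extern const struct offload_entry __stop_gnu_offload_demo[];

/* Number of entries the linker collected into the section.  */
static size_t
count_offload_entries (void)
{
  return (size_t) (__stop_gnu_offload_demo - __start_gnu_offload_demo);
}

/* Return the id registered under NAME, or -1 if there is none.  */
static int
lookup_offload_id (const char *name)
{
  assert (name != NULL);
  for (const struct offload_entry *p = __start_gnu_offload_demo;
       p < __stop_gnu_offload_demo; p++)
    if (strcmp (p->name, name) == 0)
      return p->id;
  return -1;
}
```

With this scheme a single object that references the two linker-created symbols is enough, which is the simplification suggested in the message.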
Re: [gomp4] Add tables generation
On 03/21/2014 04:20 PM, Jakub Jelinek wrote: And, what is the exact reason why you are using protected visibility rather than hidden? Also, supposedly if you've used section names without . in them, the linker itself would provide the symbols automatically and you wouldn't actually need begin/end, but just one object that would reference the linker created symbols. Just use say __gnu_offload_whatever__ or similar section names. Hmm, okay. No real reason for any of these except things were set up like this in Michael Zolotukhin's original patch. I'll tweak it some more. Bernd
[patch] Fix PR59295 -- move redundant friend decl warning under -Wredundant-decls
Greetings, To fix PR59295, this patch moves the (generally useless) warning about repeated / redundant friend declarations under -Wredundant-decls. Tested on Linux/x86_64 with no regressions. Ok for trunk once it opens in stage 1? Thanks, -- 2014-03-21 Paul Pluzhnikov ppluzhni...@google.com PR c++/59295 * gcc/cp/friend.c (add_friend, make_friend_class): Move repeated friend warning under Wredundant_decls. Index: gcc/cp/friend.c === --- gcc/cp/friend.c (revision 208748) +++ gcc/cp/friend.c (working copy) @@ -148,7 +148,8 @@ if (decl == TREE_VALUE (friends)) { if (complain) - warning (0, "%qD is already a friend of class %qT", + warning (OPT_Wredundant_decls, +"%qD is already a friend of class %qT", decl, type); return; } @@ -376,7 +377,8 @@ if (friend_type == probe) { if (complain) - warning (0, "%qD is already a friend of %qT", probe, type); + warning (OPT_Wredundant_decls, +"%qD is already a friend of %qT", probe, type); break; } } @@ -385,7 +387,8 @@ if (same_type_p (probe, friend_type)) { if (complain) - warning (0, "%qT is already a friend of %qT", probe, type); + warning (OPT_Wredundant_decls, +"%qT is already a friend of %qT", probe, type); break; } }
[C++ Patch, 4.8] Backport fix for c++/54537 to FSF 4.8
The following patch has lived on mainline for 6 months and has not generated any issues there. We've also been using it on our 4.8 based IBM branch with no problems either, so I'd like to ask for permission to backport this fix to the FSF 4.8 branch. This will bring GCC into compliance with CLANG and the XL C++ compilers with respect to this bug. I know the XL C++ compiler specifically will not accept the definition in the tr1/cmath header file, therefore it is not able to compile any program that uses that header. Since there are a few 4.8 based distro compilers coming, I'd like to fix this in the FSF branch so they'll all get the fix automatically. Ok for the FSF 4.8 branch once my bootstrap and regtesting are complete (using powerpc64-linux)? Peter libstdc++-v3/ Backport from mainline 2013-08-01 Fabien Chêne fab...@gcc.gnu.org PR c++/54537 * include/tr1/cmath: Remove pow(double,double) overload, remove a duplicated comment about DR 550. Add a comment to explain the issue. * testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc: New. gcc/cp/ Back port from mainline 2013-08-01 Fabien Chêne fab...@gcc.gnu.org PR c++/54537 * cp-tree.h: Check OVL_USED with OVERLOAD_CHECK. * name-lookup.c (do_nonmember_using_decl): Make sure we have an OVERLOAD before calling OVL_USED. Call diagnose_name_conflict instead of issuing an error without mentioning the conflicting declaration. gcc/testsuite/ Back port from mainline 2013-08-01 Fabien Chêne fab...@gcc.gnu.org Peter Bergner berg...@vnet.ibm.com PR c++/54537 * g++.dg/overload/using3.C: New. * g++.dg/overload/using2.C: Adjust. * g++.dg/lookup/using9.C: Likewise. Index: libstdc++-v3/include/tr1/cmath === --- libstdc++-v3/include/tr1/cmath (revision 208748) +++ libstdc++-v3/include/tr1/cmath (working copy) @@ -846,10 +846,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION nexttoward(_Tp __x, long double __y) { return __builtin_nexttoward(__x, __y); } - // DR 550. What should the return type of pow(float,int) be? - // NB: C++0x and TR1 != C++03. 
- // using std::pow; - inline float remainder(float __x, float __y) { return __builtin_remainderf(__x, __y); } @@ -985,9 +981,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // DR 550. What should the return type of pow(float,int) be? // NB: C++0x and TR1 != C++03. - inline double - pow(double __x, double __y) - { return std::pow(__x, __y); } + + // The std::tr1::pow(double, double) overload cannot be provided + // here, because it would clash with ::pow(double,double) declared + // in math.h, if tr1/math.h is included at the same time (raised + // by the fix of PR c++/54537). It is not possible either to use the + // using-declaration 'using ::pow;' here, because if the user code + // has a 'using std::pow;', it would bring the pow(*,int) averloads + // in the tr1 namespace, which is undesirable. Consequently, the + // solution is to forward std::tr1::pow(double,double) to + // std::pow(double,double) via the templatized version below. See + // the discussion about this issue here: + // http://gcc.gnu.org/ml/gcc-patches/2012-09/msg01278.html inline float pow(float __x, float __y) Index: libstdc++-v3/testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc === --- libstdc++-v3/testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc (revision 0) +++ libstdc++-v3/testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc (revision 0) @@ -0,0 +1,33 @@ +// { dg-do compile } + +// Copyright (C) 2013 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. 
+ +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// http://www.gnu.org/licenses/. + +#include cmath +using std::pow; +#include tr1/cmath +#include testsuite_tr1.h + +void +test01() +{ + using namespace __gnu_test; + + float x = 2080703.375F; + check_ret_typefloat(std::pow(x, 2)); + check_ret_typedouble(std::tr1::pow(x, 2)); +} Index: gcc/testsuite/g++.dg/lookup/using9.C === --- gcc/testsuite/g++.dg/lookup/using9.C(revision 208748) +++
[C++ testcases, committed] Adjust two recent C++1y testcases to use 'target c++1y'
Hi, I'm committing this as obvious. Thanks, Paolo. 2014-03-21 Paolo Carlini paolo.carl...@oracle.com * g++.dg/cpp1y/pr60033.C: Use target c++1y. * g++.dg/cpp1y/pr60393.C: Likewise. Index: g++.dg/cpp1y/pr60033.C === --- g++.dg/cpp1y/pr60033.C (revision 208751) +++ g++.dg/cpp1y/pr60033.C (working copy) @@ -1,5 +1,5 @@ // PR c++/60033 -// { dg-options -std=c++1y } +// { dg-do compile { target c++1y } } template typename... T auto f(T... ts) Index: g++.dg/cpp1y/pr60393.C === --- g++.dg/cpp1y/pr60393.C (revision 208751) +++ g++.dg/cpp1y/pr60393.C (working copy) @@ -1,5 +1,5 @@ // PR c++/60393 -// { dg-options -std=c++1y } +// { dg-do compile { target c++1y } } void (*f)(auto) + 0; // { dg-error expected }
Re: [PATCH] Fix PR59543
This fixes PR59543 (confirmed by Jakub for the testcase at least) by not dropping debug stmts during WPA phase. LTO profiled-bootstrapped on x86_64-unknown-linux-gnu, applied. Honza - you can always come up with a better fix for 4.10. I guess this may work well. The other option (as mentioned in the PR) is to consistently turn statement IDs into pointers when the function body is read in, which seems a little bit more robust to me. Thanks, Honza Richard. 2014-03-19 Richard Biener rguent...@suse.de PR lto/59543 * lto-streamer-in.c (input_function): In WPA stage do not drop debug stmts. Index: lto-streamer-in.c === --- lto-streamer-in.c (revision 208642) +++ lto-streamer-in.c (working copy) @@ -988,7 +988,7 @@ input_function (tree fn_decl, struct dat We can't remove them earlier because this would cause uid mismatches in fixups, but we can do it at this point, as long as debug stmts don't require fixups. */ - if (!MAY_HAVE_DEBUG_STMTS && is_gimple_debug (stmt)) + if (!MAY_HAVE_DEBUG_STMTS && !flag_wpa && is_gimple_debug (stmt)) { gimple_stmt_iterator gsi = bsi; gsi_next (&bsi);
[build] Have s-macro_list depend on cc1
While looking at an unrelated issue, I noticed that the gcc/macro_list file is empty. In the build logs, I see echo | /var/gcc/regression/trunk/10-gcc/build/./gcc/xgcc -B/var/gcc/regression/trunk/10-gcc/build/./gcc/ -E -dM - | \ sed -n -e 's/^#define \([^_][a-zA-Z0-9_]*\).*/\1/p' \ -e 's/^#define \(_[^_A-Z][a-zA-Z0-9_]*\).*/\1/p' | \ sort -u > tmp-macro_list xgcc: error trying to exec 'cc1': execvp: No such file or directory Unlike the other xgcc invocations, this one actually needs cc1 for gcc -E to work, but lacks the appropriate dependency. The following patch adds it and indeed macro_list now is non-empty, as expected. I'm just not sure if cc1 is the correct one in gcc/Makefile.in, or if it should rather be $(COMPILERS) instead. Anyway, with that patch an i386-pc-solaris2.10 bootstrap completed and the testsuite is now running. Ok for mainline? Rainer 2014-03-21 Rainer Orth r...@cebitec.uni-bielefeld.de * Makefile.in (s-macro_list): Depend on cc1. # HG changeset patch # Parent 89bcc5fc68b831f7502e5f546614f8e26010f233 Have s-macro_list depend on cc1 diff --git a/gcc/Makefile.in b/gcc/Makefile.in --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -2653,7 +2653,7 @@ install-gcc-tooldir: $(mkinstalldirs) $(DESTDIR)$(gcc_tooldir) macro_list: s-macro_list; @true -s-macro_list : $(GCC_PASSES) +s-macro_list : $(GCC_PASSES) cc1$(exeext) echo | $(GCC_FOR_TARGET) -E -dM - | \ sed -n -e 's/^#define \([^_][a-zA-Z0-9_]*\).*/\1/p' \ -e 's/^#define \(_[^_A-Z][a-zA-Z0-9_]*\).*/\1/p' | \ -- - Rainer Orth, Center for Biotechnology, Bielefeld University
[GOOGLE] guard recording of autofdo annotation info in a flag
This patch guards autofdo annotation coverage recording with a flag. Test on-going. OK for google-4_8 if test passes? Thanks, Dehao Index: gcc/auto-profile.c === --- gcc/auto-profile.c (revision 208753) +++ gcc/auto-profile.c (working copy) @@ -1634,7 +1634,8 @@ auto_profile (void) pop_cfun (); } - autofdo::afdo_source_profile->write_annotated_count (); + if (flag_auto_profile_record_coverage_in_elf) +autofdo::afdo_source_profile->write_annotated_count (); return 0; } Index: gcc/common.opt === --- gcc/common.opt (revision 208753) +++ gcc/common.opt (working copy) @@ -946,6 +946,10 @@ fauto-profile-accurate Common Report Var(flag_auto_profile_accurate) Optimization Whether to assume the sample profile is accurate. +fauto-profile-record-coverage-in-elf +Common Report Var(flag_auto_profile_record_coverage_in_elf) Optimization +Whether to record annotation coverage info in elf. + ; -fcheck-bounds causes gcc to generate array bounds checks. ; For C, C++ and ObjC: defaults off. ; For Java: defaults to on.
[patch] remove empty directory gcc/testsuite/g++.dg/cpp0x/regress
ok to remove the empty directory gcc/testsuite/g++.dg/cpp0x/regress on the trunk? Matthias
Re: [C++ Patch, 4.8] Backport fix for c++/54537 to FSF 4.8
On Fri, 2014-03-21 at 11:30 -0500, Peter Bergner wrote: The following patch has lived on mainline for 6 months and has not generated any issues there. We've also been using it on our 4.8 based IBM branch with no problems either, so I'd like to ask for permission to backport this fix to the FSF 4.8 branch. This will bring GCC into compliance with CLANG and the XL C++ compilers with respect to this bug. I know the XL C++ compiler specifically will not accept the definition in the tr1/cmath header file, therefore it is not able to compile any program that uses that header. Since there are a few 4.8 based distro compilers coming, I'd like to fix this in the FSF branch so they'll all get the fix automatically. Ok for the FSF 4.8 branch once my bootstrap and regtesting are complete (using powerpc64-linux)? FYI, this just completed bootstrap and regtesting with no regressions. Ok for the FSF 4.8 branch? Peter
Re: [4.8, PATCH 1/26 too big]
On Wed, Mar 19, 2014 at 02:39:22PM -0500, Bill Schmidt wrote: Hi, The main patch for this series was too large for the mailer to accept. Sorry about that. This piece is all powerpc-related and seems to have been delivered to David ok. If anyone else wants a copy of the patch, please contact me privately and I'll send it your way. One way to get around this is to compress the patch, but it is generally better to try to split the patch into smaller pieces. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PR libstdc++/60587
On 20/03/14 20:29 +, Jonathan Wakely wrote: Everything passes for make check check-debug, so I plan to commit it tomorrow, unless anyone points out a problem. Committed to trunk. PR libstdc++/60587 * include/debug/functions.h (_Is_contiguous_sequence): Define. (__foreign_iterator): Accept additional iterator. Do not dispatch on iterator category. (__foreign_iterator_aux2): Likewise. Add overload for iterators from different types of debug container. Use _Is_contiguous_sequence instead of is_lvalue_reference. (__foreign_iterator_aux3): Accept additional iterator. Avoid dereferencing past-the-end iterator. (__foreign_iterator_aux4): Use const value_type* instead of potentially user-defined const_pointer type. * include/debug/macros.h (__glibcxx_check_insert_range): Fix comment and pass end iterator to __gnu_debug::__foreign_iterator. (__glibcxx_check_insert_range_after): Likewise. (__glibcxx_check_max_load_factor): Fix comment. * include/debug/vector (_Is_contiguous_sequence): Define partial specializations. * testsuite/23_containers/vector/debug/57779_neg.cc: Remove -std=gnu++11 option and unused header. * testsuite/23_containers/vector/debug/60587.cc: New. * testsuite/23_containers/vector/debug/60587_neg.cc: New.
Re: [PATCH, rs6000] More efficient vector permute for little endian
On 03/20/2014 06:38 PM, Bill Schmidt wrote: - rtx splat = gen_rtx_VEC_DUPLICATE (V16QImode, - gen_rtx_CONST_INT (QImode, -1)); + rtx notx = gen_rtx_NOT (V16QImode, force_reg (V16QImode, x)); + rtx andx = gen_rtx_AND (V16QImode, notx, notx); rtx tmp = gen_reg_rtx (V16QImode); - emit_move_insn (tmp, splat); - x = gen_rtx_MINUS (V16QImode, tmp, force_reg (V16QImode, x)); - emit_move_insn (tmp, x); + emit_move_insn (tmp, andx); Existing problem, and I know it's done all over that backend, but one shouldn't use emit_move_insn on expressions like that. Moves should be between RTX_OBJ things: registers, constants, and memory. Better to just do emit_insn (gen_rtx_SET (VOIDmode, tmp, andx));. r~
Re: [Testsuite, Patch] Fix testsuite/lib/gcc-dg.exp's scan-module-absence
On Mar 20, 2014, at 1:48 PM, Steve Kargl s...@troutmask.apl.washington.edu wrote: On Thu, Mar 20, 2014 at 08:24:49PM +0100, Tobias Burnus wrote: gfortran's modules are since GCC 4.9 zipped. There are two functions, which test for the existence and absent of strings in the .mod files. PR fortran/60599 * lib/gcc-dg.exp (scan-module): Uncompress .mod files for reading. Looks good to me. Not sure, I can give an OK as the file is outside of gfortran directories. My take (as test suite maintainer)... So the test suite has bits in it that domain experts know about and care about. It is always better if a domain expert reviews and approves that patch if they think the patch is in the right direction or nixes it if in the wrong direction. I’ll step forward and scream if you all run amok. If .mod files are fortran bits, then fortran people that know what a .mod is, would be the right people to review. I’m a catch all, if you all don’t do your job, then, you risk me approving it. :-) Short version, yes, you can.
Re: [patch] remove empty directory gcc/testsuite/g++.dg/cpp0x/regress
On Mar 21, 2014, at 10:46 AM, Matthias Klose d...@ubuntu.com wrote: ok to remove the empty directory gcc/testsuite/g++.dg/cpp0x/regress on the trunk? Ok.
[PATCH] Fix up lineno of builtin defines (PR debug/60603)
Hi! While the cpp_force_token_locations/cp_stop_forcing_token_locations pair forces BUILTINS_LOCATION upon tokens, the change introducing them removed cb_file_change/linemap_add, which is needed e.g. for proper line numbers of builtin defines in -g3 .debug_macro/.debug_macinfo. Fixed by reverting that part of the 2011-08-22 changes, while keeping the forcing of BUILTINS_LOCATION. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-03-21 Jakub Jelinek ja...@redhat.com PR debug/60603 c-family/ * c-opts.c (c_finish_options): Restore cb_file_change call to built-in. fortran/ * cpp.c (gfc_cpp_init): Restore cb_change_file call to built-in. testsuite/ * gcc.dg/debug/dwarf2/dwarf2-macro2.c: New test. --- gcc/c-family/c-opts.c.jj2014-03-11 12:14:00.0 +0100 +++ gcc/c-family/c-opts.c 2014-03-21 11:07:04.287946639 +0100 @@ -1274,17 +1274,18 @@ c_finish_options (void) { size_t i; - { - /* Make sure all of the builtins about to be declared have - BUILTINS_LOCATION has their source_location. */ - source_location builtins_loc = BUILTINS_LOCATION; - cpp_force_token_locations (parse_in, builtins_loc); + cb_file_change (parse_in, + linemap_add (line_table, LC_RENAME, 0, + _(built-in), 0)); + /* Make sure all of the builtins about to be declared have +BUILTINS_LOCATION has their source_location. 
*/ + source_location builtins_loc = BUILTINS_LOCATION; + cpp_force_token_locations (parse_in, builtins_loc); - cpp_init_builtins (parse_in, flag_hosted); - c_cpp_builtins (parse_in); + cpp_init_builtins (parse_in, flag_hosted); + c_cpp_builtins (parse_in); - cpp_stop_forcing_token_locations (parse_in); - } + cpp_stop_forcing_token_locations (parse_in); /* We're about to send user input to cpplib, so make it warn for things that we previously (when we sent it internal definitions) --- gcc/fortran/cpp.c.jj2014-01-09 21:07:24.0 +0100 +++ gcc/fortran/cpp.c 2014-03-21 11:10:00.973020640 +0100 @@ -576,6 +576,7 @@ gfc_cpp_init (void) if (gfc_option.flag_preprocessed) return; + cpp_change_file (cpp_in, LC_RENAME, _(built-in)); if (!gfc_cpp_option.no_predefined) { /* Make sure all of the builtins about to be declared have --- gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2-macro2.c.jj2014-03-21 11:19:29.221017868 +0100 +++ gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2-macro2.c 2014-03-21 11:20:49.768580776 +0100 @@ -0,0 +1,7 @@ +/* Test to make sure the macro info includes the predefined macros with line number 0. */ +/* { dg-do compile } */ +/* { dg-options -g3 -gdwarf -dA -fverbose-asm } */ +/* { dg-final { scan-assembler At line number 0 } } */ + +#define FOO 1 +int i; Jakub
[PATCH] Fix ubsan expansion (PR sanitizer/60613)
Hi! As MINUS_EXPR is not commutative, we really can't swap op0 with op1 for testing whether subtraction overflowed; that is only possible for PLUS_EXPR. For MINUS_EXPR we really have to know if op1 is constant or negative or non-negative and have to compare result with op0 depending on that. Bootstrapped/regtested on x86_64-linux and i686-linux, i686-linux extra --with-build-config=bootstrap-ubsan bootstrap ongoing. Ok for trunk? 2014-03-21 Jakub Jelinek ja...@redhat.com PR sanitizer/60613 * internal-fn.c (ubsan_expand_si_overflow_addsub_check): For code == MINUS_EXPR, never swap op0 with op1. * c-c++-common/ubsan/pr60613-1.c: New test. * c-c++-common/ubsan/pr60613-2.c: New test. --- gcc/internal-fn.c.jj2014-03-18 12:27:10.0 +0100 +++ gcc/internal-fn.c 2014-03-21 15:41:39.116303973 +0100 @@ -221,14 +221,15 @@ ubsan_expand_si_overflow_addsub_check (t res = expand_binop (mode, code == PLUS_EXPR ? add_optab : sub_optab, op0, op1, NULL_RTX, false, OPTAB_LIB_WIDEN); - /* If we can prove one of the arguments is always non-negative -or always negative, we can do just one comparison and -conditional jump instead of 2 at runtime, 3 present in the + /* If we can prove one of the arguments (for MINUS_EXPR only +the second operand, as subtraction is not commutative) is always +non-negative or always negative, we can do just one comparison +and conditional jump instead of 2 at runtime, 3 present in the emitted code. If one of the arguments is CONST_INT, all we need is to make sure it is op1, then the first emit_cmp_and_jump_insns will be just folded. Otherwise try to use range info if available. 
*/ - if (CONST_INT_P (op0)) + if (code == PLUS_EXPR CONST_INT_P (op0)) { rtx tem = op0; op0 = op1; @@ -236,7 +237,7 @@ ubsan_expand_si_overflow_addsub_check (t } else if (CONST_INT_P (op1)) ; - else if (TREE_CODE (arg0) == SSA_NAME) + else if (code == PLUS_EXPR TREE_CODE (arg0) == SSA_NAME) { double_int arg0_min, arg0_max; if (get_range_info (arg0, arg0_min, arg0_max) == VR_RANGE) --- gcc/testsuite/c-c++-common/ubsan/pr60613-1.c.jj 2014-03-21 16:00:47.930272534 +0100 +++ gcc/testsuite/c-c++-common/ubsan/pr60613-1.c2014-03-21 15:47:50.0 +0100 @@ -0,0 +1,33 @@ +/* PR sanitizer/60613 */ +/* { dg-do run } */ +/* { dg-options -fsanitize=undefined } */ + +long long y; + +__attribute__((noinline, noclone)) long long +foo (long long x) +{ + asm (); + if (x = 0 || x -2040) +return 23; + x += 2040; + return x - y; +} + +__attribute__((noinline, noclone)) long long +bar (long long x) +{ + asm (); + return 8LL - x; +} + +int +main () +{ + y = 1; + if (foo (8 - 2040) != 8 - 1) +__builtin_abort (); + if (bar (1) != 8 - 1) +__builtin_abort (); + return 0; +} --- gcc/testsuite/c-c++-common/ubsan/pr60613-2.c.jj 2014-03-21 16:00:50.795259403 +0100 +++ gcc/testsuite/c-c++-common/ubsan/pr60613-2.c2014-03-21 16:08:56.915733544 +0100 @@ -0,0 +1,36 @@ +/* PR sanitizer/60613 */ +/* { dg-do run } */ +/* { dg-options -fsanitize=undefined } */ + +long long y; + +__attribute__((noinline, noclone)) long long +foo (long long x) +{ + asm (); + if (x = 0 || x -2040) +return 23; + x += 2040; + return x - y; +} + +__attribute__((noinline, noclone)) long long +bar (long long x) +{ + asm (); + return 8LL - x; +} + +int +main () +{ + y = -__LONG_LONG_MAX__ + 6; + if (foo (8 - 2040) != -__LONG_LONG_MAX__) +__builtin_abort (); + if (bar (-__LONG_LONG_MAX__ + 5) != -__LONG_LONG_MAX__ + 1) +__builtin_abort (); + return 0; +} + +/* { dg-output signed integer overflow: 8 \\- -9223372036854775801 cannot be represented in type 'long long int'(\n|\r\n|\r) } */ +/* { dg-output \[^\n\r]*signed integer overflow: 
8 \\- -9223372036854775802 cannot be represented in type 'long long int' } */ Jakub
Re: [wwwdocs] gcc-4.9/changes.html: Mention that LTO now generates slim objects
This patch mentions that -flto now generates slim objects. That's especially relevant for static libraries as one can there run into surprises, if one does not know about gcc-ar. OK - or do you have a better suggestion? Tobias --- changes.html8 Mar 2014 20:45:54 - 1.63 +++ changes.html21 Mar 2014 09:10:32 - @@ -65,6 +65,13 @@ <li>Function bodies are now loaded on-demand and released early improving overall memory usage at link time.</li> <li>C++ hidden keyed methods can now be optimized out.</li> + <li>By default, compiling with the <code>-flto</code> option now generates + slim objects files (<code>.o</code>) which only contain intermediate + language representation for LTO. Use <code>-ffat-lto-objects</code> to + create files which contain additionally the object code. To generate + static libraries suitable for LTO processing, use <code>gcc-ar</code> + and <code>gcc-ranlib</code> (requires that <code>ar</code> and + <code>ranlib</code> have been compiled with plugin support).</li> Ah, seems I forgot some of wwwdocs updates, since I wrote one already. But yours is better. I would perhaps mention that with slim objects the LTO build times are often better (you can refer e.g. to SPEC2k6 compilation time) and also mention gcc-nm. Honza </ul> Memory usage building Firefox with debug enabled was reduced from 15GB to 3.5GB; link time from 1700 seconds to 350 seconds.
[PATCH] Fix non-biarch sorry diagnostics on unsupported -m64 or -m32 (PR target/60610)
Hi! Prior to r203634 we were comparing TARGET_64BIT with ix86_isa_flags & OPTION_MASK_ISA_64BIT, which is the same thing for TARGET_BI_ARCH; otherwise the former is a hardcoded constant. But with r203634, the condition was changed and is now always false and so e.g. for 32-bit non-multilib i?86 gcc we don't complain about lack of -m64 support anymore, instead just ICE later on. Fixed by making TARGET_64BIT_P, which the new condition tests, also constant for !TARGET_BI_ARCH. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2014-03-21 Jakub Jelinek ja...@redhat.com PR target/60610 * config/i386/i386.h (TARGET_64BIT_P): If not TARGET_BI_ARCH, redefine to 1 or 0. * config/i386/darwin.h (TARGET_64BIT_P): Redefine to TARGET_ISA_64BIT_P(x). --- gcc/config/i386/i386.h.jj 2014-03-18 10:04:14.0 +0100 +++ gcc/config/i386/i386.h 2014-03-21 17:50:22.465016379 +0100 @@ -284,10 +284,13 @@ extern const struct processor_costs ix86 #else #ifndef TARGET_BI_ARCH #undef TARGET_64BIT +#undef TARGET_64BIT_P #if TARGET_64BIT_DEFAULT #define TARGET_64BIT 1 +#define TARGET_64BIT_P(x) 1 #else #define TARGET_64BIT 0 +#define TARGET_64BIT_P(x) 0 #endif #endif #endif --- gcc/config/i386/darwin.h.jj 2014-01-03 11:41:06.0 +0100 +++ gcc/config/i386/darwin.h 2014-03-21 17:51:56.492536202 +0100 @@ -26,7 +26,9 @@ along with GCC; see the file COPYING3. #define DARWIN_X86 1 #undef TARGET_64BIT +#undef TARGET_64BIT_P #define TARGET_64BIT TARGET_ISA_64BIT +#define TARGET_64BIT_P(x) TARGET_ISA_64BIT_P(x) #ifdef IN_LIBGCC2 #undef TARGET_64BIT Jakub
Re: [PATCH, PR 59176] Mark zombie call graph nodes to remove verifier false positive
On Thu, 20 Mar 2014, Martin Jambor wrote: Hi, On Thu, Mar 20, 2014 at 07:40:56PM +0100, Jakub Jelinek wrote: On Thu, Mar 20, 2014 at 05:07:32PM +0100, Martin Jambor wrote: in the PR, verifier claims an edge is pointing to a wrong declaration even though it has successfully verified the edge multiple times before. The reason is that symtab_remove_unreachable_nodes decides to remove the body of a node and also clear any information that it is an alias of another in the process (more detailed analysis in comment #9 of the bug). In bugzilla Honza wrote that silencing the verifier is the way to go. Either we can dedicate a new flag in each cgraph_node or symtab_node just for the purpose of verification or do something more hackish like the patch below which re-uses the former_clone_of field for this purpose. Since clones are always private nodes, they should always either survive removal of unreachable nodes or be completely killed by it and should never enter the in_border zombie state. Therefore their former_clone_of must always be NULL. So I added a new special value, error_mark_node, to mark this zombie state and taught the verifier to be happy with such nodes. Bootstrapped and tested on x86_64-linux. What do you think? Don't we have like 22 spare bits in cgraph_node and 20 spare bits in symtab_node? I'd find it clearer if you just used a new flag to mark the zombie nodes. Though, I'll let Richard or Honza decide, don't feel strongly about it. I guess you are right, here is the proper version which is currently undergoing bootstrap and testing. I agree with Jakub, the following variant is ok. With the extra bit, you probably will need to LTO pickle it, too. I would go with just clearing the thunk flag: this makes the thunk behave like an external function, which is safe to do. (I am back in civilization from Alaska camping, will catch up with email early next week) Honza
[PATCH] Fix two undefined behaviors in gcc sources
Hi!

--with-build-config=bootstrap-ubsan on i686-linux showed most often
these two issues.

The first one is that 32-bit signed time_t is multiplied by 1000, which
overflows the int type. The result is then cast to unsigned, so even
on 64-bit hosts we don't care about the upper bits.

The other change fixes a badly written portable rotate, which was
shifting up (correctly) by 0 for i == 0, but for all other i values was
shifting up 64-bit value by 64.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-21  Jakub Jelinek  ja...@redhat.com

	* toplev.c (init_local_tick): Avoid signed integer multiplication
	overflow.
	* genautomata.c (reserv_sets_hash_value): Fix rotate idiom, avoid
	shift by first operand's bitsize.

--- gcc/toplev.c.jj	2014-01-08 10:23:22.0 +0100
+++ gcc/toplev.c	2014-03-21 11:32:25.929893013 +0100
@@ -261,7 +261,7 @@ init_local_tick (void)
       struct timeval tv;
       gettimeofday (&tv, NULL);
-      local_tick = tv.tv_sec * 1000 + tv.tv_usec / 1000;
+      local_tick = (unsigned) tv.tv_sec * 1000 + tv.tv_usec / 1000;
     }
 #else
     {
--- gcc/genautomata.c.jj	2014-02-12 08:33:34.0 +0100
+++ gcc/genautomata.c	2014-03-21 12:01:14.245727924 +0100
@@ -3494,7 +3494,7 @@ reserv_sets_hash_value (reserv_sets_t re
     {
       reservs_num--;
       hash_value += ((*reserv_ptr >> i)
-                     | (*reserv_ptr << ((sizeof (set_el_t) * CHAR_BIT) & -i)));
+                     | (*reserv_ptr << (((sizeof (set_el_t) * CHAR_BIT) - 1) & -i)));
       i++;
       if (i == sizeof (set_el_t) * CHAR_BIT)
         i = 0;

	Jakub
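For readers following along, here is a minimal standalone sketch of the two idioms the message above describes. It is not the GCC code: the function names local_tick_ms and rotate_right are made up for illustration, and fixed-width types are used so that the wrap-around behaviour does not depend on the host.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Convert to unsigned *before* multiplying, so that the multiplication
// wraps modulo 2^32 instead of overflowing a signed type (undefined
// behaviour).  Hypothetical helper mirroring the toplev.c change.
inline std::uint32_t local_tick_ms(std::int32_t sec, std::int32_t usec) {
  return static_cast<std::uint32_t>(sec) * 1000u
         + static_cast<std::uint32_t>(usec / 1000);
}

// Portable rotate right.  Masking both shift counts with (bits - 1)
// keeps them in [0, bits - 1]; masking with `bits` itself, as in the old
// genautomata.c code, yields a shift by the full width for any non-zero
// count, which is undefined.
inline std::uint64_t rotate_right(std::uint64_t value, unsigned count) {
  const unsigned mask = sizeof(value) * CHAR_BIT - 1;
  return (value >> (count & mask)) | (value << (-count & mask));
}

// Small self-check of the edge cases.
inline void check_idioms() {
  assert(rotate_right(1u, 0) == 1u);       // count 0: both shifts are by 0
  assert(rotate_right(1u, 63) == 2u);      // right by 63 == left by 1
  assert(local_tick_ms(1, 5000) == 1005u); // 1 s + 5 ms
}
```

Negating an unsigned count is well defined, so `-count & mask` evaluates to (bits - count) modulo bits without a branch.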
Re: [PATCH] Set correct probability for ORDER/UNORDER jumps
ping ^2...

Dehao

On Mon, Feb 10, 2014 at 8:35 AM, Dehao Chen de...@google.com wrote:

ping...

Dehao

On Fri, Jan 24, 2014 at 1:54 PM, Dehao Chen de...@google.com wrote:

Thanks, test updated:

Index: gcc/testsuite/gcc.dg/predict-8.c
===================================================================
--- gcc/testsuite/gcc.dg/predict-8.c (revision 0)
+++ gcc/testsuite/gcc.dg/predict-8.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+int foo(float a, float b) {
+  if (a == b)
+    return 1;
+  else
+    return 2;
+}
+
+/* { dg-final { scan-rtl-dump-times "REG_BR_PROB 100" 1 "expand"} } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */

On Fri, Jan 24, 2014 at 11:38 AM, H.J. Lu hjl.to...@gmail.com wrote:

On Fri, Jan 24, 2014 at 10:57 AM, Jakub Jelinek ja...@redhat.com wrote:

On Fri, Jan 24, 2014 at 10:20:53AM -0800, Dehao Chen wrote:

--- gcc/testsuite/gcc.dg/predict-8.c (revision 0)
+++ gcc/testsuite/gcc.dg/predict-8.c (revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile { target { x86_64-*-* } } } */

If you want it for x86_64 64-bit, then
/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && lp64 } } } */

It should be ! { ia32 } instead of lp64 unless it doesn't work for x32.

--
H.J.
[PATCH] Update the overall summary after edge_summary is updated
Hi,

This patch updates node's inline summary after edge_summary is
updated. Otherwise it could lead to incorrect inline summary.

Bootstrapped and gcc regression test on-going.

OK for trunk?

Thanks,
Dehao

gcc/ChangeLog:
2014-03-21  Dehao Chen  de...@google.com

	* ipa-inline.c (early_inliner): Update overall summary.

Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c (revision 208755)
+++ gcc/ipa-inline.c (working copy)
@@ -2318,6 +2318,7 @@ early_inliner (void)
                                     edge->call_stmt, edge->callee->decl, false))
                 edge->call_stmt_cannot_inline_p = true;
             }
+          inline_update_overall_summary (node);
           timevar_pop (TV_INTEGRATION);
           iterations++;
           inlined = false;
Re: [PATCH] Fix non-biarch sorry diagnostics on unsupported -m64 or -m32 (PR target/60610)
On 03/21/2014 01:38 PM, Jakub Jelinek wrote:

	PR target/60610
	* config/i386/i386.h (TARGET_64BIT_P): If not TARGET_BI_ARCH,
	redefine to 1 or 0.
	* config/i386/darwin.h (TARGET_64BIT_P): Redefine to
	TARGET_ISA_64BIT_P(x).

Ok.

r~
Fix _Hashtable extension
Hi

Here is a patch to fix _Hashtable Standard extension type which is
almost unusable at the moment if instantiated with anything other than
the types used for the std unordered containers, that is to say
__detail::_Default_ranged_hash and __detail::_Mod_range_hashing.

It is a really safe patch so I would propose it for current trunk but
at the same time it only impacts a Standard extension and it hasn't
been reported by anyone so just tell me when to apply it.

2014-03-21  François Dumont  fdum...@gcc.gnu.org

	* include/bits/hashtable.h (_Hashtable(allocator_type)): Fix call
	to delegated constructor.
	(_Hashtable(size_type, _H1, key_equal, allocator_type)): Likewise.
	(_Hashtable<_It>(_It, _It, size_type, _H1, key_equal, allocator_type)):
	Likewise.
	(_Hashtable(
	initializer_list, size_type, _H1, key_equal, allocator_type)): Likewise.

Tested under Linux x86_64.

François

Index: include/bits/hashtable.h
===================================================================
--- include/bits/hashtable.h (revision 207322)
+++ include/bits/hashtable.h (working copy)
@@ -372,9 +372,8 @@
       // Use delegating constructors.
       explicit
       _Hashtable(const allocator_type& __a)
-      : _Hashtable(10, _H1(), __detail::_Mod_range_hashing(),
-                   __detail::_Default_ranged_hash(), key_equal(),
-                   __key_extract(), __a)
+      : _Hashtable(10, _H1(), _H2(), _Hash(), key_equal(),
+                   __key_extract(), __a)
       { }
       explicit
@@ -382,8 +381,7 @@
                  const _H1& __hf = _H1(),
                  const key_equal& __eql = key_equal(),
                  const allocator_type& __a = allocator_type())
-      : _Hashtable(__n, __hf, __detail::_Mod_range_hashing(),
-                   __detail::_Default_ranged_hash(), __eql,
+      : _Hashtable(__n, __hf, _H2(), _Hash(), __eql,
                    __key_extract(), __a)
       { }
@@ -393,8 +391,7 @@
                  const _H1& __hf = _H1(),
                  const key_equal& __eql = key_equal(),
                  const allocator_type& __a = allocator_type())
-      : _Hashtable(__f, __l, __n, __hf, __detail::_Mod_range_hashing(),
-                   __detail::_Default_ranged_hash(), __eql,
+      : _Hashtable(__f, __l, __n, __hf, _H2(), _Hash(), __eql,
                    __key_extract(), __a)
       { }
@@ -403,9 +400,7 @@
                  const _H1& __hf = _H1(),
                  const key_equal& __eql = key_equal(),
                  const allocator_type& __a = allocator_type())
-      : _Hashtable(__l.begin(), __l.end(), __n, __hf,
-                   __detail::_Mod_range_hashing(),
-                   __detail::_Default_ranged_hash(), __eql,
+      : _Hashtable(__l.begin(), __l.end(), __n, __hf, _H2(), _Hash(), __eql,
                    __key_extract(), __a)
       { }
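The problem fixed by the patch above can be reproduced in miniature. The following is only a sketch with made-up names (Table, ModRange, MaskRange), not libstdc++ code: a class template parameterised on a range-hashing policy whose convenience constructor delegates to the full one. Default-constructing the template's own policy type, as the patch does with _H2() and _Hash(), keeps the delegation valid for every instantiation.

```cpp
#include <cassert>
#include <cstddef>

// Two hypothetical range-hashing policies mapping a hash code to a bucket.
struct ModRange {
  std::size_t operator()(std::size_t h, std::size_t n) const { return h % n; }
};
struct MaskRange {  // assumes n is a power of two
  std::size_t operator()(std::size_t h, std::size_t n) const { return h & (n - 1); }
};

template <typename RangeHash = ModRange>
class Table {
 public:
  // Full constructor: takes the policy object explicitly.
  Table(std::size_t n, const RangeHash& rh) : n_(n), rh_(rh) {}

  // Delegating constructor.  Writing `Table(n, ModRange())` here would
  // compile only when RangeHash happens to be ModRange; using the
  // template parameter works for any default-constructible policy.
  explicit Table(std::size_t n = 10) : Table(n, RangeHash()) {}

  std::size_t bucket(std::size_t hash_code) const { return rh_(hash_code, n_); }

 private:
  std::size_t n_;
  RangeHash rh_;
};

// Small self-check: both the default and a custom policy instantiate.
inline void check_table() {
  Table<> by_mod;             // 10 buckets, modulo policy
  Table<MaskRange> by_mask(8);
  assert(by_mod.bucket(13) == 3);
  assert(by_mask.bucket(13) == 5);
}
```

With the policy type hardcoded in the delegating constructor, `Table<MaskRange>` would be rejected at compile time as soon as that constructor is used, which is the "almost unusable" situation the message describes for non-default _H2 and _Hash.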
[patch, libgfortran] Committed as obvious
Committed revision 208759.

Index: io/transfer.c
===================================================================
--- io/transfer.c (revision 208755)
+++ io/transfer.c (working copy)
@@ -2674,7 +2674,8 @@ data_transfer_init (st_parameter_dt *dtp, int read
   if (dtp->u.p.current_unit->delim_status == DELIM_UNSPECIFIED)
     {
       if (ionml && dtp->u.p.current_unit->flags.delim == DELIM_UNSPECIFIED)
-        dtp->u.p.current_unit->delim_status = DELIM_QUOTE;
+        dtp->u.p.current_unit->delim_status =
+          compile_options.allow_std & GFC_STD_GNU ? DELIM_QUOTE : DELIM_NONE;
       else
         dtp->u.p.current_unit->delim_status = dtp->u.p.current_unit->flags.delim;
     }

If std= was used at compile time for std=f95, f2003, f2008, set the
default delimiter to 'NONE' if not otherwise specified.

2014-03-21  Jerry DeLisle  jvdeli...@gcc.gnu

	PR libfortran/60148
	* io/transfer.c (data_transfer_init): If std= was specified, set
	delim status to DELIM_NONE if no other was specified.
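The selection logic in the patch above can be restated as a small pure function. This is a sketch only: the enum Delim, the constant kStdGnu (a placeholder for GFC_STD_GNU, whose real value is not given here) and the function effective_delim are hypothetical names, and the sketch assumes the unit's current delimiter status is still unspecified when it is called.

```cpp
#include <cassert>

enum class Delim { Unspecified, None, Quote, Apostrophe };

// Placeholder bit for "GNU extensions allowed"; the real GFC_STD_GNU
// value is an assumption of this sketch.
constexpr unsigned kStdGnu = 1u << 5;

// Pick the delimiter status for a transfer whose status is unspecified.
// For namelist I/O with no DELIM= on the unit, default to QUOTE only when
// GNU extensions are allowed; under a strict std= setting use NONE.
inline Delim effective_delim(bool is_namelist, Delim unit_delim,
                             unsigned allow_std) {
  if (is_namelist && unit_delim == Delim::Unspecified)
    return (allow_std & kStdGnu) ? Delim::Quote : Delim::None;
  return unit_delim;
}

// Small self-check of the three branches.
inline void check_delim() {
  assert(effective_delim(true, Delim::Unspecified, kStdGnu) == Delim::Quote);
  assert(effective_delim(true, Delim::Unspecified, 0u) == Delim::None);
  assert(effective_delim(true, Delim::Apostrophe, 0u) == Delim::Apostrophe);
}
```

An explicit DELIM= on the unit always wins; only the fallback for namelist output changes with the selected standard.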
Re: Fix _Hashtable extension
On 21/03/14 22:39 +0100, François Dumont wrote:

Hi

Here is a patch to fix _Hashtable Standard extension type which is
almost unusable at the moment if instantiated with anything else that
the types used for the std unordered containers that is to say
__detail::_Default_ranged_hash and __detail::_Mod_range_hashing.

Good catch.

Also, it seems that this specialization is missing the hasher typedef:

  /// Specialization: ranged hash function, no caching hash codes.  H1
  /// and H2 are provided but ignored.  We define a dummy hash code type.
  template<typename _Key, typename _Value, typename _ExtractKey,
           typename _H1, typename _H2, typename _Hash>
    struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2, _Hash, false>
    : private _Hashtable_ebo_helper<0, _ExtractKey>,
      private _Hashtable_ebo_helper<1, _Hash>
    {

From the comments I think it is intentional, is that right?

It is a really safe patch so I would propose it for current trunk but
at the same time it only impacts a Standard extension and it hasn't
been reported by anyone so just tell me when to apply it.

As it doesn't fix a regression and apparently isn't affecting anyone
I think it would be safer to add it to trunk after the 4.9 branch is
created.
Re: [PATCH] Set correct probability for ORDER/UNORDER jumps
On Fri, Mar 21, 2014 at 10:13 PM, Dehao Chen de...@google.com wrote:

ping ^2...

Assuming this concerns
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01460.html and follow-ups.

OK.

Ciao!
Steven
Re: Fix _Hashtable extension
On 21/03/14 22:59 +0000, Jonathan Wakely wrote:

On 21/03/14 22:39 +0100, François Dumont wrote:

It is a really safe patch so I would propose it for current trunk but
at the same time it only impacts a Standard extension and it hasn't
been reported by anyone so just tell me when to apply it.

As it doesn't fix a regression and apparently isn't affecting anyone
I think it would be safer to add it to trunk after the 4.9 branch is
created.

Actually, you're right, it's definitely safe and I'm being overly
cautious :-)

Please commit to the trunk, thanks!
Re: [RFA jit] initialize input_location
On Thu, 2014-03-20 at 08:53 -0600, Tom Tromey wrote:

This patch initializes input_location at the same spot where the line
table is initialized. Without this, it's possible to crash when
emitting a diagnostic in a reinvocation of the compiler, because
input_location refers to a location that is no longer valid.

---
 gcc/ChangeLog.jit | 4 ++++
 gcc/toplev.c      | 1 +
 2 files changed, 5 insertions(+)

diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit
index ee1df88..a9b0817 100644
--- a/gcc/ChangeLog.jit
+++ b/gcc/ChangeLog.jit
@@ -1,5 +1,9 @@
 2014-03-19  Tom Tromey  tro...@redhat.com
+	* toplev.c (general_init): Initialize input_location.
+
+2014-03-19  Tom Tromey  tro...@redhat.com
+
 	* timevar.h (auto_timevar): New class.
 2014-03-19  Tom Tromey  tro...@redhat.com
diff --git a/gcc/toplev.c b/gcc/toplev.c
index b257ab2..1febc2e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -1161,6 +1161,7 @@ general_init (const char *argv0)
      table.  */
   init_ggc ();
   init_stringpool ();
+  input_location = 0;
   line_table = ggc_alloc_line_maps ();
   linemap_init (line_table);
   line_table->reallocator = realloc_for_line_map;

Given this declaration in input.c:

  location_t input_location;

then assigning 0 is a faithful way of resetting it to its initial state.

That said, 0 feels like a magic number. Would it be better to assign
UNKNOWN_LOCATION to it? which is 0, c.f. input.h:

  #define UNKNOWN_LOCATION ((source_location) 0)

If so, perhaps the declaration in input.c should gain an initializer to
the same value? (shouldn't affect the code, since it's 0 either way,
but perhaps it's more readable?)

Dave
Re: [PATCH] Handle more COMDAT profiling issues
On Fri, Feb 28, 2014 at 9:13 AM, Teresa Johnson tejohn...@google.com wrote:

Here's the new patch. The only changes from the earlier patch are in
handle_missing_profiles, where we now get the counts off of the entry
and call stmt bbs, and in tree_profiling, where we call
handle_missing_profiles earlier and I have removed the outlined cgraph
rebuilding code since it doesn't need to be reinvoked.

Honza, does this look ok for trunk when stage 1 reopens? David, I can
send a similar patch for review to google-4_8 if it looks good.

Thanks,
Teresa

...

Spec testing of my earlier patch hit an issue with the call to
gimple_bb in this routine, since the caller was a thunk and therefore
the edge did not have a call_stmt set. I've attached a slightly
modified patch that guards the call by a check to
cgraph_function_with_gimple_body_p. Regression and spec testing are
clean.

I made some more improvements to the patch based on more extensive
testing. Since we now use the bb counts instead of the cgraph edge
counts to determine call counts, it is important to do a topological
walk over the cgraph when dropping profiles. This ensures that the
counts we apply along a chain of calls whose profiles are being
dropped are consistent.

I also realized that we may in fact need to drop profiles on
non-COMDATs - when a non-COMDAT was IPA inlined into a COMDAT during
the profile-gen build, its profile counts would also be lost along
with those of the COMDAT it was inlined into. So now the patch will
drop profiles on non-COMDATs if they are low and the caller had its
profile dropped.

Here is a link to the original patch with motivation for the change:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00632.html

New patch below. Bootstrapped and tested on x86_64-unknown-linux-gnu.
Also profilebootstrapped and tested. Spec cpu2006 testing shows that
we drop profiles on more routines in 3 benchmarks (447.dealII,
450.soplex, 483.xalancbmk). I am seeing around 1% speedup consistently
on soplex when run on a Westmere.

Ok for stage 1?

Thanks,
Teresa

2014-03-21  Teresa Johnson  tejohn...@google.com

	* graphite.c (graphite_finalize): Pass new parameter.
	* params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
	* predict.c (tree_estimate_probability): New parameter.
	(tree_estimate_probability_worker): Renamed from
	tree_estimate_probability_driver.
	(tree_reestimate_probability): New function.
	(tree_estimate_probability_driver): Invoke
	tree_estimate_probability_worker.
	(freqs_to_counts): Move here from tree-inline.c.
	(drop_profile): New parameter, re-estimate profiles when dropping
	counts.
	(handle_missing_profiles): Drop for some non-zero functions as
	well, get counts from bbs to support invocation before cgraph
	rebuild, and use a single topological walk over cgraph.
	(counts_to_freqs): Remove code obviated by reestimation.
	* predict.h (tree_estimate_probability): Update declaration.
	* tree-inline.c (freqs_to_counts): Move to predict.c.
	(copy_cfg_body): Remove code obviated by reestimation.
	* tree-profile.c (tree_profiling): Invoke handle_missing_profiles
	before cgraph rebuild.

Index: graphite.c
===================================================================
--- graphite.c (revision 208492)
+++ graphite.c (working copy)
@@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
       cleanup_tree_cfg ();
       profile_status_for_fn (cfun) = PROFILE_ABSENT;
       release_recorded_exits ();
-      tree_estimate_probability ();
+      tree_estimate_probability (false);
     }

   cloog_state_free (cloog_state);
Index: params.def
===================================================================
--- params.def (revision 208492)
+++ params.def (working copy)
@@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
 	  "Maximal estimated outcome of branch considered predictable",
 	  2, 0, 50)

+DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
+	  "min-caller-reestimate-ratio",
+	  "Minimum caller-to-callee node count ratio to force reestimated branch "
+	  "probabilities in callee (where 0 means only when callee count is 0)",
+	  10, 0, 0)
+
 DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
 	  "inline-min-speedup",
 	  "The minimal estimated speedup allowing inliner to ignore inline-insns-single and inline-isnsns-auto",
Index: predict.c
===================================================================
--- predict.c (revision 208492)
+++ predict.c (working copy)
@@ -68,6 +68,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "tree-scalar-evolution.h"
 #include "cfgloop.h"
+#include "ipa-utils.h"

 /* real constants: 0, 1, 1-1/REG_BR_PROB_BASE, REG_BR_PROB_BASE,
    1/REG_BR_PROB_BASE, 0.5, BB_FREQ_MAX.  */
@@ -2379,10 +2380,12 @@ tree_estimate_probability_bb (basic_block bb)
 /*
[PATCH] Minor ipa-utils dumping fix
Minor dumping fix. Bootstrapped and tested on x86_64-unknown-linux-gnu.

Ok for stage 1?

Thanks,
Teresa

2014-03-21  Teresa Johnson  tejohn...@google.com

	* ipa-utils.c (ipa_print_order): Use specified dump file.

Index: ipa-utils.c
===================================================================
--- ipa-utils.c (revision 208492)
+++ ipa-utils.c (working copy)
@@ -55,7 +55,7 @@ ipa_print_order (FILE* out,
   fprintf (out, "\n\n ordered call graph: %s\n", note);
   for (i = count - 1; i >= 0; i--)
-    dump_cgraph_node (dump_file, order[i]);
+    dump_cgraph_node (out, order[i]);
   fprintf (out, "\n");
   fflush (out);
 }

--
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
C++ PATCH for c++/60574 (ICE with 'virtual auto')
We were failing to give an error for B::foo not having a definition to
deduce the return type from; an easy way to avoid the ICE is to promote
the existing permerror about 'virtual auto' to a full error.

Tested x86_64-pc-linux-gnu, applying to trunk.

commit 62fa8dfdd93b964246d06f3d5b41a3c2659509f3
Author: Jason Merrill ja...@redhat.com
Date:   Thu Mar 20 17:14:19 2014 -0400

    PR c++/60574
    * decl.c (grokdeclarator): Change permerror about 'virtual auto'
    to error.

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 4eb3e69..c912ffc 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -9553,8 +9553,8 @@ grokdeclarator (const cp_declarator *declarator,
                              "-std=gnu++1y");
                 }
               else if (virtualp)
-                permerror (input_location, "virtual function cannot "
-                           "have deduced return type");
+                error ("virtual function cannot "
+                       "have deduced return type");
             }
           else if (!is_auto (type))
             {
diff --git a/gcc/testsuite/g++.dg/cpp1y/auto-fn25.C b/gcc/testsuite/g++.dg/cpp1y/auto-fn25.C
new file mode 100644
index 000..628a685
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/auto-fn25.C
@@ -0,0 +1,15 @@
+// PR c++/60574
+// { dg-options "-flto" }
+// { dg-do compile { target c++1y } }
+
+struct A
+{
+  virtual auto foo() {} // { dg-error "virtual.*deduced" }
+};
+
+struct B : A
+{
+  auto foo();
+};
+
+B b;