can_refer_decl_in_current_unit_p TLC
Hi, can_refer_decl_in_current_unit_p is a black magic that carefuly papers around a mine field created by combination of C++ ABI and ELF visibility attributes. As such it is not pretty and grown up into quite a maze of rules. Well, I think it is time to clean it up: for example it does twice very similar check - for statics and for comdats it checks that definition is still around, but the code used is very different (one written before symtab, other more modern). It is also wrong in some cases that apparently don't happen in pracitce; for example it check presence of symtab node for static symbol but it does not check that the definition is actually around to be output. This patch cleans it up and should make it more robust and also less conservative for WHOPR and COMDATs put into other partition. It also fixes tree-optimization/60899 (and updates devirt-11.C testcase that tests the wrong devirtualization happens), I will send release branch fix shortly. Bootstrapped/regtested x86_64-linux, will comitted. Honza PR tree-optimization/60899 * gimple-fold.c (can_refer_decl_in_current_unit_p): Cleanup; assume all static symbols will have definition wile parsing and check the do have definition later in compilation; check that variable referring symbol will be output before concluding that reference is safe; be conservative for referring local statics; be more precise about when comdat is output in other partition. g++.dg/ipa/devirt-11.C: Update template. Index: gimple-fold.c === --- gimple-fold.c (revision 210653) +++ gimple-fold.c (working copy) @@ -93,8 +93,12 @@ can_refer_decl_in_current_unit_p (tree d /* Static objects can be referred only if they was not optimized out yet. */ if (!TREE_PUBLIC (decl) && !DECL_EXTERNAL (decl)) { + /* Before we start optimizing unreachable code we can be sure all +static objects are defined. */ + if (cgraph_function_flags_ready) + return true; snode = symtab_get_node (decl); - if (!snode) + if (!snode || !snode->definition) return false; node = dyn_cast (snode); return !node || !node->global.inlined_to; @@ -102,10 +106,12 @@ can_refer_decl_in_current_unit_p (tree d /* We will later output the initializer, so we can refer to it. So we are concerned only when DECL comes from initializer of - external var. */ + external var or var that has been optimized out. */ if (!from_decl || TREE_CODE (from_decl) != VAR_DECL - || !DECL_EXTERNAL (from_decl) + || (!DECL_EXTERNAL (from_decl) + && (vnode = varpool_get_node (from_decl)) != NULL + && vnode->definition) || (flag_ltrans && symtab_get_node (from_decl)->in_other_partition)) return true; @@ -122,9 +128,9 @@ can_refer_decl_in_current_unit_p (tree d reference imply need to include function body in the curren tunit. */ if (TREE_PUBLIC (decl) && !DECL_COMDAT (decl)) return true; - /* We are not at ltrans stage; so don't worry about WHOPR. - Also when still gimplifying all referred comdat functions will be - produced. + /* We have COMDAT. We are going to check if we still have definition + or if the definition is going to be output in other partition. + Bypass this when gimplifying; all needed functions will be produced. As observed in PR20991 for already optimized out comdat virtual functions it may be tempting to not necessarily give up because the copy will be @@ -133,35 +139,17 @@ can_refer_decl_in_current_unit_p (tree d units where they are used and when the other unit was compiled with LTO it is possible that vtable was kept public while the function itself was privatized. */ - if (!flag_ltrans && (!DECL_COMDAT (decl) || !cgraph_function_flags_ready)) + if (!cgraph_function_flags_ready) return true; - /* OK we are seeing either COMDAT or static variable. In this case we must - check that the definition is still around so we can refer it. */ - if (TREE_CODE (decl) == FUNCTION_DECL) -{ - node = cgraph_get_node (decl); - /* Check that we still have function body and that we didn't took - the decision to eliminate offline copy of the function yet. - The second is important when devirtualization happens during final - compilation stage when making a new reference no longer makes callee - to be compiled. */ - if (!node || !node->definition || node->global.inlined_to) - { - gcc_checking_assert (!TREE_ASM_WRITTEN (decl)); - return false; - } -} - else if (TREE_CODE (decl) == VAR_DECL) -{ - vnode = varpool_get_node (decl); - if (!vnode || !vnode->definition) - { - gcc_checking_assert (!TREE_ASM_WRITTEN (decl)); - return false; - } -}
Silence devirt-11.C testcase on GCC 4.9
Hi, devirt-11.C is somewhat fragile testcase that originally was testing a specific series of events in inliner that are no longer happening anyway. It tests for a specific number of devirutalization that depends on several aspects. It seems more practical to relax this check, because it is kind of pointless. Regtested and commited to gcc-4.9 branch. Honza Index: g++.dg/ipa/devirt-11.C === --- g++.dg/ipa/devirt-11.C (revision 210672) +++ g++.dg/ipa/devirt-11.C (working copy) @@ -45,5 +45,5 @@ bar () /* While inlining function called once we should devirtualize a new call to fn2 and two to fn3. While doing so the new symbol for fn2 needs to be introduced. */ -/* { dg-final { scan-ipa-dump-times "Discovered a virtual call to a known target" 1 "inline" } } */ +/* { dg-final { scan-ipa-dump "Discovered a virtual call to a known target" "inline" } } */ /* { dg-final { cleanup-ipa-dump "inline" } } */ Index: ChangeLog === --- ChangeLog (revision 210672) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2014-05-18 Jan Hubicka + + PR middle-end/58094 + * g++.dg/ipa/devirt-11.C: Be lax about number of devirtualizations. + 2014-05-18 Eric Botcazou * gnat.dg/enum3.adb: New test.
Fix AIX bootstrap failure
Hi, on AIX GCC 4.9 as well as mainline dies on callgraph corruption problem. THis is caused by ipa-inline walking node that is removed by ipa-inline-transform (because it is a local alias that become unreachable). This patch solves it by adding a hook to inform user of inline_call when this happen. Bootstrapped ppc-aix, bootstrapped/regtested x86_64-linux, commited to mainline and branch. PR bootstrap/60984 * ipa-inline-transform.c (inline_call): Use add CALLEE_REMOVED parameter. * ipa-inline.c (inline_to_all_callers): If callee was removed; return. (ipa_inline): Loop inline_to_all_callers until no more aliases are removed. Index: ipa-inline-transform.c === --- ipa-inline-transform.c (revision 210624) +++ ipa-inline-transform.c (working copy) @@ -214,6 +214,7 @@ it is NULL. If UPDATE_OVERALL_SUMMARY is false, do not bother to recompute overall size of caller after inlining. Caller is required to eventually do it via inline_update_overall_summary. + If callee_removed is non-NULL, set it to true if we removed callee node. Return true iff any new callgraph edges were discovered as a result of inlining. */ @@ -221,7 +222,8 @@ bool inline_call (struct cgraph_edge *e, bool update_original, vec *new_edges, -int *overall_size, bool update_overall_summary) +int *overall_size, bool update_overall_summary, +bool *callee_removed) { int old_size = 0, new_size = 0; struct cgraph_node *to = NULL; @@ -260,6 +262,8 @@ { next_alias = cgraph_alias_target (alias); cgraph_remove_node (alias); + if (callee_removed) + *callee_removed = true; alias = next_alias; } else Index: ipa-inline.c === --- ipa-inline.c(revision 210624) +++ ipa-inline.c(working copy) @@ -1971,6 +1971,8 @@ inline_to_all_callers (struct cgraph_node *node, void *data) { int *num_calls = (int *)data; + bool callee_removed = false; + while (node->callers && !node->global.inlined_to) { struct cgraph_node *caller = node->callers->caller; @@ -1987,7 +1989,7 @@ inline_summary (node->callers->caller)->size); } - inline_call (node->callers, true, NULL, NULL, true); + inline_call (node->callers, true, NULL, NULL, true, &callee_removed); if (dump_file) fprintf (dump_file, " Inlined into %s which now has %i size\n", @@ -1997,8 +1999,10 @@ { if (dump_file) fprintf (dump_file, "New calls found; giving up.\n"); - return true; + return callee_removed; } + if (callee_removed) + return true; } return false; } @@ -2244,8 +2248,9 @@ int num_calls = 0; cgraph_for_node_and_aliases (node, sum_callers, &num_calls, true); - cgraph_for_node_and_aliases (node, inline_to_all_callers, - &num_calls, true); + while (cgraph_for_node_and_aliases (node, inline_to_all_callers, + &num_calls, true)) + ; remove_functions = true; } } Index: ipa-inline.h === --- ipa-inline.h(revision 210624) +++ ipa-inline.h(working copy) @@ -236,7 +236,8 @@ bool speculation_useful_p (struct cgraph_edge *e, bool anticipate_inlining); /* In ipa-inline-transform.c */ -bool inline_call (struct cgraph_edge *, bool, vec *, int *, bool); +bool inline_call (struct cgraph_edge *, bool, vec *, int *, bool, + bool *callee_removed = NULL); unsigned int inline_transform (struct cgraph_node *); void clone_inlined_nodes (struct cgraph_edge *e, bool, bool, int *, int freq_scale);
Re: [PATCH] Install sanitizer public headers (fix for PR sanitizer/61100)
> The new test fails on x86_64 Ubuntu 12.04 native build: Yes, this should be fixed once kcc's patches get into trunk (https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00756.html). Current asan_interface.h is indeed C++-only. -Y
Re: Add flag to optionally ignore ELF interposition
> On Tue, 20 May 2014, Jan Hubicka wrote: > > > Hi, > > as disucssed some time ago, our assumption that every symbol of shared > > library can > > be interposed at runtime is expensive and prevents a lot of useful > > optimizations, > > including inlining or IPA propagation. > > > > While this is useful feature, it is rather incommon to use it for bigger C++ > > projects, like firefox and at least clang seems to ignore the ELF > > interposition > > rules and always inline/propagate. This patch adds flag to control the > > behaviour. > > Symbols explicitly delcared WEAK are still considered as interposable. > > I could see a use for an option listing the symbols that can be > interposed. (For example, glibc supports interposition of a limited > number of malloc-related symbols, but if it were made to support building > with LTO in future that it would make sense to be able to optimize calls > to most other functions.) I tought glibc uses handcoded local aliases for all its exported symbols except for those where interposition is allowed. But yes, this mechanism does kind of similar thing - in addition to enabling optimizations ipa.c knows how to create local aliases for those symbols. Compiling: $ cat ~/t.c __attribute__ ((noinline)) t() { printf ("test\n"); } q() { t(); } $ ./xgcc -B ./ -O2 ~/t.c -fno-semantic-interposition -S -fPIC will get you: .sett.localalias.0,t .globl q .type q, @function q: xorl%eax, %eax jmp t.localalias.0 which is sort of what many libraries does by hand. I wonder if this can't be best modeled as yet another visibility setting. I.e. having visibility=default with interposition assumed and visibility=default-no-semantic-interposition with visibility not allowed. This can be both for -fvisibility flag and for attributes? Honza > > -- > Joseph S. Myers > jos...@codesourcery.com
Re: Eliminate write-only variables
> > Unfortunately, this commit has caused the following ICE for me when > LTO building 471.omnetpp from SPEC 2006 or Firefox (but not libxul, > something that gets built earlier): > > lto1: internal compiler error: in gimple_get_virt_method_for_vtable, > at gimple-fold.c:3276 > 0x7437a3 gimple_get_virt_method_for_vtable(long, tree_node*, unsigned > long, bool*) > /home/mjambor/gcc/bisect/src/gcc/gimple-fold.c:3276 > 0x743993 gimple_get_virt_method_for_binfo(long, tree_node*, bool*) > /home/mjambor/gcc/bisect/src/gcc/gimple-fold.c:3377 > 0x7913f2 possible_polymorphic_call_targets(tree_node*, long, > ipa_polymorphic_call_context, bool*, void**, int*) > /home/mjambor/gcc/bisect/src/gcc/ipa-devirt.c:1697 > 0x7b73a9 possible_polymorphic_call_targets > /home/mjambor/gcc/bisect/src/gcc/ipa-utils.h:121 > 0x7b73a9 walk_polymorphic_call_targets > /home/mjambor/gcc/bisect/src/gcc/ipa.c:177 > 0x7b73a9 symtab_remove_unreachable_nodes(bool, _IO_FILE*) > /home/mjambor/gcc/bisect/src/gcc/ipa.c:407 > 0x86bf47 execute_todo > /home/mjambor/gcc/bisect/src/gcc/passes.c:1843 > Please submit a full bug report... > > I compile omnetpp with -Ofast -g -flto=8 -fwhole-program > -funroll-loops -fpeel-loops -march=native -mtune=native > > I'm afraid I won't be able to prepare a more reduced testcase very > soon. Actually I think where is a problem: when the variable is completely dead (that is no writes, reads and address taken), we still set the writeonly flag and clear DECL_INITIAL. This may happen for vtables still in use by the type based devirt machinery. The following patch should fix it. Bootstrapped/regtested and comitted to mainline. It also fixes the debug output issue you pointed out. Index: ipa.c === --- ipa.c (revision 210653) +++ ipa.c (working copy) @@ -730,7 +730,7 @@ ipa_discover_readonly_nonaddressable_var if (!address_taken) { if (TREE_ADDRESSABLE (vnode->decl) && dump_file) - fprintf (dump_file, " %s (addressable)", vnode->name ()); + fprintf (dump_file, " %s (non-addressable)", vnode->name ()); varpool_for_node_and_aliases (vnode, clear_addressable_bit, NULL, true); } if (!address_taken && !written @@ -743,7 +743,7 @@ ipa_discover_readonly_nonaddressable_var fprintf (dump_file, " %s (read-only)", vnode->name ()); varpool_for_node_and_aliases (vnode, set_readonly_bit, NULL, true); } - if (!vnode->writeonly && !read && !address_taken) + if (!vnode->writeonly && !read && !address_taken && written) { if (dump_file) fprintf (dump_file, " %s (write-only)", vnode->name ());
Re: [PATCH] Fix ARM NAN fraction bits
On Tue, 20 May 2014, Richard Earnshaw wrote: > >>> If it's not broken in 4.8.2 but broken on the branch head then it's OK > >>> for the branch. > >> > >> I thought I'd double-check with you that it is fine to push this change > >> to trunk first. OK to apply? > > > > Of course it should go to trunk (and 4.9 branch) first, but I'm not > > the one to approve that. > > > > I can... Based on Joseph's review, OK. I have now applied it to trunk and backported to 4.9 and 4.8. Thank you both for the review. Maciej
Re: [PATCH] Fix ICE in rtl-optimization/PR61220, PR61225
On 21 May 2014 00:54, Jeff Law wrote: > On 05/20/14 01:11, Zhenqiang Chen wrote: >> >> Hi, >> >> The patch fix ICE issue triggered by shrink-wrapping enhancement. >> >> Bootstrap and no make check regression on X86-64. >> >> OK for trunk? >> >> Thanks! >> -Zhenqiang >> >> >> 2014-05-20 Zhenqiang Chen >> >> PR rtl-optimization/61220 >> Part of PR rtl-optimization/61225 >> * shrink-wrap.c (move_insn_for_shrink_wrap): Skip SP and FP >> adjustment >> insn; skip split_edge for a block without only on successor. >> >> testsuite/ChangeLog: >> 2014-05-20 Zhenqiang Chen >> >> * gcc.dg/pr61220.c: New test. >> * gcc.dg/shrink-wrap-loop.c: Disable for x86_64 -m32 mode. >> > [ ... ] > > > > >> /* Make sure that the source register isn't defined later in BB. */ >> @@ -204,6 +209,10 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn, >> /* Create a new basic block on the edge. */ >> if (EDGE_COUNT (next_block->preds) == 2) >> { >> + /* split_edge for a block with only one successor is meaningless. >> */ >> + if (EDGE_COUNT (bb->succs) == 1) >> + return false; >> + >> next_block = split_edge (live_edge); > > So don't you just want to use: > > if (EDGE_CRITICAL_P (live_edge)) > > to replace the two explicit checks of EDGE_COUNT on the preds & succs? Thanks for the comment. In the code, there are 4 combinations of EDGE_COUNT: <1, 1>, <1, 2>, <2, 1> and <2, 2>. <2, 1> is "illegal". <2, 2> is legal, but need split_edge. <1, *> can bypass the second check. EDGE_CRITICAL_P can only distinguish <2, 2> from others. So I think two explicit checks is more efficient than EDGE_CRITICAL_P. Thanks! -Zhenqiang
Re: [PATCH] Install sanitizer public headers (fix for PR sanitizer/61100)
On May 14, 2014, at 4:13 AM, Yury Gribov wrote: > Hi, > > Asan and Tsan allow sanitized applications to tweak runtime behavior via API > defined in headers in libsanitizer/include/sanitizer. This patch adds > installation code for these headers and a small test. > > Bootstrapped and regtested on x64. > > -Y > The new test fails on x86_64 Ubuntu 12.04 native build: /home/maxim-kuvyrkov/build/gcc-native-e-m-s-f/gcc/xgcc -B/home/maxim-kuvyrkov/build/gcc-native-e-m-s-f/gcc/ /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/c-c++-common/asan/asan-interface-1.c -B/home/maxim-kuvyrkov/build/gcc-native-e-m-s-f/x86_64-unknown-linux-gnu/./libsanitizer/ -B/home/maxim-kuvyrkov/build/gcc-native-e-m-s-f/x86_64-unknown-linux-gnu/./libsanitizer/asan/ -L/home/maxim-kuvyrkov/build/gcc-native-e-m-s-f/x86_64-unknown-linux-gnu/./libsanitizer/asan/.libs -fsanitize=address -g -I/home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/../../libsanitizer/include -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -lm -o ./asan-interface-1.exe In file included from /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/c-c++-common/asan/asan-interface-1.c:5:0: /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/../../libsanitizer/include/sanitizer/asan_interface.h:53:3: error: unknown type name 'bool' /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/../../libsanitizer/include/sanitizer/asan_interface.h:66:40: error: unknown type name 'bool' /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/../../libsanitizer/include/sanitizer/asan_interface.h:88:3: error: unknown type name 'bool' /home/maxim-kuvyrkov/build/src/gcc-e-m-s-f/gcc/testsuite/../../libsanitizer/include/sanitizer/asan_interface.h:98:3: error: unknown type name 'bool' compiler exited with status 1 -- Maxim Kuvyrkov www.linaro.org
[PATCH] proposed fix for bug # 61144
Bug # 61144 is a regression in 4.9.0 that breaks building of musl libc due to aggressive and semantically-incorrect constant folding of weak aliases. The attached patch seems to fix the issue. A weak alias should never be a candidate for constant folding because it may always be replaced by a strong definition from another translation unit. For details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144 I do not have a copyright assignment on file but this patch should be sufficiently trivial not to require it. Rich diff --git a/gcc/varpool.c b/gcc/varpool.c index b426757..905047e 100644 --- a/gcc/varpool.c +++ b/gcc/varpool.c @@ -195,6 +195,8 @@ ctor_for_folding (tree decl) { gcc_assert (!DECL_INITIAL (decl) || DECL_INITIAL (decl) == error_mark_node); + if (DECL_WEAK (decl)) +return error_mark_node; if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl))) { node = varpool_alias_target (node);
[patch, lto] add testcase for PR60179
One of the consequences of the (now-fixed) bug in PR60179 is that Nios II code using target pragmas to specify custom instructions failed to generate those instructions with -flto. We came up with this test case for it, but there didn't seem to be an existing hook to scan LTO output. Is this OK, or is there a better way to do it? -Sandra 2014-05-20 Cesar Philippidis Sandra Loosemore gcc/testsuite/ * lib/scanasm.exp (scan-lto-assembler): New procedure. * gcc.target/nios2/custom-fp-lto.c: New test. Index: gcc/testsuite/lib/scanasm.exp === --- gcc/testsuite/lib/scanasm.exp (revision 210659) +++ gcc/testsuite/lib/scanasm.exp (working copy) @@ -500,3 +500,13 @@ proc dg-function-on-line { args } { append final-code "$cmd\n" } + +# Look for a pattern in the .exe.ltrans0.s file produced by the +# compiler. See dg-scan for details. + +proc scan-lto-assembler { args } { +set testcase [testname-for-summary] +set output_file "[file rootname [file tail $testcase]].exe.ltrans0.s" +verbose "output_file: $output_file" +dg-scan "scan-assembler" 1 $testcase $output_file $args +} Index: gcc/testsuite/gcc.target/nios2/custom-fp-lto.c === --- gcc/testsuite/gcc.target/nios2/custom-fp-lto.c (revision 0) +++ gcc/testsuite/gcc.target/nios2/custom-fp-lto.c (revision 0) @@ -0,0 +1,29 @@ +/* Test specification of custom instructions via pragma in the presence + of LTO. This test case formerly failed due to PR60179. */ + +/* { dg-do link } */ +/* { dg-require-effective-target lto } */ +/* { dg-options "-O1 -flto -save-temps" } */ + +/* -O1 in the options is significant. Without it FP operations may not be + optimized to custom instructions. */ + +#include +#include + +#pragma GCC target ("custom-fabss=224") + +float +custom_fp (float operand_a) +{ + return fabsf (operand_a); +} + +int +main (int argc, char *argv[]) +{ + return custom_fp ((float)argc) > 1.0; +} + +/* { dg-final { scan-lto-assembler "custom\\t224, " } } */ +/* { dg-final { cleanup-saved-temps } } */
RE: [PATCH] Fix PR54733 Optimize endian independent load/store
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > > I'll send the new patch as soon as all the tests are done. Here you are. I also simplified the tests a bit having the reference as a function parameter instead of a local one. Updated ChangeLogs: *** gcc/ChangeLog *** 2014-05-20 Thomas Preud'homme PR tree-optimization/54733 * tree-ssa-math-opts.c (nop_stats): New "bswap_stats" structure. (CMPNOP): Define. (find_bswap_or_nop_load): New. (find_bswap_1): Renamed to ... (find_bswap_or_nop_1): This. Also add support for memory source. (find_bswap): Renamed to ... (find_bswap_or_nop): This. Also add support for memory source and detection of bitwise operations equivalent to load in host endianness. (execute_optimize_bswap): Likewise. Also move its leading comment back in place and split statement transformation into ... (bswap_replace): This. *** gcc/testsuite/ChangeLog *** 2014-05-20 Thomas Preud'homme PR tree-optimization/54733 * gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap optimization to support memory sources and bitwise operations equivalent to load in host endianness. * gcc.dg/optimize-bswaphi-1.c: Likewise. * gcc.dg/optimize-bswapsi-2.c: Likewise. * gcc.c-torture/execute/bswap-2.c: Likewise. Best regards, Thomas gcc32rm-84.6.0.diff Description: Binary data
Re: [PATCH, sched] Cleanup and improve multipass_dfa_lookahead_guard
On Tue, May 20, 2014 at 5:46 PM, Pat Haugen wrote: > On 05/19/2014 11:27 PM, Maxim Kuvyrkov wrote: >> >> Changes to ia64 and rs6000 are mostly mechanical to update hook return >> values, but port maintainers may wish to review those. > > I'm not a maintainer, but the rs6000 changes look good to me. The rs6000 changes are okay with me too. Thanks, David
Re: patch8.diff updated Was: Re: GCC's -fsplit-stack disturbing Mach's vm_allocate
Svante Signell, le Fri 16 May 2014 10:03:05 +0200, a écrit : > is used in gcc-4.9-4.9.0/src/libgo/go/net/fd_unix.go: > func dupCloseOnExec(fd int) (newfd int, err error) { > if atomic.LoadInt32(&tryDupCloexec) == 1 && syscall.F_DUPFD_CLOEXEC!=0 { > r0, _, e1 := syscall.Syscall(syscall.SYS_FCNTL, uintptr(fd), > syscall.F_DUPFD_CLOEXEC, 0) That code can not work as it is, fcntl is not a system call on GNU/Hurd. Why isn't gccgo just using the C fcntl function? That one will just work and be portable. > +# Special treatment of EWOULDBLOCK for GNU/Hurd > +# /usr/include/bits/errno.h: #define EWOULDBLOCK EAGAIN > +if egrep 'define EWOULDBLOCK EAGAIN' gen-sysinfo.go > /dev/null 2>&1; then > + egrep '^const EWOULDBLOCK = Errno(_EWOULDBLOCK)' ${OUT} | \ > +sed -i.bak -e 's/_EWOULDBLOCK/_EAGAIN/' ${OUT} I don't understand why you both pass the output of egrep to sed, and you give the -i option to sed. AIUI, the egrep '^const EWOULDBLOCK = Errno(_EWOULDBLOCK)' part is completely unused, so you can just drop it. > @@ -528,6 +540,12 @@ > +# Special treatment of st_dev for GNU/Hurd > +# /usr/include/i386-gnu/bits/stat.h: #define st_dev st_fsid > +if grep 'define st_dev st_fsid' gen-sysinfo.go > /dev/null 2>&1; then > + egrep '^type _stat ' gen-sysinfo.go > /dev/null 2>&1| \ > + sed -i.bak -e 's/; st_fsid/; st_dev/' gen-sysinfo.go > +fi The same remark about egrep | sed -i applies here. And anyway, why not simply using the very first patch you proposed, which was: @@ -536,6 +548,7 @@ fi | sed -e 's/type _stat64/type Stat_t/' \ -e 's/type _stat/type Stat_t/' \ -e 's/st_dev/Dev/' \ + -e 's/st_fsid/Dev/' \ -e 's/st_ino/Ino/g' \ -e 's/st_nlink/Nlink/' \ -e 's/st_mode/Mode/' \ which I said several times that it should be completely fine. Samuel
RE: [PATCH][MIPS] Implement O32 FPXX ABI (GCC)
Hi Richard, Apologies for appearing to be pushy on this. I appreciate all of this work is done in your spare time, I doubt I could manage that. I missed the important part which was to ask if you had thoughts on the user visible parts of the patch but you have been through the code now anyway... Richard Sandiford writes: > Matthew Fortune writes: > > *) Dwarf debug for 64-bit values in floating point values for FPXX can't > >be strictly correct for both 32-bit and 64-bit registers but opts to > >describe one 64-bit register as that is what the FPXX ABI is > emulating. > >I have not yet checked what exactly happens in GDB when confronted > with > >this and 32-bit registers. This also impacts frame information > described > >via mips_save_reg and mips_restore_reg. Advice on this would be > >appreciated. > > I'm not sure what's best either. Clearly it's something that needs > to be spelled out in the ABI, but I can imagine it would be dictated > by what consumers like the unwinder and gdb find easiest to handle. I'll be looking into this in detail this week. > > *) Because GCC can be built to have mfpxx or mfp64 as the default option > >the ASM_SPEC has to handle these specially such that they are not > >passed in conjunction with -msingle-float. Depending on how all this > >option handling pans out then this may also need to account for > >msoft-float as well. It is an error to have -msoft-float and -mfp64 in > >the assembler. > > The assembler and GCC shouldn't treat the options differently though. > Either it should be OK for both or neither. I wasn't sure if you were applying this rule to options set by --with- at configure time as well as user supplied options. If I move this logic to OPTION_DEFAULT_SPECS instead and not apply the --with-fp option if -msingle-float is given then that will fix the issue too. Is that OK? > > @@ -5141,7 +5141,7 @@ mips_get_arg_info (struct mips_arg_info *info, > const CUMULATIVE_ARGS *cum, > > || SCALAR_FLOAT_TYPE_P (type) > > || VECTOR_FLOAT_TYPE_P (type)) > > && (GET_MODE_CLASS (mode) == MODE_FLOAT > > -|| mode == V2SFmode) > > +|| (TARGET_PAIRED_SINGLE_FLOAT && mode == V2SFmode)) > > && GET_MODE_SIZE (mode) <= UNITS_PER_FPVALUE); > >break; > > This looks odd. We shouldn't have V2SF values if there's no ISA support > for them. True. This is a safety measure against future vector support. I/We wish to completely restrict this ABI extension to the paired single-float hardware feature. This detail is easy to forget, I would like to keep the check but you get final call of course. > > @@ -5636,7 +5636,7 @@ mips_return_fpr_pair (enum machine_mode mode, > > { > >int inc; > > > > - inc = (TARGET_NEWABI ? 2 : MAX_FPRS_PER_FMT); > > + inc = ((TARGET_NEWABI || mips_abi == ABI_32) ? 2 : MAX_FPRS_PER_FMT); > > Formatting nit: no extra brackets here. Is this a no brackets at all case, or brackets on the condition? There are both styles in GCC but it is difficult to tell which is most common/correct. > > @@ -6508,13 +6508,27 @@ mips_output_64bit_xfer (char direction, unsigned > int gpreg, unsigned int fpreg) > >if (TARGET_64BIT) > > fprintf (asm_out_file, "\tdm%cc1\t%s,%s\n", direction, > > reg_names[gpreg], reg_names[fpreg]); > > - else if (TARGET_FLOAT64) > > + else if (ISA_HAS_MXHC1) > > { > >fprintf (asm_out_file, "\tm%cc1\t%s,%s\n", direction, > >reg_names[gpreg + TARGET_BIG_ENDIAN], reg_names[fpreg]); > >fprintf (asm_out_file, "\tm%chc1\t%s,%s\n", direction, > >reg_names[gpreg + TARGET_LITTLE_ENDIAN], reg_names[fpreg]); > > } > > + else if (TARGET_FLOATXX && direction == 't') > > +{ > > + /* Use the argument save area to move via memory. */ > > + fprintf (asm_out_file, "\tsw\t%s,0($sp)\n", reg_names[gpreg]); > > + fprintf (asm_out_file, "\tsw\t%s,4($sp)\n", reg_names[gpreg + 1]); > > + fprintf (asm_out_file, "\tldc1\t%s,0($sp)\n", reg_names[fpreg]); > > +} > > + else if (TARGET_FLOATXX && direction == 'f') > > +{ > > + /* Use the argument save area to move via memory. */ > > + fprintf (asm_out_file, "\tsdc1\t%s,0($sp)\n", reg_names[fpreg]); > > + fprintf (asm_out_file, "\tlw\t%s,0($sp)\n", reg_names[gpreg]); > > + fprintf (asm_out_file, "\tlw\t%s,4($sp)\n", reg_names[gpreg + 1]); > > +} > > The argument save area might be in use. E.g. if an argument register > gets spilled, we'll generally try to spill it to the save area rather > than create a new stack slot for it. > > This case should always be handled via SECONDARY_MEMORY_NEEDED. It is handled via SECONDARY_MEMORY_NEEDED for normal code generation. This function needs a better name! It is solely used as part of generating a mips16 stub and as such the argument save area is fair game as far as I can see,
Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.
> Yes, but why is that better than providing them directly? The latter > would seem to work better with non-C-family languages like Ada. That's correct, enumeration types don't have base types in Ada, i.e. they are their own base types. But if the attributes cannot be expressed in DWARF, then synthesizing a base type on the fly would probably be OK. -- Eric Botcazou
Re: [PATCH] Implement -fsanitize=float-cast-overflow (take 2)
On Tue, 20 May 2014, Marek Polacek wrote: > * is missing tests for long doubles/-mlong-double-128, Also missing tests for float - as far as I can see, only double is tested. Ideally all of float, double, long double, __float128 (where supported), __float80 (where supported) would be tested (the functionality supported for __fp16 (ARM) is a bit more restricted) - hopefully using some shared macros to avoid too much duplication between tests. > * doesn't instrument _Decimal to integer conversions yet. So the code > + else if (REAL_MODE_FORMAT (mode)->b == 10) > +{ > + /* For _Decimal128 up to 34 decimal digits, - sign, > + dot, e, exponent. */ isn't actually being used yet? -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, sched] Cleanup and improve multipass_dfa_lookahead_guard
On 05/19/2014 11:27 PM, Maxim Kuvyrkov wrote: Changes to ia64 and rs6000 are mostly mechanical to update hook return values, but port maintainers may wish to review those. I'm not a maintainer, but the rs6000 changes look good to me. -Pat
Re: Add flag to optionally ignore ELF interposition
On Tue, 20 May 2014, Jan Hubicka wrote: > Hi, > as disucssed some time ago, our assumption that every symbol of shared > library can > be interposed at runtime is expensive and prevents a lot of useful > optimizations, > including inlining or IPA propagation. > > While this is useful feature, it is rather incommon to use it for bigger C++ > projects, like firefox and at least clang seems to ignore the ELF > interposition > rules and always inline/propagate. This patch adds flag to control the > behaviour. > Symbols explicitly delcared WEAK are still considered as interposable. I could see a use for an option listing the symbols that can be interposed. (For example, glibc supports interposition of a limited number of malloc-related symbols, but if it were made to support building with LTO in future that it would make sense to be able to optimize calls to most other functions.) -- Joseph S. Myers jos...@codesourcery.com
Re: RFA: cache enabled attribute by insn code
Jeff Law writes: > On 05/20/14 02:16, Richard Sandiford wrote: >> get_attr_enabled was showing up high in a -O0 compile of fold-const.ii. >> At the moment, extract_insn calls this function for every alternative >> on each extraction, which can be expensive for instructions like >> moves that have many alternatives. >> >> The attribute is only supposed to depend on the insn code and the >> current target. It isn't allowed to depend on the values of operands. >> LRA already makes use of this to cache the enabled attributes based on code. >> >> This patch makes the caching compiler-wide. It also converts the bool >> array into a plain int bitfield, since at the moment we don't allow more >> than 30 alternatives. (It's possible we might want to increase it >> beyond 32 at some point, but then we can change the type to uint64_t >> instead. Wanting to go beyond 64 would suggest deeper problems >> somewhere. :-)) >> >> The patch gives a consistent compile-time improvement of about ~3.5% >> on the -O0 fold-const.ii testcase. >> >> There were a few instances of the construct: >> >>cl = NO_REGS; >>for (ignore_p = false, curr_alt = 0; >> (c = *constraints); >> constraints += CONSTRAINT_LEN (c, constraints)) >> if (c == '#' || !recog_data.alternative_enabled_p[curr_alt]) >>ignore_p = true; >> else if (c == ',') >>{ >> curr_alt++; >> ignore_p = false; >>} >> else if (! ignore_p) >> >> This had the effect of ignoring all alternatives after the first >> attribute-disabled one, since once we found a disabled alternative we'd >> always enter the first arm of the "if" and never increment curr_alt. >> I don't think that was intentional. >> >> Tested on x86_64-linux-gnu. OK to install? >> >> I notice ira_setup_alts is using a HARD_REG_SET to store a mask >> of alternatives. If this patch is OK, I'll follow up with a patch >> to use alternative_mask there too. >> >> Thanks, >> Richard >> >> >> gcc/ >> * system.h (TEST_BIT): New macro. >> * recog.h (alternative_mask): New type. >> (ALL_ALTERNATIVES, ALTERNATIVE_BIT): New macros. >> (recog_data_d): Replace alternative_enabled_p array with >> enabled_alternatives. >> (target_recog): New structure. >> (default_target_recog, this_target_recog): Declare. >> (get_enabled_alternatives): Likewise. >> * recog.c (default_target_recog, this_target_recog): New variables. >> (get_enabled_alternatives): New function. >> (extract_insn): Use it. >> (preprocess_constraints, constrain_operands): Adjust for change to >> recog_data. >> * postreload.c (reload_cse_simplify_operands): Likewise. >> * reload.c (find_reloads): Likewise. >> * ira-costs.c (record_reg_classes): Likewise. >> * ira-lives.c (single_reg_class): Likewise. Fix bug in which >> all alternatives after a disabled one would be skipped. >> (ira_implicitly_set_insn_hard_regs): Likewise. >> * ira.c (commutative_constraint_p): Likewise. >> (ira_setup_alts): Adjust for change to recog_data. >> * lra-int.h (lra_insn_recog_data): Replace alternative_enabled_p >> with enabled_alternatives. >> * lra.c (free_insn_recog_data): Update accordingly. >> (lra_update_insn_recog_data): Likewise. >> (lra_set_insn_recog_data): Likewise. Use get_enabled_alternatives. >> * lra-constraints.c (process_alt_operands): Likewise. Handle >> only_alternative as part of the enabled mask. >> * target-globals.h (this_target_recog): Declare. >> (target_globals): Add a recog field. >> (restore_target_globals): Restore this_target_recog. >> * target-globals.c: Include recog.h. >> (default_target_globals): Initialize recog field. >> (save_target_globals): Likewise. > This is OK for the trunk (referring to the follow-up message which fixed > EWRONGPATCH. Sorry, while working on the follow-up LRA patch, I realised I hadn't accounted for target changes that happen directly via target_reinit (rather than SWITCHABLE_TARGETS) and cases where reinit_regs is called to change just the register information. Both could potentially affect the enabled attribute. This version adds a recog_init function that clears the data if necessary. There are no other changes from first time. Is this still OK? Thanks, Richard gcc/ * system.h (TEST_BIT): New macro. * recog.h (alternative_mask): New type. (ALL_ALTERNATIVES, ALTERNATIVE_BIT): New macros. (recog_data_d): Replace alternative_enabled_p array with enabled_alternatives. (target_recog): New structure. (default_target_recog, this_target_recog): Declare. (get_enabled_alternatives, recog_init): Likewise. * recog.c (default_target_recog, this_target_recog): New variables. (get_enabled_alternatives): New function. (extract_insn): Use it. (recog_init): New function. (preprocess_constrai
[committed] PR 61243: Missing copy of CROSSING_JUMP_P
My CROSSING_JUMP_P patch didn't handle insns that were copied by emit_copy_of_insn_after, which previously would copy REG_CROSSING_JUMP notes by virtue of them being != REG_LABEL_OPERAND. This patch adds the missing copy. Tested with profiledbootstrap on x86_64-linux-gnu. Committed as obvious. Thanks, Richard gcc/ PR rtl-optimization/61243 * emit-rtl.c (emit_copy_of_insn_after): Copy CROSSING_JUMP_P. Index: gcc/emit-rtl.c === --- gcc/emit-rtl.c 2014-05-20 20:20:24.525295969 +0100 +++ gcc/emit-rtl.c 2014-05-20 22:18:18.894580150 +0100 @@ -6027,6 +6027,7 @@ emit_copy_of_insn_after (rtx insn, rtx a case JUMP_INSN: new_rtx = emit_jump_insn_after (copy_insn (PATTERN (insn)), after); + CROSSING_JUMP_P (new_rtx) = CROSSING_JUMP_P (insn); break; case DEBUG_INSN:
Re: [PATCH] Fix PR middle-end/61141
On 05/19/14 15:20, John David Anglin wrote: The problem compiling c-common.c is very hard to debug. These routines are called intensively and the ICE occurs after several million calls. It takes a couple of hours of running under gdb to reach the failing call to reset_insn_used_flags just counting calls up to ICE. I'd suggest getting a suitable .i/.ii file and debugging it with a cross-debugger ;-) Then again, if you've got a breakpoint that is triggering a million times, it may still take hours to hit when debugging with a cross. The problem is more apparent with 64-bit compiler due to way PIC register is saved and restored. Yea, that makes sense. In c#2 you indicate we can get this stuff when an insn in a delay slot sequence is deleted. Where/when are those insns being deleted? Presumably the backends know enough to handle deleted insns in delay slot sequences?!? It's been a long time since I looked at that code in detail, so if you already know it's safe, just say "it's safe" and I'll believe you :-) Otherwise a bit of digging may be needed. I believe that the backend must handle the deleted insns in the delay slot as there are are no compilation errors or regressions. However, like you, I'm not 100% certain this done correctly. I'm pretty sure we're getting this wrong in the backend. In fact, the more I think about it, a NOTE_INSN_DELETED in a delay slot is just asking for all kinds of trouble. In general, if we have an insn with a filled delay slot, then we will emit the two insns independently of each other (most of the time, there are exceptions). A NOTE_INSN_DELETED in a delay slot still looks like a filled slot. So the target code isn't going to emit a NOP or anything like that. It's going to leave it up to the generic code to emit the insn with the delay slot and the delay slot insn itself (the NOTE_INSN_DELETED in this case). Of course we don't output anything for a NOTE_INSN_DELETED. So the net result is we, in essence, fill the delay slot with whatever random instruction happens to fall next in the insn chain. Amazingly, nothing seems to be failing, but I've seen far worse bugs go unnoticed for a long time. Sadly,I think we need to start digging deeper to find out what's deleting those insns and take corrective action. Perhaps writing a little routine to peek at all the filled delay slots and squawk if it finds a NOTE_INSN_DELETED. Then call it after each RTL pass that follows reorg in the pass manager. That'd at least narrow it down to a pass that's mucking things up. Something has changed. Either these insns aren't being removed from RTL chain as they were before or they are being deleted at a much later stage. The same code is present in 4.9 and the problem doesn't seem to occur there. Possibly, a regression search would pin point this. Probably just gone latent. Still, we shouldn't call reset_insn_used_flags or verify_insn_sharing with a note. Agreed, but at least in this instance, it's the symptom of a deeper problem I think. jeff
Re: Eliminate write-only variables
Hi, On Fri, May 16, 2014 at 07:25:59PM +0200, Jan Hubicka wrote: > Hi, > this patch adds code to remove write only static variables. While analyzing > effectivity of LTO on firefox, I noticed that surprisingly large part of > binary's data segment is occupied by these. Fixed thus. > (this is quite trivial transformation, I just never considered it important > enough to work on it). > > The patch goes by marking write only variables in ipa.c (at same time we > discover addressable flag) and also fixes handling of the flags for > aliases. References to variables are then removed by fixup_cfg. > As first cut, I only remove stores without side effects, so copies from > volatile variables are preserved. I also kill LHS of function calls. > I do not attempt to remove asm statements. This means that some references > may be left in the code and therefore the IPA code does not eliminate the > referneces after discovering write only variable and instead it relies > on dead variable elimination to do the job later. Consequently not all write > only variables are removed with WHOPR in the case the references ends up > in different partitions. Something I can address incrementally. > > Also I think dwarf2out should be updated to mark value of the write only > variables as optimized out. Jakub, can you help me with this? > (I do not think it is valid to output the optimized out value of constructor) > > Bootstrapped/regtested x86_64-linux, will commit it later today. > > Honza > > * varpool.c (dump_varpool_node): Dump write-only flag. > * lto-cgraph.c (lto_output_varpool_node, input_varpool_node): Stream > write-only flag. > * tree-cfg.c (execute_fixup_cfg): Remove statements setting write-only > variables. > > > * gcc.c-torture/execute/20101011-1.c: Update testcase. > * gcc.dg/ira-shrinkwrap-prep-1.c: Update testcase. > * gcc.dg/tree-ssa/writeonly.c: New testcase. > * gcc.dg/tree-ssa/ssa-dse-6.c: Update testcase. > * gcc.dg/tree-ssa/pr21559.c: Update testcase. > * gcc.dg/debug/pr35154.c: Update testcase. > * gcc.target/i386/vectorize1.c: Update testcase. > * ipa.c (process_references): New function. > (set_readonly_bit): New function. > (set_writeonly_bit): New function. > (clear_addressable_bit): New function. > (ipa_discover_readonly_nonaddressable_var): Mark write only variables; > fix > handling of aliases. > * cgraph.h (struct varpool_node): Add writeonly flag. > Unfortunately, this commit has caused the following ICE for me when LTO building 471.omnetpp from SPEC 2006 or Firefox (but not libxul, something that gets built earlier): lto1: internal compiler error: in gimple_get_virt_method_for_vtable, at gimple-fold.c:3276 0x7437a3 gimple_get_virt_method_for_vtable(long, tree_node*, unsigned long, bool*) /home/mjambor/gcc/bisect/src/gcc/gimple-fold.c:3276 0x743993 gimple_get_virt_method_for_binfo(long, tree_node*, bool*) /home/mjambor/gcc/bisect/src/gcc/gimple-fold.c:3377 0x7913f2 possible_polymorphic_call_targets(tree_node*, long, ipa_polymorphic_call_context, bool*, void**, int*) /home/mjambor/gcc/bisect/src/gcc/ipa-devirt.c:1697 0x7b73a9 possible_polymorphic_call_targets /home/mjambor/gcc/bisect/src/gcc/ipa-utils.h:121 0x7b73a9 walk_polymorphic_call_targets /home/mjambor/gcc/bisect/src/gcc/ipa.c:177 0x7b73a9 symtab_remove_unreachable_nodes(bool, _IO_FILE*) /home/mjambor/gcc/bisect/src/gcc/ipa.c:407 0x86bf47 execute_todo /home/mjambor/gcc/bisect/src/gcc/passes.c:1843 Please submit a full bug report... I compile omnetpp with -Ofast -g -flto=8 -fwhole-program -funroll-loops -fpeel-loops -march=native -mtune=native I'm afraid I won't be able to prepare a more reduced testcase very soon. Thanks, Martin
[msp430] fix addneghi and add split
Minor bugfixes. Committed to head and 4.9 branch. * config/msp430/msp430.md (split): Don't allow subregs when splitting SImode adds. (andneghi): Fix subtraction logic. * config/msp430/predicates.md (msp430_nonsubreg_or_imm_operand): New. Index: config/msp430/predicates.md === --- config/msp430/predicates.md (revision 210652) +++ config/msp430/predicates.md (working copy) @@ -70,11 +70,15 @@ || INTVAL (op) == ~8 || INTVAL (op) == ~(-1) " (define_predicate "msp430_nonsubreg_operand" (match_code "reg,mem")) +(define_predicate "msp430_nonsubreg_or_imm_operand" + (ior (match_operand 0 "msp430_nonsubreg_operand") + (match_operand 0 "immediate_operand"))) + ; TRUE for constants which are bit positions for zero_extract (define_predicate "msp430_bitpos" (and (match_code "const_int") (match_test (" INTVAL (op) >= 0 && INTVAL (op) <= 15 " Index: config/msp430/msp430.md === --- config/msp430/msp430.md (revision 210652) +++ config/msp430/msp430.md (working copy) @@ -359,14 +359,14 @@ ; Split an SImode add into two HImode adds, keeping track of the carry ; so that gcc knows when it can and can't optimize away the two ; halves. (define_split [(set (match_operand:SI 0 "msp430_nonsubreg_operand") - (plus:SI (match_operand:SI 1 "nonimmediate_operand") -(match_operand:SI 2 "general_operand"))) + (plus:SI (match_operand:SI 1 "msp430_nonsubreg_operand") +(match_operand:SI 2 "msp430_nonsubreg_or_imm_operand"))) ] "" [(parallel [(set (match_operand:HI 3 "nonimmediate_operand" "=&rm") (plus:HI (match_dup 4) (match_dup 5))) (set (reg:BI CARRY) @@ -1314,15 +1314,15 @@ [(set (match_operand:HI 0 "register_operand" "=r") (and:HI (neg:HI (match_operand:HI 1 "register_operand" "r")) (match_operand2 "immediate_operand" "n")))] "" "* if (REGNO (operands[0]) != REGNO (operands[1])) - return \"MOV.W\t%1, %0 { SUB.W\t#0, %0 { AND.W\t%2, %0\"; + return \"MOV.W\t%1, %0 { INV.W\t%0 { INC.W\t%0 { AND.W\t%2, %0\"; else - return \"SUB.W\t#0, %0 { AND.W\t%2, %0\"; + return \"INV.W\t%0 { INC.W\t%0 { AND.W\t%2, %0\"; " ) (define_insn "mulhisi3" [(set (match_operand:SI 0 "register_operand" "=r") (mult:SI (sign_extend:SI (match_operand:HI 1 "register_operand" "%0"))
Re: [PATCH, PR C++/61038] - g++ -E is unusable with UDL strings
On 05/13/2014 08:59 PM, Ed Smith-Rowland wrote: + escape_it = escape_it || cpp_userdef_string_p (token->type) + || cpp_userdef_char_p (token->type); Let's add the new cases to the previous statement instead of a new one. OK with that change. Jason
Re: [PATCH] dump_case_nodes: Treat unsigned as unsigned, don't ICE
On May 20, 2014, at 10:58 AM, Segher Boessenkool wrote: > The current code converts every tree to signed hwi; this ICEs with > values not representable as shwi, like 999ULL Looks nice to me… Looks like the type of change I would have done for wide-int, if I had tripped on this.
Re: [patch] libstdc++/61143 make unordered containers usable after move
On 20/05/2014 21:36, Jonathan Wakely wrote: OK. My sketch above avoided calling _M_moved_from() more than once per object, but the compiler should be able to optimise your version to avoid multiple calls anyway. Here is the new patch limited to what I really want to commit this time. Great. Please commit to trunk and the 4.9 branch - thanks! As I have integrated your remarks I won't have time to commit this evening. So I submit this update here, just in case you want to see it again and I will commit tomorrow. I finally use a simplified version of your sketch in the swap implementation. François Index: include/bits/hashtable.h === --- include/bits/hashtable.h (revision 210657) +++ include/bits/hashtable.h (working copy) @@ -316,14 +316,49 @@ size_type _M_element_count; _RehashPolicy _M_rehash_policy; + // A single bucket used when only need for 1 bucket. Especially + // interesting in move semantic to leave hashtable with only 1 buckets + // which is not allocated so that we can have those operations noexcept + // qualified. + // Note that we can't leave hashtable with 0 bucket without adding + // numerous checks in the code to avoid 0 modulus. + __bucket_type _M_single_bucket; + + bool + _M_uses_single_bucket(__bucket_type* __bkts) const + { return __builtin_expect(_M_buckets == &_M_single_bucket, false); } + + bool + _M_uses_single_bucket() const + { return _M_uses_single_bucket(_M_buckets); } + __hashtable_alloc& _M_base_alloc() { return *this; } - using __hashtable_alloc::_M_deallocate_buckets; + __bucket_type* + _M_allocate_buckets(size_type __n) + { + if (__builtin_expect(__n == 1, false)) + { + _M_single_bucket = nullptr; + return &_M_single_bucket; + } + return __hashtable_alloc::_M_allocate_buckets(__n); + } + void + _M_deallocate_buckets(__bucket_type* __bkts, size_type __n) + { + if (_M_uses_single_bucket(__bkts)) + return; + + __hashtable_alloc::_M_deallocate_buckets(__bkts, __n); + } + + void _M_deallocate_buckets() - { this->_M_deallocate_buckets(_M_buckets, _M_bucket_count); } + { _M_deallocate_buckets(_M_buckets, _M_bucket_count); } // Gets bucket begin, deals with the fact that non-empty buckets contain // their before begin node. @@ -703,11 +738,7 @@ size_type erase(const key_type& __k) - { - if (__builtin_expect(_M_bucket_count == 0, false)) - return 0; - return _M_erase(__unique_keys(), __k); - } + { return _M_erase(__unique_keys(), __k); } iterator erase(const_iterator, const_iterator); @@ -768,7 +799,7 @@ _M_rehash_policy() { _M_bucket_count = _M_rehash_policy._M_next_bkt(__bucket_hint); - _M_buckets = this->_M_allocate_buckets(_M_bucket_count); + _M_buckets = _M_allocate_buckets(_M_bucket_count); } template_M_allocate_buckets(_M_bucket_count); + _M_buckets = _M_allocate_buckets(_M_bucket_count); __try { for (; __f != __l; ++__f) @@ -833,9 +864,9 @@ { // Replacement allocator cannot free existing storage. this->_M_deallocate_nodes(_M_begin()); - if (__builtin_expect(_M_bucket_count != 0, true)) - _M_deallocate_buckets(); - _M_reset(); + _M_before_begin._M_nxt = nullptr; + _M_deallocate_buckets(); + _M_buckets = nullptr; std::__alloc_on_copy(__this_alloc, __that_alloc); __hashtable_base::operator=(__ht); _M_bucket_count = __ht._M_bucket_count; @@ -867,7 +898,7 @@ if (_M_bucket_count != __ht._M_bucket_count) { __former_buckets = _M_buckets; - _M_buckets = this->_M_allocate_buckets(__ht._M_bucket_count); + _M_buckets = _M_allocate_buckets(__ht._M_bucket_count); _M_bucket_count = __ht._M_bucket_count; } else @@ -885,8 +916,7 @@ [&__roan](const __node_type* __n) { return __roan(__n->_M_v()); }); if (__former_buckets) - this->_M_deallocate_buckets(__former_buckets, - __former_bucket_count); + _M_deallocate_buckets(__former_buckets, __former_bucket_count); } __catch(...) { @@ -917,7 +947,7 @@ { __bucket_type* __buckets = nullptr; if (!_M_buckets) - _M_buckets = __buckets = this->_M_allocate_buckets(_M_bucket_count); + _M_buckets = __buckets = _M_allocate_buckets(_M_bucket_count); __try { @@ -964,8 +994,9 @@ _M_reset() noexcept { _M_rehash_policy._M_reset(); - _M_bucket_count = 0; - _M_buckets = nullptr; + _M_bucket_count = 1; + _M_single_bucket = nullptr; + _M_buckets = &_M_single_bucket; _M_before_begin._M_nxt = nullptr; _M_element_count = 0; } @@ -980,12 +1011,16 @@ _M_move_assign(_Hashtable&& __ht, std::true_type) { this->_M_deallocate_nodes(_M_begin()); - if (__builtin_expect(_M_bucket_c
Re: [PATCH, rs6000] Fix HTM __builtin_ttest rtl expansion
On Tue, May 20, 2014 at 3:28 PM, Peter Bergner wrote: > The following patch fixes a semi-latent bug for the HTM pattern used with > the __builtin_ttest() builtin. This is supposed to expand to a tabortwci. > instruction which sets cr0 and then some code that copies the cr0 value > into a gpr and then shifts and masks it into the lowest 2 bits in the gpr. > > With newish -mcpu targets, we generate a "mfocrf rX,128" to copy the cr0 > value into bits 32-35 of the gpr. However, in some cases, we will instead > generate a "mfcr rX" instruction, which copies all 8 CR fields into the > gpr, with cr0 again being in bits 32-35. The difference is that the > mfocrf instruction seems to duplicate the 4-bit CR field in the gpr, > so bits 36-39 are a duplicate of bits 32-35. The error in the ttest > pattern is that we only shift the value down 24 bits. This "works" > when a mfocrf copied the cr0 into the rgeister due the duplicated CR > field, but is wrong when a mfcr instruction is used. > > The following passed bootstrap and regtesting on powerpc64-linux. > Ok for trunk? Is this also ok for the 4.9 and 4.8 branches once > my bootstraps and regtesting is done there? > > Peter > > > gcc/ > * config/rs6000/htm.md (ttest): Use correct shift value to get CR0. > > gcc/testsuite/ > * gcc.target/powerpc/htm-ttest.c: New test. Okay. Thanks, David
Re: [PATCH] Implement -fsanitize=float-cast-overflow (take 2)
Thanks for all your help. This is updated patch which: * uses mix/min handling that Jakub kindly wrote, * limited a test to ilp32 || lp64 + used sse2_runtime (but it's likely I screwed it up as usually), * adds a testcase for (unsigned) __int128, * adds testcases for C and C++ bit-fields, * makes -fsanitize=float-cast-overflow be not enabled by default, * is tested with -m32/-m64 on x86_64/ppc64, * is missing tests for long doubles/-mlong-double-128, * doesn't instrument _Decimal to integer conversions yet. Comments? 2014-05-20 Marek Polacek Jakub Jelinek * builtins.def: Change SANITIZE_FLOAT_DIVIDE to SANITIZE_NONDEFAULT. * gcc.c (sanitize_spec_function): Likewise. * convert.c (convert_to_integer): Include "ubsan.h". Add floating-point to integer instrumentation. * doc/invoke.texi: Document -fsanitize=float-cast-overflow. * flag-types.h (enum sanitize_code): Add SANITIZE_FLOAT_CAST and SANITIZE_NONDEFAULT. * opts.c (common_handle_option): Handle -fsanitize=float-cast-overflow. * sanitizer.def (BUILT_IN_UBSAN_HANDLE_FLOAT_CAST_OVERFLOW, BUILT_IN_UBSAN_HANDLE_FLOAT_CAST_OVERFLOW_ABORT): Add. * ubsan.c: Include "realmpfr.h" and "dfp.h". (get_ubsan_type_info_for_type): Handle REAL_TYPEs. (ubsan_instrument_float_cast): New function. * ubsan.h (ubsan_instrument_float_cast): Declare. * c-c++-common/ubsan/float-cast-overflow-1.c: New test. * c-c++-common/ubsan/float-cast-overflow-2.c: New test. * g++.dg/ubsan/float-cast-overflow-bf.C: New test. * gcc.dg/ubsan/float-cast-overflow-bf.c: New test. diff --git gcc/builtins.def gcc/builtins.def index d400ecb..cd823a3 100644 --- gcc/builtins.def +++ gcc/builtins.def @@ -176,7 +176,7 @@ along with GCC; see the file COPYING3. If not see DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,\ true, true, true, ATTRS, true, \ (flag_sanitize & (SANITIZE_ADDRESS | SANITIZE_THREAD \ - | SANITIZE_UNDEFINED | SANITIZE_FLOAT_DIVIDE))) + | SANITIZE_UNDEFINED | SANITIZE_NONDEFAULT))) #undef DEF_CILKPLUS_BUILTIN #define DEF_CILKPLUS_BUILTIN(ENUM, NAME, TYPE, ATTRS) \ diff --git gcc/convert.c gcc/convert.c index 91c1da2..b8f3671 100644 --- gcc/convert.c +++ gcc/convert.c @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3. If not see #include "diagnostic-core.h" #include "target.h" #include "langhooks.h" +#include "ubsan.h" /* Convert EXPR to some pointer or reference type TYPE. EXPR must be pointer, reference, integer, enumeral, or literal zero; @@ -394,6 +395,7 @@ convert_to_integer (tree type, tree expr) tree intype = TREE_TYPE (expr); unsigned int inprec = element_precision (intype); unsigned int outprec = element_precision (type); + location_t loc = EXPR_LOCATION (expr); /* An INTEGER_TYPE cannot be incomplete, but an ENUMERAL_TYPE can be. Consider `enum E = { a, b = (enum E) 3 };'. */ @@ -844,7 +846,17 @@ convert_to_integer (tree type, tree expr) return build1 (CONVERT_EXPR, type, expr); case REAL_TYPE: - return build1 (FIX_TRUNC_EXPR, type, expr); + if (flag_sanitize & SANITIZE_FLOAT_CAST) + { + expr = save_expr (expr); + tree check = ubsan_instrument_float_cast (loc, type, expr); + expr = build1 (FIX_TRUNC_EXPR, type, expr); + if (check == NULL) + return expr; + return fold_build2 (COMPOUND_EXPR, TREE_TYPE (expr), check, expr); + } + else + return build1 (FIX_TRUNC_EXPR, type, expr); case FIXED_POINT_TYPE: return build1 (FIXED_CONVERT_EXPR, type, expr); diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi index 5b1b0f1..659c265 100644 --- gcc/doc/invoke.texi +++ gcc/doc/invoke.texi @@ -5427,6 +5427,13 @@ Detect floating-point division by zero. Unlike other similar options, @option{-fsanitize=undefined}, since floating-point division by zero can be a legitimate way of obtaining infinities and NaNs. +@item -fsanitize=float-cast-overflow +@opindex fsanitize=float-cast-overflow + +This option enables floating-point type to integer conversion checking. +We check that the result of the conversion does not overflow. +This option does not work well with @code{FE_INVALID} exceptions enabled. + @item -fsanitize-recover @opindex fsanitize-recover By default @option{-fsanitize=undefined} sanitization (and its suboptions diff --git gcc/flag-types.h gcc/flag-types.h index caf4039..ed00046 100644 --- gcc/flag-types.h +++ gcc/flag-types.h @@ -229,9 +229,11 @@ enum sanitize_code { SANITIZE_BOOL = 1 << 10, SANITIZE_ENUM = 1 << 11, SANITIZE_FLOAT_DIVIDE = 1 << 12, + SANITIZE_FLOAT_CAST = 1 << 13, SANITIZE_UNDEFINED = SANITIZE_SHIFT | SANITIZE_DIVIDE | SANITIZE_UNREACHABLE | SANITIZE_VLA | SANITIZE_NULL | SANITIZE_RETURN -
Re: add dbgcnt and opt-info support for devirtualization
On Tue, May 20, 2014 at 4:32 AM, Richard Biener wrote: > On Mon, May 19, 2014 at 5:24 PM, Xinliang David Li wrote: >> Sorry about it. Here is the patch. There is one remaining case where >> cgraph_dump_file and dump_enable_p are checked separately -- >> cgraph_dump_file is set up differently from 'dump_file'. > > But there you check with an else if, so if you do -fdump-ipa-cgraph > then suddenly -fopt-info will stop reporting? At least in the cgraphunit.c > part of the patch. Right. Fixed. > > I'm ok with the rest of the patch. I checked in the patch with the addition fix. thanks, David > > Thanks, > Richard. > >> David >> >> >> >> On Mon, May 19, 2014 at 2:21 AM, Richard Biener >> wrote: >>> On Fri, May 16, 2014 at 11:19 PM, Xinliang David Li >>> wrote: Modified the patch according to yours and Richard's feedback. PTAL. >>> >>> ENOPATCH. >>> >>> Btw, I don't see any issue with leaking node order to opt-report. >>> >>> Richard. >>> thanks, David On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka wrote: >> Hi, debugging runtime bugs due to devirtualization can be hard for >> very large C++ programs with complicated class hierarchy. This patch >> adds the support to report this high level transformation via >> -fopt-info (not hidden inside dump file) and the ability the do binary >> search with cutoff. >> >> Ok for trunk after build and test? > > Seems resonable to me. >> >> thanks, >> >> David > >> Index: ChangeLog >> === >> --- ChangeLog (revision 210479) >> +++ ChangeLog (working copy) >> @@ -1,3 +1,18 @@ >> +2014-05-15 Xinliang David Li >> + >> + * cgraphunit.c (walk_polymorphic_call_targets): Add >> + dbgcnt and fopt-info support. >> + 2014-05-15 Xinliang David Li >> + >> + * cgraphunit.c (walk_polymorphic_call_targets): Add >> + dbgcnt and fopt-info support. >> + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto. >> + * ipa-devirt.c (ipa_devirt): Ditto. >> + * ipa.c (walk_polymorphic_call_targets): Ditto. >> + * gimple-fold.c (fold_gimple_assign): Ditto. >> + (gimple_fold_call): Ditto. >> + * dbgcnt.def: New counter. >> + >> 2014-05-15 Martin Jambor >> >> PR ipa/61085 >> Index: ipa-prop.c >> === >> --- ipa-prop.c(revision 210479) >> +++ ipa-prop.c(working copy) >> @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. >> #include "ipa-utils.h" >> #include "stringpool.h" >> #include "tree-ssanames.h" >> +#include "dbgcnt.h" >> >> /* Intermediate information about a parameter that is only useful >> during the >> run of ipa_analyze_node and is not kept afterwards. */ >> @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c >> fprintf (dump_file, "ipa-prop: Discovered direct call to >> non-function" >> " in %s/%i, making it unreachable.\n", >>ie->caller->name (), ie->caller->order); >> + else if (dump_enabled_p ()) >> + { >> + location_t loc = gimple_location (ie->call_stmt); >> + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, >> +"Discovered direct call to non-function in >> %s, " >> +"making it unreachable\n", ie->caller->name >> ()); > > Perhaps "turning it to __builtin_unreachable call" and similarly in the > other cases > we introduce __builtin_unreachable? I think that could be easier for user > to work > out. > > What king of problems in devirtualizatoin you are seeing? > > > Honza
Add flag to optionally ignore ELF interposition
Hi, as disucssed some time ago, our assumption that every symbol of shared library can be interposed at runtime is expensive and prevents a lot of useful optimizations, including inlining or IPA propagation. While this is useful feature, it is rather incommon to use it for bigger C++ projects, like firefox and at least clang seems to ignore the ELF interposition rules and always inline/propagate. This patch adds flag to control the behaviour. Symbols explicitly delcared WEAK are still considered as interposable. Bootstrapped/regtested x86_64-linux, will commit it tomorrow if there are no complains. (Feedback is welcome!) Honza * doc/invoke.texi (-fsemantic-interposition): New flag. * common.opt (fsemantic-interposition): Use it. * varasm.c (decl_replaceable_p): Use it. Index: doc/invoke.texi === --- doc/invoke.texi (revision 210653) +++ doc/invoke.texi (working copy) @@ -411,6 +411,7 @@ Objective-C and Objective-C++ Dialects}. -fschedule-insns -fschedule-insns2 -fsection-anchors @gol -fselective-scheduling -fselective-scheduling2 @gol -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol +-fsemantic-interposition @gol -fshrink-wrap -fsignaling-nans -fsingle-precision-constant @gol -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol @@ -7709,6 +7710,22 @@ This option has no effect unless one of When pipelining loops during selective scheduling, also pipeline outer loops. This option has no effect unless @option{-fsel-sched-pipelining} is turned on. +@item -fsemantic-interposition +@opindex fsemantic-interposition +Some object formats, like ELF, allow interposing of symbols by dynamic linker. +This means that for symbols exported from the DSO compiler can not perform +inter-procedural propagation, inlining and other optimizations in anticipation +that the function or variable in question may change. While this feature is +useful, for example, to rewrite memory allocation functions by a debugging +implementation, it is expensive in the terms of code quality. +With @option{-fno-semantic-inteposition} compiler assumest that if interposition +happens for functions the overwritting function will have +precisely same semantics (and side effects). Similarly if interposition happens +for variables, the constructor of the variable will be the same. The flag +has no effect for functions explicitly declared inline, where +interposition changing semantic is never allowed and for symbols explicitly +declared weak. + @item -fshrink-wrap @opindex fshrink-wrap Emit function prologues only before parts of the function that need it, Index: common.opt === --- common.opt (revision 210653) +++ common.opt (working copy) @@ -1854,6 +1854,10 @@ fsel-sched-reschedule-pipelined Common Report Var(flag_sel_sched_reschedule_pipelined) Init(0) Optimization Reschedule pipelined regions without pipelining +fsemantic-interposition +Common Report Var(flag_semantic_interposition) Init(1) +Allow interposing function (or variables) by ones with different semantics (or initializer) respectively by dynamic linker + ; sched_stalled_insns means that insns can be moved prematurely from the queue ; of stalled insns into the ready list. fsched-stalled-insns Index: varasm.c === --- varasm.c(revision 210654) +++ varasm.c(working copy) @@ -6870,6 +6870,9 @@ decl_replaceable_p (tree decl) gcc_assert (DECL_P (decl)); if (!TREE_PUBLIC (decl) || DECL_COMDAT (decl)) return false; + if (!flag_semantic_interposition + && !DECL_WEAK (decl)) +return false; return !decl_binds_to_current_def_p (decl); }
Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.
On 05/20/2014 01:51 PM, Mark Wielaard wrote: The DWARF part isn't what this patch is blocked on. That has already been discussed on the DWARF standard list, coordinated with the gdb hackers and approved some months ago. Fair enough. The part that hasn't been reviewed and approved yet is the frontend changes that implement the lang-hook. Rather than define the hook for C, let's have a default version that uses the type_for_size langhook; that should work better for Ada. Jason
Re: [patch ping] libstdc++ testsuite cxxflags
On 05/20/2014 02:11 AM, Jonathan Wakely wrote: > On 19/05/14 14:57 -0600, Sandra Loosemore wrote: >> On 05/17/2014 04:07 AM, Jonathan Wakely wrote: >>> On 17 May 2014 10:50, Jonathan Wakely wrote: On 17 May 2014 01:16, Sandra Loosemore wrote: > It appears that this patch from last fall never got reviewed. > > https://gcc.gnu.org/ml/gcc-patches/2013-10/msg02340.html > > Can someone take a look? I'll commit the patch on Cesar's behalf > if it's > approved. Libstdc++ patches need to go to the libstdc++ list, which this did, in a separate mail that broke the threading: https://gcc.gnu.org/ml/libstdc++/2013-10/msg00224.html Then archives's inability to thread betweem months broke it again: https://gcc.gnu.org/ml/libstdc++/2013-11/msg00113.html I approved it then withdrew that approval: https://gcc.gnu.org/ml/libstdc++/2013-11/msg00120.html then the patch got revised: https://gcc.gnu.org/ml/libstdc++/2013-11/msg00122.html I'll have to refresh my memory about it. >> >> Whoops, I totally missed that there was already a separate thread on >> the libstdc++ mailing list only. My bad. :-( >> >>> I think I'm happiest with the second version of the patch, in >>> https://gcc.gnu.org/ml/libstdc++/2013-11/msg00114.html >>> >>> It does mean a change that might affect people using CXXFLAGS when >>> running the tests, so we might want to update >>> https://gcc.gnu.org/onlinedocs/libstdc++/manual/test.html where it says >>> "Or, just run the testsuites with CXXFLAGS set to -D_GLIBCXX_DEBUG or >>> -D_GLIBCXX_PARALLEL." >> >> I came up with the attached patch for the wording change. I'm having >> trouble regenerating the HTML version of the manual, though; it looks >> like I have a different version of the DocBook stylesheets around that >> are introducing lots of extraneous changes, and I'm not sure what the >> "right" version is. :-S Any suggestions? > > You always get hundreds of changes, DocBook generates unique numeric > id attributes, which are different every run. Don't worr yabout the > docs, I can sort them out. If you and Cesar are happy with the patch > in https://gcc.gnu.org/ml/libstdc++/2013-11/msg00114.html then please > go ahead and commit that version, thanks. Looking back at my notes, this patch addresses the libstdc++ atomics test failures when using a custom site.exp. Without the -O2 flag, those tests would fail to link because of the dependency on libatomic. I'm happy with the second patch. Sandra please commit it. Thanks, Cesar
Re: [patch] libstdc++/61143 make unordered containers usable after move
On 19/05/14 22:27 +0200, François Dumont wrote: On 15/05/2014 22:52, Jonathan Wakely wrote: Does this get initialized in the constructors? Would it make sense to give it an initializer? __bucket_type_M_single_bucket = nullptr; This bucket is replacing those normally allocated and when they are allocated they are 0 initialised. So, you were right, there were one place where this initialization was missing which is fixed in this new patch. So I don't think this additional initialization is necessary. OK @@ -980,12 +999,16 @@ _M_move_assign(_Hashtable&& __ht, std::true_type) { this->_M_deallocate_nodes(_M_begin()); - if (__builtin_expect(_M_bucket_count != 0, true)) -_M_deallocate_buckets(); - + _M_deallocate_buckets(); __hashtable_base::operator=(std::move(__ht)); _M_rehash_policy = __ht._M_rehash_policy; - _M_buckets = __ht._M_buckets; + if (__builtin_expect(__ht._M_buckets != &__ht._M_single_bucket, true)) +_M_buckets = __ht._M_buckets; What is the value of this->_M_single_bucket now? Should it be set to nullptr, if only to help debugging? We are not resetting buckets to null when rehashing so unless I add more checks I won't be able to reset it each time. OK + if (__builtin_expect(__ht._M_buckets == &__ht._M_single_bucket, false)) This check appears in a few places, I wonder if it is worth creating a private member function to hide the details: bool _M_moved_from() const noexcept { return __builtin_expect(_M_buckets == &_M_single_bucket, false); } Then just test if (__ht._M_moved_from()) Usually I would think the __builtin_expect should not be inside the member function, so the caller decides what the expected result is, but I think in all cases the result is expected to be false. That matches how move semantics are designed: the object that gets moved from is expected to be going out of scope, and so will be reused in a minority of cases. @@ -1139,7 +1170,14 @@ { if (__ht._M_node_allocator() == this->_M_node_allocator()) { - _M_buckets = __ht._M_buckets; + if (__builtin_expect(__ht._M_buckets == &__ht._M_single_bucket, false)) This could be if (__ht._M_moved_from()) I hesitated in doing so and finally do so. I only prefer _M_use_single_bucket as we might not limit its usage to moved instances. Good point. I think it should be called _M_uses_single_bucket() or _M_using_single_bucket() though, otherwise it sounds more like it answers the question "should I use a single bucket?" rather than "am I using the single bucket?" How about removing the std::swap(_M_buckets, __x._M_buckets) above and doing (untested): if (this->_M_moved_from()) { if (__x._M_moved_from()) _M_buckets = &_M_single_bucket; else _M_buckets = __x._M_buckets; __x._M_buckets = &__x._M_single_bucket; } else if (__x._M_moved_from()) { __x._M_buckets = _M_buckets; _M_buckets = &_M_single_bucket; } else std::swap(_M_buckets, __x._M_buckets); Is that worth it? I'm not sure. Yes, with the newly introduced _M_use_single_bucket it worth it I think and is what I have done. OK. My sketch above avoided calling _M_moved_from() more than once per object, but the compiler should be able to optimise your version to avoid multiple calls anyway. Here is the new patch limited to what I really want to commit this time. Great. Please commit to trunk and the 4.9 branch - thanks!
[PATCH, rs6000] Fix HTM __builtin_ttest rtl expansion
The following patch fixes a semi-latent bug for the HTM pattern used with the __builtin_ttest() builtin. This is supposed to expand to a tabortwci. instruction which sets cr0 and then some code that copies the cr0 value into a gpr and then shifts and masks it into the lowest 2 bits in the gpr. With newish -mcpu targets, we generate a "mfocrf rX,128" to copy the cr0 value into bits 32-35 of the gpr. However, in some cases, we will instead generate a "mfcr rX" instruction, which copies all 8 CR fields into the gpr, with cr0 again being in bits 32-35. The difference is that the mfocrf instruction seems to duplicate the 4-bit CR field in the gpr, so bits 36-39 are a duplicate of bits 32-35. The error in the ttest pattern is that we only shift the value down 24 bits. This "works" when a mfocrf copied the cr0 into the rgeister due the duplicated CR field, but is wrong when a mfcr instruction is used. The following passed bootstrap and regtesting on powerpc64-linux. Ok for trunk? Is this also ok for the 4.9 and 4.8 branches once my bootstraps and regtesting is done there? Peter gcc/ * config/rs6000/htm.md (ttest): Use correct shift value to get CR0. gcc/testsuite/ * gcc.target/powerpc/htm-ttest.c: New test. Index: gcc/config/rs6000/htm.md === --- gcc/config/rs6000/htm.md(revision 210518) +++ gcc/config/rs6000/htm.md(working copy) @@ -179,7 +179,7 @@ (define_expand "ttest" (const_int 0)] UNSPECV_HTM_TABORTWCI)) (set (subreg:CC (match_dup 2) 0) (match_dup 1)) - (set (match_dup 3) (lshiftrt:SI (match_dup 2) (const_int 24))) + (set (match_dup 3) (lshiftrt:SI (match_dup 2) (const_int 28))) (parallel [(set (match_operand:SI 0 "int_reg_operand" "") (and:SI (match_dup 3) (const_int 15))) (clobber (scratch:CC))])] Index: gcc/testsuite/gcc.target/powerpc/htm-ttest.c === --- gcc/testsuite/gcc.target/powerpc/htm-ttest.c(revision 0) +++ gcc/testsuite/gcc.target/powerpc/htm-ttest.c(working copy) @@ -0,0 +1,14 @@ +/* { dg-do compile { target { powerpc*-*-* } } } */ +/* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */ +/* { dg-require-effective-target powerpc_htm_ok } */ +/* { dg-options "-O2 -mhtm" } */ + +/* { dg-final { scan-assembler "rlwinm r?\[0-9\]+,r?\[0-9\]+,3,30,31" { target { ilp32 } } } } */ +/* { dg-final { scan-assembler "rldicl r?\[0-9\]+,r?\[0-9\]+,35,62" { target { lp64 } } } } */ + +#include +long +ttest (void) +{ + return _HTM_STATE(__builtin_ttest()); +}
Re: [PATCH] Fix PR61221
On 05/20/14 12:13, Richard Biener wrote: I think it's still important for things like keeping SSA names densely allocated and for reuse on SSA rewrite of virtual operands for example. So I'm thinking of an explicit stop-/restart-reuse interface. Fair enough. WRT dense allocation, we could probably do a lot better if we allocated some N nodes at a time rather than one at a time on demand when the freelist is empty. When we allocate those N nodes, obviously N-1 go onto the freelist. It'd be interesting to see the cache effects of doing something like that ;-) jeff
Re: [PATCH] Fix PR61221
On May 20, 2014 6:47:44 PM CEST, Jeff Law wrote: >On 05/20/14 02:06, Richard Biener wrote: >> >> We're still going to have problems if SSA names are re-used >> (as even released SSA names have to remain valid lattice >> entries after this). But currently nothing will create new >> SSA names (fingers crossing) during eliminate. A followup >> patch of mine will though, thus I'll do that two-staged >> SSA name release/reuse list you thought would maybe fix this >> issue. >Sounds good (the two-staged release/reuse). > >The other thing I had pondered was two modes. One with immediate >re-use >and one with the two-staged release/reuse. > >But before doing anything my plan was to revisit the need for the >release/reuse code. That code was written when we had to go completely > >out of SSA, then back into SSA in the first incarnation of the SSA jump > >threading support. > >With the incremental updating of the SSA graph, we may find that we >really just don't need this code anymore. I wouldn't lose any sleep if > >after a bit of poking you came to that conclusion and just ripped out >the release/reuse code completely. I think it's still important for things like keeping SSA names densely allocated and for reuse on SSA rewrite of virtual operands for example. So I'm thinking of an explicit stop-/restart-reuse interface. Richard. >jeff
Re: RFA: cache recog_op_alt by insn code
On 05/20/14 02:19, Richard Sandiford wrote: Following on from (and depending on) the last patch, process_constraints also shows up high in the profile. This patch caches the recog_op_alt information by insn code too. It also shrinks the size of the structure from 1 pointer + 5 ints to 1 pointer + 2 ints: - no target should have more than 65536 register classes :-) That could probably be much lower if we needed more bits for other things. - "reject" is based on a cost of 600 for '!', so 16 bits should be plenty OK. Makes sense. - "matched" and "matches" are operand numbers and so fit in 8 bits OK. This could also be smaller, don't we have an upper limit of 32 (or 30?) on the number of operands appearing in an insn. It'd be a way to get a few more bits if we need them someday. Since this code is creating cached data and should only be run once per insn code or asm statement, I don't think the extra in-loop lra_static_insn_data assignments really justify having two copies of such a complicated function. I think it would be better for LRA to use the generic routine and then fill in the lra_static_insn_data fields on the result (basically just an OR of "is_address" and "early_clobber" for each alternative, plus setting up "commutative"). We could then avoid having two caches of the same data. I'll do that as a follow-up if it sounds OK. Seems reasonable. I suspect Vlad found the same code to be hot, hence the caching. Once he had a copy adding to it wasn't a big deal. But yes, I think that after the cache is globalized that having IRA walk over things another time shouldn't be a problem. On the subject of commutativity, we have: static bool commutative_constraint_p (const char *str) { int curr_alt, c; bool ignore_p; for (ignore_p = false, curr_alt = 0;;) { c = *str; if (c == '\0') break; str += CONSTRAINT_LEN (c, str); if (c == '#' || !recog_data.alternative_enabled_p[curr_alt]) ignore_p = true; else if (c == ',') { curr_alt++; ignore_p = false; } else if (! ignore_p) { /* Usually `%' is the first constraint character but the documentation does not require this. */ if (c == '%') return true; } } return false; } Any objections to requiring `%' to be the first constraint character? Seems wasteful to be searching the constraint string just to support an odd case. If we're going to change it, then clearly the docs need to change and ideally we'd statically check the port's constraints during the build process to ensure they meet the tighter definition. I'm sure David will be oh-so-happy to see state appearing. But that's something he'll have to deal with in the JIT side. The patch gives a further ~3.5% improvement in compile time for -O0 fold-const.ii, on top of the other patch. Tested on x86_64-linux-gnu. OK to install? Thanks, Richard gcc/ * recog.h (operand_alternative): Convert reg_class, reject, matched and matches into bitfields. (preprocess_constraints): Take the insn as parameter. (recog_op_alt): Change into an array of pointers. (target_recog): Add x_op_alt. * recog.c (asm_op_alt_1, asm_op_alt): New variables (recog_op_alt): Change into an array of pointers. (preprocess_constraints): Update accordingly. Take the insn as parameter. Use asm_op_alt_1 and asm_op_alt for asms. Cache other instructions in this_target_recog. * ira-lives.c (process_bb_node_lives): Pass the insn to process_constraints. * reg-stack.c (check_asm_stack_operands): Likewise. (subst_asm_stack_regs): Likewise. * regcprop.c (copyprop_hardreg_forward_1): Likewise. * regrename.c (build_def_use): Likewise. * sched-deps.c (sched_analyze_insn): Likewise. * sel-sched.c (get_reg_class, implicit_clobber_conflict_p): Likewise. * config/arm/arm.c (xscale_sched_adjust_cost): Likewise. (note_invalid_constants): Likewise. * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise. OK. Thanks! Jeff
[PATCH] dump_case_nodes: Treat unsigned as unsigned, don't ICE
The current code converts every tree to signed hwi; this ICEs with values not representable as shwi, like 999ULL in pr34154.c (and if it didn't ICE, it would print the wrong value). This fixes it. Bootstrapped and tested on powerpc64-linux. Okay to apply? Segher 2014-05-20 Segher Boessenkool gcc/ * stmt.c (dump_case_nodes): Don't convert values to HOST_WIDE_INT before printing. --- gcc/stmt.c | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/gcc/stmt.c b/gcc/stmt.c index 163d495..722d34f 100644 --- a/gcc/stmt.c +++ b/gcc/stmt.c @@ -774,24 +774,20 @@ static void dump_case_nodes (FILE *f, struct case_node *root, int indent_step, int indent_level) { - HOST_WIDE_INT low, high; - if (root == 0) return; indent_level++; dump_case_nodes (f, root->left, indent_step, indent_level); - low = tree_to_shwi (root->low); - high = tree_to_shwi (root->high); - fputs (";; ", f); - if (high == low) -fprintf (f, "%*s" HOST_WIDE_INT_PRINT_DEC, -indent_step * indent_level, "", low); - else -fprintf (f, "%*s" HOST_WIDE_INT_PRINT_DEC " ... " HOST_WIDE_INT_PRINT_DEC, -indent_step * indent_level, "", low, high); + fprintf (f, "%*s", indent_step * indent_level, ""); + print_dec (root->low, f, TYPE_SIGN (TREE_TYPE (root->low))); + if (!tree_int_cst_equal (root->low, root->high)) +{ + fprintf (f, " ... "); + print_dec (root->high, f, TYPE_SIGN (TREE_TYPE (root->high))); +} fputs ("\n", f); dump_case_nodes (f, root->right, indent_step, indent_level); -- 1.8.1.4
Re: [Google/4_8] LIPO COMDAT profile fixups
On Tue, May 20, 2014 at 8:39 AM, Xinliang David Li wrote: > On Tue, May 20, 2014 at 6:32 AM, Teresa Johnson wrote: >> On Mon, May 19, 2014 at 11:51 PM, Xinliang David Li >> wrote: >>> Why duplicating the merger functions in dyn-ipa.c? Should those in >>> libgcov-merge.c be reused? >> >> The merger functions in libgcov-merge.c use a macro to either read the >> counter to merge from a buffer in memory (when IN_GCOV_TOOL) or from >> the gcda file otherwise. In this case I need to merge from another >> array, and it needs to do so both IN_GCOV_TOOL and otherwise. So to >> reuse the merger functions I would have had to get the counter value >> from a new argument, guarded by a conditional. I didn't want to change >> the interface of those existing merge functions, or more importantly, >> make them less efficient with the added condition since they are >> invoked frequently during gcda file merging. >> > > ok -- something to think about in the future. > > For now, please add a comment in libgcov-utils.c before > ctr_merge_functions to point to the dup merge functions in dyn-ipa.c. Done. > > >>> >>> The refactoring of gcov_exit_write_gcda should probably be done in a >>> separate patch -- preferably submitted to trunk too. >> >> The refactoring is necessary for this patch since I need to invoke the >> outlined functionality in gcov_dump_module_info as well. >> >> I could submit that to trunk, but it has to go in to the google >> branches either with or preceeding this patch. >> > > ok for google branch after fixing the formatting issues Fixed the issues below and others uncovered by contrib/check_GNU_style.sh. Patch committed to google/4_8 as r210652. Will port to google/4_9. Teresa > > 1) calller --> caller > 2) many places where there is missing space: e.g, > gcov_fixup_counters_checksum( .. > > David > >> Thanks, >> Teresa >> >>> >>> David >>> >>> On Mon, May 19, 2014 at 10:08 PM, Teresa Johnson >>> wrote: Ping. Teresa On Wed, May 14, 2014 at 4:39 PM, Teresa Johnson wrote: > This patch applies profile fixups to COMDATs on the dyn ipa callgraph > at the end of LIPO module grouping (either in the profile gen run or > in gcov-tool). This is to address issues with missing profiles in the > out-of-line COMDAT copies not selected by the linker, and indirect > call profiles that target a module not included in the module group. > By default, both fixups are enabled, but can be controlled by a > profile-gen parameter, an environment variable, and a gcov-tool > option. > > The fixups assume that functions with the same lineno and cfg checksum > are copies of the same COMDAT. This is made more likely by ensuring > that in LIPO mode we include the full mangled name in the > lineno_checksum. > > For the counter fixup, we merge all non-zero profiles with matching > checksums and copy the merged profile into copies with all-zero > profiles. > > For the indirect counter fixup, if an indirect call profile target is > not in the module group, we look for a matching checksum copy in the > primary module and if exactly one is found we change the target to > that. > > If any fixups are applied, the gcda files are rewritten after module > grouping. > > This also required a couple of other changes to the optimizer. During > cgraph node resolution, we were arbitrarily selecting a copy that had > non-zero profiles. Now there are many more to choose from, so we will > prefer the primary module copy if it is non-zero. Also, during cloning > for inlining, we only want to update the profile on the callee node if > we are inlining into the resolved node caller node. We were already > doing this for AutoFDO, and need to do this here now that many node > copies have the same profile. > > Patch attached. Tested with regression tests and internal benchmarks. > Ok for google branches? > > Thanks, > Teresa > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.
On Tue, May 20, 2014 at 10:43:22AM -0400, Jason Merrill wrote: > On 05/20/2014 02:55 AM, Mark Wielaard wrote: > >On Mon, May 19, 2014 at 04:50:35PM -0400, Jason Merrill wrote: > >>On 05/13/2014 03:21 AM, Mark Wielaard wrote: > >>>So the debugger doesn't have to guess the properties of the enum's > >>>underlying base type, like size, encoding and signedness. > >> > >>Well, the enum already has DW_AT_byte_size. It seems to me that it should > >>also have DW_AT_encoding to provide the other two properties you mention. > > > >Right, that is the idea, since an enum doesn't provide those attributes, > >it should have a reference to the underlying base type that provides them. > > Yes, but why is that better than providing them directly? Because that would loose the information of which underlying type is used to implement the enum. And it is probably less efficient to replicate all the attributes for each enum. It is also simply how the DWARF standard specifies these properties. They are not attached to the DW_TAG_enumeration_type directly but specified through the underlying type. Which is what gdb expects now. The DWARF part isn't what this patch is blocked on. That has already been discussed on the DWARF standard list, coordinated with the gdb hackers and approved some months ago. The part that hasn't been reviewed and approved yet is the frontend changes that implement the lang-hook. Thanks, Mark
Re: RFA: cache enabled attribute by insn code
On 05/20/14 02:16, Richard Sandiford wrote: get_attr_enabled was showing up high in a -O0 compile of fold-const.ii. At the moment, extract_insn calls this function for every alternative on each extraction, which can be expensive for instructions like moves that have many alternatives. The attribute is only supposed to depend on the insn code and the current target. It isn't allowed to depend on the values of operands. LRA already makes use of this to cache the enabled attributes based on code. This patch makes the caching compiler-wide. It also converts the bool array into a plain int bitfield, since at the moment we don't allow more than 30 alternatives. (It's possible we might want to increase it beyond 32 at some point, but then we can change the type to uint64_t instead. Wanting to go beyond 64 would suggest deeper problems somewhere. :-)) The patch gives a consistent compile-time improvement of about ~3.5% on the -O0 fold-const.ii testcase. There were a few instances of the construct: cl = NO_REGS; for (ignore_p = false, curr_alt = 0; (c = *constraints); constraints += CONSTRAINT_LEN (c, constraints)) if (c == '#' || !recog_data.alternative_enabled_p[curr_alt]) ignore_p = true; else if (c == ',') { curr_alt++; ignore_p = false; } else if (! ignore_p) This had the effect of ignoring all alternatives after the first attribute-disabled one, since once we found a disabled alternative we'd always enter the first arm of the "if" and never increment curr_alt. I don't think that was intentional. Tested on x86_64-linux-gnu. OK to install? I notice ira_setup_alts is using a HARD_REG_SET to store a mask of alternatives. If this patch is OK, I'll follow up with a patch to use alternative_mask there too. Thanks, Richard gcc/ * system.h (TEST_BIT): New macro. * recog.h (alternative_mask): New type. (ALL_ALTERNATIVES, ALTERNATIVE_BIT): New macros. (recog_data_d): Replace alternative_enabled_p array with enabled_alternatives. (target_recog): New structure. (default_target_recog, this_target_recog): Declare. (get_enabled_alternatives): Likewise. * recog.c (default_target_recog, this_target_recog): New variables. (get_enabled_alternatives): New function. (extract_insn): Use it. (preprocess_constraints, constrain_operands): Adjust for change to recog_data. * postreload.c (reload_cse_simplify_operands): Likewise. * reload.c (find_reloads): Likewise. * ira-costs.c (record_reg_classes): Likewise. * ira-lives.c (single_reg_class): Likewise. Fix bug in which all alternatives after a disabled one would be skipped. (ira_implicitly_set_insn_hard_regs): Likewise. * ira.c (commutative_constraint_p): Likewise. (ira_setup_alts): Adjust for change to recog_data. * lra-int.h (lra_insn_recog_data): Replace alternative_enabled_p with enabled_alternatives. * lra.c (free_insn_recog_data): Update accordingly. (lra_update_insn_recog_data): Likewise. (lra_set_insn_recog_data): Likewise. Use get_enabled_alternatives. * lra-constraints.c (process_alt_operands): Likewise. Handle only_alternative as part of the enabled mask. * target-globals.h (this_target_recog): Declare. (target_globals): Add a recog field. (restore_target_globals): Restore this_target_recog. * target-globals.c: Include recog.h. (default_target_globals): Initialize recog field. (save_target_globals): Likewise. This is OK for the trunk (referring to the follow-up message which fixed EWRONGPATCH. jeff
Re: [C++ Patch/RFC] PR 58753 & 58930
OK. Jason
Re: [PATCH, AArch64] Fix for PR61202
Hi James Thank you for pointing this out. In the new patch I removed the modification of vqdmulh_n_s32 and vqdmulhq_n_s32. Passed dejagnu testing on aarch64 qemu again. OK for trunk, 4.9 and 4.8? 2014-05-20 Guozhi Wei * config/aarch64/arm_neon.h (vqdmulh_n_s16): Change the last operand's constraint. (vqdmulhq_n_s16): Likewise. On Mon, May 19, 2014 at 11:50 PM, James Greenhalgh wrote: > On Tue, May 20, 2014 at 07:18:40AM +0100, Carrot Wei wrote: >> Hi > > Hi, > >> The last operand of instruction sqdmulh can only be low fp registers, >> so we should use constraint "x". But the intrinsic functions use "w". >> This patch fixed the constrains in these intrinsics. > > This restriction is only on the _s16 variants of the intrinsics. From the > ARMv8 Architecture Reference Manual: > >Is the name of the second SIMD&FP source register, > [...] > Restricted to V0-V15 when element size is H. > > The patch is correct (though I can't approve it) for vqdmulh_n_s16 and > vqdmulhq_n_s16, but the two hunks for vqdmulh_n_s32 and vqdmulhq_n_s32 should > be dropped as they are too restrictive. > > Thanks, > James > >> 2014-05-19 Guozhi Wei >> >> * config/aarch64/arm_neon.h (vqdmulh_n_s16): Change >> the last operand's constraint. >> (vqdmulh_n_s32): Likewise. >> (vqdmulhq_n_s16): Likewise. >> (vqdmulhq_n_s32): Likewise. > patch Description: Binary data
Re: [PATCH v2] Replace C/C++ void_zero_node with a VOID_CST tree code
OK. I've committed my patch. Jason
Re: [PATCH, sched] Cleanup and improve multipass_dfa_lookahead_guard
On 05/19/14 22:27, Maxim Kuvyrkov wrote: Hi, This patch cleans up haifa-sched.c:choose_ready() function while allow more powerful customization by backends at the same time. The primary change is that targetm.sched.first_cycle_multipass_dfa_lookahead_guard hook is converted from returning a boolean OK/Nada value to returning an action to be applied to a given instruction in the ready list. If return value is N == 0, then insn is OK (previous corresponding return value was 1). If return value is N < 0, then insn should not be considered for scheduling for -N cycles (I'm going to use this for ARM scheduling in an upcoming patch). If return value is N > 0, then insn can still be issued on the current cycle, just not immediately now (previous corresponding return value was 0). Changes to ia64 and rs6000 are mostly mechanical to update hook return values, but port maintainers may wish to review those. Tested on x86_64-linux-gnu, arm-linux-gnueabihf, and by building ia64-linux-gnu cc1. Changes to PowerPC were seriously eyeballed. If maintainers so to wish, I will carry out any additional testing. This looks OK to me. I think it's safe to work from the assumption that the hook will never return INT_MIN :-) jeff
Re: [PATCH] Regression fix for PR target/61223
> For now, please revert your original patch Alex asked me to revert patch for him. Done in r210650. -Y
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On 05/20/14 11:14, Wei Mi wrote: On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: On 05/19/14 00:38, Bin.Cheng wrote: On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX to generate either the simple insn on the ARM or the PARALLEL on the thumb? Yes, I think it's more than upsizing the mode. There is another example from one of x86's candidate peephole patch at https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00467.html The patch wants to do below transformation, which I think is very target dependent. Presumably there's no way to go through an expander here? I don't know very much with respect to this case, maybe the patch author can help here. I just checked the expand result for my case. TER could make expand see the two intrinsics at the same time so in theory it is possible to make the merge happen in expand. I havn't looked into detail. I'm not referring to in the gimple->rtl expansion. I'm referring to using a define_expand to generate the load/store multiple instructions. Given define_expand in the backend, the target independent code can use HAVE_XXX and gen_XXX to test for the existence of the expander and to call the expander to generate insns. Jeff
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On Tue, May 20, 2014 at 12:13 AM, Bin.Cheng wrote: > On Tue, May 20, 2014 at 1:30 AM, Jeff Law wrote: >> On 05/19/14 00:38, Bin.Cheng wrote: >>> >>> On Sat, May 17, 2014 at 12:32 AM, Jeff Law wrote: On 05/16/14 04:07, Bin.Cheng wrote: But can't you go through movXX to generate either the simple insn on the ARM or the PARALLEL on the thumb? >>> Yes, I think it's more than upsizing the mode. There is another >>> example from one of x86's candidate peephole patch at >>> https://gcc.gnu.org/ml/gcc-patches/2014-04/msg00467.html >>> >>> The patch wants to do below transformation, which I think is very >>> target dependent. >> >> Presumably there's no way to go through an expander here? > I don't know very much with respect to this case, maybe the patch > author can help here. > I just checked the expand result for my case. TER could make expand see the two intrinsics at the same time so in theory it is possible to make the merge happen in expand. I havn't looked into detail. Thanks, Wei.
Re: [PATCH] Fix ICE in rtl-optimization/PR61220, PR61225
On 05/20/14 01:11, Zhenqiang Chen wrote: Hi, The patch fix ICE issue triggered by shrink-wrapping enhancement. Bootstrap and no make check regression on X86-64. OK for trunk? Thanks! -Zhenqiang 2014-05-20 Zhenqiang Chen PR rtl-optimization/61220 Part of PR rtl-optimization/61225 * shrink-wrap.c (move_insn_for_shrink_wrap): Skip SP and FP adjustment insn; skip split_edge for a block without only on successor. testsuite/ChangeLog: 2014-05-20 Zhenqiang Chen * gcc.dg/pr61220.c: New test. * gcc.dg/shrink-wrap-loop.c: Disable for x86_64 -m32 mode. [ ... ] /* Make sure that the source register isn't defined later in BB. */ @@ -204,6 +209,10 @@ move_insn_for_shrink_wrap (basic_block bb, rtx insn, /* Create a new basic block on the edge. */ if (EDGE_COUNT (next_block->preds) == 2) { + /* split_edge for a block with only one successor is meaningless. */ + if (EDGE_COUNT (bb->succs) == 1) + return false; + next_block = split_edge (live_edge); So don't you just want to use: if (EDGE_CRITICAL_P (live_edge)) to replace the two explicit checks of EDGE_COUNT on the preds & succs? jeff
Commit: MSP430: Fix some gcc testsuite failures
Hi Guys, I am checking in the patch below (to both the mainline and the 4.9 branch) in order to fix some unexpected failures in the gcc testsuite with the msp430-elf toolchain. With this patch in place the following tests now pass: gcc.c-torture/execute/builtins/memcpy-chk.c gcc.c-torture/execute/builtins/memmove.c gcc.c-torture/execute/builtins/mempcpy-chk.c gcc.c-torture/execute/builtins/mempcpy.c gcc.c-torture/execute/builtins/strcpy-chk.c gcc.c-torture/execute/builtins/strcpy.c gcc.c-torture/execute/20020412-1.c gcc.c-torture/execute/20050826-2.c gcc.c-torture/execute/20050826-2.c gcc.c-torture/execute/920908-1.c gcc.c-torture/execute/pr56205.c gcc.c-torture/execute/strct-varg-1.c gcc.c-torture/execute/va-arg-22.c No regressions are introduced by the patch. The patch fixes two bugs - the first was that the zero_extendpsisi2 pattern was confusing gcc about the number of hard registers needed to hold an SImode value. The second fixes a problem with expanding va_arg when pointers are not a multiple of the word size. Cheers Nick gcc/ChangeLog 2014-05-20 Nick Clifton * config/msp430/msp430.c (TARGET_GIMPLIFY_VA_ARG_EXPR): Define. (msp430_gimplify_va_arg_expr): New function. (msp430_print_operand): Handle (CONST (ZERO_EXTRACT)). * config/msp430/msp430.md (zero_extendpsisi2): Use + constraint on operand 0 in order to prevent confusion about the number of registers involved. msp430.patch.xz Description: application/xz
Re: [GCC RFC]A new and simple pass merging paired load store instructions
On 05/20/14 01:13, Bin.Cheng wrote: The idea being that common cases where a pair moves can be turned into a single wider move without having to write target code to make that happen much of the time. ie 2xQI->HI, 2xHI->SI, 2xSI->DI 2xSF->DF. For things outside those simple cases, fall back to a target hook or a target expander. This is practicable, but the question is why we would bother with general cases if the hook interface is needed anyway. Is it because target calls are generally more expensive? No, it's because most (all?) targets would benefit without having to write target dependent code. jeff
[PATCH v2] Replace C/C++ void_zero_node with a VOID_CST tree code
This is an updated version of the patch to replace void_zero_node with void_node. The main controversial point/hack in the original was that it handled void_node in the gimplifier as a zero constant. Jason sent me a patch that traps the dummy object in the front end, meaning that it no longer escapes to the gimplifier. I tested the combination of both patches on x86_64-linux-gnu. I think Richard has already approved the middle-end bits, but OK for C++? Jason, should I commit your patch or will you? Thanks, Richard gcc/ * tree.def (VOID_CST): New. * tree-core.h (TI_VOID): New. * tree.h (void_node): New. * tree.c (tree_node_structure_for_code, tree_code_size) (iterative_hash_expr): Handle VOID_CST. (build_common_tree_nodes): Initialize void_node. gcc/c-family/ * c-common.h (CTI_VOID_ZERO, void_zero_node): Delete. * c-common.c (c_common_nodes_and_builtins): Don't initialize void_zero_node. * c-pretty-print.c (pp_c_void_constant): New function. (c_pretty_printer::constant, c_pretty_printer::primary_expression) (c_pretty_printer::expression): Handle VOID_CST. * cilk.c (extract_free_variables): Likewise. * c-ubsan.c (ubsan_instrument_division, ubsan_instrument_shift) (ubsan_instrument_vla): Use void_node instead of void_zero_node. gcc/c/ * c-array-notation.c (expand_array_notations): Use void_node instead of void_zero_node. gcc/cp/ * cvt.c (convert_to_void): Use void_node instead of void_zero_node. * cp-array-notation.c (replace_invariant_exprs): Likewise. (expand_array_notation): Handle VOID_CST. * error.c (dump_expr): Likewise. * cxx-pretty-print.c (cxx_pretty_printer::primary_expression) (cxx_pretty_printer::expression): Likewise. (pp_cxx_new_expression): Use void_node instead of void_zero_node. * decl.c (register_dtor_fn): Likewise. * init.c (build_raw_new_expr, build_new_1, build_vec_init) (build_delete, push_base_cleanups): Likewise. * mangle.c (write_expression): Likewise. * semantics.c (finish_break_stmt, empty_expr_stmt_p): Likewise. * pt.c (tsubst_decl, tsubst_copy_and_build): Likewise. (tsubst, tsubst_copy, build_non_dependent_expr): Handle VOID_CST. * tree.c (cp_tree_equal): Likewise. (build_dummy_object, is_dummy_object, stabilize_expr): Use void_node instead of void_zero_node. * typeck.c (check_return_expr): Likewise. * typeck2.c (build_functional_cast): Likewise. Index: gcc/tree.def === --- gcc/tree.def2014-05-19 11:15:16.580535297 +0100 +++ gcc/tree.def2014-05-20 13:49:09.137734209 +0100 @@ -257,6 +257,8 @@ DEFTREECODE (LANG_TYPE, "lang_type", tcc /* First, the constants. */ +DEFTREECODE (VOID_CST, "void_cst", tcc_constant, 0) + /* Contents are in an array of HOST_WIDE_INTs. We often access these constants both in their native precision and Index: gcc/tree-core.h === --- gcc/tree-core.h 2014-05-20 13:48:56.299756356 +0100 +++ gcc/tree-core.h 2014-05-20 13:49:09.135734213 +0100 @@ -410,6 +410,8 @@ enum tree_index { TI_UINT32_TYPE, TI_UINT64_TYPE, + TI_VOID, + TI_INTEGER_ZERO, TI_INTEGER_ONE, TI_INTEGER_THREE, Index: gcc/tree.h === --- gcc/tree.h 2014-05-20 13:48:56.299756356 +0100 +++ gcc/tree.h 2014-05-20 13:49:09.138734208 +0100 @@ -3245,6 +3245,8 @@ #define uint16_type_node global_trees[T #define uint32_type_node global_trees[TI_UINT32_TYPE] #define uint64_type_node global_trees[TI_UINT64_TYPE] +#define void_node global_trees[TI_VOID] + #define integer_zero_node global_trees[TI_INTEGER_ZERO] #define integer_one_node global_trees[TI_INTEGER_ONE] #define integer_three_node global_trees[TI_INTEGER_THREE] Index: gcc/tree.c === --- gcc/tree.c 2014-05-20 13:48:56.299756356 +0100 +++ gcc/tree.c 2014-05-20 13:49:09.137734209 +0100 @@ -383,6 +383,7 @@ tree_node_structure_for_code (enum tree_ switch (code) { /* tcc_constant cases. */ +case VOID_CST: return TS_TYPED; case INTEGER_CST: return TS_INT_CST; case REAL_CST: return TS_REAL_CST; case FIXED_CST:return TS_FIXED_CST; @@ -652,6 +653,7 @@ tree_code_size (enum tree_code code) case tcc_constant: /* a constant */ switch (code) { + case VOID_CST: return sizeof (struct tree_typed); case INTEGER_CST: gcc_unreachable (); case REAL_CST: return sizeof (struct tree_real_cst); case FIXED_CST: retu
Re: [PATCH] Fix PR61221
On 05/20/14 02:06, Richard Biener wrote: We're still going to have problems if SSA names are re-used (as even released SSA names have to remain valid lattice entries after this). But currently nothing will create new SSA names (fingers crossing) during eliminate. A followup patch of mine will though, thus I'll do that two-staged SSA name release/reuse list you thought would maybe fix this issue. Sounds good (the two-staged release/reuse). The other thing I had pondered was two modes. One with immediate re-use and one with the two-staged release/reuse. But before doing anything my plan was to revisit the need for the release/reuse code. That code was written when we had to go completely out of SSA, then back into SSA in the first incarnation of the SSA jump threading support. With the incremental updating of the SSA graph, we may find that we really just don't need this code anymore. I wouldn't lose any sleep if after a bit of poking you came to that conclusion and just ripped out the release/reuse code completely. jeff
Re: [PATCH, PR60189, Cilk+] Fix for ICE with incorrect Cilk_sync usage
On 05/20/14 08:10, Zamyatin, Igor wrote: Please look then on the following patch. Regtested successfully on x86_64. Is it ok for trunk and 4.9? gcc/cp/ChangeLog: 2014-05-20 Igor Zamyatin PR c/60189 * parser.c (cp_parser_postfix_expression): Move handling of cilk_sync from here to... (cp_parser_statement): ...here. Make sure only semicolon can go after Cilk_sync. gcc/testsuite/ChangeLog: 2014-05-20 Igor Zamyatin PR c++/60189 * c-c++-common/cilk-plus/CK/invalid_sync.cс: New test. This is fine for both the trunk and the 4.9 branch. Thanks, Jeff
Re: -fuse-caller-save - Collect register usage information
> The test in get_call_reg_set_usage for flag_use_caller_save and the hook is > strictly speaking not necessary. But it's the interface function to retrieve > the collected register usage information, so it seems a good location to do > an early-out. I've left it in for now. But the test for targetm.call_fusage_contains_non_callee_clobbers seems to be useless in practice if -fuse-caller-save isn't the default, so I'd remove it. OK with that change and a head comment for get_call_cgraph_node, unless Jan wants to move get_call_fndecl and get_call_cgraph_node to cgraph.c. -- Eric Botcazou
Re: [PATCH, PR61191, Cilk+] Fix ICE on syntax error
On 05/20/14 09:27, H.J. Lu wrote: On Tue, May 20, 2014 at 7:31 AM, Zamyatin, Igor wrote: Hi all! The following patch fixes the ICE for the cilk code with syntax error. Regtested on x86_64. Ok for trunk and 4.9? Thanks, Igor gcc/c/ChangeLog: 2014-05-20 Igor Zamyatin * c-array-notation.c (fix_builtin_array_notation_fn): Check invalid function parameters. diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c index 0ac6ba8..127f9a1 100644 --- a/gcc/c/c-array-notation.c +++ b/gcc/c/c-array-notation.c @@ -229,6 +229,8 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree *new_var) /* Fully fold any EXCESSIVE_PRECISION EXPR that can occur in the function parameter. */ func_parm = c_fully_fold (func_parm, false, NULL); + if (func_parm == error_mark_node) +return error_mark_node; location = EXPR_LOCATION (an_builtin_fn); You should include a testcase. Agreed. This definitely should include a testcase. jeff
Fix libgo build
Hi, libgo currently does not build because of duplicated symbols. This is because of confusion in the output machinery, where comdat local is output as a global symbol. This is because of confusing use of DECL_ONE_ONLY. This bug seems to be there since introduction of comdat locals in gcc 4.9, but it was not harmful, because all symbols C++ produce have unique name. Instead of one-liner to special case comdat locals in elfos.h, it seems better to make DECL_ONE_ONLY more meaningful and make it return true only on real comdats, not on comdat locals. This patch does that and revisits its use in the back-end. Basically all tests that care about the fact if function should go into special section needs to be DECL_COMDAT_GROUP (decl), because we care if function is in comdat group. All tests that care about special semantic of comdats needs to be DECL_ONE_ONLY. For those who are interested, the oneliner is: Index: config/elfos.h === --- config/elfos.h (revision 210623) +++ config/elfos.h (working copy) @@ -303,6 +303,7 @@ RTLD_LOCAL. Don't use gnu_unique_object for typeinfo, \ vtables and other read-only artificial decls. */ \ if (USE_GNU_UNIQUE_OBJECT && DECL_ONE_ONLY (DECL) \ + && TREE_PUBLIC (DECL) \ && (!DECL_ARTIFICIAL (DECL) || !TREE_READONLY (DECL)))\ ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "gnu_unique_object");\ else \ Bootstrapped/regtested x86_64-linux, I will give it some further testing and commit at afternoon if there are no complains. Honza PR go/61232 * tree.h (DECL_ONE_ONLY): Return true only for externally visible symbols. * except.c (switch_to_exception_section, resolve_unique_section, get_named_text_section, default_function_rodata_section, align_variable, get_block_for_decl, default_section_type_flags): Use DECL_COMDAT_GROUP instead of DECL_ONE_ONLY. * symtab.c (symtab_add_to_same_comdat_group, symtab_make_decl_local, fixup_same_cpp_alias_visibility, symtab_nonoverwritable_alias, symtab_get_symbol_partitioning_class): Likewise. * cgraphclones.c (cgraph_create_virtual_clone): Likewise. * bb-reorder.c (pass_partition_blocks::gate): Likewise. * config/c6x/c6x.c (c6x_elf_unique_section): Likewise. (c6x_function_in_section_p): Likewise. * config/darwin.c (machopic_select_section): Likewise. * config/arm/arm.c (arm_function_in_section_p): Likewise. * config/mips/mips.c (mips_function_rodata_section): Likewise. * config/mep/mep.c (mep_select_section): LIkewise. * config/i386/i386.c (x86_64_elf_unique_section): Likewise. Index: tree.h === --- tree.h (revision 210521) +++ tree.h (working copy) @@ -2327,7 +2327,8 @@ extern void decl_value_expr_insert (tree /* Used in TREE_PUBLIC decls to indicate that copies of this DECL in multiple translation units should be merged. */ -#define DECL_ONE_ONLY(NODE) (DECL_COMDAT_GROUP (NODE) != NULL_TREE) +#define DECL_ONE_ONLY(NODE) (DECL_COMDAT_GROUP (NODE) != NULL_TREE \ +&& (TREE_PUBLIC (NODE) || DECL_EXTERNAL (NODE))) /* The name of the object as the assembler will see it (but before any translations made by ASM_OUTPUT_LABELREF). Often this is the same Index: except.c === --- except.c(revision 210521) +++ except.c(working copy) @@ -2854,12 +2854,12 @@ switch_to_exception_section (const char #ifdef HAVE_LD_EH_GC_SECTIONS if (flag_function_sections - || (DECL_ONE_ONLY (current_function_decl) && HAVE_COMDAT_GROUP)) + || (DECL_COMDAT_GROUP (current_function_decl) && HAVE_COMDAT_GROUP)) { char *section_name = XNEWVEC (char, strlen (fnname) + 32); /* The EH table must match the code section, so only mark it linkonce if we have COMDAT groups to tie them together. */ - if (DECL_ONE_ONLY (current_function_decl) && HAVE_COMDAT_GROUP) + if (DECL_COMDAT_GROUP (current_function_decl) && HAVE_COMDAT_GROUP) flags |= SECTION_LINKONCE; sprintf (section_name, ".gcc_except_table.%s", fnname); s = get_section (section_name, flags, current_function_decl); Index: varasm.c === --- varasm.c(revision 210521) +++ varasm.c(working copy) @@ -428,7 +428,7 @@ resolve_unique_section (tree decl, int r if (DECL_SECTION_NAME (decl) == NULL_TREE && targetm_common.have_named_sections && (flag_function_or_data
Re: [PATCH] Regression fix for PR target/61223
On 05/20/14 17:05, Alexey Merzlyakov wrote: Hi all, This is a fix for thumb1 build fail on trunk appeared since rev.210515(pr60758). On thumb1 targets the LR can not be used as argument of POP instruction. To keep a support of __cxa_end_cleanup backtracing on thumb1 we can make additional register operations before push/pop. But I guess, this will have a negative effect on __cxa_end_cleanup performance. So, I propose to preserve __cxa_end_cleanup backtracing on thumb2 architectures and revert it on thumb1. Regtest was finished with no regressions on C/C++ arm-linux-gnueabi(sf). For now, please revert your original patch and one of Richard or me will try to look at it in the morning. The thumb1 case is slightly more complicated than this. I don't like this approach as you are introducing the problem again in ARM state and leaving the problem as is for Thumb1. regards Ramana Best regards, Merzlyakov Alexey
Re: [C++ Patch/RFC] PR 58753 & 58930
Hi, On 05/20/2014 05:08 PM, Jason Merrill wrote: On 05/20/2014 10:50 AM, Paolo Carlini wrote: + if (TREE_TYPE (init) != type) +{ + if (BRACE_ENCLOSED_INITIALIZER_P (init) + && CP_AGGREGATE_TYPE_P (type)) +init = reshape_init (type, init, tf_warning_or_error); + init = digest_init (type, init, tf_warning_or_error); Instead of this let's factor the relevant code out of cp_parser_late_parse_one_default_arg and use it here as well. Indeed. Thus I'm finishing testing the below. Thanks, Paolo. /// /cp 2014-05-20 Paolo Carlini PR c++/58753 PR c++/58930 PR c++/58704 * typeck2.c (digest_nsdmi_init): New. * parser.c (cp_parser_late_parse_one_default_arg): Use it. * init.c (get_nsdmi): Likewise. * cp-tree.h (digest_nsdmi_init): Declare. /testsuite 2014-05-20 Paolo Carlini PR c++/58753 PR c++/58930 PR c++/58704 * g++.dg/cpp0x/nsdmi-template11.C: New. * g++.dg/cpp0x/nsdmi-template12.C: Likewise. * g++.dg/cpp0x/nsdmi-template13.C: Likewise. Index: cp/cp-tree.h === --- cp/cp-tree.h(revision 210645) +++ cp/cp-tree.h(working copy) @@ -6172,6 +6172,7 @@ extern tree store_init_value (tree, tree, vec +struct BaseHandler +{ + SampleModule module_ { 0 }; +}; + +BaseHandler a; Index: testsuite/g++.dg/cpp0x/nsdmi-template12.C === --- testsuite/g++.dg/cpp0x/nsdmi-template12.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template12.C (working copy) @@ -0,0 +1,17 @@ +// PR c++/58753 +// { dg-do compile { target c++11 } } + +#include + +template +struct X {X(std::initializer_list) {}}; + +template +class T { + X x{1}; +}; + +int main() +{ + T t; +} Index: testsuite/g++.dg/cpp0x/nsdmi-template13.C === --- testsuite/g++.dg/cpp0x/nsdmi-template13.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template13.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58704 +// { dg-do compile { target c++11 } } + +struct A {}; + +template struct B +{ + A a[1] = { }; +}; + +B b;
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 20.05.2014 17:16, Richard Earnshaw wrote: On 20/05/14 14:12, Ramana Radhakrishnan wrote: The following PR is opened for this problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61223 Thumb1 failure was also detected and reported in pr60758. I've proposed a thumb1 bugfix there. Regtest for the fix currently is in progress. Patches must be proposed on gcc-patches and / or in this case Never OR. Patches must always go to gcc-patches. They may *also* go to other lists, if appropriate. R. libstdc++-patches mailing list. Attaching patches to bugzilla does not ensure review. Thanks, Ramana Best regards, Merzlyakov Alexey Thanks, created a thread with proposed fix: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01640.html Best regards, Merzlyakov Alexey
[PATCH] Regression fix for PR target/61223
Hi all, This is a fix for thumb1 build fail on trunk appeared since rev.210515(pr60758). On thumb1 targets the LR can not be used as argument of POP instruction. To keep a support of __cxa_end_cleanup backtracing on thumb1 we can make additional register operations before push/pop. But I guess, this will have a negative effect on __cxa_end_cleanup performance. So, I propose to preserve __cxa_end_cleanup backtracing on thumb2 architectures and revert it on thumb1. Regtest was finished with no regressions on C/C++ arm-linux-gnueabi(sf). Best regards, Merzlyakov Alexey 2014-05-20 Alexey Merzlyakov PR target/61223 * libsupc++/eh_arm.cc (__cxa_end_cleanup): Revert backtracing support on thumb1. diff --git a/libstdc++-v3/libsupc++/eh_arm.cc b/libstdc++-v3/libsupc++/eh_arm.cc index 6a45af5..8ad327d 100644 --- a/libstdc++-v3/libsupc++/eh_arm.cc +++ b/libstdc++-v3/libsupc++/eh_arm.cc @@ -200,7 +200,8 @@ asm (".global __cxa_end_cleanup\n" #else // Assembly wrapper to call __gnu_end_cleanup without clobbering r1-r3. // Also push lr to preserve stack alignment and to allow backtracing. -#ifdef __thumb__ +// On thumb1 architectures push r4 because pop lr is not available. +#ifdef __thumb2__ asm (" .pushsection .text.__cxa_end_cleanup\n" " .global __cxa_end_cleanup\n" " .type __cxa_end_cleanup, \"function\"\n" @@ -214,6 +215,17 @@ asm (" .pushsection .text.__cxa_end_cleanup\n" " bl\t_Unwind_Resume @ Never returns\n" " .fnend\n" " .popsection\n"); +#elif defined(__thumb__) +asm (" .pushsection .text.__cxa_end_cleanup\n" +" .global __cxa_end_cleanup\n" +" .type __cxa_end_cleanup, \"function\"\n" +" .thumb_func\n" +"__cxa_end_cleanup:\n" +" push\t{r1, r2, r3, r4}\n" +" bl\t__gnu_end_cleanup\n" +" pop\t{r1, r2, r3, r4}\n" +" bl\t_Unwind_Resume @ Never returns\n" +" .popsection\n"); #else asm (" .pushsection .text.__cxa_end_cleanup\n" " .global __cxa_end_cleanup\n"
Re: [Google/4_8] LIPO COMDAT profile fixups
On Tue, May 20, 2014 at 6:32 AM, Teresa Johnson wrote: > On Mon, May 19, 2014 at 11:51 PM, Xinliang David Li > wrote: >> Why duplicating the merger functions in dyn-ipa.c? Should those in >> libgcov-merge.c be reused? > > The merger functions in libgcov-merge.c use a macro to either read the > counter to merge from a buffer in memory (when IN_GCOV_TOOL) or from > the gcda file otherwise. In this case I need to merge from another > array, and it needs to do so both IN_GCOV_TOOL and otherwise. So to > reuse the merger functions I would have had to get the counter value > from a new argument, guarded by a conditional. I didn't want to change > the interface of those existing merge functions, or more importantly, > make them less efficient with the added condition since they are > invoked frequently during gcda file merging. > ok -- something to think about in the future. For now, please add a comment in libgcov-utils.c before ctr_merge_functions to point to the dup merge functions in dyn-ipa.c. >> >> The refactoring of gcov_exit_write_gcda should probably be done in a >> separate patch -- preferably submitted to trunk too. > > The refactoring is necessary for this patch since I need to invoke the > outlined functionality in gcov_dump_module_info as well. > > I could submit that to trunk, but it has to go in to the google > branches either with or preceeding this patch. > ok for google branch after fixing the formatting issues 1) calller --> caller 2) many places where there is missing space: e.g, gcov_fixup_counters_checksum( .. David > Thanks, > Teresa > >> >> David >> >> On Mon, May 19, 2014 at 10:08 PM, Teresa Johnson >> wrote: >>> Ping. >>> Teresa >>> >>> On Wed, May 14, 2014 at 4:39 PM, Teresa Johnson >>> wrote: This patch applies profile fixups to COMDATs on the dyn ipa callgraph at the end of LIPO module grouping (either in the profile gen run or in gcov-tool). This is to address issues with missing profiles in the out-of-line COMDAT copies not selected by the linker, and indirect call profiles that target a module not included in the module group. By default, both fixups are enabled, but can be controlled by a profile-gen parameter, an environment variable, and a gcov-tool option. The fixups assume that functions with the same lineno and cfg checksum are copies of the same COMDAT. This is made more likely by ensuring that in LIPO mode we include the full mangled name in the lineno_checksum. For the counter fixup, we merge all non-zero profiles with matching checksums and copy the merged profile into copies with all-zero profiles. For the indirect counter fixup, if an indirect call profile target is not in the module group, we look for a matching checksum copy in the primary module and if exactly one is found we change the target to that. If any fixups are applied, the gcda files are rewritten after module grouping. This also required a couple of other changes to the optimizer. During cgraph node resolution, we were arbitrarily selecting a copy that had non-zero profiles. Now there are many more to choose from, so we will prefer the primary module copy if it is non-zero. Also, during cloning for inlining, we only want to update the profile on the callee node if we are inlining into the resolved node caller node. We were already doing this for AutoFDO, and need to do this here now that many node copies have the same profile. Patch attached. Tested with regression tests and internal benchmarks. Ok for google branches? Thanks, Teresa -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 > > > > -- > Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Patch, ARM] Fix pr 61208
PR 61208 is a wrong code bug in Thumb2 where we can generate out of range branches due to incorrect instruction size calculations. It's mostly gone latent on 4.9 and trunk (though could still happen at -O0 where splitting is done during final instruction generation). Complicating things slightly is the fact that prior to 4.9 we had a single alternative that was emitted statically; we now use an insn_and_split (hence the reduced impact on 4.9). That means different patches are needed: one for 4.9 and trunk; another for 4.8 and earlier. I've tried to minimize the 4.8 version as much as possible to reduce the risk. It seems simplest to just provide a new alternative for use with Thumb2 that has the correct length. Consequently there are two versions of this patch attached. One for 4.7 and 4.8, the other for 4.9 and trunk. PR target/61208 * arm.md (arm_cmpdi_unsigned): Fix length calculation for Thumb2. Testing of the backport version on 4.8 is OK, the trunk version is ongoing, though I've smoke tested it. Richi, is the backport version OK to go into 4.8? R. Index: gcc/config/arm/arm.md === --- gcc/config/arm/arm.md (revision 210617) +++ gcc/config/arm/arm.md (working copy) @@ -8371,8 +8371,8 @@ (define_insn_and_split "*arm_cmpdi_insn" (define_insn_and_split "*arm_cmpdi_unsigned" [(set (reg:CC_CZ CC_REGNUM) -(compare:CC_CZ (match_operand:DI 0 "s_register_operand" "l,r,r") - (match_operand:DI 1 "arm_di_operand" "Py,r,rDi")))] +(compare:CC_CZ (match_operand:DI 0 "s_register_operand" "l,r,r,r") + (match_operand:DI 1 "arm_di_operand" "Py,r,Di,rDi")))] "TARGET_32BIT" "#" ; "cmp\\t%R0, %R1\;it eq\;cmpeq\\t%Q0, %Q1" @@ -8392,9 +8392,9 @@ (define_insn_and_split "*arm_cmpdi_unsig operands[1] = gen_lowpart (SImode, operands[1]); } [(set_attr "conds" "set") - (set_attr "enabled_for_depr_it" "yes,yes,no") - (set_attr "arch" "t2,t2,*") - (set_attr "length" "6,6,8") + (set_attr "enabled_for_depr_it" "yes,yes,no,*") + (set_attr "arch" "t2,t2,t2,a") + (set_attr "length" "6,6,10,8") (set_attr "type" "multiple")] ) --- gcc/config/arm/arm.md (revision 210631) +++ gcc/config/arm/arm.md (local) @@ -7630,12 +7630,13 @@ (define_insn "*arm_cmpdi_insn" (define_insn "*arm_cmpdi_unsigned" [(set (reg:CC_CZ CC_REGNUM) - (compare:CC_CZ (match_operand:DI 0 "s_register_operand" "r") - (match_operand:DI 1 "arm_di_operand" "rDi")))] + (compare:CC_CZ (match_operand:DI 0 "s_register_operand" "r,r") + (match_operand:DI 1 "arm_di_operand" "rDi,rDi")))] "TARGET_32BIT" "cmp\\t%R0, %R1\;it eq\;cmpeq\\t%Q0, %Q1" [(set_attr "conds" "set") - (set_attr "length" "8")] + (set_attr "arch" "a,t2") + (set_attr "length" "8,10")] ) (define_insn "*arm_cmpdi_zero"
Re: [PATCH, PR61191, Cilk+] Fix ICE on syntax error
On Tue, May 20, 2014 at 7:31 AM, Zamyatin, Igor wrote: > Hi all! > > The following patch fixes the ICE for the cilk code with syntax error. > > Regtested on x86_64. > Ok for trunk and 4.9? > > Thanks, > Igor > > > gcc/c/ChangeLog: > > 2014-05-20 Igor Zamyatin > > * c-array-notation.c (fix_builtin_array_notation_fn): Check invalid > function parameters. > > > diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c > index 0ac6ba8..127f9a1 100644 > --- a/gcc/c/c-array-notation.c > +++ b/gcc/c/c-array-notation.c > @@ -229,6 +229,8 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree > *new_var) >/* Fully fold any EXCESSIVE_PRECISION EXPR that can occur in the function > parameter. */ >func_parm = c_fully_fold (func_parm, false, NULL); > + if (func_parm == error_mark_node) > +return error_mark_node; > >location = EXPR_LOCATION (an_builtin_fn); > You should include a testcase. -- H.J.
Re: [build, doc, testsuite] Centralise clearing hardware capabilities with Sun ld
On May 20, 2014, at 6:03 AM, Rainer Orth wrote: > The following patch implements what I've outlined there > Ok for mainline and 4.9 branch if those pass? For the test suite bits, Ok.
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
On Tue, May 20, 2014 at 5:00 AM, Kirill Yukhin wrote: > Hello, > On 19 May 09:58, H.J. Lu wrote: >> On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote: >> > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote: >> > >> Uros, >> I am looking into libreoffice size and the data alignment seems to make >> huge >> difference. Data section has grown from 5.8MB to 6.3MB in between GCC >> 4.8 and 4.9, >> while clang produces 5.2MB. >> >> The two patches I posted to not align vtables and RTTI reduces it to >> 5.7MB, but >> But perhaps we want to revisit the alignment rules. The optimization >> manuals >> usually care only about performance critical loops. Perhaps we can >> make the >> rules to align only bigger datastructures, or so at least for -O2. >> >>> >> >>> Based on the above quote, "Misaligned data access can incur >> >>> significant performance penalties." and the fact that this particular >> >>> alignment rule has some compatibility issues with previous versions of >> >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as >> >>> is. However, if the access is from the cold section, we can perhaps >> >>> avoid extra alignment, while avoiding those compatibility issues. >> >>> >> >> >> >> It is excessive to align >> >> >> >> struct foo >> >> { >> >> int x1; >> >> int x2; >> >> char x3; >> >> int x4; >> >> int x5; >> >> char x6; >> >> int x7; >> >> int x8; >> >> }; >> >> >> >> to 32 bytes and align >> >> >> >> struct foo >> >> { >> >> int x1; >> >> int x2; >> >> char x3; >> >> int x4; >> >> int x5; >> >> char x6; >> >> int x7[9]; >> >> int x8; >> >> }; >> >> >> >> to 64 bytes. What performance gain does it provide? >> > >> > Avoids "significant performance penalties," perhaps? >> > >> >> Kirill, do we have performance data for excessive alignment >> vs ABI alignment? > Nope, we have no actual data showing performance impact on such changes, > sorry. > > We may try such a change on HSW machine (on Spec 2006), will it be useful? > > --- a/gcc/config/i386/i386.c > +++ b/gcc/config/i386/i386.c > @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt) > used to assume. */ > >int max_align_compat > -= optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT); > += optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT); > >/* A data structure, equal or greater than the size of a cache line > (64 bytes in the Pentium 4 and other recent Intel processors, including > > ABI alignment should be sufficient for correctness. Bigger alignments are supposed to give better performance. Can you try this patch on HSW and SLM to see if it has any impact on performance? -- H.J. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index c0a46ed..4879110 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -26568,39 +26568,6 @@ ix86_constant_alignment (tree exp, int align) int ix86_data_alignment (tree type, int align, bool opt) { - /* GCC 4.8 and earlier used to incorrectly assume this alignment even - for symbols from other compilation units or symbols that don't need - to bind locally. In order to preserve some ABI compatibility with - those compilers, ensure we don't decrease alignment from what we - used to assume. */ - - int max_align_compat -= optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT); - - /* A data structure, equal or greater than the size of a cache line - (64 bytes in the Pentium 4 and other recent Intel processors, including - processors based on Intel Core microarchitecture) should be aligned - so that its base address is a multiple of a cache line size. */ - - int max_align -= MIN ((unsigned) ix86_tune_cost->prefetch_block * 8, MAX_OFILE_ALIGNMENT); - - if (max_align < BITS_PER_WORD) -max_align = BITS_PER_WORD; - - if (opt - && AGGREGATE_TYPE_P (type) - && TYPE_SIZE (type) - && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST) -{ - if (wi::geu_p (TYPE_SIZE (type), max_align_compat) - && align < max_align_compat) - align = max_align_compat; - if (wi::geu_p (TYPE_SIZE (type), max_align) - && align < max_align) - align = max_align; -} - /* x86-64 ABI requires arrays greater than 16 bytes to be aligned to 16byte boundary. */ if (TARGET_64BIT) @@ -26616,6 +26583,9 @@ ix86_data_alignment (tree type, int align, bool opt) if (!opt) return align; + if (align < BITS_PER_WORD) +align = BITS_PER_WORD; + if (TREE_CODE (type) == ARRAY_TYPE) { if (TYPE_MODE (TREE_TYPE (type)) == DFmode && align < 64)
Re: [RFC] HOST_WIDE_INT transition steps
On Tue, 20 May 2014, Eric Botcazou wrote: > > Make the code base easier to understand for newcomers. It's also a > > documentation improvement (you see what a HOST_WIDE_INT really is), > > alongside with [u]int64_t being less to type ... > > I personally find the abstraction and the separation with the other, more > mundane types useful, but I guess that it's a matter of habit. Whatever the abstraction is, HOST_WIDE_INT is a pretty poor name for it. I think it's something like "integer type that is big enough to store a target address / offset in bytes, or a target word", together with a sense of "integer type big enough for certain constants and such that twice it is big enough for other constants" (the latter being what wide-int should enable eliminating eventually). But I suspect HOST_WIDE_INT is used for lots of other things as well - and for things for which 64 bits may not be enough (bit offsets, for example), and for that matter int is used for things for which HOST_WIDE_INT would be more appropriate (the length of a STRING_CST, for example). So while there may be a meaningful abstraction somewhere in there, I don't think the particular places that happen to use HOST_WIDE_INT are a very good guide to where such an abstraction should be used - and don't think moving to int64_t / uint64_t would actually lose information of use for cleaning things up in future. (HOST_WIDEST_INT should of course become intmax_t. It should also not be used in any way that can affect code generation.) -- Joseph S. Myers jos...@codesourcery.com
Re: [C++ Patch/RFC] PR 58753 & 58930
On 05/20/2014 10:50 AM, Paolo Carlini wrote: + if (TREE_TYPE (init) != type) + { + if (BRACE_ENCLOSED_INITIALIZER_P (init) + && CP_AGGREGATE_TYPE_P (type)) + init = reshape_init (type, init, tf_warning_or_error); + init = digest_init (type, init, tf_warning_or_error); Instead of this let's factor the relevant code out of cp_parser_late_parse_one_default_arg and use it here as well. That is, int flags = LOOKUP_IMPLICIT; if (DIRECT_LIST_INIT_P (parsed_arg)) flags = LOOKUP_NORMAL; parsed_arg = digest_init_flags (TREE_TYPE (decl), parsed_arg, flags); if (TREE_CODE (parsed_arg) == TARGET_EXPR) /* This represents the whole initialization. */ TARGET_EXPR_DIRECT_INIT_P (parsed_arg) = true; Jason
Re: [patch] libstdc++/61011 fix --disable-libstdcxx in top-level configure
On 20/05/14 15:53 +0100, Jonathan Wakely wrote: The first part of this patch fixes the PR, we should be adding target-libstdc++-v3 to noconfigdirs, not libstdc++-v3. The second part disables the target libs that can't be bootstrapped without libstdc++. libcilkrts and libsanitizer use lots of std:: stuff. libitm only uses std::atomic, but in any case fails to build when --disable-libstdcxx is used, with this error: xg++: error: unrecognized command line option ‘-funconfigured-libstdc++-v3’ Bootstrapped with and without --disable-libstdcxx and tested on x86_64-linux. PR libstdc++/61011 * configure.ac (--disable-libstdcxx): Set noconfigdirs correctly. Disable libcilkrts, libitm, libsanitizer when not building libstdc++. * configure: Regenerate. OK for trunk? With the patch this time ... commit c384fef85bca43046b65eb02267169a5613af2b8 Author: Jonathan Wakely Date: Tue May 20 12:13:43 2014 +0100 PR libstdc++/61011 * configure.ac (--disable-libstdcxx): Set noconfigdirs correctly. Disable libcilkrts, libitm, libsanitizer when not building libstdc++. * configure: Regenerate. diff --git a/configure.ac b/configure.ac index 07c3a66..548525b 100644 --- a/configure.ac +++ b/configure.ac @@ -435,7 +435,7 @@ AS_HELP_STRING([--disable-libstdcxx], ENABLE_LIBSTDCXX=$enableval, ENABLE_LIBSTDCXX=default) [if test "${ENABLE_LIBSTDCXX}" = "no" ; then - noconfigdirs="$noconfigdirs libstdc++-v3" + noconfigdirs="$noconfigdirs target-libstdc++-v3" fi] # Save it here so that, even in case of --enable-libgcj, if the Java @@ -2057,9 +2057,17 @@ case ,${enable_languages},:${enable_objc_gc} in ;; esac -# Disable libitm, libsanitizer, libvtv if we're not building C++ +# Disable libcilkrts, libitm, libsanitizer, libvtv if we're not building C++ case ,${enable_languages}, in - *,c++,*) ;; + *,c++,*) +# Disable libcilkrts, libitm, libsanitizer if we're not building libstdc++ +case "${noconfigdirs}" in + *target-libstdc++-v3*) +noconfigdirs="$noconfigdirs target-libcilkrts target-libitm target-libsanitizer" +;; + *) ;; +esac +;; *) noconfigdirs="$noconfigdirs target-libcilkrts target-libitm target-libsanitizer target-libvtv" ;;
Re: [patch] libstdc++/61011 fix --disable-libstdcxx in top-level configure
On Tue, May 20, 2014 at 03:53:14PM +0100, Jonathan Wakely wrote: > The first part of this patch fixes the PR, we should be adding > target-libstdc++-v3 to noconfigdirs, not libstdc++-v3. > > The second part disables the target libs that can't be bootstrapped > without libstdc++. > > libcilkrts and libsanitizer use lots of std:: stuff. > libitm only uses std::atomic, but in any case fails to build when > --disable-libstdcxx is used, with this error: > > xg++: error: unrecognized command line option ‘-funconfigured-libstdc++-v3’ > > Bootstrapped with and without --disable-libstdcxx and tested on > x86_64-linux. > > PR libstdc++/61011 > * configure.ac (--disable-libstdcxx): Set noconfigdirs correctly. > Disable libcilkrts, libitm, libsanitizer when not building libstdc++. > * configure: Regenerate. > > OK for trunk? ENOPATCH. Jakub
[patch] libstdc++/61011 fix --disable-libstdcxx in top-level configure
The first part of this patch fixes the PR, we should be adding target-libstdc++-v3 to noconfigdirs, not libstdc++-v3. The second part disables the target libs that can't be bootstrapped without libstdc++. libcilkrts and libsanitizer use lots of std:: stuff. libitm only uses std::atomic, but in any case fails to build when --disable-libstdcxx is used, with this error: xg++: error: unrecognized command line option ‘-funconfigured-libstdc++-v3’ Bootstrapped with and without --disable-libstdcxx and tested on x86_64-linux. PR libstdc++/61011 * configure.ac (--disable-libstdcxx): Set noconfigdirs correctly. Disable libcilkrts, libitm, libsanitizer when not building libstdc++. * configure: Regenerate. OK for trunk?
Re: [C++ Patch/RFC] PR 58753 & 58930
Hi, On 05/19/2014 08:28 PM, Jason Merrill wrote: How about doing digest_init in get_nsdmi, so that the conversion is also exposed to walk_field_subobs? Thus, good news: something as simple as the below passes testing, works for the 2 bugs and for c++/58704 too. Thus, what else? Personally, I'm not sure about a few things: whether reshape_init is needed (I tend to think so, because later in perform_member_init certainly we use it); whether digest_init (vs digest_init_flags, cmp the parser) is fine. Thanks again! Paolo. // Index: cp/init.c === --- cp/init.c (revision 210641) +++ cp/init.c (working copy) @@ -529,17 +529,28 @@ tree get_nsdmi (tree member, bool in_ctor) { tree init; + tree type = TREE_TYPE (member); tree save_ccp = current_class_ptr; tree save_ccr = current_class_ref; if (!in_ctor) inject_this_parameter (DECL_CONTEXT (member), TYPE_UNQUALIFIED); if (DECL_LANG_SPECIFIC (member) && DECL_TEMPLATE_INFO (member)) -/* Do deferred instantiation of the NSDMI. */ -init = (tsubst_copy_and_build - (DECL_INITIAL (DECL_TI_TEMPLATE (member)), -DECL_TI_ARGS (member), -tf_warning_or_error, member, /*function_p=*/false, -/*integral_constant_expression_p=*/false)); +{ + /* Do deferred instantiation of the NSDMI. */ + init = (tsubst_copy_and_build + (DECL_INITIAL (DECL_TI_TEMPLATE (member)), + DECL_TI_ARGS (member), + tf_warning_or_error, member, /*function_p=*/false, + /*integral_constant_expression_p=*/false)); + + if (TREE_TYPE (init) != type) + { + if (BRACE_ENCLOSED_INITIALIZER_P (init) + && CP_AGGREGATE_TYPE_P (type)) + init = reshape_init (type, init, tf_warning_or_error); + init = digest_init (type, init, tf_warning_or_error); + } +} else { init = DECL_INITIAL (member); Index: testsuite/g++.dg/cpp0x/nsdmi-template11.C === --- testsuite/g++.dg/cpp0x/nsdmi-template11.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template11.C (working copy) @@ -0,0 +1,15 @@ +// PR c++/58930 +// { dg-do compile { target c++11 } } + +struct SampleModule +{ + explicit SampleModule (int); +}; + +template < typename > +struct BaseHandler +{ + SampleModule module_ { 0 }; +}; + +BaseHandler a; Index: testsuite/g++.dg/cpp0x/nsdmi-template12.C === --- testsuite/g++.dg/cpp0x/nsdmi-template12.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template12.C (working copy) @@ -0,0 +1,17 @@ +// PR c++/58753 +// { dg-do compile { target c++11 } } + +#include + +template +struct X {X(std::initializer_list) {}}; + +template +class T { + X x{1}; +}; + +int main() +{ + T t; +} Index: testsuite/g++.dg/cpp0x/nsdmi-template13.C === --- testsuite/g++.dg/cpp0x/nsdmi-template13.C (revision 0) +++ testsuite/g++.dg/cpp0x/nsdmi-template13.C (working copy) @@ -0,0 +1,11 @@ +// PR c++/58704 +// { dg-do compile { target c++11 } } + +struct A {}; + +template struct B +{ + A a[1] = { }; +}; + +B b;
Re: Ping2: [PATCH] PR debug/16063. Add DW_AT_type to DW_TAG_enumeration.
On 05/20/2014 02:55 AM, Mark Wielaard wrote: On Mon, May 19, 2014 at 04:50:35PM -0400, Jason Merrill wrote: On 05/13/2014 03:21 AM, Mark Wielaard wrote: So the debugger doesn't have to guess the properties of the enum's underlying base type, like size, encoding and signedness. Well, the enum already has DW_AT_byte_size. It seems to me that it should also have DW_AT_encoding to provide the other two properties you mention. Right, that is the idea, since an enum doesn't provide those attributes, it should have a reference to the underlying base type that provides them. Yes, but why is that better than providing them directly? The latter would seem to work better with non-C-family languages like Ada. Jason
Re: RFA: cache enabled attribute by insn code
On May 20, 2014, at 1:17 AM, Richard Sandiford wrote: >> The patch gives a consistent compile-time improvement of about ~3.5% >> on the -O0 fold-const.ii test case. 3.5 alone is really nice and 3.5 on top of 3.5 is amazing.
Re: patch to fix PR60969
On 05/19/2014 05:37 PM, James Greenhalgh wrote: > On Fri, May 16, 2014 at 06:49:45PM +0100, Vladimir Makarov wrote: >>The following patch fixes >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60969 >> >> The patch was bootstrapped and tested on x86/x86-64. >> >> Committed as rev. 210519 to gcc 4.9 branch and as rev. 210520 to trunk. >> >> 2014-05-16 Vladimir Makarov >> >> PR rtl-optimization/60969 >> * ira-costs.c (record_reg_classes): Allow only memory for pseudo. >> Calculate costs for this case. >> >> 2014-05-16 Vladimir Makarov >> >> PR rtl-optimization/60969 >> * g++.dg/pr60969.C: New. > This seems to have cause gcc.target/aarch64/vect-abs-compile.c to begin > failing on aarch64-none-elf: > > FAIL: gcc.target/aarch64/table-intrinsics.c (internal compiler error) > FAIL: gcc.target/aarch64/table-intrinsics.c (test for excess errors) > Excess errors: > /work/gcc-clean/src/gcc/gcc/testsuite/gcc.target/aarch64/table-intrinsics.c:172:1: > internal compiler error: Max. number of generated reload insns per insn is > achieved (90) > 0x8923cd lra_constraints(bool) > /work/gcc-clean/src/gcc/gcc/lra-constraints.c:4140 > 0x882f62 lra(_IO_FILE*) > /work/gcc-clean/src/gcc/gcc/lra.c:2353 > 0x8453f6 do_reload > /work/gcc-clean/src/gcc/gcc/ira.c:5457 > 0x8453f6 execute > /work/gcc-clean/src/gcc/gcc/ira.c:5618 > > Sorry, I have no aarch64 machine. Could you sent me the pre-processed file of the test.
[PATCH, PR61191, Cilk+] Fix ICE on syntax error
Hi all! The following patch fixes the ICE for the cilk code with syntax error. Regtested on x86_64. Ok for trunk and 4.9? Thanks, Igor gcc/c/ChangeLog: 2014-05-20 Igor Zamyatin * c-array-notation.c (fix_builtin_array_notation_fn): Check invalid function parameters. diff --git a/gcc/c/c-array-notation.c b/gcc/c/c-array-notation.c index 0ac6ba8..127f9a1 100644 --- a/gcc/c/c-array-notation.c +++ b/gcc/c/c-array-notation.c @@ -229,6 +229,8 @@ fix_builtin_array_notation_fn (tree an_builtin_fn, tree *new_var) /* Fully fold any EXCESSIVE_PRECISION EXPR that can occur in the function parameter. */ func_parm = c_fully_fold (func_parm, false, NULL); + if (func_parm == error_mark_node) + return error_mark_node; location = EXPR_LOCATION (an_builtin_fn);
[PATCH, ira] Fix for PR61241
Hi,all There are some small problems that block the ira to make the best choice, as PR61241 shows. This patch fix PR61241. I have done "make bootstrap" and "make check" on x86, nothing changed after the patch. Is the patch OK for trunk? Thanks Ma Jiang 2014-05-20 Ma Jiang PR rtl-opti/61221 * ira-lives.c (process_bb_node_lives): Clear unnecessary hard register conflicts. irabug.patch Description: Binary data
RE: [PATCH, PR60189, Cilk+] Fix for ICE with incorrect Cilk_sync usage
Please look then on the following patch. Regtested successfully on x86_64. Is it ok for trunk and 4.9? gcc/cp/ChangeLog: 2014-05-20 Igor Zamyatin PR c/60189 * parser.c (cp_parser_postfix_expression): Move handling of cilk_sync from here to... (cp_parser_statement): ...here. Make sure only semicolon can go after Cilk_sync. gcc/testsuite/ChangeLog: 2014-05-20 Igor Zamyatin PR c++/60189 * c-c++-common/cilk-plus/CK/invalid_sync.cс: New test. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 0c9e113..814f323 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -5845,20 +5845,6 @@ cp_parser_postfix_expression (cp_parser *parser, bool address_p, bool cast_p, } break; } - -case RID_CILK_SYNC: - if (flag_cilkplus) - { - tree sync_expr = build_cilk_sync (); - SET_EXPR_LOCATION (sync_expr, -cp_lexer_peek_token (parser->lexer)->location); - finish_expr_stmt (sync_expr); - } - else - error_at (token->location, "-fcilkplus must be enabled to use" - " %<_Cilk_sync%>"); - cp_lexer_consume_token (parser->lexer); - break; case RID_BUILTIN_SHUFFLE: { @@ -9414,6 +9400,24 @@ cp_parser_statement (cp_parser* parser, tree in_statement_expr, statement = cp_parser_jump_statement (parser); break; + case RID_CILK_SYNC: + cp_lexer_consume_token (parser->lexer); + if (flag_cilkplus) + { + tree sync_expr = build_cilk_sync (); + SET_EXPR_LOCATION (sync_expr, +token->location); + statement = finish_expr_stmt (sync_expr); + } + else + { + error_at (token->location, "-fcilkplus must be enabled to use" + " %<_Cilk_sync%>"); + statement = error_mark_node; + } + cp_parser_require (parser, CPP_SEMICOLON, RT_SEMICOLON); + break; + /* Objective-C++ exception-handling constructs. */ case RID_AT_TRY: case RID_AT_CATCH: diff --git a/gcc/testsuite/c-c++-common/cilk-plus/CK/invalid_sync.cc b/gcc/testsuite/c-c++-common/cilk-plus/CK/invalid_sync.cc new file mode 100644 index 000..cf1caf1 --- /dev/null +++ b/gcc/testsuite/c-c++-common/cilk-plus/CK/invalid_sync.cc @@ -0,0 +1,9 @@ +/* PR c/60189 */ +/* { dg-do compile } */ +/* { dg-options "-fcilkplus" } */ + +int main (void) +{ +_Cilk_sync return; /* { dg-error " expected ';' before 'return'" } */ +return 0; +} Thanks, Igor > -Original Message- > From: Zamyatin, Igor > Sent: Tuesday, May 13, 2014 12:28 AM > To: 'Jason Merrill'; 'Jakub Jelinek' > Cc: 'GCC Patches (gcc-patches@gcc.gnu.org)'; Iyer, Balaji V > Subject: RE: [PATCH, PR60189, Cilk+] Fix for ICE with incorrect Cilk_sync > usage > > Ping. Should I prepare the patch? > > Thanks, > Igor > > > > -Original Message > > > From: Jason Merrill [mailto:ja...@redhat.com] > > > Sent: Monday, April 14, 2014 9:49 PM > > > To: Zamyatin, Igor; Jakub Jelinek > > > Cc: GCC Patches (gcc-patches@gcc.gnu.org); Iyer, Balaji V > > > Subject: Re: [PATCH, PR60189, Cilk+] Fix for ICE with incorrect > > > Cilk_sync usage > > > > > > Oh, I see where the problem is coming from. Cilk_sync is a > > > statement, but it's being parsed as an expression. Let's move it to > > cp_parser_statement. > > > > Something like this (better to put new code in separate routine?)? > > > > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index bb59e3b..3105d6c > > 100644 > > --- a/gcc/cp/parser.c > > +++ b/gcc/cp/parser.c > > @@ -5835,20 +5835,6 @@ cp_parser_postfix_expression (cp_parser > > *parser, bool address_p, bool cast_p, > > } > > break; > >} > > - > > -case RID_CILK_SYNC: > > - if (flag_cilkplus) > > - { > > - tree sync_expr = build_cilk_sync (); > > - SET_EXPR_LOCATION (sync_expr, > > -cp_lexer_peek_token (parser->lexer)->location); > > - finish_expr_stmt (sync_expr); > > - } > > - else > > - error_at (token->location, "-fcilkplus must be enabled to use" > > - " %<_Cilk_sync%>"); > > - cp_lexer_consume_token (parser->lexer); > > - break; > > > > case RID_BUILTIN_SHUFFLE: > >{ > > @@ -9404,6 +9390,24 @@ cp_parser_statement (cp_parser* parser, tree > > in_statement_expr, > > statement = cp_parser_jump_statement (parser); > > break; > > > > + case RID_CILK_SYNC: > > + cp_lexer_consume_token (parser->lexer); > > + if (flag_cilkplus) > > + { > > + tree sync_expr = build_cilk_sync (); > > + SET_EXPR_LOCATION (sync_expr, > > +token->location); > > + statement = finish_expr_stmt (sync_expr); > > + } > > + else > > + { > > + error_at (
Re: [C++ Patch] PR 60373
OK. Jason
Re: PR 61210: recursive template substitution depending on system compiler
OK. Jason
[PATCH] Tidy SCCVN dump
This makes the dumps easier to follow. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2014-05-20 Richard Biener * tree-ssa-sccvn.c (process_scc): Dump SCC here, when iterating, (extract_and_process_scc_for_name): not here. (cond_dom_walker::before_dom_children): Only process stmts that end the BB in interesting ways. (run_scc_vn): Mark param uses as visited. Index: gcc/tree-ssa-sccvn.c === --- gcc/tree-ssa-sccvn.c(revision 210635) +++ gcc/tree-ssa-sccvn.c(working copy) @@ -3833,6 +3833,9 @@ process_scc (vec scc) } } + if (dump_file && (dump_flags & TDF_DETAILS)) +print_scc (dump_file, scc); + /* Iterate over the SCC with the optimistic table until it stops changing. */ current_info = optimistic_info; @@ -3858,6 +3861,8 @@ process_scc (vec scc) changed |= visit_use (var); } + if (dump_file && (dump_flags & TDF_DETAILS)) +fprintf (dump_file, "Processing SCC needed %d iterations\n", iterations); statistics_histogram_event (cfun, "SCC iterations", iterations); /* Finally, copy the contents of the no longer used optimistic @@ -3909,9 +3914,6 @@ extract_and_process_scc_for_name (tree n if (scc.length () > 1) sort_scc (scc); - if (dump_file && (dump_flags & TDF_DETAILS)) -print_scc (dump_file, scc); - process_scc (scc); return true; @@ -4221,6 +4223,19 @@ cond_dom_walker::before_dom_children (ba if (!stmt) return; + enum gimple_code code = gimple_code (stmt); + if (code != GIMPLE_COND + && code != GIMPLE_SWITCH + && code != GIMPLE_GOTO) +return; + + if (dump_file && (dump_flags & TDF_DETAILS)) +{ + fprintf (dump_file, "Value-numbering operands of stmt ending BB %d: ", + bb->index); + print_gimple_stmt (dump_file, stmt, 0, 0); +} + /* Value-number the last stmts SSA uses. */ ssa_op_iter i; tree op; @@ -4236,7 +4251,7 @@ cond_dom_walker::before_dom_children (ba if value-numbering can prove they are not reachable. Handling computed gotos is also possible. */ tree val; - switch (gimple_code (stmt)) + switch (code) { case GIMPLE_COND: { @@ -4259,8 +4274,7 @@ cond_dom_walker::before_dom_children (ba val = gimple_goto_dest (stmt); break; default: - val = NULL_TREE; - break; + gcc_unreachable (); } if (!val) return; @@ -4300,7 +4314,10 @@ run_scc_vn (vn_lookup_kind default_vn_wa { tree def = ssa_default_def (cfun, param); if (def) - VN_INFO (def)->valnum = def; + { + VN_INFO (def)->visited = true; + VN_INFO (def)->valnum = def; + } } /* Mark all edges as possibly executable. */
Re: [Google/4_8] LIPO COMDAT profile fixups
On Mon, May 19, 2014 at 11:51 PM, Xinliang David Li wrote: > Why duplicating the merger functions in dyn-ipa.c? Should those in > libgcov-merge.c be reused? The merger functions in libgcov-merge.c use a macro to either read the counter to merge from a buffer in memory (when IN_GCOV_TOOL) or from the gcda file otherwise. In this case I need to merge from another array, and it needs to do so both IN_GCOV_TOOL and otherwise. So to reuse the merger functions I would have had to get the counter value from a new argument, guarded by a conditional. I didn't want to change the interface of those existing merge functions, or more importantly, make them less efficient with the added condition since they are invoked frequently during gcda file merging. > > The refactoring of gcov_exit_write_gcda should probably be done in a > separate patch -- preferably submitted to trunk too. The refactoring is necessary for this patch since I need to invoke the outlined functionality in gcov_dump_module_info as well. I could submit that to trunk, but it has to go in to the google branches either with or preceeding this patch. Thanks, Teresa > > David > > On Mon, May 19, 2014 at 10:08 PM, Teresa Johnson wrote: >> Ping. >> Teresa >> >> On Wed, May 14, 2014 at 4:39 PM, Teresa Johnson wrote: >>> This patch applies profile fixups to COMDATs on the dyn ipa callgraph >>> at the end of LIPO module grouping (either in the profile gen run or >>> in gcov-tool). This is to address issues with missing profiles in the >>> out-of-line COMDAT copies not selected by the linker, and indirect >>> call profiles that target a module not included in the module group. >>> By default, both fixups are enabled, but can be controlled by a >>> profile-gen parameter, an environment variable, and a gcov-tool >>> option. >>> >>> The fixups assume that functions with the same lineno and cfg checksum >>> are copies of the same COMDAT. This is made more likely by ensuring >>> that in LIPO mode we include the full mangled name in the >>> lineno_checksum. >>> >>> For the counter fixup, we merge all non-zero profiles with matching >>> checksums and copy the merged profile into copies with all-zero >>> profiles. >>> >>> For the indirect counter fixup, if an indirect call profile target is >>> not in the module group, we look for a matching checksum copy in the >>> primary module and if exactly one is found we change the target to >>> that. >>> >>> If any fixups are applied, the gcda files are rewritten after module >>> grouping. >>> >>> This also required a couple of other changes to the optimizer. During >>> cgraph node resolution, we were arbitrarily selecting a copy that had >>> non-zero profiles. Now there are many more to choose from, so we will >>> prefer the primary module copy if it is non-zero. Also, during cloning >>> for inlining, we only want to update the profile on the callee node if >>> we are inlining into the resolved node caller node. We were already >>> doing this for AutoFDO, and need to do this here now that many node >>> copies have the same profile. >>> >>> Patch attached. Tested with regression tests and internal benchmarks. >>> Ok for google branches? >>> >>> Thanks, >>> Teresa >>> >>> >>> >>> -- >>> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 >> >> >> >> -- >> Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413 -- Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
[Ping] [C++ Patch] PR 60373
Hi, On 05/13/2014 02:19 AM, Paolo Carlini wrote: Hi, in this issue Marc noticed that the warning for ignored attribute visibility is truncated when the previous declaration belongs to a system header. Luckily the problem can be neatly fixed because it's just a consequence of not using warning_at + inform, as we should anyway. The other hunk in the patch tweaks the only other remaining warning_at pair I found in the file, should be a rather benign one, but better adjusting it too. Tested x86_64-linux. Ping rather early because this one should be rather straightforward... Thanks! Paolo.
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 20/05/14 14:12, Ramana Radhakrishnan wrote: > >> >> The following PR is opened for this problem: >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61223 >> >> Thumb1 failure was also detected and reported in pr60758. >> I've proposed a thumb1 bugfix there. Regtest for the fix currently is in >> progress. > > Patches must be proposed on gcc-patches and / or in this case Never OR. Patches must always go to gcc-patches. They may *also* go to other lists, if appropriate. R. > libstdc++-patches mailing list. Attaching patches to bugzilla does not > ensure review. > > Thanks, > Ramana > >> >> Best regards, >> Merzlyakov Alexey >> >
PR 61210: recursive template substitution depending on system compiler
PR 61210 is about a case where a templated wi:: function had "x | y" and where both "x" and "y" needed temporary variables. The recursive instantation of x and y were done as two arguments to the same function call, which meant that the order of instantiation depended on the host compiler. The uids of the temporary variables could therefore be in a different (but still valid) order than when compiled with the stage1 compiler, which caused code differences later when the uids were used as tie-breakers to get a stable sort. I've gone through pt.c a couple of times trying to find all cases of this, but I might have missed some. Tested on x86_64-linux-gnu. Also tested by using clang as the stage1 compiler in an i486-pc-linux-gnu bootstrap. OK to install? Thanks, Richard gcc/cp/ PR bootstrap/61210 * pt.c (tsubst_copy, tsubst_omp_for_iterator, tsubst_expr) (tsubst_copy_and_build): Perform recursive substitutions in a deterministic order. Index: gcc/cp/pt.c === --- gcc/cp/pt.c 2014-05-19 18:27:03.510024737 +0100 +++ gcc/cp/pt.c 2014-05-19 22:17:17.854250621 +0100 @@ -12731,9 +12731,11 @@ tsubst_copy (tree t, tree args, tsubst_f case IMPLICIT_CONV_EXPR: case CONVERT_EXPR: case NOP_EXPR: - return build1 - (code, tsubst (TREE_TYPE (t), args, complain, in_decl), -tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl)); + { + tree type = tsubst (TREE_TYPE (t), args, complain, in_decl); + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + return build1 (code, type, op0); + } case SIZEOF_EXPR: if (PACK_EXPANSION_P (TREE_OPERAND (t, 0))) @@ -12801,9 +12803,11 @@ tsubst_copy (tree t, tree args, tsubst_f case REALPART_EXPR: case IMAGPART_EXPR: case PAREN_EXPR: - return build1 - (code, tsubst (TREE_TYPE (t), args, complain, in_decl), -tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl)); + { + tree type = tsubst (TREE_TYPE (t), args, complain, in_decl); + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + return build1 (code, type, op0); + } case COMPONENT_REF: { @@ -12877,24 +12881,26 @@ tsubst_copy (tree t, tree args, tsubst_f case PREINCREMENT_EXPR: case POSTDECREMENT_EXPR: case POSTINCREMENT_EXPR: - return build_nt - (code, tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl), -tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl)); + { + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + tree op1 = tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl); + return build_nt (code, op0, op1); + } case SCOPE_REF: - return build_qualified_name (/*type=*/NULL_TREE, - tsubst_copy (TREE_OPERAND (t, 0), - args, complain, in_decl), - tsubst_copy (TREE_OPERAND (t, 1), - args, complain, in_decl), - QUALIFIED_NAME_IS_TEMPLATE (t)); + { + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + tree op1 = tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl); + return build_qualified_name (/*type=*/NULL_TREE, op0, op1, +QUALIFIED_NAME_IS_TEMPLATE (t)); + } case ARRAY_REF: - return build_nt - (ARRAY_REF, -tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl), -tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl), -NULL_TREE, NULL_TREE); + { + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + tree op1 = tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl); + return build_nt (ARRAY_REF, op0, op1, NULL_TREE, NULL_TREE); + } case CALL_EXPR: { @@ -12912,29 +12918,29 @@ tsubst_copy (tree t, tree args, tsubst_f case PSEUDO_DTOR_EXPR: case VEC_PERM_EXPR: { - r = build_nt - (code, tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl), - tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl), - tsubst_copy (TREE_OPERAND (t, 2), args, complain, in_decl)); + tree op0 = tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl); + tree op1 = tsubst_copy (TREE_OPERAND (t, 1), args, complain, in_decl); + tree op2 = tsubst_copy (TREE_OPERAND (t, 2), args, complain, in_decl); + r = build_nt (code, op0, op1, op2); TREE_NO_WARNING (r) = TREE_NO_WARNING (t); return r; } case NEW_EXPR: { - r = build_nt - (code, tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl), -tsubst_copy (TREE_OPERAND (t, 1), args,
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
The following PR is opened for this problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61223 Thumb1 failure was also detected and reported in pr60758. I've proposed a thumb1 bugfix there. Regtest for the fix currently is in progress. Patches must be proposed on gcc-patches and / or in this case libstdc++-patches mailing list. Attaching patches to bugzilla does not ensure review. Thanks, Ramana Best regards, Merzlyakov Alexey
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 20.05.2014 16:25, Richard Earnshaw wrote: On 16/05/14 14:56, Alexey Merzlyakov wrote: On 07.05.2014 13:28, Ramana Radhakrishnan wrote: On 05/07/14 09:19, Yury Gribov wrote: Original Message Subject: [PING] [PATCH] Fix for PR libstdc++/60758 Date: Thu, 17 Apr 2014 17:48:12 +0400 From: Alexey Merzlyakov To: Ramana Radhakrishnan CC: gcc-patches@gcc.gnu.org , Viacheslav Garbuzov , Yury Gribov Hi, This fixes infinite backtrace in __cxa_end_cleanup(). Regtest was finished with no regressions on arm-linux-gnueabi(sf). The patch posted at: http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html This is OK to apply if no regressions. Thanks, Ramana Thanks in advance. Best regards, Merzlyakov Alexey I have re-tested it again on arm-linux-gnueabi(sf) - no regressions. The change was committed to mainline as revision 210515. Best regards, Merzlyakov Alexey Unfortunately, this breaks thumb1. Pop instructions there can only contain low registers and the PC. R. The following PR is opened for this problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61223 Thumb1 failure was also detected and reported in pr60758. I've proposed a thumb1 bugfix there. Regtest for the fix currently is in progress. Best regards, Merzlyakov Alexey
Re: [PATCH][ARM] Adjust arith_shiftsi for ARMv8-style
On 15/05/14 09:47, Kyrill Tkachov wrote: > Hi all, > > Shifted arithmetic operations can never be encoded in 16-bits in and > therefore > can not appear in Thumb2 IT blocks under ARMv8-A rules (and the -mrestrict-it > rules). This patch adjusts the relevant pattern for that purpose. > > Tested and bootstrapped on arm-none-linux-gnueabihf and made sure no > performance > regressions on a number of benchmarks. > > This is a bug (not wrong-code though) in -mrestrict-it that affects 4.9 as > well > as trunk, so is it ok to backport it there? > > > Thanks, > Kyrill > > 2014-05-15 Kyrylo Tkachov > > * config/arm/arm.md (arith_shiftsi): Do not predicate for > arm_restrict_it. > OK both. R.
[PATCH][match-and-simplify] Provide parsing error locations
This adds $subject. Committed to branch. Richard. 2014-05-20 Richard Biener * genmatch.c (error_cb, fatal_at): New functions. (expect, parse_expr, parse_op, parse_match_and_simplify, main): Provide error locations. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 210638) +++ gcc/genmatch.c (working copy) @@ -546,6 +546,30 @@ write_gimple (FILE *f, vec& /* libccp helpers. */ +static struct line_maps *line_table; + +static bool +error_cb (cpp_reader *, int, int, source_location location, + unsigned int, const char *msg, va_list *ap) +{ + const line_map *map; + linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, &map); + expanded_location loc = linemap_expand_location (line_table, map, location); + fprintf (stderr, "%s:%d:%d error: ", loc.file, loc.line, loc.column); + vfprintf (stderr, msg, *ap); + fprintf (stderr, "\n"); + exit (1); +} + +static void +fatal_at (const cpp_token *tk, const char *msg, ...) +{ + va_list ap; + va_start (ap, msg); + error_cb (NULL, CPP_DL_FATAL, 0, tk->src_loc, 0, msg, &ap); + va_end (ap); +} + /* Read the next non-whitespace token from R. */ @@ -585,8 +609,8 @@ expect (cpp_reader *r, enum cpp_ttype tk { const cpp_token *token = next (r); if (token->type != tk) -fatal ("error: expected %s, got %s", - cpp_type2name (tk, 0), cpp_type2name (token->type, 0)); +fatal_at (token, "expected %s, got %s", + cpp_type2name (tk, 0), cpp_type2name (token->type, 0)); return token; } @@ -660,7 +684,7 @@ parse_expr (cpp_reader *r) op = parse_capture (r, e); else if (token->type == CPP_COLON && !(token->flags & PREV_WHITE)) -fatal ("not implemented: predicates on expressions"); +fatal_at (token, "not implemented: predicates on expressions"); else op = e; do @@ -672,8 +696,8 @@ parse_expr (cpp_reader *r) { operator_id *opr = static_cast (e->operation->op); if (e->ops.length () != opr->get_required_nargs ()) - fatal ("got %d operands instead of the required %d", - e->ops.length (), opr->get_required_nargs ()); + fatal_at (token, "got %d operands instead of the required %d", + e->ops.length (), opr->get_required_nargs ()); } return op; } @@ -789,10 +813,10 @@ parse_op (cpp_reader *r) } else if (token->type != CPP_COLON && token->type != CPP_ATSIGN) - fatal ("expected expression or predicate"); + fatal_at (token, "expected expression or predicate"); /* optionally followed by a capture and a predicate. */ if (token->type == CPP_COLON) - fatal ("not implemented: predicate on leaf operand"); + fatal_at (token, "not implemented: predicate on leaf operand"); if (token->type == CPP_ATSIGN) op = parse_capture (r, op); } @@ -818,7 +842,10 @@ parse_match_and_simplify (cpp_reader *r) asprintf (&tem, "anon_%d", ++cnt); id = tem; } + const cpp_token *loc = peek (r); struct operand *match = parse_op (r); + if (match->type != operand::OP_EXPR) +fatal_at (loc, "expected uncaptured expression"); token = peek (r); /* Conditional if () */ struct operand *ifexpr = NULL; @@ -826,11 +853,9 @@ parse_match_and_simplify (cpp_reader *r) { const char *tem = get_ident (r); if (strcmp (tem, "if") != 0) - fatal ("expected 'if' or expression"); + fatal_at (token, "expected 'if' or expression"); ifexpr = parse_c_expr (r, CPP_OPEN_PAREN); } - if (match->type != operand::OP_EXPR) -fatal ("expected uncaptured expression"); return new simplify (id, match, ifexpr, parse_op (r)); } @@ -846,7 +871,6 @@ main(int argc, char **argv) { cpp_reader *r; const cpp_token *token; - struct line_maps *line_table; progname = "genmatch"; @@ -859,6 +883,8 @@ main(int argc, char **argv) line_table->round_alloc_size = round_alloc_size; r = cpp_create_reader (CLK_GNUC99, NULL, line_table); + cpp_callbacks *cb = cpp_get_callbacks (r); + cb->error = error_cb; if (!cpp_read_main_file (r, argv[1])) return 1; @@ -895,7 +921,7 @@ main(int argc, char **argv) if (strcmp (id, "match_and_simplify") == 0) simplifiers.safe_push (parse_match_and_simplify (r)); else - fatal ("expected 'match_and_simplify'"); + fatal_at (token, "expected 'match_and_simplify'"); eat_token (r, CPP_CLOSE_PAREN); }
[build, doc, testsuite] Centralise clearing hardware capabilities with Sun ld
Rainer Orth writes: > Prompted by the recent failures of c-c++-common/gomp/pr60823-2.c on > Solaris/x86 with Sun as > > http://gcc.gnu.org/ml/gcc-patches/2014-05/msg00943.html > > I've reworked the clearing of hardware capabilities via linker maps for > Sun ld, which is currently replicated in several places all over the > testsuite. I've chosen to use TEST_ALWAYS_FLAGS or ALWAYS_CXXFLAGS to > pass on the necessary linker flags. It turned out that more is needed to deal with the new gcc.dg/gomp and g++.dg/gomp failures: https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01185.html The following patch implements what I've outlined there: it introduces a Solaris-specific new -mclear-hwcap option, checking which version of the mapfile syntax works, if any. This way, all the duplication in the testsuite and libitm can go. Do deal with the OpenMP declare simd failures, this option/mapfile is automatically activated with -fopenmp*. Initial testing has concluded successfully, also on an Solaris 10/x86 system where ld only understands the old mapfile syntax. Mainline bootstraps on i386-pc-solaris2.11 (as/ld, gas/ld, gas/gld), i386-pc-solaris2.10 (as/ld, gas/ld), and sparc-sun-solaris2.11 (as/ld) are still running. x86_64-unknown-linux-gnu didn't finish due to unrelated libgo breakage. Also, i386-pc-solaris2.9 bootstrap running on the 4.9 branch which is equally affected by the testsuite failures. Ok for mainline and 4.9 branch if those pass? Thanks. Rainer 2014-05-13 Rainer Orth gcc: * configure.ac ($gcc_cv_ld_clearcap): New test. * configure: Regenerate. * config.in: Regenerate. * config/sol2.opt (mclear-hwcap): New option. * config/sol2.h (LINK_CLEARCAP_SPEC): Define. * config/sol2-clearcap.map: Moved here from testsuite/gcc.target/i386/clearcap.map. * config/sol2-clearcapv2.map: Move here from gcc.target/i386/clearcapv2.map. * config/t-sol2 (install): Depend on install-clearcap-map. (install-clearcap-map): New target. * doc/invoke.texi (Option Summary, Solaris 2 Options): Document -mclear-hwcap. gcc/testsuite: * lib/clearcap.exp: New file. * gcc.dg/vect/vect.exp: Load clearcap.exp. Remove clearcap_ldflags handling. Call clearcap-init, clearcap-finish. * gcc.target/i386/i386.exp: Likewise. * gcc.target/i386/clearcap.map: Move to ../config/sol2-clearcap.map. * gcc.target/i386/clearcapv2.map: Move to ../config/sol2-clearcapv2.map. * gcc.target/x86_64/abi/avx/abi-avx.exp: Likewise. * gcc.target/x86_64/abi/avx512f/abi-avx512f.exp: Likewise. libitm: * acinclude.m4 (LIBITM_CHECK_LINKER_HWCAP): Check for -mclear-hwcap instead. * configure: Regenerate. * clearcap.map: Remove. # HG changeset patch # Parent 8bc292d22fc4feb31bf4225a4f6d3ee7f39c9a68 Centralise clearing hardware capabilities with Sun ld diff --git a/gcc/testsuite/gcc.target/i386/clearcap.map b/gcc/config/sol2-clearcap.map rename from gcc/testsuite/gcc.target/i386/clearcap.map rename to gcc/config/sol2-clearcap.map --- a/gcc/testsuite/gcc.target/i386/clearcap.map +++ b/gcc/config/sol2-clearcap.map @@ -1,3 +1,2 @@ -# clear all hardware capabilities emitted by Sun as: the tests here -# guard against execution at runtime +# Clear all hardware capabilities emitted by Sun as. hwcap_1 = V0x0 OVERRIDE; diff --git a/gcc/testsuite/gcc.target/i386/clearcapv2.map b/gcc/config/sol2-clearcapv2.map rename from gcc/testsuite/gcc.target/i386/clearcapv2.map rename to gcc/config/sol2-clearcapv2.map --- a/gcc/testsuite/gcc.target/i386/clearcapv2.map +++ b/gcc/config/sol2-clearcapv2.map @@ -1,6 +1,6 @@ -# clear all hardware capabilities emitted by Sun as: the tests here -# guard against execution at runtime -# uses mapfile v2 syntax which is the only way to clear AT_SUN_CAP_HW2 flags +# Clear all hardware capabilities emitted by Sun as. +# +# Uses mapfile v2 syntax which is the only way to clear AT_SUN_CAP_HW2 flags. $mapfile_version 2 CAPABILITY { HW = ; diff --git a/gcc/config/sol2.h b/gcc/config/sol2.h --- a/gcc/config/sol2.h +++ b/gcc/config/sol2.h @@ -268,12 +268,21 @@ along with GCC; see the file COPYING3. #define LINK_LIBGCC_MAPFILE_SPEC "" #endif +/* Clear hardware capabilities, either explicitly or with OpenMP: + #pragma openmp declare simd creates clones for SSE2, AVX, and AVX2. */ +#ifdef HAVE_LD_CLEARCAP +#define LINK_CLEARCAP_SPEC " %{mclear-hwcap|fopenmp*:-M %sclearcap.map}" +#else +#define LINK_CLEARCAP_SPEC "" +#endif + #undef LINK_SPEC #define LINK_SPEC \ "%{h*} %{v:-V} \ %{!shared:%{!static:%{rdynamic: " RDYNAMIC_SPEC "}}} \ %{static:-dn -Bstatic} \ - %{shared:-G -dy %{!mimpure-text:-z text}} " LINK_LIBGCC_MAPFILE_SPEC " \ + %{shared:-G -dy %{!mimpure-text:-z text}} " \ + LINK_LIBGCC_MAPFILE_SPEC LINK_CLEARCAP_SPEC " \ %{symbolic:-Bsymbolic -G -dy
Re: [C++ Patch] PR 58664
OK. Jason
Re: [PATCH] Fix ARM NAN fraction bits
On 20/05/14 08:57, Richard Biener wrote: > On Tue, 20 May 2014, Maciej W. Rozycki wrote: > >> Ian, >> >> On Sat, 17 May 2014, Richard Biener wrote: >> >>> On May 17, 2014 12:22:23 AM CEST, "Maciej W. Rozycki" >>> wrote: On Fri, 16 May 2014, Joseph S. Myers wrote: >> 2014-05-16 Maciej W. Rozycki >> >> PR libgcc/60166 >> * sysdeps/arm/soft-fp/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D) >> (_FP_NANSIGN_Q): Set the quiet bit. > > OK for glibc. Joseph, thanks for your review, this is now in. Richard, you wrote yesterday that pushing changes to 4.8 would require explicit approval from release managers, however it is not clear to me who they are for that branch. This fix corrects a regression introduced after 4.8.2. Can you approve it? If not, then who can? >>> >>> If it's not broken in 4.8.2 but broken on the branch head then it's OK >>> for the branch. >> >> I thought I'd double-check with you that it is fine to push this change >> to trunk first. OK to apply? > > Of course it should go to trunk (and 4.9 branch) first, but I'm not > the one to approve that. > I can... Based on Joseph's review, OK. R. > Richard. > >> 2014-05-20 Maciej W. Rozycki >> >> PR libgcc/60166 >> libgcc/ >> * config/arm/sfp-machine.h (_FP_NANFRAC_S, _FP_NANFRAC_D) >> (_FP_NANSIGN_Q): Set the quiet bit. >> >> Maciej >> >> gcc-soft-fp-arm-nanfrac.diff >> Index: gcc-fsf-trunk-quilt/libgcc/config/arm/sfp-machine.h >> === >> --- gcc-fsf-trunk-quilt.orig/libgcc/config/arm/sfp-machine.h 2014-05-16 >> 15:59:06.0 +0100 >> +++ gcc-fsf-trunk-quilt/libgcc/config/arm/sfp-machine.h 2014-05-20 >> 01:23:36.618434199 +0100 >> @@ -21,10 +21,10 @@ typedef int __gcc_CMPtype __attribute__ >> >> /* According to RTABI, QNAN is only with the most significant bit of the >> significand set, and all other significand bits zero. */ >> -#define _FP_NANFRAC_H 0 >> -#define _FP_NANFRAC_S 0 >> -#define _FP_NANFRAC_D 0, 0 >> -#define _FP_NANFRAC_Q 0, 0, 0, 0 >> +#define _FP_NANFRAC_H _FP_QNANBIT_H >> +#define _FP_NANFRAC_S _FP_QNANBIT_S >> +#define _FP_NANFRAC_D _FP_QNANBIT_D, 0 >> +#define _FP_NANFRAC_Q _FP_QNANBIT_Q, 0, 0, 0 >> #define _FP_NANSIGN_H 0 >> #define _FP_NANSIGN_S 0 >> #define _FP_NANSIGN_D 0 >> >> >
Re: [PATCH] [PING^2] Fix for PR libstdc++/60758
On 16/05/14 14:56, Alexey Merzlyakov wrote: > On 07.05.2014 13:28, Ramana Radhakrishnan wrote: >> On 05/07/14 09:19, Yury Gribov wrote: >>> Original Message >>> Subject: [PING] [PATCH] Fix for PR libstdc++/60758 >>> Date: Thu, 17 Apr 2014 17:48:12 +0400 >>> From: Alexey Merzlyakov >>> To: Ramana Radhakrishnan >>> CC: gcc-patches@gcc.gnu.org , Viacheslav >>> Garbuzov , Yury Gribov >>> >>> Hi, >>> >>> This fixes infinite backtrace in __cxa_end_cleanup(). >>> Regtest was finished with no regressions on arm-linux-gnueabi(sf). >>> >>> The patch posted at: >>> http://gcc.gnu.org/ml/gcc-patches/2014-04/msg00496.html >> >> This is OK to apply if no regressions. >> >> Thanks, >> Ramana >> >>> >>> Thanks in advance. >>> >>> Best regards, >>> Merzlyakov Alexey >>> >>> >> > I have re-tested it again on arm-linux-gnueabi(sf) - no regressions. > The change was committed to mainline as revision 210515. > > Best regards, > Merzlyakov Alexey > > Unfortunately, this breaks thumb1. Pop instructions there can only contain low registers and the PC. R.
[PATCH][match-and-simplify] Reject outermost captures
We can't really code-generate those (and thus such patterns are simply skipped during code-gen ...). Reject them early. Committed to the branch. Richard. 2014-05-20 Richard Biener * genmatch.c (parse_match_and_simplify): Reject outermost expressions that are captured or not expressions. * match.pd: Rewrite negate contracting patterns to avoid capturing the outermost expression. Index: gcc/match.pd === --- gcc/match.pd(revision 210637) +++ gcc/match.pd(working copy) @@ -26,22 +26,18 @@ along with GCC; see the file COPYING3. /* ??? Have match_and_simplify groups guarded with common predicates on the outermost type? */ -/* Contract negates. - ??? !TYPE_SATURATING condition missing. */ +/* Contract negates. */ (match_and_simplify - (PLUS_EXPR @0 (NEGATE_EXPR @1)) - /* Ugh, free form mixed in ... ;) Want (if ...) instead? */ - if (!TYPE_SATURATING (TREE_TYPE (@0))) - (MINUS_EXPR @0 @1)) + (plus @0 (negate @1)) + if (!TYPE_SATURATING (type)) + (minus @0 @1)) (match_and_simplify - /* ??? Allow (MINUS:!TYPE_SATURATING @0 (NEGATE @1)), thus - type predicates on operands? Thus also :INTEGRAL_TYPE_P - or @0:INTEGRAL_TYPE_P? */ - (MINUS @0 (NEGATE @1)) - (PLUS @0 @1)) + (minus @0 (negate @1)) + if (!TYPE_SATURATING (type)) + (plus @0 @1)) (match_and_simplify - (plus@2 (negate @0) @1) - if (!TYPE_SATURATING (TREE_TYPE (@2))) + (plus (negate @0) @1) + if (!TYPE_SATURATING (type)) (minus @1 @0)) /* Change to even more free-form like Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 210637) +++ gcc/genmatch.c (working copy) @@ -829,6 +829,8 @@ parse_match_and_simplify (cpp_reader *r) fatal ("expected 'if' or expression"); ifexpr = parse_c_expr (r, CPP_OPEN_PAREN); } + if (match->type != operand::OP_EXPR) +fatal ("expected uncaptured expression"); return new simplify (id, match, ifexpr, parse_op (r)); }
Re: [PATCH i386 5/8] [AVX-512] Extend vectorizer hooks.
Hello, On 19 May 09:58, H.J. Lu wrote: > On Mon, May 19, 2014 at 9:45 AM, Uros Bizjak wrote: > > On Mon, May 19, 2014 at 6:42 PM, H.J. Lu wrote: > > > Uros, > I am looking into libreoffice size and the data alignment seems to make > huge > difference. Data section has grown from 5.8MB to 6.3MB in between GCC > 4.8 and 4.9, > while clang produces 5.2MB. > > The two patches I posted to not align vtables and RTTI reduces it to > 5.7MB, but > But perhaps we want to revisit the alignment rules. The optimization > manuals > usually care only about performance critical loops. Perhaps we can make > the > rules to align only bigger datastructures, or so at least for -O2. > >>> > >>> Based on the above quote, "Misaligned data access can incur > >>> significant performance penalties." and the fact that this particular > >>> alignment rule has some compatibility issues with previous versions of > >>> gcc (these were later fixed by Jakub), I'd rather leave this rule as > >>> is. However, if the access is from the cold section, we can perhaps > >>> avoid extra alignment, while avoiding those compatibility issues. > >>> > >> > >> It is excessive to align > >> > >> struct foo > >> { > >> int x1; > >> int x2; > >> char x3; > >> int x4; > >> int x5; > >> char x6; > >> int x7; > >> int x8; > >> }; > >> > >> to 32 bytes and align > >> > >> struct foo > >> { > >> int x1; > >> int x2; > >> char x3; > >> int x4; > >> int x5; > >> char x6; > >> int x7[9]; > >> int x8; > >> }; > >> > >> to 64 bytes. What performance gain does it provide? > > > > Avoids "significant performance penalties," perhaps? > > > > Kirill, do we have performance data for excessive alignment > vs ABI alignment? Nope, we have no actual data showing performance impact on such changes, sorry. We may try such a change on HSW machine (on Spec 2006), will it be useful? --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -26576,7 +26576,7 @@ ix86_data_alignment (tree type, int align, bool opt) used to assume. */ int max_align_compat -= optimize_size ? BITS_PER_WORD : MIN (256, MAX_OFILE_ALIGNMENT); += optimize_size ? BITS_PER_WORD : MIN (128, MAX_OFILE_ALIGNMENT); /* A data structure, equal or greater than the size of a cache line (64 bytes in the Pentium 4 and other recent Intel processors, including > -- > H.J. -- Thanks, K
Re: add dbgcnt and opt-info support for devirtualization
On Mon, May 19, 2014 at 5:24 PM, Xinliang David Li wrote: > Sorry about it. Here is the patch. There is one remaining case where > cgraph_dump_file and dump_enable_p are checked separately -- > cgraph_dump_file is set up differently from 'dump_file'. But there you check with an else if, so if you do -fdump-ipa-cgraph then suddenly -fopt-info will stop reporting? At least in the cgraphunit.c part of the patch. I'm ok with the rest of the patch. Thanks, Richard. > David > > > > On Mon, May 19, 2014 at 2:21 AM, Richard Biener > wrote: >> On Fri, May 16, 2014 at 11:19 PM, Xinliang David Li >> wrote: >>> Modified the patch according to yours and Richard's feedback. PTAL. >> >> ENOPATCH. >> >> Btw, I don't see any issue with leaking node order to opt-report. >> >> Richard. >> >>> thanks, >>> >>> David >>> >>> On Fri, May 16, 2014 at 9:03 AM, Jan Hubicka wrote: > Hi, debugging runtime bugs due to devirtualization can be hard for > very large C++ programs with complicated class hierarchy. This patch > adds the support to report this high level transformation via > -fopt-info (not hidden inside dump file) and the ability the do binary > search with cutoff. > > Ok for trunk after build and test? Seems resonable to me. > > thanks, > > David > Index: ChangeLog > === > --- ChangeLog (revision 210479) > +++ ChangeLog (working copy) > @@ -1,3 +1,18 @@ > +2014-05-15 Xinliang David Li > + > + * cgraphunit.c (walk_polymorphic_call_targets): Add > + dbgcnt and fopt-info support. > + 2014-05-15 Xinliang David Li > + > + * cgraphunit.c (walk_polymorphic_call_targets): Add > + dbgcnt and fopt-info support. > + * ipa-prop.c (ipa_make_edge_direct_to_target): Ditto. > + * ipa-devirt.c (ipa_devirt): Ditto. > + * ipa.c (walk_polymorphic_call_targets): Ditto. > + * gimple-fold.c (fold_gimple_assign): Ditto. > + (gimple_fold_call): Ditto. > + * dbgcnt.def: New counter. > + > 2014-05-15 Martin Jambor > > PR ipa/61085 > Index: ipa-prop.c > === > --- ipa-prop.c(revision 210479) > +++ ipa-prop.c(working copy) > @@ -59,6 +59,7 @@ along with GCC; see the file COPYING3. > #include "ipa-utils.h" > #include "stringpool.h" > #include "tree-ssanames.h" > +#include "dbgcnt.h" > > /* Intermediate information about a parameter that is only useful during > the > run of ipa_analyze_node and is not kept afterwards. */ > @@ -2494,6 +2495,13 @@ ipa_make_edge_direct_to_target (struct c > fprintf (dump_file, "ipa-prop: Discovered direct call to > non-function" > " in %s/%i, making it unreachable.\n", >ie->caller->name (), ie->caller->order); > + else if (dump_enabled_p ()) > + { > + location_t loc = gimple_location (ie->call_stmt); > + dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc, > +"Discovered direct call to non-function in > %s, " > +"making it unreachable\n", ie->caller->name > ()); Perhaps "turning it to __builtin_unreachable call" and similarly in the other cases we introduce __builtin_unreachable? I think that could be easier for user to work out. What king of problems in devirtualizatoin you are seeing? Honza
Re: [RFC] HOST_WIDE_INT transition steps
On Tue, 20 May 2014, Eric Botcazou wrote: > > Same as for going C++. > > Not to the same extent, this will be worse because done en masse throughout > the code instead of gradually. Like the gimple -> gimple * change pending or the various gimple -> gswitch,glabel,etc. stuff? It's on a similar scale at lest. > > Make the code base easier to understand for newcomers. It's also a > > documentation improvement (you see what a HOST_WIDE_INT really is), > > alongside with [u]int64_t being less to type ... > > I personally find the abstraction and the separation with the other, more > mundane types useful, but I guess that it's a matter of habit. Yeah. I'm happy of converting only "obvious" cases (or converting them separately at least). Of course there may be not that many decoupled pieces ... Richard.
Re: [RFC] HOST_WIDE_INT transition steps
> Same as for going C++. Not to the same extent, this will be worse because done en masse throughout the code instead of gradually. > Make the code base easier to understand for newcomers. It's also a > documentation improvement (you see what a HOST_WIDE_INT really is), > alongside with [u]int64_t being less to type ... I personally find the abstraction and the separation with the other, more mundane types useful, but I guess that it's a matter of habit. -- Eric Botcazou
Re: [RFC] HOST_WIDE_INT transition steps
On Tue, 20 May 2014, Eric Botcazou wrote: > > The following is my current idea on progressing on the HOST_WIDE_INT > > removal > > > > 1) https://gcc.gnu.org/ml/gcc-patches/2014-05/msg00381.html (ping) > > > > 2) make sure [u]int64_t is available and use that to define HOST_WIDE_INT > > > > 3) s/HOST_WIDE_INT/int64_t/ (same for unsigned HOST_WIDE_INT) > > Does 3) really buy us something? That would make backports painful I think. Same as for going C++. Make the code base easier to understand for newcomers. It's also a documentation improvement (you see what a HOST_WIDE_INT really is), alongside with [u]int64_t being less to type ... Btw, all of the current pending rewrite patches will make backports painful. Oh well. Richard.