Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
On Thu, Oct 20, 2011 at 6:39 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Updated patch is attached. Test fails without and passes with the fix. ChangeLog entry: 2011-10-20 Kirill Yukhin kirill.yuk...@intel.com PR target/50766 * config/i386/i386.md (bmi_bextr_<mode>): Update register/memory operand order. (bmi2_bzhi_<mode>3): Ditto. (bmi2_pdep_<mode>3): Ditto. (bmi2_pext_<mode>3): Ditto. testsuite/ChangeLog entry: 2011-10-20 Kirill Yukhin kirill.yuk...@intel.com PR target/50766 * gcc.target/i386/pr50766.c: New test. Could you please have a look? OK. Thanks, Uros.
[Patch, gcc, testsuite] Adjust optimization levels for some cases.
Hello, These four cases check the amount of the desired instructions. At O2 level, some factors like loop unroll will increase the amount of them. This patch is proposing to adjust the optimization level to O1 (the minimal requirement) to avoid such impact. In this way, the cases are more robust. Regression test is performed on arm-none-eabi target. No regression found. Is it OK to trunk? BR, Terry 2011-10-20 Terry Guo terry@arm.com * gcc.target/arm/wmul-1.c: Adjust optimization levels. * gcc.target/arm/wmul-2.c: Ditto. * gcc.target/arm/wmul-3.c: Ditto. * gcc.target/arm/wmul-4.c: Ditto. diff --git a/gcc/testsuite/gcc.target/arm/wmul-1.c b/gcc/testsuite/gcc.target/arm/wmul-1.c index 426c939..d50 100644 --- a/gcc/testsuite/gcc.target/arm/wmul-1.c +++ b/gcc/testsuite/gcc.target/arm/wmul-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_dsp } */ -/* { dg-options -O2 } */ +/* { dg-options -O1 -fexpensive-optimizations } */ int mac(const short *a, const short *b, int sqr, int *sum) { diff --git a/gcc/testsuite/gcc.target/arm/wmul-2.c b/gcc/testsuite/gcc.target/arm/wmul-2.c index 898b5f0..2ea55f9 100644 --- a/gcc/testsuite/gcc.target/arm/wmul-2.c +++ b/gcc/testsuite/gcc.target/arm/wmul-2.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_dsp } */ -/* { dg-options -O2 } */ +/* { dg-options -O1 -fexpensive-optimizations } */ void vec_mpy(int y[], const short x[], short scaler) { diff --git a/gcc/testsuite/gcc.target/arm/wmul-3.c b/gcc/testsuite/gcc.target/arm/wmul-3.c index 83f73fb..144b553 100644 --- a/gcc/testsuite/gcc.target/arm/wmul-3.c +++ b/gcc/testsuite/gcc.target/arm/wmul-3.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_dsp } */ -/* { dg-options -O2 } */ +/* { dg-options -O1 -fexpensive-optimizations } */ int mac(const short *a, const short *b, int sqr, int *sum) { diff --git a/gcc/testsuite/gcc.target/arm/wmul-4.c b/gcc/testsuite/gcc.target/arm/wmul-4.c index a297bda..68f9866 
100644 --- a/gcc/testsuite/gcc.target/arm/wmul-4.c +++ b/gcc/testsuite/gcc.target/arm/wmul-4.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_dsp } */ -/* { dg-options -O2 } */ +/* { dg-options -O1 -fexpensive-optimizations } */ int mac(const int *a, const int *b, long long sqr, long long *sum) {
RE: PING: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change
Ping http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01100.html * config/arm/arm.c (arm_option_override): Enable use of iWMMXt with VFP. Disable use of iwMMXt and Neon. (arm_expand_binop_builtin): Accept VOIDmode op. * config/arm/arm.md (*arm_movdi): Remove check for TARGET_IWMMXT. (*arm_movsi_insn): Likewise. (iwmmxt.md): Include earlier.
RE: PING: [PATCH, ARM, iWMMXt][3/5]: built in define and expand
Ping http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01103.html * config/arm/arm.c (enum arm_builtins): Revise built-in fcode. (builtin_description bdesc_2arg): Revise built in declaration. (builtin_description bdesc_1arg): Likewise. (arm_init_iwmmxt_builtins): Revise built in initialization. (arm_expand_builtin): Revise built in expansion.
RE: PING: [PATCH, ARM, iWMMXt][4/5]: WMMX machine description
Ping http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00279.html * config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function. (arm_output_iwmmxt_tinsr): Likewise. * config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): Declare. (arm_output_iwmmxt_tinsr): Likewise. * config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant. (iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete. (iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi): Likewise (*iwmmxt_clrv8qi, *iwmmxt_clrv4hi, *iwmmxt_clrv2si): Likewise. (tbcstv8qi, tbcstv4hi, tbsctv2si): New pattern. (iwmmxt_clrv8qi, iwmmxt_clrv4hi, iwmmxt_clrv2si): Likewise. (*andmode3_iwmmxt, *iormode3_iwmmxt, *xormode3_iwmmxt): Likewise. (rorimode3, ashrimode3_iwmmxt, lshrimode3_iwmmxt): Likewise. (ashlimode3_iwmmxt, iwmmxt_waligni, iwmmxt_walignr): Likewise. (iwmmxt_walignr0, iwmmxt_walignr1): Likewise. (iwmmxt_walignr2, iwmmxt_walignr3): Likewise. (iwmmxt_setwcgr0, iwmmxt_setwcgr1): Likewise. (iwmmxt_setwcgr2, iwmmxt_setwcgr3): Likewise. (iwmmxt_getwcgr0, iwmmxt_getwcgr1): Likewise. (iwmmxt_getwcgr2, iwmmxt_getwcgr3): Likewise. (All instruction patterns): Add wtype attribute. (*iwmmxt_arm_movdi, *iwmmxt_movsi_insn): iWMMXt coexist with vfp. (iwmmxt_uavgrndv8qi3, iwmmxt_uavgrndv4hi3): Revise the pattern. (iwmmxt_uavgv8qi3, iwmmxt_uavgv4hi3): Likewise. (iwmmxt_tinsrb, iwmmxt_tinsrh, iwmmxt_tinsrw):Likewise. (eqv8qi3, eqv4hi3, eqv2si3, gtuv8qi3): Likewise. (gtuv4hi3, gtuv2si3, gtv8qi3, gtv4hi3, gtv2si3): Likewise. (iwmmxt_wunpckihh, iwmmxt_wunpckihw, iwmmxt_wunpckilh): Likewise. (iwmmxt_wunpckilw, iwmmxt_wunpckehub, iwmmxt_wunpckehuh): Likewise. (iwmmxt_wunpckehuw, iwmmxt_wunpckehsb, iwmmxt_wunpckehsh): Likewise. (iwmmxt_wunpckehsw, iwmmxt_wunpckelub, iwmmxt_wunpckeluh): Likewise. (iwmmxt_wunpckeluw, iwmmxt_wunpckelsb, iwmmxt_wunpckelsh): Likewise. (iwmmxt_wunpckelsw, iwmmxt_wmadds, iwmmxt_wmaddu): Likewise. (iwmmxt_wsadb, iwmmxt_wsadh, iwmmxt_wsadbz, iwmmxt_wsadhz): Likewise. (iwmmxt2.md): Include. 
* config/arm/iwmmxt2.md: New file. * config/arm/iterators.md (VMMX2): New mode_iterator. * config/arm/arm.md (wtype): New attribute. (UNSPEC_WMADDS, UNSPEC_WMADDU): Delete. (UNSPEC_WALIGNI): New unspec. * config/arm/t-arm (MD_INCLUDES): Add iwmmxt2.md.
RE: PING: [PATCH, ARM, iWMMXt][5/5]: pipeline description
Ping http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01106.html * config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md. * config/arm/marvell-f-iwmmxt.md: New file. * config/arm/arm.md (marvell-f-iwmmxt.md): Include.
[PATCH] Loop IM cost TLC
We've got new tree codes, the following makes Loop IM cost consider those expensive that make sense. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-10-19 Richard Guenther rguent...@suse.de * tree-ssa-loop-im.c (stmt_cost): Add WIDEN_*, FMA_EXPR and rotates to the set of expensive operations. Index: gcc/tree-ssa-loop-im.c === *** gcc/tree-ssa-loop-im.c (revision 180191) --- gcc/tree-ssa-loop-im.c (working copy) *** stmt_cost (gimple stmt) *** 549,554 --- 549,559 switch (gimple_assign_rhs_code (stmt)) { case MULT_EXPR: + case WIDEN_MULT_EXPR: + case WIDEN_MULT_PLUS_EXPR: + case WIDEN_MULT_MINUS_EXPR: + case DOT_PROD_EXPR: + case FMA_EXPR: case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: *** stmt_cost (gimple stmt) *** 565,570 --- 570,578 case LSHIFT_EXPR: case RSHIFT_EXPR: + case WIDEN_LSHIFT_EXPR: + case LROTATE_EXPR: + case RROTATE_EXPR: cost += 20; break;
[PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Hello, The below fixes an embarrassing thinko of mine that breaks bootstrap on SPU and PPC targets (at very least). I am surprised it doesn't break more code. :-( I have lightly tested it on SPU in a cross compiled environment (so I couldn't bootstrap it there) and I have bootstrapped it on x86_64-unknown-linux-gnu. One person confirmed in the audit trail of the PR that it fixes the issue for him on PPC, so I am proposing the patch even if I don't know if it bootstraps on SPU or PPC in general. OK for trunk? From: Dodji Seketeli do...@redhat.com Date: Thu, 20 Oct 2011 09:43:49 +0200 Subject: [PATCH] Fix thinko in _cpp_remaining_tokens_num_in_context libcpp/ * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of number of tokens in direct tokens contexts. --- libcpp/ChangeLog |6 ++ libcpp/lex.c |3 +-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/libcpp/lex.c b/libcpp/lex.c index cd6ae9f..cf8ef7d 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -1710,8 +1710,7 @@ _cpp_remaining_tokens_num_in_context (cpp_reader *pfile) { cpp_context *context = pfile->context; if (context->tokens_kind == TOKENS_KIND_DIRECT) -return ((LAST (context).token - FIRST (context).token) - / sizeof (cpp_token)); +return (LAST (context).token - FIRST (context).token); else if (context->tokens_kind == TOKENS_KIND_INDIRECT || context->tokens_kind == TOKENS_KIND_EXTENDED) return ((LAST (context).ptoken - FIRST (context).ptoken) -- 1.7.6.4 -- Dodji
Re: [PATCH PR50572] Tune loop alignment for Atom
Please provide a patch which can be applied. Cut/paste doesn't create a working patch. Please attach it. -- H.J. Will that work? Sergos. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6c73404..e21cf86 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2011-10-20 Sergey Ostanevich sergos@gmail.com + + * config/i386/i386.c (processor_target_table): Change Atom + align_loops_max_skip to 15. + 2011-10-17 Michael Spertus mike_sper...@symantec.com * gcc/c-family/c-common.c (c_common_reswords): Add __bases, diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {&bdver1_cost, 32, 24, 32, 7, 32}, {&bdver2_cost, 32, 24, 32, 7, 32}, {&btver1_cost, 32, 24, 32, 7, 32}, - {&atom_cost, 16, 7, 16, 7, 16} + {&atom_cost, 16, 15, 16, 7, 16} }; static const char *const cpu_names[TARGET_CPU_DEFAULT_max] =
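For readers unfamiliar with the "max skip" knob, here is a minimal sketch of what the change from 7 to 15 means, assuming the second and third numbers in each table entry are the loop alignment and its maximum skip (as the ChangeLog wording suggests) and assuming the usual assembler behaviour of dropping an alignment request when it would need more padding than the limit. The function name loop_padding is made up for illustration and is not part of GCC.

```c
#include <assert.h>

/* Bytes of padding placed before a loop that would start at ADDR when
   alignment to ALIGN bytes (a power of two) is requested, but the
   request is dropped if more than MAX_SKIP bytes would be needed.
   Hypothetical helper, not GCC code.  */
unsigned loop_padding (unsigned long addr, unsigned align, unsigned max_skip)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  /* Distance from ADDR up to the next multiple of ALIGN (0 if aligned).  */
  unsigned needed = (unsigned) ((align - (addr & (align - 1))) & (align - 1));
  return needed <= max_skip ? needed : 0;
}
```

Under these assumptions a max skip of 7 leaves a loop at offset 1 within a 16-byte block unaligned, while a max skip of 15 always aligns it, since at most 15 bytes of padding are ever needed.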
Re: [PATCH] Account for devirtualization opportunities in inliner
On Wed, Oct 19, 2011 at 11:59 PM, Maxim Kuvyrkov ma...@codesourcery.com wrote: On 28/09/2011, at 4:56 PM, Maxim Kuvyrkov wrote: Jan, The following patch starts a series of patches which improve devirtualization optimizations in GCC. This patch builds on ipa-cp.c and ipa-prop.c infrastructure for analyzing parameters and jump functions and adds basic estimation of devirtualization benefit from inlining an edge. E.g., if inlining A across edge E into B will allow some of the indirect edges of A to be resolved, then inlining cost of edge E is reduced. The patch was bootstrapped and regtested on x86_64-linux-gnu on both -m32 and -m64 multilibs. OK to commit? Ping. The primary change of this patch is to make evaluate_conditions_for_edge output the KNOWN_VALS and KNOWN_BINFOS arrays in addition to conditions for a callsite. KNOWN_VALS and KNOWN_BINFOS are then passed on to a subroutine of estimate_calls_size_and_time, which uses ipa-prop.c infrastructure to check if it will be possible to devirtualize any of the indirect edges within callee. If possible, then *size and *time returned by estimate_calls_size_and_time are reduced to account for the devirtualization benefits. OK for trunk? I am missing testcase(s). Any assessment of how this improves devirtualization in practice (for example for Mozilla)? Thanks, Richard. -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: Avoid gcc.dg/tree-prof/val-prof-7.c dependence on strings.h
On Thu, Oct 20, 2011 at 1:25 AM, Joseph S. Myers jos...@codesourcery.com wrote: The testcase gcc.dg/tree-prof/val-prof-7.c includes strings.h to get a declaration of bzero. This causes it to fail on targets where bzero (a legacy function removed in the latest version of POSIX) is not declared in that header; declaring it explicitly in the testcase is more reliable. This patch changes the include to an explicit declaration. Tested with cross to i686-mingw32 (where the header just includes string.h and does not provide a declaration of bzero). OK to commit? Ok. Thanks, Richard. 2011-10-19 Joseph Myers jos...@codesourcery.com * gcc.dg/tree-prof/val-prof-7.c: Declare bzero instead of including strings.h. Index: gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c === --- gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (revision 180200) +++ gcc/testsuite/gcc.dg/tree-prof/val-prof-7.c (working copy) @@ -1,7 +1,7 @@ /* { dg-options -O2 -fdump-ipa-profile -mtune=core2 } */ /* { dg-skip-if { ! { i?86-*-* x86_64-*-* } } { * } { } } */ -#include <strings.h> +extern void bzero (void *, __SIZE_TYPE__); int foo(int len) { -- Joseph S. Myers jos...@codesourcery.com
[PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math
Hello! This patch builds on recent patch by Michael (that implemented fine-grained control on -mrecip option) and with -ffast-math emits reciprocal sequences with additional NR step for vectorized SFmode division and vectorized sqrtf(x). 2011-10-20 Uros Bizjak ubiz...@gmail.com * config/i386/i386.h (RECIP_MASK_DEFAULT): New define. * config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT. * doc/invoke.texi (mrecip): Document that GCC implements vectorized single float division and vectorized sqrtf(x) with reciprocal sequence with additional Newton-Raphson step with -ffast-math. The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph to check if I didn't mess something with options handling. The effect of the patch is 7% faster gas_dyn from polyhedron testsuite on corei7-avx. Uros. Index: config/i386/i386.h === --- config/i386/i386.h (revision 180176) +++ config/i386/i386.h (working copy) @@ -2322,6 +2322,7 @@ #define RECIP_MASK_VEC_SQRT 0x08 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT) +#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT) #define TARGET_RECIP_DIV ((recip_mask & RECIP_MASK_DIV) != 0) #define TARGET_RECIP_SQRT ((recip_mask & RECIP_MASK_SQRT) != 0) Index: config/i386/i386.opt === --- config/i386/i386.opt (revision 180176) +++ config/i386/i386.opt (working copy) @@ -32,7 +32,7 @@ HOST_WIDE_INT ix86_isa_flags_explicit TargetVariable -int recip_mask +int recip_mask = RECIP_MASK_DEFAULT Variable int recip_mask_explicit Index: doc/invoke.texi === --- doc/invoke.texi (revision 180176) +++ doc/invoke.texi (working copy) @@ -12927,6 +12927,11 @@ already with @option{-ffast-math} (or the above option combination), and doesn't need @option{-mrecip}. 
+Also note that GCC emits the above sequence with additional Newton-Raphson step +for vectorized single float division and vectorized sqrtf(x) already with +@option{-ffast-math} (or the above option combination), and doesn't need +@option{-mrecip}. + @item -mrecip=@var{opt} @opindex mrecip=opt This option allows to control which reciprocal estimate instructions
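As background on the "additional Newton-Raphson step" mentioned in this message, the following is a scalar sketch of the textbook refinement formulas, not the instruction sequence GCC actually emits. The function names are made up for illustration, and the rough starting estimate is passed in by the caller as a stand-in for what a hardware reciprocal-estimate instruction would supply.

```c
#include <assert.h>

/* One Newton-Raphson refinement of an estimate x0 of 1/a:
   x1 = x0 * (2 - a * x0).  Each step roughly squares the error.  */
float nr_recip (float a, float x0)
{
  assert (a != 0.0f);
  return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson refinement of an estimate x0 of 1/sqrt(a):
   x1 = 0.5 * x0 * (3 - a * x0 * x0).  */
float nr_rsqrt (float a, float x0)
{
  assert (a > 0.0f);
  return 0.5f * x0 * (3.0f - a * x0 * x0);
}

/* a / b computed as a times the refined reciprocal of b.  */
float div_via_recip (float a, float b, float rough_recip_b)
{
  return a * nr_recip (b, rough_recip_b);
}
```

For example, starting from the rough estimate 0.24 for 1/4, one step gives 0.24 * (2 - 0.96) = 0.2496, so the error shrinks from 0.01 to 0.0004. The result is still not correctly rounded, which is consistent with the sequence being tied to -ffast-math.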
Re: new patches using -fopt-info (issue5294043)
On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote: x...@google.com (Rong Xu) writes: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. This doesn't look scalable if you consider that each pass would print as much of a mess like -fvectorizer-verbose=5. I think =2 and =3 should be omitted - we do have dump-files for a reason. Also the coverage/profile cases you changed do not at all match ... with large performance impact. In fact the impact is completely unknown (as it would be the case usually). I'd rather have a way to make dump-files more structured (so, following some standard reporting scheme) than introducing yet another way of output. [after making dump-files more consistent it will be easy to revisit patches like this, there would be a natural general central way to implement it] So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interesting to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on a structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Dodji Seketeli wrote: I have lightly tested it on SPU in a cross compiled environment (so I couldn't bootstrap it there) and I have bootstrapped it on x86_64-unknown-linux-gnu. One person confirmed in the audit trail of the PR that it fixes the issue for him on PPC, so I am proposing the patch even if I don't know if it bootstraps on SPU or PPC in general. Well, SPU doesn't bootstrap as such (it's a target-only architecture), but I can confirm that the patch does fix the newlib build failure I was seeing on SPU. Thanks for the quick fix! Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
OK. Thanks, Uros. Great, could anybody please commit that? K
Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Dodji Seketeli wrote: cpp_context *context = pfile->context; if (context->tokens_kind == TOKENS_KIND_DIRECT) -return ((LAST (context).token - FIRST (context).token) - / sizeof (cpp_token)); +return (LAST (context).token - FIRST (context).token); else if (context->tokens_kind == TOKENS_KIND_INDIRECT || context->tokens_kind == TOKENS_KIND_EXTENDED) return ((LAST (context).ptoken - FIRST (context).ptoken) B.t.w. isn't the same thinko also present in the else if path: else if (context->tokens_kind == TOKENS_KIND_INDIRECT || context->tokens_kind == TOKENS_KIND_EXTENDED) return ((LAST (context).ptoken - FIRST (context).ptoken) / sizeof (cpp_token *)); ptoken seems to be of type const cpp_token **, so the pointer subtraction already divides by sizeof (cpp_token *). Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
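The point about pointer subtraction can be checked with a small standalone sketch. The struct tok below is a hypothetical stand-in for cpp_token (its fields are invented; only its size matters), and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cpp_token; only its size matters here.  */
struct tok { int type; int flags; long val; };

/* Correct: subtracting two pointers into the same array already yields
   a count of elements, not a count of bytes.  */
ptrdiff_t count_tokens (const struct tok *first, const struct tok *last)
{
  assert (first <= last);
  return last - first;
}

/* The thinko: dividing the element count by the element size again,
   which under-counts whenever the element is larger than one byte.  */
ptrdiff_t count_tokens_buggy (const struct tok *first, const struct tok *last)
{
  return (last - first) / (ptrdiff_t) sizeof (struct tok);
}
```

With an array of 8 elements, count_tokens returns 8, while the buggy variant returns 8 divided by the structure size, which truncates to 0 on common ABIs. The same reasoning applies to an array of pointers, where the element size is sizeof (cpp_token *).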
Re: Avoid -mno-accumulate-outgoing-args in tests on Windows target
On Thu, Oct 20, 2011 at 2:03 AM, Joseph S. Myers jos...@codesourcery.com wrote: The -mno-accumulate-outgoing-args option does not work with the stack probing used on Windows targets, giving a warning and so causing tests using that option to fail. This patch makes three tests not use that option on affected targets, like sse-10.c (see http://gcc.gnu.org/ml/gcc-patches/2008-03/msg00180.html for the introduction of the warning and the sse-10.c change). Tested with cross to i686-mingw32. OK to commit? Ok. Thanks, Richard. 2011-10-19 Joseph Myers jos...@codesourcery.com * gcc.target/i386/pr40906-1.c, gcc.target/i386/pr40906-2.c, gcc.target/i386/pr46226.c: Do not use -mno-accumulate-outgoing-args. Index: gcc/testsuite/gcc.target/i386/pr40906-2.c === --- gcc/testsuite/gcc.target/i386/pr40906-2.c (revision 180200) +++ gcc/testsuite/gcc.target/i386/pr40906-2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-require-effective-target ia32 } */ /* { dg-options -O2 -Wno-psabi -fomit-frame-pointer -fno-asynchronous-unwind-tables -mpush-args -mno-accumulate-outgoing-args -m128bit-long-double } */ +/* { dg-options -O2 -Wno-psabi -fomit-frame-pointer -fno-asynchronous-unwind-tables -mpush-args -m128bit-long-double { target *-*-mingw* *-*-cygwin* } } */ void abort (void); Index: gcc/testsuite/gcc.target/i386/pr46226.c === --- gcc/testsuite/gcc.target/i386/pr46226.c (revision 180200) +++ gcc/testsuite/gcc.target/i386/pr46226.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -Os -fomit-frame-pointer -mno-accumulate-outgoing-args -fno-asynchronous-unwind-tables } */ +/* { dg-options -Os -fomit-frame-pointer -fno-asynchronous-unwind-tables { target *-*-mingw* *-*-cygwin* } } */ extern void abort(void); Index: gcc/testsuite/gcc.target/i386/pr40906-1.c === --- gcc/testsuite/gcc.target/i386/pr40906-1.c (revision 180200) +++ gcc/testsuite/gcc.target/i386/pr40906-1.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-require-effective-target ia32 } */ 
/* { dg-options -O2 -fomit-frame-pointer -fno-asynchronous-unwind-tables -mpush-args -mno-accumulate-outgoing-args } */ +/* { dg-options -O2 -fomit-frame-pointer -fno-asynchronous-unwind-tables -mpush-args { target *-*-mingw* *-*-cygwin* } } */ void abort (void); -- Joseph S. Myers jos...@codesourcery.com
Re: Use of vector instructions in memmov/memset expanding
Middle-end part of the patch is attached. On 20 October 2011 12:34, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I fixed the tests, updated my branch, and fixed the bugs introduced during this process. Here is the fixed complete patch (other parts will be sent in subsequent messages). The changes passed bootstrap and make check. On 29 September 2011 15:21, Jakub Jelinek ja...@redhat.com wrote: Hi! On Thu, Sep 29, 2011 at 03:14:40PM +0400, Michael Zolotukhin wrote: +/* { dg-options -O2 -march=atom -mtune=atom -m64 -dp } */ The testcases are wrong, -m64 or -m32 should never appear in dg-options, instead if the testcase is specific to -m64, it should be guarded with /* { dg-do compile { target lp64 } } */ resp. ia32 (or ilp32, depending on what exactly should be done for -mx32), if you have the same testcase for -m32 and -m64, but just want different scan-assembler for the two cases, then just guard the scan-assembler with lp64 resp. ia32/ilp32 target and add second one for the other target. Jakub -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation. memfunc-mid-3.patch Description: Binary data
Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Ulrich Weigand uweig...@de.ibm.com writes: I can confirm that the patch does fix the newlib build failure I was seeing on SPU. Pheew, thank you. Below is a better patch that I am bootstrapping at the moment. From: Dodji Seketeli do...@redhat.com Date: Thu, 20 Oct 2011 09:43:49 +0200 Subject: [PATCH] Fix thinko in _cpp_remaining_tokens_num_in_context libcpp/ * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of number of tokens. --- libcpp/ChangeLog |6 ++ libcpp/lex.c |6 ++ 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog index bbb4085..128d3e1 100644 --- a/libcpp/ChangeLog +++ b/libcpp/ChangeLog @@ -1,3 +1,9 @@ +2011-10-20 Dodji Seketeli do...@redhat.com + + PR bootstrap/50801 + * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of + number of tokens. + 2011-10-18 Dodji Seketeli do...@redhat.com PR bootstrap/50760 diff --git a/libcpp/lex.c b/libcpp/lex.c index cd6ae9f..527368b 100644 --- a/libcpp/lex.c +++ b/libcpp/lex.c @@ -1710,12 +1710,10 @@ _cpp_remaining_tokens_num_in_context (cpp_reader *pfile) { cpp_context *context = pfile->context; if (context->tokens_kind == TOKENS_KIND_DIRECT) -return ((LAST (context).token - FIRST (context).token) - / sizeof (cpp_token)); +return (LAST (context).token - FIRST (context).token); else if (context->tokens_kind == TOKENS_KIND_INDIRECT || context->tokens_kind == TOKENS_KIND_EXTENDED) -return ((LAST (context).ptoken - FIRST (context).ptoken) - / sizeof (cpp_token *)); +return (LAST (context).ptoken - FIRST (context).ptoken); else abort (); } -- 1.7.6.4 -- Dodji
Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO
On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most convenient way to get this into existing Makefiles is using small wrappers that pass the plugin. This matches how other compilers (LLVM, icc) do this too. My previous attempt at using shell scripts for this http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html was not approved. Here's another attempt using wrappers written in C. It's only a single wrapper which just adds a --plugin argument before calling the respective binutils utilities. Thanks for doing this. How do they end up being used? I suppose Makefiles will need to call gcc-ar then instead of ar? In which case I wonder if ... The logic gcc.c uses to find the files is very complicated. I didn't try to replicate it 100% and left out some magic. I would be interested if this simple method works for everyone or if more code needs to be added. This only needs to support LTO supporting hosts of course. ;) ... using something like gcc --ar would be more convenient (as you can then trivially share the find-the-files logic)? Did you consider factoring out the find-the-file logic to a shared file that you can re-use? Thanks, Richard. I didn't add any documentation because the syntax is exactly the same as the native ar/ranlib/nm. Passed bootstrap and test suite on x86_64-linux. gcc/: 2011-10-19 Andi Kleen a...@linux.intel.com * Makefile.in (MOSTLYCLEANFILES): Add gcc-ar/nm/ranlib. (native): Add gcc-ar. (AR_OBJS, AR_LIBS, gcc-ar, gcc-ar.o): Add. (install): Depend on install-gcc-ar. (install-gcc-ar): Add. (uninstall): Uninstall gcc-ar/nm/ranlib. * gcc-ar.c: Add new file. 
--- gcc/Makefile.in | 28 +-- gcc/gcc-ar.c | 109 +++ 2 files changed, 134 insertions(+), 3 deletions(-) create mode 100644 gcc/gcc-ar.c diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6b28ef5..7816243 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \ genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \ xgcc$(exeext) cpp$(exeext) cc1$(exeext) $(EXTRA_PASSES) \ $(EXTRA_PARTS) $(EXTRA_PROGRAMS) gcc-cross$(exeext) \ - $(SPECS) collect2$(exeext) lto-wrapper$(exeext) \ + $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \ + gcc-ranlib$(exeext) \ gcov-iov$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \ gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \ libcommon-target.a libcommon.a libgcc.mk @@ -1791,7 +1792,8 @@ rest.encap: lang.rest.encap # This is what is made with the host's compiler # whether making a cross compiler or not. native: config.status auto-host.h build-@POSUB@ $(LANGUAGES) \ - $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) + $(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) \ + gcc-ar$(exeext) ifeq ($(enable_plugin),yes) native: gengtype$(exeext) @@ -2049,6 +2051,17 @@ sbitmap.o: sbitmap.c sbitmap.h $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(EBITMAP_H) sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h $(CONFIG_H) +AR_OBJS = gcc-ar.o +AR_LIBS = @COLLECT2_LIBS@ +gcc-ar$(exeext): $(AR_OBJS) $(LIBDEPS) + +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \ + $(AR_OBJS) $(LIBS) $(AR_LIBS) + +gcc-ar.o: gcc-ar.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H) + $(COMPILER) $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(DRIVER_DEFINES) \ + -DTARGET_MACHINE=\$(target_noncanonical)\ \ + -c $(srcdir)/gcc-ar.c $(OUTPUT_OPTION) @TARGET_SYSTEM_ROOT_DEFINE@ + COLLECT2_OBJS = collect2.o collect2-aix.o tlink.o COLLECT2_LIBS = @COLLECT2_LIBS@ collect2$(exeext): 
$(COLLECT2_OBJS) $(LIBDEPS) @@ -4576,7 +4589,7 @@ maintainer-clean: # broken is small. install: install-common $(INSTALL_HEADERS) \ install-cpp install-man install-info install-@POSUB@ \ - install-driver install-lto-wrapper + install-driver install-lto-wrapper install-gcc-ar ifeq ($(enable_plugin),yes) install: install-plugin @@ -4901,6 +4914,12 @@ install-collect2: collect2 installdirs install-lto-wrapper: lto-wrapper$(exeext) $(INSTALL_PROGRAM) lto-wrapper$(exeext) $(DESTDIR)$(libexecsubdir)/lto-wrapper$(exeext) +# XXX hardlink if system supports it +install-gcc-ar: + $(INSTALL_PROGRAM) gcc-ar$(exeext) $(DESTDIR)$(bindir)/gcc-ar$(exeext) + $(INSTALL_PROGRAM) gcc-ar$(exeext) $(DESTDIR)$(bindir)/gcc-nm$(exeext) + $(INSTALL_PROGRAM) gcc-ar$(exeext) $(DESTDIR)$(bindir)/gcc-ranlib$(exeext) + # Cancel installation by deleting the installed files. uninstall: lang.uninstall -rm -rf $(DESTDIR)$(libsubdir) @@
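The core idea of the wrapper described in this message ("just adds a --plugin argument before calling the respective binutils utilities") can be sketched as follows. This is not the code from gcc-ar.c, which is not shown in this excerpt; the function name build_plugin_argv and the way the tool name and plugin path are passed in are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Build a NULL-terminated argument vector for the real tool:
   { tool, "--plugin", plugin_path, argv[1], ..., argv[argc - 1], NULL }.
   The caller owns the returned array and releases it with free; the
   strings themselves are borrowed.  Returns NULL if allocation fails.
   Hypothetical helper, not taken from the patch.  */
const char **
build_plugin_argv (const char *tool, const char *plugin_path,
                   int argc, char **argv)
{
  assert (argc >= 1);
  /* 3 fixed slots + (argc - 1) forwarded arguments + terminating NULL.  */
  const char **nargv = malloc ((size_t) (argc + 3) * sizeof *nargv);
  if (nargv == NULL)
    return NULL;
  nargv[0] = tool;
  nargv[1] = "--plugin";
  nargv[2] = plugin_path;
  for (int i = 1; i < argc; i++)
    nargv[i + 2] = argv[i];
  nargv[argc + 2] = NULL;
  return nargv;
}
```

A wrapper built this way would then run the real ar, nm or ranlib with the new vector. Locating the tool and the plugin is the part the message describes as complicated in gcc.c, and it is deliberately left out of this sketch.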
Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Ulrich Weigand uweig...@de.ibm.com writes: B.t.w. isn't the same thinko also present in the else if path: Right. Jakub spotted it as well. Hence the followup patch in the other subthread. Thanks for watching. -- Dodji
Re: [PATCH, libcpp] Fix thinko in _cpp_remaining_tokens_num_in_context (PR bootstrap/50801)
Dodji Seketeli do...@redhat.com writes: libcpp/ * lex.c (_cpp_remaining_tokens_num_in_context): Fix computation of number of tokens. Jakub OKed the patch on IRC, so I went ahead and committed to trunk Thanks. -- Dodji
Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO
On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most convenient way to get this into existing Makefiles is using small wrappers that pass the plugin. This matches how other compilers (LLVM, icc) do this too. My previous attempt at using shell scripts for this http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html was not approved. Here's another attempt using wrappers written in C. It's only a single wrapper which just adds a --plugin argument before calling the respective binutils utilities. Thanks for doing this. How do they end up being used? I suppose Makefiles will need to call gcc-ar then instead of ar? In which case Yes, it is what other compilers provide at the moment, too. In the longer run, I would like to see binutils plugin machinery to be able to resolve this by itself for all installed compilers in the system. This is a bit tricky: 1) binutils already has default plugin search path. We need to arrange our plugin to install there 2) it is not realistic to expect exactly one linker plugin on the system. LLVM/Open64/ICC eventually will want to provide their own plugins on that search path 3) Either we will need to install plugin for every GCC release installed or we will need to make our plugin reasonably backward compatible. This is probably not that big deal since the symbol table is rather simple part of LTO machinery. We broke compatibility in between 4.5/4.6 and 4.7, but we probably could get more serious here. I wonder if ... The logic gcc.c uses to find the files is very complicated. I didn't try to replicate it 100% and left out some magic. I would be interested if this simple method works for everyone or if more code needs to be added. This only needs to support LTO supporting hosts of course. ;) ... using something like gcc --ar would be more convenient (as you can then trivially share the find-the-files logic)? 
Did you consider factoring out the find-the-file logic to a shared file that you can re-use? Hmm, these alternatives would work with me. Bit ugly feature about gcc --ar is the fact that all options after --ar are passed to real ar and must be in the ar's syntax. That one is different from ours (and different from nm or ranlib's), so the formal description of how command line options works would get bit tricky. Honza
Re: [Patch ARM] Fix PR target/50106
On 19 October 2011 20:38, Nathan Froyd nfr...@mozilla.com wrote: On 10/19/2011 3:27 PM, Ramana Radhakrishnan wrote: Index: gcc/config/arm/arm.c - live_regs_mask |= extra_mask << (size / UNITS_PER_WORD); + live_regs_mask |= extra_mask << ((size + 3) / UNITS_PER_WORD); IIUC, wouldn't ((size + UNITS_PER_WORD - 1) / UNITS_PER_WORD) be clearer? -Nathan Doh! Yes, this is what I committed. Ramana 2011-10-20 Ramana Radhakrishnan ramana.radhakrish...@linaro.org PR target/50106 * config/arm/arm.c (thumb_unexpanded_epilogue): Handle return reg size from 1-3. Index: gcc/config/arm/arm.c === --- gcc/config/arm/arm.c (revision 180239) +++ gcc/config/arm/arm.c (working copy) @@ -21652,7 +21652,8 @@ if (extra_pop > 0) { unsigned long extra_mask = (1 << extra_pop) - 1; - live_regs_mask |= extra_mask << (size / UNITS_PER_WORD); + live_regs_mask |= extra_mask << ((size + UNITS_PER_WORD - 1) + / UNITS_PER_WORD); } /* The prolog may have pushed some high registers to use as
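The rounding that Nathan suggests is the usual round-up division idiom. Below is a minimal sketch of why it matters for return values of 1 to 3 bytes, assuming UNITS_PER_WORD is 4 as on 32-bit ARM; the helper names are made up for illustration and do not appear in arm.c.

```c
#include <assert.h>

#define UNITS_PER_WORD 4  /* assumed: 32-bit ARM, 4 bytes per word */

/* Number of whole registers needed to hold SIZE bytes, rounding up, so
   a 1- to 3-byte return value still occupies one register.  */
unsigned words_for_size (unsigned size)
{
  return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* Build a mask of EXTRA_POP registers placed just above the registers
   that hold the return value, mirroring the shape of the fixed
   expression.  Hypothetical helper, not GCC code.  */
unsigned long shifted_extra_mask (unsigned extra_pop, unsigned size)
{
  assert (extra_pop > 0 && extra_pop < 16);
  unsigned long extra_mask = (1UL << extra_pop) - 1;
  return extra_mask << words_for_size (size);
}
```

With truncating division a 2-byte return value gives a shift of 0, so the extra registers would start at bit 0 and overlap the register holding the return value. With the rounded-up form the shift is 1, and two extra registers produce the mask 0x6.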
Re: [PATCH] Account for devirtualization opportunities in inliner
Hi,
sorry for the delayed review. I am still trying to get ipa-inline-analysis to behave well on real codebases and to make up my mind about how to get more advanced hints, like this one, into it.

> diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
> index bd4d2ea..5e88c2d 100644
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -711,14 +711,23 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
>  /* Work out what conditions might be true at invocation of E.  */
>
>  static clause_t
> -evaluate_conditions_for_edge (struct cgraph_edge *e, bool inline_p)
> +evaluate_conditions_vals_binfos_for_edge (struct cgraph_edge *e,
> +                                          bool inline_p,
> +                                          VEC (tree, heap) **known_vals_ptr,
> +                                          VEC (tree, heap) **known_binfos_ptr)

Hmm, I would make the clause also returned by reference, to be consistent, and perhaps call the function something like edge_properties, since it is not really only about evaluating the clause anymore.

> -/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.  */
> +/* Estimate benefit devirtualizing indirect edge IE, provided KNOWN_VALS and
> +   KNOWN_BINFOS.  */
> +
> +static void
> +estimate_edge_devirt_benefit (struct cgraph_edge *ie,
> +                              int *size, int *time, int prob,
> +                              VEC (tree, heap) *known_vals,
> +                              VEC (tree, heap) *known_binfos)

I think this whole logic should go into estimate_edge_time_and_size. That way we save all the duplication of the scaling logic. Just add the known_vals/binfos arguments.

I am not quite sure how to estimate the actual benefits. estimate_num_insns doesn't really make a difference between direct and indirect calls.

I see it is a good idea to inline more when the destination is known to be inlinable. This is an example where we have additional knowledge that we want to mix into the badness metric but that does not directly translate to time/size. There are multiple cases like this. I was thinking of adding a kind of bonus metric for this purpose, but I would suggest doing this incrementally.

What about
 1) extending the estimate_num_insns weights to account for direct calls differently from indirect calls (i.e. adding an indirect_call cost value into the eni weights). I would set it to 2 for size metrics and 15 for time metrics for a start.
 2) making estimate_edge_time_and_size subtract the difference of those two metrics from the edge costs when the destination is direct.

Incrementally we can think about how to give extra priority to direct calls with inlinable targets.

> +/* Increase SIZE and TIME for size and time needed to handle all calls in NODE.
> +   POSSIBLE_TRUTHS, KNOWN_VALS and KNOWN_BINFOS describe context of the call
> +   site.  */
>
>  static void
>  estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
> -                              clause_t possible_truths)
> +                              clause_t possible_truths,
> +                              VEC (tree, heap) *known_vals,
> +                              VEC (tree, heap) *known_binfos)
>  {
>    struct cgraph_edge *e;
>    for (e = node->callees; e; e = e->next_callee)
> @@ -2125,25 +2207,35 @@ estimate_calls_size_and_time (struct cgraph_node *node, int *size, int *time,
>          }
>        else
>          estimate_calls_size_and_time (e->callee, size, time,
> -                                      possible_truths);
> +                                      possible_truths,
> +                                      /* TODO: remap KNOWN_VALS and
> +                                         KNOWN_BINFOS to E->CALLEE
> +                                         parameters, and use them.  */
> +                                      NULL, NULL);

Remapping should not be needed here - the jump functions are merged after marking the edge inline, so jump functions in inlined functions actually refer to the parameters of the function they are inlined into.

Honza
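One possible reading of the two-step suggestion above, written as a standalone sketch. Everything here is hypothetical: struct call_weights and adjusted_edge_cost are made-up names, not GCC's eni_weights or estimate_edge_time_and_size, and the direct-call costs used in any example are assumptions, since the mail only proposes values for the indirect-call cost.

```c
#include <assert.h>

/* Hypothetical per-metric weights; one instance for size, one for
   time.  Not the layout of GCC's eni_weights.  */
struct call_weights
{
  int call_cost;            /* assumed cost of a direct call */
  int indirect_call_cost;   /* cost of an indirect call */
};

/* Return the cost to charge for a call edge whose cost was estimated
   as EDGE_COST assuming an indirect call.  When the target is known
   (the call can be made direct), subtract the difference between the
   indirect and direct call weights, never going below zero.  */
static int
adjusted_edge_cost (const struct call_weights *w, int edge_cost,
                    int target_known)
{
  assert (w->indirect_call_cost >= w->call_cost);
  if (!target_known)
    return edge_cost;

  int saved = w->indirect_call_cost - w->call_cost;
  int cost = edge_cost - saved;
  return cost < 0 ? 0 : cost;
}
```

The sketch only covers step 2; prioritizing direct calls with inlinable targets, which the mail leaves for later, is not modelled.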
Re: [Patch ARM] Fix PR target/50106
On Wed, Oct 19, 2011 at 08:27:26PM +0100, Ramana Radhakrishnan wrote: Ok to backport to 4.6 branch given it is branch freeze time ? I'll be Yeah (with the changes Nathan suggested). 2011-10-19 Ramana Radhakrishnan ramana.radhakrish...@linaro.org PR target/50106 * config/arm/arm.c (thumb_unexpanded_epilogue): Handle return reg size from 1-3. Jakub
Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO
On Thu, Oct 20, 2011 at 10:56 AM, Jan Hubicka hubi...@ucw.cz wrote: On Thu, Oct 20, 2011 at 9:24 AM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most convenient way to get this into existing Makefiles is using small wrappers that pass the plugin. This matches how other compilers (LLVM, icc) do this too. My previous attempt at using shell scripts for this http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html was not approved. Here's another attempt using wrappers written in C. It's only a single wrapper which just adds a --plugin argument before calling the respective binutils utilities. Thanks for doing this. How do they end up being used? I suppose Makefiles will need to call gcc-ar then instead of ar? In which case Yes, it is what other compilers provide at the moment, too. In longer run, I would like to see binutils plugin machinery to be able to resolve this by itself for all installed compilers in the system. This is bit tricky: 1) binutils already has default plugin search path. We need to arrange our plugin to install there 2) it is not realistic to expect exactly one linker plugin on the system. LLVM/Open64/ICC eventually will want to provide their own plugins on that search path 3) Either we will need to install plugin for every GCC release installed or we will need to make our plugin resonably backward compatible. This is probably not that big deal since the symbol table is rather simple part of LTO machinery. We broke compatibility in between 4.5/4.6 and 4.7, but we probably could get more serious here. I wonder if ... The logic gcc.c uses to find the files is very complicated. I didn't try to replicate it 100% and left out some magic. I would be interested if this simple method works for everyone or if more code needs to be added. This only needs to support LTO supporting hosts of course. ;) ... 
using something like gcc --ar would be more convenient (as you can then trivially share the find-the-files logic)? Did you consider factoring out the find-the-file logic to a shared file that you can re-use? Hmm, these alternatives would work with me. Bit ugly feature about gcc --ar is the fact that all options after --ar are passed to real ar and must be in the ar's syntax. That one is different from ours (and different from nm or ranlib's), so the formal description of how command line options works would get bit tricky. Yeah, maybe use it as `gcc --ar`, thus make it print the found ar plus the plugin argument ... At least somehow sharing the file finding code would be nice, but I don't want to block the patch in its current form if sharing it does complicate things more than it simplifies them by not duplicating code. Richard. Honza
Honor -fno-toplevel-reorder with WHOPR for vars and functions
Hi,
this patch makes -fno-toplevel-reorder work better with WHOPR. The functions and variables come out in the proper order, which is currently needed for the Linux kernel to boot with LTO because linker order is important there for the kernel's initialization code. I also used this code when comparing various code layout algorithms - the default layout is not as bad as one might think in most cases.

The implementation is generally simple - lto_balanced_map already works on a fixed order of functions. It however grabs variables into the first partition that refers to them, and if none is found, they are all homed in the last partition. This needs to be changed: variables need to be inserted in order when the corresponding function is inserted. This is the reason for the lto_balanced_map changes.

Also, we sort partitions by size in lto_wpa_write_files to make parallel make finish faster. This would mix up the linker order and needs to be disabled. We could of course output separate linker and makefile orders, but I didn't bother to do so.

Also, the patch won't output toplevel asm statements correctly - these are still homed in the first partition. I can look into this incrementally. However, to make this useful, we probably ought to prevent lto_balanced_map from breaking up partitions in the middle of an asm file. This is not needed for the kernel, so I defer it to a later time.

Unfortunately the patch doesn't make the kernel build, since we hit a quite involved bug in partitioning and variable promotion. I am working on a fix, but it will take me a bit of time. Well, extra stress on bugs in partitioning is another reason for this patch to be interesting.

Bootstrapped/regtested x86_64-linux, OK?

Honza

        * lto/lto.c (node_cmp, varpool_node_cmp): New functions.
        (lto_balanced_map): Honor -fno-toplevel-reorder of vars and functions.
        (cmp_partitions): Rename to ...
        (cmp_partitions_size): ... this one.
        (cmp_partitions_order): New function.
        (lto_wpa_write_files): Sort partitions by order when
        -fno-toplevel-reorder is used.
Index: lto/lto.c
===
--- lto/lto.c   (revision 180181)
+++ lto/lto.c   (working copy)
@@ -1665,6 +1673,23 @@ lto_1_to_1_map (void)
                                            ltrans_partitions);
 }

+/* Helper function for qsort; sort nodes by order.  */
+static int
+node_cmp (const void *pa, const void *pb)
+{
+  const struct cgraph_node *a = *(const struct cgraph_node * const *) pa;
+  const struct cgraph_node *b = *(const struct cgraph_node * const *) pb;
+  return b->order - a->order;
+}
+
+/* Helper function for qsort; sort nodes by order.  */
+static int
+varpool_node_cmp (const void *pa, const void *pb)
+{
+  const struct varpool_node *a = *(const struct varpool_node * const *) pa;
+  const struct varpool_node *b = *(const struct varpool_node * const *) pb;
+  return b->order - a->order;
+}

 /* Group cgraph nodes into equally-sized partitions.

@@ -1708,9 +1733,11 @@ static void
 lto_balanced_map (void)
 {
   int n_nodes = 0;
+  int n_varpool_nodes = 0, varpool_pos = 0;
   struct cgraph_node **postorder =
     XCNEWVEC (struct cgraph_node *, cgraph_n_nodes);
   struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid);
+  struct varpool_node **varpool_order = NULL;
   int i, postorder_len;
   struct cgraph_node *node;
   int total_size = 0, best_total_size = 0;
@@ -1722,6 +1749,7 @@ lto_balanced_map (void)
   int best_n_nodes = 0, best_n_varpool_nodes = 0, best_i = 0, best_cost =
     INT_MAX, best_internal = 0;
   int npartitions;
+  int current_order = -1;

   for (vnode = varpool_nodes; vnode; vnode = vnode->next)
     gcc_assert (!vnode->aux);
@@ -1731,6 +1759,7 @@ lto_balanced_map (void)
      multiple partitions, this is just an estimate of real size.  This is why
      we keep partition_size updated after every partition is finalized.  */
   postorder_len = ipa_reverse_postorder (postorder);
+
   for (i = 0; i < postorder_len; i++)
     {
       node = postorder[i];
@@ -1742,6 +1771,23 @@ lto_balanced_map (void)
     }
   free (postorder);

+  if (!flag_toplevel_reorder)
+    {
+      qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp);
+
+      for (vnode = varpool_nodes; vnode; vnode = vnode->next)
+        if (partition_varpool_node_p (vnode))
+          n_varpool_nodes++;
+      varpool_order = XNEWVEC (struct varpool_node *, n_varpool_nodes);
+
+      n_varpool_nodes = 0;
+      for (vnode = varpool_nodes; vnode; vnode = vnode->next)
+        if (partition_varpool_node_p (vnode))
+          varpool_order[n_varpool_nodes++] = vnode;
+      qsort (varpool_order, n_varpool_nodes, sizeof (struct varpool_node *),
+             varpool_node_cmp);
+    }
+
   /* Compute partition size and create the first partition.  */
   partition_size = total_size / PARAM_VALUE (PARAM_LTO_PARTITIONS);
   if (partition_size < PARAM_VALUE (MIN_PARTITION_SIZE))
@@ -1756,8 +1802,20 @@ lto_balanced_map
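As a side note for readers less familiar with qsort over arrays of pointers, the comparator pattern used by the patch can be tried out in isolation. The sketch below is not the patch's code: struct node is a made-up stand-in for the call-graph node, and it deliberately differs from the patch in two ways, sorting ascending rather than by b->order - a->order, and comparing instead of subtracting so that the result cannot overflow.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a call-graph or varpool node; only the
   field we sort on is kept.  */
struct node
{
  int order;   /* position in the original translation unit */
};

/* qsort comparator for an array of `struct node *', ascending by
   order.  Each argument points at an array element, i.e. at a
   pointer to a node, hence the extra dereference.  */
static int
node_order_cmp (const void *pa, const void *pb)
{
  const struct node *a = *(const struct node *const *) pa;
  const struct node *b = *(const struct node *const *) pb;
  return (a->order > b->order) - (a->order < b->order);
}

/* Sort N node pointers in place by their order field.  */
static void
sort_nodes_by_order (struct node **nodes, size_t n)
{
  assert (nodes != NULL);
  qsort (nodes, n, sizeof *nodes, node_order_cmp);
}
```

Whether ascending or descending order is wanted depends on how the sorted array is consumed afterwards, which the part of the patch shown above does not settle.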
Re: Honor -fno-toplevel-reorder with WHOPR for vars and functions
On Thu, 20 Oct 2011, Jan Hubicka wrote: Hi, this patch makes -fno-toplevel-reorder to work better with WHOPR. The functions and variables comes out in proper order that is needed for Linux kernel to currently boot with LTO because linker order is important there for kernel's initialization code. I also used this code when comparing various code layout algorithms - the default layout is not as bad as one might think in most cases. The implementation is generally simple - lto_balanced_map already works on fixed order of functions. It however grabs variables to first partition that reffers to them and if none is found, they are all homed in the last partition. This needs to be changed and variables needs to be inserted in order when corresponding function is inserted, this is reason for lto_balanced_map changes. Also we sort partitions by size in lto_wpa_write_files to make parallel make finish faster. This would mix the linker order and needs to be disbaled. We could of course output separate linker and makefile order, but I did't bother to do so. Also the patch won't output toplevel asm statements correctly - these are still homed in first partition. I can look into this incrementally. However to make this useful, we probably ought to prevent lto_balanced_map to break up partitions in the middle of asm file. This is not needed for kernel, so I deffer it for later time. Unfortunately the patch doesn't make kernel to build since we hit quite involved bug in partitioning and variable promotion. I am working on fix but it will take me bit time. Well, extra stress on bugs in partitioning is another reason for this patch to be interesting. Bootstrapped/regtested x86_64-linux, OK? Ok. Thanks, Richard. Honza * lto/lto.c (node_cmp, varpool_node_cmp): New functions. (lto_balanced_map): Honnor -fno-toplevel-reorder of varsfunctions. (cmp_partitions): Rename to ... (cmp_partitions_size): ... this one. (cmp_partitions_order): New function. 
(lto_wpa_write_files): Sort partitions by order when -fno-toplevel-reorder is used. Index: lto/lto.c === --- lto/lto.c (revision 180181) +++ lto/lto.c (working copy) @@ -1665,6 +1673,23 @@ lto_1_to_1_map (void) ltrans_partitions); } +/* Helper function for qsort; sort nodes by order. */ +static int +node_cmp (const void *pa, const void *pb) +{ + const struct cgraph_node *a = *(const struct cgraph_node * const *) pa; + const struct cgraph_node *b = *(const struct cgraph_node * const *) pb; + return b-order - a-order; +} + +/* Helper function for qsort; sort nodes by order. */ +static int +varpool_node_cmp (const void *pa, const void *pb) +{ + const struct varpool_node *a = *(const struct varpool_node * const *) pa; + const struct varpool_node *b = *(const struct varpool_node * const *) pb; + return b-order - a-order; +} /* Group cgraph nodes into equally-sized partitions. @@ -1708,9 +1733,11 @@ static void lto_balanced_map (void) { int n_nodes = 0; + int n_varpool_nodes = 0, varpool_pos = 0; struct cgraph_node **postorder = XCNEWVEC (struct cgraph_node *, cgraph_n_nodes); struct cgraph_node **order = XNEWVEC (struct cgraph_node *, cgraph_max_uid); + struct varpool_node **varpool_order = NULL; int i, postorder_len; struct cgraph_node *node; int total_size = 0, best_total_size = 0; @@ -1722,6 +1749,7 @@ lto_balanced_map (void) int best_n_nodes = 0, best_n_varpool_nodes = 0, best_i = 0, best_cost = INT_MAX, best_internal = 0; int npartitions; + int current_order = -1; for (vnode = varpool_nodes; vnode; vnode = vnode-next) gcc_assert (!vnode-aux); @@ -1731,6 +1759,7 @@ lto_balanced_map (void) multiple partitions, this is just an estimate of real size. This is why we keep partition_size updated after every partition is finalized. 
*/ postorder_len = ipa_reverse_postorder (postorder); + for (i = 0; i postorder_len; i++) { node = postorder[i]; @@ -1742,6 +1771,23 @@ lto_balanced_map (void) } free (postorder); + if (!flag_toplevel_reorder) +{ + qsort (order, n_nodes, sizeof (struct cgraph_node *), node_cmp); + + for (vnode = varpool_nodes; vnode; vnode = vnode-next) + if (partition_varpool_node_p (vnode)) + n_varpool_nodes++; + varpool_order = XNEWVEC (struct varpool_node *, n_varpool_nodes); + + n_varpool_nodes = 0; + for (vnode = varpool_nodes; vnode; vnode = vnode-next) + if (partition_varpool_node_p (vnode)) + varpool_order[n_varpool_nodes++] = vnode; + qsort (varpool_order, n_varpool_nodes, sizeof (struct varpool_node *), + varpool_node_cmp); +} + /* Compute partition size and create the first partition. */
Re: [PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math
Hi,

On Thu, 20 Oct 2011, Uros Bizjak wrote:

> This patch builds on recent patch by Michael (that implemented fine-grained control on -mrecip option) and with -ffast-math emits reciprocal sequences with additional NR step for vectorized SFmode division and vectorized sqrtf(x).

FWIW, I didn't yet come to do the same for cpu2006, but here are the two results of polyhedron (sandybridge, with baseflags -Ofast -funroll-loops -fpeel-loops -march=corei7-avx -mveclibabi=svml -flto -fwhole-program, i.e. without increasing the inline limits, and linking against libimf and libsvml).

With the above flags:

Benchmark   Compile  Executable  Ave Run   Number   Estim
Name         (secs)     (bytes)   (secs)  Repeats   Err %
---------   -------  ----------  -------  -------  ------
ac             4.68     4086864     6.16        2  0.0211
aermod        68.22     5603956    13.40        5  0.1864
air           10.46     4961134     3.78        5  0.2888
capacita       3.74     4213850    19.24        3  0.0998
channel        1.44     4808524     1.22        5  0.2898
doduc         12.64     4288238    19.91        5  0.1128
fatigue        4.47     4217301     3.71        5  0.0989
gas_dyn        6.92     4211997     3.43        5  2.8640
induct         7.44     4385543    10.33        5  0.2719
linpk          1.28     4053798     5.88        2  0.0647
mdbx           3.97     4114107     7.63        5  0.1365
nf             4.89     4147809     7.90        2  0.0380
protein       15.07     5049415    20.70        5  0.7615
rnflow        11.89     4260434    16.05        5  0.1359
test_fpu       8.11     4207868     3.69        5  0.6687
tfft           0.99     4110713     0.84        5  0.3024

Geometric Mean Execution Time = 6.35 seconds

With the above flags plus -mrecip=vec-sqrt,vec-div:

Benchmark   Compile  Executable  Ave Run   Number   Estim
Name         (secs)     (bytes)   (secs)  Repeats   Err %
---------   -------  ----------  -------  -------  ------
ac             3.85     4086864     6.17        2  0.0227
aermod        68.31     5603956    13.38        2  0.0019
air           10.92     4961134     3.77        5  0.1367
capacita       3.71     4213850    18.68        2  0.0391
channel        1.41     4808524     1.22        5  0.3327
doduc         12.66     4288238    19.93        5  0.2391
fatigue        4.36     4217301     3.70        2  0.0567
gas_dyn        6.91     4211997     2.31        2  0.0867
induct         7.46     4385543    10.31        5  0.1201
linpk          1.70     4053798     5.88        2  0.0383
mdbx           3.98     4114107     7.68        5  0.4000
nf             4.89     4147809     7.89        2  0.0348
protein       14.00     5049415    20.51        2  0.0478
rnflow        11.89     4260434    16.05        4  0.0837
test_fpu       8.09     4207868     3.71        5  0.7097
tfft           1.13     4110713     0.83        5  0.2290

Geometric Mean Execution Time = 6.18 seconds

I.e. gas_dyn improves quite a bit (as expected), and the rest still works. I know that cpu2006 also works, but as said have no recent measurements for that, which I'm going to take now.

Ciao,
Michael.
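For context, the "reciprocal sequence with additional NR step" replaces a division by a cheap low-precision reciprocal estimate followed by one Newton-Raphson refinement, x1 = x0 * (2 - a * x0), which roughly doubles the number of correct bits. The scalar sketch below only illustrates that arithmetic; it is not what the compiler emits. In particular approx_recip is a made-up stand-in for a hardware estimate instruction: it is assumed here to deliver about 12 good mantissa bits, which it fakes by truncating an exact reciprocal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: compute
   the exact reciprocal and clear the low 11 mantissa bits, leaving
   roughly 12 bits of precision.  */
static float
approx_recip (float a)
{
  assert (a != 0.0f);
  float r = 1.0f / a;
  uint32_t bits;
  memcpy (&bits, &r, sizeof bits);
  bits &= 0xFFFFF800u;
  memcpy (&r, &bits, sizeof r);
  return r;
}

/* One Newton-Raphson step for f(x) = 1/x - a:
   x1 = x0 * (2 - a * x0).  The relative error is roughly squared.  */
static float
nr_recip (float a)
{
  float x0 = approx_recip (a);
  return x0 * (2.0f - a * x0);
}
```

A division b / a then becomes b * nr_recip (a). The result is close to, but not necessarily bit-identical with, a correctly rounded division, which is why the transformation is tied to -ffast-math in the patch under discussion.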
Re: [PATCH] Distribute inliner's size_time data across entries with similar predicates
> Hi, Jan,
>
> The following patch started as a one-liner for ipa-inline-analysis.c: account_size_time() to merge predicates when we are adding data to entry[0] (i.e., when space for 32 size_time entries is exhausted):
>
> @@ -537,6 +592,9 @@ account_size_time (struct inline_summary
>      }
>    else
>      {
> +      e->predicate = or_predicates (summary->conds, e->predicate, pred);
>        e->size += size;
>        e->time += time;
>        if (e->time > MAX_TIME * INLINE_TIME_SCALE)

As we discussed, this is not needed in its current form, because we arrange for the first predicate to be always true, and thus we can always place there all the costs that did not fit elsewhere.

The patch has a problem with the fact that the predicates must always be conservative, i.e. when they are proved to be false, the code must be unreachable after inlining.

Either we go with your patch with the distance function modified to accept only predicates such that the new predicate is implied by them. If you are willing to play with this, I have no problem with going for this. The accounting is run just at most N statements of time, so the overall time should not be too bad.

Or we stay with the current logic until we hit real-world testcases that demonstrate the need for something like this, and drop a comment in the code above explaining why the "or" is not needed, to avoid confusion.

Honza
Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)
On Wed, 19 Oct 2011, Jakub Jelinek wrote:

> Hi!
>
> Similarly to casts of bool to integer, even stores into bool arrays can be handled similarly. Just we need to ensure tree-vect-data-refs.c doesn't reject vectorization before tree-vect-patterns.c has a chance to optimize it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok with ...

> 2011-10-19  Jakub Jelinek  ja...@redhat.com
>
>         PR tree-optimization/50596
>         * tree-vect-stmts.c (vect_mark_relevant): Only use
>         FOR_EACH_IMM_USE_FAST if lhs is SSA_NAME.
>         (vectorizable_store): If is_pattern_stmt_p look through
>         VIEW_CONVERT_EXPR on lhs.
>         * tree-vect-patterns.c (vect_recog_bool_pattern): Optimize
>         also stores into bool memory in addition to casts from bool
>         to integral types.
>         (vect_mark_pattern_stmts): If pattern_stmt already has vinfo
>         created, don't create it again.
>         * tree-vect-data-refs.c (vect_analyze_data_refs): For stores
>         into bool memory use vectype for integral type corresponding
>         to bool's mode.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Give up
>         if a store into bool memory hasn't been replaced by the pattern
>         recognizer.
>
>         * gcc.dg/vect/vect-cond-10.c: New test.
>
> --- gcc/tree-vect-stmts.c.jj 2011-10-18 23:52:07.0 +0200
> +++ gcc/tree-vect-stmts.c 2011-10-19 14:19:00.0 +0200
> @@ -159,19 +159,20 @@ vect_mark_relevant (VEC(gimple,heap) **w
>            /* This use is out of pattern use, if LHS has other uses that are
>               pattern uses, we should mark the stmt itself, and not the pattern
>               stmt.  */
> -          FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
> -            {
> -              if (is_gimple_debug (USE_STMT (use_p)))
> -                continue;
> -              use_stmt = USE_STMT (use_p);
> +          if (TREE_CODE (lhs) == SSA_NAME)
> +            FOR_EACH_IMM_USE_FAST (use_p, imm_iter, lhs)
> +              {
> +                if (is_gimple_debug (USE_STMT (use_p)))
> +                  continue;
> +                use_stmt = USE_STMT (use_p);
>
> -              if (vinfo_for_stmt (use_stmt)
> -                  && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
> -                {
> -                  found = true;
> -                  break;
> -                }
> -            }
> +                if (vinfo_for_stmt (use_stmt)
> +                    && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (use_stmt)))
> +                  {
> +                    found = true;
> +                    break;
> +                  }
> +              }
>          }
>
>        if (!found)
> @@ -3656,6 +3657,9 @@ vectorizable_store (gimple stmt, gimple_
>      return false;
>
>    scalar_dest = gimple_assign_lhs (stmt);
> +  if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR
> +      && is_pattern_stmt_p (stmt_info))
> +    scalar_dest = TREE_OPERAND (scalar_dest, 0);
>    if (TREE_CODE (scalar_dest) != ARRAY_REF
>        && TREE_CODE (scalar_dest) != INDIRECT_REF
>        && TREE_CODE (scalar_dest) != COMPONENT_REF

Just change the if () stmt to

  if (!handled_component_p (scalar_dest)
      && TREE_CODE (scalar_dest) != MEM_REF)
    return false;

> --- gcc/tree-vect-patterns.c.jj 2011-10-18 23:52:05.0 +0200
> +++ gcc/tree-vect-patterns.c 2011-10-19 13:55:27.0 +0200
> @@ -1933,6 +1933,50 @@ vect_recog_bool_pattern (VEC (gimple, he
>        VEC_safe_push (gimple, heap, *stmts, last_stmt);
>        return pattern_stmt;
>      }
> +  else if (rhs_code == SSA_NAME
> +           && STMT_VINFO_DATA_REF (stmt_vinfo))
> +    {
> +      stmt_vec_info pattern_stmt_info;
> +      vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
> +      gcc_assert (vectype != NULL_TREE);
> +      if (!check_bool_pattern (var, loop_vinfo))
> +        return NULL;
> +
> +      rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts);
> +      if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF)
> +        {
> +          lhs = copy_node (lhs);

We don't handle TARGET_MEM_REF in vectorizable_store, so no need to do it here. In fact, just unconditionally do ...

> +          TREE_TYPE (lhs) = TREE_TYPE (vectype);
> +        }
> +      else
> +        lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs);

... this (wrap it in a V_C_E). No need to special-case any MEM_REFs.

> +      if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs)))

This should never be false, so you can as well unconditionally build the conversion stmt.

> +        {
> +          tree rhs2 = vect_recog_temp_ssa_var (TREE_TYPE (lhs), NULL);
> +          gimple cast_stmt
> +            = gimple_build_assign_with_ops (NOP_EXPR, rhs2, rhs, NULL_TREE);
> +          STMT_VINFO_PATTERN_DEF_STMT (stmt_vinfo) = cast_stmt;
> +          rhs = rhs2;
> +        }
> +      pattern_stmt
> +        = gimple_build_assign_with_ops (SSA_NAME, lhs, rhs, NULL_TREE);
> +      pattern_stmt_info = new_stmt_vec_info (pattern_stmt, loop_vinfo, NULL);
> +      set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
> +      STMT_VINFO_DATA_REF
[patch] dwarf2out crash: missing GTY? (PR 50806)
Hi,

with custom patched dwarf2out.c I got a crash on memory mangled by the garbage collector. With patched GTY there the crash no longer happened - but I do not have a reproducer anymore, sorry if it is a bogus patch.

The memory corrupted later was initially allocated and stored into mem_loc_result->dw_loc_oprnd1.v.val_loc. I do not think there is any other reference to it than that field with no GTY.

GIT 33e7b55c2549d655d88ec64c06c51912d0d07527
gcc (GCC) 4.7.0 20111002 (experimental)

11900 mem_loc_result->dw_loc_oprnd1.v.val_loc = op0;
(gdb) bt
#0 mem_loc_descriptor (rtl=, mode=SImode, mem_mode=VOIDmode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:11900
#1 in loc_descriptor (rtl=, mode=SImode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12790
#2 in loc_descriptor (rtl=, mode=SImode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12614
#3 in dw_loc_list_1 (loc=, varloc=, want_address=2, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12889
#4 in dw_loc_list (loc_list=, decl=, want_address=2) at gcc/dwarf2out.c:13145
#5 in loc_list_from_tree (loc=, want_address=2) at gcc/dwarf2out.c:13538
#6 in add_location_or_const_value_attribute (die=, decl=, cache_p=0 '\000', attr=DW_AT_location) at gcc/dwarf2out.c:15048
#7 in gen_formal_parameter_die (node=, origin=0x0, emit_name_p=1 '\001', context_die=) at gcc/dwarf2out.c:16804
#8 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19632
#9 in gen_subprogram_die (decl=, context_die=) at gcc/dwarf2out.c:17560
#10 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19545
#11 in dwarf2out_decl (decl=) at gcc/dwarf2out.c:19919
#12 in dwarf2out_function_decl (decl=) at gcc/dwarf2out.c:19927
#13 in rest_of_handle_final () at gcc/final.c:4252
#14 in execute_one_pass (pass=0x4dbe120) at gcc/passes.c:2064
#15 in execute_pass_list (pass=0x4dbe120) at gcc/passes.c:2119
#16 in execute_pass_list (pass=0x4dbef00) at gcc/passes.c:2120
#17 in execute_pass_list (pass=0x4dbeea0) at gcc/passes.c:2120
#18 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
#19 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
#20 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
#21 in cgraph_optimize () at gcc/cgraphunit.c:2133
#22 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
#23 in c_write_global_declarations () at gcc/c-decl.c:9936
#24 in compile_file () at gcc/toplev.c:581
#25 in do_compile () at gcc/toplev.c:1925
#26 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
#27 in main (argc=101, argv=) at gcc/main.c:36

It was later freed (watchpoint hit) by:

(gdb) bt
#0 __memset_sse2 () at ../sysdeps/x86_64/memset.S:333
#1 in poison_pages () at gcc/ggc-page.c:1845
#2 in ggc_collect () at gcc/ggc-page.c:1938
#3 in execute_todo (flags=2) at gcc/passes.c:1763
#4 in execute_one_pass (pass=0x4dbce80) at gcc/passes.c:2087
#5 in execute_pass_list (pass=0x4dbce80) at gcc/passes.c:2119
#6 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420
#7 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803
#8 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862
#9 in cgraph_optimize () at gcc/cgraphunit.c:2133
#10 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310
#11 in c_write_global_declarations () at gcc/c-decl.c:9936
#12 in compile_file () at gcc/toplev.c:581
#13 in do_compile () at gcc/toplev.c:1925
#14 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001
#15 in main (argc=101, argv=) at gcc/main.c:36

And later it crashed on the mangled memory.

OK to check it in? No regression testing done.

Thanks,
Jan

gcc/
2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com

        * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;

--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct {
   char *ll_symbol; /* Label for beginning of location list.
                       Only on head of list */
   const char *section; /* Section this loclist is relative to */
-  dw_loc_descr_ref expr;
+  dw_loc_descr_ref GTY(()) expr;
   hashval_t hash;
   /* True if all addresses in this and subsequent lists are known to be
      resolved.  */
Re: [patch] dwarf2out crash: missing GTY? (PR 50806)
On Thu, Oct 20, 2011 at 12:14 PM, Jan Kratochvil jan.kratoch...@redhat.com wrote: Hi, with custom patched dwarf2out.c I got a crash on memory mangled by the garbage collector. With patched GTY there the crash no longer happened - but I do not have a reproducer anymore, sorry if it is a bogus patch. The memory corrupted later was initially allocated and stored into mem_loc_result-dw_loc_oprnd1.v.val_loc. I do not think there is any other reference to it than that field with no GTY. GIT 33e7b55c2549d655d88ec64c06c51912d0d07527 gcc (GCC) 4.7.0 20111002 (experimental) 11900 mem_loc_result-dw_loc_oprnd1.v.val_loc = op0; (gdb) bt #0 mem_loc_descriptor (rtl=, mode=SImode, mem_mode=VOIDmode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:11900 #1 in loc_descriptor (rtl=, mode=SImode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12790 #2 in loc_descriptor (rtl=, mode=SImode, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12614 #3 in dw_loc_list_1 (loc=, varloc=, want_address=2, initialized=VAR_INIT_STATUS_INITIALIZED) at gcc/dwarf2out.c:12889 #4 in dw_loc_list (loc_list=, decl=, want_address=2) at gcc/dwarf2out.c:13145 #5 in loc_list_from_tree (loc=, want_address=2) at gcc/dwarf2out.c:13538 #6 in add_location_or_const_value_attribute (die=, decl=, cache_p=0 '\000', attr=DW_AT_location) at gcc/dwarf2out.c:15048 #7 in gen_formal_parameter_die (node=, origin=0x0, emit_name_p=1 '\001', context_die=) at gcc/dwarf2out.c:16804 #8 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19632 #9 in gen_subprogram_die (decl=, context_die=) at gcc/dwarf2out.c:17560 #10 in gen_decl_die (decl=, origin=0x0, context_die=) at gcc/dwarf2out.c:19545 #11 in dwarf2out_decl (decl=) at gcc/dwarf2out.c:19919 #12 in dwarf2out_function_decl (decl=) at gcc/dwarf2out.c:19927 #13 in rest_of_handle_final () at gcc/final.c:4252 #14 in execute_one_pass (pass=0x4dbe120) at gcc/passes.c:2064 #15 in execute_pass_list (pass=0x4dbe120) at gcc/passes.c:2119 
#16 in execute_pass_list (pass=0x4dbef00) at gcc/passes.c:2120 #17 in execute_pass_list (pass=0x4dbeea0) at gcc/passes.c:2120 #18 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420 #19 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803 #20 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862 #21 in cgraph_optimize () at gcc/cgraphunit.c:2133 #22 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310 #23 in c_write_global_declarations () at gcc/c-decl.c:9936 #24 in compile_file () at gcc/toplev.c:581 #25 in do_compile () at gcc/toplev.c:1925 #26 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001 #27 in main (argc=101, argv=) at gcc/main.c:36 It was later freed (watchpoint hit) by: (gdb) bt #0 __memset_sse2 () at ../sysdeps/x86_64/memset.S:333 #1 in poison_pages () at gcc/ggc-page.c:1845 #2 in ggc_collect () at gcc/ggc-page.c:1938 #3 in execute_todo (flags=2) at gcc/passes.c:1763 #4 in execute_one_pass (pass=0x4dbce80) at gcc/passes.c:2087 #5 in execute_pass_list (pass=0x4dbce80) at gcc/passes.c:2119 #6 in tree_rest_of_compilation (fndecl=) at gcc/tree-optimize.c:420 #7 in cgraph_expand_function (node=) at gcc/cgraphunit.c:1803 #8 in cgraph_expand_all_functions () at gcc/cgraphunit.c:1862 #9 in cgraph_optimize () at gcc/cgraphunit.c:2133 #10 in cgraph_finalize_compilation_unit () at gcc/cgraphunit.c:1310 #11 in c_write_global_declarations () at gcc/c-decl.c:9936 #12 in compile_file () at gcc/toplev.c:581 #13 in do_compile () at gcc/toplev.c:1925 #14 in toplev_main (argc=101, argv=) at gcc/toplev.c:2001 #15 in main (argc=101, argv=) at gcc/main.c:36 And later it crashed on the mangled memory. OK to check it in? No regression testing done. I don't see how it can make any difference. Richard. 
Thanks, Jan gcc/ 2011-10-20 Jan Kratochvil jan.kratoch...@redhat.com * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr; --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct { char *ll_symbol; /* Label for beginning of location list. Only on head of list */ const char *section; /* Section this loclist is relative to */ - dw_loc_descr_ref expr; + dw_loc_descr_ref GTY(()) expr; hashval_t hash; /* True if all addresses in this and subsequent lists are known to be resolved. */
Re: [PATCH] Extend vect_recog_bool_pattern also to stores into bool memory (PR tree-optimization/50596)
On Thu, Oct 20, 2011 at 11:42:01AM +0200, Richard Guenther wrote: + if (TREE_CODE (scalar_dest) == VIEW_CONVERT_EXPR + is_pattern_stmt_p (stmt_info)) +scalar_dest = TREE_OPERAND (scalar_dest, 0); if (TREE_CODE (scalar_dest) != ARRAY_REF TREE_CODE (scalar_dest) != INDIRECT_REF TREE_CODE (scalar_dest) != COMPONENT_REF Just change the if () stmt to if (!handled_component_p (scalar_dest) TREE_CODE (scalar_dest) != MEM_REF) return false; That will accept BIT_FIELD_REF and ARRAY_RANGE_REF (as well as VCE outside of pattern stmts). The VCEs I hope don't appear, but the first two might, and I'm not sure we are prepared to handle them. Certainly not BIT_FIELD_REFs. + rhs = adjust_bool_pattern (var, TREE_TYPE (vectype), NULL_TREE, stmts); + if (TREE_CODE (lhs) == MEM_REF || TREE_CODE (lhs) == TARGET_MEM_REF) + { + lhs = copy_node (lhs); We don't handle TARGET_MEM_REF in vectorizable_store, so no need to do it here. In fact, just unconditionally do ... + TREE_TYPE (lhs) = TREE_TYPE (vectype); + } + else + lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (vectype), lhs); ... this (wrap it in a V_C_E). No need to special-case any MEM_REFs. Ok. After all it seems vectorizable_store pretty much ignores it (except for the scalar_dest check above). For aliasing it uses the type from DR_REF and otherwise it uses the vectorized type. + if (!useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (rhs))) This should never be false, so you can as well unconditionally build the conversion stmt. You mean because currently adjust_bool_pattern will prefer signed types over unsigned while here lhs will be unsigned? I guess I should change it to use signed type for the memory store too to avoid the extra cast instead. Both types can be certainly the same precision, e.g. for: unsigned char a[N], b[N]; unsigned int d[N], e[N]; bool c[N]; ... for (i = 0; i N; ++i) c[i] = a[i] b[i]; or different precision, e.g. 
for: for (i = 0; i N; ++i) c[i] = d[i] e[i]; @@ -347,6 +347,28 @@ vect_determine_vectorization_factor (loo gcc_assert (STMT_VINFO_DATA_REF (stmt_info) || is_pattern_stmt_p (stmt_info)); vectype = STMT_VINFO_VECTYPE (stmt_info); + if (STMT_VINFO_DATA_REF (stmt_info)) + { + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + tree scalar_type = TREE_TYPE (DR_REF (dr)); + /* vect_analyze_data_refs will allow bool writes through, +in order to allow vect_recog_bool_pattern to transform +those. If they couldn't be transformed, give up now. */ + if (((TYPE_PRECISION (scalar_type) == 1 +TYPE_UNSIGNED (scalar_type)) + || TREE_CODE (scalar_type) == BOOLEAN_TYPE) Shouldn't it be always possible to vectorize those? For loads we can assume the memory contains only 1 or 0 (we assume that for scalar loads), for stores we can mask out all other bits explicitly if you add support for truncating conversions to non-mode precision (in fact, we could support non-mode precision vectorization that way, if not support bitfield loads or extending conversions). Not without the pattern recognizer transforming it into something. That is something we've discussed on IRC before I started working on the first vect_recog_bool_pattern patch, we'd need to special case bool and one-bit precision types in way too many places all around the vectorizer. Another reason for that was that what vect_recog_bool_pattern does currently is certainly way faster than what would we end up with if we just handled bool as unsigned (or signed?) char with masking on casts and stores - the ability to use any integer type for the bools rather than char as appropriate means we can avoid many VEC_PACK_TRUNK_EXPRs and corresponding VEC_UNPACK_{LO,HI}_EXPRs. So the chosen solution was attempt to transform some of bool patterns into something the vectorizer can handle easily. And that can be extended over time what it handles. 
The above just reflects it, probably just me trying to be too cautious, the vectorization would likely fail on the stmt feeding the store, because get_vectype_for_scalar_type would fail on it. If we wanted to support general TYPE_PRECISION != GET_MODE_BITSIZE (TYPE_MODE) vectorization (hopefully with still preserving the pattern bool recognizer for the above stated reasons), we'd start with changing get_vectype_for_scalar_type to handle those types (then the tree-vect-data-refs.c and tree-vect-loop.c changes from this patch would be unnecessary), but then we'd need to handle it in other places too (I guess loads would be fine (unless BIT_FIELD_REF loads), but then casts and stores need extra code). Jakub
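For reference, the two loop shapes mentioned above (bool stores fed by operands of the same or of a different precision) can be written out as a small standalone C file. This is only a sketch: the comparison operator did not survive in the archived message, so `<` is an assumption here, as are the value of `N` and the function names.

```c
#include <stdbool.h>

#define N 8   /* assumed size; the message only says N */

unsigned char a[N], b[N];
unsigned int d[N], e[N];
bool c[N];

/* Same precision on typical targets: 8-bit operands, 8-bit bool store.  */
void
store_same_precision (void)
{
  for (int i = 0; i < N; ++i)
    c[i] = a[i] < b[i];   /* operator assumed, see lead-in */
}

/* Different precision: 32-bit operands feeding an 8-bit bool store, so a
   vectorized version has to narrow the comparison result before storing.  */
void
store_different_precision (void)
{
  for (int i = 0; i < N; ++i)
    c[i] = d[i] < e[i];   /* operator assumed, see lead-in */
}
```

In both functions the stored value is always 0 or 1, which is the property the pattern recognizer relies on when it picks a wider integer type for the intermediate bools.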
Re: [patch] dwarf2out crash: missing GTY? (PR 50806)
On Thu, Oct 20, 2011 at 12:21:58PM +0200, Richard Guenther wrote:
I don't see how it can make any difference.

Indeed, I see no changes in gt-dwarf2out.h with the patch. So it doesn't do anything.

2011-10-20 Jan Kratochvil jan.kratoch...@redhat.com

    * dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;

--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -1211,7 +1210,7 @@ typedef struct GTY(()) dw_loc_list_struct {
   char *ll_symbol; /* Label for beginning of location list.
                       Only on head of list */
   const char *section; /* Section this loclist is relative to */
-  dw_loc_descr_ref expr;
+  dw_loc_descr_ref GTY(()) expr;
   hashval_t hash;
   /* True if all addresses in this and subsequent lists are known to be
      resolved.  */

Jakub
Plug some bogus used uninitialized warnings
Hi,
last time I tried profiledbootstrap with LTO I had to plug the following warnings. Will commit it as obvious later today.

Honza

    * pt.c (unify_pack_expansion): Initialize bad_old_arg and bad_new_arg.
    * parser.c (cp_parser_ctor_initializer_opt_and_function_body): Initialize list.
    * sched-deps.c (sched_get_condition_with_rev_uncached): Initialize tmp.

Index: cp/pt.c
===
*** cp/pt.c (revision 180241)
--- cp/pt.c (working copy)
*** unify_pack_expansion (tree tparms, tree
*** 15714,15720
  }
  else
  {
!   tree bad_old_arg, bad_new_arg;
    tree old_args = ARGUMENT_PACK_ARGS (old_pack);

    if (!comp_template_args_with_info (old_args, new_args,
--- 15714,15720
  }
  else
  {
!   tree bad_old_arg = NULL, bad_new_arg = NULL;
    tree old_args = ARGUMENT_PACK_ARGS (old_pack);

    if (!comp_template_args_with_info (old_args, new_args,
Index: cp/parser.c
===
*** cp/parser.c (revision 180241)
--- cp/parser.c (working copy)
*** cp_parser_function_body (cp_parser *pars
*** 16887,16893
  static bool
  cp_parser_ctor_initializer_opt_and_function_body (cp_parser *parser)
  {
!   tree body, list;
    bool ctor_initializer_p;
    const bool check_body_p =
       DECL_CONSTRUCTOR_P (current_function_decl)
--- 16887,16893
  static bool
  cp_parser_ctor_initializer_opt_and_function_body (cp_parser *parser)
  {
!   tree body, list = NULL;
    bool ctor_initializer_p;
    const bool check_body_p =
       DECL_CONSTRUCTOR_P (current_function_decl)
Index: sched-deps.c
===
*** sched-deps.c (revision 180241)
--- sched-deps.c (working copy)
*** sched_get_condition_with_rev_uncached (c
*** 544,550
  static rtx
  sched_get_condition_with_rev (const_rtx insn, bool *rev)
  {
!   bool tmp;

    if (INSN_LUID (insn) == 0)
      return sched_get_condition_with_rev_uncached (insn, rev);
--- 544,550
  static rtx
  sched_get_condition_with_rev (const_rtx insn, bool *rev)
  {
!   bool tmp = false;

    if (INSN_LUID (insn) == 0)
      return sched_get_condition_with_rev_uncached (insn, rev);
Re: Plug some bogus used uninitialized warnings
On Thu, Oct 20, 2011 at 12:35:39PM +0200, Jan Hubicka wrote:
Hi, last time I tried profiledbootstrap with LTO I had to plug the following warnings. Will commit it as obvious later today.

Please use NULL_TREE instead of NULL for tree initializers.

Jakub
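As an illustration of the request, here is a minimal standalone sketch of a tree-typed local initialized with NULL_TREE purely to quiet a bogus "may be used uninitialized" warning. The `tree` typedef, the `NULL_TREE` macro and `struct tree_node` below are stand-ins so the fragment compiles on its own, and `find_mismatch` and `first_bad_arg` are hypothetical helpers, not functions from the patch.

```c
#include <stddef.h>

/* Stand-ins for GCC's own definitions, only so this compiles alone.  */
struct tree_node { int code; };
typedef struct tree_node *tree;
#define NULL_TREE ((tree) NULL)

/* Hypothetical helper: writes *BAD_OLD and *BAD_NEW only when the two
   arguments differ, which is the kind of flow a compiler may fail to
   see through.  */
static int
find_mismatch (tree old_arg, tree new_arg, tree *bad_old, tree *bad_new)
{
  if (old_arg->code != new_arg->code)
    {
      *bad_old = old_arg;
      *bad_new = new_arg;
      return 1;
    }
  return 0;
}

/* Hypothetical caller.  */
static tree
first_bad_arg (tree old_arg, tree new_arg)
{
  /* Initialized only to avoid a bogus uninitialized-use warning;
     NULL_TREE rather than NULL keeps tree initializers uniform.  */
  tree bad_old_arg = NULL_TREE, bad_new_arg = NULL_TREE;

  if (find_mismatch (old_arg, new_arg, &bad_old_arg, &bad_new_arg))
    return bad_old_arg;
  return NULL_TREE;
}
```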
[Ada] Fix couple of issues with pragma Source_Reference
The GNAT-specific pragma Source_Reference can alter the source line mapping, leading to (logical) line numbers lower than 1 or greater than the maximum number of lines in the file. Dodji's patch shows that we weren't taking it into account in gigi.

Tested on i586-suse-linux, applied on the mainline.

2011-10-20 Eric Botcazou ebotca...@adacore.com

    * back_end.adb (Call_Back_End): Pass the maximum logical line number
    instead of the maximum physical line number to gigi.
    * gcc-interface/trans.c (Sloc_to_locus): Cope with line zero.

2011-10-20 Eric Botcazou ebotca...@adacore.com

    * gnat.dg/source_ref1.adb: New test.
    * gnat.dg/source_ref2.adb: Likewise.

--
Eric Botcazou

Index: back_end.adb
===
--- back_end.adb (revision 180235)
+++ back_end.adb (working copy)
@@ -114,9 +114,13 @@ package body Back_End is
          return;
       end if;

+      -- The back end needs to know the maximum line number that can appear
+      -- in a Sloc, in other words the maximum logical line number.
+
       for J in 1 .. Last_Source_File loop
          File_Info_Array (J).File_Name := Full_Debug_Name (J);
-         File_Info_Array (J).Num_Source_Lines := Num_Source_Lines (J);
+         File_Info_Array (J).Num_Source_Lines :=
+           Nat (Physical_To_Logical (Last_Source_Line (J), J));
       end loop;

       if Generate_SCIL then
Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c (revision 180235)
+++ gcc-interface/trans.c (working copy)
@@ -8393,6 +8393,10 @@ Sloc_to_locus (Source_Ptr Sloc, location
       Column_Number column = Get_Column_Number (Sloc);
       struct line_map *map = LINEMAPS_ORDINARY_MAP_AT (line_table, file - 1);

+      /* We can have zero if pragma Source_Reference is in effect.  */
+      if (line < 1)
+        line = 1;
+
       /* Translate the location.  */
       *locus = linemap_position_for_line_and_column (map, line, column);
     }

pragma Source_Reference (1, "p2.adb");

procedure Source_Ref2 is
   pragma Source_Reference (2, "p2.adb");
begin
   null;
end;

pragma Source_Reference (3, "p1.adb");

procedure Source_Ref1 is
begin
   null;
end;
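The clamping added to Sloc_to_locus can be sketched in isolation as follows. Both functions are hypothetical stand-ins: `logical_line` models the remapping as a plain offset, which is a simplification and not a statement of how GNAT computes logical lines, and `locus_line` mirrors only the `line < 1` guard from the patch.

```c
/* Hypothetical model of a remapped line number: OFFSET is whatever shift a
   Source_Reference-style directive introduces and may be negative.  */
static int
logical_line (int physical, int offset)
{
  return physical + offset;
}

/* Mirror of the guard in the patch: a line number below 1 cannot be
   translated into a location, so it is clamped to 1.  */
static int
locus_line (int line)
{
  if (line < 1)
    line = 1;
  return line;
}
```

For example, `locus_line (logical_line (1, -1))` yields 1 instead of passing 0 on to the line map.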
[Ada] Housekeeping work in gigi (40/n)
Tested on i586-suse-linux, applied on the mainline.

2011-10-20 Eric Botcazou ebotca...@adacore.com

    * gcc-interface/trans.c (lhs_or_actual_p): New predicate.
    (unchecked_conversion_nop): Use it.
    (gnat_to_gnu): Likewise.

--
Eric Botcazou

Index: gcc-interface/trans.c
===
--- gcc-interface/trans.c (revision 180242)
+++ gcc-interface/trans.c (working copy)
@@ -4472,6 +4472,28 @@ Compilation_Unit_to_gnu (Node_Id gnat_no
   invalidate_global_renaming_pointers ();
 }

+/* Return true if GNAT_NODE is on the LHS of an assignment or an actual
+   parameter of a call.  */
+
+static bool
+lhs_or_actual_p (Node_Id gnat_node)
+{
+  Node_Id gnat_parent = Parent (gnat_node);
+  Node_Kind kind = Nkind (gnat_parent);
+
+  if (kind == N_Assignment_Statement && Name (gnat_parent) == gnat_node)
+    return true;
+
+  if ((kind == N_Procedure_Call_Statement || kind == N_Function_Call)
+      && Name (gnat_parent) != gnat_node)
+    return true;
+
+  if (kind == N_Parameter_Association)
+    return true;
+
+  return false;
+}
+
 /* Return true if GNAT_NODE, an unchecked type conversion, is a no-op as far
    as gigi is concerned.  This is used to avoid conversions on the LHS.  */

@@ -4483,11 +4505,7 @@ unchecked_conversion_nop (Node_Id gnat_n
   /* The conversion must be on the LHS of an assignment or an actual parameter
      of a call.  Otherwise, even if the conversion was essentially a no-op, it
      could de facto ensure type consistency and this should be preserved.  */
-  if (!(Nkind (Parent (gnat_node)) == N_Assignment_Statement
-        && Name (Parent (gnat_node)) == gnat_node)
-      && !((Nkind (Parent (gnat_node)) == N_Procedure_Call_Statement
-            || Nkind (Parent (gnat_node)) == N_Function_Call)
-           && Name (Parent (gnat_node)) != gnat_node))
+  if (!lhs_or_actual_p (gnat_node))
     return false;

   from_type = Etype (Expression (gnat_node));
@@ -6528,13 +6546,13 @@ gnat_to_gnu (Node_Id gnat_node)
   /* Now convert the result to the result type, unless we are in one of the
      following cases:

-       1. If this is the Name of an assignment statement or a parameter of
-          a procedure call, return the result almost unmodified since the
-          RHS will have to be converted to our type in that case, unless
-          the result type has a simpler size.  Likewise if there is just
-          a no-op unchecked conversion in-between.  Similarly, don't convert
-          integral types that are the operands of an unchecked conversion
-          since we need to ignore those conversions (for 'Valid).
+       1. If this is the LHS of an assignment or an actual parameter of a
+          call, return the result almost unmodified since the RHS will have
+          to be converted to our type in that case, unless the result type
+          has a simpler size.  Likewise if there is just a no-op unchecked
+          conversion in-between.  Similarly, don't convert integral types
+          that are the operands of an unchecked conversion since we need
+          to ignore those conversions (for 'Valid).

        2. If we have a label (which doesn't have any well-defined type), a
           field or an error, return the result almost unmodified.  Similarly,
@@ -6549,13 +6567,9 @@ gnat_to_gnu (Node_Id gnat_node)
        4. Finally, if the type of the result is already correct.  */

   if (Present (Parent (gnat_node))
-      && ((Nkind (Parent (gnat_node)) == N_Assignment_Statement
-           && Name (Parent (gnat_node)) == gnat_node)
+      && (lhs_or_actual_p (gnat_node)
           || (Nkind (Parent (gnat_node)) == N_Unchecked_Type_Conversion
               && unchecked_conversion_nop (Parent (gnat_node)))
-          || (Nkind (Parent (gnat_node)) == N_Procedure_Call_Statement
-              && Name (Parent (gnat_node)) != gnat_node)
-          || Nkind (Parent (gnat_node)) == N_Parameter_Association
           || (Nkind (Parent (gnat_node)) == N_Unchecked_Type_Conversion
               && !AGGREGATE_TYPE_P (gnu_result_type)
               && !AGGREGATE_TYPE_P (TREE_TYPE (gnu_result
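To make the predicate easier to follow outside gigi, here is a standalone sketch of the same three tests on a toy node structure. `struct node`, `enum node_kind` and the field names are hypothetical stand-ins for the GNAT tree accessors (`Parent`, `Nkind`, `Name`), and the null-parent check is an addition of this sketch, not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy replacements for the GNAT node kinds used by the predicate.  */
enum node_kind
{
  N_Assignment_Statement,
  N_Procedure_Call_Statement,
  N_Function_Call,
  N_Parameter_Association,
  N_Other
};

struct node
{
  enum node_kind kind;
  struct node *parent;
  struct node *name;   /* assignment: the LHS; call: the callee name */
};

/* Return true if N is the LHS of an assignment or an actual parameter of a
   call, following the structure of lhs_or_actual_p in the patch.  */
static bool
lhs_or_actual_p (const struct node *n)
{
  const struct node *parent = n->parent;

  if (parent == NULL)   /* guard added for the sketch only */
    return false;

  if (parent->kind == N_Assignment_Statement && parent->name == n)
    return true;

  /* Any child of a call other than its name is taken to be an actual.  */
  if ((parent->kind == N_Procedure_Call_Statement
       || parent->kind == N_Function_Call)
      && parent->name != n)
    return true;

  if (parent->kind == N_Parameter_Association)
    return true;

  return false;
}
```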
Re: [PATCH, RFA] Pass address space to REGNO_MODE_CODE_OK_FOR_BASE_P
Ulrich Weigand wrote:

Hello,

Georg-Johann Lay has proposed a patch to add named address space support to the AVR target here:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg00471.html

Since the target needs to make register allocation decisions for address base registers depending on the target address space, a prerequisite for this is a patch of mine that I posted a while ago to add the address space to the MODE_CODE_BASE_REG_CLASS and REGNO_MODE_CODE_OK_FOR_BASE_P target macros.

I've updated the patch for current mainline and re-tested on SPU with no regressions.

Meanwhile, there was some code clean-up to the avr backend. Would you add this?

Johann

    * config/avr/avr.h (MODE_CODE_BASE_REG_CLASS): Add address space argument.
    (REGNO_MODE_CODE_OK_FOR_BASE_P): Ditto.
    * config/avr/avr-protos.h (avr_mode_code_base_reg_class): Ditto.
    (avr_regno_mode_code_ok_for_base_p): Ditto.
    * config/avr/avr.c (avr_mode_code_base_reg_class): Ditto.
    (avr_regno_mode_code_ok_for_base_p): Ditto.
    (avr_reg_ok_for_addr_p): Pass AS down to avr_regno_mode_code_ok_for_base_p.

ChangeLog:

    * doc/tm.texi.in (MODE_CODE_BASE_REG_CLASS): Add address space argument.
    (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
    * doc/tm.texi: Regenerate.
    * config/cris/cris.h (MODE_CODE_BASE_REG_CLASS): Add address space argument.
    (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
    * config/bfin/bfin.h (MODE_CODE_BASE_REG_CLASS): Likewise.
    (REGNO_MODE_CODE_OK_FOR_BASE_P): Likewise.
    * addresses.h (base_reg_class): Add address space argument. Pass to MODE_CODE_BASE_REG_CLASS.
    (ok_for_base_p_1): Add address space argument. Pass to REGNO_MODE_CODE_OK_FOR_BASE_P.
    (regno_ok_for_base_p): Add address space argument. Pass to ok_for_base_p_1.
    * regrename.c (scan_rtx_address): Add address space argument. Pass address space to regno_ok_for_base_p and base_reg_class. Update recursive calls.
    (scan_rtx): Pass address space to scan_rtx_address.
    (build_def_use): Likewise.
    * regcprop.c (replace_oldest_value_addr): Add address space argument. Pass to regno_ok_for_base_p and base_reg_class. Update recursive calls.
    (replace_oldest_value_mem): Pass address space to replace_oldest_value_addr.
    (copyprop_hardreg_forward_1): Likewise.
    * reload.c (find_reloads_address_1): Add address space argument. Pass address space to base_reg_class and regno_ok_for_base_p. Update recursive calls.
    (find_reloads_address): Pass address space to base_reg_class, regno_ok_for_base_p, and find_reloads_address_1.
    (find_reloads): Pass address space to base_reg_class.
    (find_reloads_subreg_address): Likewise.
    * ira-costs.c (record_reg_classes): Update calls to base_reg_class.
    (ok_for_base_p_nonstrict): Add address space argument. Pass to ok_for_base_p_1.
    (record_address_regs): Add address space argument. Pass to base_reg_class and ok_for_base_p_nonstrict. Update recursive calls.
    (record_operand_costs): Pass address space to record_address_regs.
    (scan_one_insn): Likewise.
    * caller-save.c (init_caller_save): Update call to base_reg_class.
    * ira-conflicts.c (ira_build_conflicts): Likewise.
    * reload1.c (maybe_fix_stack_asms): Likewise.

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h (revision 180193)
+++ config/avr/avr-protos.h (working copy)
@@ -107,8 +107,8 @@ extern int avr_simplify_comparison_p (en
 extern RTX_CODE avr_normalize_condition (RTX_CODE condition);
 extern void out_shift_with_cnt (const char *templ, rtx insn,
                                 rtx operands[], int *len, int t_len);
-extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
-extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
+extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, addr_space_t, RTX_CODE, RTX_CODE);
+extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, addr_space_t, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 extern rtx avr_legitimize_reload_address (rtx, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
 #endif /* RTX_CODE */
Index: config/avr/avr.c
===
--- config/avr/avr.c (revision 180193)
+++ config/avr/avr.c (working copy)
@@ -1213,12 +1213,12 @@ avr_cannot_modify_jumps_p (void)
 /* Helper function for `avr_legitimate_address_p'.  */

 static inline bool
-avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED,
+avr_reg_ok_for_addr_p (rtx reg, addr_space_t as,
                        RTX_CODE outer_code, bool strict)
 {
   return (REG_P (reg)
-
Re: PR bootstrap/50709 (bootstrap miscompare)
@@ -1392,16 +1393,20 @@ inline_small_functions (void)
       if (!edge->inline_failed)
         continue;

-      /* Be sure that caches are maintained consistent.  */
 #ifdef ENABLE_CHECKING
+      /* Be sure that caches are maintained conservatively consistent.
+         This means that cached badness is allways smaller or equal
+         to the real badness.  */
+      cached_badness = edge_badness (edge, false);
+#endif
       reset_edge_growth_cache (edge);
       reset_node_growth_cache (edge->callee);
-#endif

       /* When updating the edge costs, we only decrease badness in the keys.
          Increases of badness are handled lazilly; when we see key with out
          of date value on it, we re-insert it now.  */
       current_badness = edge_badness (edge, false);
+      gcc_assert (cached_badness == -1 || cached_badness <= current_badness);

This new check actually catches a bug that has been in the tree since the introduction of the new ipa-inline-analysis code. The inliner assumes that when it produces a new inline copy, the overall growth estimates for all callees can only degrade. This is not quite true: when new knowledge is propagated, the callees might actually become cheaper and reduce the growth.

This patch takes the easy but expensive way to plug the problem by forcing an update of all keys in the queue. It increases the LTO compile time of Mozilla to 10 minutes, so I will need to develop a better solution (the trick saving recomputation was originally introduced to reduce compile time, particularly on this testcase). I just should not keep the tree ICEing on many C++ sources until I am done.

Bootstrapped/regtested x86_64-linux, committed.

Honza

Index: ChangeLog
===
--- ChangeLog (revision 180247)
+++ ChangeLog (working copy)
@@ -1,5 +1,10 @@
 2011-10-19 Jan Hubicka j...@suse.cz

+       * ipa-inline.c (inline_small_functions): Always update all calles after
+       inlining.
+
+2011-10-19 Jan Hubicka j...@suse.cz
+
        PR bootstrap/50709
        * ipa-inline.c (inline_small_functions): Fix checking code to not make
        effect on fibheap stability.
Index: ipa-inline.c
===
--- ipa-inline.c (revision 180247)
+++ ipa-inline.c (working copy)
@@ -1515,8 +1515,13 @@ inline_small_functions (void)
          /* We inlined last offline copy to the body.  This might lead
             to callees of function having fewer call sites and thus they
-            may need updating.  */
-         if (callee->global.inlined_to)
+            may need updating.
+
+            FIXME: the callee size could also shrink because more information
+            is propagated from caller.  We don't track when this happen and
+            thus we need to recompute everything all the time.  Once this is
+            solved, || 1 should go away.  */
+         if (callee->global.inlined_to || 1)
            update_all_callee_keys (heap, callee, updated_nodes);
          else
            update_callee_keys (heap, edge->callee, updated_nodes);
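The lazy key handling that the new assertion protects can be sketched as a toy priority loop. This is not the inliner's code: the fibheap is replaced by a linear scan, and `struct edge`, `find_min` and `process_lazily` are hypothetical names. The sketch only shows why the scheme needs the cached key to be smaller than or equal to the real badness.

```c
#include <stddef.h>

/* Toy edge: CACHED_BADNESS is the key in the queue, REAL_BADNESS is what a
   recomputation would return now.  */
struct edge
{
  int id;
  int cached_badness;
  int real_badness;
  int done;
};

/* Return the index of the pending edge with the smallest cached key,
   or -1 if nothing is pending.  */
static int
find_min (const struct edge *e, size_t n)
{
  int best = -1;
  for (size_t i = 0; i < n; ++i)
    if (!e[i].done
        && (best < 0 || e[i].cached_badness < e[best].cached_badness))
      best = (int) i;
  return best;
}

/* Process edges in order of real badness.  A stale key that is too small is
   refreshed and the edge is looked at again later; a key that is too large
   would never be noticed, hence the cached <= real invariant.  */
static size_t
process_lazily (struct edge *e, size_t n, int *order)
{
  size_t count = 0;
  for (;;)
    {
      int i = find_min (e, n);
      if (i < 0)
        break;
      if (e[i].cached_badness < e[i].real_badness)
        {
          /* Out-of-date key: "re-insert" with the current value.  */
          e[i].cached_badness = e[i].real_badness;
          continue;
        }
      e[i].done = 1;
      order[count++] = e[i].id;
    }
  return count;
}
```

If inlining makes a callee cheaper, its real badness drops below the cached key and this loop would process it too late, which is the situation the patch works around by refreshing all callee keys.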
Re: [patch] C6X unwinding/exception handling
On 10/17/11 16:10, Nicola Pero wrote: I checked the attached patch, test results at http://gcc.gnu.org/ml/gcc-testresults/2011-10/msg01377.html which are the same as with my suggested patch. Ok for the trunk? I probably don't have authority to approve this, but looks OK to me. The libobjc bits are Ok for trunk. This is just making sure libjava/libobjc match libsupc++, correct? OK if Andrew doesn't object in the next day or so. Bernd
Re: PING: [PATCH, ARM, iWMMXt][5/5]: pipeline description
On 20 October 2011 08:42, Xinyu Qi x...@marvell.com wrote: Ping http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01106.html Index: gcc/config/arm/marvell-f-iwmmxt.md === --- gcc/config/arm/marvell-f-iwmmxt.md(revision 0) +++ gcc/config/arm/marvell-f-iwmmxt.md(revision 0) @@ -0,0 +1,179 @@ + +;; instructions classes s/instructions/Instruction. Otherwise OK. Ramana
Re: PING: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change
On 20 October 2011 08:35, Xinyu Qi x...@marvell.com wrote:
Ping http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01100.html

    * config/arm/arm.c (arm_option_override): Enable use of iWMMXt with VFP.
    Disable use of iWMMXt and Neon.
    (arm_expand_binop_builtin): Accept VOIDmode op.
    * config/arm/arm.md (*arm_movdi): Remove check for TARGET_IWMMXT.
    (*arm_movsi_insn): Likewise.
    (iwmmxt.md): Include earlier.

OK.

cheers
Ramana
[patch tree-optimization]: allow branch-cost optimization for truth-and/or on mode-expanded simple boolean-operands
Hello,

this patch re-enables the branch-cost optimization on simple boolean-typed operands that are cast to a wider integral type. This happens because casts from boolean types are preserved, but the front end might expand a simple expression to a wider mode.

I added two tests for the already working branch-cost optimization for the IA architecture and two for explicit checking of boolean types.

ChangeLog

2011-10-20 Kai Tietz kti...@redhat.com

    * fold-const.c (simple_operand_p_2): Handle integral casts from boolean-operands.

2011-10-20 Kai Tietz kti...@redhat.com

    * gcc.target/i386/branch-cost1.c: New test.
    * gcc.target/i386/branch-cost2.c: New test.
    * gcc.target/i386/branch-cost3.c: New test.
    * gcc.target/i386/branch-cost4.c: New test.

Bootstrapped and regression tested on x86_64-unknown-linux-gnu for all languages including Ada and Obj-C++. OK to apply?

Regards,
Kai

Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
+
+extern int doo (void);
+
+int
+foo (int a, int b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=2" } */
+
+extern int doo (void);
+
+int
+foo (_Bool a, _Bool b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "if " 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times " & " 1 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost4.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=0" } */
+
+extern int doo (void);
+
+int
+foo (_Bool a, _Bool b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-head/gcc/fold-const.c
===
--- gcc-head.orig/gcc/fold-const.c
+++ gcc-head/gcc/fold-const.c
@@ -3706,6 +3706,19 @@ simple_operand_p_2 (tree exp)
   /* Strip any conversions that don't change the machine mode.  */
   STRIP_NOPS (exp);

+  /* Handle integral widening casts from boolean-typed
+     expressions as simple.  This happens due casts from
+     boolean-types are preserved, but FE might expands
+     simple-expression to wider mode.  */
+  if (INTEGRAL_TYPE_P (TREE_TYPE (exp))
+      && CONVERT_EXPR_P (exp)
+      && TREE_CODE (TREE_TYPE (TREE_OPERAND (exp, 0)))
+         == BOOLEAN_TYPE)
+    {
+      exp = TREE_OPERAND (exp, 0);
+      STRIP_NOPS (exp);
+    }
+
   code = TREE_CODE (exp);

   if (TREE_SIDE_EFFECTS (exp)
Index: gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
===
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.target/i386/branch-cost1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-gimple -mbranch-cost=0" } */
+
+extern int doo (void);
+
+int
+foo (int a, int b)
+{
+  if (a && b)
+   return doo ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-not " & " "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
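What the tests check for can be illustrated at the source level: with a high branch cost the two conditional jumps of a short-circuit AND may be folded into one bitwise AND plus one jump, which is only valid when the operands are simple (no side effects, cannot trap). The functions below are hand-written equivalents for illustration, not compiler output, and their names as well as the return value of `doo` are made up.

```c
#include <stdbool.h>

static int
doo (void)
{
  return 42;   /* arbitrary non-zero marker */
}

/* Short-circuit shape: two conditional jumps.  */
int
foo_branchy (bool a, bool b)
{
  if (a)
    if (b)
      return doo ();
  return 0;
}

/* Combined shape: one bitwise AND, one conditional jump.  */
int
foo_combined (bool a, bool b)
{
  if (a & b)
    return doo ();
  return 0;
}

/* For plain int operands each side must first be reduced to a truth value,
   otherwise 2 & 4 would wrongly evaluate to zero.  */
int
foo_int_combined (int a, int b)
{
  if ((a != 0) & (b != 0))
    return doo ();
  return 0;
}
```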
Re: regcprop.c bug fix
On 10/19/11 23:24, Mike Stump wrote:
So while tracking down a hairy address reload for an output reload bug, copyprop_hardreg_forward_1 was faulting because it was trying to extract move patterns that didn't work out, and when it came back to the code, it then tries to access recog_data, but the problem is, the exploration of other instructions to see if they match, overwrites that data, and there is nothing that restores the data to a point in which the code below this point expects. It uses recog_data.operand[i], where i is limited by n_ops, but that value corresponded to the old data in recog_data. The recog and extract_insn in insn_invalid_p called from verify_changes called from apply_change_group called from validate_change wipes the `old' recog_data with new data. This data, for example, might only have 2 operands, with an invalid value for the third operand. The old n_ops, might well be 3 from the original data. Accessing that data can cause a crash.

I found that maximally confusing, so let me try to rephrase it to see if I understood you. The two calls to validate_change clobber the recog_data even if they fail. In case they failed, we want to continue looking at data from the original insn, so we must recompute it.

If that's what you were trying to say, it looks like the right diagnosis. Better to move the recomputation into the if statement that contains the validate_change calls and possibly add a comment about the effect of that function; otherwise OK.

Bernd
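The hazard described above is a general one with global scratch data, and can be sketched without any GCC internals. Everything below is hypothetical: `scratch_data` stands in for something like recog_data, `extract` for extract_insn, and `try_change` for a validation step that re-extracts as a side effect.

```c
#include <stdbool.h>

/* Hypothetical global scratch area filled by extract ().  */
struct scratch
{
  int n_operands;
  int operand[3];
};
static struct scratch scratch_data;

struct insn
{
  int n_operands;
  int operand[3];
};

/* Fill the global scratch area from INSN.  */
static void
extract (const struct insn *insn)
{
  scratch_data.n_operands = insn->n_operands;
  for (int i = 0; i < 3; ++i)
    scratch_data.operand[i] = insn->operand[i];
}

/* Hypothetical validation: it inspects CANDIDATE by extracting it, so the
   scratch area is overwritten whether or not the change is accepted.  */
static bool
try_change (const struct insn *candidate)
{
  extract (candidate);
  return candidate->n_operands == 3;
}

/* Sum the operands of whichever insn is current after the attempt.  */
static int
sum_operands (const struct insn *insn, const struct insn *candidate)
{
  extract (insn);
  int n_ops = scratch_data.n_operands;

  if (!try_change (candidate))
    {
      /* The failed attempt still clobbered the scratch area, so recompute
         it before reading operands under the old N_OPS.  */
      extract (insn);
    }
  else
    n_ops = scratch_data.n_operands;

  int sum = 0;
  for (int i = 0; i < n_ops; ++i)
    sum += scratch_data.operand[i];
  return sum;
}
```

Placing the recomputation inside the failure branch follows the suggestion in the reply above: the cost is only paid when the scratch data was actually invalidated.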
Re: Plug some bogus used uninitialized warnings
On 10/20/11 04:35, Jan Hubicka wrote:
Hi, last time I tried profiledbootstrap with LTO I had to plug the following warnings. Will commit it as obvious later today.

Honza

    * pt.c (unify_pack_expansion): Initialize bad_old_arg and bad_new_arg.
    * parser.c (cp_parser_ctor_initializer_opt_and_function_body): Initialize list.
    * sched-deps.c (sched_get_condition_with_rev_uncached): Initialize tmp.

Could we somehow mark cases where we create an initialization to avoid a bogus warning? Just some kind of comment marker would be fine.

Jeff
[PATCH, PR50763] Fix for ICE in verify_gimple
Richard,

I have a fix for PR50763.

The second example from the PR looks like this:
...
int bar (int i);

void
foo (int c, int d)
{
  if (bar (c))
    bar (c);
  d = 33;
  while (c == d);
}
...

When compiled with -O2 -fno-dominator-opt, the gimple representation before ftree-tail-merge looks like this:
...
foo (intD.6 cD.1606, intD.6 dD.1607)
{
  intD.6 D.2730;

  # BLOCK 2 freq:900
  # PRED: ENTRY [100.0%] (fallthru,exec)
  # .MEMD.2733_6 = VDEF <.MEMD.2733_5(D)>
  # USE = nonlocal
  # CLB = nonlocal
  D.2730_2 = barD.1605 (cD.1606_1(D));
  if (D.2730_2 != 0)
    goto <bb 3>;
  else
    goto <bb 7>;
  # SUCC: 3 [29.0%] (true,exec) 7 [71.0%] (false,exec)

  # BLOCK 7 freq:639
  # PRED: 2 [71.0%] (false,exec)
  goto <bb 4>;
  # SUCC: 4 [100.0%] (fallthru)

  # BLOCK 3 freq:261
  # PRED: 2 [29.0%] (true,exec)
  # .MEMD.2733_7 = VDEF <.MEMD.2733_6>
  # USE = nonlocal
  # CLB = nonlocal
  barD.1605 (cD.1606_1(D));
  # SUCC: 4 [100.0%] (fallthru,exec)

  # BLOCK 4 freq:900
  # PRED: 7 [100.0%] (fallthru) 3 [100.0%] (fallthru,exec)
  # .MEMD.2733_4 = PHI <.MEMD.2733_6(7), .MEMD.2733_7(3)>
  if (cD.1606_1(D) == 33)
    goto <bb 8>;
  else
    goto <bb 9>;
  # SUCC: 8 [91.0%] (true,exec) 9 [9.0%] (false,exec)

  # BLOCK 9 freq:81
  # PRED: 4 [9.0%] (false,exec)
  goto <bb 6>;
  # SUCC: 6 [100.0%] (fallthru)

  # BLOCK 8 freq:819
  # PRED: 4 [91.0%] (true,exec)
  # SUCC: 5 [100.0%] (fallthru)

  # BLOCK 5 freq:9100
  # PRED: 8 [100.0%] (fallthru) 10 [100.0%] (fallthru)
  if (cD.1606_1(D) == 33)
    goto <bb 10>;
  else
    goto <bb 11>;
  # SUCC: 10 [91.0%] (true,exec) 11 [9.0%] (false,exec)

  # BLOCK 10 freq:8281
  # PRED: 5 [91.0%] (true,exec)
  goto <bb 5>;
  # SUCC: 5 [100.0%] (fallthru)

  # BLOCK 11 freq:819
  # PRED: 5 [9.0%] (false,exec)
  # SUCC: 6 [100.0%] (fallthru)

  # BLOCK 6 freq:900
  # PRED: 11 [100.0%] (fallthru) 9 [100.0%] (fallthru)
  # VUSE <.MEMD.2733_4>
  return;
  # SUCC: EXIT [100.0%]
}
...

During the first iteration, tail_merge_optimize finds that blocks 9 and 11, and blocks 8 and 10, are equal, and removes blocks 11 and 10. During the second iteration it finds that block 4 and block 5 are equal, and it removes block 5.

Since pre had no effect, the responsibility for updating the vops lies with tail_merge_optimize. Block 4 starts with a virtual PHI which needs updating, but replace_block_by decides that an update is not necessary, because vop_at_entry returns NULL_TREE for block 5 (the vop_at_entry for block 4 is .MEMD.2733_4). What is different from normal is that block 4 dominates block 5.

The patch makes sure that the vops are also updated if vop_at_entry is defined for only one of bb1 and bb2. This forced me to rewrite the code that updates the uses, which now uses dominator info. That in turn forced me to keep the dominator info up-to-date, and therefore to move the actual deletion of the basic block, and some additional bookkeeping related to it, from purge_bbs to replace_block_by.

Additionally, I fixed the case that update_vuses leaves virtual phis with only one argument (see unlink_virtual_phi).

Bootstrapped and reg-tested on x86_64. The tested patch had one addition to the attached patch: calling verify_dominators at the end of replace_block_by.

OK for trunk?

Thanks,
- Tom

2011-10-20 Tom de Vries t...@codesourcery.com

    PR tree-optimization/50763
    * tree-ssa-tail-merge.c (same_succ_flush_bb): New function, factored out of ...
    (same_succ_flush_bbs): Use same_succ_flush_bb.
    (purge_bbs): Remove argument. Remove calls to same_succ_flush_bbs, release_last_vdef and delete_basic_block.
    (unlink_virtual_phi): New function.
    (update_vuses): Add and use vuse1_phi_args argument. Set var to SSA_NAME_VAR of vuse1 or vuse2, and use var. Handle case that def_stmt2 is NULL. Use phi result as phi arg in case vuse1 or vuse2 is NULL_TREE. Replace uses of vuse1 if vuse2 is NULL_TREE. Fix code to limit replacement of uses. Propagate phi argument for phis with a single argument.
    (replace_block_by): Update vops if phi_vuse1 or phi_vuse2 is NULL_TREE. Set vuse1_phi_args if vuse1 is a phi defined in bb1. Add vuse1_phi_args as argument to call to update_vuses. Call release_last_vdef, same_succ_flush_bb, delete_basic_block. Update CDI_DOMINATORS info.
    (tail_merge_optimize): Remove argument in call to purge_bbs. Remove call to free_dominance_info. Only call calculate_dominance_info once.

    * gcc.dg/pr50763.c: New test.

Index: gcc/tree-ssa-tail-merge.c
===
--- gcc/tree-ssa-tail-merge.c (revision 180237)
+++ gcc/tree-ssa-tail-merge.c (working copy)
@@ -753,6 +753,19 @@ delete_basic_block_same_succ (basic_bloc
   bitmap_set_bit (deleted_bb_preds, e->src->index);
 }

+/* Removes BB from its corresponding same_succ.  */
+
+static void
+same_succ_flush_bb (basic_block bb)
+{
+
Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO
On Thu, Oct 20, 2011 at 10:45:31AM +0200, Richard Guenther wrote: My previous attempt at using shell scripts for this http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html was not approved. Here's another attempt using wrappers written in C. It's only a single wrapper which just adds a --plugin argument before calling the respective binutils utilities. Thanks for doing this. How do they end up being used? I suppose Makefiles will need to call gcc-ar then instead of ar? In which case I wonder if ... Basically you use make AR=gcc-ar RANLIB=gcc-ranlib NM=gcc-nm For most makefiles just specifying ar is enough. The logic gcc.c uses to find the files is very complicated. I didn't try to replicate it 100% and left out some magic. I would be interested if this simple method works for everyone or if more code needs to be added. This only needs to support LTO supporting hosts of course. ;) ... using something like gcc --ar would be more convenient (as you That's essentially what the old proposal did (gcc -print-plugin-name) plus a wrapper. You can see the old discussion here http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html can then trivially share the find-the-files logic)? Did you consider factoring out the find-the-file logic to a shared file that you can re-use? I did this first (with collect2), but it was quite messy. Still have it as a branch. Then I settled on this simpler method which works for me at least. collect2 does not fully match what gcc.c does I think, so there's already some divergence. -Andi
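For readers who have not opened the patch: the wrapper described above only has to splice a --plugin argument in front of the user's arguments and then run the real binutils tool. A minimal sketch of that argv rewriting follows. It is not the code from the patch; the helper name build_wrapper_argv and its signature are made up for illustration, and the lookup of the tool and plugin paths (the complicated part discussed above) is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not taken from the patch.  Builds
     { tool, "--plugin", plugin, argv[1], ..., argv[argc - 1], NULL }.
   The strings are borrowed; the caller frees only the array.  */
static char **
build_wrapper_argv (const char *tool, const char *plugin,
                    int argc, char **argv)
{
  char **nargv;
  int i, n = 0;

  assert (argc >= 1);

  /* argc - 1 forwarded arguments, plus the tool name, "--plugin",
     the plugin path and the terminating NULL.  */
  nargv = (char **) malloc ((argc + 3) * sizeof *nargv);
  if (nargv == NULL)
    return NULL;

  nargv[n++] = (char *) tool;
  nargv[n++] = (char *) "--plugin";
  nargv[n++] = (char *) plugin;
  for (i = 1; i < argc; i++)
    nargv[n++] = argv[i];
  nargv[n] = NULL;
  return nargv;
}
```

A real wrapper would then hand the resulting array to something like execvp; with make AR=gcc-ar the user never sees the extra argument.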
Re: [patch#2] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Tue, 18 Oct 2011 10:38:23 +0200, Jakub Jelinek wrote: On Tue, Oct 18, 2011 at 10:28:09AM +0200, Jan Kratochvil wrote: 2011-10-12 Jan Kratochvil jan.kratoch...@redhat.com Stop producing DW_AT_sibling without -gstrict-dwarf. * dwarf2out.c (dwarf2out_finish): Remove calls of add_sibling_attributes if !DWARF_STRICT. Extend the comment with reason. This is ok for trunk. FYI, this patch has not been checked in yet; it has a negative performance effect on the systemtap DWARF consumer. http://sourceware.org/ml/archer/2011-q4/msg4.html I will post a patch removing only very short DW_AT_sibling skips later. Thanks, Jan
Re: [v3] tr2: bool_set, dynamic_bitset, ratio
On Wed, 19 Oct 2011, Ed Smith-Rowland wrote: I don't know if there is a paper yet. I also did rational using the gmp library. I'm wondering if rational should be a template class that could take Having things in libstdc++ etc. using GMP runs into the same issues as libquadmath of not wanting to link in non-libc LGPL code unless required - in particular for -static-libstdc++, where you should be able to distribute a binary built with -static-libstdc++ without either it having a dependency on GMP (unless the relevant features are used) or including GMP code with the consequent complications to distribution requirements. (Quite apart from GMP changing its SONAME - not under our control, unlike libquadmath.) -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO
On Thu, 20 Oct 2011, Andi Kleen wrote: collect2 does not fully match what gcc.c does I think, so there's already some divergence. collect2 is always called from within the gcc driver, so it can rely on environment variables set by the driver. As I understand it, these wrappers are not called from within the driver - they are called in the same environment as the driver itself is called in. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, i386]: Use reciprocal sequences for vectorized SFmode division and sqrtf(x) for -ffast-math
On Thu, Oct 20, 2011 at 4:45 PM, Joseph S. Myers jos...@codesourcery.com wrote: The patch was tested on x86_64-pc-linux-gnu, but I would like Joseph to check that I didn't mess something up with option handling. I have no comments on the option handling in this patch. +for vectorized single float division and vectorized sqrtf(x) already with @code{sqrtf (@var{x})} Thanks - fixed, with a similar fix in the previous paragraph. I also found a PR that deals with vectorized reciprocal, so I referred to the PR in the ChangeLog entry: 2011-10-20 Uros Bizjak ubiz...@gmail.com PR target/47989 * config/i386/i386.h (RECIP_MASK_DEFAULT): New define. * config/i386/i386.opt (recip_mask): Initialize with RECIP_MASK_DEFAULT. * doc/invoke.texi (ix86 Options, -mrecip): Document that GCC implements vectorized single float division and vectorized sqrtf(x) with reciprocal sequence with additional Newton-Raphson step with -ffast-math. Attached is the patch that was committed to mainline SVN. Encouraged by Michael's results, let's see what automated benchmark testers will show. Uros. 
Index: config/i386/i386.h === --- config/i386/i386.h (revision 180255) +++ config/i386/i386.h (working copy) @@ -2322,6 +2322,7 @@ #define RECIP_MASK_VEC_SQRT0x08 #define RECIP_MASK_ALL (RECIP_MASK_DIV | RECIP_MASK_SQRT \ | RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT) +#define RECIP_MASK_DEFAULT (RECIP_MASK_VEC_DIV | RECIP_MASK_VEC_SQRT) #define TARGET_RECIP_DIV ((recip_mask RECIP_MASK_DIV) != 0) #define TARGET_RECIP_SQRT ((recip_mask RECIP_MASK_SQRT) != 0) Index: config/i386/i386.opt === --- config/i386/i386.opt(revision 180255) +++ config/i386/i386.opt(working copy) @@ -32,7 +32,7 @@ HOST_WIDE_INT ix86_isa_flags_explicit TargetVariable -int recip_mask +int recip_mask = RECIP_MASK_DEFAULT Variable int recip_mask_explicit Index: doc/invoke.texi === --- doc/invoke.texi (revision 180255) +++ doc/invoke.texi (working copy) @@ -12922,7 +12922,12 @@ of the non-reciprocal instruction, the precision of the sequence can be decreased by up to 2 ulp (i.e. the inverse of 1.0 equals 0.9994). -Note that GCC implements 1.0f/sqrtf(x) in terms of RSQRTSS (or RSQRTPS) +Note that GCC implements @code{1.0f/sqrtf(@var{x})} in terms of RSQRTSS +(or RSQRTPS) already with @option{-ffast-math} (or the above option +combination), and doesn't need @option{-mrecip}. + +Also note that GCC emits the above sequence with additional Newton-Raphson step +for vectorized single float division and vectorized @code{sqrtf(@var{x})} already with @option{-ffast-math} (or the above option combination), and doesn't need @option{-mrecip}.
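As background for the invoke.texi wording above, the "additional Newton-Raphson step" can be written out in scalar C. This is only a sketch of the arithmetic, not of what the i386 backend emits: the function names are made up, and the initial estimate that RCPPS or RSQRTPS would provide is passed in as a plain argument.

```c
#include <assert.h>

/* One Newton-Raphson step for f(x) = 1/x - a, refining an estimate
   X0 of 1/a:  x1 = x0 * (2 - a * x0).  */
static float
nr_recip_step (float a, float x0)
{
  assert (a != 0.0f);
  return x0 * (2.0f - a * x0);
}

/* One Newton-Raphson step for f(y) = 1/(y*y) - a, refining an
   estimate Y0 of 1/sqrt(a):  y1 = y0 * (1.5 - 0.5 * a * y0 * y0).  */
static float
nr_rsqrt_step (float a, float y0)
{
  assert (a > 0.0f);
  return y0 * (1.5f - 0.5f * a * y0 * y0);
}

/* a / b computed as a * (1/b), with EST a rough estimate of 1/b.  */
static float
div_via_recip (float a, float b, float est)
{
  return a * nr_recip_step (b, est);
}

/* sqrt(a) computed as a * (1/sqrt(a)), with EST a rough estimate
   of 1/sqrt(a).  */
static float
sqrt_via_rsqrt (float a, float est)
{
  return a * nr_rsqrt_step (a, est);
}
```

Each step roughly squares the relative error of the estimate, which is why a single step on top of a low-precision hardware estimate gets close to, but not exactly at, the correctly rounded result.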
Re: [cxx-mem-model] compare_exchange implementation II
On 10/19/2011 05:43 PM, Andrew MacLeod wrote: * optabs.h (direct_optab_index): Replace DOI_atomic_compare_exchange with DOI_atomic_compare_and_swap. (direct_op): Add DOI_atomic_compare_and_swap. * genopinit.c: Set atomic_compare_and_swap_optab. * expr.h (expand_atomic_compare_exchange): Add parameter. * builtins.c (builtin_atomic_compare_exchange): Add weak parameter and verify it is a compile time constant. * optabs.c (expand_atomic_compare_exchange): Use atomic_compare_and_swap if present, otherwise use __sync_val_compare_and_swap. * builtin-types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT): Add the bool parameter. * sync-builtins.def (BUILT_IN_ATOMIC_COMPARE_EXCHANGE_*): Use new prototype. * c-family/c-common.c (resolve_overloaded_builtin): Don't try to process a return value with an error mark. * libstdc++-v3/include/bits/atomic_2.h: Use __atomic_compare_exchange. * fortran/types.def (BT_FN_BOOL_VPTR_PTR_I{1,2,4,8,16}_BOOL_INT_INT): Add the bool parameter. * testsuite/gcc.dg/atomic-invalid.c: Add compare_exchange failures. * testsuite/gcc.dg/atomic-compare-exchange-{1-5}.c: New tests. Ok. r~
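For context on the weak parameter mentioned in the ChangeLog, here is a small usage sketch. It assumes the builtin in the form it took in released GCC (__atomic_compare_exchange_n with a bool weak argument followed by the success and failure memory models); the interface on the cxx-mem-model branch at the time of this mail may have differed in detail. The function atomic_fetch_max is made up for illustration.

```c
#include <assert.h>

/* Atomically raises *P to at least V and returns the value that was
   seen before the update (or the current value if no update was
   needed).  */
static int
atomic_fetch_max (int *p, int v)
{
  int old;

  assert (p != 0);
  old = __atomic_load_n (p, __ATOMIC_RELAXED);

  /* A weak compare-exchange may fail spuriously, so it belongs in a
     retry loop.  On failure the builtin stores the current value of
     *P into OLD, so the loop condition is re-evaluated against fresh
     data.  */
  while (old < v
         && !__atomic_compare_exchange_n (p, &old, v, 1 /* weak */,
                                          __ATOMIC_SEQ_CST,
                                          __ATOMIC_RELAXED))
    ;
  return old;
}
```

Because the caller already loops, the weak form is the natural choice here; a strong compare-exchange would only add an inner retry on targets that implement it with load-linked/store-conditional.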
Re: [PATCH, testsuite]: Require non_strict_align effective target for gcc.dg/ipa/ipa-sra-[26].c
On Wed, Oct 19, 2011 at 9:50 PM, Uros Bizjak ubiz...@gmail.com wrote: These two tests require the non_strict_align effective target, since IPA fails in tree_non_mode_aligned_mem_p () for cow and calf candidates for STRICT_ALIGNMENT targets. Mode alignment requires 32 bytes, while data is aligned to 8 bytes. 2011-10-19 Uros Bizjak ubiz...@gmail.com * gcc.dg/ipa/ipa-sra-2.c: Add dg-require-effective-target non_strict_align. * gcc.dg/ipa/ipa-sra-6.c: Ditto. Tested on x86_64-pc-linux-gnu and alphaev68-pc-linux-gnu, where the patch fixes: FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace expr cow_.*D.-red with \\*ISRA FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace expr cow_.*D.-green with ISRA FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace expr calf_.*D.-red with \\*ISRA FAIL: gcc.dg/ipa/ipa-sra-2.c scan-tree-dump eipa_sra About to replace expr calf_.*D.-green with ISRA FAIL: gcc.dg/ipa/ipa-sra-6.c scan-tree-dump-times eipa_sra foo 1 So, committed to SVN mainline and the 4.6 branch under the obvious rule. Uros.
Re: trunk (rev 180248) not buildable --with-gc=zone: undefined ggc_alloced_size_for_request
Basile Starynkevitch bas...@starynkevitch.net writes: libbackend.a(ggc-zone.o): In function `ggc_internal_alloc_zone_stat': /usr/src/Lang/gcc-trunk-bstarynk/gcc/ggc-zone.c:1105: undefined reference to `ggc_alloced_size_for_request' This is my fault. I have tested and committed the below as per the obvious rule. Sorry for the inconvenience. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index eeed56d..83da507 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -10,6 +10,10 @@ 2011-10-20 Dodji Seketeli do...@redhat.com + * ggc-zone.c (ggc_internal_alloc_zone_stat): Rename + ggc_alloced_size_order_for_request into ggc_round_alloc_size like + it was done in ggc-page.c. + PR other/50659 * doc/cppopts.texi: Use @smallexample/@end smallexample in documentation for -fdebug-cpp instead of @quotation/@end quotation diff --git a/gcc/ggc-zone.c b/gcc/ggc-zone.c index 79c8c03..5257ada 100644 --- a/gcc/ggc-zone.c +++ b/gcc/ggc-zone.c @@ -1102,7 +1102,7 @@ ggc_internal_alloc_zone_stat (size_t orig_size, struct alloc_zone *zone struct small_page_entry *entry; struct alloc_chunk *chunk, **pp; void *result; - size_t size = ggc_alloced_size_for_request (orig_size); + size_t size = ggc_round_alloc_size (orig_size); /* Try to allocate the object from several different sources. Each of these cases is responsible for setting RESULT and SIZE to -- Dodji
[patch, testsuite] Fix vect-120.c failure on IA64
I am going to check this change in as obvious later today. The test includes a conversion from float to int in the loop, and if that isn't supported by a target, the loop is not vectorized. This test has been failing on IA64 and perhaps on ARM too; there was a reference to it in PR 50150. I didn't test the change on ARM, but it fixes the failure on IA64. Steve Ellcey s...@cup.hp.com 2011-10-20 Steve Ellcey s...@cup.hp.com * gcc.dg/vect/vect-120.c: Add vect_floatint_cvt requirement. Index: gcc.dg/vect/vect-120.c === --- gcc.dg/vect/vect-120.c (revision 180233) +++ gcc.dg/vect/vect-120.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do compile } */ /* { dg-require-effective-target vect_float } */ /* { dg-require-effective-target vect_shift } */ +/* { dg-require-effective-target vect_floatint_cvt } */ static inline float i2f(int x)
Re: [PATCH] New port for TILEPro and TILE-Gx 2/7: changes in contrib
Here is a resubmission of the contrib patch, adding the entries to gcc_update to handle the multiply tables. * config-list.mk (tilegx-linux-gnu): Add. (tilepro-linux-gnu): Add. * gcc_update (gcc/config/tilegx/mul-tables.c): New dependencies. (gcc/config/tilepro/mul-tables.c): New dependencies. diff -r -u -p -N /home/packages/gcc-4.7.0-180241/contrib/config-list.mk ./contrib/config-list.mk --- /home/packages/gcc-4.7.0-180241/contrib/config-list.mk 2011-10-14 01:08:51.0 -0400 +++ ./contrib/config-list.mk2011-10-20 10:23:51.331484000 -0400 @@ -59,7 +59,8 @@ LIST = alpha-linux-gnu alpha-freebsd6 al sparc-leon3-linux-gnuOPT-enable-target=all sparc-netbsdelf \ sparc64-sun-solaris2.10OPT-with-gnu-ldOPT-with-gnu-asOPT-enable-threads=posix \ sparc-wrs-vxworks sparc64-elf sparc64-rtems sparc64-linux sparc64-freebsd6 \ - sparc64-netbsd sparc64-openbsd spu-elf v850e-elf v850-elf vax-linux-gnu \ + sparc64-netbsd sparc64-openbsd spu-elf tilegx-linux-gnu tilepro-linux-gnu \ + v850e-elf v850-elf vax-linux-gnu \ vax-netbsdelf vax-openbsd x86_64-apple-darwin \ x86_64-pc-linux-gnuOPT-with-fpmath=avx \ x86_64-elfOPT-with-fpmath=sse x86_64-freebsd6 x86_64-netbsd \ diff -r -u -p -N /home/packages/gcc-4.7.0-180241/contrib/gcc_update ./contrib/gcc_update --- /home/packages/gcc-4.7.0-180241/contrib/gcc_update 2011-10-14 01:08:51.0 -0400 +++ ./contrib/gcc_update2011-10-20 10:23:51.337478000 -0400 @@ -88,6 +88,8 @@ gcc/config/c6x/c6x-mult.md: gcc/config/c gcc/config/m68k/m68k-tables.opt: gcc/config/m68k/m68k-devices.def gcc/config/m68k/m68k-isas.def gcc/config/m68k/m68k-microarchs.def gcc/config/m68k/genopt.sh gcc/config/mips/mips-tables.opt: gcc/config/mips/mips-cpus.def gcc/config/mips/genopt.sh gcc/config/rs6000/rs6000-tables.opt: gcc/config/rs6000/rs6000-cpus.def gcc/config/rs6000/genopt.sh +gcc/config/tilegx/mul-tables.c: gcc/config/tilepro/gen-mul-tables.cc +gcc/config/tilepro/mul-tables.c: gcc/config/tilepro/gen-mul-tables.cc # And then, language-specific files gcc/cp/cfns.h: 
gcc/cp/cfns.gperf gcc/java/keyword.h: gcc/java/keyword.gperf
Re: [PATCH] New port for TILEPro and TILE-Gx: 5/7 libgcc port
Here is a resubmission of the libgcc patch, using soft-fp as the floating point library. I plan to do the benchmarking between the implementations as suggested, but I'd like to decouple that from the initial submission. * config.host: Handle tilegx and tilepro. * config/tilegx/sfp-machine.h: New file. * config/tilegx/sfp-machine32.h: New file. * config/tilegx/sfp-machine64.h: New file. * config/tilegx/t-softfp: New file. * config/tilegx/t-tilegx: New file. * config/tilepro/atomic.c: New file. * config/tilepro/sfp-machine.h: New file. * config/tilepro/softdivide.c: New file. * config/tilepro/softmpy.S: New file. * config/tilepro/t-tilepro: New file. libgcc.diff.gz Description: GNU Zip compressed data
[patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX
I am going to check this change in as obvious later today if there are no objections. The test gives warnings on HP-UX because it calls __builtin_return_address with arguments of 0 through 5, but 0 is the only valid argument to __builtin_return_address on HP-UX. Tested on IA64 and PA HP-UX. Steve Ellcey s...@cup.hp.com 2011-10-20 Steve Ellcey s...@cup.hp.com PR testsuite/50722 * gcc.dg/pr49994-3.c: Skip on HP-UX. Index: gcc.dg/pr49994-3.c === --- gcc.dg/pr49994-3.c (revision 180233) +++ gcc.dg/pr49994-3.c (working copy) @@ -2,6 +2,7 @@ /* { dg-options -O2 -fsched2-use-superblocks -g } */ /* { dg-options -O2 -fsched2-use-superblocks -g -mbackchain { target s390*-*-* } } */ /* { dg-require-effective-target scheduling } */ +/* { dg-skip-if { *-*-hpux* } { * } { } } */ void * foo (int offset)
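To illustrate the portable subset of the builtin the test exercises, here is a minimal sketch that sticks to level 0, the only argument the mail says is valid on HP-UX. The function names are made up, and nothing here is taken from pr49994-3.c itself.

```c
#include <assert.h>

/* Returns the address this function will return to.  Kept out of
   line so that it has a call site of its own.  */
static __attribute__ ((noinline)) void *
my_return_address (void)
{
  return __builtin_return_address (0);
}

/* Level 0 is expected to yield a usable address on every target;
   nonzero levels walk up the call stack and are where targets such
   as HP-UX warn.  */
static void *
checked_return_address (void)
{
  void *ra = my_return_address ();
  assert (ra != 0);
  return ra;
}
```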
Re: [patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX
Steve Ellcey s...@cup.hp.com writes: Index: gcc.dg/pr49994-3.c === --- gcc.dg/pr49994-3.c(revision 180233) +++ gcc.dg/pr49994-3.c(working copy) @@ -2,6 +2,7 @@ /* { dg-options -O2 -fsched2-use-superblocks -g } */ /* { dg-options -O2 -fsched2-use-superblocks -g -mbackchain { target s390*-*-* } } */ /* { dg-require-effective-target scheduling } */ +/* { dg-skip-if { *-*-hpux* } { * } { } } */ Would you please include either an explanation or a PR reference in the dg-skip-if? Having to search the archives for an explanation is tedious. Btw., you should be able to omit both the * and . Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH][PING] Vectorize conversions directly
On 10/20/2011 09:24 AM, Dmitry Plotnikov wrote: gcc/ * tree-cfg.c (verify_gimple_assign_unary): Allow vector conversions. * tree-vect-stmts.c (supportable_convert_operation): New function. (vectorizable_conversion): Call it. Change condition and behavior for NONE modifier case. * tree-vectorizer.h (supportable_convert_operation): New prototype. * tree.h (VECTOR_INTEGER_TYPE_P): New macro. gcc/config/arm/ * neon.md (floatv2siv2sf2): New. (floatunsv2siv2sf2): New. (fix_truncv2sfv2si2): New. (fix_truncunsv2sfv2si2): New. (floatv4siv4sf2): New. (floatunsv4siv4sf2): New. (fix_truncv4sfv4si2): New. (fix_truncunsv4sfv4si2): New. gcc/testsuite/ * gcc.target/arm/vect-vcvt.c: New test. * gcc.target/arm/vect-vcvtq.c: New test. gcc/testsuite/lib/ * target-supports.exp (check_effective_target_vect_intfloat_cvt): True for ARM NEON. (check_effective_target_vect_uintfloat_cvt): Likewise. (check_effective_target_vect_intfloat_cvt): Likewise. (check_effective_target_vect_floatuint_cvt): Likewise. (check_effective_target_vect_floatint_cvt): Likewise. (check_effective_target_vect_extract_even_odd): Likewise. Please move supportable_convert_operation to optabs.c; eventually we ought to use can_fix_p/can_float_p. + if (code == FIX_TRUNC_EXPR) +optab1 = (TYPE_UNSIGNED (vectype_out)) ? ufixtrunc_optab : sfixtrunc_optab; + else if (code == FLOAT_EXPR) +optab1 = (TYPE_UNSIGNED (vectype_in)) ? ufloat_optab : sfloat_optab; + + m1 = TYPE_MODE (vectype_in); Looks like a missing else gcc_unreachable() there, since there's no check for optab1 != NULL later. Otherwise the generic parts of the patch look good. Please get separate approval for the arm portions of the patch. After the generic parts of the patch go in I will endeavour to adjust the i386 and rs6000 backends to similarly populate the optabs, so that we can remove the builtin path here. r~
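The point about the missing else can be shown with a stand-alone analogue. The sketch below does not use GCC internals: the enums and the function select_conv_optab are hypothetical stand-ins for the tree codes and optabs in the quoted hunk, and assert plus abort plays the role of gcc_unreachable ().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for FIX_TRUNC_EXPR / FLOAT_EXPR and for the
   four optabs named in the review.  */
enum conv_code { CONV_FIX_TRUNC, CONV_FLOAT, CONV_OTHER };
enum conv_optab
{
  OPTAB_SFIXTRUNC, OPTAB_UFIXTRUNC, OPTAB_SFLOAT, OPTAB_UFLOAT
};

/* Picks the optab for a conversion.  For float->int the signedness
   of the output type matters; for int->float that of the input.  */
static enum conv_optab
select_conv_optab (enum conv_code code, int in_unsigned, int out_unsigned)
{
  if (code == CONV_FIX_TRUNC)
    return out_unsigned ? OPTAB_UFIXTRUNC : OPTAB_SFIXTRUNC;
  else if (code == CONV_FLOAT)
    return in_unsigned ? OPTAB_UFLOAT : OPTAB_SFLOAT;
  else
    {
      /* Without this branch an unexpected code would leave the
         result unset and the caller would use garbage.  */
      assert (!"unhandled conversion code");
      abort ();
    }
}
```

With the final branch in place, every path either yields a valid optab or stops immediately, so no later NULL check is needed.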
Re: new patches using -fopt-info (issue5294043)
On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote: x...@google.com (Rong Xu) writes: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. This doesn't look scalable if you consider that each pass would print as much of a mess like -fvectorizer-verbose=5. What is not scalable? For level 1 dump, only the summary of vectorization will be printed just like other loop transformations. I think =2 and =3 should be omitted - we do have dump-files for a reason. Dump files are not easy to use -- they are big and slow, especially for people with large distributed build systems. Having both level 2 and 3 is debatable, but it will be useful to have at least one level above level 1. Dump files are mainly for compiler developers, while -fopt-info is for compiler developers *and* power users who know performance tuning. Also the coverage/profile cases you changed do not at all match ... with large performance impact. In fact the impact is completely unknown (as it would be the case usually). Impact of any transformations is just 'potential', coverage problems are no different from that. I'd rather have a way to make dump-files more structured (so, following some standard reporting scheme) than introducing yet another way of output. 
[after making dump-files more consistent it will be easy to revisit patches like this, there would be a natural general central way to implement it] Yes, I remember we have discussed about this before -- currently dump files are a big mess -- debug tracing, IR are all mixed up, but as I said above, this is a different matter -- it is for compiler developers. For more structured optimization report, we should use option -fopt-report which dump optimization information based on category -- the info data base can also be shared across modules: Example: [Loop Interchange] File a, line x, yyy File b, line xx, yyy File c, line z, It is beneficial to interchange the loop, but not done because of possible carried dependency (caused by false aliasing ...) [Loop Vectorization] [Loop Unroll] ... [SRA] [Alias summary] [Global Vars] a: addr exposed b: add not exposed .. [Global Pointers] .. ... Thanks, David So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: new patches using -fopt-info (issue5294043)
Richard, Thanks for the comments. Let me give some background of the patch: The initial intention of the patch is to suppress the verbose warnings and notes emitted in profile-use compilation. These warnings/notes are caused by inconsistent profiles due to data races (which are currently common in multi-threaded programs), and by some stale profiles (after adding new functions). While valid, they can easily pollute the build log. In addition, some of these warnings cannot be disabled by any options except -Wno-error. We have many FDO users and these verbose messages are one of the most complained-about issues. The first patch was adding another option to control the FDO-related messages. Later we thought it's better to do this in a more general way. And here comes this patch. I believe fopt-info is very useful for tracking regressions for certain important optimizations (inline, loop opt etc). IR dump is just too big and adds too much overhead to compilation (look how many files it creates). -Rong On Thu, Oct 20, 2011 at 1:21 AM, Richard Guenther richard.guent...@gmail.com wrote: On Thu, Oct 20, 2011 at 1:33 AM, Andi Kleen a...@firstfloor.org wrote: x...@google.com (Rong Xu) writes: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. This doesn't look scalable if you consider that each pass would print as much of a mess like -fvectorizer-verbose=5. I think =2 and =3 should be omitted - we do have dump-files for a reason. Also the coverage/profile cases you changed do not at all match ... with large performance impact. 
In fact the impact is completely unknown (as it would be the case usually). I'd rather have a way to make dump-files more structured (so, following some standard reporting scheme) than introducing yet another way of output. [after making dump-files more consistent it will be easy to revisit patches like this, there would be a natural general central way to implement it] So, please fix dump-files instead. And for coverage/profiling, fill in stuff in a dump-file! Richard. It would be interested to have some warnings about missing SRA opportunities in =1 or =2. I found that sometimes fixing those can give a large speedup. Right now a common case that prevents SRA on structure field is simply a memset or memcpy. -Andi -- a...@linux.intel.com -- Speaking for myself only
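To make the proposed levels concrete, here is a stand-alone sketch of how a message could be gated on the verbosity setting. The names flag_opt_info and OPT_INFO_MAX do appear in the posted patch; the other enumerators and the helper opt_info are assumptions made up for this illustration, and stderr output stands in for GCC's inform ().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* OPT_INFO_MAX is the name used in the patch; the other enumerators
   are guesses matching -fopt-info=0..3.  */
enum opt_info_verbosity_levels
{
  OPT_INFO_NONE = 0,	/* -fno-opt-info: nothing is printed.  */
  OPT_INFO_MIN = 1,	/* Important messages only.  */
  OPT_INFO_MED = 2,	/* Messages for power users.  */
  OPT_INFO_MAX = 3	/* Messages for compiler developers.  */
};

static int flag_opt_info = OPT_INFO_MIN;

/* Hypothetical helper: prints the message if LEVEL is enabled by the
   current setting.  Returns nonzero if something was printed.  */
static int
opt_info (int level, const char *fmt, ...)
{
  va_list ap;

  assert (level > OPT_INFO_NONE && level <= OPT_INFO_MAX);
  if (flag_opt_info < level)
    return 0;

  va_start (ap, fmt);
  vfprintf (stderr, fmt, ap);
  va_end (ap);
  fputc ('\n', stderr);
  return 1;
}
```

Under this scheme the profile-correction notes that the patch guards with OPT_INFO_MAX would stay silent at the default level and only show up with -fopt-info=3.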
Breakage with Update testsuite to run with slim LTO
Date: Tue, 27 Sep 2011 19:23:22 +0200 From: Jan Hubicka hubi...@ucw.cz this patch updates the testsuite to cover both fat and slim LTO when the linker plugin is used, and also both the linker plugin and collect2 paths. I didn't want to slow down testing too much, so I just distributed the flags across existing runs with the aim of maximizing the coverage of the testing matrix, which is a bit large now. I believe it is sufficient, and the testsuite now runs a bit faster than previously since slim LTO saves some effort. The sync and pr34850 tests don't pass with slim LTO. The reason is that they expect diagnostics that are output too late in compilation (usually at expansion time). These should probably be fixed as a QOI issue, but they are not a real bug - the diagnostics will be output at link time. I will open a PR tracking this. We probably should output pretty much everything till the end of early opts except for stuff that really looks for optimization results. Especially now when we handle always inline in early inlining. Honza * lib/lto.exp: When linker plugin is available test both plugin/non-plugin LTO paths as well as fat and slim LTO. lib/c-torture.exp: Likewise. lib/gcc-dg.exp: Likewise. Looks like this patch broke, for cris-elf with TOT binutils: Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ... FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects don't produce any code. Is that expected? If so, and if the required update is as for the test-cases you updated, to add: + /* { dg-options -ffat-lto-objects } */ then IIUC you need to patch *all* torture tests that use scan-assembler and scan-assembler-not. 
Alternatively, patch somewhere else, like not passing it if certain directives are used, like scan-assembler{,-not}. And either way, is it safe to add that option always, not just when also passing -flto or something? brgds, H-P
Re: Breakage with Update testsuite to run with slim LTO
Looks like this patch broke, for cris-elf with TOT binutils: Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ... FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects don't produce any code. Is that expected? Yes. -fno-fat-lto-objects does only produce code at the final link. -Andi
Re: new patches using -fopt-info (issue5294043)
While the discussion of the trunk version is still ongoing, this is OK for the google branches. thanks, David On Wed, Oct 19, 2011 at 4:28 PM, Rong Xu x...@google.com wrote: After some off-line discussion, we decided to use a more general approach to control the printing of optimization messages/warnings. We will introduce a new option -fopt-info: * fopt-info=0 or fno-opt-info: no message will be emitted. * fopt-info or fopt-info=1: emit important warnings and optimization messages with large performance impact. * fopt-info=2: warnings and optimization messages targeting power users. * fopt-info=3: informational messages for compiler developers. 2011-10-19 Rong Xu x...@google.com * gcc/common.opt (fopt-info): New flag. (fopt-info=) Ditto. * gcc/opts.c (common_handle_option): Handle OPT_fopt_info_. * gcc/flag-types.h (opt_info_verbosity_levels): New enum. * gcc/value-prof.c (check_ic_counter): guard warnings/notes by flag_opt_info. (find_func_by_funcdef_no): Ditto. (check_ic_target): Ditto. (check_counter): Ditto. (check_ic_counter): Ditto. * gcc/mcf.c (find_minimum_cost_flow): Ditto. * gcc/profile.c (read_profile_edge_counts): Ditto. (compute_branch_probabilities): Ditto. * gcc/coverage.c (read_counts_file): Ditto. (get_coverage_counts): Ditto. * gcc/tree-profile.c: (gimple_gen_reusedist): Ditto. (maybe_issue_profile_use_note): Ditto. (optimize_reusedist): Ditto. * gcc/testsuite/gcc.dg/pr32773.c: add -fopt-info. * gcc/testsuite/gcc.dg/pr40209.c: Ditto. * gcc/testsuite/gcc.dg/pr26570.c: Ditto. * gcc/testsuite/g++.dg/tree-ssa/dom-invalid.C: Ditto. 
Index: gcc/value-prof.c === --- gcc/value-prof.c (revision 180106) +++ gcc/value-prof.c (working copy) @@ -472,9 +472,10 @@ : DECL_SOURCE_LOCATION (current_function_decl); if (flag_profile_correction) { - inform (locus, correcting inconsistent value profile: - %s profiler overall count (%d) does not match BB count - (%d), name, (int)*all, (int)bb_count); + if (flag_opt_info = OPT_INFO_MAX) + inform (locus, correcting inconsistent value profile: %s + profiler overall count (%d) does not match BB count + (%d), name, (int)*all, (int)bb_count); *all = bb_count; if (*count *all) *count = *all; @@ -510,33 +511,42 @@ location_t locus; if (*count1 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler top target count (%ld) exceeds - BB count (%ld), (long)*count1, (long)all); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value profile: + ic (topn) profiler top target count (%ld) exceeds + BB count (%ld), (long)*count1, (long)all); + } *count1 = all; } if (*count2 all flag_profile_correction) { - locus = (stmt != NULL) - ? gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Correcting inconsistent value profile: - ic (topn) profiler second target count (%ld) exceeds - BB count (%ld), (long)*count2, (long)all); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Correcting inconsistent value profile: + ic (topn) profiler second target count (%ld) exceeds + BB count (%ld), (long)*count2, (long)all); + } *count2 = all; } if (*count2 *count1) { - locus = (stmt != NULL) - ? 
gimple_location (stmt) - : DECL_SOURCE_LOCATION (current_function_decl); - inform (locus, Corrupted topn ic value profile: - first target count (%ld) is less than the second - target count (%ld), (long)*count1, (long)*count2); + if (flag_opt_info = OPT_INFO_MAX) + { + locus = (stmt != NULL) + ? gimple_location (stmt) + : DECL_SOURCE_LOCATION (current_function_decl); + inform (locus, Corrupted topn ic value profile: + first target count (%ld) is less than the second +
Remove target.vectorize.builtin_vec_perm
Since the vectorizer has been changed to emit VEC_PERM_EXPR, I've now removed the hook and the implementations of that hook. For the x86 target I also removed the builtins themselves. For the rs6000 and spu targets, I've left that detail to the port maintainers; I don't know what interfaces are actually public. Tested on x86_64-linux. Committed. r~ gcc/ + * target.def (builtin_vec_perm): Remove. + * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove. + + * config/i386/i386.c (ix86_expand_vec_perm_builtin): Remove. + (IX86_BUILTIN_VEC_PERM_*): Remove. + (bdesc_args): Remove vec_perm builtins + (ix86_expand_builtin): Likewise. + (ix86_expand_vec_perm_const_1): Rename from + ix86_expand_vec_perm_builtin_1. + (extract_vec_perm_cst): Merge into... + (ix86_vectorize_vec_perm_const_ok): ... here. Rename from + ix86_vectorize_builtin_vec_perm_ok. + (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove. + + * config/rs6000/rs6000.c (rs6000_builtin_vec_perm): Remove. + (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove. + + * config/spu/spu.c (spu_builtin_vec_perm): Remove. + (TARGET_VECTORIZE_BUILTIN_VEC_PERM): Remove. gcc/testsuite/ + * gcc.target/i386/vperm-v2df.c, gcc.target/i386/vperm-v2di.c, + gcc.target/i386/vperm-v4sf-1.c, gcc.target/i386/vperm-v4sf-2.c, + gcc.target/i386/vperm-v4si-1.c, gcc.target/i386/vperm-v4si-2.c: + Use __builtin_shuffle. 
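For anyone updating code that used the removed x86 builtins, the generic replacement named in the testsuite ChangeLog looks roughly like this. The sketch relies on GCC's vector extensions and on __builtin_shuffle, so it is GCC-specific; the typedef and function names are made up for the example and are not taken from the vperm tests.

```c
typedef int v4si __attribute__ ((vector_size (16)));

/* One-input form: element i of the result is a[mask[i]].  */
static v4si
reverse_v4si (v4si a)
{
  v4si mask = { 3, 2, 1, 0 };
  return __builtin_shuffle (a, mask);
}

/* Two-input form: mask values 0-3 select from A and 4-7 select
   from B.  */
static v4si
interleave_low_v4si (v4si a, v4si b)
{
  v4si mask = { 0, 4, 1, 5 };
  return __builtin_shuffle (a, b, mask);
}
```

The mask must be an integer vector with the same element count and element width as the data vectors.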
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 4af4e59..7750356 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2509,7 +2509,6 @@ static void ix86_compute_frame_layout (struct ix86_frame *); static bool ix86_expand_vector_init_one_nonzero (bool, enum machine_mode, rtx, rtx, int); static void ix86_add_new_builtins (HOST_WIDE_INT); -static rtx ix86_expand_vec_perm_builtin (tree); static tree ix86_canonical_va_list_type (tree); static void predict_jump (int); static unsigned int split_stack_prologue_scratch_regno (void); @@ -25058,19 +25057,6 @@ enum ix86_builtins IX86_BUILTIN_CVTUDQ2PS, - IX86_BUILTIN_VEC_PERM_V2DF, - IX86_BUILTIN_VEC_PERM_V4SF, - IX86_BUILTIN_VEC_PERM_V2DI, - IX86_BUILTIN_VEC_PERM_V4SI, - IX86_BUILTIN_VEC_PERM_V8HI, - IX86_BUILTIN_VEC_PERM_V16QI, - IX86_BUILTIN_VEC_PERM_V2DI_U, - IX86_BUILTIN_VEC_PERM_V4SI_U, - IX86_BUILTIN_VEC_PERM_V8HI_U, - IX86_BUILTIN_VEC_PERM_V16QI_U, - IX86_BUILTIN_VEC_PERM_V4DF, - IX86_BUILTIN_VEC_PERM_V8SF, - /* FMA4 instructions. 
*/ IX86_BUILTIN_VFMADDSS, IX86_BUILTIN_VFMADDSD, @@ -25779,19 +25765,6 @@ static const struct builtin_description bdesc_args[] = /* SSE2 */ { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_shufpd, __builtin_ia32_shufpd, IX86_BUILTIN_SHUFPD, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_INT }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v2df, IX86_BUILTIN_VEC_PERM_V2DF, UNKNOWN, (int) V2DF_FTYPE_V2DF_V2DF_V2DI }, - { OPTION_MASK_ISA_SSE, CODE_FOR_nothing, __builtin_ia32_vec_perm_v4sf, IX86_BUILTIN_VEC_PERM_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V4SF_V4SI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v2di, IX86_BUILTIN_VEC_PERM_V2DI, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v4si, IX86_BUILTIN_VEC_PERM_V4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v8hi, IX86_BUILTIN_VEC_PERM_V8HI, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI_V8HI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v16qi, IX86_BUILTIN_VEC_PERM_V16QI, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI_V16QI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v2di_u, IX86_BUILTIN_VEC_PERM_V2DI_U, UNKNOWN, (int) V2UDI_FTYPE_V2UDI_V2UDI_V2UDI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v4si_u, IX86_BUILTIN_VEC_PERM_V4SI_U, UNKNOWN, (int) V4USI_FTYPE_V4USI_V4USI_V4USI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v8hi_u, IX86_BUILTIN_VEC_PERM_V8HI_U, UNKNOWN, (int) V8UHI_FTYPE_V8UHI_V8UHI_V8UHI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_nothing, __builtin_ia32_vec_perm_v16qi_u, IX86_BUILTIN_VEC_PERM_V16QI_U, UNKNOWN, (int) V16UQI_FTYPE_V16UQI_V16UQI_V16UQI }, - { OPTION_MASK_ISA_AVX, CODE_FOR_nothing, __builtin_ia32_vec_perm_v4df, IX86_BUILTIN_VEC_PERM_V4DF, UNKNOWN, (int) V4DF_FTYPE_V4DF_V4DF_V4DI }, - { OPTION_MASK_ISA_AVX, CODE_FOR_nothing, __builtin_ia32_vec_perm_v8sf, 
IX86_BUILTIN_VEC_PERM_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V8SF_V8SI }, - { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_movmskpd, __builtin_ia32_movmskpd, IX86_BUILTIN_MOVMSKPD, UNKNOWN, (int) INT_FTYPE_V2DF }, { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmovmskb, __builtin_ia32_pmovmskb128, IX86_BUILTIN_PMOVMSKB128, UNKNOWN, (int) INT_FTYPE_V16QI }, {
C++ PATCH for c++/41449 (EH cleanup of partially-aggregate-initialized objects)
The C++ standard says that if an exception is thrown during initialization of a class, any fully-constructed subobjects are destroyed. We already handled that properly for objects initialized via constructor, but we weren't handling it properly for aggregate initialization. This patch adds the necessary EH cleanups during initialization; conveniently, just using push_eh_cleanup works here because we were already doing push/pop_stmt_list around the initialization as a whole. Tested x86_64-pc-linux-gnu, applying to trunk. commit 7d32708956095f3ddb6698fee9fa092f649d72d4 Author: Jason Merrill ja...@redhat.com Date: Thu Oct 20 15:00:49 2011 -0400 PR c++/41449 * typeck2.c (split_nonconstant_init_1): Handle EH cleanup of initialized subobjects. diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c index 3accab6..580f669 100644 --- a/gcc/cp/typeck2.c +++ b/gcc/cp/typeck2.c @@ -567,6 +567,13 @@ split_nonconstant_init_1 (tree dest, tree init) code = build2 (INIT_EXPR, inner_type, sub, value); code = build_stmt (input_location, EXPR_STMT, code); add_stmt (code); + if (!TYPE_HAS_TRIVIAL_DESTRUCTOR (inner_type)) + { + code = (build_special_member_call + (sub, complete_dtor_identifier, NULL, inner_type, + LOOKUP_NORMAL, tf_warning_or_error)); + finish_eh_cleanup (code); + } num_split_elts++; } diff --git a/gcc/testsuite/g++.dg/eh/partial1.C b/gcc/testsuite/g++.dg/eh/partial1.C new file mode 100644 index 000..db73177 --- /dev/null +++ b/gcc/testsuite/g++.dg/eh/partial1.C @@ -0,0 +1,37 @@ +// PR c++/41449 +// { dg-do run } + +struct A +{ + A() {} + A(const A&) { throw 1; } +}; + +int bs; +struct B +{ + B() { ++bs; } + B(const B&) { ++bs; } + ~B() { --bs; } +}; + +struct C +{ + B b1; + A a; + B b2; +}; + +int main() +{ + { +B b1, b2; +A a; + +try { + C c = { b1, a, b2 }; +} catch (...) {} + } + if (bs != 0) +__builtin_abort (); +}
Re: [PATCH PR50572] Tune loop alignment for Atom
On Thu, Oct 20, 2011 at 1:05 AM, Sergey Ostanevich sergos@gmail.com wrote: Please provide a patch which can be applied. Cut/paste doesn't create a working patch. Please attach it. -- H.J. Will that work? Sergos. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 6c73404..e21cf86 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2011-10-20 Sergey Ostanevich sergos@gmail.com + + * config/i386/i386.c (processor_target_table): Change Atom + align_loops_max_skip to 15. + 2011-10-17 Michael Spertus mike_sper...@symantec.com * gcc/c-family/c-common.c (c_common_reswords): Add __bases, diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2c53423..8c60086 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2596,7 +2596,7 @@ static const struct ptt processor_target_table[PROCESSOR_max] = {bdver1_cost, 32, 24, 32, 7, 32}, {bdver2_cost, 32, 24, 32, 7, 32}, {btver1_cost, 32, 24, 32, 7, 32}, - {atom_cost, 16, 7, 16, 7, 16} + {atom_cost, 16, 15, 16, 7, 16} }; static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = No, it doesn't work. I had to apply it by hand for you. 1. You have to use an attachment for patches to be applied by other people when you are sending them in gmail. 2. Please don't use diff on ChangeLog unless you are the only person who changes it. Such a patch rarely applies. 3. You should add PR target/50572 in ChangeLog entry. -- H.J.
Re: new patches using -fopt-info (issue5294043)
These warnings/notes are caused by inconsistent profile due to data race (which is currently common in multi-thread programs), I never quite understood why the gcov counters are not simply marked __thread. This would make the profiled programs faster too because they wouldn't bounce cache lines that much. Especially on larger systems (2S) frequent cache line bouncing can lead to extreme slowdowns, and even on smaller systems it's very expensive. This would also eliminate data races, except for signals and somesuch. -Andi
Re: [PATCH] New port for TILEPro and TILE-Gx 3/7: gcc port
On Thu, 20 Oct 2011, Walter Lee wrote: +#undef MCOUNT_NAME +#define MCOUNT_NAME mcount For a new target it seems much better to define your ABI to use a name in the reserved namespace for this - that is, starting with two underscores. I've changed it to use _mcount with one underscore. That seems to be what glibc supports by default, and it's consistent with x86, and we'd prefer to be consistent with x86 whenever possible. x86 also has a newer version __fentry__ with -mfentry. ARM has mcount and __gnu_mcount_nc. I don't think consistency with the old x86 _mcount is particularly desirable. +/* For __clear_cache in libgcc2.c. */ +#ifdef IN_LIBGCC2 + +#include <arch/icache.h> Where does this header come from? Linux kernel, glibc, somewhere else? In general you want to condition header includes on inhibit_libc to facilitate bootstrapping (including building a partial static libgcc) before the libc headers are installed, since configuring glibc to install its headers requires a working compiler to run configure tests. We plan to include this as part of the Linux kernel, as the kernel itself depends on it. So make headers_install for your architectures will install this header under that name? -- Joseph S. Myers jos...@codesourcery.com
Re: new patches using -fopt-info (issue5294043)
On Thu, Oct 20, 2011 at 12:53 PM, Andi Kleen a...@firstfloor.org wrote: These warnings/notes are caused by inconsistent profile due to data race (which is currently common in multi-thread programs), I never quite understood why the gcov counters are not simply marked __thread. This would make the profiled programs faster too because they wouldn't bounce cache lines that much. Especially on larger systems (2S) frequent cache line bouncing can lead to extreme slowdowns, and even on smaller systems it's very expensive. This would also eliminate data races, except for signals and somesuch. It uses stack space and for programs with hundreds and thousands of threads, it can be a big problem. David -Andi
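To make the trade-off in this exchange concrete, here is a minimal sketch in C of per-thread counters that are folded into shared totals. It is not how libgcov is implemented: the names count_edge, merge_local_counters, merged_counters and N_COUNTERS are made up for illustration, and the locking and thread-exit hook that a real implementation would need are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

#define N_COUNTERS 4  /* hypothetical number of instrumented edges */

/* Shared totals; only written when a thread merges its private counts. */
static long merged_counters[N_COUNTERS];

/* One private copy per thread, so increments never touch a shared cache line. */
static __thread long local_counters[N_COUNTERS];

/* Hot path: called from instrumented code, no synchronisation required. */
static void count_edge(size_t idx)
{
    assert(idx < N_COUNTERS);
    local_counters[idx]++;
}

/* Cold path: fold the calling thread's counts into the shared totals.
   A real implementation would have to serialise this step (for example
   with a mutex) and arrange for it to run at thread exit; both are
   omitted from this sketch. */
static void merge_local_counters(void)
{
    for (size_t i = 0; i < N_COUNTERS; i++) {
        merged_counters[i] += local_counters[i];
        local_counters[i] = 0;
    }
}
```

Every thread that runs instrumented code gets its own copy of local_counters, so the hot path avoids cache line bouncing, but memory use grows with the number of threads, which is the cost David points out.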
Use .exe suffix on LTO test executables
As I noted in http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00905.html, test executables with no .exe or .something suffix are problematic for testing on Windows targets. This patch fixes the LTO tests to use such suffixes, like other tests. Tested with cross to i686-mingw32. OK to commit? 2011-10-20 Joseph Myers jos...@codesourcery.com * lib/lto.exp (lto-execute): Use .exe suffix for test executable names. Index: gcc/testsuite/lib/lto.exp === --- gcc/testsuite/lib/lto.exp (revision 180200) +++ gcc/testsuite/lib/lto.exp (working copy) @@ -500,7 +500,7 @@ verbose Testing $testcase, $option # There's a unique name for each executable we generate. - set execname ${execbase}-${count}1 + set execname ${execbase}-${count}1.exe incr count file_on_host delete $execname -- Joseph S. Myers jos...@codesourcery.com
Re: Use .exe suffix on LTO test executables
On Thu, Oct 20, 2011 at 16:23, Joseph S. Myers jos...@codesourcery.com wrote: 2011-10-20 Joseph Myers jos...@codesourcery.com * lib/lto.exp (lto-execute): Use .exe suffix for test executable names. OK. Diego.
Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
On Thu, Oct 20, 2011 at 1:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: OK. Thanks, Uros. Great, could anybody please commit that? I checked it in for you. -- H.J.
[cxx-mem-model] expand_atomic_load: Handle an empty target
Found this while testing the branch on ia64. The call to expand_val_compare_and_swap() above returns NULL_RTX when it can't find a suitable instruction. OK for branch? * optabs.c (expand_atomic_load): Handle a NULL target. Index: optabs.c === --- optabs.c(revision 180273) +++ optabs.c(working copy) @@ -7140,7 +7140,7 @@ expand_atomic_load (rtx target, rtx mem, return target; } - if (target == const0_rtx) + if (!target || target == const0_rtx) target = gen_reg_rtx (mode); /* Emit the appropriate barrier before the load. */
Re: [cxx-mem-model] expand_atomic_load: Handle an empty target
On 10/20/2011 04:54 PM, Aldy Hernandez wrote: Found this while testing the branch on ia64. The call to expand_val_compare_and_swap() above returns NULL_RTX when it can't find a suitable instruction. OK for branch? yes. btw, did you audit the other new expand routines to see if they handled a NULL return target as well? Andrew
Fix gcc.dg/lto/pr46940_0.c for assembler name prefixes
gcc.dg/lto/pr46940_0.c needs fixing for targets using a prefix on assembler names, similar to the fixes recently made by Joern to some other testcases. This patch fixes it in the same way as Joern fixed gcc.dg/lto/20081222_1.c. Tested with cross to i686-mingw32. OK to commit? 2011-10-20 Joseph Myers jos...@codesourcery.com * gcc.dg/lto/pr46940_0.c (ASMNAME, ASMNAME2, STRING): Define. (_moz_foo, EXT__foo): Use ASMNAME. Index: gcc/testsuite/gcc.dg/lto/pr46940_0.c === --- gcc/testsuite/gcc.dg/lto/pr46940_0.c(revision 180200) +++ gcc/testsuite/gcc.dg/lto/pr46940_0.c(working copy) @@ -2,10 +2,14 @@ /* { dg-extra-ld-options -fuse-linker-plugin } */ #include stdio.h +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) STRING (prefix) cname +#define STRING(x)#x + extern __attribute__((visibility(hidden))) void _moz_foo (void); -extern __typeof (_moz_foo) _moz_foo __asm__ ( INT__foo) __attribute__((__visibility__(hidden))) ; +extern __typeof (_moz_foo) _moz_foo __asm__ (ASMNAME (INT__foo)) __attribute__((__visibility__(hidden))) ; void _moz_foo(void) { printf (blah\n); } -extern __typeof (_moz_foo) EXT__foo __asm__( _moz_foo) __attribute__((__alias__( INT__foo))); +extern __typeof (_moz_foo) EXT__foo __asm__(ASMNAME (_moz_foo)) __attribute__((__alias__( INT__foo))); -- Joseph S. Myers jos...@codesourcery.com
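The macro trick in this patch can be tried in isolation. The sketch below assumes the string literals that the archive appears to have stripped from the diff (so ASMNAME ("INT__foo") rather than ASMNAME (INT__foo)); int_foo_asm_name and ends_with are hypothetical helpers added only for the demonstration and are not part of the testcase.

```c
#include <assert.h>
#include <string.h>

/* The three helper macros from the patch.  The extra ASMNAME2 level makes
   sure __USER_LABEL_PREFIX__ is macro-expanded before STRING turns it into
   a string literal; the result is then pasted in front of cname by
   ordinary string-literal concatenation. */
#define ASMNAME(cname)  ASMNAME2 (__USER_LABEL_PREFIX__, cname)
#define ASMNAME2(prefix, cname) STRING (prefix) cname
#define STRING(x) #x

/* Hypothetical helper: the assembler-level spelling that ASMNAME yields
   for the C-level name "INT__foo" on the current target. */
static const char *int_foo_asm_name(void)
{
    return ASMNAME ("INT__foo");
}

/* Hypothetical helper: returns 1 when NAME ends with SUFFIX, else 0. */
static int ends_with(const char *name, const char *suffix)
{
    assert(name != NULL && suffix != NULL);
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(suffix);
    if (suffix_len > name_len)
        return 0;
    return strcmp(name + (name_len - suffix_len), suffix) == 0;
}
```

On targets where __USER_LABEL_PREFIX__ is empty the result is just INT__foo; on targets that prefix user labels with an underscore it should come out as _INT__foo, which is the name the asm label needs there.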
[cxx-mem-model] Handle x86-64 with -m32
These operations don't exist on x86-32 bits, and when running multilibed tests, the target is still x86_64-unknown-linux-gnu but the target is 32-bits when using -m32. The following change checks that we are actually running in 64-bits before assuming sync_int_128 or sync_long_long exist on the target. OK for branch? * lib/target-supports.exp (check_effective_target_sync_int_128): Only set when running in 64-bit mode. (check_effective_target_sync_long_long): Same. Index: lib/target-supports.exp === --- lib/target-supports.exp (revision 180156) +++ lib/target-supports.exp (working copy) @@ -3456,7 +3456,7 @@ proc check_effective_target_sync_int_128 verbose check_effective_target_sync_int_128: using cached result 2 } else { set et_sync_int_128_saved 0 -if { [istarget x86_64-*-*] } { +if { [istarget x86_64-*-*] && [is-effective-target lp64] } { set et_sync_int_128_saved 1 } } @@ -3474,7 +3474,7 @@ proc check_effective_target_sync_long_lo verbose check_effective_target_sync_long_long: using cached result 2 } else { set et_sync_long_long_saved 0 -if { [istarget x86_64-*-*] } { +if { [istarget x86_64-*-*] && [is-effective-target lp64] } { set et_sync_long_long_saved 1 } }
Rename builtin_vec_perm_ok to vec_perm_const_ok
... since it no longer applies to a builtin. Tested on x86_64-linux. r~ * target.def (vec_perm_const_ok): Rename from builtin_vec_perm_ok. * optabs.c (can_vec_perm_expr_p): Update to match. (expand_vec_perm_expr): Likewise. * config/i386/i386.c (TARGET_VECTORIZE_VEC_PERM_CONST_OK): Rename from TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK. * doc/tm.texi.in: Likewise. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 7750356..b7718e9 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -36446,7 +36446,7 @@ ix86_expand_vec_perm_const (rtx operands[4]) return false; } -/* Implement targetm.vectorize.builtin_vec_perm_ok. */ +/* Implement targetm.vectorize.vec_perm_const_ok. */ static bool ix86_vectorize_vec_perm_const_ok (tree vec_type, tree mask) @@ -37879,8 +37879,8 @@ ix86_autovectorize_vector_sizes (void) #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \ ix86_builtin_vectorization_cost -#undef TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK -#define TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK \ +#undef TARGET_VECTORIZE_VEC_PERM_CONST_OK +#define TARGET_VECTORIZE_VEC_PERM_CONST_OK \ ix86_vectorize_vec_perm_const_ok #undef TARGET_VECTORIZE_PREFERRED_SIMD_MODE #define TARGET_VECTORIZE_PREFERRED_SIMD_MODE \ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index c52753a..a43ce3d 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5711,7 +5711,7 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn -@deftypefn {Target Hook} bool TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK (tree @var{vec_type}, tree @var{mask}) +@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST_OK (tree @var{vec_type}, tree @var{mask}) Return true if a vector created for @code{vec_perm_const} is valid. 
@end deftypefn diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 22e82ee..cede91e 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -5649,7 +5649,7 @@ misalignment value (@var{misalign}). Return true if vector alignment is reachable (by peeling N iterations) for the given type. @end deftypefn -@hook TARGET_VECTORIZE_BUILTIN_VEC_PERM_OK +@hook TARGET_VECTORIZE_VEC_PERM_CONST_OK Return true if a vector created for @code{vec_perm_const} is valid. @end deftypefn diff --git a/gcc/optabs.c b/gcc/optabs.c index 558c0fa..5036856 100644 --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -6714,7 +6714,7 @@ can_vec_perm_expr_p (tree type, tree sel) if (sel == NULL || TREE_CODE (sel) == VECTOR_CST) { if (direct_optab_handler (vec_perm_const_optab, mode) != CODE_FOR_nothing - (sel == NULL || targetm.vectorize.builtin_vec_perm_ok (type, sel))) + (sel == NULL || targetm.vectorize.vec_perm_const_ok (type, sel))) return true; } @@ -6808,7 +6808,7 @@ expand_vec_perm_expr (tree type, tree v0, tree v1, tree sel, rtx target) { icode = direct_optab_handler (vec_perm_const_optab, mode); if (icode != CODE_FOR_nothing - targetm.vectorize.builtin_vec_perm_ok (TREE_TYPE (v0), sel) + targetm.vectorize.vec_perm_const_ok (TREE_TYPE (v0), sel) (tmp = expand_vec_perm_expr_1 (icode, target, v0_rtx, v1_rtx, sel_rtx)) != NULL) return tmp; diff --git a/gcc/target.def b/gcc/target.def index c9d6067..60fad2a 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -985,9 +985,9 @@ DEFHOOK bool, (const_tree type, bool is_packed), default_builtin_vector_alignment_reachable) -/* Return true if a vector created for builtin_vec_perm is valid. */ +/* Return true if a vector created for vec_perm_const is valid. */ DEFHOOK -(builtin_vec_perm_ok, +(vec_perm_const_ok, , bool, (tree vec_type, tree mask), hook_bool_tree_tree_true)
Re: [cxx-mem-model] expand_atomic_load: Handle an empty target
On 10/20/11 15:56, Andrew MacLeod wrote: On 10/20/2011 04:54 PM, Aldy Hernandez wrote: Found this while testing the branch on ia64. The call to expand_val_compare_and_swap() above returns NULL_RTX when it can't find a suitable instruction. OK for branch? yes. btw, did you audit the other new expand routines to see if they handled a NULL return target as well? Andrew They seem ok, but I am re-running tests on ia64 to see if I find other similar failures. If I do, I will submit fixes. Aldy
Re: Fix gcc.dg/lto/pr46940_0.c for assembler name prefixes
On Thu, Oct 20, 2011 at 17:01, Joseph S. Myers jos...@codesourcery.com wrote: gcc.dg/lto/pr46940_0.c needs fixing for targets using a prefix on assembler names, similar to the fixes recently made by Joern to some other testcases. This patch fixes it in the same way as Joern fixed gcc.dg/lto/20081222_1.c. Tested with cross to i686-mingw32. OK to commit? 2011-10-20 Joseph Myers jos...@codesourcery.com * gcc.dg/lto/pr46940_0.c (ASMNAME, ASMNAME2, STRING): Define. (_moz_foo, EXT__foo): Use ASMNAME. OK. Diego.
Re: [patch, testsuite] Patch for gcc.dg/pr49994-3.c on HP-UX
On Thu, 2011-10-20 at 18:23 +0200, Rainer Orth wrote: Steve Ellcey s...@cup.hp.com writes: Index: gcc.dg/pr49994-3.c === --- gcc.dg/pr49994-3.c (revision 180233) +++ gcc.dg/pr49994-3.c (working copy) @@ -2,6 +2,7 @@ /* { dg-options -O2 -fsched2-use-superblocks -g } */ /* { dg-options -O2 -fsched2-use-superblocks -g -mbackchain { target s390*-*-* } } */ /* { dg-require-effective-target scheduling } */ +/* { dg-skip-if { *-*-hpux* } { * } { } } */ Would you please include either an explanation or a PR reference in the dg-skip-if? Having to search the archives for an explanation is tedious. Btw., you should be able to omit both the "*" and "". Thanks. Rainer I put PR testsuite/50722 in the comment section and removed the "*" and "" after verifying that it works, and then checked in the change. Steve Ellcey s...@cup.hp.com
[RFC PATCH] SLP vectorize calls
Hi! While looking at *.vect dumps from Polyhedron, I've noticed the lack of SLP vectorization of builtin calls. This patch is an attempt to handle at least 1 and 2 operand builtin calls (SLP doesn't handle ternary stmts either yet), where all the types are the same. E.g. it can handle extern float copysignf (float, float); extern float sqrtf (float); float a[8], b[8], c[8], d[8]; void foo (void) { a[0] = copysignf (b[0], c[0]) + 1.0f + sqrtf (d[0]); a[1] = copysignf (b[1], c[1]) + 2.0f + sqrtf (d[1]); a[2] = copysignf (b[2], c[2]) + 3.0f + sqrtf (d[2]); a[3] = copysignf (b[3], c[3]) + 4.0f + sqrtf (d[3]); a[4] = copysignf (b[4], c[4]) + 5.0f + sqrtf (d[4]); a[5] = copysignf (b[5], c[5]) + 6.0f + sqrtf (d[5]); a[6] = copysignf (b[6], c[6]) + 7.0f + sqrtf (d[6]); a[7] = copysignf (b[7], c[7]) + 8.0f + sqrtf (d[7]); } and compile it into: vmovaps .LC0(%rip), %ymm0 vandnps b(%rip), %ymm0, %ymm1 vandps c(%rip), %ymm0, %ymm0 vorps %ymm0, %ymm1, %ymm0 vsqrtps d(%rip), %ymm1 vaddps %ymm1, %ymm0, %ymm0 vaddps .LC1(%rip), %ymm0, %ymm0 vmovaps %ymm0, a(%rip) I've bootstrapped/regtested it on x86_64-linux and i686-linux, but am not 100% sure about all the changes, e.g. that || PURE_SLP_STMT (stmt_info) part. 2011-10-20 Jakub Jelinek ja...@redhat.com * tree-vect-stmts.c (vectorizable_call): Add SLP_NODE argument. Handle vectorization of SLP calls. (vect_analyze_stmt): Adjust caller, add call to it for SLP too. (vect_transform_stmt): Adjust vectorizable_call caller, remove assertion. * tree-vect-slp.c (vect_get_and_check_slp_defs): Handle one and two argument calls too. (vect_build_slp_tree): Allow CALL_EXPR. (vect_get_slp_defs): Handle calls. --- gcc/tree-vect-stmts.c.jj2011-10-20 14:13:34.0 +0200 +++ gcc/tree-vect-stmts.c 2011-10-20 18:02:43.0 +0200 @@ -1483,7 +1483,8 @@ vectorizable_function (gimple call, tree Return FALSE if not a vectorizable STMT, TRUE otherwise. 
*/ static bool -vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt) +vectorizable_call (gimple stmt, gimple_stmt_iterator *gsi, gimple *vec_stmt, + slp_tree slp_node) { tree vec_dest; tree scalar_dest; @@ -1494,6 +1495,7 @@ vectorizable_call (gimple stmt, gimple_s int nunits_in; int nunits_out; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info); tree fndecl, new_temp, def, rhs_type; gimple def_stmt; enum vect_def_type dt[3] @@ -1505,19 +1507,12 @@ vectorizable_call (gimple stmt, gimple_s size_t i, nargs; tree lhs; - /* FORNOW: unsupported in basic block SLP. */ - gcc_assert (loop_vinfo); - - if (!STMT_VINFO_RELEVANT_P (stmt_info)) + if (!STMT_VINFO_RELEVANT_P (stmt_info) !bb_vinfo) return false; if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_internal_def) return false; - /* FORNOW: SLP not supported. */ - if (STMT_SLP_TYPE (stmt_info)) -return false; - /* Is STMT a vectorizable call? */ if (!is_gimple_call (stmt)) return false; @@ -1558,7 +1553,7 @@ vectorizable_call (gimple stmt, gimple_s if (!rhs_type) rhs_type = TREE_TYPE (op); - if (!vect_is_simple_use_1 (op, loop_vinfo, NULL, + if (!vect_is_simple_use_1 (op, loop_vinfo, bb_vinfo, def_stmt, def, dt[i], opvectype)) { if (vect_print_dump_info (REPORT_DETAILS)) @@ -1620,7 +1615,13 @@ vectorizable_call (gimple stmt, gimple_s gcc_assert (!gimple_vuse (stmt)); - if (modifier == NARROW) + if (slp_node || PURE_SLP_STMT (stmt_info)) +{ + if (modifier != NONE) + return false; + ncopies = 1; +} + else if (modifier == NARROW) ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out; else ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in; @@ -1659,6 +1660,43 @@ vectorizable_call (gimple stmt, gimple_s else VEC_truncate (tree, vargs, 0); + if (slp_node) + { + VEC(tree,heap) *vec_oprnds0 = NULL, *vec_oprnds1 = NULL; + + gcc_assert (j == 0); + if (nargs == 1) + vect_get_vec_defs (gimple_call_arg (stmt, 0), NULL_TREE, stmt, + 
vec_oprnds0, vec_oprnds1, slp_node); + else if (nargs == 2) + vect_get_vec_defs (gimple_call_arg (stmt, 0), + gimple_call_arg (stmt, 1), stmt, + vec_oprnds0, vec_oprnds1, slp_node); + else + gcc_unreachable (); + + /* Arguments are ready. Create the new vector stmt. */ + FOR_EACH_VEC_ELT (tree, vec_oprnds0, i, vec_oprnd0) + { + vec_oprnd1 = nargs == 2 ? VEC_index (tree, vec_oprnds1, i) +
Re: Breakage with Update testsuite to run with slim LTO
Date: Tue, 27 Sep 2011 19:23:22 +0200 From: Jan Hubicka hubi...@ucw.cz this patch updates testsuite to cover both fat and slim LTO when linker plugin is used and also both linker plugin and collect2 paths. I didn't want to slow down testing too much so I just distributed the flags across existing runs with aim to maximize the coverage of testing matrix that is a bit large now. I believe it is sufficient and testsuite now runs a bit faster than previously since slim LTO saves some effort. sync and pr34850 tests don't pass with slim LTO. The reason is that they expect diagnostics that are output too late in compilation (usually at expansion time). These should be probably fixed as QOI issue but they are not a real bug - the diagnostics will be output at linktime. I will open PR tracking this. We probably should output pretty much everything till end of early opts except for stuff that really looks for optimization results. Especially now when we handle always inline in early inlining. Honza * lib/lto.exp: When linker plugin is available test both plugin/non-plugin LTO paths as well as fat and slim LTO. lib/c-torture.exp: Likewise. lib/gcc-dg.exp: Likewise. Looks like this patch broke, for cris-elf with TOT binutils: Running /tmp/hpautotest-gcc1/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ... FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler in2-asm: .mof FAIL: gcc.dg/torture/cris-asm-mof-1.c scan-assembler out2-asm: .mof which for -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects don't produce any code. Is that expected? If so, and if the required update is as for the test-cases you updated, to add: + /* { dg-options -ffat-lto-objects } */ Yes, if we scan assembler, we likely want -fno-fat-lto-objects. then IIUC you need to patch *all* torture tests that use scan-assembler and scan-assembler-not. 
Alternatively, patch somewhere else, like not passing it if certain directives are used, like scan-assembler{,-not}. And either way, is it safe to add that option always, not just when also passing -flto or something? Hmm, some of the assembler scans still work because they check for presence of symbols we output anyway, but indeed, it would make more sense to automatically imply -ffat-lto-objects when scan-assembler is used. I am not sure if my dejagnu skill is on par here however. Honza brgds, H-P
Re: [cxx-mem-model] Handle x86-64 with -m32
On Thu, 20 Oct 2011, Aldy Hernandez wrote: These operations don't exist on x86-32 bits, and when running multilibed tests, the target is still x86_64-unknown-linux-gnu but the target is 32-bits when using -m32. Any test that only handles one of x86_64-* and i?86-* is automatically wrong; you can use -m64 with i?86-* targets. You always need to handle both together. Do these operations exist for x32 as well as for -m64? If they do, then lp64 isn't the right test either; if not, then it is. -- Joseph S. Myers jos...@codesourcery.com
Re: [cxx-mem-model] Handle x86-64 with -m32
On Thu, Oct 20, 2011 at 3:38 PM, Joseph S. Myers jos...@codesourcery.com wrote: On Thu, 20 Oct 2011, Aldy Hernandez wrote: These operations don't exist on x86-32 bits, and when running multilibed tests, the target is still x86_64-unknown-linux-gnu but the target is 32-bits when using -m32. Any test that only handles one of x86_64-* and i?86-* is automatically wrong; you can use -m64 with i?86-* targets. You always need to handle both together. Do these operations exist for x32 as well as for -m64? If they do, then lp64 isn't the right test either; if not, then it is. X32 has native int64 and int128. -- H.J.
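The distinction Joseph and H.J. draw here, instruction set versus data model, can also be seen from the compiler side. The following is only a sketch in C built on GCC's predefined macros __x86_64__, __LP64__ and __SIZEOF_INT128__; it is not what target-supports.exp does, and all function names are hypothetical.

```c
#include <assert.h>

/* Returns 1 when compiling for the x86-64 instruction set, regardless of
   pointer size (the macro is defined for -m64 and for x32 alike). */
static int targets_x86_64_isa(void)
{
#if defined(__x86_64__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when long and pointers are 64 bits wide (LP64 data model). */
static int is_lp64(void)
{
#if defined(__LP64__)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 when the compiler advertises a 128-bit integer type. */
static int has_int128(void)
{
#if defined(__SIZEOF_INT128__)
    return 1;
#else
    return 0;
#endif
}

/* Keys the decision on the instruction set and the available types rather
   than on the LP64 data model alone. */
static int wide_sync_types_plausible(void)
{
    return targets_x86_64_isa() && has_int128();
}

/* Sanity check: an LP64 target must really have an 8-byte long. */
static void sanity_check_data_model(void)
{
    assert(!is_lp64() || sizeof(long) == 8);
}
```

A check keyed on is_lp64 alone would reject x32, which uses 32-bit pointers even though, per H.J.'s reply, it has native int64 and int128.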
[v3] libstdc++/50196 - enable std::thread, std::mutex etc. on darwin
This patch should enable macosx support for thread and partial support for mutex, by defining _GLIBCXX_HAS_GTHREADS on POSIX systems without the _POSIX_TIMEOUTS option, and only disabling the types which rely on the Timeouts option, std::timed_mutex and std::recursive_timed_mutex, instead of disabling all thread support. Paolo, Jakub, I'd appreciate it if you two could check this over, as you were responsible for some of this autoconf stuff, via http://gcc.gnu.org/PR49745 I've only tested this on x86-64-linux where everything is supported anyway, but I did tweak the configure test for _POSIX_TIMEOUTS to fail, so I could check the tests were correctly disabled when _GTHREADS_HAS_MUTEX_TIMEDLOCK is zero. If anyone can test this on darwin I'd be very grateful (you'll need to run autoreconf in the libstdc++-v3 directory to regenerate configure and config.h.in, or contact me and I'll mail you the regenerated files) ChangeLog: * acinclude.m4 (GTHREADS_HAS_MUTEX_TIMEDLOCK): Don't depend on _POSIX_TIMEOUTS. * configure: Regenerate. * config.h.in: Regenerate. * include/std/mutex (timed_mutex, recursive_timed_mutex): Define conditionally on GTHREADS_HAS_MUTEX_TIMEDLOCK. * testsuite/lib/libstdc++.exp (check_v3_target_gthreads_timed): Define. * testsuite/lib/dg-options.exp (dg-require-gthreads-timed): Define. * testsuite/30_threads/recursive_timed_mutex/dest/destructor_locked.cc: Use dg-require-gthreads-timed instead of dg-require-gthreads. * testsuite/30_threads/recursive_timed_mutex/native_handle/ typesizes.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/native_handle/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock_until/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock_until/2.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/cons/assign_neg.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/cons/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/cons/copy_neg.cc: Likewise. 
* testsuite/30_threads/recursive_timed_mutex/requirements/typedefs.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock/2.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/lock/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/lock/2.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/unlock/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock_for/1.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock_for/2.cc: Likewise. * testsuite/30_threads/recursive_timed_mutex/try_lock_for/3.cc: Likewise. * testsuite/30_threads/timed_mutex/dest/destructor_locked.cc: Likewise. * testsuite/30_threads/timed_mutex/native_handle/typesizes.cc: Likewise. * testsuite/30_threads/timed_mutex/native_handle/1.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock_until/1.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock_until/2.cc: Likewise. * testsuite/30_threads/timed_mutex/cons/assign_neg.cc: Likewise. * testsuite/30_threads/timed_mutex/cons/1.cc: Likewise. * testsuite/30_threads/timed_mutex/cons/copy_neg.cc: Likewise. * testsuite/30_threads/timed_mutex/requirements/standard_layout.cc: Likewise. * testsuite/30_threads/timed_mutex/requirements/typedefs.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock/1.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock/2.cc: Likewise. * testsuite/30_threads/timed_mutex/lock/1.cc: Likewise. * testsuite/30_threads/timed_mutex/unlock/1.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock_for/1.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock_for/2.cc: Likewise. * testsuite/30_threads/timed_mutex/try_lock_for/3.cc: Likewise. 
Index: acinclude.m4 === --- acinclude.m4 (revision 180278) +++ acinclude.m4 (working copy) @@ -3358,7 +3358,7 @@ ac_save_CXXFLAGS=$CXXFLAGS CXXFLAGS=$CXXFLAGS -fno-exceptions -I${toplevel_srcdir}/gcc - AC_MSG_CHECKING([check whether it can be safely assumed that mutex_timedlock is available]) + AC_MSG_CHECKING([whether it can be safely assumed that mutex_timedlock is available]) AC_TRY_COMPILE([#include unistd.h], [ @@ -3382,20 +3382,11 @@ AC_MSG_CHECKING([for gthreads library]) - AC_TRY_COMPILE([ - #include gthr.h - #include unistd.h - ], + AC_TRY_COMPILE([#include gthr.h], [ #ifndef __GTHREADS_CXX0X #error #endif - - // In case of POSIX threads check _POSIX_TIMEOUTS
[PATCH] Add gcc-ar/nm/ranlib wrappers for slim LTO v2
From: Andi Kleen a...@linux.intel.com

Slim LTO requires running ar/nm/ranlib with the LTO plugin. The most convenient way to get this into existing Makefiles is using small wrappers that pass the plugin. This matches how other compilers (LLVM, icc) do this too.

My previous attempt at using shell scripts for this
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg02471.html
was not approved. Here's another attempt using wrappers written in C. The wrappers add a --plugin argument before calling the respective binutils utilities.

The logic gcc.c uses to find the files is very complicated. I didn't try to replicate it 100% and left out some magic. I would be interested to hear whether this simple method works for everyone or whether more code needs to be added. This only needs to support hosts that support LTO, of course.

I didn't add any documentation because the syntax is exactly the same as the native ar/ranlib/nm.

v2: Address review comments. Makefile follows go now, use own binaries for each sub program.

Passed bootstrap and test suite on x86_64-linux.

gcc/:

2011-10-19  Andi Kleen  a...@linux.intel.com

	* Makefile.in (MOSTLYCLEANFILES): Add gcc-ar/nm/ranlib.
	(native): Add gcc-ar, gcc-nm, gcc-ranlib.
	(AR_LIBS, gcc-ar, gcc-ar.o, gcc-ranlib, gcc-ranlib.o, gcc-nm,
	gcc-nm.o, gcc-ranlib.c, gcc-nm.c): Add.
	(install): Depend on install-gcc-ar.
	(install-gcc-ar): Add.
	(uninstall): Uninstall gcc-ar, gcc-nm, gcc-ranlib.
	* gcc-ar.c: Add new file.
---
 gcc/Makefile.in |   71 +++--
 gcc/gcc-ar.c    |   96 +++
 2 files changed, 164 insertions(+), 3 deletions(-)
 create mode 100644 gcc/gcc-ar.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6b28ef5..1b9987a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1545,7 +1545,8 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
  genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \
  xgcc$(exeext) cpp$(exeext) cc1$(exeext) $(EXTRA_PASSES) \
  $(EXTRA_PARTS) $(EXTRA_PROGRAMS) gcc-cross$(exeext) \
- $(SPECS) collect2$(exeext) lto-wrapper$(exeext) \
+ $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \
+ gcc-ranlib$(exeext) \
  gcov-iov$(build_exeext) gcov$(exeext) gcov-dump$(exeext) \
  gengtype$(exeext) *.[0-9][0-9].* *.[si] *-checksum.c libbackend.a \
  libcommon-target.a libcommon.a libgcc.mk
@@ -1791,7 +1792,8 @@ rest.encap: lang.rest.encap
 # This is what is made with the host's compiler
 # whether making a cross compiler or not.
 native: config.status auto-host.h build-@POSUB@ $(LANGUAGES) \
-	$(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext)
+	$(EXTRA_PASSES) $(EXTRA_PROGRAMS) $(COLLECT2) lto-wrapper$(exeext) \
+	gcc-ar$(exeext) gcc-nm$(exeext) gcc-ranlib$(exeext)
 
 ifeq ($(enable_plugin),yes)
 native: gengtype$(exeext)
@@ -2049,6 +2051,46 @@ sbitmap.o: sbitmap.c sbitmap.h $(CONFIG_H) $(SYSTEM_H) coretypes.h $(BASIC_BLOCK
 ebitmap.o: ebitmap.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(EBITMAP_H)
 sparseset.o: sparseset.c $(SYSTEM_H) sparseset.h $(CONFIG_H)
 
+AR_LIBS = @COLLECT2_LIBS@
+
+gcc-ar$(exeext): gcc-ar.o $(LIBDEPS)
+	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ar.o -o $@ \
+		$(LIBS) $(AR_LIBS)
+
+gcc-nm$(exeext): gcc-nm.o $(LIBDEPS)
+	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-nm.o -o $@ \
+		$(LIBS) $(AR_LIBS)
+
+gcc-ranlib$(exeext): gcc-ranlib.o $(LIBDEPS)
+	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) gcc-ranlib.o -o $@ \
+		$(LIBS) $(AR_LIBS)
+
+CFLAGS-gcc-ar.o += $(DRIVER_DEFINES) \
+	-DTARGET_MACHINE=\"$(target_noncanonical)\" \
+	@TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ar\"
+
+gcc-ar.o: gcc-ar.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+CFLAGS-gcc-ranlib.o += $(DRIVER_DEFINES) \
+	-DTARGET_MACHINE=\"$(target_noncanonical)\" \
+	@TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"ranlib\"
+
+gcc-ranlib.o: gcc-ranlib.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+CFLAGS-gcc-nm.o += $(DRIVER_DEFINES) \
+	-DTARGET_MACHINE=\"$(target_noncanonical)\" \
+	@TARGET_SYSTEM_ROOT_DEFINE@ -DPERSONALITY=\"nm\"
+
+gcc-nm.o: gcc-nm.c $(CONFIG_H) $(SYSTEM_H) $(LIBIBERTY_H)
+
+# ??? the implicit rules dont trigger if the source file has a different name
+# so copy instead
+gcc-ranlib.c: gcc-ar.c
+	cp $^ $@
+
+gcc-nm.c: gcc-ar.c
+	cp $^ $@
+
 COLLECT2_OBJS = collect2.o collect2-aix.o tlink.o
 COLLECT2_LIBS = @COLLECT2_LIBS@
 collect2$(exeext): $(COLLECT2_OBJS) $(LIBDEPS)
@@ -4576,7 +4618,7 @@ maintainer-clean:
 # broken is small.
 install: install-common $(INSTALL_HEADERS) \
     install-cpp install-man install-info install-@POSUB@ \
-    install-driver install-lto-wrapper
+    install-driver install-lto-wrapper install-gcc-ar
 
 ifeq ($(enable_plugin),yes)
 install: install-plugin
@@ -4901,6 +4943,23 @@ install-collect2: collect2 installdirs
 install-lto-wrapper:
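For readers who have not seen gcc-ar.c (it is not included in this excerpt), the core idea of the wrappers, inserting a --plugin argument ahead of the user's arguments, might look roughly like the sketch below. This is only an illustration: the function name build_wrapper_argv and the way the tool name and plugin path are supplied are assumptions, and the real patch finds those paths with logic modelled on gcc.c.

```c
#include <stdlib.h>

/* Build a NULL-terminated argument vector for the real binutils tool:
     { tool, "--plugin", plugin_path, argv[1], ..., argv[argc - 1], NULL }
   TOOL and PLUGIN_PATH are hypothetical inputs here; locating them is the
   complicated part that this sketch leaves out.
   Returns NULL on allocation failure; the caller frees the result.  */
const char **
build_wrapper_argv (const char *tool, const char *plugin_path,
                    int argc, char **argv)
{
  /* User arguments without argv[0], plus tool, "--plugin", path, NULL.  */
  size_t n = (size_t) (argc > 0 ? argc - 1 : 0) + 4;
  const char **nargv = malloc (n * sizeof *nargv);
  size_t k = 0;
  int i;

  if (nargv == NULL)
    return NULL;

  nargv[k++] = tool;
  nargv[k++] = "--plugin";
  nargv[k++] = plugin_path;
  for (i = 1; i < argc; i++)
    nargv[k++] = argv[i];
  nargv[k] = NULL;
  return nargv;
}
```

A wrapper's main would then hand the resulting vector to whatever process-spawning routine it uses. Existing Makefiles only need AR=gcc-ar (and likewise for nm and ranlib), since the remaining arguments pass through unchanged.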
[C++ Patch] PR 50811 (rejects class-virt-specifier if class-head-name includes nested-name-specifier)
Tested on X86-32 linux.

2011-10-21  Ville Voutilainen  ville.voutilai...@gmail.com

	PR c++/50811
	* parser.c (cp_parser_class_head): Parse virt-specifiers
	regardless of whether an id is present

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ea0c4dc..dd2357b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -17576,8 +17576,8 @@ cp_parser_class_head (cp_parser* parser,
 	{
 	  cp_parser_check_for_invalid_template_id (parser, id,
 						   type_start_token->location);
-	  virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
 	}
+      virt_specifiers = cp_parser_virt_specifier_seq_opt (parser);
 
       /* If it's not a `:' or a `{' then we can't really be looking at a
 	 class-head, since a class-head only appears as part of a
diff --git a/gcc/testsuite/g++.dg/cpp0x/override2.C b/gcc/testsuite/g++.dg/cpp0x/override2.C
index 7f17504..0d8871d 100644
--- a/gcc/testsuite/g++.dg/cpp0x/override2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/override2.C
@@ -28,6 +28,13 @@ struct B6 final final {}; // { dg-error "duplicate virt-specifier" }
 struct B7 override {}; // { dg-error "cannot specify 'override' for a class" }
 
+namespace N
+{
+  struct C;
+}
+
+struct N::C final{};
+
 int main()
 {
   D3<B1> d;
[commit, spu] Fix vec_perm pattern (Re: [rs6000, spu] Add vec_perm named pattern)
Richard Henderson wrote:

> The generic support for vector permutation will allow for automatic
> lowering to V*QImode, so all we need to add to support for these targets
> is the single V16QI pattern that represents the base permutation insn.
>
> I'm not touching any of the other ways that the permutation insn
> could be generated. After the generic support is added, I'll leave
> it to the port maintainers to determine what they want to keep.
> I suspect in many cases using the generic __builtin_shuffle plus
> some casting in the target-specific header files would be sufficient,
> eliminating several dozen builtins.

Sorry I didn't get to this earlier, I got side-tracked by a number of independent regressions on SPU ...

Unfortunately, the semantics of vec_perm do not match 100% those of the SPU Shuffle Bytes instruction. vec_perm assumes the selector elements apply modulo 32, but shufb uses values >= 128 for special purposes. See the ISA:

  Value in Register RC
  (Expressed in Binary)    Result Byte
  10xxxxxx                 0x00
  110xxxxx                 0xFF
  111xxxxx                 0x80
  Otherwise                The byte of the concatenated register addressed
                           by the rightmost 5 bits of register RC

To implement the vec_perm semantics fully, we therefore need to reduce the selector modulo 32 explicitly before using shufb.

Tested on spu-elf, fixes various vshuf test cases.
Committed to mainline.

Bye,
Ulrich

ChangeLog:

	* config/spu/spu.md ("vec_permv16qi"): Reduce selector modulo 32
	before using the shufb instruction.

Index: gcc/config/spu/spu.md
===================================================================
*** gcc/config/spu/spu.md	(revision 180240)
--- gcc/config/spu/spu.md	(working copy)
*************** selb\t%0,%4,%0,%3
*** 4395,4410 ****
     shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
  (define_expand "vec_permv16qi"
!   [(set (match_operand:V16QI 0 "spu_reg_operand" "")
  	(unspec:V16QI
  	  [(match_operand:V16QI 1 "spu_reg_operand" "")
  	   (match_operand:V16QI 2 "spu_reg_operand" "")
! 	   (match_operand:V16QI 3 "spu_reg_operand" "")]
  	  UNSPEC_SHUFB))]
    ""
    {
!     operands[3] = gen_lowpart (TImode, operands[3]);
    })
  
  (define_insn "nop"
--- 4395,4416 ----
     shufb\t%0,%1,%2,%3"
    [(set_attr "type" "shuf")])
  
+ ; The semantics of vec_permv16qi are nearly identical to those of the SPU
+ ; shufb instruction, except that we need to reduce the selector modulo 32.
  (define_expand "vec_permv16qi"
!   [(set (match_dup 4) (and:V16QI (match_operand:V16QI 3 "spu_reg_operand" "")
! 				 (match_dup 6)))
!    (set (match_operand:V16QI 0 "spu_reg_operand" "")
  	(unspec:V16QI
  	  [(match_operand:V16QI 1 "spu_reg_operand" "")
  	   (match_operand:V16QI 2 "spu_reg_operand" "")
! 	   (match_dup 5)]
  	  UNSPEC_SHUFB))]
    ""
    {
!     operands[4] = gen_reg_rtx (V16QImode);
!     operands[5] = gen_lowpart (TImode, operands[4]);
!     operands[6] = spu_const (V16QImode, 31);
    })
  
  (define_insn "nop"

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
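To make the mismatch concrete, here is a small scalar model in plain C of the two behaviours, written from the ISA table quoted in the message. It is only a sketch: the function names are made up for illustration, and it models byte selection only, not the actual SPU instruction or the GCC expander.

```c
#include <stdint.h>

/* One result byte of shufb: A and B are the two 16-byte source registers
   (A holds bytes 0..15 of the concatenation, B holds bytes 16..31) and SEL
   is the selector byte from register RC.  */
static uint8_t
shufb_byte (const uint8_t a[16], const uint8_t b[16], uint8_t sel)
{
  if ((sel & 0xC0) == 0x80)     /* 10xxxxxx */
    return 0x00;
  if ((sel & 0xE0) == 0xC0)     /* 110xxxxx */
    return 0xFF;
  if ((sel & 0xE0) == 0xE0)     /* 111xxxxx */
    return 0x80;
  sel &= 0x1F;                  /* otherwise: rightmost 5 bits */
  return sel < 16 ? a[sel] : b[sel - 16];
}

/* Raw shufb applied to a whole 16-byte selector.  */
void
shufb_model (uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
             const uint8_t sel[16])
{
  int i;
  for (i = 0; i < 16; i++)
    out[i] = shufb_byte (a, b, sel[i]);
}

/* vec_perm semantics: reduce every selector element modulo 32 first,
   which is what the added (and:V16QI ... 31) step does, then shuffle.  */
void
vec_perm_model (uint8_t out[16], const uint8_t a[16], const uint8_t b[16],
                const uint8_t sel[16])
{
  uint8_t masked[16];
  int i;
  for (i = 0; i < 16; i++)
    masked[i] = sel[i] & 31;
  shufb_model (out, a, b, masked);
}
```

With a selector byte such as 0xC5, the raw shuffle yields the constant 0xFF, while the vec_perm model yields byte 5 of the first operand; selector bytes below 128 give the same result either way.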
Re: regcprop.c bug fix
On Oct 20, 2011, at 6:22 AM, Bernd Schmidt wrote:

> I found that maximally confusing, so let me try to rephrase it to see if
> I understood you. The two calls to validate_change clobber the
> recog_data even if they fail. In case they failed, we want to continue
> looking at data from the original insn, so we must recompute it.

Yes, exactly.

> diagnosis. Better to move the recomputation into the if statement that
> contains the validate_change calls and possibly add a comment about the
> effect of that function; otherwise OK.

Ok, I've updated the code and added some comments and finished testing and checked it in.

Thanks.

2011-10-20  Mike Stump  mikest...@comcast.net

	* regcprop.c (copyprop_hardreg_forward_1): Update recog_data
	after validate_change wipes it out.

Index: regcprop.c
===================================================================
--- regcprop.c	(revision 180265)
+++ regcprop.c	(working copy)
@@ -840,6 +840,12 @@ copyprop_hardreg_forward_1 (basic_block
 		      changed = true;
 		      goto did_replacement;
 		    }
+		  /* We need to re-extract as validate_change clobbers
+		     recog_data.  */
+		  extract_insn (insn);
+		  if (! constrain_operands (1))
+		    fatal_insn_not_found (insn);
+		  preprocess_constraints ();
 		}
 
 	      /* Otherwise, try all valid registers and see if its valid.  */
@@ -862,6 +868,12 @@ copyprop_hardreg_forward_1 (basic_block
 		      changed = true;
 		      goto did_replacement;
 		    }
+		  /* We need to re-extract as validate_change clobbers
+		     recog_data.  */
+		  extract_insn (insn);
+		  if (! constrain_operands (1))
+		    fatal_insn_not_found (insn);
+		  preprocess_constraints ();
 		}
 	    }
 	}
[RFA:] fix breakage with Update testsuite to run with slim LTO
> Date: Fri, 21 Oct 2011 00:19:32 +0200
> From: Jan Hubicka hubi...@ucw.cz

> > > Yes, if we scan assembler, we likely want -fno-fat-lto-objects.
> >
> > then IIUC you need to patch *all* torture tests that use
> > scan-assembler and scan-assembler-not. Alternatively, patch
> > somewhere else, like not passing it if certain directives are
> > used, like scan-assembler{,-not}. And either way, is it safe to
> > add that option always, not just when also passing -flto or
> > something?
>
> Hmm, some of the assembler scans still work because they check for presence of
> symbols we output anyway, but indeed, it would make more sense to automatically
> imply -ffat-lto-objects when scan-assembler is used. I am not sure if my dejagnu
> skill is on par here however.

Maybe you could make amends ;) by testing the following, which seems to work at least for dg-torture.exp and cris-elf/cris-sim. With it, -ffat-lto-objects is automatically added for each scan-assembler and scan-assembler-not test, and the mechanism is extensible to other dg-final actions without polluting the files with checks for LTO options and whatnot. I checked (and corrected) that it also works when !check_effective_target_lto, by commenting out the setting in the second chunk.

gcc/testsuite:
	* lib/gcc-dg.exp (gcc_force_conventional_output): New global
	variable, default empty, -ffat-lto-objects for effective_target_lto.
	(gcc-dg-test-1): Add options from dg-final methods.
	* lib/scanasm.exp (scan-assembler_required_options)
	(scan-assembler-not_required_options): New procs.

Ok to commit?

Index: lib/gcc-dg.exp
===================================================================
--- lib/gcc-dg.exp	(revision 180270)
+++ lib/gcc-dg.exp	(working copy)
@@ -68,6 +68,13 @@ if [info exists ADDITIONAL_TORTURE_OPTIO
 }
 
 set LTO_TORTURE_OPTIONS ""
+
+# Some torture-options cause intermediate code output, unusable for
+# testing using e.g. scan-assembler.  In this variable are the options
+# how to force it, when needed.
+global gcc_force_conventional_output
+set gcc_force_conventional_output ""
+
 if [check_effective_target_lto] {
     # When having plugin test both slim and fat LTO and plugin/nonplugin
     # path.
@@ -76,6 +83,7 @@ if [check_effective_target_lto] {
 	  { -O2 -flto -fno-use-linker-plugin -flto-partition=none } \
 	  { -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects }
       ]
+      set gcc_force_conventional_output "-ffat-lto-objects"
     } else {
       set LTO_TORTURE_OPTIONS [list \
 	  { -O2 -flto -flto-partition=none } \
@@ -154,6 +162,19 @@ proc gcc-dg-test-1 { target_compile prog
 	default {
 	    perror "$do_what: not a valid dg-do keyword"
 	    return ""
+	}
+    }
+
+    # Let { dg-final { action } } force options as returned by an
+    # optional proc ${action}_required_options.
+    upvar 2 dg-final-code finalcode
+    foreach x [split $finalcode "\n"] {
+	set finalcmd [lindex $x 0]
+	if { [info procs ${finalcmd}_required_options] != "" } {
+	    set req [${finalcmd}_required_options]
+	    if { $req != "" } {
+		lappend extra_tool_flags $req
+	    }
 	}
     }
 
Index: lib/scanasm.exp
===================================================================
--- lib/scanasm.exp	(revision 180270)
+++ lib/scanasm.exp	(working copy)
@@ -85,6 +85,11 @@ proc scan-assembler { args } {
     dg-scan "scan-assembler" 1 $testcase $output_file $args
 }
 
+proc scan-assembler_required_options { args } {
+    global gcc_force_conventional_output
+    return $gcc_force_conventional_output
+}
+
 # Check that a pattern is not present in the .s file produced by the
 # compiler.  See dg-scan for details.
 
@@ -94,6 +99,11 @@ proc scan-assembler-not { args } {
     set output_file "[file rootname [file tail $testcase]].s"
 
     dg-scan "scan-assembler-not" 0 $testcase $output_file $args
+}
+
+proc scan-assembler-not_required_options { args } {
+    global gcc_force_conventional_output
+    return $gcc_force_conventional_output
 }
 
 # Return the scan for the assembly for hidden visibility.

brgds, H-P
[pph] Re-organize handling of mergeable nodes (issue5318043)
This patch re-organizes the streaming of mergeable nodes so that we can separate the process of merging the incoming ASTs into the current compilation context from the reading of their contents.

The need for merging arises when two or more PPH images in the same compilation unit have embedded the same text header file. For example, in the following scenario files image1.h and image2.h can be converted into PPH images, while text.h is a regular header inclusion:

tu.cc:
	#include "image1.h"
	#include "image2.h"

image1.h:
	#ifndef __IMAGE1_H
	#define __IMAGE1_H
	#include "text.h"
	...
	#endif

image2.h:
	#ifndef __IMAGE2_H
	#define __IMAGE2_H
	#include "text.h"
	...
	#endif

The ASTs generated by text.h are embedded inside both image1.pph and image2.pph, so when we read those files we will read the same symbols twice. To avoid multiple definition errors, we need to merge the symbols coming from these files. This is not the usual merge done by the parser, however. Here we are merging full definitions, callgraph nodes, types, etc.

What the patch does is to break the writing of a tree into two pieces. The first piece is the merge key, which is necessary for allocation and used as a lookup key into the parser data structures to determine if that object has already been seen by the parser. The second piece is the merge body, which carries the bulk of the information, particularly any information that causes circularity.

The serialization algorithm occurs in two phases. The first phase walks the data structure emitting only merge keys for all mergeable trees. The second phase walks the data structure emitting only bodies for all mergeable trees. Non-mergeable trees will be emitted in whole as a side effect of the second phase.

The de-serialization algorithm follows the same two phases. The first phase reads the tree merge keys, searches for merge matches, and either allocates a new tree or redirects references to an existing tree. The second phase reads the tree merge bodies. When the body corresponds to a matched tree, it must incorporate new information into the existing tree, i.e. do merging. For example, a function definition could come from the first pph file read, or the second pph file read, but in either case the definition must reside in the final data structures.

We are still not doing proper merging of everything. This patch simply implements the mechanics of splitting up the streaming of mergeable nodes. We will be working on the actual merge logic on top of this change.

Tested on x86_64. Committed to branch.

Diego.

	* pph-streamer-in.c (ALLOC_AND_REGISTER_ALTERNATE): Remove.
	(pph_in_tree_1): Remove.
	(pph_in_tree): Rename from pph_in_tree_1.
	(pph_in_chain): Call streamer_read_chain.
	(pph_in_merge_key_chain): New.
	(pph_in_merge_body_chain): Rename from pph_in_mergeable_chain.
	(pph_in_binding_level_1): Switch the order of the arguments.
	Update all users.
	Read fields this_entity and static_decls.
	(pph_in_mergeable_binding_level): Remove.
	(pph_in_merge_key_tree): New.
	(pph_in_tree): Handle PPH_RECORD_START_MERGE_BODY.
	(pph_in_merge_keys): New.
	(pph_in_global_binding): New.
	(pph_read_file_1): Call it.
	* pph-streamer-out.c (pph_get_marker_for):
	(pph_out_start_tree_record): Handle PPH_RECORD_START_MERGE_BODY.
	(pph_out_start_merge_key_record): New.
	(pph_out_tree_1): Remove.
	(pph_out_tree): Rename from pph_out_tree_1.
	Handle PPH_RECORD_START_MERGE_BODY.
	(pph_out_merge_key_vec): Rename from pph_out_mergeable_tree_vec.
	Call pph_out_merge_key_tree.
	(pph_out_merge_key_chain): Rename from
	pph_out_mergeable_chain_filtered.
	(pph_out_binding_level_1): Handle fields this_entity and
	static_decls.
	(pph_out_mergeable_binding_level): Remove.
	(pph_out_merge_key_tree): New.
	(pph_out_merge_keys): New.
	(pph_out_global_binding): Call pph_out_merge_keys.
	* pph-streamer.c (pph_cache_insert_at): Return the inserted entry.
	Update all users.
	(pph_cache_add): Likewise.
	(pph_cache_lookup): Return the found entry. Update all users.
	(pph_cache_lookup_in_includes): Likewise.
	(pph_merge_name): Ignore EXPRs with no lang-specific info.
	* pph-streamer.h (pph_cache_entry): Add field needs_merge_body.
	Update all users.
	(pph_tag_to_tree_code): Remove.
	(pph_tree_is_mergeable): New.
	* pph.h (enum pph_record_marker): Add values
	PPH_RECORD_START_MERGE_KEY and PPH_RECORD_START_MERGE_BODY.

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index d8f17b9..ecb7182 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -58,20 +58,6 @@ static VEC(char_p,heap) *string_tables = NULL;
 pph_cache_insert_at (CACHE,
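As a rough illustration of the two-phase read described in this message, the toy model below resolves merge keys first and only then reads bodies. Everything in it is an assumption made for the example: symbols are keyed by name, a body is a single optional integer "definition", and none of the struct or function names correspond to the actual pph streamer code.

```c
#include <string.h>

#define MAX_SYMS 16

/* A symbol in the current compilation context.  The name plays the role
   of the merge key; has_def/def stand in for the merge body.  */
struct sym { const char *name; int has_def; int def; };
struct context { struct sym syms[MAX_SYMS]; int n; };

/* One streamed record as it would appear in an image.  */
struct record { const char *key; int has_def; int def; };

/* Phase 1: look the key up; redirect to an existing symbol if it has
   been seen before, otherwise allocate a fresh, empty one.  */
static struct sym *
read_merge_key (struct context *ctx, const char *key)
{
  struct sym *s;
  int i;

  for (i = 0; i < ctx->n; i++)
    if (strcmp (ctx->syms[i].name, key) == 0)
      return &ctx->syms[i];
  if (ctx->n == MAX_SYMS)
    return NULL;
  s = &ctx->syms[ctx->n++];
  s->name = key;
  s->has_def = 0;
  s->def = 0;
  return s;
}

/* Phase 2: fold the body into whichever symbol phase 1 resolved to.
   A definition is kept no matter which image supplied it first.  */
static void
read_merge_body (struct sym *s, const struct record *r)
{
  if (r->has_def && !s->has_def)
    {
      s->def = r->def;
      s->has_def = 1;
    }
}

/* Read one image: all keys first, then all bodies.
   Returns 0 on success, -1 if the toy tables overflow.  */
int
read_image (struct context *ctx, const struct record *recs, int n)
{
  struct sym *slots[MAX_SYMS];
  int i;

  if (n > MAX_SYMS)
    return -1;
  for (i = 0; i < n; i++)
    if ((slots[i] = read_merge_key (ctx, recs[i].key)) == NULL)
      return -1;
  for (i = 0; i < n; i++)
    read_merge_body (slots[i], &recs[i]);
  return 0;
}
```

Reading two images that both carry a symbol from text.h leaves a single entry for it in the context, and that entry holds the definition whichever image provided it.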
Re: [pph] Make libcpp symbol validation a warning (issue5235061)
I just thought about something...

Earlier I said that ALL line_table issues were resolved after this patch (as it ignores the re-included headers that were guarded, as the non-pph compiler does naturally).

One problem remains, however: I'm pretty sure that line_table entries for re-included non-pph'ed headers are still showing up multiple times (as the direct non-pph children of a given pph_include have their line_table entries copied one by one from the pph file).

I think we were talking about somehow remembering the guard context in which DECLs were declared, and then ignoring DECLs streamed in if they belong to a header guard that was already seen in a prior include of the same non-pph header. That would allow us to ignore those DECLs that are re-declared when they should have been guarded out the second time.

I'm not sure whether there is machinery to handle non-pph re-includes yet... but... at the very least, I'm pretty sure those non-pph entries still show up multiple times in the line_table.

Now, we can't just remove/ignore those entries either... doing so would alter the expected location offset (pph_loc_offset) applied to all tokens streamed in directly from the pph header.

What we could potentially do is:
- ignore the repeated non-pph entry
- remember the number of locations this entry was supposed to take (call that pph_loc_ignored_offset)
- then for DECLs imported after it we would need an offset of pph_loc_offset - pph_loc_ignored_offset, to compensate for the missing entries in the line_table

The problem here obviously is that I don't think we have a way of knowing which DECLs come before, inside, and after a given non-pph header included in the parent pph header which we are currently reading.

Furthermore, a DECL coming after the non-pph header could potentially refer to something inside the ignored non-pph header, and the source_location of the referred token would now be invalid (although that might already be fixed by the cache hit, which would redirect that token reference to the same token in the first included copy of that header, which wasn't skipped and is therefore valid).

On Tue, Oct 11, 2011 at 4:26 PM, Diego Novillo dnovi...@google.com wrote:
> @@ -328,8 +327,6 @@ pph_in_line_table_and_includes (pph_stream *stream)
>    int entries_offset = line_table->used - PPH_NUM_IGNORED_LINE_TABLE_ENTRIES;
>    enum pph_linetable_marker next_lt_marker = pph_in_linetable_marker (stream);
>
> -  pph_reading_includes++;
> -
>    for (first = true; next_lt_marker != PPH_LINETABLE_END;
>         next_lt_marker = pph_in_linetable_marker (stream))
>      {
> @@ -373,19 +370,33 @@ pph_in_line_table_and_includes (pph_stream *stream)
>           else
>             lm->included_from += entries_offset;

Also, if we do ignore some non-pph entries, the included_from calculation is going to need some trickier logic as well (it's fine for the pph includes though, as each child calculates its own offset).

> -         gcc_assert (lm->included_from < (int) line_table->used);
> -

Also, I think this slipped in my previous comment, but I don't see how this assert could trigger in the current code. If it did trigger, something was definitely wrong, as it asserts that the offset included_from refers to an entry that is actually in the line_table...

>           lm->start_location += pph_loc_offset;

Cheers,
Gab

--
This patch is available for review at http://codereview.appspot.com/5235061
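The offset arithmetic proposed in this message could be sketched as follows. This is purely illustrative and leans on an assumption the message itself says does not hold yet, namely that the reader can tell where a streamed location falls relative to the skipped entry. The type loc_t, the function adjust_location, and the use of 0 as an "unknown" result are all hypothetical.

```c
/* Hypothetical stand-in for libcpp's source_location.  */
typedef unsigned int loc_t;

/* Translate a location STREAMED from a pph image into the reader's
   line_table, given:
     pph_loc_offset          the usual offset applied to the image
     skipped_start           first streamed location of the repeated
                             non-pph entry that was ignored (assumed known)
     pph_loc_ignored_offset  number of locations that entry would have used
   Returns 0 as an "unknown" placeholder for locations inside the skipped
   entry, which no longer exist in the reader's line_table.  */
loc_t
adjust_location (loc_t streamed, loc_t pph_loc_offset,
                 loc_t skipped_start, loc_t pph_loc_ignored_offset)
{
  /* Before the skipped entry: only the base offset applies.  */
  if (streamed < skipped_start)
    return streamed + pph_loc_offset;

  /* Inside the skipped entry: nothing valid to map to in this sketch.  */
  if (streamed < skipped_start + pph_loc_ignored_offset)
    return 0;

  /* After the skipped entry: compensate for the missing locations.  */
  return streamed + pph_loc_offset - pph_loc_ignored_offset;
}
```

The middle case is exactly where the dangling-reference concern in the last paragraph above applies; a real implementation would have to redirect such locations to the first included copy of the header rather than drop them.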
Re: [patch] dwarf2out crash: missing GTY? (PR 50806)
2011/10/20 Jan Kratochvil jan.kratoch...@redhat.com:
> Hi,
>
> with custom patched dwarf2out.c I got a crash on memory mangled by the garbage
> collector. With patched GTY there the crash no longer happened - but I do not
> have a reproducer anymore, sorry if it is a bogus patch.
>
> The memory corrupted later was initially allocated and stored into
> mem_loc_result->dw_loc_oprnd1.v.val_loc. I do not think there is any other
> reference to it than that field with no GTY.
>
> 2011-10-20  Jan Kratochvil  jan.kratoch...@redhat.com
>
> 	* dwarf2out.c (struct dw_loc_list_struct): Add GTY for expr;

This patch is a no-op, as already pointed out. If this comes up again, I'd set a conditional breakpoint on ggc_set_mark if (arg == struct dw_loc_list_struct with the field that gets collected) and try to find out why the field does not get marked.

--
Laurynas
Re: [PATCH, i386, PR50766] Fix incorrect mem/reg operands order
Thanks!

K

On Fri, Oct 21, 2011 at 12:37 AM, H.J. Lu hjl.to...@gmail.com wrote:
> On Thu, Oct 20, 2011 at 1:30 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote:
>>> OK.
>>>
>>> Thanks,
>>> Uros.
>>
>> Great, could anybody please commit that?
>
> I checked it in for you.
>
> --
> H.J.
Re: [patch] dwarf2out crash: missing GTY? (PR 50806)
On Fri, 21 Oct 2011 05:37:09 +0200, Laurynas Biveinis wrote:
> This patch is a no-op, as already pointed out. If this comes up again,
> I'd set a conditional breakpoint on ggc_set_mark if (arg == struct
> dw_loc_list_struct with the field that gets collected) and try to
> find out why the field does not get marked.

Thanks all for the review, I see I do not know the GC. I thought the bug is so obvious...

I did not make a snapshot of the tree in that crashing state. Therefore I cannot say anything useful about the crash anymore.

Thanks,
Jan