*ping* - Re: [Patch, Fortran] Support allocatable *scalar* coarrays
*ping* http://gcc.gnu.org/ml/fortran/2011-07/msg00106.html On 07/11/2011 09:49 AM, Tobias Burnus wrote: On 07/10/2011 09:56 PM, Tobias Burnus wrote: This patch implements the trans*.c part of allocatable scalar coarrays; unlike non-coarray allocatable scalars, they have cobounds and thus use an array descriptor. I found a test case (part of Reinhold Bader's fortran_tests) which gave an ICE: allocatable scalar coarrays with SAVE. I have fixed that (trans-decl.c) and added a test. The attached patch was built and regtested on x86-64-linux. OK for the trunk? Tobias
RE: [PATCH, ARM, iWMMXt][1/5]: ARM code generic change
Hi, This is the first part of the iWMMXt maintenance. *config/arm/arm.c (arm_option_override): Enable iWMMXt with VFP. iWMMXt and NEON are incompatible. iWMMXt is unsupported in Thumb-2 mode. (arm_expand_binop_builtin): Accept immediate op (with mode VOID). *config/arm/arm.md: Move the include location of iwmmxt.md so that *arm_movdi and *arm_movsi_insn can be used when iWMMXt is enabled. With the current work in trunk to handle enabled attributes and per-alternative predicable attributes (thanks Bernd) we should be able to get rid of *cond_iwmmxt_movsi_insn in the iwmmxt.md file. It's not a matter for this patch but for a follow-up patch. Actually we should probably do the same for the various insns that are dotted around all over the place with final conditions that prevent matching - at least it makes the backend description slightly smaller :). Add pipeline description file include. It is enough to say (filename): Include. in the ChangeLog entry. The include for the pipeline description file should be with the patch that adds it, i.e. patch #5. Please add this to MD_INCLUDES in t-arm as well. Also, as a general note, please provide a correct ChangeLog entry. This is not the format that we expect ChangeLog entries to be in. Please look at the coding standards on the website for this, or at other patches submitted, with respect to ChangeLog entries. Please fix this for each patch in the patch stack. cheers Ramana Thanks for reviewing. I have updated the patches and the ChangeLog. *config/arm/arm.c (arm_option_override): Enable iWMMXt with VFP. (arm_expand_binop_builtin): Accept VOIDmode op. *config/arm/arm.md (*arm_movdi, *arm_movsi_insn): Remove condition !TARGET_IWMMXT. (iwmmxt.md): Include location. Thanks, Xinyu 1_generic.diff Description: 1_generic.diff
RE: [PATCH, ARM, iWMMXt][2/5]: intrinsic head file change
Hi, It is the second part of iWMMXt maintenance. *config/arm/mmintrin.h: Revise. Thanks, Xinyu 2_mmintrin.diff Description: 2_mmintrin.diff
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On 07/13/2011 04:34 PM, Richard Guenther wrote: On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the widening_mul pass might increase the number of multiplications in the code by transforming a = b * c; d = a + 2; e = a + 3 into: d = b * c + 2; e = b * c + 3 under the assumption that an FMA instruction is not more expensive than a simple add. This certainly isn't always true. While e.g. on s390 an fma is indeed not slower than an add execution-wise, it has disadvantages regarding instruction grouping. It doesn't group with any other instruction, which has a major impact on the instruction dispatch bandwidth. The following patch tries to figure out the costs for adds, mults and fmas by building an RTX and asking the backend's cost function, in order to estimate whether it is worthwhile doing the transformation. With that patch the 436.cactus hot loop contains 28 fewer multiplications than before, increasing performance slightly (~2%). Bootstrapped and regtested on x86_64 and s390x. Ick ;) Just to defend myself: I did not invent this. I basically copied it over from ivopts. Maybe this is finally the time to introduce target hook(s) to get us back costs for trees? For this case we'd need two actually, or just one - dependent on what finegrained information we pass. Choices: Having another target hook for costs actually doesn't look that much better to me. Duplicating the logic is not really practical for backend developers. We rather should have a generic interface for calculating tree costs based on the rtx_cost hook. Bye, -Andreas-
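For readers following the thread, the shape of the transformation (and why the cost check matters) can be written out in plain C. This is an illustration of the GIMPLE-level rewrite being discussed, not GCC code; the function names are invented for the example:

```c
#include <assert.h>

static double results_before[2], results_after[2];

/* Before the widening_mul transformation: a single multiplication
   feeds two additions.  */
static void before(double b, double c)
{
  double a = b * c;               /* one multiply */
  results_before[0] = a + 2.0;
  results_before[1] = a + 3.0;
}

/* After: each add is fused with its own copy of the multiply, so an
   FMA-capable target now issues two multiplies.  That only pays off if
   an FMA really costs no more than a plain add -- which, as noted
   above, is not true on s390 because of instruction grouping.  */
static void after(double b, double c)
{
  results_after[0] = b * c + 2.0;   /* candidate for fma(b, c, 2.0) */
  results_after[1] = b * c + 3.0;   /* candidate for fma(b, c, 3.0) */
}
```

Both variants compute the same values; the question the patch answers with rtx_cost is purely which form is cheaper on the target.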
RE: [PATCH, ARM, iWMMXt][3/5]: built in define and expand
Hi, It is the third part of iWMMXt maintenance. *config/arm/arm.c (enum arm_builtins): Built-in fcode. (builtin_description bdesc_2arg): Built in declare. (builtin_description bdesc_1arg): Ditto. (arm_init_iwmmxt_builtins): Built in initialize. (arm_expand_builtin): Built in expand. Thanks, Xinyu 3_arm_c.diff Description: 3_arm_c.diff
Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)
http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01036.html Georg-Johann Lay wrote: This is a patch to fix PR49487. Forgot to attach it. Here it is. As Denis will be off-line for some time, it'd be great if a global reviewer would review it. It appears that he is the only AVR maintainer who approves patches. The reason for the ICE is as explained in the PR: the rotate patterns use "X" as constraint for an operand which is used as scratch. However, the operand is a match_operand and not a match_scratch. Because the scratch is not needed in all situations, I chose to use match_scratch instead of match_operand and not to fix the constraints. Fixing the constraints would lead to superfluous allocation of a register if no scratch is needed. Tested with 2 FAILs less: gcc.c-torture/compile/pr46883.c ICEs without this patch and passes with it. The test case in the PR passes, too. That test case also passes with the current unpatched 4.7, but it's obvious that the constraint/operand combination is a bug. Ok to commit and back-port to 4.6? Johann

	PR target/49487
	* config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead of REG.
	(*rotw<mode>): Use const_int_operand for operands[2].
	Use match_scratch for operands[3].
	(*rotb<mode>): Ditto.
	* config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH.
Index: config/avr/avr.md
===================================================================
--- config/avr/avr.md	(revision 176136)
+++ config/avr/avr.md	(working copy)
@@ -1597,18 +1597,18 @@ (define_expand "rotl<mode>3"
 		   (match_operand:VOID 2 "const_int_operand" "")))
 	      (clobber (match_dup 3))])]
   ""
-  "
-{
-  if (CONST_INT_P (operands[2]) && 0 == (INTVAL (operands[2]) % 8))
-    {
-      if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
-	operands[3] = gen_reg_rtx (<rotsmode>mode);
-      else
-	operands[3] = gen_reg_rtx (QImode);
-    }
-  else
-    FAIL;
-}")
+  {
+    if (CONST_INT_P (operands[2])
+	&& 0 == INTVAL (operands[2]) % 8)
+      {
+	if (AVR_HAVE_MOVW && 0 == INTVAL (operands[2]) % 16)
+	  operands[3] = gen_rtx_SCRATCH (<rotsmode>mode);
+	else
+	  operands[3] = gen_rtx_SCRATCH (QImode);
+      }
+    else
+      FAIL;
+  })
 
 ;; Overlapping non-HImode registers often (but not always) need a scratch.
@@ -1620,34 +1620,38 @@ (define_expand "rotl<mode>3"
 ; Split word aligned rotates using scratch that is mode dependent.
 (define_insn_and_split "*rotw<mode>"
   [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-		     (match_operand 2 "immediate_operand" "n,n,n")))
-   (clobber (match_operand:<rotsmode> 3 "register_operand" "=<rotx>"))]
-  "(CONST_INT_P (operands[2])
-    && (0 == (INTVAL (operands[2]) % 16) && AVR_HAVE_MOVW))"
+	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
+		     (match_operand 2 "const_int_operand" "n,n,n")))
+   (clobber (match_scratch:<rotsmode> 3 "=<rotx>"))]
+  "AVR_HAVE_MOVW
+   && CONST_INT_P (operands[2])
+   && 0 == INTVAL (operands[2]) % 16"
  "#"
  "&& (reload_completed || <MODE>mode == DImode)"
  [(const_int 0)]
-  "avr_rotate_bytes (operands);
-   DONE;"
-)
+  {
+    avr_rotate_bytes (operands);
+    DONE;
+  })
 
 ; Split byte aligned rotates using scratch that is always QI mode.
 (define_insn_and_split "*rotb<mode>"
   [(set (match_operand:HIDI 0 "register_operand" "=r,r,#&r")
-	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
-		     (match_operand 2 "immediate_operand" "n,n,n")))
-   (clobber (match_operand:QI 3 "register_operand" "=<rotx>"))]
-  "(CONST_INT_P (operands[2])
-    && (8 == (INTVAL (operands[2]) % 16)
-	|| (!AVR_HAVE_MOVW && 0 == (INTVAL (operands[2]) % 16))))"
+	(rotate:HIDI (match_operand:HIDI 1 "register_operand" "0,r,r")
+		     (match_operand 2 "const_int_operand" "n,n,n")))
+   (clobber (match_scratch:QI 3 "=<rotx>"))]
+  "CONST_INT_P (operands[2])
+   && (8 == INTVAL (operands[2]) % 16
+       || (!AVR_HAVE_MOVW
+	   && 0 == INTVAL (operands[2]) % 16))"
  "#"
  "&& (reload_completed || <MODE>mode == DImode)"
  [(const_int 0)]
-  "avr_rotate_bytes (operands);
-   DONE;"
-)
+  {
+    avr_rotate_bytes (operands);
+    DONE;
+  })
 ;;
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 176141)
+++ config/avr/avr.c	(working copy)
@@ -4438,7 +4438,9 @@ avr_rotate_bytes (rtx operands[])
       if (mode == DImode)
 	move_mode = QImode;
       /* Make scratch smaller if needed.  */
-      if (GET_MODE (scratch) == HImode && move_mode == QImode)
+      if (SCRATCH != GET_CODE (scratch)
+	  && HImode == GET_MODE (scratch)
+	  && QImode == move_mode)
 	scratch = simplify_gen_subreg (move_mode, scratch, HImode, 0);
 
       move_size = GET_MODE_SIZE (move_mode);
@@ -4534,6 +4536,8 @@ avr_rotate_bytes (rtx operands[])
 	     When this move occurs, it will break chain deadlock.
 	     The scratch register is substituted for real move.  */
+
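As a side note for readers, the case the expander accepts can be modeled in plain C. This is a hypothetical sketch of what `avr_rotate_bytes` ultimately computes for a 32-bit value, not the AVR backend code: rotations by a whole number of bytes, which need no bit-level shifting at all, only byte moves (the expander FAILs for anything where `INTVAL % 8 != 0`):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by a whole number of bytes.  On AVR this
   lowers to plain byte moves (plus possibly one scratch byte); here we
   express the same result portably with shifts.  */
static uint32_t rotl32_bytes(uint32_t x, unsigned bytes)
{
  unsigned s = (bytes % 4) * 8;           /* whole-byte shift amount */
  return s ? (x << s) | (x >> (32 - s)) : x;
}
```

Rotations that are not byte-aligned fall back to generic shift/or expansion, which is exactly why the scratch register is only sometimes needed.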
RE: [PATCH, ARM, iWMMXt][4/5]: WMMX machine description
Hi, It is the fourth part of iWMMXt maintenance. Since *cond_iwmmxt_movsi_insn would be got rid of soon, I keep it unchanged. *config/arm/arm.c (arm_output_iwmmxt_shift_immediate): New function. (arm_output_iwmmxt_tinsr): Ditto. *config/arm/arm-protos.h (arm_output_iwmmxt_shift_immediate): New prototype. (arm_output_iwmmxt_tinsr): Ditto. *config/arm/iwmmxt.md (WCGR0, WCGR1, WCGR2, WCGR3): New constant. (iwmmxt_psadbw, iwmmxt_walign, iwmmxt_tmrc, iwmmxt_tmcr): Delete. (iwmmxt_tbcstqi, iwmmxt_tbcsthi, iwmmxt_tbcstsi, *iwmmxt_clrv8qi, *iwmmxt_clrv4hi, *iwmmxt_clrv2si): Rename... (tbcstv8qi, tbcstv4hi, tbsctv2si, iwmmxt_clrv8qi, iwmmxt_clrv4hi, iwmmxt_clrv2si): ...New pattern. (*andmode3_iwmmxt, *iormode3_iwmmxt, *xormode3_iwmmxt, rorimode3, ashrimode3_iwmmxt, lshrimode3_iwmmxt, ashlimode3_iwmmxt, iwmmxt_waligni, iwmmxt_walignr, iwmmxt_walignr0, iwmmxt_walignr1, iwmmxt_walignr2, iwmmxt_walignr3, iwmmxt_setwcgr0, iwmmxt_setwcgr1, iwmmxt_setwcgr2, iwmmxt_setwcgr3, iwmmxt_getwcgr0, iwmmxt_getwcgr1, iwmmxt_getwcgr2, iwmmxt_getwcgr3): New pattern. (All instruction patterns): Add wtype attribute. (*iwmmxt_arm_movdi, *iwmmxt_movsi_insn, iwmmxt_uavgrndv8qi3, iwmmxt_uavgrndv4hi3, iwmmxt_uavgv8qi3, iwmmxt_uavgv4hi3, iwmmxt_tinsrb, iwmmxt_tinsrh, iwmmxt_tinsrw, eqv8qi3, eqv4hi3, eqv2si3, gtuv8qi3, gtuv4hi3, gtuv2si3, gtv8qi3, gtv4hi3, gtv2si3, iwmmxt_wunpckihb, iwmmxt_wunpckihh, iwmmxt_wunpckihw, iwmmxt_wunpckilb, iwmmxt_wunpckilh, iwmmxt_wunpckilw, iwmmxt_wunpckehub, iwmmxt_wunpckehuh, iwmmxt_wunpckehuw, iwmmxt_wunpckehsb, iwmmxt_wunpckehsh, iwmmxt_wunpckehsw, iwmmxt_wunpckelub, iwmmxt_wunpckeluh, iwmmxt_wunpckeluw, iwmmxt_wunpckelsb, iwmmxt_wunpckelsh, iwmmxt_wunpckelsw, iwmmxt_wmadds, iwmmxt_wmaddu, iwmmxt_wsadb, iwmmxt_wsadh, iwmmxt_wsadbz, iwmmxt_wsadhz): Revise. (iwmmxt2.md): Include. *config/arm/iwmmxt2.md: New file. *config/arm/iterators.md (VMMX2): New mode_iterator. *config/arm/arm.md (wtype): New attribute. *config/arm/t-arm (MD_INCLUDES): Add iwmmxt2.md. 
Thanks, Xinyu 4_wmmx_md.diff.gz Description: 4_wmmx_md.diff.gz
RE: [PATCH, ARM, iWMMXt][5/5]: pipeline description
Hi, It is the fifth part of iWMMXt maintenance. *config/arm/t-arm (MD_INCLUDES): Add marvell-f-iwmmxt.md. *config/arm/marvell-f-iwmmxt.md: New file. *config/arm/arm.md (marvell-f-iwmmxt.md): Include. Thanks, Xinyu 5_pipeline.diff Description: 5_pipeline.diff
[Patch, Fortran] Mark token of static coarrays restrict
As the title says: Mark token of static coarrays restrict. Bootstrapped and regtested on x86-64-linux. OK for the trunk? Tobias

2011-07-13  Tobias Burnus  bur...@net-b.de

	* trans-decl.c (gfc_build_qualified_array): Make coarray's token
	TYPE_QUAL_RESTRICT.

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ddc7c36..65a8efa 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -759,7 +759,9 @@ gfc_build_qualified_array (tree decl, gfc_symbol * sym)
     {
       tree token;
 
-      token = gfc_create_var_np (pvoid_type_node, "caf_token");
+      token = gfc_create_var_np (build_qualified_type (pvoid_type_node,
+						       TYPE_QUAL_RESTRICT),
+				 "caf_token");
       GFC_TYPE_ARRAY_CAF_TOKEN (type) = token;
       DECL_ARTIFICIAL (token) = 1;
       TREE_STATIC (token) = 1;
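TYPE_QUAL_RESTRICT is the tree-level counterpart of C's `restrict` qualifier, so the patch above is telling alias analysis that the coarray token cannot alias any other pointer. What that buys, shown in plain C rather than in GCC's tree representation (a sketch, not gfortran code):

```c
#include <assert.h>

/* With restrict, the compiler may assume dst and src never alias, so
   it need not reload src[i] after every store through dst.  */
static void scale2(float *restrict dst, const float *restrict src, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = src[i] * 2.0f;
}

/* Small helper so the effect is observable.  */
static float scaled_at(int i)
{
  float in[3] = {1.0f, 2.0f, 3.0f};
  float out[3];
  scale2(out, in, 3);
  return out[i];
}
```

For the artificial, compiler-generated token variable the promise is trivially true, which is why the qualifier is safe to add.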
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On 07/13/2011 06:58 PM, Richard Henderson wrote: Why the force_operand? You've got register inputs. Either the target is going to support the operation or it isn't. Seems to me you can check the availability of the operation in the optab and pass that gen_rtx_fmt_ee result to rtx_cost directly. Do I really need to check optab for availability at this point? For FMA convert_mult_to_fma already did the check. But what to do if no optab is available for add/mul? Assume extraordinary high costs? This probably wouldn't make sense since it would make a mul or add more expensive than an fma. Or perhaps the rtx_cost hooks should handle that by returning high costs for everything the backend is not able to implement directly? + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[NUM_MACHINE_MODES]; + static unsigned add_cost[NUM_MACHINE_MODES]; + static unsigned fma_cost[NUM_MACHINE_MODES]; ... + if (!fma_cost[mode]) + { + fma_cost[mode] = compute_costs (mode, FMA, speed); + add_cost[mode] = compute_costs (mode, PLUS, speed); + mul_cost[mode] = compute_costs (mode, MULT, speed); + } Saving cost data dependent on speed, which is non-constant. You probably need to make this a two dimensional array. Right. I'll fix this. Bye, -Andreas-
Re: [build] Remove crt0, mcrt0 support
Richard, [Talking about an immediate removal of the netware target...] Jan Beulich jbeul...@novell.com writes: Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM which variant would you prefer: obsoletion now and removal in 4.8, or immediate removal? Both are fine with me, so unless someone else objects, immediate removal would seem better given it had been pretty much unmaintained. Right: it would be a one-time effort to remove the support, but subsequent cleanups wouldn't have to deal with the effectively dead code. I had a quick look and it doesn't seem hard: apart from removing the netware-specific files in gcc and libgcc (and the corresponding gcc/config.gcc and libgcc/config.host changes), there's only a small list (apart from netware-related target triplets in the testsuite): config/elf.m4 configure.ac contrib/config-list.mk gcc/config/i386/i386.c gcc/config/i386/i386.h gcc/doc/extend.texi libstdc++-v3/crossconfig.m4 configure.ac may have to stay if binutils/src wants to retain the port, but that's about it. Let's see what the release managers/global reviewers think. what's your take on this? Thanks Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Fix PR ada/48711
This is the build failure of GNAT 4.6.x on Windows. The patch was posted in September of last year: http://gcc.gnu.org/ml/gcc/2010-09/msg00228.html Applied on the mainline and 4.6 branch.

2011-07-14  Florian Weimer  f...@deneb.enyo.de

	PR ada/48711
	* g-socthi-mingw.adb (Fill): Guard against invalid MSG_WAITALL.

-- Eric Botcazou

Index: g-socthi-mingw.adb
===================================================================
--- g-socthi-mingw.adb	(revision 176261)
+++ g-socthi-mingw.adb	(working copy)
@@ -277,7 +277,8 @@ package body GNAT.Sockets.Thin is
       use type C.size_t;
 
       Fill : constant Boolean :=
-               (C.unsigned (Flags) and SOSC.MSG_WAITALL) /= 0;
+               SOSC.MSG_WAITALL /= -1
+                 and then (C.unsigned (Flags) and SOSC.MSG_WAITALL) /= 0;
      --  Is the MSG_WAITALL flag set?  If so we need to fully fill all vectors
 
       Res : C.int;
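The failure mode this guard fixes is easiest to see in C terms: when a platform lacks MSG_WAITALL, the binding defines it as -1, and masking with (unsigned)-1 (all bits set) makes the test true for any nonzero Flags. A sketch of the before/after logic, with `waitall_binding` standing in for `SOSC.MSG_WAITALL` (names invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Buggy original: on a platform where the flag is "unsupported" (-1),
   (unsigned)-1 is all bits set, so ANY nonzero flags value matches.  */
static bool fill_requested_buggy(unsigned flags, int waitall_binding)
{
  return (flags & (unsigned) waitall_binding) != 0;
}

/* Fixed version, mirroring the Ada patch: check for the -1 sentinel
   first, short-circuiting like Ada's "and then".  */
static bool fill_requested(unsigned flags, int waitall_binding)
{
  return waitall_binding != -1
         && (flags & (unsigned) waitall_binding) != 0;
}
```

The short-circuit matters: once the binding is known to be -1, the mask is never evaluated, restoring the intended "flag not supported, never fill" behaviour.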
Re: [Patch] Fix PR 49309, ICE with mudflap
On Wed, Jul 13, 2011 at 10:37 PM, Andrew Pinski pins...@gmail.com wrote: Hi, The problem here is that the type of the POINTER_PLUS_EXPR is incorrect and also the non folded version leaks to the IR. This patch fixes those two problems and fixes the ICE. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Ok. Thanks, Richard. Thanks, Andrew Pinski 2011-07-13 Andrew Pinski pins...@gmail.com PR tree-opt/49309 * tree-mudflap.c (mf_xform_derefs_1 case MEM_REF): Use fold_build2_loc instead of build2. Use the correct type for the new tree. 2011-07-13 Andrew Pinski pins...@gmail.com PR tree-opt/49309 * g++.dg/torture/pr49309.C: New testcase.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Richard. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther richard.guent...@gmail.com wrote: 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. Richard. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
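For context on what the `-ftree-reassoc-width` switch under discussion controls, here is the transformation in plain C (an illustration of the reassociation, not GCC code):

```c
#include <assert.h>

/* Fully serial association: each add depends on the previous one, a
   depth-3 dependency chain.  */
static double sum4_serial(double a, double b, double c, double d)
{
  return ((a + b) + c) + d;
}

/* Reassociation width 2: two independent partial sums can issue in
   parallel, cutting the critical path to depth 2 -- the win targeted
   on in-order cores like Atom.  */
static double sum4_width2(double a, double b, double c, double d)
{
  double t1 = a + b;   /* independent of t2 */
  double t2 = c + d;
  return t1 + t2;
}
```

The --param vs. target-hook debate above is only about how the backend communicates that width; the rewrite itself is the same either way (and is only valid under -ffast-math-style reassociation rules for floating point).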
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 11:49 PM, Steven Bosscher stevenb@gmail.com wrote: On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel kreb...@linux.vnet.ibm.com wrote: Hi, the widening_mul pass might increase the number of multiplications in the code by transforming a = b * c d = a + 2 e = a + 3 into: d = b * c + 2 e = b * c + 3 under the assumption that an FMA instruction is not more expensive than a simple add. This certainly isn't always true. While e.g. on s390 an fma is indeed not slower than an add execution-wise it has disadvantages regarding instruction grouping. It doesn't group with any other instruction what has a major impact on the instruction dispatch bandwidth. The following patch tries to figure out the costs for adds, mults and fmas by building an RTX and asking the backends cost function in order to estimate whether it is whorthwhile doing the transformation. With that patch the 436.cactus hotloop contains 28 less multiplications than before increasing performance slightly (~2%). Bootstrapped and regtested on x86_64 and s390x. Ick ;) +1 Maybe this is finally the time to introduce target hook(s) to get us back costs for trees? For this case we'd need two actually, or just one - dependent on what finegrained information we pass. Choices: tree_code_cost (enum tree_code) tree_code_cost (enum tree_code, enum machine_mode mode) unary_cost (enum tree_code, tree actual_arg0) // args will be mostly SSA names or constants, but at least they are typed - works for mixed-typed operations binary_cost (...) ... unary_cost (enum tree_code, enum tree_code arg0_kind) // constant vs. non-constant arg, but lacks type/mode Or maybe add a cost function for all named insns (i.e. http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)? I think that any form of lower GIMPLE will not be so low level that more combinations will exist than the available named patterns. 
It should be possible to write a gen* tool using rtx_costs to compute some useful cost metric for all named patterns. How complicated that could be (modes, reg vs. mem, etc.), I don't know... But at least that way we don't end up with multiple target costs depending on the IR in use. Yeah, it occurred to me as well that when we look for supportable operations via optabs, the same mechanism should also be able to provide a cost, maybe as simple as attaching one to the named expanders. Generating RTL from GIMPLE passes just to be able to use rtx_cost is, well ... gross. Yes, we do it in IVOPTs (and that case is even more complex), but I don't think we want to start doing it elsewhere (look how the vectorizer for example uses new target hooks instead of generating vectorized RTL and then using rtx_cost). Richard. Ciao! Steven
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Wed, Jul 13, 2011 at 10:30:58AM -0400, Jason Merrill wrote: On 07/13/2011 10:06 AM, Jakub Jelinek wrote: --- gcc/testsuite/lib/dg-pch.exp.jj 2011-01-03 18:58:03.0 +0100 +++ gcc/testsuite/lib/dg-pch.exp 2011-07-12 23:13:50.943670171 +0200 - dg-test -keep-output ./$bname$suffix $otherflags $flags + dg-test -keep-output ./$bname$suffix -gno-record-gcc-switches $otherflags $flags It is only necessary if somebody wants to make -grecord-gcc-switches the default (for bootstrap/regtest I've tweaked common.opt to do that to test it better). PCH is a big mess and screws debuginfo in many ways; in this case it was just small differences in DW_AT_producer, but we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc. Why would PCH change DW_AT_producer? Because we're restoring single_comp_unit_die from the PCH? Then perhaps we should set DW_AT_producer in output_comp_unit rather than gen_compile_unit_die. output_comp_unit is too late, because at that point the form has been finalized, sizes calculated, etc. But at the start of dwarf2out_finish it is possible. Here is a PCH-friendly variant of the patch which tries to set the right producer from the start, but early in dwarf2out_finish double-checks whether PCH has changed it and, if it did, updates it back to the expected string.

2011-07-14  Jakub Jelinek  ja...@redhat.com

	PR other/32998
	* common.opt (grecord-gcc-switches, gno-record-gcc-switches): New
	options.
	* dwarf2out.c: Include opts.h.
	(dchar_p): New typedef.  Define heap VEC for it.
	(producer_string): New variable.
	(gen_producer_string): New function.
	(gen_compile_unit_die): Use it.
	(dwarf2out_finish): Fix up comp_unit_die () DW_AT_producer if needed.
	* Makefile.in (dwarf2out.o): Depend on $(OPTS_H).
--- gcc/common.opt.jj	2011-07-13 17:31:09.000000000 +0200
+++ gcc/common.opt	2011-07-14 10:36:40.000000000 +0200
@@ -2184,6 +2184,14 @@
 ggdb
 Common JoinedOrMissing
 Generate debug information in default extended format
 
+gno-record-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0)
+Don't record gcc command line switches in DWARF DW_AT_producer.
+
+grecord-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,1)
+Record gcc command line switches in DWARF DW_AT_producer.
+
 gstabs
 Common JoinedOrMissing Negative(gstabs+)
 Generate debug information in STABS format
--- gcc/dwarf2out.c.jj	2011-07-13 17:31:18.000000000 +0200
+++ gcc/dwarf2out.c	2011-07-14 11:04:50.000000000 +0200
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "tree-flow.h"
 #include "cfglayout.h"
+#include "opts.h"
 
 static void dwarf2out_source_line (unsigned int, const char *, int, bool);
 static rtx last_var_location_insn;
@@ -18108,13 +18109,125 @@ gen_ptr_to_mbr_type_die (tree type, dw_d
   add_type_attribute (ptr_die, TREE_TYPE (type), 0, 0, context_die);
 }
 
+typedef const char *dchar_p; /* For DEF_VEC_P.  */
+DEF_VEC_P(dchar_p);
+DEF_VEC_ALLOC_P(dchar_p,heap);
+
+static char *producer_string;
+
+/* Return a heap allocated producer string including command line options
+   if -grecord-gcc-switches.  */
+
+static char *
+gen_producer_string (void)
+{
+  size_t j;
+  VEC(dchar_p, heap) *switches = NULL;
+  const char *language_string = lang_hooks.name;
+  char *producer, *tail;
+  const char *p;
+  size_t len = dwarf_record_gcc_switches ? 0 : 3;
+  size_t plen = strlen (language_string) + 1 + strlen (version_string);
+
+  for (j = 1; dwarf_record_gcc_switches && j < save_decoded_options_count; j++)
+    switch (save_decoded_options[j].opt_index)
+      {
+      case OPT_o:
+      case OPT_d:
+      case OPT_dumpbase:
+      case OPT_dumpdir:
+      case OPT_auxbase:
+      case OPT_auxbase_strip:
+      case OPT_quiet:
+      case OPT_version:
+      case OPT_v:
+      case OPT_w:
+      case OPT_L:
+      case OPT_D:
+      case OPT_I:
+      case OPT_U:
+      case OPT_SPECIAL_unknown:
+      case OPT_SPECIAL_ignore:
+      case OPT_SPECIAL_program_name:
+      case OPT_SPECIAL_input_file:
+      case OPT_grecord_gcc_switches:
+      case OPT_gno_record_gcc_switches:
+      case OPT__output_pch_:
+      case OPT_fdiagnostics_show_location_:
+      case OPT_fdiagnostics_show_option:
+      case OPT:
+      case OPT__sysroot_:
+	/* Ignore these.  */
+	continue;
+      default:
+	if (save_decoded_options[j].orig_option_with_args_text[0] != '-')
+	  continue;
+	switch (save_decoded_options[j].orig_option_with_args_text[1])
+	  {
+	  case 'M':
+	  case 'i':
+	  case 'W':
+	    continue;
+	  case 'n':
+	    if (save_decoded_options[j].orig_option_with_args_text[2] == 'o')
+	      continue;
+	    break;
+	  case 'f':
+	    if (strncmp (save_decoded_options[j].orig_option_with_args_text
+			 + 2, "dump", 4) == 0)
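The core idea of gen_producer_string, in a hypothetical miniature: build "LANGUAGE VERSION" and append only the recorded driver switches that affect code generation, skipping output/path/diagnostic options. All names here (`build_producer`, the filter set) are invented for illustration; GCC's real filter is the opt_index switch shown in the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: join recorded switches after "LANG VERSION", skipping ones
   that do not affect generated code (-o, -I, -D, -U, -L) and non-switch
   arguments.  A crude single-character filter stands in for the
   patch's per-option case list.  */
static void build_producer(char *out, size_t outsz, const char *lang,
                           const char *version, int argc, const char **argv)
{
  snprintf(out, outsz, "%s %s", lang, version);
  for (int i = 0; i < argc; i++)
    {
      const char *opt = argv[i];
      if (opt[0] != '-' || strchr("oIDUL", opt[1]))
        continue;                       /* skipped, like the Ignore cases */
      size_t used = strlen(out);
      snprintf(out + used, outsz - used, " %s", opt);
    }
}
```

The resulting string is what lands in DW_AT_producer, letting a debugger or build-reproducibility tool see, e.g., that an object was compiled with -O2 -g.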
Re: [build] Remove crt0, mcrt0 support
On Thu, 14 Jul 2011, Rainer Orth wrote: Richard, [Talking about an immediate removal of the netware target...] Jan Beulich jbeul...@novell.com writes: Rainer Orth r...@cebitec.uni-bielefeld.de 07/13/11 4:34 PM which variant would you prefer: obsoletion now and removal in 4.8, or immediate removal? Both are fine with me, so unless someone else objects, immediate removal would seem better given it had been pretty much unmaintained. Right: it would be a one-time effort to remove the support, but subsequent cleanups wouldn't have to deal with the effectively dead code. I had a quick look and it doesn't seem hard: apart from removing the netware-specific files in gcc and libgcc (and the corresponding gcc/config.gcc and libgcc/config.host changes), there's only a small list (apart from netware-related target triplets in the testsuite): config/elf.m4 configure.ac contrib/config-list.mk gcc/config/i386/i386.c gcc/config/i386/i386.h gcc/doc/extend.texi libstdc++-v3/crossconfig.m4 configure.ac may have to stay if binutils/src wants to retain the port, but that's about it. Let's see what the release managers/global reviewers think. what's your take on this? I'm fine with it if you install a deprecation patch on the 4.6 branch and mention that in the 4.6 changes.html. Richard. -- Richard Guenther rguent...@suse.de Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
[Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit?

ChangeLog:
2011-07-14  Daniel Carrera  dcarr...@gmail.com

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function based on the function in mpi.c
	with the same name.
	(_gfortran_caf_init): Use caf_runtime_error.
	* caf/mpi.c (caf_runtime_error): Add a note to keep in sync with
	the function in single.c.

-- I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===================================================================
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>	/* For fputs and fprintf.  */
 #include <stdlib.h>	/* For exit and malloc.  */
 #include <string.h>	/* For memcpy and memset.  */
+#include <stdarg.h>	/* For variadic arguments.  */
 
 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1  */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (int error, const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error: ");
+  va_start (ap, message);
+  vfprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (error);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
 		    char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,
 
   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
	{
	  *stat = 1;
	  if (errmsg_len > 0)
	    {
-	      const char msg[] = "Failed to allocate coarray";
	      int len = ((int) sizeof (msg) > errmsg_len) ?
		errmsg_len : (int) sizeof (msg);
	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
	  return NULL;
	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (1, msg);
     }
 
   if (stat)
Index: libgfortran/caf/mpi.c
===================================================================
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,6 +47,7 @@ static int caf_is_finalized;
 
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with single.c.  */
 static void
 caf_runtime_error (int error, const char *message, ...)
 {
@@ -62,7 +63,7 @@ caf_runtime_error (int error, const char
   MPI_Abort (MPI_COMM_WORLD, error);
   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (error);
 }
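A pitfall worth flagging in any variadic error helper like the one above: a `va_list` must be consumed with one of the v-prefixed printf functions (`vfprintf`, `vsnprintf`); passing the `va_list` object to plain `fprintf` as a variadic argument is undefined behaviour and prints garbage instead of the formatted values. A minimal, testable model (names invented for the example):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format a runtime-error message into buf.  Note vsnprintf, not
   snprintf: only the v* variants know how to walk a va_list.  */
static void format_error(char *buf, size_t n, const char *fmt, ...)
{
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(buf, n, fmt, ap);
  va_end(ap);
}
```

The real helper then writes the formatted text to stderr and exits with the supplied error code; using the buffer here simply makes the formatting observable.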
Re: [build] Move crtfastmath to toplevel libgcc
H.J. Lu hjl.to...@gmail.com writes: On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Richard Henderson r...@redhat.com writes: On 07/13/2011 09:57 AM, Rainer Orth wrote: Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? Probably. Ok. I'll take this on me to get us out of this mess. It has survived i386-pc-solaris2.11, sparc-sun-solaris2.11, x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the risk seems acceptable. +# -frandom-seed is necessary to keep the mangled name of the constructor on +# Tru64 Unix stable, but harmless otherwise. Instead of implying permanent stability, I'd mention bootstrap comparison failures specifically. Ok, will do. I think your patch caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739 Same on ia64: Configuration mismatch! Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o Andreas. -- Andreas Schwab, sch...@redhat.com GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E And now for something completely different.
Re: [build] Remove crt0, mcrt0 support
Richard, what's your take on this? I'm fine with it if you install a deprecation patch on the 4.6 branch and mention that in the 4.6 changes.html. ok, will do. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. The introduced target hook does not just define a default value for the option; it is meant to make a decision in each particular case. For now it returns a constant value, but march-dependent heuristics will be added later. Ilya
Correct fix for scheduler bug PR11320
PR11320 was a scheduler problem where an instruction was moved backwards across a branch, leading to

	addl r14 = @ltoffx(.LC2), r1
	;;
	(p7) addl r14 = 1, r0          --- r14 clobbered
	;;
	(p7) st4 [r15] = r14
	ld8.mov r14 = [r14], .LC2      --- crash (branch was here)

At the time, this was solved by a patch that recognizes when a register is conditionally set in a basic block, and then treats the branch as setting such a register: http://gcc.gnu.org/ml/gcc-patches/2003-07/msg01044.html The proposed fix is to say that JUMP_INSNs set the registers that are live on entry of the fallthrough block and are conditionally set before the jump, because they can be considered as restoring the former value of the conditionally set registers. While it is true that a jump could be seen as setting the correct value for a register, this isn't at all related to conditional sets. Consider a trivial example: a = b + 3; if (b < 0) { use a } else { use a } where the value of a is different in each branch of the else, with no conditional execution in sight. From the point of view of the uses, it can be argued that the jump sets the value, but this is irrelevant for scheduling purposes. There's nothing wrong with using a wrong value in an instruction hoisted across a branch if the result of that instruction will (eventually) be dead. We do this all the time in sched-ebb. The patch needlessly restricts scheduling opportunities and should be reverted. The real problem here is that the ia64 backend lies to the rest of the compiler; it produces a load instruction without showing a MEM anywhere in the instruction pattern. Hence the following patch, which reverts the bogus scheduler changes and adds a MEM to a pattern in ia64.md. The representation is somewhat weird, but no more so than before, where the load was represented as a plain LO_SUM. The documentation I have doesn't mention ld8.mov, so I'll have to leave it to the ia64 maintainers to choose between this change or implementing something more involved.
Bootstrapped and tested without regressions on ia64-linux with languages=c,c++,obj,fortran and --disable-shared; that was what I could get to work at all at the moment. Even then, C++ seems pretty broken. I've also built a 3.3 cross-cc1 to see if the original bug is still fixed (it is). Ok (scheduler & ia64 bits)? Bernd

Revert
2003-07-10  Eric Botcazou  ebotca...@libertysurf.fr

	PR optimization/11320
	* sched-int.h (struct deps) [reg_conditional_sets]: New field.
	(struct sched_info) [compute_jump_reg_dependencies]: New prototype.
	* sched-deps.c (sched_analyze_insn) [JUMP_INSN]: Update call to
	current_sched_info->compute_jump_reg_dependencies.  Record which
	registers are used and which registers are set by the jump.  Clear
	deps->reg_conditional_sets after a barrier.  Set
	deps->reg_conditional_sets if the insn is a COND_EXEC.  Clear
	deps->reg_conditional_sets if the insn is not a COND_EXEC.
	(init_deps): Initialize reg_conditional_sets.
	(free_deps): Clear reg_conditional_sets.
	* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
	Mark registers live on entry of the fallthrough block and
	conditionally set as set by the jump.  Mark registers live on
	entry of non-fallthrough blocks as used by the jump.
	* sched-rgn.c (compute_jump_reg_dependencies): New prototype.
	Mark new parameters as unused.

* config/ia64/ia64.md (load_symptr_low): Show a MEM.
* config/ia64/ia64.c (ia64_expand_load_address): Generate it.

Index: gcc/sched-ebb.c
===
--- gcc/sched-ebb.c (revision 176195)
+++ gcc/sched-ebb.c (working copy)
@@ -238,28 +238,18 @@
       return 1;
 }

-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that
+   must be considered as used by this jump in USED.
 */
 void
-ebb_compute_jump_reg_dependencies (rtx insn, regset cond_set, regset used,
-				   regset set)
+ebb_compute_jump_reg_dependencies (rtx insn, regset used)
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   edge_iterator ei;

   FOR_EACH_EDGE (e, ei, b->succs)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized.  Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath.  So we
-	 consider that it restores the value of the main codepath.  */
-      bitmap_and (set, df_get_live_in (e->dest),
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 11:48 AM, Daniel Carrera wrote: This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit? I was wondering - based on the discussion - whether one should remove the int error argument from caf_runtime_error and simply use exit (EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as follow up patch. Thus, OK for the patch as is - or with replacing all exit(...) by exit(EXIT_FAILURE), which uses stdlib.h's EXIT_FAILURE. One then can also drop the int error argument to the caf_runtime_error function. Thanks for the patch! Tobias ChangeLog: 2011-07-14 Daniel Carrera dcarr...@gmail.com * caf/single.c: Include stdarg.h header. (caf_runtime_error): New function based on the function in mpi.c with the same name. (_gfortran_caf_init): Use caf_runtime_error. * caf/mpi.c (caf_runtime_error): Add a note to keep in sync with the function in single.c.
Re: [Patch, Fortran] Allocate + CAF library
*ping* ? On 07/11/2011 08:16 PM, Daniel Carrera wrote: Hello, This is my largest patch so far and the first that I'll commit myself. This patch improves support for the ALLOCATE statement when using the coarray library. Specifically, it adds support for the stat= and errmsg= attributes: ALLOCATE( x(n)[*] , stat=i , errmsg=str ) These attributes are now written by the CAF library. This patch also involves a good amount of code cleanup. ChangeLog is attached. As soon as I get the go-ahead, I'll commit this patch. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich enkovich@gmail.com wrote: 2011/7/14 Richard Guenther richard.guent...@gmail.com: Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. Introduced target hook does not just define default value for option. It is meant to make decision in each particular case. For now it returns a constant value but march dependent heuristics will be added later. But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Ilya
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the int error argument from caf_runtime_error and simply use exit (EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as follow up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n -- Michael Meissner, IBM

Thanks for the note! Here is the fixed patch. Ilya

--

gcc/
2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (ftree-reassoc-width): New option documented.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of new hook for i386
	target.
	* common.opt (ftree-reassoc-width): New option added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used and
	calls rewrite_expr_tree_parallel instead of rewrite_expr_tree if
	needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/
2011-07-14  Enkovich Ilya  ilya.enkov...@intel.com

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	-ftree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

PR44382.diff Description: Binary data
Re: [build] Move crtfastmath to toplevel libgcc
Andreas Schwab sch...@redhat.com writes: Same on ia64: Configuration mismatch! Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o

I believe you need the following patch. As one of my next tasks, I'll work to completely move crtstuff/extra_parts/EXTRA_PARTS/EXTRA_MULTILIB_PARTS over to libgcc so there's no duplication of configuration between gcc and libgcc. If this tests ok, I suppose it can go in as obvious. Rainer

2011-07-14  Rainer Orth  r...@cebitec.uni-bielefeld.de

	Revert:
	* config.gcc (ia64*-*-elf*): Remove crtfastmath.o from extra_parts.
	(ia64*-*-freebsd*): Likewise.
	(ia64*-*-linux*): Likewise.
	(mips64*-*-linux*): Likewise.
	(mips*-*-linux*): Likewise.

Index: gcc/config.gcc
===
--- gcc/config.gcc (revision 176266)
+++ gcc/config.gcc (working copy)
@@ -1594,13 +1594,13 @@
 	then
 		target_cpu_default="${target_cpu_default}|MASK_GNU_LD"
 	fi
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} ia64/sysv4.h ia64/freebsd.h"
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
 	tmake_file="${tmake_file} ia64/t-ia64"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-linux*)
 	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ia64/sysv4.h ia64/linux.h"
@@ -1609,7 +1609,7 @@
 		tmake_file="${tmake_file} t-libunwind-elf ia64/t-glibc-libunwind"
 	fi
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-hpux*)
 	tm_file="${tm_file} dbxelf.h elfos.h ia64/sysv4.h ia64/hpux.h"
@@ -1849,6 +1849,7 @@
 		tm_defines="${tm_defines} MIPS_ISA_DEFAULT=65"
 		;;
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
@@ -1868,6 +1869,7 @@
 	mipsisa32*)
 		tm_defines="${tm_defines} MIPS_ISA_DEFAULT=32"
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	test x$with_llsc != x || with_llsc=yes
 	;;
 mips*-*-openbsd*)

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. The option is still useful if you want to try the feature on a platform with no hook implemented, and for other performance experiments. I agree that a --param would be better here. I'll fix it. Ilya
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:14 PM, Daniel Carrera wrote: On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the int error argument from caf_runtime_error and simply use exit (EXIT_FAILURE) for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as follow up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you. Well, for exit(3) one knows that one will get a surprise on Windows: an abort. By contrast, EXIT_FAILURE [but also exit(1)] has a very high chance of working (nearly) everywhere. At the end, as long as the operating system does not react in a completely odd way, the exit status does not matter - it's the problem of the caller (e.g. some script) to make use of the result. If EXIT_FAILURE is 0 and the script assumes that it is a success, it's either a fault of the script writer, or of the one putting 0 for EXIT_FAILURE into the header, or a fault of the OS designer. Thus, I think the chance that one avoids surprises is higher with exit(EXIT_FAILURE), and one avoids having some random number (for the exit status) in the code. Users usually do not check the exact exit status value but only its 0- vs non-0-ness. Hence, I think exit(EXIT_FAILURE) is better than the current version. Tobias
[PATCH] Only insert required conversions during gimplification
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther rguent...@suse.de * gimplify.c (gimplify_expr): Only do required conversions. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 176266) +++ gcc/gimplify.c (working copy) @@ -6787,22 +6787,20 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_NOT_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } + ret = gimplify_expr (TREE_OPERAND (*expr_p, 0), pre_p, post_p, +is_gimple_val, fb_rvalue); + recalculate_side_effects (*expr_p); + break; } - ret = gimplify_expr (TREE_OPERAND (*expr_p, 0), pre_p, post_p, - is_gimple_val, fb_rvalue); - recalculate_side_effects (*expr_p); - break; - case ADDR_EXPR: ret = gimplify_addr_expr (expr_p, pre_p, post_p); break; @@ -7227,40 +7225,36 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_OR_EXPR: case TRUTH_XOR_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - - /* This shouldn't happen, but due fold-const (and here especially - fold_truth_not_expr) happily uses operand type and doesn't - automatically uses boolean_type as result, we need to keep - orignal type. */ - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } - } - /* With two-valued operand types binary truth expressions are -semantically equivalent to bitwise binary expressions. Canonicalize -them to the bitwise variant. 
*/ switch (TREE_CODE (*expr_p)) - { - case TRUTH_AND_EXPR: - TREE_SET_CODE (*expr_p, BIT_AND_EXPR); - break; - case TRUTH_OR_EXPR: - TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); - break; - case TRUTH_XOR_EXPR: - TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); - break; - default: - break; + /* Boolified binary truth expressions are semantically equivalent +to bitwise binary expressions. Canonicalize them to the +bitwise variant. */ + switch (TREE_CODE (*expr_p)) + { + case TRUTH_AND_EXPR: + TREE_SET_CODE (*expr_p, BIT_AND_EXPR); + break; + case TRUTH_OR_EXPR: + TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); + break; + case TRUTH_XOR_EXPR: + TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); + break; + default: + break; + } + + /* Continue classified as tcc_binary. */ + goto expr_2; } - /* Classified as tcc_expression. */ - goto expr_2; case FMA_EXPR: /* Classified as tcc_expression. */
Re: Correct fix for scheduler bug PR11320
Hello Bernd, FWIW, we have discussed your change with Alexander and we think you are right about the scheduler changes. One question is: On 14.07.2011 14:03, Bernd Schmidt wrote:

--- gcc/sched-deps.c	(revision 176195)
+++ gcc/sched-deps.c	(working copy)
@@ -568,7 +568,7 @@
 	     (rev1==rev2
 	      ? reversed_comparison_code (cond2, NULL)
 	      : GET_CODE (cond2))
-	  && XEXP (cond1, 0) == XEXP (cond2, 0)
+	  && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0))
 	  && XEXP (cond1, 1) == XEXP (cond2, 1))
 	return 1;
       return 0;

this hunk from conditions_mutex_p seems to be unrelated? Andrey
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:20, Andrey Belevantsev wrote: Hello Bernd, FWIW, we have discussed your change with Alexander and we think you are right about the scheduler changes. One question is: On 14.07.2011 14:03, Bernd Schmidt wrote:

--- gcc/sched-deps.c	(revision 176195)
+++ gcc/sched-deps.c	(working copy)
@@ -568,7 +568,7 @@
 	     (rev1==rev2
 	      ? reversed_comparison_code (cond2, NULL)
 	      : GET_CODE (cond2))
-	  && XEXP (cond1, 0) == XEXP (cond2, 0)
+	  && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0))
 	  && XEXP (cond1, 1) == XEXP (cond2, 1))
 	return 1;
       return 0;

this hunk from conditions_mutex_p seems to be unrelated? Oh yes, sorry about that. That was approved a while ago and I haven't gotten around to checking it in. Bernd
[patch] fix typo in doc/extend.texi, committed as obvious
fix a typo in doc/extend.texi, committed as obvious. Matthias 2011-07-14 Matthias Klose d...@ubuntu.com * doc/extend.texi (optimize attribute): Fix typo. Index: doc/extend.texi === --- doc/extend.texi (revision 176263) +++ doc/extend.texi (working copy) @@ -3594,7 +3594,7 @@ @item cpu=@var{CPU} @cindex @code{target(cpu=@var{CPU})} attribute Specify the architecture to generate code for when compiling the -function. If you select the @code{target(cpu=power7)} attribute when +function. If you select the @code{target(cpu=power7)} attribute when generating 32-bit code, VSX and Altivec instructions are not generated unless you use the @option{-mabi=altivec} option on the command line.
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 2:15 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz ktiet...@googlemail.com wrote: 2011/7/13 Richard Guenther richard.guent...@gmail.com: On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz ktiet...@googlemail.com wrote: Sorry, the TRUTH_NOT_EXPR isn't the point here at all. The underlying issue is that fold-const re-introduces TRUTH_AND/OR and co. I'm very sure it doesn't re-construct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. Well, look into fold-const's fold_binary_loc function and see

  /* ARG0 is the first operand of EXPR, and ARG1 is the second operand.
     First check for cases where an arithmetic operation is applied to a
     compound, conditional, or comparison operation.  Push the arithmetic
     operation inside the compound or conditional to see if any folding
     can then be done.  Convert comparison to conditional for this purpose.
     The also optimizes non-constant cases that used to be done in
     expand_expr.  Before we do that, see if this is a BIT_AND_EXPR or a
     BIT_IOR_EXPR, one of the operands is a comparison and the other is a
     comparison, a BIT_AND_EXPR with the constant 1, or a truth value.  In
     that case, the code below would make the expression more complex.
     Change it to a TRUTH_{AND,OR}_EXPR.  Likewise, convert a similar
     NE_EXPR to TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a
     TRUTH_XOR_EXPR.  */

  if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR
       || code == EQ_EXPR || code == NE_EXPR)
      && ((truth_value_p (TREE_CODE (arg0))
           && (truth_value_p (TREE_CODE (arg1))
               || (TREE_CODE (arg1) == BIT_AND_EXPR
                   && integer_onep (TREE_OPERAND (arg1, 1)))))
          || (truth_value_p (TREE_CODE (arg1))
              && (truth_value_p (TREE_CODE (arg0))
                  || (TREE_CODE (arg0) == BIT_AND_EXPR
                      && integer_onep (TREE_OPERAND (arg0, 1)))))))
    {
      tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR
                             : code == BIT_IOR_EXPR ? TRUTH_OR_EXPR
                             : TRUTH_XOR_EXPR,
                             boolean_type_node,
                             fold_convert_loc (loc, boolean_type_node, arg0),
                             fold_convert_loc (loc, boolean_type_node, arg1));

      if (code == EQ_EXPR)
        tem = invert_truthvalue_loc (loc, tem);

      return fold_convert_loc (loc, type, tem);
    }

Here TRUTH_AND/TRUTH_OR gets introduced unconditionally if the operands are of truth kind. This is, btw, the point why you see those cases handled. But as soon as this part is turned off for BIT_IOR/AND, we need to do the folding for the 1-bit precision case explicitly. First of all this checks for a quite complex pattern - where do we pass such a complex pattern from the gimple level to fold? For the EQ/NE_EXPR case forwprop probably might be able to feed it that, but then how does it go wrong? The above could also simply be guarded by !in_gimple_form. Richard. See the reassoc pass as an example. I don't see where. And this hacky maybe_fold_and_comparisons / maybe_fold_or_comparisons functions. Yes, that case is known - but unless the folding result is sane we drop it anyway. As indeed we still want to be able to do comparison foldings without getting back a TRUTH-op. Well, if we get back one we have canonicalize_cond_expr_cond to make it sane again. Additionally we have a lot of passes - like the vectorizer - which happily try to build new conditions on the tree level. This is another That's true and you promised to clean up remaining TRUTH_* consumers / producers when doing the canonicalization to BIT_*. At the moment they get back to BIT_* via force_gimple_operand* which is probably even fine. place I saw issues and tree-cfg failures. And last but not least those truth-ops might be reintroduced in gimple_fold, as soon as we see bitwise-ops on a one-bit precision integral type as truth_value. Well, I already said that one-bit-precision as truth_value is wrong. Relying on fold-const.c here with its very strict type rules always was a stretch.
If it turns out there are now cases that we want to do differently then we finally have to bite the bullet and do it. There's always two options - duplicate it properly in fold_stmt (or forwprop in case it needs to lookup defs), or remove it from fold and make sure we fold things using fold_stmt at gimplification time. Richard. Kai
Re: [patch] fix typo in doc/extend.texi, committed as obvious
On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose d...@ubuntu.com wrote: fix a typo in doc/extend.texi, committed as obvious. Specify the architecture to generate code for when compiling the -function. If you select the @code{target(cpu=power7)} attribute when +function. If you select the @code{target(cpu=power7)} attribute when still mismatched number of ''s. I suggest to drop the first one. Richard. Matthias
Re: RFA: Avoid unnecessary clearing in union initialisers
H.J. Lu hjl.to...@gmail.com writes: This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736 Sorry for the breakage. It was due to a humiliatingly stupid mistake in the hunk to update all_zeros_p:

@@ -5129,13 +5152,12 @@ mostly_zeros_p (const_tree exp)
 all_zeros_p (const_tree exp)
 {
   if (TREE_CODE (exp) == CONSTRUCTOR)
-    {
-      HOST_WIDE_INT nz_elts, count;
-      bool must_clear;
+      HOST_WIDE_INT nz_elts, init_elts;
+      bool complete_p;

-      categorize_ctor_elements (exp, &nz_elts, &count, &must_clear);
-      return nz_elts == 0;
+      categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p);
+      return nz_elts == init_elts;
     }
   return initializer_zerop (exp);

which was due to trying a different approach and not properly reverting it. I've applied the following as obvious. Tested on x86_64-linux-gnu, and on 255.vortex. Richard

gcc/
	PR middle-end/49736
	* expr.c (all_zeros_p): Undo bogus part of last change.

Index: gcc/expr.c
===
--- gcc/expr.c	2011-07-14 11:46:42.0 +0100
+++ gcc/expr.c	2011-07-14 11:47:40.0 +0100
@@ -5157,7 +5157,7 @@ all_zeros_p (const_tree exp)
       bool complete_p;

       categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p);
-      return nz_elts == init_elts;
+      return nz_elts == 0;
     }
   return initializer_zerop (exp);
Re: Ping: The TI C6X port
On 06/14/11 22:52, Vladimir Makarov wrote: On 06/06/2011 07:26 AM, Bernd Schmidt wrote: Ping^3 for the C6X port. Now with extra patches: Additional preliminary scheduler tweaks: http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02408.html It is ok for me. Thanks, Bernd. I've committed this with one change: I've dropped the hunk that resolves debug_insn dependencies rather than deleting them. We should do that, but when I tested these changes together with the backtracking scheduler, I saw some problems, so I'll postpone this. Bernd
Re: Correct fix for scheduler bug PR11320
The real problem here is that the ia64 backend lies to the rest of the compiler; it produces a load instruction without showing a MEM anywhere in the instruction pattern. Hence, the following patch, which reverts the bogus scheduler changes and adds a MEM to a pattern in ia64.md. This is probably the root cause of the problem, indeed. But you don't revert everything so, if this isn't an oversight, then the ChangeLog is incorrect. And there is another change in sched-deps.c not mentioned in the ChangeLog. -- Eric Botcazou
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:57, Eric Botcazou wrote: The real problem here is that the ia64 backend lies to the rest of the compiler; it produces a load instruction without showing a MEM anywhere in the instruction pattern. Hence, the following patch, which reverts the bogus scheduler changes and adds a MEM to a pattern in ia64.md. This is probably the root cause of the problem, indeed. But you don't revert everything so, if this isn't an oversight, then the ChangeLog is incorrect. And there is another change in sched-deps.c not mentioned in the ChangeLog. Well, the actual code has completely changed in the meantime. All the hunks of the original patch failed :) I can write a new ChangeLog entry if that seems important. Any particular bits you still see that don't get reverted with this patch? Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 14:18, Eric Botcazou wrote: Any particular bits you still see that don't get reverted with this patch? The ebb_compute_jump_reg_dependencies changes. The original patch has: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. Mark registers live on entry of non-fallthrough blocks as used by the jump. but you're reverting only: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. ???

Original code:

  basic_block b = BLOCK_FOR_INSN (insn);
  edge e;
  for (e = b->succ; e; e = e->succ_next)
!   if ((e->flags & EDGE_FALLTHRU) == 0)
!     {
!       bitmap_operation (set, set, e->dest->global_live_at_start,
!                         BITMAP_IOR);
!     }
  }

Code after the revert:

  FOR_EACH_EDGE (e, ei, b->succs)
+   if ((e->flags & EDGE_FALLTHRU) == 0)
      bitmap_ior_into (used, df_get_live_in (e->dest));

As far as I can tell these are identical, modulo the change in variable name (set -> used, which seems like a better name). Bernd
Re: Correct fix for scheduler bug PR11320
Any particular bits you still see that don't get reverted with this patch? The ebb_compute_jump_reg_dependencies changes. The original patch has: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. Mark registers live on entry of non-fallthrough blocks as used by the jump. but you're reverting only: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. -- Eric Botcazou
[PATCH] Fold to more canonical TRUTH_NOT_EXPRs
The following generates boolean-typed TRUTH_NOT_EXPRs instead of operating on converted boolean operands. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard.

2011-07-14  Richard Guenther  rguent...@suse.de

	* fold-const.c (fold_binary_loc): Convert the !bool_var result,
	not bool_var when folding bool_var != 1 or bool_var == 0.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c	(revision 176266)
+++ gcc/fold-const.c	(working copy)
@@ -12156,14 +12156,16 @@ fold_binary_loc (location_t loc,
       /* bool_var != 1 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
	  && integer_onep (arg1)
	  && code == NE_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* bool_var == 0 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE
	  && integer_zerop (arg1)
	  && code == EQ_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* !exp != 0 becomes !exp */
       if (TREE_CODE (arg0) == TRUTH_NOT_EXPR
	  && integer_zerop (arg1)
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 09:58:08AM -0700, Richard Henderson wrote: Why the force_operand? You've got register inputs. Either the target is going to support the operation or it isn't. I agree that it doesn't seem to be necessary. I've used force_operand since ivopts (add_cost) is doing it without seeing a clear reason for it. So I've removed it now. Saving cost data dependent on speed, which is non-constant. You probably need to make this a two dimensional array. Fixed. Here is an updated version. Bye, -Andreas- 2011-07-14 Andreas Krebbel andreas.kreb...@de.ibm.com * tree-ssa-math-opts.c (compute_costs): New function. (convert_mult_to_fma): Take costs into account when propagating multiplications into several additions. * config/s390/s390.c (z196_costs): Adjust costs for madbr and maebr. Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c.orig --- gcc/tree-ssa-math-opts.c *** convert_plusminus_to_widen (gimple_stmt_ *** 2185,2190 --- 2185,2236 return true; } + /* Computing the costs for calculating RTX with CODE in MODE. */ + + static unsigned + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed) + { + rtx insn; + unsigned cost; + + switch (GET_RTX_LENGTH (code)) + { + case 2: + insn = gen_rtx_fmt_ee (code, mode, +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)); + break; + case 3: + insn = gen_rtx_fmt_eee (code, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3)); + break; + default: + gcc_unreachable (); + } + + if (dump_file (dump_flags TDF_DETAILS)) + { + fprintf (dump_file, Calculating costs of %s in %s mode. RTX is:\n, + GET_RTX_NAME (code), GET_MODE_NAME (mode)); + print_rtl (dump_file, insn); + } + + cost = rtx_cost (insn, SET, speed); + + /* If the backend returns a cost of zero it is most certainly lying. + Set this to one in order to notice that we already calculated it + once. 
*/ + cost = cost ? cost : 1; + + if (dump_file (dump_flags TDF_DETAILS)) + fprintf (dump_file, \n%s in %s costs %d\n\n, + GET_RTX_NAME (code), GET_MODE_NAME (mode), cost); + + return cost; + } + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ *** convert_mult_to_fma (gimple mul_stmt, tr *** 2197,2202 --- 2243,2254 gimple use_stmt, neguse_stmt, fma_stmt; use_operand_p use_p; imm_use_iterator imm_iter; + enum machine_mode mode; + int uses = 0; + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[2][NUM_MACHINE_MODES]; + static unsigned add_cost[2][NUM_MACHINE_MODES]; + static unsigned fma_cost[2][NUM_MACHINE_MODES]; if (FLOAT_TYPE_P (type) flag_fp_contract_mode == FP_CONTRACT_OFF) *** convert_mult_to_fma (gimple mul_stmt, tr *** 2213, if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. ! This means we assume that an FMA operation has the same cost ! as an addition. */ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; --- 2265,2281 if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; + mode = TYPE_MODE (type); + + if (!fma_cost[speed][mode]) + { + fma_cost[speed][mode] = compute_costs (mode, FMA, speed); + add_cost[speed][mode] = compute_costs (mode, PLUS, speed); + mul_cost[speed][mode] = compute_costs (mode, MULT, speed); + } + /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. 
*/ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; *** convert_mult_to_fma (gimple mul_stmt, tr *** 2292,2297 --- 2351,2357 if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; + uses++; /* While it is possible to validate whether or not the exact form that we've recognized is available in the backend, the assumption is that the transformation is never a loss. For instance, suppose
[PATCH] More efficiently canonicalize operands in fold_stmt
We currently go through a tree expression generation when canonicalizing operand order. That's quite wasteful. The following patch re-orders things in a way to avoid this and remove some duplicate code. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther rguent...@suse.de * gimple-fold.c (fold_gimple_assign): Remove operand swapping. (fold_stmt_1): Do it here directly on gimple and as a first thing. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 176267) --- gcc/gimple-fold.c (working copy) *** fold_gimple_assign (gimple_stmt_iterator *** 817,842 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. */ - if (commutative_tree_code (subcode) -tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build2 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), -gimple_assign_rhs2 (stmt), -gimple_assign_rhs1 (stmt)); } break; --- 817,831 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; } break; *** fold_gimple_assign (gimple_stmt_iterator *** 852,869 STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. 
*/ - if (commutative_ternary_tree_code (subcode) -tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build3 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), - gimple_assign_rhs2 (stmt), - gimple_assign_rhs1 (stmt), - gimple_assign_rhs3 (stmt)); } break; --- 841,846 *** fold_stmt_1 (gimple_stmt_iterator *gsi, *** 1576,1583 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! tree new_rhs = fold_gimple_assign (gsi); tree lhs = gimple_assign_lhs (stmt); if (new_rhs !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs))) --- 1553,1574 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! enum tree_code subcode = gimple_assign_rhs_code (stmt); tree lhs = gimple_assign_lhs (stmt); + tree new_rhs; + /* First canonicalize operand order. This avoids building new + trees if this is the only thing fold would later do. */ + if ((commutative_tree_code (subcode) +|| commutative_ternary_tree_code (subcode)) +tree_swap_operands_p (gimple_assign_rhs1 (stmt), +gimple_assign_rhs2 (stmt), false)) + { + tree tem = gimple_assign_rhs1 (stmt); + gimple_assign_set_rhs1 (stmt, gimple_assign_rhs2 (stmt)); + gimple_assign_set_rhs2 (stmt, tem); + changed = true; + } + new_rhs = fold_gimple_assign (gsi); if (new_rhs !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
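The point of the patch above is that operand canonicalization can be done by swapping the two operand slots in place on the gimple statement, instead of building a fresh tree expression. A toy model of that idea, with an illustrative struct and an illustrative ordering predicate standing in for tree_swap_operands_p:

```c
#include <stdbool.h>

/* Minimal model of the in-place canonicalization done in fold_stmt_1.
   The struct and the ordering rule are illustrative only.  */
struct binop { int code; int op1; int op2; };

/* Stand-in for tree_swap_operands_p: true when the operands are not
   in canonical order (here: smaller id first).  */
static bool
should_swap (int op1, int op2)
{
  return op1 > op2;
}

/* Swap the operand slots of a commutative operation in place.
   Returns true if the statement changed.  */
static bool
canonicalize (struct binop *s, bool commutative)
{
  if (commutative && should_swap (s->op1, s->op2))
    {
      int tem = s->op1;
      s->op1 = s->op2;
      s->op2 = tem;
      return true;
    }
  return false;
}
```

Doing this first means later folding sees operands in canonical order and no throwaway tree needs to be built just to reorder them.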
Re: Correct fix for scheduler bug PR11320
??? Original code: basic_block b = BLOCK_FOR_INSN (insn); edge e; for (e = b->succ; e; e = e->succ_next) ! if ((e->flags & EDGE_FALLTHRU) == 0) ! { ! bitmap_operation (set, set, e->dest->global_live_at_start, ! BITMAP_IOR); ! } } Code after the revert: FOR_EACH_EDGE (e, ei, b->succs) +if ((e->flags & EDGE_FALLTHRU) == 0) bitmap_ior_into (used, df_get_live_in (e->dest)); As far as I can tell these are identical, modulo the change in variable name (set -> used, which seems like a better name). Yes, the code does the same thing, but the original patch did clear up the set/use confusion in sched_analyze_insn and compute_jump_reg_dependencies, in particular in the comment of the latter. But OK, never mind. -- Eric Botcazou
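Both versions of the loop being compared compute the same thing: the union (IOR) of the live-in sets of every successor reached by a non-fallthrough edge. A self-contained model, with plain unsigned masks standing in for GCC's bitmaps:

```c
#include <stddef.h>

#define EDGE_FALLTHRU 1u

/* Illustrative stand-ins for GCC's edge and bitmap structures.  */
struct edge { unsigned flags; unsigned dest_live_in; };

/* IOR together the live-in sets of all non-fallthrough successors,
   mirroring the loop discussed above.  */
static unsigned
union_nonfallthru_live_in (const struct edge *edges, size_t n)
{
  unsigned used = 0;
  for (size_t i = 0; i < n; i++)
    if ((edges[i].flags & EDGE_FALLTHRU) == 0)
      used |= edges[i].dest_live_in;
  return used;
}
```

The fallthrough successor is skipped because its live-in set is handled separately by the scheduler.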
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
This update changes only the context modified by changes to patch 2. The patch has already been approved. I'm just posting it for completeness. Andrew 2011-07-14 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler mul } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler umlal } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2, tmp; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. 
*/ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, type2, mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, type2, mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + There should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate.
*/ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's the right sort. */ + if (TYPE_UNSIGNED (from_type) != is_unsigned + && !(is_unsigned && TYPE_PRECISION (from_type) > data_size)) + return false; + } + /* else convert is a no-op for our purposes. */ +} + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */
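The precision rules in the patch above can be restated as a small predicate: an intervening conversion is acceptable only if it cannot change the product's value, where data_size is the width the widening multiply can produce (TYPE_PRECISION (type1) + TYPE_PRECISION (type2)). This sketch mirrors that logic with illustrative parameter names; it is not GCC code.

```c
#include <stdbool.h>

/* Does a conversion from from_prec bits to to_prec bits preserve a
   product that needs data_size bits?  mult_unsigned says whether both
   multiply operands are unsigned; from_unsigned is the signedness of
   the conversion source.  Illustrative restatement of the patch.  */
static bool
conversion_ok (unsigned from_prec, unsigned to_prec,
               bool from_unsigned, bool mult_unsigned, unsigned data_size)
{
  if (from_prec > to_prec)      /* Conversion is a truncate.  */
    return to_prec >= data_size;
  if (from_prec < to_prec)      /* Extend: must be the right sort.  */
    return from_unsigned == mult_unsigned
           || (mult_unsigned && from_prec > data_size);
  return true;                  /* Same precision: a no-op.  */
}
```

For example, truncating a 32-bit product of two 16-bit operands down to 16 bits discards product bits, so the transform must be rejected, while sign-extending a signed 32-bit product to 64 bits is harmless.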
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
I've updated this patch following the changes earlier in the patch series. There isn't much left. This should obviate all the review comments. :) OK? Andrew 2011-07-14 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME. Ensure that the larger type is the first operand. gcc/testsuite/ * gcc.target/arm/wmul-7.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-7.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +unsigned long long +foo (unsigned long long a, unsigned char *b, unsigned short *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler umlal } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2053,9 +2053,17 @@ is_widening_mult_p (gimple stmt, *type2_out = *type1_out; } - /* FIXME: remove this restriction. */ - if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out)) -return false; + /* Ensure that the larger of the two operands comes first. */ + if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out)) +{ + tree tmp; + tmp = *type1_out; + *type1_out = *type2_out; + *type2_out = tmp; + tmp = *rhs1_out; + *rhs1_out = *rhs2_out; + *rhs2_out = tmp; +} return true; }
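The swap in the patch above must keep the type slots and the operand slots in sync: when the types are exchanged, the operands must be exchanged too, or the pairing is corrupted. A minimal model of that invariant:

```c
/* Illustrative model of the reordering in is_widening_mult_p: put the
   operand with the larger precision first, swapping type and operand
   slots together.  */
struct operands { unsigned prec1, prec2; int rhs1, rhs2; };

static void
put_larger_first (struct operands *w)
{
  if (w->prec1 < w->prec2)
    {
      unsigned tp = w->prec1; w->prec1 = w->prec2; w->prec2 = tp;
      int tr = w->rhs1; w->rhs1 = w->rhs2; w->rhs2 = tr;
    }
}
```

After this normalization, later code can assume the first operand is never narrower than the second.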
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
On Thu, Jul 14, 2011 at 4:28 PM, Andrew Stubbs a...@codesourcery.com wrote: I've updated this patch following the changes earlier in the patch series. There isn't much left. This should obviate all the review comments. :) Indeed ;) OK? Ok. Thanks, Richard. Andrew
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On 07/07/11 11:13, Richard Guenther wrote: This updates the context changed by my update to patch 3. The content of this patch has not changed. Ok. I know this patch was already approved, but I discovered a bug in this patch that missed optimizing the case where the input to multiply did not come from an assign statement (this can happen when the value comes from a function parameter). This patch fixes that case, and updates the context changed by my updates earlier in the patch series. OK? Andrew 2011-07-14 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'rhs'. Don't reject non-conversion statements. Do return lhs in this case. (is_widening_mult_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'stmt'. Pass type to is_widening_mult_rhs_p. (convert_mult_to_widen): Pass type to is_widening_mult_p. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-8.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-8.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +long long +foo (long long a, int *b, int *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler smlal } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1965,7 +1965,8 @@ struct gimple_opt_pass pass_optimize_bswap = } }; -/* Return true if RHS is a suitable operand for a widening multiplication. +/* Return true if RHS is a suitable operand for a widening multiplication, + assuming a target type of TYPE. There are two cases: - RHS makes some value at least twice as wide. Store that value @@ -1975,32 +1976,43 @@ struct gimple_opt_pass pass_optimize_bswap = but leave *TYPE_OUT untouched. 
*/ static bool -is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) +is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, + tree *new_rhs_out) { gimple stmt; - tree type, type1, rhs1; + tree type1, rhs1; enum tree_code rhs_code; if (TREE_CODE (rhs) == SSA_NAME) { - type = TREE_TYPE (rhs); stmt = SSA_NAME_DEF_STMT (rhs); if (!is_gimple_assign (stmt)) - return false; - - rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) - return false; + { + rhs1 = NULL; + type1 = TREE_TYPE (rhs); + } + else + { + rhs1 = gimple_assign_rhs1 (stmt); + type1 = TREE_TYPE (rhs1); + } - rhs1 = gimple_assign_rhs1 (stmt); - type1 = TREE_TYPE (rhs1); if (TREE_CODE (type1) != TREE_CODE (type) || TYPE_PRECISION (type1) * 2 TYPE_PRECISION (type)) return false; - *new_rhs_out = rhs1; + if (rhs1) + { + rhs_code = gimple_assign_rhs_code (stmt); + if (TREE_CODE (type) == INTEGER_TYPE + ? !CONVERT_EXPR_CODE_P (rhs_code) + : rhs_code != FIXED_CONVERT_EXPR) + *new_rhs_out = rhs; + else + *new_rhs_out = rhs1; + } + else + *new_rhs_out = rhs; *type_out = type1; return true; } @@ -2015,28 +2027,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) return false; } -/* Return true if STMT performs a widening multiplication. If so, - store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT - respectively. Also fill *RHS1_OUT and *RHS2_OUT such that converting - those operands to types *TYPE1_OUT and *TYPE2_OUT would give the - operands of the multiplication. */ +/* Return true if STMT performs a widening multiplication, assuming the + output type is TYPE. If so, store the unwidened types of the operands + in *TYPE1_OUT and *TYPE2_OUT respectively. Also fill *RHS1_OUT and + *RHS2_OUT such that converting those operands to types *TYPE1_OUT + and *TYPE2_OUT would give the operands of the multiplication. 
*/ static bool -is_widening_mult_p (gimple stmt, +is_widening_mult_p (tree type, gimple stmt, tree *type1_out, tree *rhs1_out, tree *type2_out, tree *rhs2_out) { - tree type; - - type = TREE_TYPE (gimple_assign_lhs (stmt)); if (TREE_CODE (type) != INTEGER_TYPE && TREE_CODE (type) != FIXED_POINT_TYPE) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out, + rhs1_out)) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out, + rhs2_out)) return false; if (*type1_out == NULL) @@ -2088,7 +2099,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) if (TREE_CODE (type) != INTEGER_TYPE) return false; - if (!is_widening_mult_p (stmt, type1, rhs1, type2, rhs2)) + if
Re: [PATCH][1/N][C][C++][Fortran][Java] Change POINTER_PLUS_EXPR offset type requirements
On 07/14/2011 04:28 AM, Richard Guenther wrote: On Tue, 12 Jul 2011, Richard Guenther wrote: This patch is step 1, it abstracts away the type of the offset operand when building a POINTER_PLUS_EXPR. It does so by introducing fold_build_pointer_plus_expr{_hwi,_loc} helpers and use them in all places that currently build POINTER_PLUS_EXPRs (including those that do not fold the result). This patch will make steps 2 and more purely middle-end (fingers crossed). Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages. I also built cc1 for all supported targets via contrib/config-list.mk and saw no fails that relate to this patch. Are the frontend parts ok? Ping. Joseph seems to be on vacation - Richard, can you approve the C frontend/family parts? Yeah, this patch is ok. r~
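For readers unfamiliar with POINTER_PLUS_EXPR: the new fold_build_pointer_plus helpers hide the offset's type from callers, who just supply a pointer and a byte offset. In plain C terms, the operation they build amounts to a byte-offset addition; this hypothetical helper is only an analogy for what callers get, not GCC's API.

```c
#include <stddef.h>

/* Analogy for a fold_build_pointer_plus-style helper: add a byte
   offset to a pointer without the caller having to pick the "right"
   offset type.  Hypothetical, for illustration only.  */
static void *
pointer_plus (void *p, ptrdiff_t byte_offset)
{
  return (char *) p + byte_offset;
}
```

Abstracting the offset type in one place is what lets the later middle-end steps change the required offset type without touching every call site.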
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On Thu, Jul 14, 2011 at 4:34 PM, Andrew Stubbs a...@codesourcery.com wrote: On 07/07/11 11:13, Richard Guenther wrote: This updates the context changed by my update to patch 3. The content of this patch has not changed. Ok. I know this patch was already approved, but I discovered a bug in this patch that missed optimizing the case where the input to multiply did not come from an assign statement (this can happen when the value comes from a function parameter). This patch fixes that case, and updates the context changed by my updates earlier in the patch series. OK? Ok. Thanks, Richard. Andrew
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On 28/06/11 17:23, Andrew Stubbs wrote: On 23/06/11 15:43, Andrew Stubbs wrote: Patch 4 introduced support for using signed multiplies to code unsigned multiplies in a narrower mode. Patch 5 then introduced support for mis-matched input modes. These two combined mean that there is a case where only the smaller of two inputs is unsigned, and yet it still tries to use a mode wider than the larger, signed input. This is bad because it means unnecessary extends and because the wider operation might not exist. This patch catches that case, and ensures that the smaller, unsigned input is zero-extended to match the mode of the larger, signed input. Of course, both inputs may still have to be extended to fit the nearest available instruction, so it doesn't make a difference every time. OK? This update fixes Janis' issue with the testsuite. And this version is updated to fit the changes made earlier in the series, and also to use the precision, instead of the mode-size, in order to better optimize bitfields. OK? Andrew 2011-06-24 Andrew Stubbs a...@codesourcery.com gcc/ * tree-ssa-math-opts.c (convert_mult_to_widen): Better handle unsigned inputs of different modes. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-9.c: New file. * gcc.target/arm/wmul-bitfield-2.c: New file.
--- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-9.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +long long +foo (long long a, short *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler smlalbb } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -march=armv7-a } */ + +struct bf +{ + int a : 3; + unsigned int b : 15; + int c : 3; +}; + +long long +foo (long long a, struct bf b, struct bf c) +{ + return a + b.b * c.c; +} + +/* { dg-final { scan-assembler smlalbb } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2121,9 +2121,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) { if (op != smul_widen_optab) { - from_mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_SIZE (to_mode) = GET_MODE_SIZE (from_mode)) - return false; + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. Note that type1 = type2, always. */ + if ((TYPE_UNSIGNED (type1) + TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (TYPE_UNSIGNED (type2) + TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) + { + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (to_mode) = GET_MODE_SIZE (from_mode)) + return false; + } op = smul_widen_optab; handler = find_widening_optab_handler_and_mode (op, to_mode, @@ -2290,14 +2299,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, /* There's no such thing as a mixed sign madd yet, so use a wider mode. */ if (from_unsigned1 != from_unsigned2) { - enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_PRECISION (mode) GET_MODE_PRECISION (to_mode)) + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. 
Note that type1 >= type2, always. */ + if ((from_unsigned1 + && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (from_unsigned2 + && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) { - from_mode = mode; - from_unsigned1 = from_unsigned2 = false; + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode)) + return false; } - else - return false; + + from_unsigned1 = from_unsigned2 = false; } /* If there was a conversion between the multiply and addition
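The transformation being refined is easiest to see in scalar C: when only the *smaller* operand is unsigned, zero-extending it into the larger signed type loses nothing, so a signed widening multiply of the larger mode already computes the exact mixed-sign product and no even-wider mode is needed.

```c
#include <stdint.h>

/* Mixed-sign widening multiply in miniature: the unsigned 16-bit
   operand zero-extends losslessly into int32_t, so a signed
   32x32->64 multiply gives the exact product.  */
static int64_t
mixed_widen_mul (int32_t a, uint16_t b)
{
  return (int64_t) a * (int32_t) b;
}
```

This works because every uint16_t value is representable in int32_t; the same reasoning fails if the unsigned operand is as wide as the signed one, which is why the patch keeps the wider-mode fallback for that case.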
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On Thu, Jul 14, 2011 at 4:38 PM, Andrew Stubbs a...@codesourcery.com wrote: On 28/06/11 17:23, Andrew Stubbs wrote: On 23/06/11 15:43, Andrew Stubbs wrote: Patch 4 introduced support for using signed multiplies to code unsigned multiplies in a narrower mode. Patch 5 then introduced support for mis-matched input modes. These two combined mean that there is case where only the smaller of two inputs is unsigned, and yet it still tries to user a mode wider than the larger, signed input. This is bad because it means unnecessary extends and because the wider operation might not exist. This patch catches that case, and ensures that the smaller, unsigned input, is zero-extended to match the mode of the larger, signed input. Of course, both inputs may still have to be extended to fit the nearest available instruction, so it doesn't make a difference every time. OK? This update fixes Janis' issue with the testsuite. And this version is updated to fit the changes made earlier in the series, and also to use the precision, instead of the mode-size, in order to better optimize bitfields. OK? Ok. Thanks, Richard. Andrew
[PATCH] Fix PR49651
This fixes PR49651 where we fail to handle aggregate dereferences in call arguments in PTA properly. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied where applicable. Richard. 2011-07-14 Richard Guenther rguent...@suse.de PR tree-optimization/49651 * tree-ssa-structalias.c (get_constraint_for_1): Properly handle dereferences with subvariables. * gcc.dg/torture/pr49651.c: New testcase. Index: gcc/tree-ssa-structalias.c === *** gcc/tree-ssa-structalias.c (revision 176272) --- gcc/tree-ssa-structalias.c (working copy) *** get_constraint_for_1 (tree t, VEC (ce_s, *** 3349,3357 /* If we are not taking the address then make sure to process all subvariables we might access. */ cs = *VEC_last (ce_s, *results); ! if (address_p ! || cs.type != SCALAR) return; vi = get_varinfo (cs.var); --- 3349,3366 /* If we are not taking the address then make sure to process all subvariables we might access. */ + if (address_p) + return; + cs = *VEC_last (ce_s, *results); ! if (cs.type == DEREF) ! { ! /* For dereferences this means we have to defer it !to solving time. */ ! VEC_last (ce_s, *results)->offset = UNKNOWN_OFFSET; ! return; ! } ! if (cs.type != SCALAR) return; vi = get_varinfo (cs.var); Index: gcc/testsuite/gcc.dg/torture/pr49651.c === *** gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) *** *** 0 --- 1,31 + /* { dg-do run } */ + + extern void abort (void); + + struct X { + int *p; + int *q; + }; + + void __attribute__((noinline, noclone)) + foo (struct X x) { *x.q = 0; } + + volatile int what; + struct X y; + + int main() + { + int i, j; + struct X x, *p; + x.p = &i; + x.q = &j; + if (what) + p = &y; + else + p = &x; + j = 1; + foo (*p); + if (j != 0) + abort (); + return 0; + }
Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)
On 07/14/2011 12:41 AM, Georg-Johann Lay wrote: Ok to commit and back-port to 4.6? Johann PR target/49487 * config/avr/avr.md (rotl<mode>3): Generate SCRATCH instead of REG. (*rotw<mode>): Use const_int_operand for operand 2. Use match_scratch for operand 3. (*rotb<mode>): Ditto. * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH. Ok. r~
Re: [PATCH (1/7)] New optab framework for widening multiplies
Ping. This is the last unreviewed patch in this series ... Thanks Andrew On 09/07/11 15:43, Andrew Stubbs wrote: On 23/06/11 15:37, Andrew Stubbs wrote: This patch should have no effect on the compiler output. It merely replaces one way to represent widening operations with another, and refactors the other parts of the compiler to match. The rest of the patch set uses this new framework to implement the optimization improvements. I considered and discarded many approaches to this patch before arriving at this solution, and I feel sure that there'll be somebody out there who will think I chose the wrong one, so let me first explain how I got here The aim is to be able to encode and query optabs that have any given input mode, and any given output mode. This is similar to the convert_optab, but not compatible with that optab since it is handled completely differently in the code. (Just to be clear, the existing widening multiply support only covers instructions that widen by *one* mode, so it's only ever been necessary to know the output mode, up to now.) Option 1 was to add a second dimension to the handlers table in optab_d, but I discarded this option because it would increase the memory usage by the square of the number of modes, which is a bit much. Option 2 was to add a whole new optab, similar to optab_d, but with a second dimension like convert_optab_d, however this turned out to cause way too many pointer type mismatches in the code, and would have been very difficult to fix up. Option 3 was to add new optab entries for widening by two modes, by three modes, and so on. True, I would only need to add one extra set for what I need, but there would be so many places in the code that compare against smul_widen_optab, for example, that would need to be taught about these, that it seemed like a bad idea. 
Option 4 was to have a separate table that contained the widening operations, and refer to that whenever a widening entry in the main optab is referenced, but I found that there was no easy way to do the mapping without putting some sort of switch table in widening_optab_handler, and that negates the other advantages. So, what I've done in the end is add a new pointer entry widening into optab_d, and dynamically build the widening operations table for each optab that needs it. I've then added new accessor functions that take both input and output modes, and altered the code to use them where appropriate. The down-side of this approach is that the optab entries for widening operations now have two handlers tables, one of which is redundant. That said, those cases are in the minority, and it is the smaller table which is unused. If people find that very distasteful, it might be possible to remove the *_widen_optab entries and unify smul_optab with smul_widen_optab, and so on, and save space that way. I've not done so yet, but I expect I could if people feel strongly about it. As a side-effect, it's now possible for any optab to be widening, should some target happen to have a widening add, shift, or whatever. Is this patch OK? This update has been rebaselined to fix some conflicts with other recent commits in this area. I also identified a small bug which resulted in the operands to some commutative operations being reversed. I don't believe the bug did any harm, logically speaking, but I suppose there could be a testcase that resulted in worse code being generated. With this fix, I now see exactly matching output in all my testcases. Andrew
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther richard.guent...@gmail.com: But then how comes the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. Option is still useful if you want to try feature on platform with no hook implemented and for other performance experiments. I agree --param usage should be better here. I'll fix it. Ilya Here is a patch with new param instead of new option. Bootstrapped and checked on x86_64-linux. Ilya -- gcc/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com PR middle-end/44382 * target.def (reassociation_width): New hook. * doc/tm.texi.in (reassociation_width): New hook documentation. * doc/tm.texi (reassociation_width): Likewise. * doc/invoke.texi (tree-reassoc-width): New param documented. * hooks.h (hook_int_const_gimple_1): New default hook. * hooks.c (hook_int_const_gimple_1): Likewise. * config/i386/i386.h (ix86_tune_indices): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (TARGET_REASSOC_INT_TO_PARALLEL): New. (TARGET_REASSOC_FP_TO_PARALLEL): Likewise. * config/i386/i386.c (initial_ix86_tune_features): Add X86_TUNE_REASSOC_INT_TO_PARALLEL and X86_TUNE_REASSOC_FP_TO_PARALLEL. (ix86_reassociation_width) implementation of new hook for i386 target. * params.def (PARAM_TREE_REASSOC_WIDTH): New param added. * tree-ssa-reassoc.c (get_required_cycles): New function. (get_reassociation_width): Likewise. (rewrite_expr_tree_parallel): Likewise. (reassociate_bb): Now checks reassociation width to be used and call rewrite_expr_tree_parallel instead of rewrite_expr_tree if needed. (pass_reassoc): TODO_remove_unused_locals flag added. gcc/testsuite/ 2011-07-14 Enkovich Ilya ilya.enkov...@intel.com * gcc.dg/tree-ssa/pr38533.c (dg-options): Added option --param tree-reassoc-width=1. * gcc.dg/tree-ssa/reassoc-24.c: New test. * gcc.dg/tree-ssa/reassoc-25.c: Likewise. PR44382.diff Description: Binary data
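The effect of the new rewrite_expr_tree_parallel, in miniature: a dependent chain of additions is re-bracketed so that independent sub-sums can issue in parallel, shortening the critical path without changing the result (safe for integer arithmetic, where addition reassociates exactly).

```c
/* Linear form: three dependent adds, critical path of three.  */
static int
sum_linear (const int *v)
{
  return ((v[0] + v[1]) + v[2]) + v[3];
}

/* Reassociated form: the two inner adds are independent and can
   execute in parallel, critical path of two.  */
static int
sum_parallel (const int *v)
{
  return (v[0] + v[1]) + (v[2] + v[3]);
}
```

The reassociation width parameter controls how many such independent sub-chains the pass creates, matching the number of parallel functional units the target reports.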
[testsuite]: Some test case skips.
This patchlet fixes two test cases: * gcc.dg/pr32912-2.c: TImode is overkill for AVR. * gcc.dg/pr44674.c: Test if -fprofile-generate is available. Ok to commit? Johann testsuite/ * gcc.dg/pr32912-2.c: Skip for AVR. * gcc.dg/pr44674.c: Add dg-require-profiling. Index: gcc.dg/pr32912-2.c === --- gcc.dg/pr32912-2.c (revision 176136) +++ gcc.dg/pr32912-2.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -O2 -w } */ +/* { dg-skip-if TImode not supported { avr-*-* } { * } { } } */ extern void abort (void); Index: gcc.dg/pr44674.c === --- gcc.dg/pr44674.c(revision 176136) +++ gcc.dg/pr44674.c(working copy) @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options -O -fprofile-generate } */ +/* { dg-require-profiling -fprofile-generate } */ void jumpfunc (void *p)
Re: [PATCH] New IPA-CP with real function cloning
Hi, like with the previous mail, I'll reply to the comments here and then post a new version of the patch in a separate thread. On Sun, Jul 10, 2011 at 07:04:21PM +0200, Jan Hubicka wrote: /* If checking is enabled, verify that no lattice is in the TOP state, i.e. not bottom, not containing a variable component and without any known value at the same time. */ static void verify_propagated_values (void) { #ifdef ENABLE_CHECKING Hmm, would not be better to keep this function around to be called from debugger, like other verifiers do? OK. struct cgraph_node *node; FOR_EACH_DEFINED_FUNCTION (node) { struct ipa_node_params *info = IPA_NODE_REF (node); int i, count = ipa_get_param_count (info); for (i = 0; i count; i++) { struct ipcp_lattice *lat = ipa_get_lattice (info, i); if (!lat-bottom !lat-contains_variable lat-values_count == 0) { if (dump_file) { fprintf (dump_file, \nIPA lattices after constant propagation:\n); print_all_lattices (dump_file, true, false); } gcc_unreachable (); } } } #endif } /* Propagate values through a pass-through jump function JFUNC associated with edge CS, taking values from SRC_LAT and putting them into DEST_LAT. SRC_IDX is the index of the source parameter. */ static bool propagate_vals_accross_pass_through (struct cgraph_edge *cs, struct ipa_jump_func *jfunc, struct ipcp_lattice *src_lat, struct ipcp_lattice *dest_lat, int src_idx) { struct ipcp_value *src_val; bool ret = false; if (jfunc-value.pass_through.operation == NOP_EXPR) for (src_val = src_lat-values; src_val; src_val = src_val-next) ret |= add_value_to_lattice (dest_lat, src_val-value, cs, src_val, src_idx); else if (edge_within_scc (cs)) ret = set_lattice_contains_variable (dest_lat); Hmm, is the reason for not using artimetic within SCC documented somewhere? It seems like code someone will eventually run into and remove without much of consideration. I added a comment. /* Calculate devirtualization time bonus for NODE, assuming we know KNOWN_CSTS and KNOWN_BINFOS. 
*/ static int devirtualization_time_bonus (struct cgraph_node *node, VEC (tree, heap) *known_csts, VEC (tree, heap) *known_binfos) Eventually it would be nice to make this common with inliner analysis. We want to increase inlining limits here too True. /* Only bare minimum benefit for clearly un-inlineable targets. */ res += 1; callee = cgraph_get_node (target); if (!callee) continue; Hmm, when you test it here, you might probably ask if callee is analyzed and inlinable then ;) Good point, did that. /* Return true if cloning NODE is a good idea, given the estimated TIME_BENEFIT and SIZE_COST and with the sum of frequencies of incoming edges to the potential new clone in FREQUENCIES. */ static bool good_cloning_opportunity_p (struct cgraph_node *node, int time_benefit, int freq_sum, gcov_type count_sum, int size_cost) { if (time_benefit == 0 || !flag_ipa_cp_clone || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node-decl))) return false; Still I think cloning oppurtunity is good if the saving on call size reduce enough to pay back for function body duplication. It would probably make sense then to create simple wrapper functions instead of duplicating the function body. I was already doing that for considering IPA-CP opportunities for all known contexts - and the effect is more profound still now when we consider moving costs. On the other side, when opportunities for function specialization are concerned, size is always considered a cost and never a benefit when doing the calculations. This simplifies a the code a bit because the size savings are on the side of the caller and so it depends on the number of call sites that actually get redirected. That is not known when estimating size but only we did specialization decisions for all the callers and so it would need to be an extra thing stored along values. I think that since such calls should be picked up by inlining, it's not really worth the added data and complexity. But I could add it later, sure. 
gcc_checking_assert (size_cost >= 0);

  /* FIXME: These decisions need tuning. */
  if (max_count)
    {
      int evaluation, factor = (count_sum * 1000) / max_count;

      evaluation = (time_benefit * factor) /
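The TOP / contains-variable / known-values states being verified above can be pictured with a deliberately simplified lattice. The following is a hypothetical sketch (the names `lattice_meet_const` and `struct lattice` are invented for illustration) - GCC's actual `ipcp_lattice` keeps a whole list of candidate values per parameter rather than a single one, but the meet behaviour is the same in spirit: TOP absorbs the first constant, two different constants meet to BOTTOM.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified three-state propagation lattice: TOP (no information yet),
   one known constant, or BOTTOM (provably varying).  The invariant
   checked by verify_propagated_values above is that after propagation
   no lattice may still effectively be TOP.  */
enum lat_state { LAT_TOP, LAT_CONST, LAT_BOTTOM };

struct lattice
{
  enum lat_state state;
  int value;			/* Meaningful only when state == LAT_CONST.  */
};

/* Merge a newly propagated constant V into LAT; return true if LAT
   changed (the usual "keep iterating until fixpoint" signal).  */
static bool
lattice_meet_const (struct lattice *lat, int v)
{
  switch (lat->state)
    {
    case LAT_TOP:
      lat->state = LAT_CONST;
      lat->value = v;
      return true;
    case LAT_CONST:
      if (lat->value == v)
	return false;
      lat->state = LAT_BOTTOM;	/* Two different constants meet.  */
      return true;
    case LAT_BOTTOM:
      return false;		/* Already varying; nothing to do.  */
    }
  return false;
}
```

The `edge_within_scc` special case discussed above corresponds to forcing the destination lattice straight to the contains-variable (here: BOTTOM) state instead of propagating values around a cycle.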
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Hi Tobias, As per your suggestion, I'm copying to gfortran and gcc-patches. I've made the change you asked for and I'm going to commit the patch. Cheers, Daniel. On 07/14/2011 05:31 PM, Tobias Burnus wrote: On 07/14/2011 05:05 PM, Daniel Carrera wrote:

+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error.");

Could you replace "." by ": " (colon, space)? I think "Fortran runtime error: Could not ..." looks better than "Fortran runtime error.Could not". Otherwise, the patch is OK. Could you then send the patch to fortran@ and gcc-patches@? Tobias PS: You could directly commit the modified patch; simply reply to this email, add a CC to fortran@/gcc-patches@ and attach the patch and changelog. There is no need for me to approve the patch again. -- I'm not overweight, I'm undertall.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
And of course, here is the patch and ChangeLog.

2011-07-14  Daniel Carrera  dcarr...@gmail.com

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function. Use exit(EXIT_FAILURE).
	(_gfortran_caf_register): Use caf_runtime_error.
	(_gfortran_caf_sync_images): Use exit(EXIT_FAILURE).
	* caf/mpi.c (caf_runtime_error): Remove error parameter. Return
	EXIT_FAILURE instead.
	(_gfortran_caf_register): Update call to caf_runtime_error.
	(_gfortran_caf_sync_all): Ditto.
	(_gfortran_caf_sync_images): Ditto.
	(_gfortran_caf_error_stop_str): Use exit(EXIT_FAILURE).

Now I'm just waiting for SVN update before I commit... On 07/14/2011 05:34 PM, Daniel Carrera wrote: Hi Tobias, As per your suggestion, I'm copying to gfortran and gcc-patches. I've made the change you asked for and I'm going to commit the patch. Cheers, Daniel. On 07/14/2011 05:31 PM, Tobias Burnus wrote: On 07/14/2011 05:05 PM, Daniel Carrera wrote:

+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error.");

Could you replace "." by ": " (colon, space)? I think "Fortran runtime error: Could not ..." looks better than "Fortran runtime error.Could not". Otherwise, the patch is OK. Could you then send the patch to fortran@ and gcc-patches@? Tobias PS: You could directly commit the modified patch; simply reply to this email, add a CC to fortran@/gcc-patches@ and attach the patch and changelog. There is no need for me to approve the patch again. -- I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c (revision 176230)
+++ libgfortran/caf/single.c (working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>   /* For fputs and fprintf. */
 #include <stdlib.h>  /* For exit and malloc. */
 #include <string.h>  /* For memcpy and memset. */
+#include <stdarg.h>  /* For variadic arguments. */

 /* Define GFC_CAF_CHECK to enable run-time checking. */
 /* #define GFC_CAF_CHECK 1 */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with mpi.c. */
+static void
+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error: ");
+  va_start (ap, message);
+  fprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */
+  exit (EXIT_FAILURE);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
		     char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,
   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
	{
	  *stat = 1;
	  if (errmsg_len > 0)
	    {
-	      const char msg[] = "Failed to allocate coarray";
	      int len = ((int) sizeof (msg) > errmsg_len) ? errmsg_len
							  : (int) sizeof (msg);
	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
	  return NULL;
	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (msg);
     }

   if (stat)
@@ -140,7 +153,7 @@ _gfortran_caf_sync_images (int count __a
	{
	  fprintf (stderr, "COARRAY ERROR: Invalid image index %d to SYNC IMAGES",
		   images[i]);
-	  exit (1);
+	  exit (EXIT_FAILURE);
	}
 #endif

Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c (revision 176230)
+++ libgfortran/caf/mpi.c (working copy)
@@ -47,8 +47,9 @@ static int caf_is_finalized;
 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with single.c. */
 static void
-caf_runtime_error (int error, const char *message, ...)
+caf_runtime_error (const char *message, ...)
 {
   va_list ap;
   fprintf (stderr, "Fortran runtime error on image %d: ", caf_this_image);
@@ -59,10 +60,10 @@ caf_runtime_error (int error, const char
   /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */
   /* FIXME: Do some more effort than just MPI_ABORT. */
-  MPI_Abort (MPI_COMM_WORLD, error);
+  MPI_Abort (MPI_COMM_WORLD, EXIT_FAILURE);

   /* Should be unreachable, but to make sure also call exit. */
-  exit (2);
+  exit (EXIT_FAILURE);
 }

@@ -179,7 +180,7 @@ error:
	}
     }
   else
-    caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : 1, msg);
+    caf_runtime_error (msg);
 }

   return NULL;
@@ -223,7 +224,7 @@ _gfortran_caf_sync_all (int *stat, char
	  memset (&errmsg[len], ' ', errmsg_len-len);
	}
       else
-	caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : ierr, msg);
+	caf_runtime_error (msg);
     }
 }

@@ -291,7 +292,7 @@ _gfortran_caf_sync_images (int count, in
	  memset (&errmsg[len],
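The variadic error-reporting pattern in the patch above can be sketched standalone. Note that the usual way to consume a `va_list` is one of the v-variants (`vfprintf`/`vsnprintf`); a plain `fprintf` cannot take a `va_list` argument. The sketch below is hypothetical (the name `format_runtime_error` is invented, and it writes into a caller-supplied buffer instead of printing to stderr and exiting, so it can be exercised in isolation):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Minimal sketch of a variadic runtime-error reporter in the style of
   caf_runtime_error: a fixed prefix followed by a printf-style message.
   vsnprintf is what consumes the va_list.  Returns the number of
   characters written (assuming no truncation).  */
static int
format_runtime_error (char *buf, size_t size, const char *message, ...)
{
  va_list ap;
  int n = snprintf (buf, size, "Fortran runtime error: ");
  va_start (ap, message);
  n += vsnprintf (buf + n, size - n, message, ap);
  va_end (ap);
  return n;
}
```

In the real function the formatted message would go to stderr and be followed by `exit (EXIT_FAILURE)`, as in the patch.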
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote: On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther richard.guent...@gmail.com wrote: 2011/7/14 Michael Meissner meiss...@linux.vnet.ibm.com: One minor note, you will need to update doc/invoke.texi to document the new switch before checkin: -ftree-reassoc-width=n You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest using a --param whose default you can modify in the backend. You need the target hook that tells how wide the reassociation should be based on the type. Machines have different numbers of functional units, for example, maybe 3 integer units and 2 floating point units. For example, on the PowerPC, I would set the basic integer and binary floating point types to 2 to match the dispatch rules, but for decimal floating point I would set it to one, since the machines currently only have 1 decimal unit. With the hook, I could see eliminating the switch and/or --param altogether, and doing it only in the hook. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
You need the target hook that tells how wide the reassociation should be based on the type. Machines have different numbers of functional units, for example, maybe 3 integer units and 2 floating point units. For example, on the PowerPC, I would set the basic integer and binary floating point types to 2 to match the dispatch rules, but for decimal floating point I would set it to one, since the machines currently only have 1 decimal unit. With the hook, I could see eliminating the switch and/or --param altogether, and doing it only in the hook. The hook introduced by this patch meets these requirements. But I think it is useful to have an option to override the hook because you cannot tune it perfectly for each case. Ilya
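What a "reassociation width" of 2 buys can be illustrated with the expression a + b + c + d. With width 1 the adds form a serial dependency chain; with width 2 they form two independent chains that a machine with two integer units can issue in parallel. For integer arithmetic both orders give the same result, which is why the transformation is safe for integers without -ffast-math. A small illustrative sketch (function names invented):

```c
#include <assert.h>

/* Width 1: each add depends on the result of the previous one,
   so the latency is three dependent additions.  */
static int
sum_width1 (int a, int b, int c, int d)
{
  return ((a + b) + c) + d;
}

/* Width 2: t1 and t2 are independent and can execute in parallel
   on two functional units; the critical path is two additions.  */
static int
sum_width2 (int a, int b, int c, int d)
{
  int t1 = a + b;
  int t2 = c + d;
  return t1 + t2;
}
```

The hook discussed above lets each backend pick the width per type, matching its functional-unit counts (e.g. 2 for binary floating point on a machine with two FP units, 1 for decimal floating point with a single decimal unit).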
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 03:03 AM, Bernd Schmidt wrote:

+++ gcc/config/ia64/ia64.c (working copy)
@@ -1047,7 +1047,7 @@
   tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
   emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
-  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
+  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what? Is this bug still fixed if you change this to gen_const_mem? This is a load from the .got. It's constant memory, and it's always present. There's nowhere in the function we could move this load (assuming the address is computed properly) where it would fail. It's difficult to tell whether your raw gen_rtx_MEM with no aliasing info doesn't just paper over a problem by preventing it from being moved. r~
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Thu, Jul 14, 2011 at 11:40 AM, Richard Guenther richard.guent...@gmail.com wrote: Generating RTL from GIMPLE passes just to be able to use rtx_cost is, well ... gross. Indeed. And it is one of the major blockers for fully separating the RTL, GIMPLE and target code off into separate modules. It would be great to get rid of RTL from IVOPTS too... Ciao! Steven
Re: [PATCH] New IPA-CP with real function cloning
if (dec < cs->count)
  cs->count -= dec;
else
  cs->count = 0;
}

if (dump_file)
  dump_profile_updates (orig_node, new_node);
}

if (node->local.can_change_signature)
  {
    args_to_skip = BITMAP_GGC_ALLOC ();
    for (i = 0; i < count; i++)
      {
	tree t = VEC_index (tree, known_vals, i);

	if ((t && TREE_CODE (t) != TREE_BINFO)
	    || !ipa_is_param_used (info, i))
	  bitmap_set_bit (args_to_skip, i);
      }
  }
else
  args_to_skip = NULL;

When we can't change the signature, we can still set the is_parm_unused flag for the callee to aid later optimizers.

I assume I can re-use the node->local.can_change_signature flag? Is that supposed to be set at any given place or can IPA-CP do it on its own?

can_change_signature is currently used by the i386 backend and it is set by the inliner. I plan to move it to the visibility pass at the time local functions are identified. So yes, you can assume it is set and up to date at the time IPA-CP is run. Honza

Rest of patch looks OK. It definitely reads better than the previous ipa-cp.c ;) I suppose we will need to get some experience with the logic deciding whether to clone.

Thanks, I'll post the current version in a moment. Martin
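The args_to_skip computation quoted above has a simple shape: a parameter can be dropped from the cloned signature when either a usable constant has been found for it or it is unused. A simplified model (real GCC uses a sparse bitmap and tree values; here a plain bitmask and boolean arrays stand in for them, and the function name is invented):

```c
#include <assert.h>
#include <stdbool.h>

/* Return a bitmask of parameter indices that a specialized clone can
   drop: those with a known constant (the clone hard-codes the value)
   and those the function never uses.  Mirrors the loop quoted above,
   which sets a bit when (t && TREE_CODE (t) != TREE_BINFO) or the
   parameter is unused.  */
static unsigned
compute_args_to_skip (int count, const bool *has_known_const,
		      const bool *is_used)
{
  unsigned args_to_skip = 0;
  for (int i = 0; i < count; i++)
    if (has_known_const[i] || !is_used[i])
      args_to_skip |= 1u << i;
  return args_to_skip;
}
```

This only applies when the signature may change at all, which is the role of the can_change_signature flag discussed in the thread; otherwise the clone must keep the original parameter list.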
Re: Fix PR48542: reload register contents reuse crossing setjmp
Jeff Law wrote: On 06/15/11 21:46, Hans-Peter Nilsson wrote: PR rtl-optimization/48542 * reload.c (find_equiv_reg): Stop looking when finding a setjmp-type call. * reload1.c (reload_as_needed): Invalidate all reload registers when crossing a setjmp-type call. OK. Jeff I see that this went already in, but I'm wondering why this change should be necessary. As far as register use is concerned, setjmp ought to behave just like a regular function: if a register is call-clobbered, reload will not attempt to use it across a function call (*any* function call) anyway; but if the register is call-saved, setjmp really ought to restore the old value, *both* times it returns, and so reuse ought to be allowed ... Bye, Ulrich -- Dr. Ulrich Weigand GNU Toolchain for Linux on System z and Cell BE ulrich.weig...@de.ibm.com
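The C-level counterpart of this register-reuse discussion is the classic setjmp rule (C11 7.13.2.1): a non-volatile local modified between setjmp and longjmp has an indeterminate value after the second return, precisely because it may have lived in a register whose content longjmp does not restore. This sketch illustrates that general hazard, not the specific reload scenario of PR 48542:

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;

/* x must be volatile: it is modified between setjmp and longjmp, so
   without volatile its value after the second return of setjmp would
   be indeterminate (it could have been cached in a register that
   longjmp does not restore).  volatile forces it to memory.  */
static int
run (void)
{
  volatile int x = 1;
  if (setjmp (env) == 0)
    {
      x = 2;			/* modified between setjmp and longjmp */
      longjmp (env, 1);
    }
  return x;			/* well-defined only because x is volatile */
}
```

Ulrich's point above is about the compiler's side of the same contract: for call-saved registers, setjmp behaves like any call and restores the old value on both returns, so reuse across it should in principle be as safe as across any other call.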
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:03, Richard Henderson wrote: On 07/14/2011 03:03 AM, Bernd Schmidt wrote:

+++ gcc/config/ia64/ia64.c (working copy)
@@ -1047,7 +1047,7 @@
   tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
   emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
-  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
+  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what? Is this bug still fixed if you change this to gen_const_mem? It should be. Testing this is a bit tricky since the original bug is in gcc-3.3, which doesn't have gen_const_mem, and current mainline with just the scheduler patch removed doesn't reproduce it with the testcase. In mainline, gen_const_mem sets MEM_NOTRAP_P, but it looks like we handle that correctly in may_trap_p:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
	 reference; moving it out of context such as when moving code
	 when optimizing, might cause its address to become invalid. */
      code_changed || !MEM_NOTRAP_P (x))

and sched_deps uses rtx_addr_can_trap anyway. This is a load from the .got. Yes, but not using the fixed got pointer in r1, but a random other register which can have different values in the function. It's difficult to tell if your raw gen_rtx_MEM with no aliasing info doesn't just paper over a problem by preventing it from being moved. The problem isn't about memory aliasing, it's about the pointer register being clobbered. Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:19, Bernd Schmidt wrote: On 07/14/11 18:03, Richard Henderson wrote: On 07/14/2011 03:03 AM, Bernd Schmidt wrote:

+++ gcc/config/ia64/ia64.c (working copy)
@@ -1047,7 +1047,7 @@
   tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
   emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
-  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
+  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what? Is this bug still fixed if you change this to gen_const_mem? It should be. Testing this is a bit tricky since the original bug is in gcc-3.3, which doesn't have gen_const_mem, and current mainline with just the scheduler patch removed doesn't reproduce it with the testcase. Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P, which doesn't exist in that tree) the load stays behind the branch where it should be. Bernd
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote: On 07/14/2011 05:41 AM, Jakub Jelinek wrote: Here is a PCH friendly variant of the patch which tries to set right producer from the start, but early in dwarf2out_finish double checks if PCH hasn't changed it and if it did, updates it back to the expected string. Why not just wait until then to set it? Because the same routine that sets it is used both to create the single compilation unit early and to create CUs late (for -feliminate-dwarf2-dups already during dwarf2out_finish). Jakub
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:23 AM, Bernd Schmidt wrote: Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P which doesn't exist in that tree) the load stays behind the branch where it should be. RTX_UNCHANGING_P was the bit back then, I believe. r~
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/14/2011 12:29 PM, Jakub Jelinek wrote: On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote: On 07/14/2011 05:41 AM, Jakub Jelinek wrote: Here is a PCH friendly variant of the patch which tries to set right producer from the start, but early in dwarf2out_finish double checks if PCH hasn't changed it and if it did, updates it back to the expected string. Why not just wait until then to set it? Because the same routine that sets it is used both to create the single compilation unit early and to create CUs late (for -feliminate-dwarf2-dups already during dwarf2out_finish). Ah. OK, then. Jason
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:29, Richard Henderson wrote: On 07/14/2011 09:23 AM, Bernd Schmidt wrote: Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P which doesn't exist in that tree) the load stays behind the branch where it should be. RTX_UNCHANGING_P was the bit back then, I believe. Still ok with that also set: addl r14 = @ltoff(ap_standalone#), r1 ;; .mii ld8 r15 = [r14] addl r14 = @ltoff(.LC2), r1 ;; (p7) addl r14 = 1, r0 ;; .mib (p7) st4 [r15] = r14 nop.i 0 (p7) br.cond.dptk .L5 .mib ld8 r14 = [r14] And not ok if the MEM isn't exposed in RTL: addl r14 = @ltoff(ap_standalone#), r1 ;; .mii ld8 r15 = [r14] addl r14 = @ltoff(.LC2), r1 ;; (p7) addl r14 = 1, r0 ;; .mii (p7) st4 [r15] = r14 nop.i 0 nop.i 0 .mbb ld8 r14 = [r14] (p7) br.cond.dptk .L5 I'm attaching the 3.3 patch. Bernd Index: ../../branches/gcc-3_3-branch/gcc/sched-ebb.c === --- ../../branches/gcc-3_3-branch/gcc/sched-ebb.c (revision 175226) +++ ../../branches/gcc-3_3-branch/gcc/sched-ebb.c (working copy) @@ -52,8 +52,7 @@ static int schedule_more_p PARAMS ((void static const char *ebb_print_insn PARAMS ((rtx, int)); static int rank PARAMS ((rtx, rtx)); static int contributes_to_priority PARAMS ((rtx, rtx)); -static void compute_jump_reg_dependencies PARAMS ((rtx, regset, regset, - regset)); +static void compute_jump_reg_dependencies PARAMS ((rtx, regset)); static void schedule_ebb PARAMS ((rtx, rtx)); /* Return nonzero if there are more insns that should be scheduled. */ @@ -161,30 +160,22 @@ contributes_to_priority (next, insn) return 1; } -/* INSN is a JUMP_INSN, COND_SET is the set of registers that are - conditionally set before INSN. Store the set of registers that - must be considered as used by this jump in USED and that of - registers that must be considered as set in SET. */ +/* INSN is a JUMP_INSN. Store the set of registers that must be considered + to be set by this jump in SET. 
*/
 static void
-compute_jump_reg_dependencies (insn, cond_set, used, set)
+compute_jump_reg_dependencies (insn, set)
      rtx insn;
-     regset cond_set, used, set;
+     regset set;
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;

   for (e = b->succ; e; e = e->succ_next)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized. Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath. So we
-	 consider that it restores the value of the main codepath. */
-      bitmap_operation (set, e->dest->global_live_at_start, cond_set,
-			BITMAP_AND);
-    else
-      bitmap_operation (used, used, e->dest->global_live_at_start,
-			BITMAP_IOR);
+    if ((e->flags & EDGE_FALLTHRU) == 0)
+      {
+	bitmap_operation (set, set, e->dest->global_live_at_start,
+			  BITMAP_IOR);
+      }
 }

 /* Used in schedule_insns to initialize current_sched_info for scheduling
Index: ../../branches/gcc-3_3-branch/gcc/emit-rtl.c
===
--- ../../branches/gcc-3_3-branch/gcc/emit-rtl.c (revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/emit-rtl.c (working copy)
@@ -192,6 +192,18 @@ static tree component_ref_for_mem_expr P
 static rtx gen_const_vector_0 PARAMS ((enum machine_mode));
 static void copy_rtx_if_shared_1 PARAMS ((rtx *orig));

+/* Generate a memory referring to non-trapping constant memory. */
+
+rtx
+gen_const_mem (enum machine_mode mode, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  RTX_UNCHANGING_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+  return mem;
+}
+
+
 /* Probability of the conditional branch currently proceeded by try_split.
    Set to -1 otherwise. */
 int split_branch_probability = -1;

Index: ../../branches/gcc-3_3-branch/gcc/sched-deps.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-deps.c (revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-deps.c (working copy)
@@ -981,17 +981,12 @@ sched_analyze_insn (deps, x, insn, loop_
   else
     {
       rtx pending, pending_mem;
-      regset_head tmp_uses, tmp_sets;
-      INIT_REG_SET (tmp_uses);
-      INIT_REG_SET (tmp_sets);
-
-      (*current_sched_info->compute_jump_reg_dependencies)
-	(insn, deps->reg_conditional_sets, tmp_uses, tmp_sets);
-      IOR_REG_SET (reg_pending_uses, tmp_uses);
-      IOR_REG_SET
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:19 AM, Bernd Schmidt wrote: Yes, but not using the fixed got pointer in r1, but a random other register which can have different values in the function. Oh, I think I see. So if this really had been a PLUS, as implied by the LO_SUM, we would have had garbage input, produced garbage output, but (eventually) ignored the result. But since this really is a load from memory, the garbage input is immediately fatal. Have I got that right? If so, the patch with the use of gen_const_mem is ok. It does raise the question of whether we ought to completely change the way we represent the pairing of LTOFFX/LDXMOV relocations. r~
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:39, Richard Henderson wrote: On 07/14/2011 09:19 AM, Bernd Schmidt wrote: Yes, but not using the fixed got pointer in r1, but a random other register which can have different values in the function. Oh, I think I see. So if this really had been a PLUS, as implied by the LO_SUM, we would have had garbage input, produced garbage output, but (eventually) ignored the result. But since this really is a load from memory, the garbage input is immediately fatal. Have I got that right? This is correct. If so, the patch with the use of gen_const_mem is ok. Will commit. (Although now I wonder if we could instead use one of the speculative load instructions? There's one that sets the NaT bit if the load would fault, isn't there? It's been so long I can't remember.) It does raise the question of whether we ought to completely change the way we represent the pairing of LTOFFX/LDXMOV relocations. This I can't answer since I don't know the definition of these. Bernd
gimplefe patch
Hi, This is the patch for the gimple front end branch. I have written this code as part of the Google Summer of Code program. The patch consists of the following changes:

1. The grammar rules for variable declaration are modified to include parameters. The code for this is in the gp_parse_var_decl function.
2. A new function gp_parse_function_declaration is added to recognize function declarations.
3. A few changes have been made in the function gp_parse_decl to accommodate the above code.
4. The comments have been updated to reflect the changes in the code.
5. The ChangeLog is updated.

Please find the attached patch file. I hope you find everything as per the standard. -- Ketaki Tiwatne file.diff Description: Binary data
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:43 AM, Bernd Schmidt wrote: (Although now I wonder if we could instead use one of the speculative load instructions? There's one that sets the NaT bit if the load would fault, isn't there? It's been so long I can't remember.) We could, but we also have to insert a check load to match. It gets very difficult to tell when it's worth it. Your example

(p7)	br.cond.dptk .L5
	ld8 r15 = [r14]

becomes

	ld8.sa r15 = [r14]	// speculative advanced load
(p7)	br.cond.dptk .L5
	ld8.c.clr r15 = [r14]	// checked load (clear ALAT)

Note that the speculative load can arbitrarily fail (e.g. tlb miss) and that the checked load can also arbitrarily re-issue the load. Note that one can't split ld8 r14 = [r14] without additional register allocation, because the address needs to remain live until the check. It does raise the question of whether we ought to completely change the way we represent the pairing of LTOFFX/LDXMOV relocations. This I can't answer since I don't know the definition of these. These are markers for linker relaxation. If the symbol is within range,

	addl x = @ltoffx(foo), gp
	...
	ld8.mov y = [x], foo	// .mov adds the LDXMOV reloc vs foo to ld8 insn

will be relaxed to

	nop
	...
	addl y = @gprel(foo), gp

The constraint in using the relocations is that every ldxmov insn must be fed by an ltoffx reloc with the same symbol, and that an ltoffx reloc cannot feed any other insn. That allows us, at link time, to not consider data flow, merely assert that if foo is relaxed anywhere, it's relaxed everywhere. r~
Remove NetWare support
I've got a preliminary NetWare removal patch ready (yet untested), but have a couple of questions: * Given that there's a considerable amount of NetWare support still in src, toplevel support has to stay. On the other hand, the reference in config/elf.m4 is only used for the LTO plugin and can go, I believe. * The i386 port has some code that seems to be NetWare-specific, namely KEEP_AGGREGATE_RETURN_POINTER, ix86_keep_aggregate_return_pointer and the callee_pop_aggregate_return attribute. I'm uncertain if all this can go now. * There are references to config/i386/netware.h in gcc/po/*.po. Do I need to do anything about this when netware.h is removed? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 18:40, Daniel Carrera dcarr...@gmail.com wrote: And of course, here is the patch and ChangeLog. 2011-07-14 Daniel Carrera dcarr...@gmail.com * caf/mpi.c (caf_runtime_error): Remove error parameter. Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. -- Janne Blomqvist
Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)
On Wed, Jul 13, 2011 at 22:39, Carrot Wei car...@google.com wrote: Hi Diego The previous patch was done with svn merge. This new version is done with svnmerge.py. Again tested with Great, thanks. This simplifies regular merges, since svnmerge.py will know that this rev was merged already. I think svnmerge.py also reads 'svn merge' markers, but I'm not sure. make check-g++ RUNTESTFLAGS=--target_board=arm-sim/thumb/arch=armv7-a dg.exp=anon-ns1.C make check-g++ RUNTESTFLAGS=dg.exp=anon-ns1.C BTW, there are some unexpected property changes after the merge; I don't know how they came out or how I should deal with them? Don't worry about those. The branch used to merge from multiple sources that had different attributes for those files. It will be gone after the next merge I'm doing next week. The backport is OK, of course. Diego.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 09:48 PM, Janne Blomqvist wrote: 2011-07-14 Daniel Carreradcarr...@gmail.com * caf/mpi.c (caf_runtime_error): Remove error parameter. Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. Ok. What term should I have used? I was thinking that the program itself returns a value, but I can see that what I wrote was not optimal. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. Ok. I didn't want to mess with comments that I didn't fully understand. -- I'm not overweight, I'm undertall.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Janne Blomqvist wrote: * caf/mpi.c (caf_runtime_error): Remove error parameter. Return EXIT_FAILURE instead. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. While I think it should be sufficient for single-processor usage, I am not sure that that way all I/O buffers get written before one calls MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to I/O. In terms of I/O, there are three kinds, which might need to be treated differently:

* I/O to stdout (OUTPUT_UNIT): Here, all output should be collected by MPI - I am not sure whether it will come too late in a destructor
* I/O to (local) files
* I/O via the communication library: Here, I see the greatest problems, but that's not in F2008's coarrays, though it might well be added with the Technical Report.

I somehow would feel better if I could ensure that the buffers are flushed and the files closed before I pull the MPI plug (MPI_Finalize, MPI_Abort). For reference, the comment Janne is referring to is the one at the bottom (comment 3) of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43849 Tobias
Re: [build] Move crtfastmath to toplevel libgcc
On Thu, Jul 14, 2011 at 12:09 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Andreas Schwab sch...@redhat.com writes: Same on ia64: Configuration mismatch! Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o Alpha needs the same fix. I need following patch to bootstrap the compiler: --cut here-- Index: gcc/config.gcc === --- gcc/config.gcc (revision 176282) +++ gcc/config.gcc (working copy) @@ -757,6 +757,7 @@ extra_options=${extra_options} alpha/elf.opt target_cpu_default=MASK_GAS tmake_file=${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux + extra_parts=$extra_parts crtfastmath.o ;; alpha*-*-freebsd*) tm_file=${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h --cut here-- Uros.
Re: [PATCH] New IPA-CP with real function cloning
Well, technically they survive until after inlining (because of indirect inlining, which also derives information from the lattices corresponding to the node->inlined_to node). Results of arithmetic functions are not going to be accessed during inlining when compiling any reasonable program but... Hmm, this sounds bad. We should move it to GTY then, incrementally. I however thought that indirect inlining looks only at jump functions, not at lattices? __attribute__ ((really_bad_attribute)) function (int param) { use param } and then let ipa-cp invent that param is a constant. what would be such a really_bad_attribute ? Well, we need to go through the attribute list and prepare a list of bad guys instead of forbidding any attribute. Obviously we should not give up on the pure attribute, for example. Another class of attributes are those referring to function arguments that need updating if they are still in use after cloning. I think this was the original reason for adding the check. I am not sure if we have attributes that should prevent cloning completely. can_change_signature will also handle apply_args. Add VA_START code there, too. For the other use of this flag (in i386) VA_START The last one already is VA_START... or do you mean a different one? I meant the code in ipa-inline-analysis.c doing the same checks but skipping va_start, since the i386 backend tests it by itself. BTW currently the edges from thunks miss any profile info. (i.e. it will be 0). I guess we ought to make the ipa-profile pass estimate those (it is difficult to measure the count of an alias). I'm not really looking at the edges from thunks to the actual function. OTOH, I assume that edges to a thunk do have a count and look at that. They do have (unless they are thunk to thunk edges), but in any case we ought to regularize things here; sooner or later someone will get confused by counts missing in the callgraph.
If you walk only hot edges, then you need to make your function descend into both alias refs and thunk calls, or the aliases of thunks will not be seen. Well, looking at bits of the patch again now, aliases to thunks might indeed be a problem for a few pieces of it. I'll send the patch nevertheless and ponder about this problem later. Hmm, please do :) I will look at the updated patch. Honza Thanks, Martin
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 23:34, Tobias Burnus bur...@net-b.de wrote: Janne Blomqvist wrote: * caf/mpi.c (caf_runtime_error): Remove error parameter. Return EXIT_FAILURE instead. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. While I think it should be sufficient for single-processor usage, I am not sure that that way all I/O buffers gets written before one calls MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to I/O. Ah, I read that comment from caf/single.c, but now I see it in the caf/mpi.c part of the patch as well. Yeah, in that case it might matter, depending on how the MPI library implements MPI_Abort(). That is, does it call exit() (in which case the destructor will be called) or will it call abort() or raise some other fatal signal. At least Open MPI seems to implement it by sending SIGTERM http://www.open-mpi.org/doc/v1.4/man3/MPI_Abort.3.php Come to think of it, in this case, if you want to use MPI_Abort() rather than MPI_Finalize() + exit(), you should probably reset the fatal signal handlers to the default one unless you want the backtrace to be printed. -- Janne Blomqvist
[patch] Fix PR target/48220 for the SPARC
Hi, this adds support for DW_OP_GNU_entry_value/DW_TAG_GNU_call_site_parameter on SPARC-like architectures (architectures with register windows and explicit window save instruction). The transformation OUTGOING_REGNO - INCOMING_REGNO is explicit for them and not tied to the call-to-subroutine instruction, so this needs to be modelled if you want precise variable tracking. Tested on SPARC/Solaris (both GCC and GDB) and x86/Linux. OK for mainline? 2011-07-14 Eric Botcazou ebotca...@adacore.com PR target/48220 * doc/md.texi (Standard Names): Document window_save. * cfgexpand.c (expand_debug_parm_decl): New function extracted from expand_debug_expr and expand_debug_source_expr. If the target has a window_save instruction, adjust the ENTRY_VALUE_EXP. (expand_debug_expr) SSA_NAME: Call expand_debug_parm_decl if the SSA_NAME_VAR is a parameter. (expand_debug_source_expr) PARM_DECL: Call expand_debug_parm_decl. * var-tracking.c (parm_reg_t): New type and associated vector type. (windowed_parm_regs): New variable. (adjust_insn): If the target has a window_save instruction and this is the instruction, make its effect on parameter registers explicit. (next_non_note_insn_var_location): New function. (emit_notes_in_bb): Use it instead of NEXT_INSN throughout. (vt_add_function_parameter): If the target has a window_save insn, adjust the incoming RTL and record that in windowed_parm_regs. (vt_finalize): Free windowed_parm_regs. -- Eric Botcazou Index: cfgexpand.c === --- cfgexpand.c (revision 176072) +++ cfgexpand.c (working copy) @@ -2358,8 +2358,60 @@ convert_debug_memory_address (enum machi return x; } -/* Return an RTX equivalent to the value of the tree expression - EXP. */ +/* Return an RTX equivalent to the value of the parameter DECL. 
*/ + +static rtx +expand_debug_parm_decl (tree decl) +{ + rtx incoming = DECL_INCOMING_RTL (decl); + + if (incoming + && GET_MODE (incoming) != BLKmode + && ((REG_P (incoming) && HARD_REGISTER_P (incoming)) + || (MEM_P (incoming) + && REG_P (XEXP (incoming, 0)) + && HARD_REGISTER_P (XEXP (incoming, 0))))) +{ + rtx rtl = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); + +#ifdef HAVE_window_save + /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers. + If the target machine has an explicit window save instruction, the + actual entry value is the corresponding OUTGOING_REGNO instead. */ + if (REG_P (incoming) + && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) + incoming + = gen_rtx_REG_offset (incoming, GET_MODE (incoming), +OUTGOING_REGNO (REGNO (incoming)), 0); + else if (MEM_P (incoming)) + { + rtx reg = XEXP (incoming, 0); + if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + { + reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); + incoming = replace_equiv_address_nv (incoming, reg); + } + } +#endif + + ENTRY_VALUE_EXP (rtl) = incoming; + return rtl; +} + + if (incoming + && GET_MODE (incoming) != BLKmode + && !TREE_ADDRESSABLE (decl) + && MEM_P (incoming) + && (XEXP (incoming, 0) == virtual_incoming_args_rtx + || (GET_CODE (XEXP (incoming, 0)) == PLUS + && XEXP (XEXP (incoming, 0), 0) == virtual_incoming_args_rtx + && CONST_INT_P (XEXP (XEXP (incoming, 0), 1))))) +return incoming; + + return NULL_RTX; +} + +/* Return an RTX equivalent to the value of the tree expression EXP. 
*/ static rtx expand_debug_expr (tree exp) @@ -3169,36 +3221,12 @@ expand_debug_expr (tree exp) if (SSA_NAME_IS_DEFAULT_DEF (exp) TREE_CODE (SSA_NAME_VAR (exp)) == PARM_DECL) { - rtx incoming = DECL_INCOMING_RTL (SSA_NAME_VAR (exp)); - if (incoming - GET_MODE (incoming) != BLKmode - ((REG_P (incoming) HARD_REGISTER_P (incoming)) - || (MEM_P (incoming) - REG_P (XEXP (incoming, 0)) - HARD_REGISTER_P (XEXP (incoming, 0) - { - op0 = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); - ENTRY_VALUE_EXP (op0) = incoming; - goto adjust_mode; - } - if (incoming - MEM_P (incoming) - !TREE_ADDRESSABLE (SSA_NAME_VAR (exp)) - GET_MODE (incoming) != BLKmode - (XEXP (incoming, 0) == virtual_incoming_args_rtx - || (GET_CODE (XEXP (incoming, 0)) == PLUS - XEXP (XEXP (incoming, 0), 0) - == virtual_incoming_args_rtx - CONST_INT_P (XEXP (XEXP (incoming, 0), - 1) - { - op0 = incoming; - goto adjust_mode; - } + op0 = expand_debug_parm_decl (SSA_NAME_VAR (exp)); + if (op0) + goto adjust_mode; op0 = expand_debug_expr (SSA_NAME_VAR (exp)); - if (!op0) - return NULL; - goto adjust_mode; + if (op0) + goto adjust_mode; } return NULL; } @@ -3327,36 +3355,14 @@
[4.6] Fix PR tree-optimization/49725
Hi, this is the ICE at -O2 on ACATS c34005a introduced on the 4.6 branch by Martin's latest SRA patch. But it's actually the same PRE issue as: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02210.html Bootstrapped/regtested on i586-suse-linux, OK for 4.6 branch? 2011-07-14 Eric Botcazou ebotca...@adacore.com PR tree-optimization/49725 Backport from mainline 2011-03-31 Eric Botcazou ebotca...@adacore.com * tree-ssa-pre.c (create_component_ref_by_pieces_1) ARRAY_REF: Drop a zero minimum index only if it is redundant. -- Eric Botcazou Index: tree-ssa-pre.c === --- tree-ssa-pre.c (revision 176264) +++ tree-ssa-pre.c (working copy) @@ -2874,8 +2874,11 @@ create_component_ref_by_pieces_1 (basic_ return NULL_TREE; if (genop2) { - /* Drop zero minimum index. */ - if (tree_int_cst_equal (genop2, integer_zero_node)) + tree domain_type = TYPE_DOMAIN (TREE_TYPE (genop0)); + /* Drop zero minimum index if redundant. */ + if (integer_zerop (genop2) + && (!domain_type + || integer_zerop (TYPE_MIN_VALUE (domain_type)))) genop2 = NULL_TREE; else {
[patch] Fix PR middle-end/49732
Hi, this is a regression present on mainline and 4.6 branch. The compiler crashes during gimplification because there is a COMPOUND_EXPR shared between the TYPE_SIZE and TYPE_SIZE_UNIT expressions of an array type. Now this isn't supposed to happen because we run an unsharing pass before gimplification. The problem here is that we have a forward declaration (DECL_EXPR) of a pointer type to the array type. So, during the marking phase of the unsharing pass, the array type gets marked as visited through the pointer, which prevents it from being walked during the same phase when its own DECL_EXPR is processed. This pointer/pointed-to type business is an old pattern. Five years ago, Olivier changed gimplify_type_sizes like so: 2006-10-06 Olivier Hainque hain...@adacore.com * gimplify.c (gimplify_type_sizes) [POINTER_TYPE, REFERENCE_TYPE]: Don't recurse on the pointed-to type. because of a related problem. It turns out that we need the same change in the DECL_EXPR case of walk_tree_1 to fix the bug at hand, which is sort of logical as there is a strong relationship between them: case DECL_EXPR: /* If this is a TYPE_DECL, walk into the fields of the type that it's defining. We only want to walk into these fields of a type in this case and not in the general case of a mere reference to the type. The criterion is as follows: if the field can be an expression, it must be walked only here. This should be in keeping with the fields that are directly gimplified in gimplify_type_sizes in order for the mark/copy-if-shared/unmark machinery of the gimplifier to work with variable-sized types. Note that DECLs get walked as part of processing the BIND_EXPR. */ Tested on x86_64-suse-linux, OK for mainline and 4.6 branch? 2011-07-14 Eric Botcazou ebotca...@adacore.com PR middle-end/49732 * tree.c (walk_tree_1) DECL_EXPR: Do not walk a pointed-to type. 2011-07-14 Eric Botcazou ebotca...@adacore.com * gnat.dg/pointer_controlled.adb: New test. 
-- Eric Botcazou Index: tree.c === --- tree.c (revision 176261) +++ tree.c (working copy) @@ -10596,9 +10596,14 @@ walk_tree_1 (tree *tp, walk_tree_fn func if (result || !walk_subtrees) return result; - result = walk_type_fields (*type_p, func, data, pset, lh); - if (result) - return result; + /* But do not walk a pointed-to type since it may itself need to + be walked in the declaration case if it isn't anonymous. */ + if (!POINTER_TYPE_P (*type_p)) + { + result = walk_type_fields (*type_p, func, data, pset, lh); + if (result) + return result; + } /* If this is a record type, also walk the fields. */ if (RECORD_OR_UNION_TYPE_P (*type_p)) -- PR ada/49732 -- Testcase by Vorfeed Canal -- { dg-do compile } -- { dg-options "-gnato" } with Interfaces.C; use Interfaces.C; with Interfaces.C.Strings; use Interfaces.C.Strings; with Interfaces.C.Pointers; procedure Pointer_Controlled is function Create (Name : String) return size_t is type Name_String is new char_array (0 .. Name'Length); type Name_String_Ptr is access Name_String; pragma Controlled (Name_String_Ptr); Name_Str : constant Name_String_Ptr := new Name_String; Name_Len : size_t; begin To_C (Name, Name_Str.all, Name_Len); return 1; end; Test : size_t; begin Test := Create ("ABC"); end;
Re: [PATCH] New IPA-CP with real function cloning - updated version
2011-07-14 Martin Jambor mjam...@suse.cz * ipa-prop.h: Include alloc-pool.h, all sorts of updates to general comments. (ipcp_values_pool): Declare. (ipcp_sources_pool): Likewise. (ipcp_lattice): Changed to forward declaration. (ipa_param_descriptor): Removed fields ipcp_lattice, types and cannot_devirtualize. (ipa_node_params): New fields descriptors, lattices, known_vals, clone_for_all_contexts and node dead, removed fields params and count_scale. (ipa_set_param_count): Removed. (ipa_get_param_count): Made to work with descriptors vector. (ipa_get_param): Updated. (ipa_param_cannot_devirtualize_p): Removed. (ipa_param_types_vec_empty): Likewise. (ipa_set_param_used): New function. (ipa_get_param_used): Updated to use descriptors vector. (ipa_func_list): Removed. (ipa_init_func_list): Removed declaration. (ipa_push_func_to_list_1): Likewise. (ipa_pop_func_from_list): Likewise. (ipa_push_func_to_list): Removed. (ipa_lattice_from_jfunc): Remove declaration. (ipa_get_jf_pass_through_result): Declare. (ipa_get_jf_ancestor_result): Likewise. (ipa_value_from_jfunc): Likewise. (ipa_get_lattice): Update. (ipa_lat_is_single_const): New function. * ipa-prop.c (ipa_push_func_to_list_1): Removed. (ipa_init_func_list): Likewise. (ipa_pop_func_from_list): Likewise. (ipa_get_param_decl_index): Fix coding style. (count_formal_params): Removed. (count_formal_params_1): Renamed to count_formal_params. (ipa_populate_param_decls): Update to use descriptors vector. (ipa_initialize_node_params): Likewise. (visit_ref_for_mod_analysis): Use ipa_set_param_used. (ipa_analyze_params_uses): Likewise. (ipa_free_node_params_substructures): Likewise and free also lattices and known values. (duplicate_array): Removed. (ipa_edge_duplication_hook): Add the new edge to the list of edge clones. (ipa_node_duplication_hook): Update to use new lattices. (ipa_free_all_structures_after_ipa_cp): Free alloc pools. (ipa_free_all_structures_after_iinln): Likewise. 
(ipa_write_node_info): Update to use new lattices. (ipa_read_node_info): Likewise. (ipa_get_jf_pass_through_result): New function. (ipa_get_jf_ancestor_result): Likewise. (ipa_value_from_jfunc): Likewise. (ipa_cst_from_jfunc): Reimplemented using ipa_value_from_jfunc. * ipa-cp.c: Reimplemented. * params.def (PARAM_DEVIRT_TYPE_LIST_SIZE): Removed. (PARAM_IPA_CP_VALUE_LIST_SIZE): New parameter. (PARAM_IPA_CP_EVAL_THRESHOLD): Likewise. * Makefile.in (IPA_PROP_H): Added alloc-pool.h to dependencies. * doc/invoke.texi (devirt-type-list-size): Removed description. (ipa-cp-value-list-size): Added description. * testsuite/gcc.dg/ipa/ipa-1.c: Updated testcase dump scan. * testsuite/gcc.dg/ipa/ipa-2.c: Likewise. * testsuite/gcc.dg/ipa/ipa-3.c: Likewise and made functions static. * testsuite/gcc.dg/ipa/ipa-4.c: Updated testcase dump scan. * testsuite/gcc.dg/ipa/ipa-5.c: Likewise. * testsuite/gcc.dg/ipa/ipa-7.c: Likewise. * testsuite/gcc.dg/ipa/ipa-8.c: Updated testcase dump scan. * testsuite/gcc.dg/ipa/ipacost-1.c: Likewise. * testsuite/gcc.dg/ipa/ipacost-2.c: Likewise and increased sizes of some functions. * testsuite/gcc.dg/ipa/ipcp-1.c: New test. * testsuite/gcc.dg/ipa/ipcp-2.c: Likewise. * testsuite/gcc.dg/tree-ssa/ipa-cp-1.c: Updated testcase. /* ipa_node_params stores information related to formal parameters of functions and some other information for interprocedural passes that operate on parameters (such as ipa-cp). */ struct ipa_node_params { /* Information about individual formal parameters that are gathered when summaries are generated. */ VEC (ipa_param_descriptor_t, heap) *descriptors; /* Pointer to an array of structures describing individual formal parameters. */ struct ipcp_lattice *lattices; Hmm, I was under impression that the plan was to break away the stuff used by ipa-cp internally during the propagatoin stage (i.e. ipcp_orig_node/known_vals and probably lattices from the stuff used to represent parameters and jump functions, i.e. descriptors. 
But this can be handled incrementally. /* FIXME: Can we clobber only the first argument of thunks? */ Well, we should know how to propagate through it. But it is not too important a side case, I guess, until we can devirtualize to them effectively. if (node->alias || node->thunk.thunk_p || ipa_is_called_with_var_arguments (info)) disable = true; The patch is OK, thanks! It would be nice to add a testcase that the profile updating is done correctly. Honza
[pph] Add symbol table - Fix remaining asm differences (issue4732043)
This patch fixes the remaining assembly differences between non-pph and pph compiles. The idea is to make the pph compiler process functions and variables in the same order that they had been processed in the original compile. To do this, we intercept calls to rest_of_decl_compilation and allocate_struct_function. At every call, we add the declaration to a list. This list becomes the symbol table for the header file, which is then written to the end of the file. When reading the file, we read this table and present the symbols to the middle end in that order. The symbol table is written in pph_out_symtab and read in pph_in_symtab. The other changes are mostly wrapping calls to rest_of_decl_compilation so that we can add the declarations to the symbol table. This simplifies the logic we use to present symbols to the middle end as it encodes the sequence in the symbol table itself. No need to reverse lists of symbols or the other contortions we used to make. Tested on x86_64. Applied to branch. Diego. cp/ChangeLog.pph 2011-07-14 Diego Novillo dnovi...@google.com * Make-lang.in (cp/decl.o): Add dependency on CXX_PPH_STREAMER_H. (cp/pph-streamer-out.o): Add dependency on CGRAPH_H. (cp/pph-streamer-in.o): Add dependency on toplev.h * cp-tree.h (cp_rest_of_decl_compilation): Declare. * call.c (set_up_extended_ref_temp): Call cp_rest_of_decl_compilation instead of rest_of_decl_compilation. * class.c (build_clone): Likewise. * decl2.c (finish_anon_union): Likewise. (maybe_emit_vtables): Likewise. (write_out_vars): Likewise. * method.c (implicitly_declare_fn): Likewise. * semantics.c (maybe_add_lambda_conv_op): Likewise. * decl.c (make_rtl_for_nonlocal_decl): Likewise. (cp_finish_decl): Likewise. (start_preparsed_function): Likewise. (cp_rest_of_decl_compilation): New. * pph-streamer-in.c: Include toplev.h (pph_in_language_function): Tidy. (pph_in_struct_function): Remove parameter DECL. Support reading shared struct functions. Read FN-DECL. 
(pph_register_decl_in_symtab): Remove. Update all users. (pph_register_binding_in_symtab): Remove. Update all users. (pph_in_symtab_marker): New. (pph_in_symtab): New. (pph_read_file_contents): Call it. (pph_in_function_decl): Do not call pph_in_struct_function. * pph-streamer-out.c: Include cgraph.h. (decls_to_register): New. (pph_out_chain_filtered): Remove argument REVERSE_P. Update all users. (pph_out_struct_function): Write FN-DECL. (pph_out_symtab_marker): New. (pph_out_symtab): New. (pph_write_file_contents): Call it. (pph_out_function_decl): Do not call pph_out_struct_function. (pph_add_decl_to_register): New. * pph-streamer.h (enum pph_symtab_marker): New. (struct pph_stream): Remove field FNS_TO_EXPAND. Update all users. (pph_add_decl_to_register): Declare. testsuite/ChangeLog.pph * g++.dg/pph/c1pr44948-1a.cc: Adjust expected failure. * g++.dg/pph/x0hardlookup.h: Add new symbol in global namespace. * g++.dg/pph/c4meteor-contest.cc: Mark fixed. * g++.dg/pph/c1attr-warn-unused.cc: Likewise. * g++.dg/pph/x1globalref.cc: Likewise. * g++.dg/pph/x1hardlookup.cc: Likewise. 
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 5f3249e..9634c47 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -271,7 +271,7 @@ cp/decl.o: cp/decl.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ cp/operators.def $(TM_P_H) $(TREE_INLINE_H) $(DIAGNOSTIC_H) $(C_PRAGMA_H) \ debug.h gt-cp-decl.h $(TIMEVAR_H) $(TARGET_H) $(PLUGIN_H) \ intl.h tree-iterator.h pointer-set.h $(SPLAY_TREE_H) \ - c-family/c-objc.h + c-family/c-objc.h $(CXX_PPH_STREAMER_H) cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ output.h toplev.h $(C_COMMON_H) gt-cp-decl2.h $(CGRAPH_H) \ $(C_PRAGMA_H) $(TREE_DUMP_H) intl.h $(TARGET_H) $(GIMPLE_H) pointer-set.h \ @@ -351,8 +351,8 @@ cp/pph-streamer.o: cp/pph-streamer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ cp/pph-streamer-out.o: cp/pph-streamer-out.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h $(CGRAPH_H) cp/pph-streamer-in.o: cp/pph-streamer-in.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h toplev.h diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 56f3408..6988dc7 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8567,7 +8567,7 @@
[PATCH 3/9] dwarf2cfi: Populate CUR_ROW-REG_SAVE.
To be actually used by a subsequent patch. --- gcc/dwarf2cfi.c | 27 +-- 1 files changed, 25 insertions(+), 2 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index eb59f28..36fa7f8 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,17 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +/* Perform ROW->REG_SAVE[COLUMN] = CFI. CFI may be null, indicating + that the register column is no longer saved. */ + +static void +update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi) +{ + if (VEC_length (dw_cfi_ref, row->reg_save) <= column) +VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1); + VEC_replace (dw_cfi_ref, row->reg_save, column, cfi); +} + /* This function fills in a dw_cfa_location structure from a dwarf location descriptor sequence. */ @@ -574,7 +585,13 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) cfi->dw_cfi_oprnd2.dw_cfi_offset = offset; } else if (sreg == reg) -cfi->dw_cfi_opc = DW_CFA_same_value; +{ + /* While we could emit something like DW_CFA_same_value or +DW_CFA_restore, we never expect to see something like that +in a prologue. This is more likely to be a bug. A backend +can always bypass this by using REG_CFA_RESTORE directly. 
*/ + gcc_unreachable (); +} else { cfi->dw_cfi_opc = DW_CFA_register; @@ -582,6 +599,7 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) } add_cfi (cfi); + update_row_reg_save (cur_row, reg, cfi); } /* Given a SET, calculate the amount of stack adjustment it @@ -1337,6 +1355,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) { rtx src, dest, span; dw_cfi_ref cfi = new_cfi (); + unsigned regno; dest = SET_DEST (set); src = SET_SRC (set); @@ -1347,8 +1366,10 @@ dwarf2out_frame_debug_cfa_expression (rtx set) span = targetm.dwarf_register_span (src); gcc_assert (!span); + regno = dwf_regno (src); + cfi->dw_cfi_opc = DW_CFA_expression; - cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (src); + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; cfi->dw_cfi_oprnd2.dw_cfi_loc = mem_loc_descriptor (XEXP (dest, 0), get_address_mode (dest), GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); @@ -1356,6 +1377,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) /* ??? We'd like to use queue_reg_save, were the interface different, and, as above, we could manage flushing for epilogues. */ add_cfi (cfi); + update_row_reg_save (cur_row, regno, cfi); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. */ @@ -1370,6 +1392,7 @@ dwarf2out_frame_debug_cfa_restore (rtx reg) cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; add_cfi (cfi); + update_row_reg_save (cur_row, regno, NULL); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE. -- 1.7.6
[PATCH 2/9] dwarf2cfi: Rename cfi_insn to add_cfi_insn.
Make it consistent with add_cfi_vec. --- gcc/dwarf2cfi.c | 20 +++- 1 files changed, 11 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 1c1b74f..eb59f28 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -86,7 +86,7 @@ static GTY(()) dw_cfi_row_ref cie_cfi_row; static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ -static rtx cfi_insn; +static rtx add_cfi_insn; /* When non-null, add_cfi will add the CFI to this vector. */ static cfi_vec *add_cfi_vec; @@ -274,11 +274,13 @@ add_cfi (dw_cfi_ref cfi) } any_cfis_emitted = true; - if (cfi_insn != NULL) + + if (add_cfi_insn != NULL) { - cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn); - NOTE_CFI (cfi_insn) = cfi; + add_cfi_insn = emit_note_after (NOTE_INSN_CFI, add_cfi_insn); + NOTE_CFI (add_cfi_insn) = cfi; } + if (add_cfi_vec != NULL) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } @@ -2319,7 +2321,7 @@ create_cfi_notes (void) { rtx pat; - cfi_insn = PREV_INSN (insn); + add_cfi_insn = PREV_INSN (insn); if (BARRIER_P (insn)) { @@ -2342,7 +2344,7 @@ create_cfi_notes (void) break; case NOTE_INSN_CFA_RESTORE_STATE: - cfi_insn = insn; + add_cfi_insn = insn; dwarf2out_frame_debug_restore_state (); break; } @@ -2372,13 +2374,13 @@ create_cfi_notes (void) /* Do not separate tablejump insns from their ADDR_DIFF_VEC. Putting the note after the VEC should be ok. */ - if (!tablejump_p (insn, NULL, cfi_insn)) - cfi_insn = insn; + if (!tablejump_p (insn, NULL, add_cfi_insn)) + add_cfi_insn = insn; dwarf2out_frame_debug (insn, true); } - cfi_insn = NULL; + add_cfi_insn = NULL; } /* Determine if we need to save and restore CFI information around this -- 1.7.6
[PATCH 7/9] dwarf2cfi: Allocate reg_saved_in_data in the heap.
--- gcc/dwarf2cfi.c | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index a800cb4..1d6413f 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1109,13 +1109,13 @@ typedef struct GTY(()) reg_saved_in_data { } reg_saved_in_data; DEF_VEC_O (reg_saved_in_data); -DEF_VEC_ALLOC_O (reg_saved_in_data, gc); +DEF_VEC_ALLOC_O (reg_saved_in_data, heap); /* A set of registers saved in other registers. This is implemented as a flat array because it normally contains zero or 1 entry, depending on the target. IA-64 is the big spender here, using a maximum of 5 entries. */ -static GTY(()) VEC(reg_saved_in_data, gc) *regs_saved_in_regs; +static VEC(reg_saved_in_data, heap) *regs_saved_in_regs; static GTY(()) reg_saved_in_data *cie_return_save; @@ -1161,7 +1161,7 @@ record_reg_saved_in_reg (rtx dest, rtx src) if (dest == NULL) return; - elt = VEC_safe_push(reg_saved_in_data, gc, regs_saved_in_regs, NULL); + elt = VEC_safe_push(reg_saved_in_data, heap, regs_saved_in_regs, NULL); elt-orig_reg = src; elt-saved_in_reg = dest; } @@ -2699,6 +2699,9 @@ initial_return_save (rtx rtl) static unsigned int execute_dwarf2_frame (void) { + gcc_checking_assert (queued_reg_saves == NULL); + gcc_checking_assert (regs_saved_in_regs == NULL); + /* The first time we're called, compute the incoming frame state. */ if (cie_cfi_vec == NULL) { @@ -2737,7 +2740,7 @@ execute_dwarf2_frame (void) cie_return_save = ggc_alloc_reg_saved_in_data (); *cie_return_save = *VEC_index (reg_saved_in_data, regs_saved_in_regs, 0); - regs_saved_in_regs = NULL; + VEC_pop (reg_saved_in_data, regs_saved_in_regs); break; default: gcc_unreachable (); @@ -2748,12 +2751,10 @@ execute_dwarf2_frame (void) } /* Set up state for generating call frame debug info. 
*/ - gcc_checking_assert (queued_reg_saves == NULL); - gcc_checking_assert (regs_saved_in_regs == NULL); - cur_row = copy_cfi_row (cie_cfi_row); if (cie_return_save) -VEC_safe_push (reg_saved_in_data, gc, regs_saved_in_regs, cie_return_save); +VEC_safe_push (reg_saved_in_data, heap, + regs_saved_in_regs, cie_return_save); cfa_store = cur_row-cfa; args_size = 0; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) /* Reset all function-specific information, particularly for GC. */ XDELETEVEC (barrier_args_size); barrier_args_size = NULL; - regs_saved_in_regs = NULL; + VEC_free (reg_saved_in_data, heap, regs_saved_in_regs); VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); -- 1.7.6
[PATCH 6/9] dwarf2cfi: Convert queued_reg_save to a VEC.
Also, allocate it in the heap instead of garbage collected. --- gcc/dwarf2cfi.c | 51 ++- 1 files changed, 26 insertions(+), 25 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 51fb824..a800cb4 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1091,14 +1091,16 @@ dwarf2out_notice_stack_adjust (rtx insn, bool after_p) of the prologue or (b) the register is clobbered. This clusters register saves so that there are fewer pc advances. */ -struct GTY(()) queued_reg_save { - struct queued_reg_save *next; +typedef struct { rtx reg; - HOST_WIDE_INT cfa_offset; rtx saved_reg; -}; + HOST_WIDE_INT cfa_offset; +} queued_reg_save; -static GTY(()) struct queued_reg_save *queued_reg_saves; +DEF_VEC_O (queued_reg_save); +DEF_VEC_ALLOC_O (queued_reg_save, heap); + +static VEC(queued_reg_save, heap) *queued_reg_saves; /* The caller's ORIG_REG is saved in SAVED_IN_REG. */ typedef struct GTY(()) reg_saved_in_data { @@ -1170,24 +1172,21 @@ record_reg_saved_in_reg (rtx dest, rtx src) static void queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; /* Duplicates waste space, but it's also necessary to remove them for correctness, since the queue gets output in reverse order. */ - for (q = queued_reg_saves; q != NULL; q = q-next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (compare_reg_or_pc (q-reg, reg)) - break; + goto found; - if (q == NULL) -{ - q = ggc_alloc_queued_reg_save (); - q-next = queued_reg_saves; - queued_reg_saves = q; -} + q = VEC_safe_push (queued_reg_save, heap, queued_reg_saves, NULL); + found: q-reg = reg; - q-cfa_offset = offset; q-saved_reg = sreg; + q-cfa_offset = offset; } /* Output all the entries in QUEUED_REG_SAVES. 
*/ @@ -1195,9 +1194,10 @@ queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) static void dwarf2out_flush_queued_reg_saves (void) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; - for (q = queued_reg_saves; q; q = q-next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) { unsigned int reg, sreg; @@ -1214,7 +1214,7 @@ dwarf2out_flush_queued_reg_saves (void) reg_save (reg, sreg, q-cfa_offset); } - queued_reg_saves = NULL; + VEC_truncate (queued_reg_save, queued_reg_saves, 0); } /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved @@ -1225,17 +1225,18 @@ dwarf2out_flush_queued_reg_saves (void) static bool clobbers_queued_reg_save (const_rtx insn) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t iq; - for (q = queued_reg_saves; q; q = q-next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, iq, q) { - size_t i; + size_t ir; reg_saved_in_data *rir; if (modified_in_p (q-reg, insn)) return true; - FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, i, rir) + FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, ir, rir) if (compare_reg_or_pc (q-reg, rir-orig_reg) modified_in_p (rir-saved_in_reg, insn)) return true; @@ -1250,11 +1251,11 @@ static rtx reg_saved_in (rtx reg) { unsigned int regn = REGNO (reg); - struct queued_reg_save *q; + queued_reg_save *q; reg_saved_in_data *rir; size_t i; - for (q = queued_reg_saves; q; q = q-next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (q-saved_reg regn == REGNO (q-saved_reg)) return q-reg; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) XDELETEVEC (barrier_args_size); barrier_args_size = NULL; regs_saved_in_regs = NULL; - queued_reg_saves = NULL; + VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); cur_row = NULL; -- 1.7.6
[PATCH 4/9] dwarf2cfi: Implement change_cfi_row.
Add a generic function to adjust cfi state from one row to another. Use this to implement text section switching. This will also be usable for arbitrary changes around a cfg for shrink-wrapping. --- gcc/dwarf2cfi.c | 376 +-- gcc/dwarf2out.c | 159 ++-- gcc/dwarf2out.h |3 +- 3 files changed, 290 insertions(+), 248 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 36fa7f8..745e137 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,28 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +static void +add_cfi_args_size (HOST_WIDE_INT size) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi-dw_cfi_opc = DW_CFA_GNU_args_size; + cfi-dw_cfi_oprnd1.dw_cfi_offset = size; + + add_cfi (cfi); +} + +static void +add_cfi_restore (unsigned reg) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi-dw_cfi_opc = (reg ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore); + cfi-dw_cfi_oprnd1.dw_cfi_reg_num = reg; + + add_cfi (cfi); +} + /* Perform ROW-REG_SAVE[COLUMN] = CFI. CFI may be null, indicating that the register column is no longer saved. */ @@ -474,64 +496,109 @@ cfa_equal_p (const dw_cfa_location *loc1, const dw_cfa_location *loc2) || loc1-base_offset == loc2-base_offset)); } -/* This routine does the actual work. The CFA is now calculated from - the dw_cfa_location structure. */ +/* Determine if two CFI operands are identical. 
*/ -static void -def_cfa_1 (dw_cfa_location *loc_p) +static bool +cfi_oprnd_equal_p (enum dw_cfi_oprnd_type t, dw_cfi_oprnd *a, dw_cfi_oprnd *b) { - dw_cfi_ref cfi; - dw_cfa_location loc = *loc_p; + switch (t) +{ +case dw_cfi_oprnd_unused: + return true; +case dw_cfi_oprnd_reg_num: + return a-dw_cfi_reg_num == b-dw_cfi_reg_num; +case dw_cfi_oprnd_offset: + return a-dw_cfi_offset == b-dw_cfi_offset; +case dw_cfi_oprnd_addr: + return (a-dw_cfi_addr == b-dw_cfi_addr + || strcmp (a-dw_cfi_addr, b-dw_cfi_addr) == 0); +case dw_cfi_oprnd_loc: + return loc_descr_equal_p (a-dw_cfi_loc, b-dw_cfi_loc); +} + gcc_unreachable (); +} - if (cfa_store.reg == loc.reg loc.indirect == 0) -cfa_store.offset = loc.offset; +/* Determine if two CFI entries are identical. */ + +static bool +cfi_equal_p (dw_cfi_ref a, dw_cfi_ref b) +{ + enum dwarf_call_frame_info opc; + + /* Make things easier for our callers, including missing operands. */ + if (a == b) +return true; + if (a == NULL || b == NULL) +return false; + + /* Obviously, the opcodes must match. */ + opc = a-dw_cfi_opc; + if (opc != b-dw_cfi_opc) +return false; + + /* Compare the two operands, re-using the type of the operands as + already exposed elsewhere. */ + return (cfi_oprnd_equal_p (dw_cfi_oprnd1_desc (opc), +a-dw_cfi_oprnd1, b-dw_cfi_oprnd1) + cfi_oprnd_equal_p (dw_cfi_oprnd2_desc (opc), + a-dw_cfi_oprnd2, b-dw_cfi_oprnd2)); +} + +/* The CFA is now calculated from NEW_CFA. Consider OLD_CFA in determining + what opcode to emit. Returns the CFI opcode to effect the change, or + NULL if NEW_CFA == OLD_CFA. */ + +static dw_cfi_ref +def_cfa_0 (dw_cfa_location *old_cfa, dw_cfa_location *new_cfa) +{ + dw_cfi_ref cfi; /* If nothing changed, no need to issue any call frame instructions. 
*/ - if (cfa_equal_p (loc, cur_row-cfa)) -return; + if (cfa_equal_p (old_cfa, new_cfa)) +return NULL; cfi = new_cfi (); - if (loc.reg == cur_row-cfa.reg !loc.indirect !cur_row-cfa.indirect) + if (new_cfa-reg == old_cfa-reg !new_cfa-indirect !old_cfa-indirect) { /* Construct a DW_CFA_def_cfa_offset offset instruction, indicating the CFA register did not change but the offset did. The data factoring for DW_CFA_def_cfa_offset_sf happens in output_cfi, or in the assembler via the .cfi_def_cfa_offset directive. */ - if (loc.offset 0) + if (new_cfa-offset 0) cfi-dw_cfi_opc = DW_CFA_def_cfa_offset_sf; else cfi-dw_cfi_opc = DW_CFA_def_cfa_offset; - cfi-dw_cfi_oprnd1.dw_cfi_offset = loc.offset; + cfi-dw_cfi_oprnd1.dw_cfi_offset = new_cfa-offset; } #ifndef MIPS_DEBUGGING_INFO /* SGI dbx thinks this means no offset. */ - else if (loc.offset == cur_row-cfa.offset - cur_row-cfa.reg != INVALID_REGNUM - !loc.indirect - !cur_row-cfa.indirect) + else if (new_cfa-offset == old_cfa-offset + old_cfa-reg != INVALID_REGNUM + !new_cfa-indirect + !old_cfa-indirect) { /* Construct a DW_CFA_def_cfa_register register instruction, indicating the CFA register has changed to register but the offset has not changed. */ cfi-dw_cfi_opc = DW_CFA_def_cfa_register; - cfi-dw_cfi_oprnd1.dw_cfi_reg_num = loc.reg; +
[RFC PATCH 0/9] CFG aware dwarf2 cfi generation
This finally brings us to something that can support shrink-wrapping.
As mentioned in the description of the last patch, this is 95% of what
Bernd had in his last 007-dw2cfg patch, except for the remember/
restore_state stuff.  And hopefully with better comments.

This is the first version of this that has actually made it into stage3
bootstrap on x86_64, so it isn't well tested yet.  This just happens to
coincide with the end of my work day, and it's been a while since I've
shared state, so I thought I'd post for overnight review.

The complete tree is available at

  git://repo.or.cz/gcc/rth.git rth/cfi-pass


r~


Richard Henderson (9):
  dwarf2cfi: Introduce a dw_cfi_row state.
  dwarf2cfi: Rename cfi_insn to add_cfi_insn.
  dwarf2cfi: Populate CUR_ROW->REG_SAVE.
  dwarf2cfi: Implement change_cfi_row.
  dwarf2cfi: Remove dw_cfi_row_ref typedef.
  dwarf2cfi: Convert queued_reg_save to a VEC.
  dwarf2cfi: Allocate reg_saved_in_data in the heap.
  dwarf2cfi: Introduce dw_trace_info.
  dwarf2cfi: Generate and connect traces.

 gcc/dwarf2cfi.c | 1672 ++-
 gcc/dwarf2out.c |  159 --
 gcc/dwarf2out.h |    5 +-
 3 files changed, 1030 insertions(+), 806 deletions(-)

-- 
1.7.6
[PATCH 8/9] dwarf2cfi: Introduce dw_trace_info.
This patch only introduces the structure definition and adjusts the
existing routines to use the new cur_trace global to access the
variables that were moved into the structure.
---
 gcc/dwarf2cfi.c |  440 +--
 1 files changed, 266 insertions(+), 174 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 1d6413f..9aa1208 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -74,7 +74,98 @@ typedef struct GTY(()) dw_cfi_row_struct
   HOST_WIDE_INT args_size;
 } dw_cfi_row;
 
-
+/* The caller's ORIG_REG is saved in SAVED_IN_REG.  */
+typedef struct GTY(()) reg_saved_in_data_struct {
+  rtx orig_reg;
+  rtx saved_in_reg;
+} reg_saved_in_data;
+
+DEF_VEC_O (reg_saved_in_data);
+DEF_VEC_ALLOC_O (reg_saved_in_data, heap);
+
+/* Since we no longer have a proper CFG, we're going to create a facsimile
+   of one on the fly while processing the frame-related insns.
+
+   We create dw_trace structures for each instruction trace beginning at
+   at a label following a barrier (or beginning of the function), and
+   ending at a barrier (or the end of the function).
+
+   As we encounter control transfer insns, we propagate the current
+   row state across the edges to the starts of traces.  If an edge goes
+   to a label that is not the start of a trace, we ignore it.  This
+   assumes that previous compiler transformations were correct, and that
+   we will reach the same row state from any source.  (We can perform some
+   limited validation of this assumption, but without the full CFG we
+   cannot be sure of full validation coverage.  It is expensive, so we
+   only do so with checking enabled.)
+
+   All traces are members of the TRACE_INFO array, in the order in which
+   they appear in the instruction stream.
+
+   All labels are given an LUID that indexes the LABEL_INFO array.  If
+   the label is the start of a trace, the TRACE pointer will be non-NULL
+   and point into the TRACE_INFO array.  */
+
+typedef struct
+{
+  /* The label that begins the trace.  This will be NULL for the first
+     trace beginning at function entry.  */
+  rtx label;
+
+  /* The row state at the beginning and end of the trace.  */
+  dw_cfi_row *enter_row, *exit_row;
+
+  /* The following variables contain data used in interpreting frame related
+     expressions.  These are not part of the real row state as defined by
+     Dwarf, but it seems like they need to be propagated into a trace in case
+     frame related expressions have been sunk.  */
+  /* ??? This seems fragile.  These variables are fragments of a larger
+     expression.  If we do not keep the entire expression together, we risk
+     not being able to put it together properly.  Consider forcing targets
+     to generate self-contained expressions and dropping all of the magic
+     interpretation code in this file.  Or at least refusing to shrink wrap
+     any frame related insn that doesn't contain a complete expression.  */
+
+  /* The register used for saving registers to the stack, and its offset
+     from the CFA.  */
+  dw_cfa_location cfa_store;
+
+  /* A temporary register holding an integral value used in adjusting SP
+     or setting up the store_reg.  The offset field holds the integer
+     value, not an offset.  */
+  dw_cfa_location cfa_temp;
+
+  /* A set of registers saved in other registers.  This is the inverse of
+     the row->reg_save info, if the entry is a DW_CFA_register.  This is
+     implemented as a flat array because it normally contains zero or 1
+     entry, depending on the target.  IA-64 is the big spender here, using
+     a maximum of 5 entries.  */
+  VEC(reg_saved_in_data, heap) *regs_saved_in_regs;
+
+} dw_trace_info;
+
+DEF_VEC_O (dw_trace_info);
+DEF_VEC_ALLOC_O (dw_trace_info, heap);
+
+typedef struct
+{
+  dw_trace_info *trace;
+
+#ifdef ENABLE_CHECKING
+  dw_cfi_row *check_row;
+#endif
+} dw_label_info;
+
+DEF_VEC_O (dw_label_info);
+DEF_VEC_ALLOC_O (dw_label_info, heap);
+
+/* The variables making up the pseudo-cfg, as described above.  */
+#if 0
+static VEC (int, heap) *uid_luid;
+static VEC (dw_label_info, heap) *label_info;
+static VEC (dw_trace_info, heap) *trace_info;
+#endif
+
 /* A vector of call frame insns for the CIE.  */
 cfi_vec cie_cfi_vec;
 
@@ -82,6 +173,8 @@ cfi_vec cie_cfi_vec;
    state provided by the CIE.  */
 static GTY(()) dw_cfi_row *cie_cfi_row;
 
+static GTY(()) reg_saved_in_data *cie_return_save;
+
 static GTY(()) unsigned long dwarf2out_cfi_label_num;
 
 /* The insn after which a new CFI note should be emitted.  */
@@ -90,6 +183,34 @@ static rtx add_cfi_insn;
 
 /* When non-null, add_cfi will add the CFI to this vector.  */
 static cfi_vec *add_cfi_vec;
 
+/* The current instruction trace.  */
+static dw_trace_info *cur_trace;
+
+/* The current, i.e. most recently generated, row of the CFI table.  */
+static dw_cfi_row *cur_row;
+
+/* The row state from a preceeding DW_CFA_remember_state.  */
+static dw_cfi_row *remember_row;
+
+/*
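The comment on regs_saved_in_regs above argues for a flat array because the set almost always holds zero or one entries (at most five on IA-64). A minimal standalone sketch of that design choice follows; it is illustrative only — plain ints stand in for rtx registers, a fixed array for GCC's VEC, and all names are invented.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for reg_saved_in_data: ORIG_REG is the caller's register,
   SAVED_IN_REG is where its value currently lives.  */
struct reg_saved_in
{
  int orig_reg;
  int saved_in_reg;
};

#define MAX_SAVED 8
static struct reg_saved_in saved[MAX_SAVED];
static size_t n_saved;

/* Record that ORIG is now held in IN, updating an existing entry if one
   exists.  Linear scan is fine precisely because the set is tiny.  */
static void
record_save (int orig, int in)
{
  for (size_t i = 0; i < n_saved; i++)
    if (saved[i].orig_reg == orig)
      {
	saved[i].saved_in_reg = in;
	return;
      }
  assert (n_saved < MAX_SAVED);
  saved[n_saved++] = (struct reg_saved_in){ orig, in };
}

/* Return the register currently holding ORIG, or -1 if it has not been
   saved in another register.  */
static int
find_save (int orig)
{
  for (size_t i = 0; i < n_saved; i++)
    if (saved[i].orig_reg == orig)
      return saved[i].saved_in_reg;
  return -1;
}
```

With zero or one entries in the common case, a hash table would be pure overhead; the linear scan is both smaller and faster here.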
[PATCH 9/9] dwarf2cfi: Generate and connect traces.
This kinda-sorta corresponds to Bernd's 007-dw2cfi patch.  Certainly
the concept of splitting the instruction stream into extended basic
blocks is the same.  This patch does a bit better job with the
documentation.  Also, I'm a bit more explicit about matching things up
with the similar code from the regular CFG routines.

What's missing at this point is any attempt to use
DW_CFA_remember_state.  I've deferred that for the moment because it's
easy to test the state change code across epilogues, whereas the
shrink-wrapping code is not in this tree and section switching is
difficult to force.
---
 gcc/dwarf2cfi.c |  792 ---
 1 files changed, 398 insertions(+), 394 deletions(-)

diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c
index 9aa1208..10e5740 100644
--- a/gcc/dwarf2cfi.c
+++ b/gcc/dwarf2cfi.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "flags.h"
 #include "rtl.h"
 #include "function.h"
+#include "basic-block.h"
 #include "dwarf2.h"
 #include "dwarf2out.h"
 #include "dwarf2asm.h"
@@ -86,34 +87,30 @@ DEF_VEC_ALLOC_O (reg_saved_in_data, heap);
 /* Since we no longer have a proper CFG, we're going to create a facsimile
    of one on the fly while processing the frame-related insns.
 
-   We create dw_trace structures for each instruction trace beginning at
-   at a label following a barrier (or beginning of the function), and
-   ending at a barrier (or the end of the function).
+   We create dw_trace_info structures for each extended basic block beginning
+   and ending at a save point.  Save points are labels, barriers, certain
+   notes, and of course the beginning and end of the function.
 
    As we encounter control transfer insns, we propagate the current
-   row state across the edges to the starts of traces.  If an edge goes
-   to a label that is not the start of a trace, we ignore it.  This
-   assumes that previous compiler transformations were correct, and that
-   we will reach the same row state from any source.  (We can perform some
-   limited validation of this assumption, but without the full CFG we
-   cannot be sure of full validation coverage.  It is expensive, so we
-   only do so with checking enabled.)
+   row state across the edges to the starts of traces.  When checking is
+   enabled, we validate that we propagate the same data from all sources.
 
    All traces are members of the TRACE_INFO array, in the order in which
    they appear in the instruction stream.
 
-   All labels are given an LUID that indexes the LABEL_INFO array.  If
-   the label is the start of a trace, the TRACE pointer will be non-NULL
-   and point into the TRACE_INFO array.  */
+   All save points are present in the TRACE_INDEX hash, mapping the insn
+   starting a trace to the dw_trace_info describing the trace.  */
 
 typedef struct
 {
-  /* The label that begins the trace.  This will be NULL for the first
-     trace beginning at function entry.  */
-  rtx label;
+  /* The insn that begins the trace.  */
+  rtx head;
 
   /* The row state at the beginning and end of the trace.  */
-  dw_cfi_row *enter_row, *exit_row;
+  dw_cfi_row *beg_row, *end_row;
+
+  /* True if this trace immediately follows NOTE_INSN_SWITCH_TEXT_SECTIONS.  */
+  bool switch_sections;
 
   /* The following variables contain data used in interpreting frame related
      expressions.  These are not part of the real row state as defined by
@@ -147,24 +144,15 @@ typedef struct
 DEF_VEC_O (dw_trace_info);
 DEF_VEC_ALLOC_O (dw_trace_info, heap);
 
-typedef struct
-{
-  dw_trace_info *trace;
-
-#ifdef ENABLE_CHECKING
-  dw_cfi_row *check_row;
-#endif
-} dw_label_info;
+typedef dw_trace_info *dw_trace_info_ref;
 
-DEF_VEC_O (dw_label_info);
-DEF_VEC_ALLOC_O (dw_label_info, heap);
+DEF_VEC_P (dw_trace_info_ref);
+DEF_VEC_ALLOC_P (dw_trace_info_ref, heap);
 
 /* The variables making up the pseudo-cfg, as described above.  */
-#if 0
-static VEC (int, heap) *uid_luid;
-static VEC (dw_label_info, heap) *label_info;
 static VEC (dw_trace_info, heap) *trace_info;
-#endif
+static VEC (dw_trace_info_ref, heap) *trace_work_list;
+static htab_t trace_index;
 
 /* A vector of call frame insns for the CIE.  */
 cfi_vec cie_cfi_vec;
@@ -189,9 +177,6 @@ static dw_trace_info *cur_trace;
 /* The current, i.e. most recently generated, row of the CFI table.  */
 static dw_cfi_row *cur_row;
 
-/* The row state from a preceeding DW_CFA_remember_state.  */
-static dw_cfi_row *remember_row;
-
 /* We delay emitting a register save until either (a) we reach the end
    of the prologue or (b) the register is clobbered.  This clusters
    register saves so that there are fewer pc advances.  */
@@ -211,9 +196,6 @@ static VEC(queued_reg_save, heap) *queued_reg_saves;
    emitting this data, i.e. updating CUR_ROW, without async unwind.  */
 static HOST_WIDE_INT queued_args_size;
 
-/* True if remember_state should be emitted before following CFI directive.  */
-static bool emit_cfa_remember;
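The trace-connection scheme described in the revised comment — record the entry row the first time a trace is reached, queue the trace on a worklist, and (under checking) verify that later edges hand it the same row — can be sketched with a toy model. This is a standalone illustration, not the patch's code: an int stands in for the full dw_cfi_row, an array index for the trace hash, and all names are invented.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a trace: entry row state plus edges to successors.  */
#define MAX_TRACES 8
struct trace
{
  int beg_row;		/* row state on entry; -1 = not yet reached */
  int delta;		/* net effect of the trace on the row state */
  int n_succ;		/* number of successor traces */
  int succ[4];		/* indices of successor traces */
};

static struct trace traces[MAX_TRACES];

/* Propagate entry rows from trace 0 with a worklist.  The first time a
   trace is reached, record the incoming row and queue the trace; on
   later visits, verify every predecessor hands it the same row.
   Returns false if two sources disagree -- the inconsistency the real
   code diagnoses under ENABLE_CHECKING.  */
static bool
scan_traces (void)
{
  int work[MAX_TRACES], n_work = 0;

  traces[0].beg_row = 0;	/* function entry starts from the CIE row */
  work[n_work++] = 0;

  while (n_work > 0)
    {
      struct trace *t = &traces[work[--n_work]];
      int end_row = t->beg_row + t->delta;

      for (int i = 0; i < t->n_succ; i++)
	{
	  struct trace *s = &traces[t->succ[i]];
	  if (s->beg_row == -1)
	    {
	      /* First edge into this trace: adopt the row and queue it.  */
	      s->beg_row = end_row;
	      work[n_work++] = t->succ[i];
	    }
	  else if (s->beg_row != end_row)
	    /* A second edge disagrees about the entry row.  */
	    return false;
	}
    }
  return true;
}
```

A diamond where both paths reach the join with the same row state propagates cleanly; a path that adjusted the stack differently would trip the consistency check, which is the failure mode shrink-wrapped code must avoid.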