Re: [patch, ARM] Fix PR48250, adjust DImode reload address legitimizing
On 2011/3/30 上午 12:23, Richard Earnshaw wrote: On Tue, 2011-03-29 at 22:53 +0800, Chung-Lin Tang wrote: On 2011/3/29 下午 10:26, Richard Earnshaw wrote: On Tue, 2011-03-29 at 18:25 +0800, Chung-Lin Tang wrote: On 2011/3/24 06:51 PM, Richard Earnshaw wrote: On Thu, 2011-03-24 at 12:56 +0900, Chung-Lin Tang wrote: Hi, PR48250 happens under TARGET_NEON, where DImode is included within the valid NEON modes. This turns the range of legitimate constant indexes to step-4 (coproc load/store), thus arm_legitimize_reload_address() when trying to decompose the [reg+index] reload address into [(reg+index_high)+index_low], can cause an ICE later when 'index_low' part is not aligned to 4. I'm not sure why the current DImode index is computed as: low = ((val 0xf) ^ 0x8) - 0x8; the sign-extending into negative values, then subtracting back, actually creates further off indexes. e.g. in the supplied testcase, [sp+13] was turned into [(sp+16)-3]. Hysterical Raisins... the code there was clearly written for the days before we had LDRD in the architecture. At that time the most efficient way to load a 64-bit object was to use the LDM{ia,ib,da,db} instructions. The computation here was (I think), intended to try and make the most efficient use of an add/sub instruction followed by LDM/STM offsetting. At that time the architecture had no unaligned access either, so dealing with 64-bit that were less than 32-bit aligned (in those days 32-bit was the maximum alignment) probably wasn't considered, or couldn't even get through to reload. I see it now. The code in output_move_double() returning assembly for ldm/stm(db/da/ib) for offsets -8/-4/+4 seems to confirm this. I have changed the patch to let the new code handle the TARGET_LDRD case only. The pre-LDRD case is still handled by the original code, with an additional ~0x3 for aligning the offset to 4. I've also added a comment for the pre-TARGET_LDRD case. Please see if the description is accurate enough. My patch changes the index decomposing to a more straightforward way; it also sort of outlines the way the other reload address indexes are broken by using and-masks, is not the most effective. The address is computed by addition, subtracting away the parts to obtain low+high should be the optimal way of giving the largest computable index range. I have included a few Thumb-2 bits in the patch; I know currently arm_legitimize_reload_address() is only used under TARGET_ARM, but I guess it might eventually be turned into TARGET_32BIT. I think this needs to be looked at carefully on ARMv4/ARMv4T to check that it doesn't cause regressions there when we don't have LDRD in the instruction set. I'll be testing the modified patch under an ARMv4/ARMv4T configuration. Okay for trunk if no regressions? Thanks, Chung-Lin PR target/48250 * config/arm/arm.c (arm_legitimize_reload_address): Adjust DImode constant index decomposing under TARGET_LDRD. Clear lower two bits for NEON, Thumb-2, and !TARGET_LDRD. Add comment for !TARGET_LDRD case. This looks technically correct, but I can't help feeling that trying to deal with the bottom two bits being non-zero is not really worthwhile. This hook is an optimization that's intended to generate better code than the default mechanisms that reload provides. It is allowed to fail, but it must say so if it does (by returning false). What this hook is trying to do for DImode is to take an address of the form (reg + TOO_BIG_CONST) and turn it into two instructions that are legitimate: tmp = (REG + LEGAL_BIG_CONST) some_use_of (mem (tmp + SMALL_LEGAL_CONST)) The idea behind the optimization is that LEGAL_BIG_CONST will need fewer instructions to generate than TOO_BIG_CONST. It's unlikely (probably impossible in ARM state) that pushing the bottom two bits of the address into LEGAL_BIG_CONST part of the offset is going to lead to a better code sequence. So I think it would be more sensible to just return false if we have a DImode address with the bottom 2 bits non-zero and then let the normal reload recovery mechanism take over. It is supposed to provide better utilization of the full range of LEGAL_BIG_CONST+SMALL_LEGAL_CONST. I am not sure, but I guess reload will resolve it with the reg+LEGAL_BIG_CONST part only, using only (mem (reg)) for the load/store (correct me if I'm wrong). Also, the new code slighty improves the reloading, for example currently [reg+64] is broken into [(reg+72)-8], creating an additional unneeded reload, which is certainly not good when we have ldrd/strd available. Sorry, didn't mean to suggest that we don't want to do something better for addresses that are a multiple of 4, just that for addresses that aren't (at least) word-aligned that we should just bail as the code in that case won't benefit from the optimization. So something like if (mode == DImode || (mode == DFmode
Re: Cleaning up expand optabs code
Richard Henderson r...@redhat.com writes: On 03/29/2011 06:21 AM, Richard Sandiford wrote: - enum machine_mode mode0 = insn_data[(int) icode].operand[1].mode; - enum machine_mode mode1 = insn_data[(int) icode].operand[2].mode; - enum machine_mode tmp_mode; + enum machine_mode xmode0 = insn_data[(int) icode].operand[1].mode; + enum machine_mode xmode1 = insn_data[(int) icode].operand[2].mode; + enum machine_mode mode0, mode1, tmp_mode; ... - if (GET_MODE (xop0) != mode0 mode0 != VOIDmode) -xop0 = convert_modes (mode0, - GET_MODE (xop0) != VOIDmode - ? GET_MODE (xop0) - : mode, - xop0, unsignedp); - - if (GET_MODE (xop1) != mode1 mode1 != VOIDmode) -xop1 = convert_modes (mode1, - GET_MODE (xop1) != VOIDmode - ? GET_MODE (xop1) - : mode, - xop1, unsignedp); + mode0 = GET_MODE (xop0) != VOIDmode ? GET_MODE (xop0) : mode; + if (xmode0 != VOIDmode xmode0 != mode0) +{ + xop0 = convert_modes (xmode0, mode0, xop0, unsignedp); + mode0 = xmode0; +} + + mode1 = GET_MODE (xop1) != VOIDmode ? GET_MODE (xop1) : mode; + if (xmode1 != VOIDmode xmode1 != mode1) +{ + xop1 = convert_modes (xmode1, mode1, xop1, unsignedp); + mode1 = xmode1; +} This smells like a target bug, but I can't quite put my finger on exactly what's wrong, and I haven't debugged the original. The backend has said xmode[01] is what it expects. The only reason you wouldn't write xmode[01] in the create_input_operand line is if xmode[01] are VOIDmode. Right. Sorry, I should have said, but the failure was actually from: case EXPAND_INPUT: input: gcc_assert (mode != VOIDmode); expand_binop_direct passed the match_operand's mode to create_input_operand, which was really the opposite of the intention. The caller should be passing the mode of the rtx, which it should always know. That seems like mistake number one, particularly for a binop, but I'll accept that for the nonce. Which pattern triggered the problem in the target? It was ashrdi3: (define_expand ashrdi3 [(set (match_operand:DI 0 register_operand ) (ashiftrt:DI (match_operand:DI 1 register_operand ) (match_operand 2 const_int_operand )))] !TARGET_COLDFIRE Which seems reasonable in this context, since the shift count isn't really DImode. Then we've got the conditionalization in the init of mode[01], which is presumably to handle CONST_INT as an input. What about something like xmode0 = insn_data... if (xmode0 == VOIDmode) xmode0 = mode; mode0 = GET_MODE (xop0); if (mode0 == VOIDmode) mode0 = mode; if (xmode0 != mode0) convert_modes create_input_operand(..., xmode0) ? That seems more obvious than what you have. And I *think* it should produce the same results. If it doesn't, then this whole block of code needs a lot more explanation. The problem is that a VOIDmode match_operand doesn't necessary imply that mode is the right mode. VOIDmode register-accepting operands are only a warning, not an error, and are sometimes needed for flag-specific variations. They've traditionally not forced a conversion to mode. E.g. if you have something like this on a 32-bit target: unsigned long long foo (unsigned long long x, int y) { return x = y; } then op1 will be (reg:SI y). Having a VOIDmode match_operand shouldn't force that to be converted to (reg:DI y), whereas I think the sequence above would. Or to put it another way: as things stand, mode is only trustworthy for CONST_INT opNs. A VOIDmode match_operand doesn't imply that the opN argument to expand_binop_directly is a CONST_INT, or even that only CONST_INTs are acceptable to the target. This comes back to the point that we really should know up-front what modes op0 and op1 actually have. (The thing I left as a future clean-up.) Until then, the process implemented by yesterday's patch was supposed to be: - work out what mode opN actually has at this point in time - see if the target has specifically asked for a different mode - if so, convert the operand This is directly equivalent to what create_convert_operand_from does: case EXPAND_CONVERT_FROM: if (GET_MODE (op-value) != VOIDmode) mode = GET_MODE (op-value); else /* The caller must tell us what mode this value has. */ gcc_assert (mode != VOIDmode); imode = insn_data[(int) icode].operand[opno].mode; if (imode != VOIDmode imode != mode) { op-value = convert_modes (imode, mode, op-value, op-unsigned_p); mode = imode; } But we have to break the flow there (rather than go on to coerce the operands) because of the commutative thing. Richard
Re: Some remodelling of the ARM vld and vst patterns
Richard Sandiford richard.sandif...@linaro.org writes: The ??? is saying that the V8QI-derived MEM is really a 3-byte access, not a 4-byte (SI) access, and so on. The comment makes the mode sound like a representational niceity, but really, there's no such thing as a conservatively wrong memory size here. If a store's mode is too small, dependent loads could be deleted as dead. If it's too big, unrelated live loads could be deleted as dead. In case it isn't obvious, I meant unrelated live stores. Richard
[RFC][patch] If-conversion of COMPONENT_REFs
Hi, With this patch a data-ref is marked as unconditionally read or written also if its adjacent field is read or written unconditionally in the loop. My concern is that this is not safe enough, even though the fields have to be non-pointers and non-aggregates, and this optimization is applied only with -ftree-loop-if-convert-stores flag. Bootstrapped on powerpc64-suse-linux and tested on x86_64-suse-linux. OK for trunk? Thanks, Ira ChangeLog: * tree-if-conv.c (memrefs_read_or_written_unconditionally): Return true if an adjacent field of the data-ref is accessed unconditionally. testsuite/ChangeLog: * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: New test. * gcc.dg/vect/vect.exp: Run if-cvt-stores-vect* tests with -ftree-loop-if-convert-stores. Index: testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c === --- testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (revision 0) +++ testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c (revision 0) @@ -0,0 +1,69 @@ +/* { dg-require-effective-target vect_int } */ + +#include stdarg.h +#include tree-vect.h + +#define N 50 + +typedef struct { + short a; + short b; +} data; + +data in1[N], in2[N], out[N]; +short result[N*2] = {10,-7,11,-6,12,-5,13,-4,14,-3,15,-2,16,-1,17,0,18,1,19,2,20,3,21,4,22,5,23,6,24,7,25,8,26,9,27,10,28,11,29,12,30,13,31,14,32,15,33,16,34,17,35,18,36,19,37,20,38,21,39,22,40,23,41,24,42,25,43,26,44,27,45,28,46,29,47,30,48,31,49,32,50,33,51,34,52,35,53,36,54,37,55,38,56,39,57,40,58,41,59,42}; +short out1[N], out2[N]; + +__attribute__ ((noinline)) void +foo () +{ + int i; + short c, d; + + for (i = 0; i N; i++) +{ + c = in1[i].b; + d = in2[i].b; + + if (c = d) +{ + out[i].b = in1[i].a; + out[i].a = d + 5; +} + else +{ + out[i].b = d - 12; + out[i].a = in2[i].a + d; +} +} +} + +int +main (void) +{ + int i; + + check_vect (); + + for (i = 0; i N; i++) +{ + in1[i].a = i; + in1[i].b = i + 2; + in2[i].a = 5; + in2[i].b = i + 5; + __asm__ volatile (); +} + + foo (); + + for (i = 0; i N; i++) +{ + if (out[i].a != result[2*i] || out[i].b != result[2*i+1]) +abort (); +} + + return 0; +} + +/* { dg-final { scan-tree-dump-times vectorized 1 loops 1 vect { xfail { vect_no_align || {! vect_strided } } } } } */ +/* { dg-final { cleanup-tree-dump vect } } */ Index: testsuite/gcc.dg/vect/vect.exp === --- testsuite/gcc.dg/vect/vect.exp (revision 171716) +++ testsuite/gcc.dg/vect/vect.exp (working copy) @@ -210,6 +210,12 @@ lappend DEFAULT_VECTCFLAGS --param gg dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/ggc-*.\[cS\]]] \ $DEFAULT_VECTCFLAGS +# -ftree-loop-if-convert-stores +set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS +lappend DEFAULT_VECTCFLAGS -ftree-loop-if-convert-stores +dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/if-cvt-stores-vect-*.\[cS\]]] \ + $DEFAULT_VECTCFLAGS + # With -O3. # Don't allow IPA cloning, because it throws our counts out of whack. set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS Index: tree-if-conv.c === --- tree-if-conv.c (revision 171716) +++ tree-if-conv.c (working copy) @@ -461,11 +461,11 @@ struct ifc_dr { #define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)-written_at_least_once) #define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)-rw_unconditionally) -/* Returns true when the memory references of STMT are read or written - unconditionally. In other words, this function returns true when - for every data reference A in STMT there exist other accesses to - the same data reference with predicates that add up (OR-up) to the - true predicate: this ensures that the data reference A is touched +/* Returns true when the memory references of STMT (or their adjacent fields) + are read or writtenunconditionally. In other words, this function + returns true when for every data reference A in STMT there exist other + accesses to the same data reference with predicates that add up (OR-up) to + the true predicate: this ensures that the data reference A is touched (read or written) on every iteration of the if-converted loop. */ static bool @@ -479,7 +479,7 @@ memrefs_read_or_written_unconditionally for (i = 0; VEC_iterate (data_reference_p, drs, i, a); i++) if (DR_STMT (a) == stmt) { - bool found = false; + bool found = false, same_comp_ref = false; int x = DR_RW_UNCONDITIONALLY (a); if (x == 0) @@ -489,22 +489,45 @@ memrefs_read_or_written_unconditionally continue; for (j = 0; VEC_iterate (data_reference_p, drs, j, b); j++) - if (DR_STMT (b) != stmt - same_data_refs (a,
Re: [patch, ARM] Fix PR48250, adjust DImode reload address legitimizing
On Wed, 2011-03-30 at 15:35 +0800, Chung-Lin Tang wrote: On 2011/3/30 上午 12:23, Richard Earnshaw wrote: On Tue, 2011-03-29 at 22:53 +0800, Chung-Lin Tang wrote: On 2011/3/29 下午 10:26, Richard Earnshaw wrote: On Tue, 2011-03-29 at 18:25 +0800, Chung-Lin Tang wrote: On 2011/3/24 06:51 PM, Richard Earnshaw wrote: On Thu, 2011-03-24 at 12:56 +0900, Chung-Lin Tang wrote: Hi, PR48250 happens under TARGET_NEON, where DImode is included within the valid NEON modes. This turns the range of legitimate constant indexes to step-4 (coproc load/store), thus arm_legitimize_reload_address() when trying to decompose the [reg+index] reload address into [(reg+index_high)+index_low], can cause an ICE later when 'index_low' part is not aligned to 4. I'm not sure why the current DImode index is computed as: low = ((val 0xf) ^ 0x8) - 0x8; the sign-extending into negative values, then subtracting back, actually creates further off indexes. e.g. in the supplied testcase, [sp+13] was turned into [(sp+16)-3]. Hysterical Raisins... the code there was clearly written for the days before we had LDRD in the architecture. At that time the most efficient way to load a 64-bit object was to use the LDM{ia,ib,da,db} instructions. The computation here was (I think), intended to try and make the most efficient use of an add/sub instruction followed by LDM/STM offsetting. At that time the architecture had no unaligned access either, so dealing with 64-bit that were less than 32-bit aligned (in those days 32-bit was the maximum alignment) probably wasn't considered, or couldn't even get through to reload. I see it now. The code in output_move_double() returning assembly for ldm/stm(db/da/ib) for offsets -8/-4/+4 seems to confirm this. I have changed the patch to let the new code handle the TARGET_LDRD case only. The pre-LDRD case is still handled by the original code, with an additional ~0x3 for aligning the offset to 4. I've also added a comment for the pre-TARGET_LDRD case. Please see if the description is accurate enough. My patch changes the index decomposing to a more straightforward way; it also sort of outlines the way the other reload address indexes are broken by using and-masks, is not the most effective. The address is computed by addition, subtracting away the parts to obtain low+high should be the optimal way of giving the largest computable index range. I have included a few Thumb-2 bits in the patch; I know currently arm_legitimize_reload_address() is only used under TARGET_ARM, but I guess it might eventually be turned into TARGET_32BIT. I think this needs to be looked at carefully on ARMv4/ARMv4T to check that it doesn't cause regressions there when we don't have LDRD in the instruction set. I'll be testing the modified patch under an ARMv4/ARMv4T configuration. Okay for trunk if no regressions? Thanks, Chung-Lin PR target/48250 * config/arm/arm.c (arm_legitimize_reload_address): Adjust DImode constant index decomposing under TARGET_LDRD. Clear lower two bits for NEON, Thumb-2, and !TARGET_LDRD. Add comment for !TARGET_LDRD case. This looks technically correct, but I can't help feeling that trying to deal with the bottom two bits being non-zero is not really worthwhile. This hook is an optimization that's intended to generate better code than the default mechanisms that reload provides. It is allowed to fail, but it must say so if it does (by returning false). What this hook is trying to do for DImode is to take an address of the form (reg + TOO_BIG_CONST) and turn it into two instructions that are legitimate: tmp = (REG + LEGAL_BIG_CONST) some_use_of (mem (tmp + SMALL_LEGAL_CONST)) The idea behind the optimization is that LEGAL_BIG_CONST will need fewer instructions to generate than TOO_BIG_CONST. It's unlikely (probably impossible in ARM state) that pushing the bottom two bits of the address into LEGAL_BIG_CONST part of the offset is going to lead to a better code sequence. So I think it would be more sensible to just return false if we have a DImode address with the bottom 2 bits non-zero and then let the normal reload recovery mechanism take over. It is supposed to provide better utilization of the full range of LEGAL_BIG_CONST+SMALL_LEGAL_CONST. I am not sure, but I guess reload will resolve it with the reg+LEGAL_BIG_CONST part only, using only (mem (reg)) for the load/store (correct me if I'm wrong). Also, the new code slighty improves the reloading, for example currently [reg+64] is broken into [(reg+72)-8], creating an additional unneeded reload, which is certainly not good when we have ldrd/strd available. Sorry, didn't mean to suggest that we don't want to do something better for addresses that are a multiple of 4, just that for addresses that aren't (at least) word-aligned that
Re: [PATCH] Another EQ/NE folder type fix (PR c/48305)
On Tue, Mar 29, 2011 at 5:53 PM, Jakub Jelinek ja...@redhat.com wrote: Hi! I've left these 4 out because I thought they should be fine given that the other operands are equal. But the testcase shows I was wrong. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok. Thanks, Richard. 2011-03-29 Jakub Jelinek ja...@redhat.com PR c/48305 * fold-const.c (fold_binary_loc) case EQ_EXPR, NE_EXPR: Make sure arg10/arg11 in (X ^ Y) == (Z ^ W) are always fold converted to matching arg00/arg01 types. * gcc.c-torture/compile/pr48305.c: New test. --- gcc/fold-const.c.jj 2011-03-25 07:56:32.0 +0100 +++ gcc/fold-const.c 2011-03-28 12:45:58.0 +0200 @@ -12625,13 +12625,21 @@ fold_binary_loc (location_t loc, operand_equal_p guarantees no side-effects so we don't need to use omit_one_operand on Z. */ if (operand_equal_p (arg01, arg11, 0)) - return fold_build2_loc (loc, code, type, arg00, arg10); + return fold_build2_loc (loc, code, type, arg00, + fold_convert_loc (loc, TREE_TYPE (arg00), + arg10)); if (operand_equal_p (arg01, arg10, 0)) - return fold_build2_loc (loc, code, type, arg00, arg11); + return fold_build2_loc (loc, code, type, arg00, + fold_convert_loc (loc, TREE_TYPE (arg00), + arg11)); if (operand_equal_p (arg00, arg11, 0)) - return fold_build2_loc (loc, code, type, arg01, arg10); + return fold_build2_loc (loc, code, type, arg01, + fold_convert_loc (loc, TREE_TYPE (arg01), + arg10)); if (operand_equal_p (arg00, arg10, 0)) - return fold_build2_loc (loc, code, type, arg01, arg11); + return fold_build2_loc (loc, code, type, arg01, + fold_convert_loc (loc, TREE_TYPE (arg01), + arg11)); /* Optimize (X ^ C1) op (Y ^ C2) as (X ^ (C1 ^ C2)) op Y. */ if (TREE_CODE (arg01) == INTEGER_CST --- gcc/testsuite/gcc.c-torture/compile/pr48305.c.jj 2011-03-28 12:48:02.0 +0200 +++ gcc/testsuite/gcc.c-torture/compile/pr48305.c 2011-03-28 12:47:49.0 +0200 @@ -0,0 +1,7 @@ +/* PR c/48305 */ + +int +foo (int x) +{ + return (x ^ 1) == (x ^ 1U); +} Jakub
Re: [cxx-mem-model] bitfield tests
On Tue, Mar 29, 2011 at 7:00 PM, Aldy Hernandez al...@redhat.com wrote: [Language lawyers, please correct me if I have mis-interpreted the upcoming standard in any way.] In the C++ memory model, contiguous bitfields comprise a single memory location, so it's fair game to bit twiddle them when setting them. For example: struct { unsigned int a : 4; unsigned int b : 4; unsigned int c : 4; }; In the above example, you can touch b and c while setting a. No race there. However, non contiguous bitfields are a different story: struct { unsigned int a : 4; char b; unsigned int c : 6; }; Here we have 3 distinct memory locations, so you can't touch b or c while setting a. No bit twiddling allowed. Similarly for bitfields separated by a zero-length bitfield: struct { unsigned int a : 4; int : 0; unsigned int c : 6; }; In the above example, a and c are distinct memory locations. Also, a structure/union boundary will also separate previously contiguous bit sequences: struct { unsigned int a : 4; struct { unsigned int b : 4 } BBB; unsigned int c : 4; }; Here we have 3 distinct memory locations, so again, we can't clobber b or c while setting a. The patch below adds a non-contiguous bit test (bitfields-2.C) which passes on x86, but upon assembly inspection, fails on PPC64, s390, and Alpha. These 3 architectures bit-twiddle their way out of the problem. The memory model is not implementable on strict-alignment targets that do not have a byte store operation. But we previously said that ;) Also consider global vars char a; char b; accessing them on strict-align targets may access adjacent globals (that's a problem anyway, also with alias analysis). Richard. There is also a similar test already in the testsuite (bitfields.C) which is similar except one field is a volatile. This test fails on x86-64 as well and is the subject of PR48128 which Jakub is currently tackling. As soon as Jakub finishes with PR48128, I will be working on getting these bitfield tests working in a C++ memory model fashion. Committing to branch.
Re: RFA: patch to solve IRA PR48336, PR48342, PR48345
Hi Vlad, First, I want to echo H-P's thanks for tackling this area. I just wondered: Vladimir Makarov vmaka...@redhat.com writes: The following patch is to solve PR48336, PR48342, PR48345. The profitable hard regs exclude hard regs which are prohibited for the corresponding allocno mode. It is true for primary allocation and it is important for better colorability criteria. Function assign_hard_reg is also based on this assumption. Unfortunately, it is not true for secondary allocation (after IRA IR flattening or during reload). The following patch solves this problem. The patch should be very safe but I am still testing it on x86/x86-64 bootstrap. [...] Index: ira-color.c === --- ira-color.c (revision 171699) +++ ira-color.c (working copy) @@ -1447,7 +1447,9 @@ update_conflict_hard_regno_costs (int *c } /* Set up conflicting and profitable regs (through CONFLICT_REGS and - PROFITABLE_REGS) for each object of allocno A. */ + PROFITABLE_REGS) for each object of allocno A. Remember that the + profitable regs exclude hard regs which can not hold value of mode + of allocno A. */ static inline void setup_conflict_profitable_regs (ira_allocno_t a, bool retry_p, HARD_REG_SET *conflict_regs, @@ -1463,8 +1465,13 @@ setup_conflict_profitable_regs (ira_allo COPY_HARD_REG_SET (conflict_regs[i], OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)); if (retry_p) - COPY_HARD_REG_SET (profitable_regs[i], -reg_class_contents[ALLOCNO_CLASS (a)]); + { + COPY_HARD_REG_SET (profitable_regs[i], + reg_class_contents[ALLOCNO_CLASS (a)]); + AND_COMPL_HARD_REG_SET (profitable_regs[i], + ira_prohibited_class_mode_regs + [ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]); + } else COPY_HARD_REG_SET (profitable_regs[i], OBJECT_COLOR_DATA (obj)-profitable_hard_regs); ira_prohibited_class_mode_regs is partly based on HARD_REGNO_MODE_OK, which is really a property of the first register in a multi-register group (rather than of every register in that group). So is ira_prohibited_class_mode_regs defined in the same way? At the moment, check_hard_reg_p and setup_allocno_available_regs_num test profitability for every register in the group: /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j) || ! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno + j)) break; [...] for (j = 0; j nregs; j++) { [...] /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regno + j) || ! TEST_HARD_REG_BIT (obj_data-profitable_hard_regs, hard_regno + j)) break; So if you have a target in which double-word values have to start in even registers, I think every odd-numbered bit of profitable_hard_regs will be clear, and no register will seem profitable. (I'm seeing this on ARM with some VFP tests.) Restricting the test to the first register fixes things for me, but do we need to check something else for the j != 0 case? Richard gcc/ * ira-color.c (check_hard_reg_p): Restrict the profitability check to the first register. (setup_allocno_available_regs_num): Likewise. Index: gcc/ira-color.c === --- gcc/ira-color.c (revision 171653) +++ gcc/ira-color.c (working copy) @@ -1497,7 +1504,8 @@ check_hard_reg_p (ira_allocno_t a, int h for (k = set_to_test_start; k set_to_test_end; k++) /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j) - || ! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno + j)) + || (j == 0 +! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno))) break; if (k != set_to_test_end) break; @@ -2226,8 +2234,9 @@ setup_allocno_available_regs_num (ira_al /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regno + j) - || ! TEST_HARD_REG_BIT (obj_data-profitable_hard_regs, - hard_regno + j)) + || (j == 0 + ! TEST_HARD_REG_BIT (obj_data-profitable_hard_regs, + hard_regno))) break; } if (k != set_to_test_end)
PATCH: PR target/48349: FLOAT_SSE_REGS typo in i386.h
Hi, I checked this pre-approved patch into trunk/4.6/4.5/4.4. H.J. --- Index: ChangeLog === --- ChangeLog (revision 171717) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-03-30 H.J. Lu hongjiu...@intel.com + + PR target/48349 + * config/i386/i386.h (REG_CLASS_CONTENTS): Fix a typo in + FLOAT_SSE_REGS. + 2011-03-30 Joseph Myers jos...@codesourcery.com Rainer Orth r...@cebitec.uni-bielefeld.de Index: config/i386/i386.h === --- config/i386/i386.h (revision 171717) +++ config/i386/i386.h (working copy) @@ -1272,7 +1272,7 @@ enum reg_class { 0xe000,0x1f }, /* MMX_REGS */ \ { 0x1fe00100,0x1fe000 }, /* FP_TOP_SSE_REG */\ { 0x1fe00200,0x1fe000 }, /* FP_SECOND_SSE_REG */ \ -{ 0x1fe0ff00,0x3fe000 }, /* FLOAT_SSE_REGS */\ +{ 0x1fe0ff00,0x1fe000 }, /* FLOAT_SSE_REGS */\ { 0x1, 0x1fe0 }, /* FLOAT_INT_REGS */\ { 0x1fe100ff,0x1fffe0 }, /* INT_SSE_REGS */ \ { 0x1fe1,0x1fffe0 }, /* FLOAT_INT_SSE_REGS */\
Re: [PATCH, ARM] PR47855 Compute attr length for some thumb2 insns
On Wed, 2011-03-30 at 12:38 +0800, Chung-Lin Tang wrote: On 2011/3/30 06:35 AM, Ramana Radhakrishnan wrote: Hi Carrot, On 26/03/11 15:14, Carrot Wei wrote: Index: arm.md === --- arm.md(revision 171337) +++ arm.md(working copy) @@ -7115,7 +7115,18 @@ @ cmp%?\\t%0, %1 cmn%?\\t%0, #%n1 - [(set_attr conds set)] + [(set_attr conds set) + (set (attr length) + (if_then_else + (and (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) + (eq (symbol_ref which_alternative) (const_int 0))) +(ior (ne (symbol_ref REG_P (operands[1])) (const_int 0)) +(and (ne (symbol_ref CONST_INT_P (operands[1])) (const_int 0)) + (and (ge (symbol_ref INTVAL (operands[1])) (const_int 0)) + (le (symbol_ref INTVAL (operands[1])) + (const_int 255)) + (const_int 2) + (const_int 4)))] ) How about adding an alternative only enabled for T2 that uses the `l' constraint and inventing new constraints for some of the constant values that are valid for 16 bit instructions since the `I' and `L' constraints have different meanings depending on whether TARGET_32BIT is valid or not ? We could then set the value of the length attribute accordingly. I don't think we can change the meaning of the I and L constraints very easily given the amount of churn that might be needed (define_insn *cmpsi_shiftsi @@ -7286,7 +7297,14 @@ return \b%d1\\t%l0\; [(set_attr conds use) - (set_attr type branch)] + (set_attr type branch) + (set (attr length) +(if_then_else + (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) +(and (ge (minus (match_dup 0) (pc)) (const_int -250)) + (le (minus (match_dup 0) (pc)) (const_int 256 + (const_int 2) + (const_int 4)))] ) I don't think this and the other conditional branch instruction changes are correct. This could end up being the last instruction in an IT block and will automatically end up getting the 32 bit encoding and end up causing trouble with the length calculations. Remember the 16 bit encoding for the conditional instruction can't be used as the last instruction in an IT block. @@ -10256,7 +10288,26 @@ return \\; } - [(set_attr type store4)] + [(set_attr type store4) + (set (attr length) +(if_then_else + (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) +(ne (symbol_ref { +int i, regno, hi_reg; +int num_saves = XVECLEN (operands[2], 0); +regno = REGNO (operands[1]); +hi_reg = (REGNO_REG_CLASS (regno) == HI_REGS) + (regno != LR_REGNUM); +for (i = 1; i num_saves; i++) (i num_saves (hi_reg == 0)) - what's the point of going through the register list when hi_reg != 0 in this case ? A comment to explain that the length calculation *must* be kept in sync with the template above might be useful. I suggest abstracting all of this Thumb-2 insn length knowledge into a arm.c function, say thumb2_16bit_insn_p() to test for 2-byte patterns. Adding more of these quite large pieces of logic into arm.md does not look very pretty to me, and centralizing it into a C function will also make it easier to manage the cases. Chung-Lin I think Ramana's suggestion should avoid the need for any arithmetic at all here. If the constraints precisely match a 16-bit instruction (and that alternative is disbled for ARM) then we can say unconditionally that the length of the insn is 2. R.
Re: [RFC][patch] If-conversion of COMPONENT_REFs
On 30 March 2011 12:59, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Mar 30, 2011 at 11:13 AM, Ira Rosen ira.ro...@linaro.org wrote: Hi, With this patch a data-ref is marked as unconditionally read or written also if its adjacent field is read or written unconditionally in the loop. My concern is that this is not safe enough, even though the fields have to be non-pointers and non-aggregates, and this optimization is applied only with -ftree-loop-if-convert-stores flag. Bootstrapped on powerpc64-suse-linux and tested on x86_64-suse-linux. OK for trunk? The restrictions do not make too much sense to me. For the C++ memory model we can't do speculative stores at all, but for the rest I'd say just checking if the data-refs access the same object is enough, thus, instead of same_data_refs (a, b) simply check operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0) or operand_equal_p (get_base_address (DR_REF (a)), get_base_address (DR_REF (b)), 0), whatever makes more sense (I always confuse what the contents of the various DR fields are). But what about int a[10], b[100], c[100]; for (i = 0; i 100; i++) { if (i 10) t = a[i]; else t = b[i]; c[i] = t + a[1]; } We can't transform this to int a[10], b[100], c[100]; for (i = 0; i 100; i++) { t1 = a[i]; t2 = b[i]; t = (i 10) ? t1 : t2; c[i] = t + a[1]; } since we create accesses to a[10:100], right? Thanks, Ira Thanks, Richard. Thanks, Ira ChangeLog: * tree-if-conv.c (memrefs_read_or_written_unconditionally): Return true if an adjacent field of the data-ref is accessed unconditionally. testsuite/ChangeLog: * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: New test. * gcc.dg/vect/vect.exp: Run if-cvt-stores-vect* tests with -ftree-loop-if-convert-stores.
Negative option alias facility
This patch adds a NegativeAlias .opt facility, for one option to be marked as an alias for the negative form of another option. There are quite a lot of options like that, and this is of particular use where it allows option handler code to be removed because the aliases can now be handled purely in the .opt file (as opposed to the case where both positive and negative options are fully described by their Var settings). In particular, this patch also converts the legacy rs6000 options -mvrsave=, -misel=, -mspe= to aliases for the standard forms of the options (via listing the =yes and =no forms of each option as separate aliases), so simplifying the rs6000 option handling code. Bootstrapped with no regressions on x86_64-unknown-linux-gnu, and tested building cc1 and xgcc for cross to powerpc-eabispe. Will commit to trunk in the absence of target maintainer objections. 2011-03-30 Joseph Myers jos...@codesourcery.com * doc/options.texi (NegativeAlias): Document. (Alias): Mention NegativeAlias. * opt-functions.awk: Handle NegativeAlias. * optc-gen.awk: Disallow NegativeAlias with multiple Alias arguments. * opts-common.c (decode_cmdline_option): Handle CL_NEGATIVE_ALIAS. * opts.h (CL_NEGATIVE_ALIAS): Define. * config/rs6000/rs6000.c (rs6000_parse_yes_no_option): Remove. (rs6000_handle_option): Don't handle OPT_mvrsave_, OPT_misel_ and OPT_mspe_. * config/rs6000/rs6000.opt (mvrsave=, misel=, mspe=): Replace with Alias entries. * config/rs6000/t-spe (MULTILIB_OPTIONS, MULTILIB_EXCEPTIONS): Use mno-spe and mno-isel instead of mspe=no and -misel=no. Index: gcc/doc/options.texi === --- gcc/doc/options.texi(revision 171701) +++ gcc/doc/options.texi(working copy) @@ -364,7 +364,8 @@ for later processing. @item Alias(@var{opt}) @itemx Alias(@var{opt}, @var{arg}) @itemx Alias(@var{opt}, @var{posarg}, @var{negarg}) -The option is an alias for @option{-@var{opt}}. In the first form, +The option is an alias for @option{-@var{opt}} (or the negative form +of that option, depending on @code{NegativeAlias}). In the first form, any argument passed to the alias is considered to be passed to @option{-@var{opt}}, and @option{-@var{opt}} is considered to be negated if the alias is used in negated form. In the second form, the @@ -387,6 +388,13 @@ not need to handle it and no @samp{OPT_} for it; only the canonical form of the option will be seen in those places. +@item NegativeAlias +For an option marked with @code{Alias(@var{opt})}, the option is +considered to be an alias for the positive form of @option{-@var{opt}} +if negated and for the negative form of @option{-@var{opt}} if not +negated. @code{NegativeAlias} may not be used with the forms of +@code{Alias} taking more than one argument. + @item Ignore This option is ignored apart from printing any warning specified using @code{Warn}. The option will not be seen by specs and no @samp{OPT_} Index: gcc/opts-common.c === --- gcc/opts-common.c (revision 171701) +++ gcc/opts-common.c (working copy) @@ -507,6 +507,7 @@ decode_cmdline_option (const char **argv { gcc_assert (option-alias_arg != NULL); gcc_assert (arg == NULL); + gcc_assert (!(option-flags CL_NEGATIVE_ALIAS)); if (value) arg = option-alias_arg; else @@ -517,9 +518,13 @@ decode_cmdline_option (const char **argv { gcc_assert (value == 1); gcc_assert (arg == NULL); + gcc_assert (!(option-flags CL_NEGATIVE_ALIAS)); arg = option-alias_arg; } + if (option-flags CL_NEGATIVE_ALIAS) + value = !value; + opt_index = new_opt_index; option = new_option; Index: gcc/opts.h === --- gcc/opts.h (revision 171701) +++ gcc/opts.h (working copy) @@ -115,6 +115,7 @@ extern const unsigned int cl_lang_count; #define CL_MISSING_OK (1U 28) /* Missing argument OK (joined). */ #define CL_UINTEGER(1U 29) /* Argument is an integer =0. */ #define CL_UNDOCUMENTED(1U 30) /* Do not output with --help. */ +#define CL_NEGATIVE_ALIAS (1U 31) /* Alias to negative form of option. */ /* Flags for an enumerated option argument. */ #define CL_ENUM_CANONICAL (1 0) /* Canonical for this value. */ Index: gcc/optc-gen.awk === --- gcc/optc-gen.awk(revision 171701) +++ gcc/optc-gen.awk(working copy) @@ -362,6 +362,11 @@ for (i = 0; i n_opts; i++) { print #error Alias with single argument \
Re: [RFC][patch] If-conversion of COMPONENT_REFs
On 30 March 2011 14:41, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Mar 30, 2011 at 2:22 PM, Ira Rosen ira.ro...@linaro.org wrote: On 30 March 2011 12:59, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Mar 30, 2011 at 11:13 AM, Ira Rosen ira.ro...@linaro.org wrote: Hi, With this patch a data-ref is marked as unconditionally read or written also if its adjacent field is read or written unconditionally in the loop. My concern is that this is not safe enough, even though the fields have to be non-pointers and non-aggregates, and this optimization is applied only with -ftree-loop-if-convert-stores flag. Bootstrapped on powerpc64-suse-linux and tested on x86_64-suse-linux. OK for trunk? The restrictions do not make too much sense to me. For the C++ memory model we can't do speculative stores at all, but for the rest I'd say just checking if the data-refs access the same object is enough, thus, instead of same_data_refs (a, b) simply check operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0) or operand_equal_p (get_base_address (DR_REF (a)), get_base_address (DR_REF (b)), 0), whatever makes more sense (I always confuse what the contents of the various DR fields are). But what about int a[10], b[100], c[100]; for (i = 0; i 100; i++) { if (i 10) t = a[i]; else t = b[i]; c[i] = t + a[1]; } We can't transform this to int a[10], b[100], c[100]; for (i = 0; i 100; i++) { t1 = a[i]; t2 = b[i]; t = (i 10) ? t1 : t2; c[i] = t + a[1]; } since we create accesses to a[10:100], right? That's correct - we may only ever create valid accesses, but any valid access to a is ok after we see an unconditional access to a. So we probably have to restrict ourselves to DECL_P (get_base_address ()) objects and make sure we don't access past it. Thus, I see what you were trying to do - may I suggest sth like ref_base_a = DR_REF (a); while (TREE_CODE (ref_base_a) == COMPONENT_REF || TREE_CODE (ref_base_a) == IMAGPART_EXPR || TREE_CODE (ref_base_a) == REALPART_EXPR) ref_base_a = TREE_OPERAND (ref_base_a, 0); ... same for DR_REF (b) ... if (operand_equal_p (ref_base_a, ref_base_b, 0)) ok that is, strip all non-variable offset handled_component_refs and compare the remaining base of the two accesses. If they are equal we are ok. Any hole in that? I don't see any :) I'll test your version. Thanks, Ira Thanks, Richard. Thanks, Ira Thanks, Richard. Thanks, Ira ChangeLog: * tree-if-conv.c (memrefs_read_or_written_unconditionally): Return true if an adjacent field of the data-ref is accessed unconditionally. testsuite/ChangeLog: * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: New test. * gcc.dg/vect/vect.exp: Run if-cvt-stores-vect* tests with -ftree-loop-if-convert-stores.
RX: Sync with 4.6 branch
Hi Guys, I am applying the patch below to sync the RX backend with the 4.6 branch. In practice this means bringing in the peepholes to combine extending loads and simple arithmetic expressions, and adjusting the memory move cost cost for stores. (The insn length attribute is needed for the rx_max_skip_for_label function). Cheers Nick gcc/ChangeLog 2011-03-30 Nick Clifton ni...@redhat.com * config/rx/rx.md: Add peepholes and patterns to combine extending loads and simple arithmetic instructions. * config/rx/rx.h (ADJUST_INSN_LENGTH): Define. * config/rx/rx-protos.h (rx_adjust_insn_length): Prototype. * config/rx/rx.c (rx_is_legitimate_address): Allow QI and HI modes to use pre-decrement and post-increment addressing. (rx_is_restricted_memory_address): Add range checking of REG+INT addresses. (rx_print_operand): Add support for %Q. Fix handling of %Q. (rx_memory_move_cost): Adjust cost of stores. (rx_adjust_insn_length): New function. Index: gcc/config/rx/rx.h === --- gcc/config/rx/rx.h (revision 171716) +++ gcc/config/rx/rx.h (working copy) @@ -630,3 +630,10 @@ #define REGISTER_MOVE_COST(MODE,FROM,TO) 2 #define SELECT_CC_MODE(OP,X,Y) rx_select_cc_mode(OP, X, Y) + +#define ADJUST_INSN_LENGTH(INSN,LENGTH)\ + do \ +{ \ + (LENGTH) = rx_adjust_insn_length ((INSN), (LENGTH)); \ +} \ + while (0) Index: gcc/config/rx/rx-protos.h === --- gcc/config/rx/rx-protos.h (revision 171716) +++ gcc/config/rx/rx-protos.h (working copy) @@ -31,16 +31,17 @@ extern int rx_initial_elimination_offset (int, int); #ifdef RTX_CODE +extern int rx_adjust_insn_length (rtx, int); extern void rx_emit_stack_popm (rtx *, bool); extern void rx_emit_stack_pushm (rtx *); extern voidrx_expand_epilogue (bool); extern char * rx_gen_move_template (rtx *, bool); extern boolrx_is_legitimate_constant (rtx); extern boolrx_is_restricted_memory_address (rtx, Mmode); +extern boolrx_match_ccmode (rtx, Mmode); extern voidrx_notice_update_cc (rtx body, rtx insn); extern voidrx_split_cbranch (Mmode, Rcode, rtx, rtx, rtx); extern Mmode rx_select_cc_mode (Rcode, rtx, rtx); -extern boolrx_match_ccmode (rtx, Mmode); #endif #endif /* GCC_RX_PROTOS_H */ Index: gcc/config/rx/rx.md === --- gcc/config/rx/rx.md (revision 171716) +++ gcc/config/rx/rx.md (working copy) @@ -1545,6 +1545,139 @@ (set_attr length 3,4,5,6,7,6)] ) +;; A set of peepholes to catch extending loads followed by arithmetic operations. +;; We use iterators where possible to reduce the amount of typing and hence the +;; possibilities for typos. + +(define_code_iterator extend_types [(zero_extend ) (sign_extend )]) +(define_code_attr letter [(zero_extend R) (sign_extend Q)]) + +(define_code_iterator memex_commutative [(plus ) (and ) (ior ) (xor )]) +(define_code_iterator memex_noncomm [(div ) (udiv ) (minus )]) +(define_code_iterator memex_nocc[(smax ) (smin ) (mult )]) + +(define_code_attr op[(plus add) (and and) (div div) (udiv divu) (smax max) (smin min) (mult mul) (ior or) (minus sub) (xor xor)]) + +(define_peephole2 + [(set (match_operand:SI 0 register_operand) + (extend_types:SI (match_operand:small_int_modes 1 rx_restricted_mem_operand))) + (parallel [(set (match_operand:SI2 register_operand) + (memex_commutative:SI (match_dup 0) +(match_dup 2))) + (clobber (reg:CC CC_REG))])] + peep2_regno_dead_p (2, REGNO (operands[0])) + [(parallel [(set:SI (match_dup 2) + (memex_commutative:SI (match_dup 2) + (extend_types:SI (match_dup 1 + (clobber (reg:CC CC_REG))])] +) + +(define_peephole2 + [(set (match_operand:SI 0 register_operand) + (extend_types:SI (match_operand:small_int_modes 1 rx_restricted_mem_operand))) + (parallel [(set (match_operand:SI2 register_operand) + (memex_commutative:SI (match_dup 2) +(match_dup 0))) + (clobber (reg:CC CC_REG))])] + peep2_regno_dead_p (2, REGNO (operands[0])) + [(parallel [(set:SI (match_dup 2) + (memex_commutative:SI (match_dup 2) +
Re: [cxx-mem-model] bitfield tests
The memory model is not implementable on strict-alignment targets that do not have a byte store operation. But we previously said that ;) Yes. I think we should issue an error when we have such a target and the user tries -fmemory-model=c++0x. However, how many strict-alignment targets are not byte addressable nowadays? Also consider global vars char a; char b; accessing them on strict-align targets may access adjacent globals (that's a problem anyway, also with alias analysis). Good point. I am adding a test to that effect (see attached patch). BTW, I assume you mean strict-align targets WITHOUT byte-addressability as above. I have spot-checked your scenario on a handful of important targets that have strict alignment, and all of them work without touching adjacent global vars: arm-elf OK sparc-linux OK ia64-linux OK alpha-linux OK, but only with -mbwx (byte addressability) rth tells me that we shouldn't worry about ancient non-byte addressable Alphas, so the last isn't an issue. So... do you have any important targets in mind, because I don't see this being a problem for most targets? As can be expected, I am only interested in x86*, powerpc*, and s390, especially since a cursory glance on other important targets didn't exhibit any problems. However, given my target bias, I am willing to look into any important targets that are problematic (I'm hoping none :)). Let me know if you see anything else, and please take a quick peek at the attached patch below, which I will be committing shortly. As usual, thanks. Aldy Index: testsuite/gcc.dg/memmodel/strict-align-global.c === --- testsuite/gcc.dg/memmodel/strict-align-global.c (revision 0) +++ testsuite/gcc.dg/memmodel/strict-align-global.c (revision 0) @@ -0,0 +1,46 @@ +/* { dg-do link } */ +/* { dg-options -O2 --param allow-packed-store-data-races=0 } */ +/* { dg-final { memmodel-gdb-test } } */ + +#include stdio.h +#include memmodel.h + +/* This test verifies writes to globals do not write to adjacent + globals. This mostly happens on strict-align targets that are not + byte addressable (old Alphas, etc). */ + +char a = 0; +char b = 77; + +void memmodel_other_threads() +{ +} + +int memmodel_step_verify() +{ + if (b != 77) +{ + printf(FAIL: Unexpected value. b is %d, should be 77\n, b); + return 1; +} + return 0; +} + +/* Verify that every variable has the correct value. */ +int memmodel_final_verify() +{ + int ret = memmodel_step_verify (); + if (a != 66) +{ + printf(FAIL: Unexpected value. a is %d, should be 66\n, a); + return 1; +} + return ret; +} + +int main () +{ + a = 66; + memmodel_done(); + return 0; +}
Re: [cxx-mem-model] bitfield tests
On Wed, Mar 30, 2011 at 4:11 PM, Aldy Hernandez al...@redhat.com wrote: The memory model is not implementable on strict-alignment targets that do not have a byte store operation. But we previously said that ;) Yes. I think we should issue an error when we have such a target and the user tries -fmemory-model=c++0x. However, how many strict-alignment targets are not byte addressable nowadays? Also consider global vars char a; char b; accessing them on strict-align targets may access adjacent globals (that's a problem anyway, also with alias analysis). Good point. I am adding a test to that effect (see attached patch). BTW, I assume you mean strict-align targets WITHOUT byte-addressability as above. I have spot-checked your scenario on a handful of important targets that have strict alignment, and all of them work without touching adjacent global vars: arm-elf OK sparc-linux OK ia64-linux OK alpha-linux OK, but only with -mbwx (byte addressability) rth tells me that we shouldn't worry about ancient non-byte addressable Alphas, so the last isn't an issue. So... do you have any important targets in mind, because I don't see this being a problem for most targets? As can be expected, I am only interested in x86*, powerpc*, and s390, especially since a cursory glance on other important targets didn't exhibit any problems. However, given my target bias, I am willing to look into any important targets that are problematic (I'm hoping none :)). Well, I'm not sure that strict-align targets that provide byte access do not simply hide the issue inside the CPU (thus, perform the read-modify-write there and do not guarantee any atomicity unless you ask for it). It might be even worse - targets might not even guarantee this for shared cache-lines (for non-ccNUMA architectures). But I'm no expert here, but certainly every possible weird CPU architecture has been implemented. Richard.
Re: Continue toplevel cleanup (GCC library handling for unsupported targets etc.)
On Tue, 2011-03-29 at 20:55 +, Joseph S. Myers wrote: This patch continues cleaning up the toplevel configure.ac, in particular as regards cases handling GCC libraries for targets where GCC is no longer (or never was) supported. The principle there, as discussed for the original deprecated targets removal patch, is that in the absence of GCC support for a target it doesn't matter exactly how toplevel is configured to build GCC libraries (or other GCC-related directories) for that target - builds will and should fail if GCC is included in the source tree - so toplevel configure.ac should be set up for maximal simplicity. In most cases that means no explicit mention of GCC libraries for such a target; in some cases it means removing the target entry altogether so that another case becomes active instead (possibly the *-*-* case that disables libgcj only). In general cases just disabling libgcj are redundant with the *-*-* case unless they prevent some other case being active for the target, and so are removed. There were also a couple of removed cases for targets with no support in GCC or src at all (romp-*-*, vax-*-vms) - if there's no support in GCC or src, it's right that a build should fail trying to build whatever sources you have there, and silly for toplevel to try disabling everything because nothing is supported. Other cleanups here: noconfigdirs=$noconfigdirs is completely useless and was removed; c54x*-*-* is always mapped to tic54x-*-* by config.sub so that case can be simplified; cris*-*-none acts just like cris*-*-elf in config.gcc so it's appropriate to make the * subcase of cris*-*-* act like the -elf case; mmix-*-* disabled libgloss, i.e. libgloss for the host, which is never built anyway. OK to commit? 2011-03-29 Joseph Myers jos...@codesourcery.com (arm-*-coff): Don't disable libgcj. (arm*-*-linux-gnueabi): Remove useless assignment. (arm-*-riscix*): Don't disable libgcj. RISC iX support was removed from GCC years ago. Looks like a tiny fragment left over that wasn't cleaned up. The other bits are fine with me. R.
Re: [PATCH: ARM] Replace define_constants with define_c_enum for unspec/unspec_volatile in backend
On Tue, 2011-01-18 at 16:56 +, Yufeng Zhang wrote: This patch replaces define_constants in the ARM backend with define_c_enum for defining the available indexes for the unspecs/unspecvs expressions. This improves the readability of the intermediate dumps for machine-specific operations. For instance, given the following c code: int foo (char* s); extern char hello[]; void test(int x) { int y = x + 4; foo(hello); } If compiled with -c -fpic, the diff of the -fdump-rtl-expand dump between pre-patch and post-patch is: @@ -24,10 +24,10 @@ (const:SI (unspec:SI [ (const:SI (plus:SI (unspec:SI [ (const_int 0 [0]) -] 21) +] UNSPEC_PIC_LABEL) (const_int 8 [0x8]))) -] 24)) -] 3)) test.c:43 -1 +] UNSPEC_GOTSYM_OFF)) +] UNSPEC_PIC_SYM)) test.c:43 -1 (nil)) (insn 10 9 11 2 (set (reg:SI 137) @@ -35,7 +35,7 @@ (reg:SI 137) (const_int 8 [0x8]) (const_int 0 [0]) -] 4)) test.c:43 -1 +] UNSPEC_PIC_BASE)) test.c:43 -1 (nil)) (insn 11 10 5 2 (use (reg:SI 137)) test.c:43 -1 Having the indexes defined by define_c_enum, GCC will use the enumerators' names rather than some magic numbers when printing out unspec/unspec_volatile expressions. The patch does not change the code generation. The post-patch test result is the same as the pre-patch result (tested on qemu for armv7-a and arm-eabi). OK for the trunk? Cheers, Yufeng 2011-01-18 Yufeng Zhang yufeng.zh...@arm.com * config/arm/arm.md (define_constants for unspec): Replace with define_c_enum. (define_constants for unspecv): Replace with define_c_enum. * config/arm/neon.md (define_constants for unspec): Replace with define_c_enum. OK. R.
[Patch] bfin: fix profiling
The attached patch allows long jumps to __mcount, defines PROFILE_BEFORE_PROLOGUE and ensures ASM_OUTPUT_REG_PUSH pre-decrements the stack pointer. 2011-03-30 Stuart Henderson stuart.hender...@analog.com From Bernd Schmidt * config/bfin/bfin.h (FUNCTION_PROFILER): Take TARGET_LONG_CALLS into account and save/restore RETS. (PROFILE_BEFORE_PROLOGUE): Define. (ASM_OUTPUT_REG_PUSH, ASM_OUTPUT_REG_POP): Add tab character. Correct the push insn to use predecrement. Thanks, Stu upstream.patch Description: upstream.patch
[PATCH, Fortran] Correct declaration of frexp and friends
While working on the dragonegg plugin I noticed that the Fortran front-end declares frexp with the parameters the wrong way round. Instead of double frexp(double x, int *exp); it is declared as double frexp(int *exp, double x); This is fairly harmless but might as well be fixed, so here is a patch (as far as I can see fntype[4] is only used in declaring the frexp family of functions). Bootstraps and has no impact on the Fortran testsuite (tested on mainline). OK to apply on mainline and the 4.5 and 4.6 branches? Proposed fortran/Changelog entry: 2011-03-30 Duncan Sands baldr...@free.fr * f95-lang.c (build_builtin_fntypes): Swap frexp parameter types. Index: gcc/fortran/f95-lang.c === --- gcc/fortran/f95-lang.c (revision 171716) +++ gcc/fortran/f95-lang.c (working copy) @@ -695,10 +695,9 @@ type, integer_type_node, NULL_TREE); /* type (*) (void) */ fntype[3] = build_function_type_list (type, NULL_TREE); - /* type (*) (int, type) */ - fntype[4] = build_function_type_list (type, + /* type (*) (type, int) */ + fntype[4] = build_function_type_list (type, type, build_pointer_type (integer_type_node), -type, NULL_TREE); /* type (*) (int, type) */ fntype[5] = build_function_type_list (type,
Re: [ARM] Define unspecs using define_c_enum
On Tue, 2011-03-29 at 15:38 +0100, Richard Sandiford wrote: This ARM patch allows *UNSPEC_* constants to be printed in dump files. It's very much a target decision whether this is worth doing, but just in case... ...tested on arm-linux-gnueabi. OK to install? Richard gcc/ * config/arm/neon.md: Use define_c_enum to define UNSPEC* and VUNSPEC* constants. * config/arm/arm.md: Likewise. Move the synchronization ones to... * config/arm/sync.md: ...here. Hmm, seems we have two patches posted to do this. I've just approved an earlier post that came in while we were in pre-release lock-down. R.
Re: fix for 48208 and 48260 on darwin
On Mar 30, 2011, at 5:41 AM, gcchelp.5.ad...@0sg.net wrote: Tests that now fail, but worked before: g++.dg/warn/Wstrict-aliasing-float-ref-int-obj.C (test for warnings, line 7) Tests that now work, but didn't before: g++.dg/warn/Wstrict-aliasing-float-ref-int-obj.C (test for warnings, line 7) Yeah, someone thought it would be funny to have two test cases with the same exact name, and then to have one of them fail and the other one work, so, technically, what it said is true. I've now applied the patch for you. In the future, by sure to attach the diffs as a file, as Mail eats white spacing in odd ways which corrupts patches. Thanks. 2011-03-30 Christian Schüler cschue...@gmx.de PR/driver 48208 * config/c.opt (F): Added 'Driver' to -F option. PR/driver 48260 * config/darwin-driver.c (darwin_driver_init): Add '-arch' to handler function. * config/darwin.opt: Added '-arch' option. Index: c-family/c.opt === --- c-family/c.opt (revision 171691) +++ c-family/c.opt (working copy) @@ -201,7 +201,7 @@ C ObjC C++ ObjC++ Undocumented Var(flag_preprocess_only) F -C ObjC C++ ObjC++ Joined Separate MissingArgError(missing path after %qs) +Driver C ObjC C++ ObjC++ Joined Separate MissingArgError(missing path after %qs) -F dir Add dir to the end of the main framework include path H Index: config/darwin.opt === --- config/darwin.opt (revision 171691) +++ config/darwin.opt (working copy) @@ -31,6 +31,9 @@ allowable_client Driver Separate Alias(Zallowable_client) +arch +Driver RejectNegative Separate + arch_errors_fatal Driver Alias(Zarch_errors_fatal) Index: config/darwin-driver.c === --- config/darwin-driver.c (revision 171691) +++ config/darwin-driver.c (working copy) @@ -161,6 +161,15 @@ continue; switch ((*decoded_options)[i].opt_index) { +#if DARWIN_X86 + case OPT_arch: + if (!strcmp ((*decoded_options)[i].arg, i386)) + generate_option (OPT_m32, NULL, 1, CL_DRIVER, (*decoded_options)[i]); + else if (!strcmp ((*decoded_options)[i].arg, x86_64)) + generate_option (OPT_m64, NULL, 1, CL_DRIVER, (*decoded_options)[i]); + break; +#endif + case OPT_filelist: case OPT_framework: ++*decoded_options_count;
Re: Move reg_equiv* arrays into a single VEC structure
On Wed, Mar 30, 2011 at 4:23 PM, Jeff Law l...@redhat.com wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 As originally discussed here: http://gcc.gnu.org/ml/gcc-patches/2010-05/msg00987.html Changes since the original submission: Per Richi's suggestion I changed the patch to use a GC'd VEC. I defined accessor macros so that a ton of reformatting isn't needed and finally I made the appropriate tweaks to the few backends that peeked at the reg_equiv arrays. Given the changes and the length of time since the original submission, I think it's probably best to get a fresh approval rather than rely on the prior approval. Looks good to me. Thanks, Richard. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNkzznAAoJEBRtltQi2kC7VgYH/0yTkwWkDRyalReMk3whdIXO 8qw5H6c9Gz+Yj+EnnPySKsFIvJKMBTQHIpdCzjTCVWD/Z7LSJwERzzlNCPrQu2au dpOoUYCTAwgSW0Us9B+2Bcf2DABinYLV+hgKAKFEVi98CheZe3hZZ14lm5mlYDec INYKyfqYHmyahT8fa6ABY2kp0X2xhQhJ0VAGPI34kytJpgLpIdtRwq6PsdsPM0PJ frLAY5xIEmJqBB30RaPqnD07u06xZHi+S9gfAJa4LJTUqVALNusYdZzZajKMtF3i BCXK4UFk+J2MlM9xZkVWqQiryLc6arVT2bMQcvz7tXTMZkY6ZdZ7sKFBgKea6vM= =iMqQ -END PGP SIGNATURE-
Re: [ARM] Define unspecs using define_c_enum
Richard Earnshaw rearn...@arm.com writes: On Tue, 2011-03-29 at 15:38 +0100, Richard Sandiford wrote: This ARM patch allows *UNSPEC_* constants to be printed in dump files. It's very much a target decision whether this is worth doing, but just in case... ...tested on arm-linux-gnueabi. OK to install? Richard gcc/ * config/arm/neon.md: Use define_c_enum to define UNSPEC* and VUNSPEC* constants. * config/arm/arm.md: Likewise. Move the synchronization ones to... * config/arm/sync.md: ...here. Hmm, seems we have two patches posted to do this. I've just approved an earlier post that came in while we were in pre-release lock-down. Oops, sorry, I missed Yufeng's patch. FWIW, the differences in my patch are that I've got rid of the (documented as) unused UNSPEC_SIN and UNSPEC_COS and moved the sync stuff to sync.md. I assumed the old code did things that way in order to make the numbering more obvious, but that isn't an issue any more. Richard
Re: PING [patch, testsuite] Fix a short-enums testsuite error
On Fri, 2011-02-18 at 10:56 +, Ian Bolton wrote: Is the following patch OK for trunk? 2011-01-26 Ian Bolton ian.bol...@arm.com * testsuite/gcc.dg/20100906-1.c: Use -fno-short-enums option for target arm_eabi. Index: gcc/testsuite/gcc.dg/20100906-1.c === --- gcc/testsuite/gcc.dg/20100906-1.c (revision 168543) +++ gcc/testsuite/gcc.dg/20100906-1.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-options -O2 } */ +/* { dg-options -O2 -fno-short-enums -Wl,--no-enum-size-warning {target arm_eabi} } */ /* This testcase got misoptimized by combine due to a wrong setting of subst_low_luid in try_combine. */ PING. OK. R.
Re: [cxx-mem-model] bitfield tests
On Mar 30, 2011, at 7:40 AM, Richard Guenther wrote: Is forcing word-alignment too big of a hammer, or will the users for these architectures be content with having no support for the C++0x memory model? I think a memory model that cannot be reasonably (read: also fast) implemented on all HW is screwed from the start and we should simply ditch it. Which is because nobody will use it as you cannot rely on it when writing portable programs or it will be hell slow. I agree 100%. If the standards people can't write a decent standard, they ought not write it. I torpedoed someone refining volatile, which would have been nice to have, because people were laying tracks down the wrong way. Nuke em from orbit I say. Now, I'm sure we have it all wrong and the standard is entirely reasonable... right?
Re: Cleaning up expand optabs code
On 03/30/2011 01:53 AM, Richard Sandiford wrote: This comes back to the point that we really should know up-front what modes op0 and op1 actually have. (The thing I left as a future clean-up.) Until then, the process implemented by yesterday's patch was supposed to be: - work out what mode opN actually has at this point in time - see if the target has specifically asked for a different mode - if so, convert the operand Ok, I get it. The patch is ok as-is. r~
Re: [PATCH][ARM] Tweak arm_class_likely_spilled_p, MODE_BASE_REG_CLASS for Thumb-2
On Mon, 2011-02-14 at 14:20 +, Andrew Stubbs wrote: This patch is a rework of an old one: http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01080.html The ARM parts of that patch were approved, but the target independent parts were never reviewed (AFAICT), and the patch no longer applies. I've updated the target-specific parts. As far as I can tell, the target independent parts are no longer required, so I've dropped them. Tested with no regressions for ARM mode and Thumb2 mode. OK? Andrew 2011-02-14 Andrew Stubbs a...@codesourcery.com Julian Brown jul...@codesourcery.com Mark Shinwell shinw...@codesourcery.com gcc/ * config/arm/arm.h (arm_class_likely_spilled_p): Check against LO_REGS only for Thumb-1. (MODE_BASE_REG_CLASS): Restrict base registers to those which can be used in short instructions when optimising for size on Thumb-2. OK. R.
Re: [PATCH][ARM] Limit thumb2 test to thumb2 targets
On Thu, 2011-02-10 at 15:29 +, Andrew Stubbs wrote: I'm posting this patch on behalf of Jie. The patch simply prevents a test failure on non-thumb2 targets. OK? Andrew 2010-09-30 Jie Zhang j...@codesourcery.com gcc/testsuite/ * gcc.target/arm/neon-thumb2-move.c: Add dg-require-effective-target arm_thumb2_ok. Just in case this never got checked in, OK. R.
Re: Continue toplevel cleanup (GCC library handling for unsupported targets etc.)
On Wed, 30 Mar 2011, Richard Earnshaw wrote: 2011-03-29 Joseph Myers jos...@codesourcery.com (arm-*-coff): Don't disable libgcj. (arm*-*-linux-gnueabi): Remove useless assignment. (arm-*-riscix*): Don't disable libgcj. RISC iX support was removed from GCC years ago. Looks like a tiny fragment left over that wasn't cleaned up. The other bits are fine with me. The RISC iX case is much like the Solaris/PowerPC one I commented on in http://gcc.gnu.org/ml/gcc-patches/2011-03/msg01744.html: it appears there is still assembler support. What I'm doing here is simply cleaning things up by removing mentions of *GCC* directories for targets without GCC support (often ones where GCC support was removed long ago). It's quite possible that some obsolete targets should have their binutils (and newlib) support removed, leading to further toplevel simplification (gdb has been more active in pruning unmaintained targets - but it's also the case that a proliferation of targets is more likely to cause problems there than in binutils and newlib). But I think it's really for the binutils maintainers to work out what they think is worth removing. -- Joseph S. Myers jos...@codesourcery.com
[PATCH] Make renumber_gimple_stmt_uids and renumber_gimple_stmt_uids_in_bbs consistent
As people noted one computes UIDs for PHIs and one doesn't. Now I'm working on sth that needs it for PHIs, thus the following fix. We have to exclude LTO as it re-builds PHIs and gets confused with inconsistent numbering. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-03-30 Richard Guenther rguent...@suse.de * tree-dfa.c (renumber_gimple_stmt_uids): Also number PHIs. * lto-streamer-out.c (output_function): Do not use renumber_gimple_stmt_uids. * lto-streamer-in.c (input_function): Likewise. Index: gcc/tree-dfa.c === --- gcc/tree-dfa.c (revision 171716) +++ gcc/tree-dfa.c (working copy) @@ -151,6 +151,11 @@ renumber_gimple_stmt_uids (void) FOR_ALL_BB (bb) { gimple_stmt_iterator bsi; + for (bsi = gsi_start_phis (bb); !gsi_end_p (bsi); gsi_next (bsi)) + { + gimple stmt = gsi_stmt (bsi); + gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun)); + } for (bsi = gsi_start_bb (bb); !gsi_end_p (bsi); gsi_next (bsi)) { gimple stmt = gsi_stmt (bsi); Index: gcc/lto-streamer-out.c === --- gcc/lto-streamer-out.c (revision 171716) +++ gcc/lto-streamer-out.c (working copy) @@ -1981,8 +2031,19 @@ output_function (struct cgraph_node *nod /* We will renumber the statements. The code that does this uses the same ordering that we use for serializing them so we can use the same code on the other end and not have to write out the - statement numbers. */ - renumber_gimple_stmt_uids (); + statement numbers. We do not assign UIDs to PHIs here because + virtual PHIs get re-computed on-the-fly which would make numbers + inconsistent. */ + set_gimple_stmt_max_uid (cfun, 0); + FOR_ALL_BB (bb) +{ + gimple_stmt_iterator gsi; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun)); + } +} /* Output the code for the function. */ FOR_ALL_BB_FN (bb, fn) Index: gcc/lto-streamer-in.c === --- gcc/lto-streamer-in.c (revision 171716) +++ gcc/lto-streamer-in.c (working copy) @@ -1287,7 +1287,16 @@ input_function (tree fn_decl, struct dat /* Fix up the call statements that are mentioned in the callgraph edges. */ - renumber_gimple_stmt_uids (); + set_gimple_stmt_max_uid (cfun, 0); + FOR_ALL_BB (bb) +{ + gimple_stmt_iterator gsi; + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) + { + gimple stmt = gsi_stmt (gsi); + gimple_set_uid (stmt, inc_gimple_stmt_max_uid (cfun)); + } +} stmts = (gimple *) xcalloc (gimple_stmt_max_uid (fn), sizeof (gimple)); FOR_ALL_BB (bb) {
[PATCH] Fix expansion issues on type changing MEM_REFs on LHS (PR middle-end/48335)
Hi! MEM_REFs which can represent type punning on lhs don't force non-gimple types to be addressable. This causes various problems in the expander, which wasn't prepared to handle that. This patch tries to fix what I've found and adds a bunch of testcases. The original report was with just -O2 on some large testcase from Eigen, most of the testcases have -fno-tree-sra just because I've given up on delta when it stopped reducing at 32KB. The first problem (the one from Eigen) is _mm_store_pd into a std::complexdouble var, which is a single field and thus has DCmode TYPE_MODE. As starting with 4.6 that var is not TREE_ADDRESSABLE, its DECL_RTL is a CONCAT, and for assignments to concat expand_assignment was expecting either that from has COMPLEX_TYPE (and matching mode to the store), or that it is a real or imaginary subpart store, thus when trying to store a V2DF mode value it expected it to be real part store (bitpos == 0) and tried to set a DFmode pseudo from V2DFmode rtx. Further testing revealed that it is possible to hit many other cases with CONCAT destination, it can be a store of just a few bits, or can overlap parts of both real and imaginary, or be partially out of bounds. The patch handles the case from Eigen - bitpos == 0 bitsize == GET_MODE_BITSIZE (GET_MODE (to_rtx)) non-COMPLEX_TYPE by setting each half separately, if it is a store which is not touching one of the parts by just adjusting bitpos for the imaginary case and storing just to one of the parts (this is the bitpos + bitsize half_bitsize resp. bitpos = half_bitsize case) and finally adds a generic slow one for the very unusual cases with partial overlap of both. After testing it with the testcases I wrote, I found a bunch of other ICEs though, and reproduced them even without CONCAT on the LHS (the testcases below which don't contain any _Complex keyword). Bootstrapped/regtested on x86_64-linux and i686-linux, regtested with a cross compiler on these new testcases also for powerpc{,64}-linux and s390{,x}-linux. Ok for trunk and after a while for 4.6? 2011-03-30 Jakub Jelinek ja...@redhat.com PR middle-end/48335 * expr.c (expand_assignment): Handle all possibilities if TO_RTX is CONCAT. * expmed.c (store_bit_field_1): Avoid trying to create invalid SUBREGs. (store_split_bit_field): If SUBREG_REG (op0) or op0 itself has smaller mode than word, return it for offset 0 and const0_rtx for out of bounds stores. If word is const0_rtx, skip it. * gcc.c-torture/compile/pr48335-1.c: New test. * gcc.dg/pr48335-1.c: New test. * gcc.dg/pr48335-2.c: New test. * gcc.dg/pr48335-3.c: New test. * gcc.dg/pr48335-4.c: New test. * gcc.dg/pr48335-5.c: New test. * gcc.dg/pr48335-6.c: New test. * gcc.dg/pr48335-7.c: New test. * gcc.dg/pr48335-8.c: New test. * gcc.target/i386/pr48335-1.c: New test. --- gcc/expr.c.jj 2011-03-23 17:15:55.0 +0100 +++ gcc/expr.c 2011-03-30 11:38:15.0 +0200 @@ -4278,16 +4278,47 @@ expand_assignment (tree to, tree from, b /* Handle expand_expr of a complex value returning a CONCAT. */ else if (GET_CODE (to_rtx) == CONCAT) { - if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from + unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx)); + if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from))) + bitpos == 0 + bitsize == mode_bitsize) + result = store_expr (from, to_rtx, false, nontemporal); + else if (bitsize == mode_bitsize / 2 + (bitpos == 0 || bitpos == GET_MODE_BITSIZE (mode1))) + result = store_expr (from, XEXP (to_rtx, bitpos != 0), false, +nontemporal); + else if (bitpos + bitsize = mode_bitsize / 2) + result = store_field (XEXP (to_rtx, 0), bitsize, bitpos, + mode1, from, TREE_TYPE (tem), + get_alias_set (to), nontemporal); + else if (bitpos = mode_bitsize / 2) + result = store_field (XEXP (to_rtx, 1), bitsize, + bitpos - mode_bitsize / 2, mode1, from, + TREE_TYPE (tem), get_alias_set (to), + nontemporal); + else if (bitpos == 0 bitsize == mode_bitsize) { - gcc_assert (bitpos == 0); - result = store_expr (from, to_rtx, false, nontemporal); + rtx from_rtx = expand_normal (from); + from_rtx = simplify_gen_subreg (GET_MODE (to_rtx), from_rtx, + TYPE_MODE (TREE_TYPE (from)), 0); + emit_move_insn (XEXP (to_rtx, 0), + read_complex_part (from_rtx, false)); + emit_move_insn (XEXP (to_rtx, 1), + read_complex_part
A small patch to fix an arm bootstrap failure
The following patch is to fix an arm bootstrap failure described in http://gcc.gnu.org/ml/gcc/2011-03/msg00499.html when variable mode is set but not used because it is used when macro HONOR_REG_ALLOC_ORDER is defined. I found that 2 variables mode in different scopes is defined in function assign_hard_reg. Using only one variable will guarantee usage of the variable. The patch has been committed as obvious after successful bootstrap on x86_64. 2011-03-30 Vladimir Makarov vmaka...@redhat.com * ira-color.c (ira_assign_hard_reg): Use only one variable 'mode'. Index: ira-color.c === --- ira-color.c (revision 171713) +++ ira-color.c (working copy) @@ -1622,10 +1622,11 @@ assign_hard_reg (ira_allocno_t a, bool r if (hard_regno = 0 ira_class_hard_reg_index[aclass][hard_regno] = 0) { - enum machine_mode mode = ALLOCNO_MODE (conflict_a); - int conflict_nregs = hard_regno_nregs[hard_regno][mode]; int n_objects = ALLOCNO_NUM_OBJECTS (conflict_a); + int conflict_nregs; + mode = ALLOCNO_MODE (conflict_a); + conflict_nregs = hard_regno_nregs[hard_regno][mode]; if (conflict_nregs == n_objects conflict_nregs 1) { int num = OBJECT_SUBWORD (conflict_obj);
Re: ira-improv patch has been committed
On Mon, Mar 28, 2011 at 6:11 PM, Vladimir Makarov vmaka...@redhat.com wrote: On 03/27/2011 07:25 PM, Vladimir Makarov wrote: I submitted the following patch. The patch contains original patches from ira-improv branch, changes addressing all Keneth Zadeck's comments in http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00457.html, and a minor change fixing few regression on gcc testsuite for x86/x86-64 and ppc. The patch was successfully tested and bootstrapped on x86/x86-64, ppc64, and ia64. Here is another try to commit the patch. The previous bootstrap failure was because of wrong assertion which fails in extremely rare case (multi-object allocnos which got hard reg set nodes different from the allocno profitable hard regs and the corresponding node hard regs sets happened not be intersected). The fix is very small and described by the first ChangeLog entry. The patch was bootstrapped on x86/x86-64, ppc64, ia64, and i686 with options of H.J.'s automatic tester. Committed as revision 171649. 2011-03-28 Vladimir Makarov vmaka...@redhat.com * ira-color.c (update_left_conflict_sizes_p): Don't assume that conflict object hard regset nodes have intersecting hard reg sets. * regmove.c (regmove_optimize): Move ira_set_pseudo_classes call after regstat_init_n_sets_and_refs. * ira.c: Add more comments at the top. (setup_stack_reg_pressure_class, setup_pressure_classes): Add comments how we compute the register pressure classes. (setup_allocno_and_important_classes): Add more comments. (setup_class_translate_array, reorder_important_classes) (setup_reg_class_relations): Add comments. * ira-emit.c: Add 2011 to the Copyright line. Add comments at the start of the file. * ira-color.c: Add 2011 to the Copyright line. (assign_hard_reg): Add more comments. (improve_allocation): Ditto. * ira-costs.c: Add 2011 to the Copyright line. (setup_cost_classes, setup_regno_cost_classes_by_aclass): Add more comments. (setup_regno_cost_classes_by_mode): Ditto. Initial patches from ira-improv branch: 2010-08-13 Vladimir Makarov vmaka...@redhat.com * ira-build.c: (ira_create_object): Remove initialization of OBJECT_PROFITABLE_HARD_REGS. Initialize OBJECT_ADD_DATA. (ira_create_allocno): Remove initialization of ALLOCNO_MEM_OPTIMIZED_DEST, ALLOCNO_MEM_OPTIMIZED_DEST_P, ALLOCNO_SOMEWHERE_RENAMED_P, ALLOCNO_CHILD_RENAMED_P, ALLOCNO_IN_GRAPH_P, ALLOCNO_MAY_BE_SPILLED_P, ALLOCNO_COLORABLE_P, ALLOCNO_NEXT_BUCKET_ALLOCNO, ALLOCNO_PREV_BUCKET_ALLOCNO, ALLOCNO_FIRST_COALESCED_ALLOCNO, ALLOCNO_NEXT_COALESCED_ALLOCNO. Initialize ALLOCNO_ADD_DATA. (copy_info_to_removed_store_destinations): Use ALLOCNO_EMIT_DATA and allocno_emit_reg instead of ALLOCNO_MEM_OPTIMIZED_DEST_P and ALLOCNO_REG. (ira_flattening): Ditto. Use ALLOCNO_EMIT_DATA instead of ALLOCNO_MEM_OPTIMIZED_DEST and ALLOCNO_SOMEWHERE_RENAMED_P. * ira.c (ira_reallocate): Remove. (setup_pressure_classes): Call ira_init_register_move_cost_if_necessary. Use ira_register_move_cost instead of ira_get_register_move_cost. (setup_allocno_assignment_flags): Use ALLOCNO_EMIT_DATA. (ira): Call ira_initiate_emit_data and ira_finish_emit_data. * ira-color.c: Use ALLOCNO_COLOR_DATA instead of ALLOCNO_IN_GRAPH_P, ALLOCNO_MAY_BE_SPILLED_P, ALLOCNO_COLORABLE_P, ALLOCNO_AVAILABLE_REGS_NUM, ALLOCNO_NEXT_BUCKET_ALLOCNO, ALLOCNO_PREV_BUCKET_ALLOCNO. ALLOCNO_TEMP. Use OBJECT_COLOR_DATA instead of OBJECT_PROFITABLE_HARD_REGS, OBJECT_HARD_REGS_NODE, OBJECT_HARD_REGS_SUBNODES_START, OBJECT_HARD_REGS_SUBNODES_NUM. Fix formatting. (object_hard_regs_t, object_hard_regs_node_t): Move from ira-int.h. (struct object_hard_regs, struct object_hard_regs_node): Ditto. (struct allocno_color_data): New. (allocno_color_data_t): New typedef. (allocno_color_data): New definition. (ALLOCNO_COLOR_DATA): New macro. (struct object_color_data): New. (object_color_data_t): New typedef. (object_color_data): New definition. (OBJECT_COLOR_DATA): New macro. (update_copy_costs, calculate_allocno_spill_cost): Call ira_init_register_move_cost_if_necessary. Use ira_register_move_cost instead of ira_get_register_move_cost. (move_spill_restore, update_curr_costs): Ditto. (allocno_spill_priority): Make it inline. (color_pass): Allocate and free allocno_color_dat and object_color_data. (struct coalesce_data, coalesce_data_t): New. (allocno_coalesce_data): New definition. (ALLOCNO_COALESCE_DATA): New macro. (merge_allocnos, coalesced_allocno_conflict_p): Use ALLOCNO_COALESCED_DATA instead of ALLOCNO_FIRST_COALESCED_ALLOCNO, ALLOCNO_NEXT_COALESCED_ALLOCNO, ALLOCNO_TEMP. (coalesce_allocnos): Ditto. (setup_coalesced_allocno_costs_and_nums): Ditto. (collect_spilled_coalesced_allocnos): Ditto.
Re: RFA: patch to solve IRA PR48336, PR48342, PR48345
On 03/30/2011 06:19 AM, Richard Sandiford wrote: Hi Vlad, First, I want to echo H-P's thanks for tackling this area. I just wondered: Vladimir Makarovvmaka...@redhat.com writes: The following patch is to solve PR48336, PR48342, PR48345. The profitable hard regs exclude hard regs which are prohibited for the corresponding allocno mode. It is true for primary allocation and it is important for better colorability criteria. Function assign_hard_reg is also based on this assumption. Unfortunately, it is not true for secondary allocation (after IRA IR flattening or during reload). The following patch solves this problem. The patch should be very safe but I am still testing it on x86/x86-64 bootstrap. [...] Index: ira-color.c === --- ira-color.c (revision 171699) +++ ira-color.c (working copy) @@ -1447,7 +1447,9 @@ update_conflict_hard_regno_costs (int *c } /* Set up conflicting and profitable regs (through CONFLICT_REGS and - PROFITABLE_REGS) for each object of allocno A. */ + PROFITABLE_REGS) for each object of allocno A. Remember that the + profitable regs exclude hard regs which can not hold value of mode + of allocno A. */ static inline void setup_conflict_profitable_regs (ira_allocno_t a, bool retry_p, HARD_REG_SET *conflict_regs, @@ -1463,8 +1465,13 @@ setup_conflict_profitable_regs (ira_allo COPY_HARD_REG_SET (conflict_regs[i], OBJECT_TOTAL_CONFLICT_HARD_REGS (obj)); if (retry_p) - COPY_HARD_REG_SET (profitable_regs[i], - reg_class_contents[ALLOCNO_CLASS (a)]); + { + COPY_HARD_REG_SET (profitable_regs[i], +reg_class_contents[ALLOCNO_CLASS (a)]); + AND_COMPL_HARD_REG_SET (profitable_regs[i], + ira_prohibited_class_mode_regs + [ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]); + } else COPY_HARD_REG_SET (profitable_regs[i], OBJECT_COLOR_DATA (obj)-profitable_hard_regs); ira_prohibited_class_mode_regs is partly based on HARD_REGNO_MODE_OK, which is really a property of the first register in a multi-register group (rather than of every register in that group). So is ira_prohibited_class_mode_regs defined in the same way? At the moment, check_hard_reg_p and setup_allocno_available_regs_num test profitability for every register in the group: /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j) || ! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno + j)) break; [...] for (j = 0; j nregs; j++) { [...] /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (OBJECT_TOTAL_CONFLICT_HARD_REGS (obj), hard_regno + j) || ! TEST_HARD_REG_BIT (obj_data-profitable_hard_regs, hard_regno + j)) break; So if you have a target in which double-word values have to start in even registers, I think every odd-numbered bit of profitable_hard_regs will be clear, and no register will seem profitable. (I'm seeing this on ARM with some VFP tests.) Restricting the test to the first register fixes things for me, but do we need to check something else for the j != 0 case? Richard, thanks very much for pointing this out. I am agree with you. I think your patch is ok. Although I need some time to check it. Right now I am overwhelmed by # of other bug reports. Another thing bothering me is a new colorability test for pseudos which should start on even or odd hard register. I have suspicion that it is not ok for now. I need some time to check and think about it. I hope that I finish my urgent work on bug reports this week and start to work on your patch and the another issue on next week. gcc/ * ira-color.c (check_hard_reg_p): Restrict the profitability check to the first register. (setup_allocno_available_regs_num): Likewise. Index: gcc/ira-color.c === --- gcc/ira-color.c (revision 171653) +++ gcc/ira-color.c (working copy) @@ -1497,7 +1504,8 @@ check_hard_reg_p (ira_allocno_t a, int h for (k = set_to_test_start; k set_to_test_end; k++) /* Checking only profitable hard regs. */ if (TEST_HARD_REG_BIT (conflict_regs[k], hard_regno + j) - || ! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno + j)) + || (j == 0 + ! TEST_HARD_REG_BIT (profitable_regs[k], hard_regno))) break; if (k != set_to_test_end) break; @@ -2226,8 +2234,9 @@ setup_allocno_available_regs_num (ira_al /* Checking only profitable hard regs. */
Re: Fix realloc_on_assign_2.f03, random segfaults/ICEs
On 03/30/2011 06:21 PM, Michael Matz wrote: Okay for trunk? (regstrapping on x86_64-linux in progress) OK. Thanks for tracing the bug. I think the issue could be the reason for the elusive PR 47516 - thus, you might consider adding that PR to the ChangeLog and close the PR afterwards. Tobias
Re: [Patch] Bfin: Ensure rotrsi and rotlsi don't accept non-const INTVALS
On 03/29/2011 08:49 AM, Henderson, Stuart wrote: (match_operand:SI 2 immediate_operand )))] { - if (INTVAL (operands[2]) != 16) + if (GET_CODE (operands[2]) != CONST_INT || INTVAL (operands[2]) != 16) FAIL; Perhaps use const_int_operand instead of immediate_operand. r~
Re: [Patch] improve bfin conditional move support
On 03/29/2011 05:57 AM, Henderson, Stuart wrote: - operands[1] = bfin_gen_compare (operands[1], SImode); + operands[1] = bfin_gen_compare (operands[1], GET_MODE (operands[0])); FWIW, you can use MODEmode to get the proper value without having to read it from the operand. r~
Re: Fix realloc_on_assign_2.f03, random segfaults/ICEs
Hi, On Wed, 30 Mar 2011, Tobias Burnus wrote: On 03/30/2011 06:21 PM, Michael Matz wrote: Okay for trunk? (regstrapping on x86_64-linux in progress) OK. Thanks for tracing the bug. I think the issue could be the reason for the elusive PR 47516 - thus, you might consider adding that PR to the ChangeLog and close the PR afterwards. (r171736) Hmm, it seems to also be a problem for 4.6.0. Should I merge it to the 4.6 branch too? Ciao, Michael.
Re: Fix realloc_on_assign_2.f03, random segfaults/ICEs
On 03/30/2011 07:33 PM, Michael Matz wrote: Hi, On Wed, 30 Mar 2011, Tobias Burnus wrote: On 03/30/2011 06:21 PM, Michael Matz wrote: Okay for trunk? (regstrapping on x86_64-linux in progress) OK. Thanks for tracing the bug. I think the issue could be the reason for the elusive PR 47516 - thus, you might consider adding that PR to the ChangeLog and close the PR afterwards. (r171736) Hmm, it seems to also be a problem for 4.6.0. Should I merge it to the 4.6 branch too? That's fine with me - the fix is really simple (after one found the cause). Fortunately, the bug only rarely surfaces. Tobias
libgo patch committed: Add missing Makefile dependencies
I committed this patch to libgo to add some missing Makefile dependencies. Bootstrapped on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r 58ba48318471 libgo/Makefile.am --- a/libgo/Makefile.am Wed Mar 30 08:17:36 2011 -0700 +++ b/libgo/Makefile.am Wed Mar 30 10:35:55 2011 -0700 @@ -1705,8 +1705,9 @@ $(CHECK) .PHONY: mime/check -net/net.lo: $(go_net_files) bytes.gox fmt.gox io.gox os.gox reflect.gox \ - strconv.gox strings.gox sync.gox syscall.gox +net/net.lo: $(go_net_files) bytes.gox fmt.gox io.gox os.gox rand.gox \ + reflect.gox strconv.gox strings.gox sync.gox syscall.gox \ + time.gox $(BUILDPACKAGE) net/check: $(CHECK_DEPS) $(CHECK)
C++ PATCH for c++/48281 (ICE with nested initializer_list)
The testcase in 48281 started breaking in 4.6.0 because of my change to finish_compound_literal to stop putting constant compound literals into static variables, because it was interfering with constexpr. First I noticed that the crash was due to non-constant CONSTRUCTORs with TREE_CONSTANT set. So I fixed that, in constructor-const.patch. That change fixed the crash, but then we were just optimizing away the entire contents of the lists due to a bug I just filed as PR 48370. Fixing the bug in general will mean fixing 48370, but we can fix the regression by just reverting the change above; I dealt with a similar issue elsewhere by setting DECL_DECLARED_CONSTEXPR_P on the temporary variable so that constexpr evaluation can look at its initializer, so we can use the same approach here. I'm only doing this for arrays now so that we can continue to elide copies of class temporaries. While I was looking at this stuff, I decided to go ahead and initialize the initializer_list objects directly rather than mess with calling the constructor. Tested x86_64-pc-linux-gnu, applying to trunk (all three patches) and 4.6 (only the first patch). commit 15ce1df0c9f6259babc885e8947719a586a47103 Author: Jason Merrill ja...@redhat.com Date: Wed Mar 30 11:57:50 2011 -0400 PR c++/48281 * semantics.c (finish_compound_literal): Do put static/constant arrays in static variables. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index 9926d26..b88e190 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -2329,7 +2329,34 @@ finish_compound_literal (tree type, tree compound_literal) if (TREE_CODE (type) == ARRAY_TYPE) cp_complete_array_type (type, compound_literal, false); compound_literal = digest_init (type, compound_literal); - return get_target_expr (compound_literal); + /* Put static/constant array temporaries in static variables, but always + represent class temporaries with TARGET_EXPR so we elide copies. */ + if ((!at_function_scope_p () || CP_TYPE_CONST_P (type)) + TREE_CODE (type) == ARRAY_TYPE + initializer_constant_valid_p (compound_literal, type)) +{ + tree decl = create_temporary_var (type); + DECL_INITIAL (decl) = compound_literal; + TREE_STATIC (decl) = 1; + if (literal_type_p (type) CP_TYPE_CONST_NON_VOLATILE_P (type)) + { + /* 5.19 says that a constant expression can include an +lvalue-rvalue conversion applied to a glvalue of literal type +that refers to a non-volatile temporary object initialized +with a constant expression. Rather than try to communicate +that this VAR_DECL is a temporary, just mark it constexpr. */ + DECL_DECLARED_CONSTEXPR_P (decl) = true; + DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = true; + TREE_CONSTANT (decl) = true; + } + cp_apply_type_quals_to_decl (cp_type_quals (type), decl); + decl = pushdecl_top_level (decl); + DECL_NAME (decl) = make_anon_name (); + SET_DECL_ASSEMBLER_NAME (decl, DECL_NAME (decl)); + return decl; +} + else +return get_target_expr (compound_literal); } /* Return the declaration for the function-name variable indicated by diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist46.C b/gcc/testsuite/g++.dg/cpp0x/initlist46.C new file mode 100644 index 000..2b9f07d --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist46.C @@ -0,0 +1,14 @@ +// PR c++/48281 +// { dg-options -std=c++0x -O2 } +// { dg-do run } + +#include initializer_list + +typedef std::initializer_listint int1; +typedef std::initializer_listint1 int2; +static int2 ib = {{42,2,3,4,5},{2,3,4,5,1},{3,4,5,2,1}}; + +int main() +{ + return *(ib.begin()-begin()) != 42; +} commit 4069e2ded787088dcf1d4caf1aeccd26e00524c0 Author: Jason Merrill ja...@redhat.com Date: Wed Mar 30 10:34:09 2011 -0400 * call.c (convert_like_real) [ck_list]: Build up the initializer_list object directly. * decl.c (build_init_list_var_init): Adjust. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index f7d108f..ad2de43 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -5467,8 +5467,8 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, tree elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (totype), 0); tree new_ctor = build_constructor (init_list_type_node, NULL); unsigned len = CONSTRUCTOR_NELTS (expr); - tree array, val; - VEC(tree,gc) *parms; + tree array, val, field; + VEC(constructor_elt,gc) *vec = NULL; unsigned ix; /* Convert all the elements. */ @@ -5490,16 +5490,14 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum, array = build_array_of_n_type (elttype, len); array = finish_compound_literal (array, new_ctor); - parms = make_tree_vector (); - VEC_safe_push (tree, gc, parms, decay_conversion (array)); - VEC_safe_push
RFC: PATCH: Remove 26 element limit in vector
On Wed, Mar 30, 2011 at 08:02:38AM -0700, H.J. Lu wrote: Hi, Currently, we limit XVECEXP to 26 elements in machine description since we use letters 'a' to 'z' to encode them. I don't see any reason why we can't go beyond 'z'. This patch removes this restriction. Any comments? That was wrong. The problem is in vector elements. This patch passes bootstrap. Any comments? Thanks. H.J. --- 2011-03-30 H.J. Lu hongjiu...@intel.com * genrecog.c (VECTOR_ELEMENT_BASE): New. (add_to_sequence): Add assert. Use VECTOR_ELEMENT_BASE to encode vector elements. (change_state): Check and support VECTOR_ELEMENT_BASE. (make_insn_sequence): Add assert. diff --git a/gcc/genrecog.c b/gcc/genrecog.c index 74dd0a7..40e9c4d 100644 --- a/gcc/genrecog.c +++ b/gcc/genrecog.c @@ -465,6 +465,9 @@ extern void debug_decision (struct decision *); extern void debug_decision_list (struct decision *); + +/* The base of vector element. */ +#define VECTOR_ELEMENT_BASE 0x80 /* Create a new node in sequence after LAST. */ @@ -912,6 +915,7 @@ add_to_sequence (rtx pattern, struct decision_head *last, const char *position, /* Which insn we're looking at is represented by A-Z. We don't ever use 'A', however; it is always implied. */ + gcc_assert (i 26); subpos[depth] = (i 0 ? 'A' + i : 0); sub = add_to_sequence (XVECEXP (pattern, 0, i), last, subpos, insn_type, 0); @@ -1002,6 +1006,9 @@ add_to_sequence (rtx pattern, struct decision_head *last, const char *position, char base = (was_code == MATCH_OPERATOR ? '0' : 'a'); for (i = 0; i (size_t) XVECLEN (pattern, 2); i++) { + gcc_assert (was_code == MATCH_OPERATOR + ? ISDIGIT (i + base) + : ISLOWER (i + base)); subpos[depth] = i + base; sub = add_to_sequence (XVECEXP (pattern, 2, i), sub-success, subpos, insn_type, 0); @@ -1102,7 +1109,9 @@ add_to_sequence (rtx pattern, struct decision_head *last, const char *position, int j; for (j = 0; j XVECLEN (pattern, i); j++) { - subpos[depth] = 'a' + j; + int val = j + VECTOR_ELEMENT_BASE; + gcc_assert (val = UCHAR_MAX); + subpos[depth] = val; sub = add_to_sequence (XVECEXP (pattern, i, j), sub-success, subpos, insn_type, 0); } @@ -1779,6 +1788,10 @@ change_state (const char *oldpos, const char *newpos, const char *indent) else if (ISLOWER (newpos[depth])) printf (%sx%d = XVECEXP (x%d, 0, %d);\n, indent, depth + 1, depth, newpos[depth] - 'a'); + else if (((unsigned char) newpos[depth]) = VECTOR_ELEMENT_BASE) + printf (%sx%d = XVECEXP (x%d, 0, %d);\n, + indent, depth + 1, depth, + ((unsigned char) newpos[depth]) - VECTOR_ELEMENT_BASE); else printf (%sx%d = XEXP (x%d, %c);\n, indent, depth + 1, depth, newpos[depth]); @@ -2528,6 +2541,7 @@ make_insn_sequence (rtx insn, enum routine_type type) } XVECLEN (x, 0) = j; + gcc_assert ((j - 1) 26); c_test_pos[0] = 'A' + j - 1; c_test_pos[1] = '\0'; }
[PATCH] PR debug/47471 (set prologue_end in .debug_line)
Hello, This is about the line program emitted by the compiler into the .debug_line section, without optimization. In the example accompanying the patch below, at the beginning of the function f, the compiler emits two .loc asm directives that are identical. The first one is right before the .cfi_startproc that starts the prologue. The second one is before the instructions that copy the variable length parameters into f's frame. Both directives do locate instructions that are in the prologue. Unfortunately, GDB uses an heuristic that basically considers that the first opcode of the line program that increments the line register (even with an increment of zero) marks the end of the prologue. Effectively, setting a breakpoint to the beginning of f (e.g, break f) won't work anymore when we emit this because GDB would then try to set the breakpoint inside the prologue. This patch does two things. First, it avoids emitting two consecutive .loc that are identical. Strictly speaking that should fix this issue in this particular case. Second, it emits a '.loc filenum linenum 0 prologue_end' directive on the first instruction that doesn't belong to the prologue (i.e, when the end_prologue debug hook is called). This sets the prologue_end register in the .debug_line state machine to true. After discussing this with Jan (in CC), it appears that setting the prologue_end register to true will help GDB to drop the somewhat non-reliable heuristic it uses to detect the end of the prologue. I have noticed that the end_prologue debug hook was being called, but the dwarf back-end hasn't implemented it (on non-vms platforms). Is there a particular reason for not implementing this? Also, I have noticed that the patch causes the prologue_end directive to be emitted even when we compile with optimizations. I am not sure how right that would be. Tested on x86_64-unknown-linux-gnu against trunk. -- Dodji From dc3dea429f1471540867a0b7e694a60494062ac0 Mon Sep 17 00:00:00 2001 From: Dodji Seketeli do...@redhat.com Date: Tue, 29 Mar 2011 16:56:20 +0200 Subject: [PATCH] PR debug/47471 (set prologue_end in .debug_line) gcc/ * dwarf2out.c (output_source_line_asm_info): Split out of dwarf2out_source_line. Add a new is_prologue_end parameter. Avoid emitting redundant consecutive .loc asm directives. (dwarf2out_source_line): Use output_source_line_asm. (dwarf2out_end_prologue): New function. (dwarf2_debug_hooks-end_prologue): Set to dwarf2out_end_prologue. gcc/testsuite/ * gcc.dg/debug/dwarf2/line-prog-1.c: New test. --- gcc/dwarf2out.c | 95 +-- 1 files changed, 78 insertions(+), 17 deletions(-) diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index efd30ea..6f8285c 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -94,6 +94,8 @@ along with GCC; see the file COPYING3. If not see #include tree-flow.h #include cfglayout.h +static void output_source_line_asm_info (unsigned int, const char *, +int, bool, bool); static void dwarf2out_source_line (unsigned int, const char *, int, bool); static rtx last_var_location_insn; @@ -465,6 +467,7 @@ static void output_call_frame_info (int); static void dwarf2out_note_section_used (void); static bool clobbers_queued_reg_save (const_rtx); static void dwarf2out_frame_debug_expr (rtx, const char *); +static void dwarf2out_end_prologue (unsigned int, const char*); /* Support for complex CFA locations. */ static void output_cfa_loc (dw_cfi_ref, int); @@ -4125,6 +4128,21 @@ dwarf2out_begin_prologue (unsigned int line ATTRIBUTE_UNUSED, } } +/* Implementation of the gcc_debug_hooks::end_prologue hook. + + If the underlying assembler supports it, emit a .loc asm directive + with a 'end_prologue' argument, effectively marking the point where + debugger should set a breakpoint when requested to set one on a + function. */ + +static void +dwarf2out_end_prologue (unsigned int line, const char *filename) +{ + output_source_line_asm_info (line, filename, 0, + /*is_stmt=*/true, + /*is_prologue_end*/true); +} + /* Output a marker (i.e. a label) for the end of the generated code for a function prologue. This gets called *after* the prologue code has been generated. */ @@ -5767,7 +5785,7 @@ const struct gcc_debug_hooks dwarf2_debug_hooks = dwarf2out_vms_end_prologue, dwarf2out_vms_begin_epilogue, #else - debug_nothing_int_charstar, + dwarf2out_end_prologue, debug_nothing_int_charstar, #endif dwarf2out_end_epilogue, @@ -22129,14 +22147,22 @@ dwarf2out_begin_function (tree fun) /* Output a label to mark the beginning of a source code line entry and record information relating to this source line, in - 'line_info_table' for later output of the .debug_line section. */ + 'line_info_table' for later output
Merge mainline to gccgo branch again
I merged mainline revision 171737 onto the gccgo branch. Ian
C++ PATCH for c++/48369 (ICE with isnan)
A couple of missed cases. Tested x86_64-pc-linux-gnu, applying to trunk and 4.6. commit cea17025eb232f3931ce34e16dedff7fd42d2478 Author: Jason Merrill ja...@redhat.com Date: Wed Mar 30 15:17:27 2011 -0400 PR c++/48369 * semantics.c (potential_constant_expression_1): Handle UNORDERED_EXPR and ORDERED_EXPR. diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c index b88e190..48dd4ee 100644 --- a/gcc/cp/semantics.c +++ b/gcc/cp/semantics.c @@ -7741,6 +7741,8 @@ potential_constant_expression_1 (tree t, bool want_rval, tsubst_flags_t flags) case BIT_XOR_EXPR: case BIT_AND_EXPR: case TRUTH_XOR_EXPR: +case UNORDERED_EXPR: +case ORDERED_EXPR: case UNLT_EXPR: case UNLE_EXPR: case UNGT_EXPR: diff --git a/gcc/testsuite/g++.dg/cpp0x/regress/isnan.C b/gcc/testsuite/g++.dg/cpp0x/regress/isnan.C new file mode 100644 index 000..40d07e5 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/regress/isnan.C @@ -0,0 +1,9 @@ +// PR c++/48369 +// { dg-options -std=gnu++0x } + +extern C int isnan (double); + +void f(double d) +{ +bool b = isnan(d); +}
[PATCH] Fix VTA updating in the combiner (PR debug/48343)
Hi! We ICE on the attached testcase, because combiner changes mode of a pseudo (which was set just once and used once plus in debug insns) from SImode to QImode, but the uses in debug insns aren't adjusted. Combiner has code to adjust this too (propagate_for_debug), but only updates debug insns between i2 and i3 (resp. i3 and undobuf.other_insn). The problem on the testcase is that this is a retry, so first try_combine with a later i3 calls propagate_for_debug and changes debug insns before that later i3, then returns an earlier insn that should be retried and we stop adjusting debug insns at that earlier i3. Unfortunately as later debug insns have been already updated earlier, they need to be adjusted too. The following patch fixes that by always stopping on the latest i3 that has been successfully combined into in the current bb, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/4.6? 2011-03-30 Jakub Jelinek ja...@redhat.com PR debug/48343 * combine.c (last_combined_insn): New variable. (combine_instructions): Clear it before processing each bb and after last bb. (try_combine): Set it to i3 if i3 is after it. Use it as the last argument to propagate_for_debug instead of i3. (distribute_notes): Assert SET_INSN_DELETE is not called on last_combined_insn. * gcc.dg/torture/pr48343.c: New test. --- gcc/combine.c.jj2011-03-23 09:34:40.0 +0100 +++ gcc/combine.c 2011-03-30 17:39:09.0 +0200 @@ -337,6 +337,9 @@ static enum machine_mode nonzero_bits_mo static int nonzero_sign_valid; +/* Last insn on which try_combine has been called in the current BB. */ + +static rtx last_combined_insn; /* Record one modification to rtl structure to be undone by storing old_contents into *where. */ @@ -1168,6 +1171,7 @@ combine_instructions (rtx f, unsigned in || single_pred (this_basic_block) != last_bb) label_tick_ebb_start = label_tick; last_bb = this_basic_block; + last_combined_insn = NULL_RTX; rtl_profile_for_bb (this_basic_block); for (insn = BB_HEAD (this_basic_block); @@ -1379,6 +1383,7 @@ combine_instructions (rtx f, unsigned in } } + last_combined_insn = NULL_RTX; default_rtl_profile (); clear_log_links (); clear_bb_flags (); @@ -3830,6 +3835,10 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx return 0; } + if (last_combined_insn == NULL_RTX + || DF_INSN_LUID (last_combined_insn) DF_INSN_LUID (i3)) +last_combined_insn = i3; + if (MAY_HAVE_DEBUG_INSNS) { struct undo *undo; @@ -3851,7 +3860,7 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx i2src while its original mode is temporarily restored, and then clear i2scratch so that we don't do it again later. */ - propagate_for_debug (i2, i3, reg, i2src); + propagate_for_debug (i2, last_combined_insn, reg, i2src); i2scratch = false; /* Put back the new mode. */ adjust_reg_mode (reg, new_mode); @@ -3859,18 +3868,17 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx else { rtx tempreg = gen_raw_REG (old_mode, REGNO (reg)); - rtx first, last; + rtx first; if (reg == i2dest) - { - first = i2; - last = i3; - } + first = i2; else { first = i3; - last = undobuf.other_insn; - gcc_assert (last); + gcc_assert (undobuf.other_insn); + if (DF_INSN_LUID (undobuf.other_insn) +DF_INSN_LUID (last_combined_insn)) + last_combined_insn = undobuf.other_insn; } /* We're dealing with a reg that changed mode but not @@ -3882,9 +3890,9 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx with this copy we have created; then, replace the copy with the SUBREG of the original shared reg, once again changed to the new mode. */ - propagate_for_debug (first, last, reg, tempreg); + propagate_for_debug (first, last_combined_insn, reg, tempreg); adjust_reg_mode (reg, new_mode); - propagate_for_debug (first, last, tempreg, + propagate_for_debug (first, last_combined_insn, tempreg, lowpart_subreg (old_mode, reg, new_mode)); } } @@ -4099,14 +4107,14 @@ try_combine (rtx i3, rtx i2, rtx i1, rtx if (newi2pat) { if (MAY_HAVE_DEBUG_INSNS i2scratch) - propagate_for_debug (i2, i3, i2dest, i2src); + propagate_for_debug (i2, last_combined_insn, i2dest, i2src);
[RFC PATCH, go]: Port to ALPHA arch
Hello! Attached ports go to ALPHA architecture. There are however several problems with the build: a) Bootstrap compare failure in the gcc/go directory due to binutils bug [1], fixed in latest binutils SVN, use --disable-bootstrap b) alpha doesn't define struct user_regs_struct from which type PtraceRegs is derived. I have manually created PtraceRegs from pt_regs structure and patched generated libgo/sysinfo.go in build directory after the build broke. However - the comment from sys/user.h says that this file is for GDB and GDB only... c) some weird issue with double definition of const _SOCK_NONBLOCK in gen-sysinfo.go and consequently sysinfo.go. The test results from make -k check in libgo directory are quite encouraging: PASS: asn1 PASS: big PASS: bufio PASS: bytes FAIL: cmath panic: runtime error: integer divide by zero or floating point error ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 1637 Aborted ./a.out $@ PASS: ebnf PASS: exec PASS: expvar PASS: flag FAIL: fmt mallocs per Sprintf(): 1 mallocs per Sprintf(xxx): 1 mallocs per Sprintf(%x): 3 mallocs per Sprintf(%x %x): 4 panic: runtime error: integer divide by zero or floating point error [recovered] panic: runtime error: integer divide by zero or floating point error ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 2309 Aborted ./a.out $@ PASS: gob PASS: html FAIL: http ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 3133 Segmentation faul t ./a.out $@ PASS: io PASS: json PASS: log FAIL: math panic: runtime error: integer divide by zero or floating point error ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 4321 Aborted ./a.out $@ PASS: mime PASS: netchan PASS: os PASS: patch PASS: path PASS: rand PASS: reflect PASS: regexp FAIL: rpc 2011/03/30 21:31:20 Test RPC server listening on 127.0.0.1:47781 2011/03/30 21:31:20 Test HTTP RPC server listening on 127.0.0.1:59606 2011/03/30 21:31:20 rpc.Serve: accept:accept tcp 127.0.0.1:47781: Resource tempo rarily unavailable FAIL: runtime panic: runtime error: integer divide by zero or floating point error ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 5980 Aborted ./a.out $@ PASS: scanner PASS: smtp PASS: sort FAIL: strconv panic: runtime error: integer divide by zero or floating point error ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 6561 Aborted ./a.out $@ PASS: strings PASS: sync PASS: tabwriter PASS: template PASS: time PASS: try PASS: unicode PASS: utf16 PASS: utf8 FAIL: websocket 2011/03/30 21:32:19 Test WebSocket server listening on 127.0.0.1:33252 --- FAIL: websocket.TestEcho (0.0 seconds) dialing dial tcp 127.0.0.1:33252: Connection refused --- FAIL: websocket.TestEchoDraft75 (0.0 seconds) dialing dial tcp 127.0.0.1:33252: Connection refused --- FAIL: websocket.TestWithQuery (0.0 seconds) dialing dial tcp 127.0.0.1:33252: Connection refused --- FAIL: websocket.TestWithProtocol (0.0 seconds) dialing dial tcp 127.0.0.1:33252: Connection refused --- FAIL: websocket.TestHTTP (0.0 seconds) Get: error http.URLError{Op:Get, URL:http://127.0.0.1:33252/echo;, E rror:(*net.OpError)(0xf840026d00)} --- FAIL: websocket.TestHTTPDraft75 (0.0 seconds) Get: error http.URLError{Op:Get, URL:http://127.0.0.1:33252/echoDraf t75, Error:(*net.OpError)(0xf840026c40)} panic: Dial failed: websocket.Dial ws://127.0.0.1:33252/echo: dial tcp 127.0.0.1 :33252: Connection refused ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 8493 Aborted ./a.out $@ PASS: xml PASS: archive/tar PASS: archive/zip PASS: compress/bzip2 FAIL: compress/flate ../../../gcc-svn/trunk/libgo/testsuite/gotest: line 356: 9194 Segmentation faul t ./a.out $@ PASS: compress/gzip PASS: compress/lzw PASS: compress/zlib PASS: container/heap PASS: container/list PASS: container/ring PASS: container/vector PASS: crypto/aes PASS: crypto/block PASS: crypto/blowfish PASS: crypto/cipher PASS: crypto/dsa PASS: crypto/ecdsa PASS: crypto/elliptic PASS: crypto/hmac PASS: crypto/md4 PASS: crypto/md5 PASS: crypto/ocsp PASS: crypto/openpgp PASS: crypto/rand PASS: crypto/rc4 PASS: crypto/ripemd160 PASS: crypto/rsa PASS: crypto/sha1 PASS: crypto/sha256 PASS: crypto/sha512 PASS: crypto/subtle PASS: crypto/tls PASS: crypto/twofish PASS: crypto/x509 PASS: crypto/xtea PASS: crypto/openpgp/armor PASS: crypto/openpgp/packet PASS: crypto/openpgp/s2k PASS: debug/dwarf PASS: debug/elf PASS: debug/macho PASS: debug/pe PASS: encoding/ascii85 PASS: encoding/base32 PASS: encoding/base64 PASS: encoding/binary PASS: encoding/git85 PASS: encoding/hex PASS: encoding/line PASS: encoding/pem PASS: exp/datafmt PASS: exp/draw PASS: exp/eval PASS: go/parser PASS: go/printer PASS: go/scanner PASS: go/token PASS: go/typechecker PASS: hash/adler32 PASS: hash/crc32 PASS: hash/crc64 PASS: hash/fnv PASS: http/cgi PASS: image/png PASS: index/suffixarray PASS: io/ioutil PASS: mime/multipart PASS: net/textproto PASS:
Re: [4.7] Make ARM -mhard-float and -msoft-float into proper -mfloat-abi= aliases
On Wed, 2 Mar 2011, Richard Earnshaw wrote: Could you remove the documentation entries for the hard/soft-float aliases please? They're really only there for legacy reasons. Is this the change you want here? 2011-03-30 Joseph Myers jos...@codesourcery.com * config/arm/arm.opt (mhard-float, msoft-float): Mark Undocumented. Remove help text. * doc/invoke.texi (ARM Options): Don't document -msoft-float and -mhard-float. Index: doc/invoke.texi === --- doc/invoke.texi (revision 171745) +++ doc/invoke.texi (working copy) @@ -454,7 +454,7 @@ -mapcs-reentrant -mno-apcs-reentrant @gol -msched-prolog -mno-sched-prolog @gol -mlittle-endian -mbig-endian -mwords-little-endian @gol --mfloat-abi=@var{name} -msoft-float -mhard-float -mfpe @gol +-mfloat-abi=@var{name} -mfpe @gol -mfp16-format=@var{name} -mthumb-interwork -mno-thumb-interwork @gol -mcpu=@var{name} -march=@var{name} -mfpu=@var{name} @gol @@ -10079,14 +10079,6 @@ compile your entire program with the same ABI, and link with a compatible set of libraries. -@item -mhard-float -@opindex mhard-float -Equivalent to @option{-mfloat-abi=hard}. - -@item -msoft-float -@opindex msoft-float -Equivalent to @option{-mfloat-abi=soft}. - @item -mlittle-endian @opindex mlittle-endian Generate code for a processor running in little-endian mode. This is Index: config/arm/arm.opt === --- config/arm/arm.opt (revision 171745) +++ config/arm/arm.opt (working copy) @@ -94,8 +94,7 @@ Specify the name of the target floating point hardware/format mhard-float -Target RejectNegative Alias(mfloat-abi=, hard) -Alias for -mfloat-abi=hard +Target RejectNegative Alias(mfloat-abi=, hard) Undocumented mlittle-endian Target Report RejectNegative InverseMask(BIG_END) @@ -122,8 +121,7 @@ Do not load the PIC register in function prologues msoft-float -Target RejectNegative Alias(mfloat-abi=, soft) -Alias for -mfloat-abi=soft +Target RejectNegative Alias(mfloat-abi=, soft) Undocumented mstructure-size-boundary= Target RejectNegative Joined Var(structure_size_string) -- Joseph S. Myers jos...@codesourcery.com
Re: [RFC PATCH, go]: Port to ALPHA arch - sysinfo.go fixup
On Wed, Mar 30, 2011 at 9:58 PM, Uros Bizjak ubiz...@gmail.com wrote: Hello! Attached ports go to ALPHA architecture. There are however several problems with the build: a) Bootstrap compare failure in the gcc/go directory due to binutils bug [1], fixed in latest binutils SVN, use --disable-bootstrap b) alpha doesn't define struct user_regs_struct from which type PtraceRegs is derived. I have manually created PtraceRegs from pt_regs structure and patched generated libgo/sysinfo.go in build directory after the build broke. However - the comment from sys/user.h says that this file is for GDB and GDB only... c) some weird issue with double definition of const _SOCK_NONBLOCK in gen-sysinfo.go and consequently sysinfo.go. d) The definition of type Stat_t struct also includes additional struct and this confuses compilation. Attached is a fixup of sysinfo.go that is neded for successful build. Apply this fixup after build breaks and restart build. Uros. sysinfo.go.diff Description: Binary data
Re: [4.7] Make ARM -mhard-float and -msoft-float into proper -mfloat-abi= aliases
On 30 Mar 2011, at 21:11, Joseph S. Myers jos...@codesourcery.com wrote: On Wed, 2 Mar 2011, Richard Earnshaw wrote: Could you remove the documentation entries for the hard/soft-float aliases please? They're really only there for legacy reasons. Is this the change you want here? 2011-03-30 Joseph Myers jos...@codesourcery.com * config/arm/arm.opt (mhard-float, msoft-float): Mark Undocumented. Remove help text. * doc/invoke.texi (ARM Options): Don't document -msoft-float and -mhard-float. Yes, that's great. Thanks. R.
Re: [libgo] Improve Solaris 2/SPARC support
Rainer Orth r...@cebitec.uni-bielefeld.de writes: 2011-03-24 Rainer Orth r...@cebitec.uni-bielefeld.de go: * go.test/go-test.exp (go-set-goarch): Use sparc64 for 64-bit SPARC. diff -r de1b3baf021b gcc/testsuite/go.test/go-test.exp --- a/gcc/testsuite/go.test/go-test.exp Thu Mar 24 13:19:30 2011 +0100 +++ b/gcc/testsuite/go.test/go-test.exp Thu Mar 24 13:22:43 2011 +0100 @@ -129,7 +129,7 @@ if [check_effective_target_ilp32] { set goarch sparc } else { - set goarch sparcv9 + set goarch sparc64 } } default { Thanks, I just committed this patch to mainline. I'm working on the other ones. Ian
Go testsuite patch committed: Fix env.go for other architectures
I brought over this patch to the Go testsuite from the master repository. This avoids an explicit list of supported architectures, so that we don't have to add each gcc architecture to the list. Ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/testsuite/go.test/test/env.go === --- gcc/testsuite/go.test/test/env.go (revision 171697) +++ gcc/testsuite/go.test/test/env.go (working copy) @@ -6,7 +6,10 @@ package main -import os os +import ( + os + runtime +) func main() { ga, e0 := os.Getenverror(GOARCH) @@ -14,8 +17,8 @@ func main() { print($GOARCH: , e0.String(), \n) os.Exit(1) } - if ga != amd64 ga != 386 ga != arm { - print($GOARCH=, ga, \n) + if ga != runtime.GOARCH { + print($GOARCH=, ga, != runtime.GOARCH=, runtime.GOARCH, \n) os.Exit(1) } xxx, e1 := os.Getenverror(DOES_NOT_EXIST)
[pph] Write language specific data (issue4331046)
This patch implements the writing of DECL_LANG_SPECIFIC fields. It's needed to write global_namespace. The reading part is still incomplete, but I wanted to flush this out before it got too big. The main changes: - We can now write out references to TEMPLATE_DECLs. They are stored in the same index table as all other DECLs. To implement this, I added a streamer hook that's called from lto_output_tree_ref. When it does not recognize the decl, it calls the hook which tells it whether the decl can be added to the index table. - We stream out everything in DECL_LANG_SPECIFIC, which in C++ is quite a bit. This will cover global_namespace since it goes through all the data stored for NAMESPACE_DECLs. - Since DECL_LANG_SPECIFIC can contains tokens, we need to save token caches. Since we are inside a streamer callback, we do not have a pointer to a PPH stream, so I've added a pretty hacky way of getting back to it. When we initially open the PPH stream and associate with it an output_block object, I take an unused field from output_block and use it as my pointer back to the PPH stream. This is awful, but it works for now. I'll clean this up in a future patch. The next patch will add the reader side, which will allow me to reconstruct global_namespace. This will uncover what other tree types we need to handle. With this I'm down to two failures in pph.exp. No changes for pth.exp. ChangeLog.pph 2011-03-30 Diego Novillo dnovi...@google.com * lto-streamer-in.c (lto_input_chain): Make extern. * lto-streamer-out.c (lto_output_tree_ref): Call streamer_hook-indexable_with_decls_p, if it exists. If it returns true, emit a reference to EXPR in the VAR_DECL index. (lto_output_chain): Make extern. (lto_output_tree_pointers): Move checks for invalid gimple trees ... * lto-streamer.c (lto_is_streamable): ... here. * lto-streamer.h (lto_streamer_hooks): Add indexable_with_decls_p. (lto_output_chain): Declare. (lto_input_chain): Declare. c-family/ChangeLog.pph 2011-03-30 Diego Novillo dnovi...@google.com * c-family/c.opt (fpph-fmt): Remove. Update all users. cp/ChangeLog.pph 2011-03-30 Diego Novillo dnovi...@google.com * cp-tree.h (struct language_function): Add prefix 'x_' to fields returns_value, returns_null, returns_abnormally, in_function_try_handler, in_base_initializer. Update all users. * pph-streamer.c (pph_stream_write_ld_base): New. (pph_stream_write_ld_min): New. (pph_stream_write_tree_vec): New. (pph_stream_write_cxx_binding_1): New. (pph_stream_write_cxx_binding): New. (pph_stream_write_class_binding): New. (pph_stream_write_label_binding): New. (pph_stream_write_binding_level): New. (pph_stream_write_c_language_function): New. (pph_stream_write_language_function): New. (pph_stream_write_ld_fn): New. (pph_stream_write_ld_ns): New. (pph_stream_write_ld_parm): New. (pph_stream_write_lang_specific_data): New. (pph_indexable_with_decls_p): New. (pph_stream_hooks_init): Initialize h-indexable_with_decls_p with pph_indexable_with_decls_p. (pph_stream_begin_section): Do not free BLOCK. * pph-streamer.h (pth_save_token_cache): Declare. (pph_get_ob_stream): New. (pph_set_ob_stream): New. * pph.c (pth_save_token_cache): New. (pph_print_macro_defs_before): Remove. (pph_print_macro_defs_after): Remove. (pph_write_namespace): Remove. (pph_write_format): Remove. (pph_write_print): Remove. (pph_write_dump): Remove. (pph_write_symbol): Remove. (declvisitor): Remove. (pph_write_namespace_1): Remove. (pph_write_namespace): Remove. (pph_write_file_contents): Rename from pph_write_file_object. Output global_namespace. (pph_write_file): Call it. (pph_write_file_summary): Remove. (pph_read_file_contents): Rename from pph_file_read_object. (pph_read_file): Rename from pph_file_read. diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt index 54e7461..12d4f06 100644 --- a/gcc/c-family/c.opt +++ b/gcc/c-family/c.opt @@ -933,10 +933,6 @@ fpph-decls= C++ Joined RejectNegative UInteger Var(flag_pph_decls) -fpph-decls=N Enable declaration identifier output at level N from PPH support -fpph-fmt= -C++ Joined RejectNegative UInteger Var(flag_pph_fmt) --fpph-fmt=N Output format is (0) normal (1) pretty summary (2) dump - fpph-hdr= C++ ObjC++ Joined MissingArgError(missing filename after %qs) -fpph-hdr=base-name A mapping from base-name.h to base-name.pph diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 6ef6e6e..aedcab2 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -1012,11 +1012,11 @@ struct GTY(()) language_function { tree
[google] Disable guality tests (issue4328047)
The guality tests are failing/passing intermittently. This confuses our builders which expect 0 failures and 0 xpasses. When they work, it's not that they are working but dejagnu failed to launch gdb: - spawn [open ...] gdb: took too long to attach testcase [...]/gcc/testsuite/gcc.dg/guality/guality.exp completed in 17 seconds - And when they do execute, they show a bunch of XPASS results: - XPASS: gcc.dg/guality/example.c -O0 execution test XPASS: gcc.dg/guality/example.c -O1 execution test XPASS: gcc.dg/guality/example.c -O2 execution test XPASS: gcc.dg/guality/example.c -Os execution test XPASS: gcc.dg/guality/example.c -O2 -flto -flto-partition=none execution test XPASS: gcc.dg/guality/example.c -O2 -flto execution test XPASS: gcc.dg/guality/guality.c -O0 execution test XPASS: gcc.dg/guality/guality.c -O1 execution test XPASS: gcc.dg/guality/guality.c -O2 execution test XPASS: gcc.dg/guality/guality.c -O3 -fomit-frame-pointer execution test XPASS: gcc.dg/guality/guality.c -O3 -g execution test XPASS: gcc.dg/guality/guality.c -Os execution test XPASS: gcc.dg/guality/guality.c -O2 -flto -flto-partition=none execution test XPASS: gcc.dg/guality/guality.c -O2 -flto execution test XPASS: gcc.dg/guality/inline-params.c -O2 execution test XPASS: gcc.dg/guality/inline-params.c -O3 -fomit-frame-pointer execution test XPASS: gcc.dg/guality/inline-params.c -O3 -g execution test XPASS: gcc.dg/guality/inline-params.c -Os execution test XPASS: gcc.dg/guality/pr41353-1.c -O0 line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O1 line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O2 line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O3 -fomit-frame-pointer line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O3 -g line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -Os line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto -flto-partition=none line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41353-1.c -O2 -flto line 28 j == 28 + 37 XPASS: gcc.dg/guality/pr41447-1.c -O0 execution test XPASS: gcc.dg/guality/pr41447-1.c -O1 execution test XPASS: gcc.dg/guality/pr41447-1.c -O2 execution test XPASS: gcc.dg/guality/pr41447-1.c -O3 -fomit-frame-pointer execution test XPASS: gcc.dg/guality/pr41447-1.c -O3 -g execution test XPASS: gcc.dg/guality/pr41447-1.c -Os execution test XPASS: gcc.dg/guality/pr41616-1.c -O0 execution test XPASS: gcc.dg/guality/pr41616-1.c -O1 execution test XPASS: gcc.dg/guality/pr41616-1.c -O2 execution test XPASS: gcc.dg/guality/pr41616-1.c -O3 -fomit-frame-pointer execution test XPASS: gcc.dg/guality/pr41616-1.c -O3 -g execution test XPASS: gcc.dg/guality/pr41616-1.c -Os execution test XPASS: gcc.dg/guality/sra-1.c -O0 line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O1 line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O2 line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O3 -fomit-frame-pointer line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O3 -g line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -Os line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O2 -flto -flto-partition=none line 43 a.j == 14 XPASS: gcc.dg/guality/sra-1.c -O2 -flto line 43 a.j == 14 - Perhaps it's just a matter of removing the XFAILs, but I have seen spurious FAILs in the past and I'm not sure why they are passing if they were meant to fail. Alex, are the above tests supposed to fail or not? This is on x86_64 with dejagnu combinations unix/-m32 and unix/-m64. For now, I am disabling these tests in the google branches. Tested on x86_64. Committed to google/integration. Will merge into the other google branches as well. 2011-03-30 Diego Novillo dnovi...@google.com * g++.dg/guality/guality.exp: Disable. * gcc.dg/guality/guality.exp: Disable. * gfortran.dg/guality/guality.exp: Disable. diff --git a/gcc/testsuite/g++.dg/guality/guality.exp b/gcc/testsuite/g++.dg/guality/guality.exp index 9a17850..a07a628 100644 --- a/gcc/testsuite/g++.dg/guality/guality.exp +++ b/gcc/testsuite/g++.dg/guality/guality.exp @@ -1,5 +1,8 @@ # This harness is for tests that should be run at all optimisation levels. +# Disable everywhere. These tests are very flaky. +return + load_lib g++-dg.exp load_lib gcc-gdb-test.exp diff --git a/gcc/testsuite/gcc.dg/guality/guality.exp b/gcc/testsuite/gcc.dg/guality/guality.exp index 49e2ac5..39bf4e3 100644 --- a/gcc/testsuite/gcc.dg/guality/guality.exp +++ b/gcc/testsuite/gcc.dg/guality/guality.exp @@ -1,5 +1,8 @@ # This harness is for tests that should be run at all optimisation levels. +# Disable everywhere. These tests are very flaky. +return +
Re: [libgo] Improve Solaris 2/SPARC support
Rainer Orth r...@cebitec.uni-bielefeld.de writes: * go-test.exp wasn't updated for the change from sparcv9 to sparc64. While I still don't agree with the new name, at least the two should be consistent. I committed this earlier today. * env.go needs to accept sparc and sparc64. I fixed this in a different way. * Just like 32-bit Solaris 2/x86, 32-bit Solaris 2/SPARC needs to use the largefile variants of several functions. I've not introduced a new LIBGO_IS_SOLARIS32 conditional for that, but perhaps one should? I just committed this one, as appended. Thanks for sending them. Ian diff -r 50a941f17e57 libgo/Makefile.am --- a/libgo/Makefile.am Wed Mar 30 10:36:43 2011 -0700 +++ b/libgo/Makefile.am Wed Mar 30 14:42:46 2011 -0700 @@ -689,8 +689,12 @@ if LIBGO_IS_386 go_os_dir_file = go/os/dir_largefile.go else +if LIBGO_IS_SPARC +go_os_dir_file = go/os/dir_largefile.go +else go_os_dir_file = go/os/dir_regfile.go endif +endif else if LIBGO_IS_LINUX go_os_dir_file = go/os/dir_largefile.go @@ -1219,16 +1223,21 @@ syscall_stat_file = syscalls/sysfile_stat_largefile.go else # !LIBGO_IS_LINUX if LIBGO_IS_SOLARIS -# FIXME: Same for sparc vs. sparc64. Introduce new/additional conditional? if LIBGO_IS_386 -# Use lseek64 on 386 Solaris. +# Use lseek64 on 32-bit Solaris/x86. syscall_filesize_file = syscalls/sysfile_largefile.go syscall_stat_file = syscalls/sysfile_stat_largefile.go -else # !LIBGO_IS_LINUX LIBGO_IS_SOLARIS !LIBGO_IS_386 -# Use lseek on amd64 Solaris. +else # !LIBGO_IS_386 +if LIBGO_IS_SPARC +# Use lseek64 on 32-bit Solaris/SPARC. +syscall_filesize_file = syscalls/sysfile_largefile.go +syscall_stat_file = syscalls/sysfile_stat_largefile.go +else # !LIBGO_IS_386 !LIBGO_IS_SPARC +# Use lseek on 64-bit Solaris. syscall_filesize_file = syscalls/sysfile_regfile.go syscall_stat_file = syscalls/sysfile_stat_regfile.go -endif # !LIBGO_IS_386 +endif # !LIBGO_IS_386 !LIBGO_IS_SPARC +endif # !LIBGO_IS_SOLARIS else # !LIBGO_IS_LINUX !LIBGO_IS_SOLARIS # Use lseek by default. syscall_filesize_file = syscalls/sysfile_regfile.go
Re: [libgo] Provide strerror_r replacement (PR go/47515)
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Apart from the lack of wait4, libgo on IRIX 6.5 contained an undefined reference to strerror_r. This patch provides a replacement, based on gnulib/lib/strerror_r.c, but massively simplified. I addressed this in a different way, with a new variant of syscall.Errstr written in Go. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r ad3d14acbce1 libgo/Makefile.am --- a/libgo/Makefile.am Wed Mar 30 14:45:29 2011 -0700 +++ b/libgo/Makefile.am Wed Mar 30 15:31:01 2011 -0700 @@ -1264,7 +1264,11 @@ if LIBGO_IS_RTEMS syscall_errstr_file = syscalls/errstr_rtems.go else +if HAVE_STRERROR_R syscall_errstr_file = syscalls/errstr.go +else +syscall_errstr_file = syscalls/errstr_nor.go +endif endif # Declare libc_strerror_r which is the Go name for strerror_r. @@ -1273,7 +1277,7 @@ syscall_errstr_decl_file = syscalls/errstr_decl_rtems.go else if LIBGO_IS_LINUX -# In Linux the POSIX strerror_r is called __xpg_strerror_r. +# On GNU/Linux the POSIX strerror_r is called __xpg_strerror_r. syscall_errstr_decl_file = syscalls/errstr_decl_linux.go else # On other systems we hope strerror_r is just strerror_r. diff -r ad3d14acbce1 libgo/configure.ac --- a/libgo/configure.ac Wed Mar 30 14:45:29 2011 -0700 +++ b/libgo/configure.ac Wed Mar 30 15:31:01 2011 -0700 @@ -380,7 +380,9 @@ AC_CHECK_HEADERS(sys/mman.h syscall.h sys/epoll.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h) AM_CONDITIONAL(HAVE_SYS_MMAN_H, test $ac_cv_header_sys_mman_h = yes) -AC_CHECK_FUNCS(srandom random strsignal) + +AC_CHECK_FUNCS(srandom random strerror_r strsignal) +AM_CONDITIONAL(HAVE_STRERROR_R, test $ac_cv_func_strerror_r = yes) AC_CACHE_CHECK([for __sync_bool_compare_and_swap_4], [libgo_cv_func___sync_bool_compare_and_swap_4], diff -r ad3d14acbce1 libgo/syscalls/errstr_nor.go --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/libgo/syscalls/errstr_nor.go Wed Mar 30 15:31:01 2011 -0700 @@ -0,0 +1,32 @@ +// errstr.go -- Error strings when there is no strerror_r. + +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package syscall + +import ( + sync + unsafe +) + +func libc_strerror(int) *byte __asm__ (strerror) + +var errstr_lock sync.Mutex + +func Errstr(errno int) string { + errstr_lock.Lock() + + bp := libc_strerror(errno) + b := (*[1000]byte)(unsafe.Pointer(bp)) + i := 0 + for b[i] != 0 { + i++ + } + s := string(b[:i]) + + errstr_lock.Unlock() + + return s +}
[patch][cprop.c] Clean up hash table building
Hi, This is the first cleanup for cprop.c. These cleanups are only possible now, thanks to splitting the CPROP code out from gcse.c The first change is to not treat unfolded conditions as constant in gcse_constant_p(). This never happens because: - during hash table building any such condition should have been simplified by any prior pass - during cprop the expression table is not updated because the patterns are unshared This causes no changes in code generation for all cc1-i files on x86_64-unknown-linux-gnu. I could add an assert to make sure this kind of condition really never happens, but IMHO it's not worth it and the bug would be elsewhere anyway. The second change is to remove oprs_available_p(). After the gcse_constant_p() cleanup, we do not have to traverse the whole pattern of a SET candidate for CPROP, because we know that SET_DEST is a REG and SET_SRC is a REG or a shareable constant. Therefore a simple check to see if the registers involved were set in the block is sufficient, and a regset can be used instead of the last_set table. See reg_available_p(). The third change is to compute_hash_table_work(), which now traverses the insns in each basic block just once (instead of twice), in reverse order to record all registers set between BB_END and the current insn. This change allows further simplify hash_scan_set() which now doesn't have to look at INSN+1 anymore (saving another half insns stream traversal by avoiding next_nonnote_nondebug_insn()). Bootstrapped and tested on x86_64-unknown-linux-gnu. OK? Ciao! Steven * cprop.c: Clean up hash table building. (reg_avail_info): Remove. (oprs_available_p): Remove. (record_last_reg_set_info): Remove. (record_last_set_info): Remove. (reg_available_p): New function. (gcse_constant_p): Do not treat unfolded conditions as constants. (make_set_regs_unavailable): New function. (hash_scan_set): Simplify with new reg_available_p. (compute_hash_table_work): Traverse insns stream only once. Do not compute reg_avail_info. Traverse insns in reverse order. Record implicit sets after recording explicit sets from the block. Index: cprop.c === *** cprop.c (revision 171627) --- cprop.c (working copy) *** static rtx *implicit_sets; *** 118,124 /* Bitmap containing one bit for each register in the program. Used when performing GCSE to track which registers have been set since !the start of the basic block. */ static regset reg_set_bitmap; /* Various variables for statistics gathering. */ --- 118,124 /* Bitmap containing one bit for each register in the program. Used when performing GCSE to track which registers have been set since !the start or end of the basic block while traversing that block. */ static regset reg_set_bitmap; /* Various variables for statistics gathering. */ *** free_gcse_mem (void) *** 183,261 FREE_REG_SET (reg_set_bitmap); } ! struct reg_avail_info ! { ! basic_block last_bb; ! int last_set; ! }; ! ! static struct reg_avail_info *reg_avail_info; ! static basic_block current_bb; ! ! /* Return nonzero if the operands of expression X are unchanged from !INSN to the end of INSN's basic block. */ static int ! oprs_available_p (const_rtx x, const_rtx insn) { ! int i, j; ! enum rtx_code code; ! const char *fmt; ! ! if (x == 0) ! return 1; ! ! code = GET_CODE (x); ! switch (code) ! { ! case REG: ! { ! struct reg_avail_info *info = reg_avail_info[REGNO (x)]; ! ! if (info-last_bb != current_bb) ! return 1; ! return info-last_set DF_INSN_LUID (insn); ! } ! ! case PRE_DEC: ! case PRE_INC: ! case POST_DEC: ! case POST_INC: ! case PRE_MODIFY: ! case POST_MODIFY: ! return 0; ! ! case PC: ! case CC0: /*FIXME*/ ! case CONST: ! case CONST_INT: ! case CONST_DOUBLE: ! case CONST_FIXED: ! case CONST_VECTOR: ! case SYMBOL_REF: ! case LABEL_REF: ! case ADDR_VEC: ! case ADDR_DIFF_VEC: ! return 1; ! ! default: ! break; ! } ! ! for (i = GET_RTX_LENGTH (code) - 1, fmt = GET_RTX_FORMAT (code); i = 0; i--) ! { ! if (fmt[i] == 'e') ! { ! if (! oprs_available_p (XEXP (x, i), insn)) ! return 0; ! } ! else if (fmt[i] == 'E') ! for (j = 0; j XVECLEN (x, i); j++) ! if (! oprs_available_p (XVECEXP (x, i, j), insn)) ! return 0; ! } ! ! return 1; } /* Hash a set of register REGNO. --- 183,195 FREE_REG_SET (reg_set_bitmap); } ! /* Return nonzero if register X is unchanged from INSN to the end !of INSN's basic block. */ static int ! reg_available_p (const_rtx x, const_rtx insn ATTRIBUTE_UNUSED) { ! return ! REGNO_REG_SET_P
Re: [libgo] Replace wait4 by waitpid (PR go/47515)
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Currently, libgo uses wait4 unconditionally, which is missing on IRIX 6.5. Fortunately, the rusage * arg is used nowhere, so I've decide to replace wait4 by waitpid, which seems to be considerably more portable. Thanks for the patch, but I wasn't entirely happy with the approach because it removes the os.Waitmsg.Rusage field. That field is inherently system dependent but I would rather not get rid of it. I wrote this patch instead, which I hope will also solve the problem. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian diff -r a602eae61cf4 libgo/Makefile.am --- a/libgo/Makefile.am Wed Mar 30 15:33:32 2011 -0700 +++ b/libgo/Makefile.am Wed Mar 30 16:01:55 2011 -0700 @@ -1246,13 +1246,20 @@ endif # !LIBGO_IS_LINUX -# Define ForkExec, PtraceForkExec, Exec, and Wait4. +# Define ForkExec, PtraceForkExec, and Exec. if LIBGO_IS_RTEMS syscall_exec_os_file = syscalls/exec_stubs.go else syscall_exec_os_file = syscalls/exec.go endif +# Define Wait4. +if HAVE_WAIT4 +syscall_wait_file = syscalls/wait4.go +else +syscall_wait_file = syscalls/waitpid.go +endif + # Define Sleep. if LIBGO_IS_RTEMS syscall_sleep_file = syscalls/sleep_rtems.go @@ -1329,6 +1336,7 @@ $(syscall_errstr_decl_file) \ syscalls/exec_helpers.go \ $(syscall_exec_os_file) \ + $(syscall_wait_file) \ $(syscall_filesize_file) \ $(syscall_stat_file) \ $(syscall_sleep_file) \ diff -r a602eae61cf4 libgo/configure.ac --- a/libgo/configure.ac Wed Mar 30 15:33:32 2011 -0700 +++ b/libgo/configure.ac Wed Mar 30 16:01:55 2011 -0700 @@ -381,8 +381,9 @@ AC_CHECK_HEADERS(sys/mman.h syscall.h sys/epoll.h sys/ptrace.h sys/syscall.h sys/user.h sys/utsname.h) AM_CONDITIONAL(HAVE_SYS_MMAN_H, test $ac_cv_header_sys_mman_h = yes) -AC_CHECK_FUNCS(srandom random strerror_r strsignal) +AC_CHECK_FUNCS(srandom random strerror_r strsignal wait4) AM_CONDITIONAL(HAVE_STRERROR_R, test $ac_cv_func_strerror_r = yes) +AM_CONDITIONAL(HAVE_WAIT4, test $ac_cv_func_wait4 = yes) AC_CACHE_CHECK([for __sync_bool_compare_and_swap_4], [libgo_cv_func___sync_bool_compare_and_swap_4], diff -r a602eae61cf4 libgo/mksysinfo.sh --- a/libgo/mksysinfo.sh Wed Mar 30 15:33:32 2011 -0700 +++ b/libgo/mksysinfo.sh Wed Mar 30 16:01:55 2011 -0700 @@ -377,6 +377,8 @@ nrusage=$nrusage $field; done echo type Rusage struct {$nrusage } ${OUT} +else + echo type Rusage struct {} ${OUT} fi # The utsname struct. diff -r a602eae61cf4 libgo/syscalls/exec.go --- a/libgo/syscalls/exec.go Wed Mar 30 15:33:32 2011 -0700 +++ b/libgo/syscalls/exec.go Wed Mar 30 16:01:55 2011 -0700 @@ -17,7 +17,6 @@ func libc_dup2(int, int) int __asm__ (dup2) func libc_execve(*byte, **byte, **byte) int __asm__ (execve) func libc_sysexit(int) __asm__ (_exit) -func libc_wait4(Pid_t, *int, int, *Rusage) Pid_t __asm__ (wait4) // Fork, dup fd onto 0..len(fd), and exec(argv0, argvv, envv) in child. // If a dup or exec fails, write the errno int to pipe. @@ -263,16 +262,3 @@ libc_execve(StringBytePtr(argv0), argv_arg[0], envv_arg[0]) return GetErrno() } - -func Wait4(pid int, wstatus *WaitStatus, options int, rusage *Rusage) (wpid int, errno int) { - var status int - r := libc_wait4(Pid_t(pid), status, options, rusage) - wpid = int(r) - if r 0 { - errno = GetErrno() - } - if wstatus != nil { - *wstatus = WaitStatus(status) - } - return -} diff -r a602eae61cf4 libgo/syscalls/wait4.go --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/libgo/syscalls/wait4.go Wed Mar 30 16:01:55 2011 -0700 @@ -0,0 +1,22 @@ +// wait4.go -- Wait4 for systems with wait4. + +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package syscall + +func libc_wait4(Pid_t, *int, int, *Rusage) Pid_t __asm__ (wait4) + +func Wait4(pid int, wstatus *WaitStatus, options int, rusage *Rusage) (wpid int, errno int) { + var status int + r := libc_wait4(Pid_t(pid), status, options, rusage) + wpid = int(r) + if r 0 { + errno = GetErrno() + } + if wstatus != nil { + *wstatus = WaitStatus(status) + } + return +} diff -r a602eae61cf4 libgo/syscalls/waitpid.go --- /dev/null Thu Jan 01 00:00:00 1970 + +++ b/libgo/syscalls/waitpid.go Wed Mar 30 16:01:55 2011 -0700 @@ -0,0 +1,22 @@ +// waitpid.go -- Wait4 for systems without wait4, but with waitpid. + +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +package syscall + +func libc_waitpid(Pid_t, *int, int) Pid_t __asm__ (waitpid) + +func Wait4(pid int, wstatus *WaitStatus, options int, rusage *Rusage) (wpid int, errno int) { + var status int + r := libc_waitpid(Pid_t(pid), status, options) + wpid = int(r) + if r 0 { + errno = GetErrno() + } + if wstatus != nil { + *wstatus = WaitStatus(status) + } + return +}
[x86] Fix PR target/48142
Hi, this is a regression present for x86-64 on mainline and 4.6 branch with the options -Os -mpreferred-stack-boundary=5 -fstack-check -fno-omit-frame-pointer. This improbable combination of options is necessary because you need to have stack checking + stack realignment + !ACCUMULATE_OUTGOING_ARGS. In this case, the DW_CFA_GNU_args_size CFIs must be correct in spite of the frame pointer. Tested on {i586,x86_64}-suse-linux, OK for mainline and 4.6 branch? 2011-03-30 Eric Botcazou ebotca...@adacore.com PR target/48142 * config/i386/i386.c (ix86_adjust_stack_and_probe): Differentiate frame-related from frame-unrelated adjustments to the stack pointer. 2011-03-30 Eric Botcazou ebotca...@adacore.com * g++.dg/other/pr48142.C: New test. -- Eric Botcazou // PR target/48142 // Testcase by Zdenek Sojka zso...@seznam.cz // { dg-do run { target i?86-*-* x86_64-*-* } } // { dg-options -Os -mpreferred-stack-boundary=5 -fstack-check -fno-omit-frame-pointer } int main() { try { throw 0; } catch (...) {} return 0; } Index: config/i386/i386.c === --- config/i386/i386.c (revision 171716) +++ config/i386/i386.c (working copy) @@ -10006,7 +10006,7 @@ ix86_adjust_stack_and_probe (const HOST_ probe that many bytes past the specified size to maintain a protection area at the botton of the stack. */ const int dope = 4 * UNITS_PER_WORD; - rtx size_rtx = GEN_INT (size); + rtx size_rtx = GEN_INT (size), last; /* See if we have a constant small number of probes to generate. If so, that's the easy case. The run-time loop is made up of 11 insns in the @@ -10046,9 +10046,9 @@ ix86_adjust_stack_and_probe (const HOST_ emit_stack_probe (stack_pointer_rtx); /* Adjust back to account for the additional first interval. */ - emit_insn (gen_rtx_SET (VOIDmode, stack_pointer_rtx, - plus_constant (stack_pointer_rtx, - PROBE_INTERVAL + dope))); + last = emit_insn (gen_rtx_SET (VOIDmode, stack_pointer_rtx, + plus_constant (stack_pointer_rtx, + PROBE_INTERVAL + dope))); } /* Otherwise, do the same as above, but in a loop. Note that we must be @@ -10109,15 +10109,33 @@ ix86_adjust_stack_and_probe (const HOST_ } /* Adjust back to account for the additional first interval. */ - emit_insn (gen_rtx_SET (VOIDmode, stack_pointer_rtx, - plus_constant (stack_pointer_rtx, - PROBE_INTERVAL + dope))); + last = emit_insn (gen_rtx_SET (VOIDmode, stack_pointer_rtx, + plus_constant (stack_pointer_rtx, + PROBE_INTERVAL + dope))); release_scratch_register_on_entry (sr); } gcc_assert (cfun-machine-fs.cfa_reg != stack_pointer_rtx); - cfun-machine-fs.sp_offset += size; + + /* Even if the stack pointer isn't the CFA register, we need to correctly + describe the adjustments made to it, in particular differentiate the + frame-related ones from the frame-unrelated ones. */ + if (size 0) +{ + rtx expr = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2)); + XVECEXP (expr, 0, 0) + = gen_rtx_SET (VOIDmode, stack_pointer_rtx, + plus_constant (stack_pointer_rtx, -size)); + XVECEXP (expr, 0, 1) + = gen_rtx_SET (VOIDmode, stack_pointer_rtx, + plus_constant (stack_pointer_rtx, + PROBE_INTERVAL + dope + size)); + add_reg_note (last, REG_FRAME_RELATED_EXPR, expr); + RTX_FRAME_RELATED_P (last) = 1; + + cfun-machine-fs.sp_offset += size; +} /* Make sure nothing is scheduled before we are done. */ emit_insn (gen_blockage ());
PR bootstrap/48371
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Sigh. I changed a reg_equiv_address to reg_equiv_mem when I changed from explicit VEC_index references to macros. That caused all kinds of interesting problems for ia32 :-) I'm also fixing a comment typo that was introduced around the same time. Installed as obvious. Fixes 48371. x86_64-unknown-linux-gnu bootstrapped and regression tested. i686 bootstrap in progress (and probably won't finish until tomorrow...) -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNk8UuAAoJEBRtltQi2kC77goIAISRCD1qP9OA8KS1WPYtocFo ZdxfOX0RA1tHZclEVPQbqkWaGxXW7jJbf1Wz8rFsMgQpeXv328NNDgfgmLTMhw6o jxhuOjvNX7rSDYeKGjPBuU+K37co+oVaHi9x8X0ZhOkYUHG9u+hKLlLXUV9h5AGL 0RNY71nfZD4y7y4F6xAqFPGlnh8HCzXTaUbzTXXyEcTo3WH1rWO2HcZhL/5MJ2SR aaoF0lbIC4GAm2kUXUYeSV1JYlzc7ZUZrnjZwKwFiK4UVcF90nJr6Ji7bGEbuRAN 3D0U7Lmda8tWZ0gzkhWlcwx4Ys0gBBoHJowDFXY9hvkff7FABa+raUsEkyO2rTc= =dWZc -END PGP SIGNATURE- * reload1.c (reload): Fix botch in last change. * reload.h (struct reload): Fix typo introduced in last change. Index: reload1.c === *** reload1.c (revision 171731) --- reload1.c (working copy) *** reload (rtx first, int global) *** 895,901 (REGNO (XEXP (XEXP (x, 0), 0)) FIRST_PSEUDO_REGISTER) CONSTANT_P (XEXP (XEXP (x, 0), 1 ! reg_equiv_mem (i) = XEXP (x, 0), reg_equiv_mem (i) = 0; else { /* Make a new stack slot. Then indicate that something --- 895,901 (REGNO (XEXP (XEXP (x, 0), 0)) FIRST_PSEUDO_REGISTER) CONSTANT_P (XEXP (XEXP (x, 0), 1 ! reg_equiv_address (i) = XEXP (x, 0), reg_equiv_mem (i) = 0; else { /* Make a new stack slot. Then indicate that something Index: reload.h === *** reload.h(revision 171731) --- reload.h(working copy) *** struct reload *** 100,106 int inc; /* A reg for which reload_in is the equivalent. If reload_in is a symbol_ref which came from ! reg_equiv_consant, then this is the pseudo which has that symbol_ref as equivalent. */ rtx in_reg; rtx out_reg; --- 100,106 int inc; /* A reg for which reload_in is the equivalent. If reload_in is a symbol_ref which came from ! reg_equiv_constant, then this is the pseudo which has that symbol_ref as equivalent. */ rtx in_reg; rtx out_reg;
Re: Move reg_equiv* arrays into a single VEC structure
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 03/30/11 10:51, Richard Sandiford wrote: Nice cleanup thanks. Just noticed a couple of things: Jeff Law l...@redhat.com writes: *** struct reload *** 100,106 int inc; /* A reg for which reload_in is the equivalent. If reload_in is a symbol_ref which came from ! reg_equiv_constant, then this is the pseudo which has that symbol_ref as equivalent. */ rtx in_reg; rtx out_reg; --- 100,106 int inc; /* A reg for which reload_in is the equivalent. If reload_in is a symbol_ref which came from ! reg_equiv_consant, then this is the pseudo which has that symbol_ref as equivalent. */ rtx in_reg; rtx out_reg; Adds typo. Yea. I had changed the comment when I was using VEC_blah directly, then introduced the typo when I went to using accessor macros and wanted to change the comment back to its original form :( I just checked in a fix for the typo. *** elimination_effects (rtx x, enum machine *** 3002,3011 } } ! else if (reg_renumber[regno] 0 reg_equiv_constant !reg_equiv_constant[regno] !! function_invariant_p (reg_equiv_constant[regno])) !elimination_effects (reg_equiv_constant[regno], mem_mode); return; case PRE_INC: --- 2996,3006 } } ! else if (reg_renumber[regno] 0 !reg_equiv_constant (0) !reg_equiv_constant (regno) !! function_invariant_p (reg_equiv_constant (regno))) !elimination_effects (reg_equiv_constant (regno), mem_mode); return; case PRE_INC: Looks like this should be s/reg_equiv_constant (0)/reg_equivs != 0/. I thought I'd fixed these. I certainly remember looking at them and thinking that can't be right at some point. I'll look at it again more closely tomorrow and take appropriate corrective action. Thanks for the feedback, Jeff -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJNk8ZSAAoJEBRtltQi2kC7+qYH/R0i1/YC3efnLuQjZ0uieuCU 0b/sMDvP+0xngPbKKn9YviBuNhUv2/poPG1OOOQtolHeZ5c8rftecZKMRgjtmX9W jYxhuY2OLBBTLLPo4eFDdBbEP/m90RGEBtumeIx1isPTOjQ3LVuBF+9GB+wbnr/W OLox1MSfPT7GV3pbBeSMiHiKkw5VOeFKd4vBIbefWAPgjO0G8LFexBYWkw04j1F5 tXbuLn/cnSb4PLoRgrCxDv4XS8Wzx1YJcFtcnjn+a2t1HPUARkSYCSn8o94Yytub k263YKwgRsqwxnrXmGght4+K2kuVjitANa6iX24DWTa+eJBruQ9ObA/f3Xs1P5E= =SgX4 -END PGP SIGNATURE-
Re: Continue toplevel cleanup (GCC library handling for unsupported targets etc.)
On Tue, 29 Mar 2011, Joseph S. Myers wrote: Other cleanups here: cris*-*-none acts just like cris*-*-elf in config.gcc so it's appropriate to make the * subcase of cris*-*-* act like the -elf case; 'k. mmix-*-* disabled libgloss, i.e. libgloss for the host, which is never built anyway. Whoops, my bad. Odd that it worked anyway, but I'm not certain I added that due to actual breakage. All bits fine by me. brgds, H-P
Random cleanups [2/4]: canonicalize ctor values
Hi, this came up when looking into why the static ctors contain useless trees (like casts). We can simply canonicalize them while varpool analyzes pending decls. It'll look at initialzers once, where we can gimplify them. This requires making canonicalize_constructor_val be able to be called outside of functions. And it requires the java frontend not leaving a dangling function decl as current (for the static ctor function it generates). Regstrapped on x86_64-linux with the other three cleanups. Okay for trunk? Ciao, Michael. -- * cgraphbuild.c (record_reference): Canonicalize constructor values. * gimple-fold.c (canonicalize_constructor_val): Accept being called without function context. java/ * jcf-parse.c (java_emit_static_constructor): Clear cfun and current_function_decl. Index: cgraphbuild.c === --- cgraphbuild.c (revision 171537) +++ cgraphbuild.c (working copy) @@ -53,6 +53,8 @@ record_reference (tree *tp, int *walk_su tree decl; struct record_reference_ctx *ctx = (struct record_reference_ctx *)data; +restart: + switch (TREE_CODE (t)) { case VAR_DECL: @@ -98,6 +100,15 @@ record_reference (tree *tp, int *walk_su break; } + t = canonicalize_constructor_val (t); + if (t t != *tp) + { + *tp = t; + goto restart; + } + else + t = *tp; + if ((unsigned int) TREE_CODE (t) = LAST_AND_UNUSED_TREE_CODE) return lang_hooks.callgraph.analyze_expr (tp, walk_subtrees); break; Index: gimple-fold.c === --- gimple-fold.c (revision 171537) +++ gimple-fold.c (working copy) @@ -106,7 +106,7 @@ can_refer_decl_in_current_unit_p (tree d return true; } -/* CVAL is value taken from DECL_INITIAL of variable. Try to transorm it into +/* CVAL is value taken from DECL_INITIAL of variable. Try to transform it into acceptable form for is_gimple_min_invariant. */ tree @@ -131,7 +131,7 @@ canonicalize_constructor_val (tree cval) || TREE_CODE (base) == FUNCTION_DECL) !can_refer_decl_in_current_unit_p (base)) return NULL_TREE; - if (base TREE_CODE (base) == VAR_DECL) + if (cfun base TREE_CODE (base) == VAR_DECL) add_referenced_var (base); /* We never have the chance to fixup types in global initializers during gimplification. Do so here. */ Index: java/jcf-parse.c === --- java/jcf-parse.c.orig 2011-03-26 02:19:03.0 +0100 +++ java/jcf-parse.c2011-03-28 06:16:43.0 +0200 @@ -1725,6 +1725,8 @@ java_emit_static_constructor (void) DECL_STATIC_CONSTRUCTOR (decl) = 1; java_genericize (decl); cgraph_finalize_function (decl, false); + current_function_decl = NULL; + set_cfun (NULL); } }
Re: Random cleanups [1/4]
On Wed, Mar 30, 2011 at 21:16, Michael Matz m...@suse.de wrote: * tree.c (decl_init_priority_insert): Don't create entry for default priority. (decl_fini_priority_insert): Ditto. (fields_compatible_p, find_compatible_field): Remove. * tree.h (fields_compatible_p, find_compatible_field): Remove. * gimple.c (gimple_compare_field_offset): Adjust block comment. OK. Diego.
Random cleanups [3/4]: zero out DECL_VINDEX field
Hi, I noticed this while working on early-merging LTO. The DECL_VINDEX slot of FUNCTION_DECLs is supposed to hold the numeric index of the vtable slot if it's a virtual function. During parsing the C++ frontend uses it to hold a reference to itself, which then later is supposed to be transformed into the real index. Sometimes the C++ frontend copies function_decls, leaving the old one reachable via various means (but not as a reachable function, merely in scopes and the like), but fixes up only the copy. Instead of diddling with the frontend we can just as well zero out meaningless DECL_VINDEX slots (i.e. when they're not an integer). It's used for debug info generation (when it's an integer) and as a flag ala this function is a virtual method by the middle-end. Regstrapped on x86_64-linux with the other three cleanups. Okay for trunk? Ciao, Michael. -- * tree.c (free_lang_data_in_decl): Zero DECL_VINDEX if it's not an integer. * tree.h (tree_decl_non_common.vindex): Adjust comment. Index: tree.c === --- tree.c (revision 171537) +++ tree.c (working copy) @@ -4647,6 +4647,13 @@ free_lang_data_in_decl (tree decl) RECORD_OR_UNION_TYPE_P (DECL_CONTEXT (DECL_ABSTRACT_ORIGIN (decl DECL_ABSTRACT_ORIGIN (decl) = NULL_TREE; + + /* Sometimes the C++ frontend doesn't manage to transform a temporary + DECL_VINDEX referring to itself into a vtable slot number as it +should. Happens with functions that are copied and then forgotten +about. Just clear it, it won't matter anymore. */ + if (DECL_VINDEX (decl) !host_integerp (DECL_VINDEX (decl), 0)) + DECL_VINDEX (decl) = NULL_TREE; } else if (TREE_CODE (decl) == VAR_DECL) { Index: tree.h === --- tree.h (revision 171537) +++ tree.h (working copy) @@ -3228,7 +3233,7 @@ struct GTY(()) tree arguments; /* Almost all FE's use this. */ tree result; - /* C++ uses this in namespaces. */ + /* C++ uses this in namespaces and function_decls. */ tree vindex; };
Re: Random cleanups [3/4]: zero out DECL_VINDEX field
On Wed, Mar 30, 2011 at 21:32, Michael Matz m...@suse.de wrote: * tree.c (free_lang_data_in_decl): Zero DECL_VINDEX if it's not an integer. * tree.h (tree_decl_non_common.vindex): Adjust comment. OK. Diego.
[pph] Split streamer into pph-sreamer-out.c and pph-streamer-in.c (issue4333047)
No functional changes. Just some refactoring to stop mixing up reading and writing routines in the same file. Tested on x86_64. cp/ChangeLog.pph 2011-03-30 Diego Novillo dnovi...@google.com * Make-lang.in (CXX_AND_OBJCXX_OBJS): Add cp/pph-streamer-out.o and cp/pph-streamer-in.o. (cp/pph-streamer-out.o): New. (cp/pph-streamer-in.o): New. * pph-streamer.c (current_pph_file): Move to cp/pph-streamer-out.c. (pph_stream_init_write): Likewise. (pph_stream_write_ld_base): Likewise. (pph_stream_write_ld_min): Likewise. (pph_stream_write_tree_vec): Likewise. (pph_stream_write_cxx_binding_1): Likewise. (pph_stream_write_cxx_binding): Likewise. (pph_stream_write_class_binding): Likewise. (pph_stream_write_label_binding): Likewise. (pph_stream_write_binding_level): Likewise. (pph_stream_write_c_language_function): Likewise. (pph_stream_write_language_function): Likewise. (pph_stream_write_ld_fn): Likewise. (pph_stream_write_ld_ns): Likewise. (pph_stream_write_ld_parm): Likewise. (pph_stream_write_lang_specific_data): Likewise. (pph_stream_write_tree): Likewise. (pph_stream_pack_value_fields): Likewise. (pph_stream_begin_section): Likewise. (pph_stream_write): Likewise. (pph_stream_end_section): Likewise. (pph_stream_write_header): Likewise. (pph_stream_write_body): Likewise. (pp_stream_flush_buffers): New. Factor out of pph_stream_close. (pph_get_section_data): Move to cp/pph-streamer-in.c. (pph_free_section_data): Likewise. (pph_stream_init_read): Likewise. (pph_stream_read_tree): Likewise. (pph_stream_unpack_value_fields): Likewise. * pph-streamer.h (pph_stream_flush_buffers): Declare. (pph_stream_init_write): Declare. (pph_stream_write_tree): Declare. (pph_stream_pack_value_fields): Declare. (pph_stream_init_read): Declare. (pph_stream_read_tree): Declare. (pph_stream_unpack_value_fields): Declare. * pph-streamer-out.c: New file. * pph-streamer-in.c: New file. diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 70548c6..143df46 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -81,7 +81,8 @@ CXX_AND_OBJCXX_OBJS = cp/call.o cp/decl.o cp/expr.o cp/pt.o cp/typeck2.o \ cp/typeck.o cp/cvt.o cp/except.o cp/friend.o cp/init.o cp/method.o \ cp/search.o cp/semantics.o cp/tree.o cp/repo.o cp/dump.o cp/optimize.o \ cp/mangle.o cp/cp-objcp-common.o cp/name-lookup.o cp/cxx-pretty-print.o \ - cp/cp-gimplify.o cp/pph.o cp/pph-streamer.o tree-mudflap.o $(CXX_C_OBJS) + cp/cp-gimplify.o cp/pph.o cp/pph-streamer.o cp/pph-streamer-out.o \ + cp/pph-streamer-in.o tree-mudflap.o $(CXX_C_OBJS) # Language-specific object files for C++. CXX_OBJS = cp/cp-lang.o c-family/stub-objc.o $(CXX_AND_OBJCXX_OBJS) @@ -337,3 +338,11 @@ cp/pph.o: cp/pph.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(CPPLIB_H) \ cp/pph-streamer.o: cp/pph-streamer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) $(CXX_PPH_STREAMER_H) \ $(CXX_PPH_H) $(TREE_PASS_H) version.h cppbuiltin.h tree-iterator.h +cp/pph-streamer-out.o: cp/pph-streamer-out.c $(CONFIG_H) $(SYSTEM_H) \ + coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ + $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ + cppbuiltin.h tree-iterator.h +cp/pph-streamer-in.o: cp/pph-streamer-in.c $(CONFIG_H) $(SYSTEM_H) \ + coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ + $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ + cppbuiltin.h tree-iterator.h diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c new file mode 100644 index 000..aaa0a6c --- /dev/null +++ b/gcc/cp/pph-streamer-in.c @@ -0,0 +1,155 @@ +/* Routines for reading PPH data. + Copyright (C) 2011 Free Software Foundation, Inc. + Contributed by Diego Novillo dnovi...@google.com. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but + WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tree.h +#include langhooks.h +#include tree-iterator.h +#include tree-pretty-print.h +#include lto-streamer.h +#include pph-streamer.h
Random cleanups [4/4]: Streamlining streamer
Hi, I fear I wasn't as thorough in also splitting this one into several patches, but the different cleanups are at least mostly in different files. They are: * lto-lang remembers all builtin decls in a local list, to be returned by the getdecls langhook. But as we have our own write_globals langhook this isn't actually called (except by dbxout.c), so there's no point in remembering. * lto.c:lto_materialize_function has code to read in the function body sections, do something with them in non-wpa mode, and discard them then. There's no point in even reading them in in non-wpa mode (except for a dubious error message that rather is worth an assert). * gimple.c:gimple_type_leader_entry is a fixed length cache for speeding up our type merging machinery. It can hold references to many meanwhile merged trees, interferring with the wish of free up much memory with a ggc_collect with early-merging LTO. We can simply make it deletable. * ipa-inline.c: some tidying in not calling a macro with function call arguments, and calling a costly function only after early-outs. * lto-streamer-out.c : it writes out and compares strings character by character. memcmp and output_data_stream work just as well * lto-streamer: output_unreferenced_globals writes out all global varpool decls. The reading side simply reads over all of them, and ignores them. This was supposed to help symbol resolution, and it probably once did. But since some time we properly emit varpool and cgraph nodes, and references between them, and a proper symtab. There's no need for emitting these trees again. * lto-streamer: the following changes the bytecode: 1: all indices into the cache are unsigned, hence we should say so, instead of casting casts back and forth 2: trees are only appended to the cache, when writing out. When reading in we read in all trees in the stream one after the other, also appending to the cache. References to existing trees _always_ are to - well - existing trees, hence to those already emitted earlier in the stream, i.e. with a smaller offset, and more importantly with a known index even at reader side. So, the offset never is used, so remove that and all associated tracking and params. 3: for the same reason we also don't need to stream the index that new trees get in the cache. They will get exactly the ones they also had when writing out. We could use it as consistency check, but we stream the expected tree-node for back-references for that already. Obviously we do need to stream the index in back references (aka pickled references). (the index could change if there's a different set of nodes preloaded into the cache between writing out and reading in. But that would have much worse problems already, silently overwriting slots with trees from the stream; we should do away with the preloaded nodes, and instead rely on type merging to get canonical versions of the builtin trees) Not streaming offset and index for most trees obviously shortens the bytecode somewhat but I don't have statistics on how much. Not much would be my guess. Regstrapped on x86_64-linux with the other three cleanups. Okay for trunk? Ciao, Michael. -- * lto-streamer.h (struct lto_streamer_cache_d): Remove offsets and next_slot members. (lto_streamer_cache_insert, lto_streamer_cache_insert_at, lto_streamer_cache_lookup, lto_streamer_cache_get): Adjust prototypes. (lto_streamer_cache_append): Declare. * lto-streamer.c (lto_streamer_cache_add_to_node_array): Use unsigned index, remove offset parameter, ensure that we append or update existing entries. (lto_streamer_cache_insert_1): Use unsigned index, remove offset_p parameter, update next_slot for append. (lto_streamer_cache_insert): Use unsigned index, remove offset_p parameter. (lto_streamer_cache_insert_at): Likewise. (lto_streamer_cache_append): New function. (lto_streamer_cache_lookup): Use unsigned index. (lto_streamer_cache_get): Likewise. (lto_record_common_node): Don't test tree_node_can_be_shared. (preload_common_node): Adjust call to lto_streamer_cache_insert. (lto_streamer_cache_delete): Don't free offsets member. * lto-streamer-out.c (eq_string_slot_node): Use memcmp. (lto_output_string_with_length): Use lto_output_data_stream. (lto_output_tree_header): Remove ix parameter, don't write it. (lto_output_builtin_tree): Likewise. (lto_write_tree): Adjust callers to above, don't track and write offset, write unsigned index. (output_unreferenced_globals): Don't emit all global vars. (write_global_references): Use unsigned indices. (lto_output_decl_state_refs): Likewise.
Re: [PATCH, ARM] PR47855 Compute attr length for some thumb2 insns
Hi Ramana On Wed, Mar 30, 2011 at 6:35 AM, Ramana Radhakrishnan ramana.radhakrish...@linaro.org wrote: Hi Carrot, How about adding an alternative only enabled for T2 that uses the `l' constraint and inventing new constraints for some of the constant values that are valid for 16 bit instructions since the `I' and `L' constraints have different meanings depending on whether TARGET_32BIT is valid or not ? We could then set the value of the length attribute accordingly. I don't think we can change the meaning of the I and L constraints very easily given the amount of churn that might be needed You are right. Now the logic is much clearer by splitting the constraints. I don't think this and the other conditional branch instruction changes are correct. This could end up being the last instruction in an IT block and will automatically end up getting the 32 bit encoding and end up causing trouble with the length calculations. Remember the 16 bit encoding for the conditional instruction can't be used as the last instruction in an IT block. According to arm architecture reference manual, chapter A8.6.16, neither 16 bit nor 32 bit conditional branch can be used in IT block. Only unconditional branch can be used as the last instruction in IT block, and it isn't related to instruction length. So we don't need to care about branch instruction encoding in IT block. (i num_saves (hi_reg == 0)) - what's the point of going through the register list when hi_reg != 0 in this case ? A comment to explain that the length calculation *must* be kept in sync with the template above might be useful. done. The following patch has been tested on arm qemu. ChangeLog: 2011-03-31 Wei Guozhi car...@google.com PR target/47855 * config/arm/arm.md (arm_cmpsi_insn): Compute attr length. (arm_cond_branch): Likewise. (arm_cond_branch_reversed): Likewise. (arm_jump): Likewise. (push_multi): Likewise. * config/arm/constraints.md (Py): New constraint. Index: constraints.md === --- constraints.md (revision 171337) +++ constraints.md (working copy) @@ -31,7 +31,7 @@ ;; The following multi-letter normal constraints have been used: ;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz ;; in Thumb-1 state: Pa, Pb, Pc, Pd -;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px +;; in Thumb-2 state: Ps, Pt, Pu, Pv, Pw, Px, Py ;; The following memory constraints have been used: ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us @@ -189,6 +189,11 @@ (and (match_code const_int) (match_test TARGET_THUMB2 ival = -7 ival = -1))) +(define_constraint Py + @internal In Thumb-2 state a constant in the range 0 to 255 + (and (match_code const_int) + (match_test TARGET_THUMB2 ival = 0 ival = 255))) + (define_constraint G In ARM/Thumb-2 state a valid FPA immediate constant. (and (match_code const_double) Index: arm.md === --- arm.md (revision 171337) +++ arm.md (working copy) @@ -7109,13 +7109,21 @@ (define_insn *arm_cmpsi_insn [(set (reg:CC CC_REGNUM) - (compare:CC (match_operand:SI 0 s_register_operand r,r) - (match_operand:SI 1 arm_add_operandrI,L)))] + (compare:CC (match_operand:SI 0 s_register_operand l,r,r,r) + (match_operand:SI 1 arm_add_operandPy,r,I,L)))] TARGET_32BIT @ cmp%?\\t%0, %1 + cmp%?\\t%0, %1 + cmp%?\\t%0, %1 cmn%?\\t%0, #%n1 - [(set_attr conds set)] + [(set_attr conds set) + (set (attr length) + (if_then_else + (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) + (lt (symbol_ref which_alternative) (const_int 2))) + (const_int 2) + (const_int 4)))] ) (define_insn *cmpsi_shiftsi @@ -7286,7 +7294,14 @@ return \b%d1\\t%l0\; [(set_attr conds use) - (set_attr type branch)] + (set_attr type branch) + (set (attr length) + (if_then_else + (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) + (and (ge (minus (match_dup 0) (pc)) (const_int -250)) +(le (minus (match_dup 0) (pc)) (const_int 256 + (const_int 2) + (const_int 4)))] ) (define_insn *arm_cond_branch_reversed @@ -7305,7 +7320,14 @@ return \b%D1\\t%l0\; [(set_attr conds use) - (set_attr type branch)] + (set_attr type branch) + (set (attr length) + (if_then_else + (and (ne (symbol_ref TARGET_THUMB2) (const_int 0)) + (and (ge (minus (match_dup 0) (pc)) (const_int -250)) +(le (minus (match_dup 0) (pc)) (const_int 256 + (const_int 2) + (const_int 4)))] ) @@ -7757,7 +7779,14 @@ return \b%?\\t%l0\; } - [(set_attr predicable yes)] + [(set_attr predicable yes) + (set (attr length) + (if_then_else +
Re: [patch, ARM] Fix PR48250, adjust DImode reload address legitimizing
On 2011/3/30 05:28 PM, Richard Earnshaw wrote: On Wed, 2011-03-30 at 15:35 +0800, Chung-Lin Tang wrote: On 2011/3/30 上午 12:23, Richard Earnshaw wrote: On Tue, 2011-03-29 at 22:53 +0800, Chung-Lin Tang wrote: On 2011/3/29 下午 10:26, Richard Earnshaw wrote: On Tue, 2011-03-29 at 18:25 +0800, Chung-Lin Tang wrote: On 2011/3/24 06:51 PM, Richard Earnshaw wrote: On Thu, 2011-03-24 at 12:56 +0900, Chung-Lin Tang wrote: Hi, PR48250 happens under TARGET_NEON, where DImode is included within the valid NEON modes. This turns the range of legitimate constant indexes to step-4 (coproc load/store), thus arm_legitimize_reload_address() when trying to decompose the [reg+index] reload address into [(reg+index_high)+index_low], can cause an ICE later when 'index_low' part is not aligned to 4. I'm not sure why the current DImode index is computed as: low = ((val 0xf) ^ 0x8) - 0x8; the sign-extending into negative values, then subtracting back, actually creates further off indexes. e.g. in the supplied testcase, [sp+13] was turned into [(sp+16)-3]. Hysterical Raisins... the code there was clearly written for the days before we had LDRD in the architecture. At that time the most efficient way to load a 64-bit object was to use the LDM{ia,ib,da,db} instructions. The computation here was (I think), intended to try and make the most efficient use of an add/sub instruction followed by LDM/STM offsetting. At that time the architecture had no unaligned access either, so dealing with 64-bit that were less than 32-bit aligned (in those days 32-bit was the maximum alignment) probably wasn't considered, or couldn't even get through to reload. I see it now. The code in output_move_double() returning assembly for ldm/stm(db/da/ib) for offsets -8/-4/+4 seems to confirm this. I have changed the patch to let the new code handle the TARGET_LDRD case only. The pre-LDRD case is still handled by the original code, with an additional ~0x3 for aligning the offset to 4. I've also added a comment for the pre-TARGET_LDRD case. Please see if the description is accurate enough. My patch changes the index decomposing to a more straightforward way; it also sort of outlines the way the other reload address indexes are broken by using and-masks, is not the most effective. The address is computed by addition, subtracting away the parts to obtain low+high should be the optimal way of giving the largest computable index range. I have included a few Thumb-2 bits in the patch; I know currently arm_legitimize_reload_address() is only used under TARGET_ARM, but I guess it might eventually be turned into TARGET_32BIT. I think this needs to be looked at carefully on ARMv4/ARMv4T to check that it doesn't cause regressions there when we don't have LDRD in the instruction set. I'll be testing the modified patch under an ARMv4/ARMv4T configuration. Okay for trunk if no regressions? Thanks, Chung-Lin PR target/48250 * config/arm/arm.c (arm_legitimize_reload_address): Adjust DImode constant index decomposing under TARGET_LDRD. Clear lower two bits for NEON, Thumb-2, and !TARGET_LDRD. Add comment for !TARGET_LDRD case. This looks technically correct, but I can't help feeling that trying to deal with the bottom two bits being non-zero is not really worthwhile. This hook is an optimization that's intended to generate better code than the default mechanisms that reload provides. It is allowed to fail, but it must say so if it does (by returning false). What this hook is trying to do for DImode is to take an address of the form (reg + TOO_BIG_CONST) and turn it into two instructions that are legitimate: tmp = (REG + LEGAL_BIG_CONST) some_use_of (mem (tmp + SMALL_LEGAL_CONST)) The idea behind the optimization is that LEGAL_BIG_CONST will need fewer instructions to generate than TOO_BIG_CONST. It's unlikely (probably impossible in ARM state) that pushing the bottom two bits of the address into LEGAL_BIG_CONST part of the offset is going to lead to a better code sequence. So I think it would be more sensible to just return false if we have a DImode address with the bottom 2 bits non-zero and then let the normal reload recovery mechanism take over. It is supposed to provide better utilization of the full range of LEGAL_BIG_CONST+SMALL_LEGAL_CONST. I am not sure, but I guess reload will resolve it with the reg+LEGAL_BIG_CONST part only, using only (mem (reg)) for the load/store (correct me if I'm wrong). Also, the new code slighty improves the reloading, for example currently [reg+64] is broken into [(reg+72)-8], creating an additional unneeded reload, which is certainly not good when we have ldrd/strd available. Sorry, didn't mean to suggest that we don't want to do something better for addresses that are a multiple of 4, just that for addresses that aren't (at least) word-aligned that we should just bail as the code in that case won't benefit