Re: PATCH RFA: New configure option --with-native-system-header-dir
On 10/09/2011 08:18 AM, Ian Lance Taylor wrote:

+#undef NATIVE_HEADER_HEADER_COMPONENT
+#define NATIVE_SYSTEM_HEADER_COMPONENT MINGW

Typo (I think), otherwise okay.

Paolo
Re: [4/4] Make SMS schedule register moves
Ayal Zaks ayal.z...@gmail.com writes: The issue of assigning stages to reg-moves is mostly relevant for prolog and epilog generation, which requires and receives special attention -- handled very nicely by ps_num_consecutive_stages! Note that currently a simple boolean indicator for (the exceptional case of) double stages would suffice, instead of generalizing to arbitrary numbers of consecutive stages (see any potential use for them?).

Not in the immediate term. But I think having a boolean indicator would be inconsistent. If the distance field is an int (even though we only expect distance-0 and distance-1 register dependencies) then I think the number of stages should be too. I did wonder originally about using a boolean, but IMO it makes the code less readable rather than more. Instead of a simple range check like:

  if (first_stage_for_insn <= last_stage_in_range
      && last_stage_for_insn >= first_stage_in_range)

we end up with the equivalent of:

  if (first_stage_for_insn <= last_stage_in_range
      && (double_stage_move_p (...)
          ? first_stage_for_insn + 1 >= first_stage_in_range
          : first_stage_for_insn >= first_stage_in_range))

with no corresponding simplification elsewhere.

Sure. But setting the range can be done by consulting a simple indicator, rather than generalizing to arbitrary stage numbers; e.g.:

+ps_num_consecutive_stages (partial_schedule_ptr ps, int id)
+{
+  if (id >= ps->g->num_nodes && ps_reg_move (ps, id)->double_stages)
+    return 2;
+  else
+    return 1;
+}

or

-  last_u = first_u + ps_num_consecutive_stages (ps, u) - 1;
+  if (...double_stages)
+    last_u = first_u + 1;
+  else
+    last_u = first_u;

Understood. I still prefer the posted version though. E.g. adding something like this at the end:

   ??? The algorithm restricts the scheduling window to II cycles.
   In rare cases, it may be better to allow windows of II+1 cycles.
   The window would then start and end on the same row, but with
   different must_precede and must_follow requirements.
Let me know what you think and I'll add it as a follow-on patch. great, thanks. OK, added with the patch below.

+
+   The move is part of a chain that satisfies register dependencies
+   between a producing ddg node and various consuming ddg nodes.
+   If some of these dependencies cross a loop iteration (that is,
+   have a distance of 1) then DISTANCE1_USES is nonnull and contains
+   the set of uses with distance-1 dependencies.  DISTANCE1_USES
+   is null otherwise.
+

Maybe clarify that they are upwards-exposed or live-in uses. OK, changed to:

   The move is part of a chain that satisfies register dependencies
   between a producing ddg node and various consuming ddg nodes.  If
   some of these dependencies have a distance of 1 (meaning that the
   use is upward-exposoed) then DISTANCE1_USES is nonnull and

exposed (typo) Oops, also fixed below (and applied). Richard

gcc/
	* modulo-sched.c: Fix comment typo. Mention the possibility of using
	scheduling windows of II+1 cycles.

Index: gcc/modulo-sched.c
===
--- gcc/modulo-sched.c	2011-10-10 12:42:41.0 +0100
+++ gcc/modulo-sched.c	2011-10-11 09:07:08.069166743 +0100
@@ -545,7 +545,7 @@ set_columns_for_ps (partial_schedule_ptr
    The move is part of a chain that satisfies register dependencies
    between a producing ddg node and various consuming ddg nodes.
    If some of these dependencies have a distance of 1 (meaning that
-   the use is upward-exposoed) then DISTANCE1_USES is nonnull and
+   the use is upward-exposed) then DISTANCE1_USES is nonnull and
    contains the set of uses with distance-1 dependencies.
    DISTANCE1_USES is null otherwise.
@@ -1810,7 +1810,11 @@ sms_schedule (void)
    41. endif
    42. compute epilogue & prologue
    43. finish - succeeded to schedule
-*/
+
+   ??? The algorithm restricts the scheduling window to II cycles.
+   In rare cases, it may be better to allow windows of II+1 cycles.
+   The window would then start and end on the same row, but with
+   different must_precede and must_follow requirements.
*/ /* A limit on the number of cycles that resource conflicts can span. ??? Should be provided by DFA, and be dependent on the type of insn scheduled. Currently
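The interval test at the heart of the range-check discussion above can be sketched standalone. The names below are illustrative only, not the modulo-sched.c API; the point is that carrying a stage *count* (as ps_num_consecutive_stages does) keeps the overlap test a plain range check, where a boolean would force a special case.

```c
#include <assert.h>

/* Hypothetical sketch: a register move occupying NUM_STAGES consecutive
   stages starting at FIRST overlaps the scheduling window
   [WIN_FIRST, WIN_LAST] iff the usual interval-overlap check holds.  */
static int
stages_overlap_window (int first, int num_stages, int win_first, int win_last)
{
  int last = first + num_stages - 1;  /* last stage occupied by the move */
  return first <= win_last && last >= win_first;
}
```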
Re: int_cst_hash_table mapping persistence and the garbage collector
On Mon, Oct 10, 2011 at 7:02 PM, Gary Funck g...@intrepid.com wrote: Recently, a few UPC test programs failed to compile due to mis-matches of parameters in a prototype and its corresponding function definition. The mis-match was based upon the apparent inequality of UPC layout qualifiers (blocking factors). UPC blocking factors are integer constants. They are recorded in a hash table indexed by the type tree node that they correspond to. Currently, the test for equality of blocking factors tests only the pointer to the tree node defining the constant. All blocking factors are recorded as sizetype type'd nodes. Given that integer constants are hashed by type/value, it seemed safe to assume that a given blocking factor would map to a single tree node due to the underlying hash method that is used when integral constants are created. Is it valid to assume that pointer equality is sufficient to ensure that two integer constants are equal as long as their type and values are equal? The bug that we ran into occurred because a garbage collection pass was run between the point that the function prototype tree node was created and the point at which the function declaration was processed. The garbage collector decided that the integer constant representing the blocking factor was no longer in use, because it had not been marked. In fact, the integer constant was needed because it appeared in the blocking factor hash table, but not via a direct pointer. Rather it was referenced by nature of the fact that the blocking factor hash table referenced the integer constant that is mapped in the integer constant hash table. 
Here's a rough diagram:

  tree (type) -> [ blocking factor hash ] -> tree (integer constant)
  tree (integer constant) -> [ integer constant hash ] {unique map} -> tree (integer constant)

When the garbage collector deleted the entry from the integer constant hash, it forced a new integer constant tree node to be created for the same (type, value) integral constant blocking factor. One easy way to address the current issue is to call tree_int_cst_equal() if the integer constant tree pointers do not match:

  if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
    /* integer constants aren't equal. */

This may be necessary if 'int_cst_hash_table' is viewed as a cache rather than a persistent, unique mapping. Another approach would be to somehow mark the node in int_cst_hash_table as in use when the blocking factor hash table is traversed by the garbage collector, or to add logic to the hash table delete function associated with int_cst_hash_table to disallow the delete if the integer constant is present in the UPC blocking factor hash table. To effect this change in a modular way, the hash table delete function associated with 'int_cst_hash_table' would probably have to be daisy-chained, where the UPC blocking factor check is made first. The difficulty with implementing the daisy chaining is that int_cst_hash_table needs to exist before the UPC-related initialization code is run. One way to handle this might be yet another language hook, called from the code that creates 'int_cst_hash_table'. That seems overly complex. For reference, the current blocking factor mapping table is created as follows:

  upc_block_factor_for_type = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);

Summary:

1. Is it valid to assume that pointer equality is sufficient to compare two integer constants for equality as long as they have identical type and value?

Yes, if both constants are live.

2.
Should 'int_cst_hash_table' be viewed as a cache, where the mapping of a given (type, value) integer constant may vary over time?

Yes: if a constant is unused it may get collected and re-allocated later. This cannot be observed from any valid use per 1.

3. If the answer to 1 is yes and the answer to 2 is no, then what is the recommended way to ensure that nodes in 'int_cst_hash_table' are not removed if the integer constant is being referenced via the 'upc_block_factor_for_type' hash table?

You need to ensure the constants are marked properly.

Richard.

thanks, - Gary
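The answers above can be illustrated with a toy model. Plain structs stand in for GCC trees here, and all names are invented; the fallback mirrors what adding a tree_int_cst_equal call to the pointer check would buy.

```c
#include <assert.h>

/* Illustrative model only (not GCC's tree API): pointer equality
   identifies interned constants while both nodes are live; the hedged
   fallback compares type and value for nodes re-created after a GC.  */
struct int_cst { int type_id; long low; };

static int
int_csts_equal_p (const struct int_cst *c1, const struct int_cst *c2)
{
  if (c1 == c2)
    return 1;            /* same interned node: trivially equal */
  /* Distinct nodes (e.g. re-created after collection): compare by value.  */
  return c1->type_id == c2->type_id && c1->low == c2->low;
}
```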
[Patch,AVR]: Housekeeping avr_legitimate_address_p
This is a bit of code cleanup, moving macro code from avr.h to functions in avr.c. There's no change in functionality. Passed without regressions. Ok?

Johann

	* config/avr/avr-protos.h (avr_mode_code_base_reg_class): New prototype.
	(avr_regno_mode_code_ok_for_base_p): New prototype.
	* config/avr/avr.h (BASE_REG_CLASS): Remove.
	(REGNO_OK_FOR_BASE_P): Remove.
	(REG_OK_FOR_BASE_NOSTRICT_P): Remove.
	(REG_OK_FOR_BASE_STRICT_P): Remove.
	(MODE_CODE_BASE_REG_CLASS): New define.
	(REGNO_MODE_CODE_OK_FOR_BASE_P): New define.
	* config/avr/avr.c (avr_mode_code_base_reg_class): New function.
	(avr_regno_mode_code_ok_for_base_p): New function.
	(avr_reg_ok_for_addr_p): New static function.
	(avr_legitimate_address_p): Use it. Beautify.

Index: config/avr/avr-protos.h
===
--- config/avr/avr-protos.h (revision 179765)
+++ config/avr/avr-protos.h (working copy)
@@ -106,6 +106,8 @@ extern int avr_simplify_comparison_p (en
 extern RTX_CODE avr_normalize_condition (RTX_CODE condition);
 extern void out_shift_with_cnt (const char *templ, rtx insn, rtx operands[], int *len, int t_len);
+extern reg_class_t avr_mode_code_base_reg_class (enum machine_mode, RTX_CODE, RTX_CODE);
+extern bool avr_regno_mode_code_ok_for_base_p (int, enum machine_mode, RTX_CODE, RTX_CODE);
 extern rtx avr_incoming_return_addr_rtx (void);
 extern rtx avr_legitimize_reload_address (rtx, enum machine_mode, int, int, int, int, rtx (*)(rtx,int));
 #endif /* RTX_CODE */
Index: config/avr/avr.c
===
--- config/avr/avr.c (revision 179765)
+++ config/avr/avr.c (working copy)
@@ -1202,43 +1202,68 @@ avr_cannot_modify_jumps_p (void)
 }

+/* Helper function for `avr_legitimate_address_p'.
*/
+
+static inline bool
+avr_reg_ok_for_addr_p (rtx reg, addr_space_t as ATTRIBUTE_UNUSED, int strict)
+{
+  return (REG_P (reg)
+          && (avr_regno_mode_code_ok_for_base_p (REGNO (reg),
+                                                 QImode, MEM, UNKNOWN)
+              || (!strict
+                  && REGNO (reg) >= FIRST_PSEUDO_REGISTER)));
+}
+
+
 /* Return nonzero if X (an RTX) is a legitimate memory address on the
    target machine for a memory operand of mode MODE.  */

-bool
+static bool
 avr_legitimate_address_p (enum machine_mode mode, rtx x, bool strict)
 {
   reg_class_t r = NO_REGS;

-  if (REG_P (x) && (strict ? REG_OK_FOR_BASE_STRICT_P (x)
-                    : REG_OK_FOR_BASE_NOSTRICT_P (x)))
-    r = POINTER_REGS;
+  if (REG_P (x)
+      && avr_reg_ok_for_addr_p (x, ADDR_SPACE_GENERIC, strict))
+    {
+      r = POINTER_REGS;
+    }
   else if (CONSTANT_ADDRESS_P (x))
-    r = ALL_REGS;
+    {
+      r = ALL_REGS;
+    }
   else if (GET_CODE (x) == PLUS
            && REG_P (XEXP (x, 0))
-           && GET_CODE (XEXP (x, 1)) == CONST_INT
-           && INTVAL (XEXP (x, 1)) >= 0)
+           && CONST_INT_P (XEXP (x, 1))
+           && INTVAL (XEXP (x, 1)) >= 0)
     {
-      int fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode);
+      rtx reg = XEXP (x, 0);
+      bool fit = INTVAL (XEXP (x, 1)) <= MAX_LD_OFFSET (mode);
+
       if (fit)
-        {
-          if (! strict
-              || REGNO (XEXP (x,0)) == REG_X
-              || REGNO (XEXP (x,0)) == REG_Y
-              || REGNO (XEXP (x,0)) == REG_Z)
-            r = BASE_POINTER_REGS;
-          if (XEXP (x,0) == frame_pointer_rtx
-              || XEXP (x,0) == arg_pointer_rtx)
-            r = BASE_POINTER_REGS;
-        }
-      else if (frame_pointer_needed && XEXP (x,0) == frame_pointer_rtx)
-        r = POINTER_Y_REGS;
+        {
+          if (! strict
+              || REGNO (reg) == REG_X
+              || REGNO (reg) == REG_Y
+              || REGNO (reg) == REG_Z)
+            {
+              r = BASE_POINTER_REGS;
+            }
+
+          if (reg == frame_pointer_rtx
+              || reg == arg_pointer_rtx)
+            {
+              r = BASE_POINTER_REGS;
+            }
+        }
+      else if (frame_pointer_needed && reg == frame_pointer_rtx)
+        {
+          r = POINTER_Y_REGS;
+        }
     }
   else if ((GET_CODE (x) == PRE_DEC || GET_CODE (x) == POST_INC)
            && REG_P (XEXP (x, 0))
-           && (strict ?
REG_OK_FOR_BASE_STRICT_P (XEXP (x, 0))
-               : REG_OK_FOR_BASE_NOSTRICT_P (XEXP (x, 0))))
+           && avr_reg_ok_for_addr_p (XEXP (x, 0), ADDR_SPACE_GENERIC, strict))
     {
       r = POINTER_REGS;
     }
@@ -1269,7 +1294,7 @@ avr_legitimate_address_p (enum machine_m

 /* Attempts to replace X with a valid
    memory address for an operand of mode MODE  */

-rtx
+static rtx
 avr_legitimize_address (rtx x, rtx oldx, enum machine_mode mode)
 {
   bool big_offset_p = false;
@@ -7170,6 +7195,51 @@ avr_hard_regno_mode_ok (int regno, enum
 }

+/* Implement `MODE_CODE_BASE_REG_CLASS'.  */
+
+reg_class_t
+avr_mode_code_base_reg_class (enum machine_mode mode ATTRIBUTE_UNUSED,
+
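The strict vs. non-strict distinction that avr_reg_ok_for_addr_p centralizes can be modeled in isolation. The register numbers below are assumptions made for the example, not values taken from avr.h: hard pointer registers always qualify, and before reload (non-strict) any pseudo may still be allocated to one.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_PSEUDO_REG 32                 /* assumed boundary */
static const int pointer_regs[] = { 26, 28, 30 };  /* X, Y, Z low halves */

/* Toy model of a strict/non-strict base-register check.  */
static int
reg_ok_for_addr_p (int regno, int strict)
{
  size_t i;
  for (i = 0; i < sizeof pointer_regs / sizeof pointer_regs[0]; i++)
    if (regno == pointer_regs[i])
      return 1;                  /* hard pointer register: always ok */
  /* Non-strict: a pseudo may still be reloaded into a pointer reg.  */
  return !strict && regno >= FIRST_PSEUDO_REG;
}
```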
Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix
Unfortunately, the report was correct in that clang is producing incorrect code and abusing the higher bits of the class-info field to store some other information. The clang folks are pretty responsive. I'd always give them a chance to `fix' their code, before putting hack-arounds in our code in general. That discussion did happen in private. It wasn't pleasant. They won't change their code. In fact, I just want to fix things and not get into more discussions.

Anyhow, summarizing, the traditional GNU runtime ABI has the values 0x1L or 0x2L in the class-info field. But there is no formal definition document for the ABI, so all we can say is that GCC has always set that field to either 0x1L or 0x2L. By the way, the lack of a formal definition document is a problem, and if, at some point, I get to implement a new ABI for the GNU Objective-C runtime (which I want to do) I will produce a formal document describing it - so that anyone can implement a compatible compiler or runtime. But, for the existing ABI, there is no document describing it, hence all that can be said is that GCC only stores the values 0x1L or 0x2L in the class-info field.

The GNU runtime then uses some of the other bits to store information on the class at runtime - e.g. when the class is +initialized it sets a bit, when it is resolved it sets another, etc. clang started abusing a higher bit of that field to store information not normally present in the ABI. That worked with older versions of the GNU runtime, because (by sheer chance in my view) the higher bit they set was not being used. The fact that it was not being used was an implementation accident (in my view) since other higher bits were actually used. The new GNU runtime included in GCC 4.6.x and higher has classes in construction (part of the new Objective-C API) and so the next available bit in the class-info field was used to keep track of the fact that a class is in construction.
That was just the next available bit, but (unknown to me) it is precisely the bit that clang was (ab)using. As a consequence, code compiled with clang no longer works with the GNU runtime from GCC 4.6.x. As there is no formal definition document for the ABI, while it seems obvious to me that they broke the ABI (since they produce object files with some reserved bits set that no version of GCC would ever produce), they claim they didn't because their hack worked with GCC up to 4.5.x and the GNU runtime ignored whether that bit was set or not - up until 4.5.x. It's a standoff because they use that higher bit to basically produce a richer ABI, so they can't easily get rid of it now, and they won't. The hack-around I added clears this higher bit, unlocks the standoff and gets things to work again. Let's hope there are no more such issues, and if we introduce a new GNU Objective-C runtime ABI, we need to make sure it is well documented so that it is possible to easily ensure compatibility between different compilers and runtimes. Thanks
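The work-around described here amounts to a simple mask operation before the runtime interprets its own flags. The bit positions below are invented for this sketch; the post only confirms that GCC itself stores 0x1L or 0x2L in the class-info field.

```c
#include <assert.h>

#define CLS_CLASS        0x1UL   /* values the post says GCC emits */
#define CLS_META         0x2UL
#define CLS_FOREIGN_BIT  0x40UL  /* assumed position of the abused bit */

/* Hypothetical illustration: clear the high bit the other compiler
   stored, so the runtime's own flag bits can be used safely.  */
static unsigned long
sanitize_class_info (unsigned long info)
{
  return info & ~CLS_FOREIGN_BIT;
}
```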
[RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP.
This series of 5 patches generates LDRD/STRD instead of POP/PUSH in the epilogue/prologue for ARM and Thumb-2 mode of A15.

Patch [1/5] introduces a new field in tune which can be used to indicate whether LDRD/STRD are preferred over POP/PUSH by the specific core. Patches [2-5/5] use this field to determine if LDRD/STRD can be generated instead of PUSH/POP in ARM and Thumb-2 mode.

Patch [2/5] generates LDRD instead of POP for the Thumb-2 epilogue on A15. This patch depends on patch [1/5].

Patch [3/5] generates STRD instead of PUSH for the Thumb-2 prologue on A15. This patch depends on variables, functions and patterns defined in [1/5] and [2/5].

Patch [4/5] generates STRD instead of PUSH for the ARM prologue on A15. This patch depends on [1/5].

Patch [5/5] generates LDRD instead of POP for the ARM epilogue on A15. This patch depends on variables, functions and patterns defined in [1/5] and [4/5].

All these patches depend upon the Thumb2/ARM RTL epilogue patches http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for review. All these patches are applied in the given order and tested with check-gcc, check-gdb and bootstrap without regression.

In case of ARM mode, significant performance improvement can be seen on some parts of a popular embedded consumer benchmark (~26%). However, in most of the cases, not much effect is seen on performance (~3% improvement). In case of Thumb-2, the performance improvement observed on the same parts of the benchmark is ~11% (2.5% improvement).

--
Re: int_cst_hash_table mapping persistence and the garbage collector
In fact, the integer constant was needed because it appeared in the blocking factor hash table, but not via a direct pointer. Rather it was referenced by nature of the fact that the blocking factor hash table referenced the integer constant that is mapped in the integer constant hash table.

You'd need to elaborate here: what does "by nature of the fact that" mean?

When the garbage collector deleted the entry from the integer constant hash, it forced a new integer constant tree node to be created for the same (type, value) integral constant blocking factor. One easy way to address the current issue is to call tree_int_cst_equal() if the integer constant tree pointers do not match:

  if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
    /* integer constants aren't equal. */

You have two objects C1 and C2 for the same constant and you're comparing them. One was created first, say C1. If C1 was still live when C2 was created, why was C2 created in the first place? If C1 wasn't live anymore when C2 was created, why are you still using C1 here?

-- Eric Botcazou
[RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.
Hi! This patch adds a new field in tune_params to indicate if LDRD/STRD are preferred over PUSH/POP in the prologue/epilogue of a specific core. It also creates a new tune for Cortex-A15 and updates the tunes for other cores to set the new field to its default value.

Changelog entry for the patch to create a tune for cortex-a15:

2011-10-11 Sameera Deshpande sameera.deshpa...@arm.com

	* config/arm/arm-cores.def (cortex_a15): Update.
	* config/arm/arm-protos.h (struct tune_params): Add new field...
	(arm_gen_ldrd_strd): ... this.
	* config/arm/arm.c (arm_slowmul_tune): Add arm_gen_ldrd_strd field settings.
	(arm_fastmul_tune): Likewise.
	(arm_strongarm_tune): Likewise.
	(arm_xscale_tune): Likewise.
	(arm_9e_tune): Likewise.
	(arm_v6t2_tune): Likewise.
	(arm_cortex_tune): Likewise.
	(arm_cortex_a5_tune): Likewise.
	(arm_cortex_a9_tune): Likewise.
	(arm_fa726te_tune): Likewise.
	(arm_cortex_a15_tune): New variable.

--

On Tue, 2011-10-11 at 10:08 +0100, Sameera Deshpande wrote: This series of 5 patches generates LDRD/STRD instead of POP/PUSH in the epilogue/prologue for ARM and Thumb-2 mode of A15. Patch [1/5] introduces a new field in tune which can be used to indicate whether LDRD/STRD are preferred over POP/PUSH by the specific core. Patches [2-5/5] use this field to determine if LDRD/STRD can be generated instead of PUSH/POP in ARM and Thumb-2 mode. Patch [2/5] generates LDRD instead of POP for the Thumb-2 epilogue on A15. This patch depends on patch [1/5]. Patch [3/5] generates STRD instead of PUSH for the Thumb-2 prologue on A15. This patch depends on variables, functions and patterns defined in [1/5] and [2/5]. Patch [4/5] generates STRD instead of PUSH for the ARM prologue on A15. This patch depends on [1/5]. Patch [5/5] generates LDRD instead of POP for the ARM epilogue on A15. This patch depends on variables, functions and patterns defined in [1/5] and [4/5].
All these patches depend upon the Thumb2/ARM RTL epilogue patches http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html submitted for review. All these patches are applied in the given order and tested with check-gcc, check-gdb and bootstrap without regression. In case of ARM mode, significant performance improvement can be seen on some parts of a popular embedded consumer benchmark (~26%). However, in most of the cases, not much effect is seen on performance (~3% improvement). In case of Thumb-2, the performance improvement observed on the same parts of the benchmark is ~11% (2.5% improvement).

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..1b42713 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -128,7 +128,7 @@ ARM_CORE(generic-armv7-a, genericv7a, 7A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-a5, cortexa5, 7A, FL_LDSCHED, cortex_a5)
 ARM_CORE(cortex-a8, cortexa8, 7A, FL_LDSCHED, cortex)
 ARM_CORE(cortex-a9, cortexa9, 7A, FL_LDSCHED, cortex_a9)
-ARM_CORE(cortex-a15, cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE(cortex-a15, cortexa15, 7A, FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE(cortex-r4, cortexr4, 7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r4f, cortexr4f, 7R, FL_LDSCHED, cortex)
 ARM_CORE(cortex-r5, cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f69bc42..c6b8f71 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -243,6 +243,9 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* This flag indicates if STRD/LDRD instructions are preferred
+     over PUSH/POP in epilogue/prologue.
*/
+  bool prefer_ldrd_strd;
 };

 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c09267..d709375 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -850,7 +850,8 @@ const struct tune_params arm_slowmul_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false						/* Prefer LDRD/STRD.  */
 };

 const struct tune_params arm_fastmul_tune =
@@ -861,7 +862,8 @@ const struct tune_params arm_fastmul_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false						/* Prefer LDRD/STRD.  */
 };

 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -875,7 +877,8 @@ const struct tune_params
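How such a per-core tuning flag gates code generation can be sketched with a toy model. The names below are invented; in the real patches the decision lives in the prologue/epilogue expanders, and the cover letter notes the original sequence is kept when optimizing for size.

```c
#include <assert.h>
#include <string.h>

/* Minimal model of a tune structure carrying the preference flag.  */
struct tune { int prefer_ldrd_strd; };

static const struct tune cortex_a15_tune = { 1 };
static const struct tune generic_tune   = { 0 };

/* Pick the epilogue restore instruction: LDRD only when the core
   prefers it and we are not optimizing for size.  */
static const char *
epilogue_pop_insn (const struct tune *t, int optimize_size)
{
  return (t->prefer_ldrd_strd && !optimize_size) ? "ldrd" : "pop";
}
```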
[RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
Hi! This patch generates LDRD instead of POP for Thumb2 epilogue in A15. For optimize_size, original epilogue is generated for A15. The work involves defining new functions, predicates and patterns. As LDRD cannot be generated for PC, if PC is in register-list, LDRD is generated for all other registers in the list which can form register pair. Then LDR with return is generated if PC is the only register left to be popped, otherwise POP with return is generated. The patch is tested with check-gcc, check-gdb and bootstrap with no regression. Changelog entry for Patch to emit LDRD for thumb2 epilogue in A15: 2011-10-11 Sameera Deshpande sameera.deshpa...@arm.com * config/arm/arm-protos.h (bad_reg_pair_for_thumb_ldrd_strd): New declaration. * config/arm/arm.c (bad_reg_pair_for_thumb_ldrd_strd): New helper function. (thumb2_emit_ldrd_pop): New static function. (thumb2_expand_epilogue): Update functions. * config/arm/constraints.md (Pz): New constraint. * config/arm/ldmstm.md (thumb2_ldrd_base): New pattern. (thumb2_ldrd): Likewise. * config/arm/predicates.md (ldrd_immediate_operand): New predicate. -- diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index c6b8f71..06a67b5 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -202,6 +202,7 @@ extern void thumb_reload_in_hi (rtx *); extern void thumb_set_return_address (rtx, rtx); extern const char *thumb1_output_casesi (rtx *); extern const char *thumb2_output_casesi (rtx *); +extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx); #endif /* Defined in pe.c. 
*/

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d709375..3eba510 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15410,6 +15410,155 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }

+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || (REGNO (src1) == PC_REGNUM)
+          || (REGNO (src1) == SP_REGNUM)
+          || (REGNO (src1) == REGNO (src2))
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+    /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+       to be popped.  So, if num_regs is even, now it will become odd,
+       and we can generate pop with PC.  If num_regs is odd, it will be
+       even now, and ldr with return can be generated for PC.  */
+    num_regs--;
+
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+    /* Var j iterates over all the registers to gather all the registers in
+       saved_regs_mask.  Var i gives index of saved registers in stack frame.
+       A PARALLEL RTX of register-pair is created here, so that pattern for
+       LDRD can be matched.  As PC is always last register to be popped, and
+       we have already decremented num_regs if PC, we don't have to worry
+       about PC in this loop.  */
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+
+        /* Create RTX for memory load.  New RTX is created for dwarf as
+           they are not sharable.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (stack_pointer_rtx, 4 * i)));
+
+        tmp1 = gen_rtx_SET (SImode,
+                            reg,
+                            gen_frame_mem (SImode,
+                                plus_constant (stack_pointer_rtx, 4 * i)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        RTX_FRAME_RELATED_P (tmp1) = 1;
+
+        if (i % 2 == 0)
+          {
+            /* When saved-register index (i) is even, the RTX to be emitted is
+               yet to be created.  Hence create it first.  The LDRD pattern we
+               are
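The PC adjustment described in the comments above can be sketched as a small counting helper. This is not the GCC function, just a model of the arithmetic: PC can never sit in an LDRD, so it is excluded before pairing, the LDRDs cover the largest even number of remaining registers, and any leftover register is handled by a plain LDR or POP.

```c
#include <assert.h>

/* Sketch: number of registers covered by LDRD pairs, given the total
   save count and whether PC is in the mask.  */
static int
ldrd_covered_regs (int num_regs, int pc_in_mask)
{
  if (pc_in_mask)
    num_regs--;                 /* PC popped separately at the end */
  return num_regs - (num_regs % 2);
}
```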
[RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.
Hi! This patch generates STRD instead of PUSH in the prologue for A15 ARM mode. For optimize_size, the original prologue is generated for A15. The work involves defining new functions, predicates and patterns, along with minor changes in existing code:

* STRD in ARM mode needs consecutive registers to be stored. The performance of the generated code degrades greatly if R3 is pushed for stack alignment, as that emits a single store just for R3. Instead, having a SUB instruction do the stack adjustment is more efficient. Hence, the condition in arm_get_frame_offsets () is changed to disable push-in-R3 if prefer_ldrd_strd in ARM mode.

In this patch we keep accumulating non-consecutive registers until a register pair to be pushed is found. Then, we first PUSH all the accumulated registers, followed by STRD with pre-stack update for the register pair. We repeat this until all the registers in the register list are PUSHed. The patch is tested with check-gcc, check-gdb and bootstrap with no regression.

Changelog entry for the patch to emit STRD for the ARM prologue in A15:

2011-10-11 Sameera Deshpande sameera.deshpa...@arm.com

	* config/arm/arm-protos.h (bad_reg_pair_for_arm_ldrd_strd): New declaration.
	* config/arm/arm.c (arm_emit_strd_push): New static function.
	(bad_reg_pair_for_arm_ldrd_strd): New helper function.
	(arm_expand_prologue): Update.
	(arm_get_frame_offsets): Update.
	* config/arm/ldmstm.md (arm_strd_base): New pattern.
--

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 06a67b5..d5287ad 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,6 +162,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);

 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fd8c31d..08fa0d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15095,6 +15096,116 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }

+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi-reg PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+    /* Var j iterates over all registers to gather all registers in
+       saved_regs_mask.  Var i is used to count number of registers stored on
+       stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+       that can be pushed using multi-reg PUSH before STRD is generated.  */
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+        i++;
+
+        if ((j % 2 == 1)
+            && (saved_regs_mask & (1 << (j - 1)))
+            && regs_to_be_pushed_mask)
+          {
+            /* Current register and previous register form register pair for
+               which STRD can be generated.  Hence, emit PUSH for accumulated
+               registers and reset regs_to_be_pushed_mask.  */
+            insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+            regs_to_be_pushed_mask = 0;
+            RTX_FRAME_RELATED_P (insn) = 1;
+            continue;
+          }
+
+        regs_to_be_pushed_mask |= (1 << j);
+
+        if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+          {
+            /* We have found 2 consecutive registers, for which STRD can be
+               generated.  Generate pattern to emit STRD as accumulated
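The accumulate-then-flush pairing the function comment describes can be modeled in miniature. The interface below is invented: it scans a save mask, counts even/odd adjacent pairs that one STRD could store, and counts the lone registers that would fall back to an ordinary multi-register PUSH.

```c
#include <assert.h>

/* Simplified model: partition a 16-bit register save mask into
   STRD-able pairs (even register followed by its odd neighbor) and
   leftover registers for a multi-reg PUSH.  */
static void
split_push_mask (unsigned mask, int *strd_pairs, int *push_regs)
{
  int j;
  *strd_pairs = 0;
  *push_regs = 0;
  for (j = 0; j < 16; j++)
    if (mask & (1u << j))
      {
        if (j % 2 == 0 && (mask & (1u << (j + 1))))
          {
            ++*strd_pairs;      /* R_j, R_{j+1}: one STRD */
            j++;                /* partner already accounted for */
          }
        else
          ++*push_regs;         /* lone register: goes in a PUSH */
      }
}
```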
[RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
Hi! This patch generates LDRD instead of POP in the epilogue for A15 ARM mode. For optimize_size, the original epilogue is generated for A15. The work involves defining new functions, predicates and patterns. In this patch we keep accumulating non-consecutive registers until a register pair to be popped is found. We then first POP all the accumulated registers, followed by an LDRD with post-stack update for the register pair. We repeat this until all the registers in the register list are popped. The patch is tested with check-gcc, check-gdb and bootstrap with no regression. Changelog entry for the patch to emit LDRD in the ARM epilogue for A15: 2011-10-11 Sameera Deshpande sameera.deshpa...@arm.com * config/arm/arm.c (arm_emit_ldrd_pop): New static function. (arm_expand_epilogue): Update. * config/arm/ldmstm.md (arm_ldrd_base): New pattern. (arm_ldr_with_update): Likewise. -- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 08fa0d5..0b9fd93 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -967,7 +967,7 @@ const struct tune_params arm_cortex_a9_tune = ARM_PREFETCH_BENEFICIAL(4,32,32), false, /* Prefer constant pool. */ arm_default_branch_cost, - false /* Prefer LDRD/STRD. */ + true /* Prefer LDRD/STRD. */ }; const struct tune_params arm_fa726te_tune = @@ -15664,6 +15664,145 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2) || (REGNO (src2) == SP_REGNUM)); } +/* LDRD in ARM mode needs consecutive registers to be stored. This function + keeps accumulating non-consecutive registers until first consecutive register + pair is found. It then generates multi-reg POP for all accumulated + registers, and then generates LDRD with write-back for consecutive register + pair. This process is repeated until all the registers are loaded from + stack. multi-reg POP takes care of lone registers as well. However, LDRD + cannot be generated for PC, as results are unpredictable.
Hence, if PC is + in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN + depending upon number of registers in REGS_TO_BE_POPPED_MASK. */ +static void +arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return) +{ + int num_regs = 0; + int i, j; + rtx par = NULL_RTX; + rtx insn = NULL_RTX; + rtx dwarf = NULL_RTX; + rtx tmp, tmp1; + unsigned long regs_to_be_popped_mask = 0; + bool pc_in_list = false; + + for (i = 0; i <= LAST_ARM_REGNUM; i++) +if (saved_regs_mask & (1 << i)) + num_regs++; + + gcc_assert (num_regs && num_regs <= 16); + + for (i = 0, j = 0; i < num_regs; j++) +if (saved_regs_mask & (1 << j)) + { +i++; +if ((j % 2) == 0 + && (saved_regs_mask & (1 << (j + 1))) + && (j + 1) != SP_REGNUM + && (j + 1) != PC_REGNUM + && regs_to_be_popped_mask) + { +/* Current register and next register form register pair for which + LDRD can be generated. Generate POP for accumulated registers + and reset regs_to_be_popped_mask. SP should be handled here as + the results are unpredictable if register being stored is same + as index register (in this case, SP). PC is always the last + register being popped. Hence, we don't have to worry about PC + here. */ +arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list); +pc_in_list = false; +regs_to_be_popped_mask = 0; +continue; + } + +if (j == PC_REGNUM) + { +gcc_assert (really_return); +pc_in_list = 1; + } + +regs_to_be_popped_mask |= (1 << j); + +if ((j % 2) == 1 + && (saved_regs_mask & (1 << (j - 1))) + && j != SP_REGNUM + && j != PC_REGNUM) + { + /* Generate a LDRD for register pair R_j-1, R_j. The pattern +generated here is +[(SET SP, (PLUS SP, 8)) + (SET R_j-1, (MEM SP)) + (SET R_j, (MEM (PLUS SP, 4)))].
*/ + par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3)); + dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3)); + + tmp = gen_rtx_SET (VOIDmode, +stack_pointer_rtx, +plus_constant (stack_pointer_rtx, 8)); + tmp1 = gen_rtx_SET (VOIDmode, + stack_pointer_rtx, + plus_constant (stack_pointer_rtx, 8)); + RTX_FRAME_RELATED_P (tmp) = 1; + RTX_FRAME_RELATED_P (tmp1) = 1; + XVECEXP (par, 0, 0) = tmp; + XVECEXP (dwarf, 0, 0) = tmp1; + +
Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix
On Oct 11, 2011, at 2:05 AM, Nicola Pero wrote: Unfortunately, the report was correct in that clang is producing incorrect code and abusing the higher bits of the class-info field to store some other information. The clang folks are pretty responsive. I'd always give them a chance to `fix' their code before putting hack-arounds in our code, in general. That discussion did happen in private. It wasn't pleasant. They won't change their code. Right, then it isn't a bug, but rather a shared ABI that we choose to be compatible with. We fix it in our system by noticing how we must set or not set the bit in our ABI document and code, and go on with life; it is too short. It's a standoff. It isn't a standoff, we can choose to just fix the issue and be compatible, if we want.
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
Hi Uros, you was right both with fpmath and configflags. That is why it was passing for me. Attached patch which cures the problem. testsuite/ChangeLog entry: 2011-10-11 Kirill Yukhin kirill.yuk...@intel.com * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse. * gcc.target/i386/fma_double_2.c: Ditto. * gcc.target/i386/fma_double_3.c: Ditto. * gcc.target/i386/fma_double_4.c: Ditto. * gcc.target/i386/fma_double_5.c: Ditto. * gcc.target/i386/fma_double_6.c: Ditto. * gcc.target/i386/fma_float_1.c: Ditto. * gcc.target/i386/fma_float_2.c: Ditto. * gcc.target/i386/fma_float_3.c: Ditto. * gcc.target/i386/fma_float_4.c: Ditto. * gcc.target/i386/fma_float_5.c: Ditto. * gcc.target/i386/fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. * gcc.target/i386/l_fma_float_1.c: Ditto. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_run_double_1.c: Ditto. * gcc.target/i386/l_fma_run_double_2.c: Ditto. * gcc.target/i386/l_fma_run_double_3.c: Ditto. * gcc.target/i386/l_fma_run_double_4.c: Ditto. * gcc.target/i386/l_fma_run_double_5.c: Ditto. * gcc.target/i386/l_fma_run_double_6.c: Ditto. * gcc.target/i386/l_fma_run_float_1.c: Ditto. * gcc.target/i386/l_fma_run_float_2.c: Ditto. * gcc.target/i386/l_fma_run_float_3.c: Ditto. * gcc.target/i386/l_fma_run_float_4.c: Ditto. * gcc.target/i386/l_fma_run_float_5.c: Ditto. * gcc.target/i386/l_fma_run_float_6.c: Ditto. Could you please have a look? Sorry for inconvenience, K fma3-tests-fix.gcc.patch Description: Binary data
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Tue, Oct 11, 2011 at 12:12 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Uros, you was right both with fpmath and configflags. That is why it was passing for me. Attached patch which cures the problem. testsuite/ChangeLog entry: 2011-10-11 Kirill Yukhin kirill.yuk...@intel.com * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse. * gcc.target/i386/fma_double_2.c: Ditto. * gcc.target/i386/fma_double_3.c: Ditto. * gcc.target/i386/fma_double_4.c: Ditto. * gcc.target/i386/fma_double_5.c: Ditto. * gcc.target/i386/fma_double_6.c: Ditto. * gcc.target/i386/fma_float_1.c: Ditto. * gcc.target/i386/fma_float_2.c: Ditto. * gcc.target/i386/fma_float_3.c: Ditto. * gcc.target/i386/fma_float_4.c: Ditto. * gcc.target/i386/fma_float_5.c: Ditto. * gcc.target/i386/fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. * gcc.target/i386/l_fma_float_1.c: Ditto. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_run_double_1.c: Ditto. * gcc.target/i386/l_fma_run_double_2.c: Ditto. * gcc.target/i386/l_fma_run_double_3.c: Ditto. * gcc.target/i386/l_fma_run_double_4.c: Ditto. * gcc.target/i386/l_fma_run_double_5.c: Ditto. * gcc.target/i386/l_fma_run_double_6.c: Ditto. * gcc.target/i386/l_fma_run_float_1.c: Ditto. * gcc.target/i386/l_fma_run_float_2.c: Ditto. * gcc.target/i386/l_fma_run_float_3.c: Ditto. * gcc.target/i386/l_fma_run_float_4.c: Ditto. * gcc.target/i386/l_fma_run_float_5.c: Ditto. * gcc.target/i386/l_fma_run_float_6.c: Ditto. OK. (I have also applied your patch to mainline SVN). Thanks, Uros.
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
Thank you! K On Tue, Oct 11, 2011 at 2:19 PM, Uros Bizjak ubiz...@gmail.com wrote: OK. (I have also applied your patch to mainline SVN). Thanks, Uros.
Re: Out-of-order update of new_spill_reg_store[]
I'm not completely following this yet, so please bear with me... On 10/09/11 10:01, Richard Sandiford wrote: Reload 0: GR_REGS, RELOAD_FOR_OUTPUT_ADDRESS (opnum = 0), can't combine, secondary_reload_p reload_reg_rtx: (reg:SI 5 $5) Reload 1: reload_out (SI) = (reg:SI 32 $f0 [1655]) MD1_REG, RELOAD_FOR_OUTPUT (opnum = 0) reload_out_reg: (reg:SI 32 $f0 [1655]) reload_reg_rtx: (reg:SI 65 lo) secondary_out_reload = 0 Reload 2: reload_out (SI) = (reg:SI 1656) GR_REGS, RELOAD_FOR_OUTPUT (opnum = 3) reload_out_reg: (reg:SI 1656) reload_reg_rtx: (reg:SI 5 $5) So $5 is first stored in 1656 (operand 3), then $5 is used as a secondary reload in copying LO to $f0 (operand 0, reg 1655). The next and final use of 1655 ends up inheriting this second reload of $5, so we try to delete the original output copy. The problem is that we delete the wrong one: we delete the store of $5 to 1656 rather than the copy of $5 to 1655/$f0. So, reload 1 inherited from somewhere else rather than using reg $5 from its secondary reload? Where do we try to delete the insn, and what's the state of the spill_reg_store data at that point? The fix I went for is to clear new_spill_reg_store[] for all reloads as a separate pass (rather than in the main do_{input,output}_reload loop), then only allow new_spill_reg_store[] to be set if the associated reload register reaches the end of the reload sequence. In this case, reload 0 is emitted after reload 2, so it reaches the end. Correct? What would happen if the 0/1 pair and 2 were swapped? Bernd
Re: New warning for expanded vector operations
On Mon, Oct 10, 2011 at 3:21 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Oct 10, 2011 at 12:02 PM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Hi, Here is a patch to inform a programmer about the expanded vector operation. Bootstrapped on x86-unknown-linux-gnu. ChangeLog: * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to produce the warning. (expand_vector_parallel): Adjust to produce the warning. Entries start without gcc/, they are relative to the gcc/ChangeLog file. Sure, sorry. (lower_vec_shuffle): Adjust to produce the warning. * gcc/common.opt: New warning Wvector-operation-expanded. * gcc/doc/invoke.texi: Document the warning. Ok? I don't like the name -Wvector-operation-expanded. We emit a similar warning for missed inline expansions with -Winline, so maybe -Wvector-extensions (that's the name that appears in the C extension documentation). Hm, I don't care much about the name, unless it gets clear what the warning is used for. I am not really sure that Wvector-extensions makes it clear. Also, I don't see anything bad if the warning will pop up during the vectorisation. Any vector operation performed outside the SIMD accelerator looks suspicious, because it actually doesn't improve performance. Such a warning during the vectorisation could mean that a programmer forgot some flag, or the constant propagation failed to deliver a constant, or something else.
Conceptually the text I am producing is not really a warning, it is more like an information, but I am not aware of the mechanisms that would allow me to introduce a flag triggering inform () or something similar. What I think we really need to avoid is including this warning in the standard Ox. + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded piecewise"); v = VEC_alloc (constructor_elt, gc, (nunits + delta - 1) / delta); for (i = 0; i < nunits; @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter tree result, compute_type; enum machine_mode mode; int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD; + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded in parallel"); what's the difference between 'piecewise' and 'in parallel'? Parallel is a little bit better for performance than piecewise. I see. That difference should probably be documented, maybe with an example. Richard. @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter { int parts_per_word = UNITS_PER_WORD / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1); + location_t loc = gimple_location (gsi_stmt (*gsi)); if (INTEGRAL_TYPE_P (TREE_TYPE (type)) && parts_per_word >= 4 && TYPE_VECTOR_SUBPARTS (type) >= 4) - return expand_vector_parallel (gsi, f_parallel, - type, a, b, code); + return expand_vector_parallel (gsi, f_parallel, type, a, b, code); else - return expand_vector_piecewise (gsi, f, - type, TREE_TYPE (type), - a, b, code); + return expand_vector_piecewise (gsi, f, type, + TREE_TYPE (type), a, b, code); } /* Check if vector VEC consists of all the equal elements and Unless I miss something, loc is unused here. Please avoid random whitespace changes (just review your patch yourself before posting and revert pieces that do nothing). Yes you are right, sorry.
+@item -Wvector-operation-expanded +@opindex Wvector-operation-expanded +@opindex Wno-vector-operation-expanded +Warn if vector operation is not implemented via SIMD capabilities of the +architecture. Mainly useful for the performance tuning. I'd mention that this is for vector operations as of the C extension documented in Vector Extensions. The vectorizer can produce some operations that will need further lowering - we probably should make sure to _not_ warn about those. Try running the vect.exp testsuite with the new warning turned on (eventually disabling SSE), like with obj/gcc make check-gcc RUNTESTFLAGS="--target_board=unix/-Wvector-extensions/-mno-sse vect.exp" Again, see the
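The "piecewise" versus "in parallel" distinction the warning draws can be made concrete with a word-parallel (SWAR) byte addition: piecewise expansion performs one scalar operation per vector element, while parallel expansion does a single word-sized operation with the inter-lane carries masked out. This is an illustrative Python sketch, not GCC's tree-vect-generic.c lowering:

```python
MASK_HI = 0x80808080  # sign bit of each byte lane in a 32-bit word
MASK_LO = 0x7f7f7f7f

def add_v4qi_piecewise(a, b):
    """Element-by-element expansion: one add per byte lane."""
    out = 0
    for shift in (0, 8, 16, 24):
        lane = ((a >> shift) + (b >> shift)) & 0xff
        out |= lane << shift
    return out

def add_v4qi_parallel(a, b):
    """Whole-word expansion: a single 32-bit add with the carries
    between lanes masked out (classic SWAR trick)."""
    low = (a & MASK_LO) + (b & MASK_LO)        # adds without inter-lane carry
    return (low ^ ((a ^ b) & MASK_HI)) & 0xffffffff
```

Both compute the same V4QI sum, but the parallel form needs a constant number of word operations regardless of the element count, which is why it is "a little bit better for performance than piecewise".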
Re: [C++ Patch / RFC] PR 33067
On 10/11/2011 03:04 AM, Jason Merrill wrote: On 10/10/2011 12:40 PM, Paolo Carlini wrote: + // The fraction 643/2136 approximates log10(2) to 7 significant digits. + int max_digits10 = 2 + (is_decimal ? fmt->p : fmt->p * 643L / 2136); Please cite N1822 in the comment and convert it to C syntax. OK with that change. Thanks. The below is what I actually applied. Paolo. / 2011-10-11 Paolo Carlini paolo.carl...@oracle.com PR c++/33067 * c-family/c-pretty-print.c (pp_c_floating_constant): Output max_digits10 (in the ISO C++ WG N1822 sense) decimal digits. Index: c-family/c-pretty-print.c === --- c-family/c-pretty-print.c (revision 179792) +++ c-family/c-pretty-print.c (working copy) @@ -1018,8 +1018,20 @@ pp_c_enumeration_constant (c_pretty_printer *pp, t static void pp_c_floating_constant (c_pretty_printer *pp, tree r) { + const struct real_format *fmt += REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (r))); + + REAL_VALUE_TYPE floating_cst = TREE_REAL_CST (r); + bool is_decimal = floating_cst.decimal; + + /* See ISO C++ WG N1822. Note: The fraction 643/2136 approximates + log10(2) to 7 significant digits. */ + int max_digits10 = 2 + (is_decimal ? fmt->p : fmt->p * 643L / 2136); + real_to_decimal (pp_buffer (pp)->digit_buffer, TREE_REAL_CST (r), - sizeof (pp_buffer (pp)->digit_buffer), 0, 1); + sizeof (pp_buffer (pp)->digit_buffer), + max_digits10, 1); + pp_string (pp, pp_buffer (pp)->digit_buffer); if (TREE_TYPE (r) == float_type_node) pp_character (pp, 'f');
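The 643/2136 constant is a rational approximation of log10(2), so for binary formats the expression computes max_digits10 = 2 + floor(p * log10(2)), where p is the precision in (binary or decimal) digits. A quick sketch checking it against the familiar IEEE formats (the formula is from the patch; the precision values are standard, not from this thread):

```python
def max_digits10(p, is_decimal=False):
    # 643/2136 approximates log10(2) to 7 significant digits (ISO C++ WG N1822).
    # For a decimal format, p is already a count of decimal digits.
    return 2 + (p if is_decimal else p * 643 // 2136)
```

This reproduces the well-known values: 9 for IEEE single (p = 24), 17 for double (p = 53), 21 for x87 extended (p = 64), and 36 for quad (p = 113).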
[Committed] S/390: Add -Wno-attributes to testcase options
Hi, in 20090223-1.c GCC complains about functions possibly not being inlined although they have the always_inline attribute on them. The original failure could only be observed with these functions being inlined. The attached patch forces GCC to suppress the warning. Committed to mainline. Bye, -Andreas- 2011-10-11 Andreas Krebbel andreas.kreb...@de.ibm.com * gcc.target/s390/20090223-1.c: Add -Wno-attributes. Index: gcc/testsuite/gcc.target/s390/20090223-1.c === *** gcc/testsuite/gcc.target/s390/20090223-1.c.orig --- gcc/testsuite/gcc.target/s390/20090223-1.c *** *** 3,9 register asm ("0"). */ /* { dg-do run } */ ! /* { dg-options "-O2" } */ extern void abort (void); --- 3,9 register asm ("0"). */ /* { dg-do run } */ ! /* { dg-options "-O2 -Wno-attributes" } */ extern void abort (void);
Re: [PATCH] Fix PR46556 (poor address generation)
On Sat, 8 Oct 2011, William J. Schmidt wrote: Greetings, Here are the revised changes for the tree portions of the patch. I've attempted to resolve all comments to date on those portions. Per Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me know if there's a better place, or whether you'd prefer to leave it where it was. I looked into changing the second reassoc pass to use a different pass_late_reassoc entry, but this impacted the test suite. There are about 20 tests that rely on -fdump-tree-reassoc being associated with two dump files named reassoc1 and reassoc2. Rather than change all these test cases for a temporary solution, I chose to use the deprecated first_pass_instance boolean to distinguish between the two passes. I marked this as a Bad Thing and it will be removed once I have time to work on the straight-line strength reducer. I looked into adding a test case with a negative offset, but was unable to come up with a construct that would have a negative offset on the base MEM_REF and still be recognized by this particular pattern matcher. In any case, the use of double_ints throughout should remove that concern. Comments below. Thanks, Bill 2011-10-08 Bill Schmidt wschm...@linux.vnet.ibm.com gcc: PR rtl-optimization/46556 * tree.h (copy_ref_info): Expose existing function. * tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to... * tree-ssa-address.c (copy_ref_info): ...here, and remove static token. * tree-ssa-reassoc.c (restructure_base_and_offset): New function. (restructure_mem_ref): Likewise. (reassociate_bb): Look for opportunities to call restructure_mem_ref. gcc/testsuite: PR rtl-optimization/46556 * gcc.dg/tree-ssa/pr46556-1.c: New testcase. * gcc.dg/tree-ssa/pr46556-2.c: Likewise. * gcc.dg/tree-ssa/pr46556-3.c: Likewise. Index: gcc/tree.h === --- gcc/tree.h(revision 179708) +++ gcc/tree.h(working copy) @@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree); /* In tree-ssa-address.c. 
*/ extern tree tree_mem_ref_addr (tree, tree); extern void copy_mem_ref_info (tree, tree); +extern void copy_ref_info (tree, tree); /* In tree-vrp.c */ extern bool ssa_name_nonnegative_p (const_tree); Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0) @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dom2" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); +} + +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } } */ +/* { dg-final { cleanup-tree-dump "dom2" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0) @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dom2" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int, int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 12) +foo (p->a[n], p->c[n], p->b[n]); + else if (n > 3) +foo (p->b[n], p->a[n], p->c[n]); +} + +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */ +/* { dg-final { cleanup-tree-dump "dom2" } } */ Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c === --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0) +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0) @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-dom2" } */ + +struct x +{ + int a[16]; + int b[16]; + int c[16]; +}; + +extern void foo (int,
int, int); + +void +f (struct x *p, unsigned int n) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 3) +{ + foo (p->a[n], p->c[n], p->b[n]); + if (n > 12) + foo (p->b[n], p->a[n], p->c[n]); +} +} + +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */ +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */ +/* { dg-final { cleanup-tree-dump "dom2" } } */ Index:
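The restructuring these scan patterns look for rewrites each `p->a[n]`-style address so that the variable part `n * 4` is computed once and shared, with the constant field offset folded into the memory reference. A sketch of the address arithmetic, using the field offsets 0/64/128 implied by `struct x` above (illustrative only; GCC works on MEM_REF trees, not integers):

```python
def addresses_naive(p, n):
    """One multiply-add per reference: &p->a[n], &p->b[n], &p->c[n]."""
    return (p + 0 + 4 * n, p + 64 + 4 * n, p + 128 + 4 * n)

def addresses_restructured(p, n):
    """After restructuring: a single shared 'p + n*4' base, with the
    field offsets folded into the MEM_REFs as constant displacements."""
    base = p + 4 * n                      # computed once, reused three times
    return (base, base + 64, base + 128)
```

Both forms address the same bytes; the second exposes the common `p + n*4` subexpression, which is why the dumps should show only one `* 4;` multiply.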
[PATCH] Fix PR50204
Since we have the alias oracle we no longer optimize the testcase below because I initially restricted the stmt walking to give up for PHIs with more than 2 arguments because of compile-time complexity issues. But it's easy to see that compile-time is not an issue when we reduce PHI args pairwise to a single dominating operand. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-10-11 Richard Guenther rguent...@suse.de PR tree-optimization/50204 * tree-ssa-alias.c (get_continuation_for_phi_1): Split out two argument handling from ... (get_continuation_for_phi): ... here. Handle arbitrary number of PHI args. * gcc.dg/tree-ssa/ssa-fre-36.c: New testcase. Index: gcc/tree-ssa-alias.c === *** gcc/tree-ssa-alias.c(revision 179794) --- gcc/tree-ssa-alias.c(working copy) *** maybe_skip_until (gimple phi, tree targe *** 1875,1880 --- 1875,1934 return true; } + /* For two PHI arguments ARG0 and ARG1 try to skip non-aliasing code +until we hit the phi argument definition that dominates the other one. +Return that, or NULL_TREE if there is no such definition. */ + + static tree + get_continuation_for_phi_1 (gimple phi, tree arg0, tree arg1, + ao_ref *ref, bitmap *visited) + { + gimple def0 = SSA_NAME_DEF_STMT (arg0); + gimple def1 = SSA_NAME_DEF_STMT (arg1); + tree common_vuse; + + if (arg0 == arg1) + return arg0; + else if (gimple_nop_p (def0) + || (!gimple_nop_p (def1) + && dominated_by_p (CDI_DOMINATORS, + gimple_bb (def1), gimple_bb (def0)))) + { + if (maybe_skip_until (phi, arg0, ref, arg1, visited)) + return arg0; + } + else if (gimple_nop_p (def1) + || dominated_by_p (CDI_DOMINATORS, + gimple_bb (def0), gimple_bb (def1))) + { + if (maybe_skip_until (phi, arg1, ref, arg0, visited)) + return arg1; + } + /* Special case of a diamond: +MEM_1 = ... +goto (cond) ?
L1 : L2 +L1: store1 = ...#MEM_2 = vuse(MEM_1) + goto L3 +L2: store2 = ...#MEM_3 = vuse(MEM_1) +L3: MEM_4 = PHI<MEM_2, MEM_3> + We were called with the PHI at L3, MEM_2 and MEM_3 don't + dominate each other, but still we can easily skip this PHI node + if we recognize that the vuse MEM operand is the same for both, + and that we can skip both statements (they don't clobber us). + This is still linear. Don't use maybe_skip_until, that might + potentially be slow. */ + else if ((common_vuse = gimple_vuse (def0)) + && common_vuse == gimple_vuse (def1)) + { + if (!stmt_may_clobber_ref_p_1 (def0, ref) + && !stmt_may_clobber_ref_p_1 (def1, ref)) + return common_vuse; + } + + return NULL_TREE; + } + + /* Starting from a PHI node for the virtual operand of the memory reference REF find a continuation virtual operand that allows to continue walking statements dominating PHI skipping only statements that cannot possibly *** get_continuation_for_phi (gimple phi, ao *** 1890,1942 if (nargs == 1) return PHI_ARG_DEF (phi, 0); ! /* For two arguments try to skip non-aliasing code until we hit ! the phi argument definition that dominates the other one. */ ! if (nargs == 2) { tree arg0 = PHI_ARG_DEF (phi, 0); ! tree arg1 = PHI_ARG_DEF (phi, 1); ! gimple def0 = SSA_NAME_DEF_STMT (arg0); ! gimple def1 = SSA_NAME_DEF_STMT (arg1); ! tree common_vuse; ! ! if (arg0 == arg1) ! return arg0; ! else if (gimple_nop_p (def0) ! || (!gimple_nop_p (def1) ! && dominated_by_p (CDI_DOMINATORS, ! gimple_bb (def1), gimple_bb (def0)))) ! { ! if (maybe_skip_until (phi, arg0, ref, arg1, visited)) ! return arg0; ! } ! else if (gimple_nop_p (def1) ! || dominated_by_p (CDI_DOMINATORS, ! gimple_bb (def0), gimple_bb (def1))) ! { ! if (maybe_skip_until (phi, arg1, ref, arg0, visited)) ! return arg1; ! } ! /* Special case of a diamond: ! MEM_1 = ... ! goto (cond) ? L1 : L2 ! L1: store1 = ...#MEM_2 = vuse(MEM_1) ! goto L3 ! L2: store2 = ...#MEM_3 = vuse(MEM_1) !
L3: MEM_4 = PHI<MEM_2, MEM_3> !We were called with the PHI at L3, MEM_2 and MEM_3 don't !dominate each other, but still we can easily skip this PHI node !if we recognize that the vuse MEM operand is the same for both, !and that we can skip both statements (they don't clobber us). !This is still linear. Don't use maybe_skip_until, that might !
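The generalization in this patch is simply a pairwise reduction: an N-argument PHI is resolved by folding the arguments two at a time through the two-argument helper, so the cost stays linear in N. Schematically (the resolver and the "dominates" relation below are stand-ins, not GCC's alias oracle):

```python
from functools import reduce

def continuation_for_phi(args, resolve_pair):
    """Reduce PHI args pairwise to a single dominating vuse.

    resolve_pair(a, b) returns the dominating operand of the pair, or
    None if the walk between them cannot be skipped - mirroring the
    role of get_continuation_for_phi_1.  A single failure aborts the
    whole reduction."""
    def step(acc, arg):
        if acc is None:
            return None
        return resolve_pair(acc, arg)
    return reduce(step, args[1:], args[0])
```

With a toy resolver where "smaller SSA version dominates", a three-argument PHI needs just two pairwise steps; if any pair fails, the whole lookup gives up, exactly as the C code returns NULL_TREE.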
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Mon, 10 Oct 2011, Kai Tietz wrote: To ensure that we use simple_operand_p in all cases, beside for branching AND/OR chains, in same way as before, I added to this function an additional argument, by which the looking into comparisons can be activated. Better make it a separate function that first tests your new conditions, and then calls simple_operand_p. +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, + tree lhs, tree rhs) { /* If this is the or of two comparisons, we can do something if the comparisons are NE_EXPR. If this is the and, we can do something @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. + if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR) + && (BRANCH_COST (optimize_function_for_speed_p (cfun), +false) >= 2) + && !TREE_SIDE_EFFECTS (arg1) + && LOGICAL_OP_NON_SHORT_CIRCUIT + && simple_operand_p (arg1, true)) +{ + enum tree_code ncode = (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR +: TRUTH_OR_EXPR); + + /* We don't want to pack more then two leafs to an non-IF Missing continuation of the sentence? + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to CODE, then we don't want to add right-hand operand.
+ If the inner right-hand side of left-hand operand has side-effects, + or isn't simple, then we can't add to it, as otherwise we might + destroy if-sequence. */ + if (TREE_CODE (arg0) == code + /* Needed for sequence points to handle trappings, and + side-effects. */ + && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1)) + && simple_operand_p (TREE_OPERAND (arg0, 1), true)) + { + tem = fold_build2_loc (loc, ncode, type, TREE_OPERAND (arg0, 1), + arg1); + return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), + tem); + } + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (!TREE_SIDE_EFFECTS (arg0) + && simple_operand_p (arg0, true)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); +} + Ciao, Michael.
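The optimization under discussion turns a short-circuit `a && b` (or `a || b`) into the non-branching `a & b` (or `a | b`) only when the right operand is cheap, trap-free, and side-effect-free, and branches are expensive. The gating predicate quoted from the patch can be modelled as follows; this is an illustrative paraphrase, with the operand tests reduced to booleans rather than GCC's simple_operand_p:

```python
def can_use_non_short_circuit(code, rhs_has_side_effects, rhs_is_simple,
                              branch_cost, logical_op_non_short_circuit=True):
    """Model of the guard added in fold_truth_andor: only AND-IF/OR-IF
    expressions qualify, branches must cost >= 2, and the right-hand
    operand must be side-effect-free and 'simple' (no traps, cheap)."""
    return (code in ("TRUTH_ANDIF_EXPR", "TRUTH_ORIF_EXPR")
            and branch_cost >= 2
            and not rhs_has_side_effects
            and logical_op_non_short_circuit
            and rhs_is_simple)
```

The side-effect and simplicity checks are what preserve C's sequence-point semantics: once the branch is gone, both operands are always evaluated, so the right operand must be safe to evaluate unconditionally.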
[C++ Patch] PR 50611
Hi, for this largish testcase (reduced from a big one by Jakub) we ICE due to error reporting routines being re-entered. In 4_6-branch the situation is worse, because for the original testcase we don't produce any useful diagnostics at all before ICEing. Thus the below, which seems pretty straightforward to me given that unqualified_name_lookup_error, called by tsubst_copy_and_build (in turn called by tsubst, called by dump_template_bindings) errors out unconditionally. Tested mainline and 4_6-branch. Ok for both? Thanks, Paolo. /// 2011-10-11 Paolo Carlini paolo.carl...@oracle.com PR c++/50611 * pt.c (tsubst_copy_and_build): If (complain & tf_error) is false do not call unqualified_name_lookup_error. Index: pt.c === --- pt.c(revision 179798) +++ pt.c(working copy) @@ -13026,7 +13026,11 @@ tsubst_copy_and_build (tree t, if (error_msg) error (error_msg); if (!function_p && TREE_CODE (decl) == IDENTIFIER_NODE) - decl = unqualified_name_lookup_error (decl); + { + if (complain & tf_error) + unqualified_name_lookup_error (decl); + decl = error_mark_node; + } return decl; }
Re: [pph] Make pph.h _the_ interface header. (issue 5247044)
http://codereview.appspot.com/5247044/diff/1/gcc/cp/pph-streamer.h File gcc/cp/pph-streamer.h (right): http://codereview.appspot.com/5247044/diff/1/gcc/cp/pph-streamer.h#newcode165 gcc/cp/pph-streamer.h:165: struct pph_stream { 165 struct pph_stream { You need the typedef. Stage 1 is still built with C. http://codereview.appspot.com/5247044/
[v3] PR libstdc++/50661
Hi, tested x86_64-linux, committed. Paolo. 2011-10-11 Emil Wojak e...@wojak.eu PR c++/50661 * include/bits/stl_algobase.h (equal): Compare arrays of pointers too with memcmp. Index: include/bits/stl_algobase.h === --- include/bits/stl_algobase.h (revision 179798) +++ include/bits/stl_algobase.h (working copy) @@ -812,7 +812,8 @@ { typedef typename iterator_traits<_II1>::value_type _ValueType1; typedef typename iterator_traits<_II2>::value_type _ValueType2; - const bool __simple = (__is_integer<_ValueType1>::__value + const bool __simple = ((__is_integer<_ValueType1>::__value + || __is_pointer<_ValueType1>::__value) && __is_pointer<_II1>::__value && __is_pointer<_II2>::__value && __are_same<_ValueType1, _ValueType2>::__value);
Re: Fix for PR libobjc/49883 (clang + gcc 4.6 runtime = broken) and a small related clang fix
It isn't a standoff, we can choose to just fix the issue and be compatible, if we want. I guess you're right and I'm probably using the wrong word - English is not my first language. ;-) But I meant that they could have made the same choice to be compatible (by fixing the issue in their compiler and making their GCC-compatible ABI output actually compatible with GCC; they already have other, clang-only, GCC-incompatible ABIs in there, so why not make the GCC-compatible one actually compatible with GCC ?), but they didn't. Anyhow I completely agree with you that life is too short and we spent already way too much time discussing this. It's fixed and let's move on. :-) Thanks
Re: [gimplefe][patch] The symbol table for declarations
Sandeep == Sandeep Soni soni.sande...@gmail.com writes: Sandeep The following patch is a basic attempt to build a symbol table that Sandeep stores the names of all the declarations made in the input file. I don't know anything about gimplefe, but unless you have complicated needs, it is more usual to just put a symbol's value directly into the identifier node. The C front end is a good example of this. Tom
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
So updated version for patch. It creates new simple_operand_p_2 function instead of modifying simple_operand_p function. ChangeLog 2011-10-11 Kai Tietz kti...@redhat.com * fold-const.c (simple_operand_p_2): New function. (fold_truthop): Rename to fold_truth_andor_1, and remove branching creation for logical and/or. (fold_truth_andor): Handle branching creation for logical and/or here. Bootstrapped and regression-tested for all languages plus Ada and Obj-C++ on x86_64-pc-linux-gnu. OK to apply? Regards, Kai Index: gcc/gcc/fold-const.c === --- gcc.orig/gcc/fold-const.c +++ gcc/gcc/fold-const.c @@ -112,13 +112,13 @@ static tree decode_field_reference (loca static int all_ones_mask_p (const_tree, int); static tree sign_bit_p (tree, const_tree); static int simple_operand_p (const_tree); +static bool simple_operand_p_2 (tree); static tree range_binop (enum tree_code, tree, tree, int, tree, int); static tree range_predecessor (tree); static tree range_successor (tree); static tree fold_range_test (location_t, enum tree_code, tree, tree, tree); static tree fold_cond_expr_with_comparison (location_t, tree, tree, tree, tree); static tree unextend (tree, int, int, tree); -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree); static tree optimize_minmax_comparison (location_t, enum tree_code, tree, tree, tree); static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *); @@ -3500,7 +3500,7 @@ optimize_bit_field_compare (location_t l return lhs; } -/* Subroutine for fold_truthop: decode a field reference. +/* Subroutine for fold_truth_andor_1: decode a field reference. If EXP is a comparison reference, we return the innermost reference. @@ -3668,7 +3668,7 @@ sign_bit_p (tree exp, const_tree val) return NULL_TREE; } -/* Subroutine for fold_truthop: determine if an operand is simple enough +/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough to be evaluated unconditionally. 
*/ static int @@ -3692,6 +3692,46 @@ simple_operand_p (const_tree exp) registers aren't expensive. */ && (! TREE_STATIC (exp) || DECL_REGISTER (exp))); } + +/* Subroutine for fold_truth_andor: determine if an operand is simple enough + to be evaluated unconditionally. + In addition to simple_operand_p, we assume that comparisons and logic-not + operations are simple, if their operands are simple, too. */ + +static bool +simple_operand_p_2 (tree exp) +{ + enum tree_code code; + + /* Strip any conversions that don't change the machine mode. */ + STRIP_NOPS (exp); + + code = TREE_CODE (exp); + + if (TREE_CODE_CLASS (code) == tcc_comparison) +return (!tree_could_trap_p (exp) + && simple_operand_p_2 (TREE_OPERAND (exp, 0)) + && simple_operand_p_2 (TREE_OPERAND (exp, 1))); + + if (FLOAT_TYPE_P (TREE_TYPE (exp)) + && tree_could_trap_p (exp)) +return false; + + switch (code) +{ +case SSA_NAME: + return true; +case TRUTH_NOT_EXPR: + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +case BIT_NOT_EXPR: + if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) + return false; + return simple_operand_p_2 (TREE_OPERAND (exp, 0)); +default: + return simple_operand_p (exp); +} +} + /* The following functions are subroutines to fold_range_test and allow it to try to change a logical combination of comparisons into a range test. @@ -4888,7 +4928,7 @@ fold_range_test (location_t loc, enum tr return 0; } -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P bit value. Arrange things so the extra bits will be set to zero if and only if C is signed-extended to its full width. If MASK is nonzero, it is an INTEGER_CST that should be AND'ed with the extra bits. */ @@ -5025,8 +5065,8 @@ merge_truthop_with_opposite_arm (locatio We return the simplified tree or 0 if no optimization is possible. 
*/ static tree -fold_truthop (location_t loc, enum tree_code code, tree truth_type, - tree lhs, tree rhs) +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type, + tree lhs, tree rhs) { /* If this is the or of two comparisons, we can do something if the comparisons are NE_EXPR. If this is the and, we can do something @@ -5054,8 +5094,6 @@ fold_truthop (location_t loc, enum tree_ tree lntype, rntype, result; HOST_WIDE_INT first_bit, end_bit; int volatilep; - tree orig_lhs = lhs, orig_rhs = rhs; - enum tree_code orig_code = code; /* Start by getting the comparison codes. Fail if anything is volatile. If one operand is a BIT_AND_EXPR with the constant one, treat it as if @@ -5119,8 +5157,7 @@
[Committed] S/390: Add -mbackchain for a __builtin_return_address testcase
Hi, on s390 we need a backchain in order to implement __builtin_return_address for arguments other than 0. Committed to mainline. Bye, -Andreas- 2011-10-11 Andreas Krebbel andreas.kreb...@de.ibm.com * gcc.dg/pr49994-3.c: Add -mbackchain for s390 and s390x. Index: gcc/testsuite/gcc.dg/pr49994-3.c === *** gcc/testsuite/gcc.dg/pr49994-3.c.orig --- gcc/testsuite/gcc.dg/pr49994-3.c *** *** 1,5 --- 1,6 /* { dg-do compile } */ /* { dg-options "-O2 -fsched2-use-superblocks -g" } */ + /* { dg-options "-O2 -fsched2-use-superblocks -g -mbackchain" { target s390*-*-* } } */ /* { dg-require-effective-target scheduling } */ void *
Re: [PATCH] Fix PR46556 (poor address generation)
Hi Richard, Thanks for the comments -- a few responses below. On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote: On Sat, 8 Oct 2011, William J. Schmidt wrote: snip + c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT); You don't verify that bitpos % BITS_PER_UNIT is zero anywhere. I'll add a check in the caller. I was thinking this was unnecessary since I had excluded bitfield operations, but on reflection that may not be sufficient. snip + mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL, + true, GSI_SAME_STMT); + add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr); + add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL, + true, GSI_SAME_STMT); + mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr, +build_int_cst (offset_type, double_int_to_shwi (c))); double_int_to_tree (offset_type, c) Please delay gimplification to the caller, that way this function solely operates on the trees returned from get_inner_reference. Or are you concerned that fold might undo your association? I'll try that. I was just basing this on some suggestions you had made earlier; I don't believe there is any problem with delaying it. snip for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (gsi)) { gimple stmt = gsi_stmt (gsi); - if (is_gimple_assign (stmt) - && !stmt_could_throw_p (stmt)) + /* During late reassociation only, look for restructuring +opportunities within an expression that references memory. +We only do this for blocks not contained in loops, since the +ivopts machinery does a good job on loop expressions, and we +don't want to interfere with other loop optimizations. */ I'm not sure I buy this. IVOPTs would have produced [TARGET_]MEM_REFs which you don't handle. Did you do any measurements what happens if you enable it generally? Actually I agree with you -- in an earlier iteration this was still enabled for reassoc1 ahead of loop optimizations and was causing degradations. 
So long as it doesn't occur early it should be fine to do everywhere now, and catch non-ivar cases in loops. snip You verified the patch has no performance degradations on ppc64 for SPEC CPU, did you see any improvements? Yes, a few in the 2-3% range. Nothing stellar. The pattern matching is still very ad-hoc and doesn't consider statements that feed the base address. There is conceptually no difference between p->a[n] and *(p + n * 4). That's true. Since we abandoned the general address-lowering approach, this was aimed at the specific pattern that comes up frequently in practice. I would expect the *(p + n * 4) cases to be handled by the general straight-line strength reduction, which is the correct long-term approach. (Cases like p->a[n], where the multiplication is not yet explicit, will be a bit of a wart as part of strength reduction, too, but that's still the right place for it eventually.) You placed this lowering in reassoc to catch CSE opportunities with DOM, right? Does RTL CSE not do its job or is the transform undone by fwprop before it gets a chance to do it? I think with Paolo's suggested patch for RTL CSE, this could be moved back to expand. I will have to experiment with it again to make sure. If so, that would certainly be my preference as well. (Or having the whole problem just disappear might be my preference on some days... :) Thanks, Bill
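The equivalence Richard points at -- a field access like p->a[n] and explicit pointer arithmetic like *(p + n * 4) compute the same address -- can be made concrete in a few lines of C. This is only an illustration of the point under discussion, not GCC code; it assumes a 4-byte int and uses offsetof so the hand-computed address is well defined:

```c
#include <assert.h>
#include <stddef.h>

struct s { int a[8]; };

/* Field form: the multiply by sizeof (int) is implicit in the
   array indexing, which is why reassoc/strength reduction does
   not see it as an explicit multiplication.  */
static int
via_field (struct s *p, int n)
{
  return p->a[n];
}

/* Explicit form: the same address computed by hand --
   base + field offset + n * element size.  */
static int
via_arith (struct s *p, int n)
{
  return *(int *) ((char *) p + offsetof (struct s, a)
		   + n * (int) sizeof (int));
}
```

Both functions load from the identical address, which is why a lowering pass that only matches one surface form misses CSE opportunities available to the other.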
Re: [patch RFC,PR50038]
2011/10/4 Richard Henderson r...@redhat.com: On 10/04/2011 08:42 AM, Joseph S. Myers wrote: On Tue, 4 Oct 2011, Ilya Tocar wrote: Hi everyone, This patch fixes PR 50038 (redundant zero extensions) by modifying implicit-zee pass to also remove unneeded zero extensions from QImode to SImode. Hardcoding particular modes like this in the target-independent parts of the compiler is fundamentally ill-conceived. Right now it hardcodes the (SImode, DImode) pair. You're adding hardcoding of (QImode, SImode) as well. But really it should consider all pairs of (integer mode, wider integer mode), with the machine description (or target hooks) determining which pairs are relevant on a particular target. Changing it not to hardcode particular modes would be better than adding a second pair. That along with not hard-coding ZERO_EXTEND. Both MIPS and Alpha have much the same free operations, but with SIGN_EXTEND. I remember rejecting one iteration of this pass with this hard-coded, but the pass was apparently approved by someone else without that being corrected. r~ Hello guys, Could you please look at my patch version? I tried to remove all unnecessary mode restrictions and cover SIGN_EXTEND case. I did not test this patch yet, just checked it worked on reproducer from PR50038. Thanks Ilya --- gcc/ * implicit-zee.c (ext_cand): New. (ext_cand_pool): Likewise. (add_ext_candidate): Likewise. (zee_init): Likewise. (zee_cleanup): Likewise. (combine_set_zero_extend): Get extend candidate as new parameter. Now handle sign extend cases and all modes. (transform_ifelse): Likewise. (merge_def_and_ze): Likewise. (combine_reaching_defs): Change parameter type. (zero_extend_info): Changed insn_list type. (add_removable_zero_extend): Relaxed mode and code filter. (find_removable_zero_extends): Changed return type. (find_and_remove_ze): Var type changes. (rest_of_handle_zee): Initialization and cleanup added. PR50038.diff Description: Binary data
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael.
Re: [PATCH 3/7] Emit macro expansion related diagnostics
That looks pretty good, but do you really need to build up a separate data structure to search? You seem to be searching it in the same order that it's built up, so why not just walk the expansion chain directly when searching? Jason
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/11 Michael Matz m...@suse.de: Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: Better make it a separate function the first tests your new conditions, and then calls simple_operand_p. Well, either I make it a new function and call it instead of simple_operand_p, That's what I meant, yes. @@ -5149,13 +5176,6 @@ fold_truthop (location_t loc, enum tree_ build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg), ll_arg, rl_arg), build_int_cst (TREE_TYPE (ll_arg), 0)); - - if (LOGICAL_OP_NON_SHORT_CIRCUIT) - { - if (code != orig_code || lhs != orig_lhs || rhs != orig_rhs) - return build2_loc (loc, code, truth_type, lhs, rhs); - return NULL_TREE; - } Why do you remove this hunk? Shouldn't you instead move the hunk you added to fold_truth_andor() here. I realize this needs some TLC to fold_truth_andor_1, because right now it early-outs for non-comparisons, but it seems the better place. I.e. somehow move the below code into the above branch, with the associated diddling on fold_truth_andor_1 that it gets called. This hunk is removed, as it is vain to do here. There is a fallthrough now, that wasn't there before. I don't know if it's harmless, I just wanted to mention it. It is. Before we changed expression here and recurse here with the non-IF AND/OR expression later. So there is no need to do this recursion. Btw richi asked for it, and I agree that new TRUTH-AND/OR packing is better done at a single place in fold_truth_andor only. As fold_truthop is called twice by fold_truth_andor, the latter might indeed be the better place. Ciao, Michael. Kai
Re: [PATCH] Fix PR46556 (poor address generation)
On Tue, 2011-10-11 at 09:12 -0500, William J. Schmidt wrote: The pattern matching is still very ad-hoc and doesn't consider statements that feed the base address. There is conceptually no difference between p->a[n] and *(p + n * 4). That's true. Since we abandoned the general address-lowering approach, this was aimed at the specific pattern that comes up frequently in practice. I would expect the *(p + n * 4) cases to be handled by the general straight-line strength reduction, which is the correct long-term approach. (Cases like p->a[n], where the multiplication is not yet explicit, will be a bit of a wart as part of strength reduction, too, but that's still the right place for it eventually.) Going through my notes, I do have some code for the *(p + n * 4) case lying around from the last time I tried this in expand, so I'll try to get this back in place (either in reassoc2 or expand, depending on how the CSE works out).
Re: [Patch] PR c++/26256
On 10/10/2011 03:59 PM, Fabien Chêne wrote: Sorry but I've failed to see why you called them callers of lookup_field_1, could you elaborate ? Hmm, I was assuming that the other functions would have gotten their decls via lookup_field_1, but I suppose that isn't true for unqualified lookup that finds the name in class_binding_level. Never mind. Jason
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Tue, 11 Oct 2011, Kai Tietz wrote: So updated version for patch. It creates new simple_operand_p_2 function instead of modifying simple_operand_p function. FWIW: I also can't think of a nice short name for that predicate function :) One thing: move the test for TREE_SIDE_EFFECTS to that new function, then the if()s in fold_truth_andor become nicer. I think the code then is okay, but I can't approve. Just one remark about the comment: + /* We don't want to pack more then two leafs to an non-IF AND/OR s/then/than/ s/an/a/ + expression. + If tree-code of left-hand operand isn't an AND/OR-IF code and not + equal to CODE, then we don't want to add right-hand operand. + If the inner right-hand side of left-hand operand has side-effects, + or isn't simple, then we can't add to it, as otherwise we might + destroy if-sequence. */ And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A && B) && C) into (A && (B & C)). and Or (A && B) into (A & B). for this part: + /* Needed for sequence points to handle trappings, and side-effects. */ + else if (simple_operand_p_2 (arg0)) + return fold_build2_loc (loc, ncode, type, arg0, arg1); Ciao, Michael.
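For readers following the thread, the transformation being reviewed trades a conditional branch for unconditional evaluation: on targets where branches are expensive, `a && b` can be computed as a bitwise AND of the two tests, provided neither operand can trap or has side effects (that is what simple_operand_p_2 checks). A minimal source-level illustration of the idea -- not GCC's internal code:

```c
/* Short-circuit form: the second test is guarded by a branch,
   so B is only evaluated when A is nonzero.  */
static int
andif (int a, int b)
{
  return a != 0 && b != 0;
}

/* Branchless form that fold_truth_andor aims for on
   high-branch-cost targets: both tests are always evaluated and
   their 0/1 results are AND-ed.  This is only a valid rewrite
   because neither test can trap or has side effects.  */
static int
and_flat (int a, int b)
{
  return (a != 0) & (b != 0);
}
```

The two forms are semantically identical for side-effect-free operands; the second simply removes the control dependence.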
Fix PR 50565 (offsetof-type expressions in static initializers)
This patch fixes PR 50565, a failure to accept certain offsetof-type expressions in static initializers introduced by my constant expressions changes. (These expressions are permitted but not required by ISO C to be accepted; the intent of my constant expressions model is that they should be valid in GNU C.) The problem comes down to an expression with the difference of two pointers being cast to int on a 64-bit system, resulting in convert_to_integer moving the conversions inside the subtraction. (These optimizations at conversion time should really be done later as a part of folding, or even later than that, rather than unconditionally in convert_to_*, but that's another issue.) So when the expression reaches c_fully_fold it is a difference of narrowed pointers being folded, which the compiler cannot optimize as it can a difference of unnarrowed pointers with the same base object. Before the introduction of c_fully_fold the difference would have been folded when built and so the narrowing of operands would never have been applied to it. This patch disables the narrowing in the case of pointer subtraction, as it doesn't seem particularly likely to be useful there and is known to prevent this folding required for these initializers to be accepted. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. OK to commit? 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * convert.c (convert_to_integer): Do not narrow operands of pointer subtraction. testsuite: 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * gcc.c-torture/compile/pr50565-1.c, gcc.c-torture/compile/pr50565-2.c: New tests. 
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-1.c === --- gcc/testsuite/gcc.c-torture/compile/pr50565-1.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/pr50565-1.c (revision 0) @@ -0,0 +1,4 @@ +struct s { char p[2]; }; +static struct s v; +const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0U; +const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1U; Index: gcc/testsuite/gcc.c-torture/compile/pr50565-2.c === --- gcc/testsuite/gcc.c-torture/compile/pr50565-2.c (revision 0) +++ gcc/testsuite/gcc.c-torture/compile/pr50565-2.c (revision 0) @@ -0,0 +1,4 @@ +struct s { char p[2]; }; +static struct s v; +const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0; +const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1; Index: gcc/convert.c === --- gcc/convert.c (revision 179754) +++ gcc/convert.c (working copy) @@ -745,6 +745,15 @@ convert_to_integer (tree type, tree expr tree arg0 = get_unwidened (TREE_OPERAND (expr, 0), type); tree arg1 = get_unwidened (TREE_OPERAND (expr, 1), type); + /* Do not try to narrow operands of pointer subtraction; + that will interfere with other folding. */ + if (ex_form == MINUS_EXPR + && CONVERT_EXPR_P (arg0) + && CONVERT_EXPR_P (arg1) + && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg0, 0))) + && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))) + break; + if (outprec >= BITS_PER_WORD || TRULY_NOOP_TRUNCATION (outprec, inprec) || inprec > TYPE_PRECISION (TREE_TYPE (arg0)) -- Joseph S. Myers jos...@codesourcery.com
Re: PR c++/30195
On 10/10/2011 03:56 PM, Fabien Chêne wrote: It tried to add the target declaration of a USING_DECL in the method_vec of the class where the USING_DECL is declared. Thus, I copied the target decl, adjusted its access, and then called add_method with the target decl. Copying the decl is unlikely to do what we want, I think. Does putting the target decl directly into the method vec work? If not, perhaps lookup_fnfields_1 should look through the field list for function USING_DECLs. Jason
Re: fix for c++/44473, mangling of decimal types, checked in
On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote: Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was approved by Jason last December but I never got around to checking it in. Paolo Carlini said in PR44473 that it was already approved and doesn't need a new approval, so I checked it in after a bootstrap and regtest of c,c++ for i686-pc-linux-gnu. Jason, I assume your approval for committing Janis' patch for the 4.5 branch still stands. Can we also commit her fix to the 4.6 branch, since it was created after your approval, but before Janis committed the mainline patch? I have verified that Janis' patch bootstraps and regtests with no regressions on both the 4.5 and 4.6 branches. Peter
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Tue, Oct 11, 2011 at 3:12 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi Uros, you were right both with fpmath and configflags. That is why it was passing for me. Attached patch which cures the problem. testsuite/ChangeLog entry: 2011-10-11 Kirill Yukhin kirill.yuk...@intel.com * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse. * gcc.target/i386/fma_double_2.c: Ditto. * gcc.target/i386/fma_double_3.c: Ditto. * gcc.target/i386/fma_double_4.c: Ditto. * gcc.target/i386/fma_double_5.c: Ditto. * gcc.target/i386/fma_double_6.c: Ditto. * gcc.target/i386/fma_float_1.c: Ditto. * gcc.target/i386/fma_float_2.c: Ditto. * gcc.target/i386/fma_float_3.c: Ditto. * gcc.target/i386/fma_float_4.c: Ditto. * gcc.target/i386/fma_float_5.c: Ditto. * gcc.target/i386/fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. * gcc.target/i386/l_fma_float_1.c: Ditto. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_run_double_1.c: Ditto. * gcc.target/i386/l_fma_run_double_2.c: Ditto. * gcc.target/i386/l_fma_run_double_3.c: Ditto. * gcc.target/i386/l_fma_run_double_4.c: Ditto. * gcc.target/i386/l_fma_run_double_5.c: Ditto. * gcc.target/i386/l_fma_run_double_6.c: Ditto. * gcc.target/i386/l_fma_run_float_1.c: Ditto. * gcc.target/i386/l_fma_run_float_2.c: Ditto. * gcc.target/i386/l_fma_run_float_3.c: Ditto. * gcc.target/i386/l_fma_run_float_4.c: Ditto. * gcc.target/i386/l_fma_run_float_5.c: Ditto. * gcc.target/i386/l_fma_run_float_6.c: Ditto. Could you please have a look? 
Sorry for inconvenience, K All double vector tests fail when GCC is configured with --with-cpu=atom since the double vectorizer is turned off by default. You should add -mtune=generic to those tests. -- H.J.
Re: [Patch,AVR]: Housekeeping avr_legitimate_address_p
2011/10/11 Georg-Johann Lay a...@gjlay.de: This is bit of code cleanup and move macro code from avr.h to functions in avr.c. There's no change in functionality. Passed without regressions. Ok? Johann * config/avr/avr-protos.h (avr_mode_code_base_reg_class): New prototype. (avr_regno_mode_code_ok_for_base_p): New prototype. * config/avr/avr.h (BASE_REG_CLASS): Remove. (REGNO_OK_FOR_BASE_P): Remove. (REG_OK_FOR_BASE_NOSTRICT_P): Remove. (REG_OK_FOR_BASE_STRICT_P): Remove. (MODE_CODE_BASE_REG_CLASS): New define. (REGNO_MODE_CODE_OK_FOR_BASE_P): New define. * config/avr/avr.c (avr_mode_code_base_reg_class): New function. (avr_regno_mode_code_ok_for_base_p): New function. (avr_reg_ok_for_addr_p): New static function. (avr_legitimate_address_p): Use it. Beautify. Approved. Denis.
Re: [Patch,AVR]: Fix PR50447 (4/n)
2011/10/11 Georg-Johann Lay a...@gjlay.de: This is a small addendum to PR50447. It's a change to addsi3 insn; the actual insn sequence printed is still the same (except for adding +/-1 to l-reg) but the effect on cc0 is worked out so that it can be used to cancel out comparisons like in long loops. cc insn attribute gets one more alternative and notice_update_cc calls respective output function that works out the effect on cc0. Passed without regressions. Ok for trunk? Ok. Denis.
Re: [patches] several gcc.target/powerpc tests require hard_float
On 10/10/2011 06:23 PM, Joseph S. Myers wrote: On Mon, 10 Oct 2011, Janis Johnson wrote: This patch skips several Power-specific tests if hard_float support isn't available. OK for trunk? It looks like these are testing for particular instructions using FPRs and so powerpc_fprs is more appropriate; you don't want to match e500v2 hard float. Is a patch using powerpc_fprs instead of hard_float OK for these tests? Janis
Re: fix for c++/44473, mangling of decimal types, checked in
On 10/11/2011 11:34 AM, Peter Bergner wrote: On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote: Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was approved by Jason last December but I never got around to checking it in. Paolo Carlini said in PR44473 that it was already approved and doesn't need a new approval, so I checked it in after a bootstrap and regtest of c,c++ for i686-pc-linux-gnu. Jason, I assume your approval for committing Janis' patch for the 4.5 branch still stands. Can we also commit her fix to the 4.6 branch, since it was created after your approval, but before Janis committed the mainline patch? I have verified that Janis' patch bootstraps and regtests with no regressions on both the 4.5 and 4.6 branches. Yes. Jason
Re: New warning for expanded vector operations
On Tue, Oct 11, 2011 at 11:52 AM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Oct 10, 2011 at 3:21 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Mon, Oct 10, 2011 at 12:02 PM, Richard Guenther richard.guent...@gmail.com wrote: On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Hi Here is a patch to inform a programmer about the expanded vector operation. Bootstrapped on x86-unknown-linux-gnu. ChangeLog: * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to produce the warning. (expand_vector_parallel): Adjust to produce the warning. Entries start without gcc/, they are relative to the gcc/ChangeLog file. Sure, sorry. (lower_vec_shuffle): Adjust to produce the warning. * gcc/common.opt: New warning Wvector-operation-expanded. * gcc/doc/invoke.texi: Document the warning. Ok? I don't like the name -Wvector-operation-expanded. We emit a similar warning for missed inline expansions with -Winline, so maybe -Wvector-extensions (that's the name that appears in the C extension documentation). Hm, I don't care much about the name, unless it gets clear what the warning is used for. I am not really sure that Wvector-extensions makes it clear. Also, I don't see anything bad if the warning will pop up during the vectorisation. Any vector operation performed outside the SIMD accelerator looks suspicious, because it actually doesn't improve performance. 
Such a warning during the vectorisation could mean that a programmer forgot some flag, or the constant propagation failed to deliver a constant, or something else. Conceptually the text I am producing is not really a warning, it is more like an information, but I am not aware of the mechanisms that would allow me to introduce a flag triggering inform () or something similar. What I think we really need to avoid is including this warning in the standard Ox. + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded piecewise"); v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta); for (i = 0; i < nunits; @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter tree result, compute_type; enum machine_mode mode; int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD; + location_t loc = gimple_location (gsi_stmt (*gsi)); + + warning_at (loc, OPT_Wvector_operation_expanded, + "vector operation will be expanded in parallel"); what's the difference between 'piecewise' and 'in parallel'? Parallel is a little bit better for performance than piecewise. I see. That difference should probably be documented, maybe with an example. Richard. @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter { int parts_per_word = UNITS_PER_WORD / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 1); + location_t loc = gimple_location (gsi_stmt (*gsi)); if (INTEGRAL_TYPE_P (TREE_TYPE (type)) && parts_per_word >= 4 && TYPE_VECTOR_SUBPARTS (type) >= 4) - return expand_vector_parallel (gsi, f_parallel, - type, a, b, code); + return expand_vector_parallel (gsi, f_parallel, type, a, b, code); else - return expand_vector_piecewise (gsi, f, - type, TREE_TYPE (type), - a, b, code); + return expand_vector_piecewise (gsi, f, type, + TREE_TYPE (type), a, b, code); } /* Check if vector VEC consists of all the equal elements and unless I miss something, loc is unused here. 
Please avoid random whitespace changes (just review your patch yourself before posting and revert pieces that do nothing). Yes you are right, sorry. +@item -Wvector-operation-expanded +@opindex Wvector-operation-expanded +@opindex Wno-vector-operation-expanded +Warn if vector operation is not implemented via SIMD capabilities of the +architecture. Mainly useful for the performance tuning. I'd mention that this is for vector operations as of the C extension documented in Vector Extensions. The vectorizer can produce some operations that will need further lowering - we probably should make sure to _not_ warn about those. Try running the vect.exp testsuite with the new warning turned on (eventually disabling SSE), like with obj/gcc make check-gcc
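For readers outside the thread, the proposed warning targets code written with GCC's generic vector extensions. A minimal example of the kind of function tree-vect-generic.c may have to lower piecewise or in word-sized parallel chunks when the target (or the selected -m flags) provides no matching SIMD instruction:

```c
#include <assert.h>

/* GNU C generic vector type: four ints, 16 bytes.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* On a target without a 4 x int SIMD add (e.g. x86 without SSE2
   enabled), this addition is the case the proposed
   -Wvector-operation-expanded would flag: the generic lowering
   pass expands it element- or word-wise instead of emitting one
   vector instruction.  */
v4si
vadd (v4si a, v4si b)
{
  return a + b;
}
```

With a SIMD-capable target the same source compiles to a single vector add and no warning would be emitted, which is exactly the performance distinction the warning is meant to surface.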
Re: [C++-11] User defined literals
On 10/11/2011 12:55 PM, Jason Merrill wrote: On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote: Does cp_parser_identifier (parser) *not* consume the identifier token? I'm pretty sure it does. Does it work to only complain if !cp_parser_parsing_tentatively? I suppose not, if you got no complaints with cp_parser_error. Jason
Re: [patches] several gcc.target/powerpc tests require hard_float
On Tue, 11 Oct 2011, Janis Johnson wrote: On 10/10/2011 06:23 PM, Joseph S. Myers wrote: On Mon, 10 Oct 2011, Janis Johnson wrote: This patch skips several Power-specific tests if hard_float support isn't available. OK for trunk? It looks like these are testing for particular instructions using FPRs and so powerpc_fprs is more appropriate; you don't want to match e500v2 hard float. Is a patch using powerpc_fprs instead of hard_float OK for these tests? OK in the absence of testsuite or target maintainer objections within 24 hours. -- Joseph S. Myers jos...@codesourcery.com
Re: int_cst_hash_table mapping persistence and the garbage collector
On 10/11/11 10:24:52, Richard Guenther wrote: GF: 1. Is it valid to assume that pointer equality is sufficient GF: to compare two integer constants for equality as long as they GF: have identical type and value? Yes, if both constants are live The upc blocking factor hash table is declared as follows: static GTY ((if_marked (tree_map_marked_p), param_is (struct tree_map))) htab_t upc_block_factor_for_type; [...] upc_block_factor_for_type = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0); I had hoped that this would be sufficient to ensure that all integer constant references recorded in this hash table would be considered live by the GC. Reading the code in tree_map_marked_p(), however, I see the following: #define tree_map_marked_p tree_map_base_marked_p [...] /* Return true if this tree map structure is marked for garbage collection purposes. We simply return true if the from tree is marked, so that this structure goes away when the from tree goes away. */ int tree_map_base_marked_p (const void *p) { return ggc_marked_p (((const struct tree_map_base *) p)->from); } This takes care of recycling an entry when the '->from' reference goes away, but it doesn't make sure that the '->to' reference is considered live. I don't understand the GC well enough to know when/where the '->to' entry should be marked as live. (note: in the cited test case, the ->from pointers in question are known to be live and did survive garbage collection.) Given that the declaration above tells the GC that the nodes in the blocking factor hash table are of type 'struct tree_map', struct GTY(()) tree_map_base { tree from; }; /* Map from a tree to another tree. */ struct GTY(()) tree_map { struct tree_map_base base; unsigned int hash; tree to; }; I thought that the GC would mark the ->to nodes as live automatically? (note: probably the only direct reference to the integer constant that is the focus of this discussion is in the upc_block_factor_for_type hash table.
Therefore, if it isn't seen as live there, it won't be seen as live anywhere else.) I suppose that I could declare a linear tree list of mapped integer constants and let the GC walk that, but that is more of a hack than a solution. - Gary
PATCH: Remove the extra break
Hi, I checked in this patch to remove the extra break. H.J. --- Index: config/i386/i386.c === --- config/i386/i386.c (revision 179810) +++ config/i386/i386.c (working copy) @@ -28096,7 +28096,6 @@ ix86_expand_special_args_builtin (const klass = store; memory = 0; break; - break; case UINT64_FTYPE_VOID: case UNSIGNED_FTYPE_VOID: nargs = 0; Index: ChangeLog === --- ChangeLog (revision 179810) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2011-10-11 H.J. Lu hongjiu...@intel.com + + * config/i386/i386.c (ix86_expand_special_args_builtin): Remove + the extra break. + 2011-10-11 Artjoms Sinkarovs artyom.shinkar...@gmail.com * doc/invoke.texi: Document new warning.
Re: fix for c++/44473, mangling of decimal types, checked in
On Tue, 2011-10-11 at 12:12 -0400, Jason Merrill wrote: On 10/11/2011 11:34 AM, Peter Bergner wrote: On Fri, 2011-09-30 at 10:37 -0700, Janis Johnson wrote: Patch http://gcc.gnu.org/ml/gcc-patches/2010-12/msg00625.html was approved by Jason last December but I never got around to checking it in. Paolo Carlini said in PR44473 that it was already approved and doesn't need a new approval, so I checked it in after a bootstrap and regtest of c,c++ for i686-pc-linux-gnu. Jason, I assume your approval for committing Janis' patch for the 4.5 branch still stands. Can we also commit her fix to the 4.6 branch, since it was created after your approval, but before Janis committed the mainline patch? I have verified that Janis' patch bootstraps and regtests with no regressions on both the 4.5 and 4.6 branches. Yes. Ok, it has been committed to both the FSF 4.6 and 4.5 branches now. Thanks! Peter
C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)
The problem in both of these PRs is that G++ has assumed that we don't ever need to actually perform non-dependent conversions in a template; once we know that the conversion can be performed, we just generated a NOP_EXPR to change the type of the expression. But this isn't good enough for initializers for variables that might be used in constant expressions, since we want to reduce them to constants if possible. For conversions between scalar types, we can just go ahead and perform the conversion; this is a bit of a boundary violation, but should be harmless since such conversions are represented by a single tree code that we can just pass along at tsubst time. This is also necessary to avoid breaking checking of non-dependent OpenMP for loops in c_finish_omp_for. For C++11 constexpr conversions, it's more involved, so I've introduced a new tree code IMPLICIT_CONV_EXPR. Tested x86_64-pc-linux-gnu, applying to trunk. commit 6800169668342f9009a1001bddff8b1400245836 Author: Jason Merrill ja...@redhat.com Date: Sun Oct 9 23:27:52 2011 +0100 PR c++/49855 PR c++/49896 * cp-tree.def (IMPLICIT_CONV_EXPR): New. * call.c (perform_implicit_conversion_flags): Build it instead of NOP_EXPR. * cp-objcp-common.c (cp_common_init_ts): It's typed. * cxx-pretty-print.c (pp_cxx_cast_expression): Handle it. (pp_cxx_expression): Likewise. * error.c (dump_expr): Likewise. * semantics.c (potential_constant_expression_1): Likewise. * tree.c (cp_tree_equal): Likewise. (cp_walk_subtrees): Likewise. * pt.c (iterative_hash_template_arg): Likewise. (for_each_template_parm_r): Likewise. (type_dependent_expression_p): Likewise. (tsubst_copy, tsubst_copy_and_build): Handle IMPLICIT_CONV_EXPR and CONVERT_EXPR. * cp-tree.h (IMPLICIT_CONV_EXPR_DIRECT_INIT): New. 
diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 4c03e76..7219afe 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8397,13 +8397,19 @@ perform_implicit_conversion_flags (tree type, tree expr, tsubst_flags_t complain } expr = error_mark_node; } - else if (processing_template_decl) + else if (processing_template_decl + /* As a kludge, we always perform conversions between scalar + types, as IMPLICIT_CONV_EXPR confuses c_finish_omp_for. */ + !(SCALAR_TYPE_P (type) SCALAR_TYPE_P (TREE_TYPE (expr { /* In a template, we are only concerned about determining the type of non-dependent expressions, so we do not have to - perform the actual conversion. */ - if (TREE_TYPE (expr) != type) - expr = build_nop (type, expr); + perform the actual conversion. But for initializers, we + need to be able to perform it at instantiation + (or fold_non_dependent_expr) time. */ + expr = build1 (IMPLICIT_CONV_EXPR, type, expr); + if (!(flags LOOKUP_ONLYCONVERTING)) + IMPLICIT_CONV_EXPR_DIRECT_INIT (expr) = true; } else expr = convert_like (conv, expr, complain); diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c index 1866b81..035fdcd 100644 --- a/gcc/cp/cp-objcp-common.c +++ b/gcc/cp/cp-objcp-common.c @@ -267,6 +267,7 @@ cp_common_init_ts (void) MARK_TS_TYPED (CONST_CAST_EXPR); MARK_TS_TYPED (STATIC_CAST_EXPR); MARK_TS_TYPED (DYNAMIC_CAST_EXPR); + MARK_TS_TYPED (IMPLICIT_CONV_EXPR); MARK_TS_TYPED (TEMPLATE_ID_EXPR); MARK_TS_TYPED (ARROW_EXPR); MARK_TS_TYPED (SIZEOF_EXPR); diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def index bb1b753..be29870 100644 --- a/gcc/cp/cp-tree.def +++ b/gcc/cp/cp-tree.def @@ -250,6 +250,7 @@ DEFTREECODE (REINTERPRET_CAST_EXPR, reinterpret_cast_expr, tcc_unary, 1) DEFTREECODE (CONST_CAST_EXPR, const_cast_expr, tcc_unary, 1) DEFTREECODE (STATIC_CAST_EXPR, static_cast_expr, tcc_unary, 1) DEFTREECODE (DYNAMIC_CAST_EXPR, dynamic_cast_expr, tcc_unary, 1) +DEFTREECODE (IMPLICIT_CONV_EXPR, implicit_conv_expr, tcc_unary, 1) DEFTREECODE (DOTSTAR_EXPR, 
dotstar_expr, tcc_expression, 2) DEFTREECODE (TYPEID_EXPR, typeid_expr, tcc_expression, 1) DEFTREECODE (NOEXCEPT_EXPR, noexcept_expr, tcc_unary, 1) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index f824f38..b53accf 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -72,6 +72,7 @@ c-common.h, not after. DECLTYPE_FOR_LAMBDA_CAPTURE (in DECLTYPE_TYPE) VEC_INIT_EXPR_IS_CONSTEXPR (in VEC_INIT_EXPR) DECL_OVERRIDE_P (in FUNCTION_DECL) + IMPLICIT_CONV_EXPR_DIRECT_INIT (in IMPLICIT_CONV_EXPR) 1: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE) TI_PENDING_TEMPLATE_FLAG. TEMPLATE_PARMS_FOR_INLINE. @@ -3233,6 +3234,11 @@ more_aggr_init_expr_args_p (const aggr_init_expr_arg_iterator *iter) B b{1,2}, not B b({1,2}) or B b = {1,2}. */ #define CONSTRUCTOR_IS_DIRECT_INIT(NODE) (TREE_LANG_FLAG_0 (CONSTRUCTOR_CHECK (NODE))) +/* True if NODE represents a conversion for direct-initialization in a + template. Set by perform_implicit_conversion_flags. */ +#define
[PATCH] [Annotalysis] Bugfix where lock function is attached to a base class.
This patch fixes an error where Annotalysis generates bogus warnings when using lock and unlock functions that are attached to a base class. The canonicalize routine did not work correctly in this case. Bootstrapped and passed gcc regression testsuite on x86_64-unknown-linux-gnu. Okay for google/gcc-4_6? -DeLesley Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * tree-threadsafe-analyze.c (get_canonical_lock_expr) testsuite/Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * g++.dg/thread-ann/thread_annot_lock-83.C Index: gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C === --- gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C (revision 0) +++ gcc/testsuite/g++.dg/thread-ann/thread_annot_lock-83.C (revision 0) @@ -1,5 +1,8 @@ -// Regression test for bugfix, where shared locks are not properly -// removed from locksets if a universal lock is present. +// Regression test for two bugfixes. +// Bugfix 1: Shared locks are not properly removed from locksets +// if a universal lock is present. +// Bugfix 2: Canonicalization does not properly store the lock in +// the hash table if the lock function is attached to a base class. 
// { dg-do compile } // { dg-options -Wthread-safety } @@ -7,6 +10,7 @@ class Foo; +/* Bugfix 1 */ class Bar { public: Foo* foo; @@ -29,3 +33,23 @@ void Bar::bar() { ReaderMutexLock rlock(mu_); } + +/* Bugfix 2 */ +class LOCKABLE Base { +public: + Mutex mu_; + + void Lock() EXCLUSIVE_LOCK_FUNCTION() { mu_.Lock(); } + void Unlock() UNLOCK_FUNCTION() { mu_.Unlock(); } +}; + +class Derived : public Base { +public: + int b; +}; + +void doSomething(Derived *d) { + d-Lock(); + d-Unlock(); +}; + Index: gcc/tree-threadsafe-analyze.c === --- gcc/tree-threadsafe-analyze.c (revision 179771) +++ gcc/tree-threadsafe-analyze.c (working copy) @@ -927,7 +927,16 @@ get_canonical_lock_expr (tree lock, tree base_obj, NULL_TREE); if (lang_hooks.decl_is_base_field (component)) -return canon_base; +{ + if (is_temp_expr) +return canon_base; + else +/* return canon_base, but recalculate it so that it is stored + in the hash table. */ +return get_canonical_lock_expr (base, base_obj, +false /* is_temp_expr */, +new_leftmost_base_var); +} if (base != canon_base) lock = build3 (COMPONENT_REF, TREE_TYPE (component), -- DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315
Re: [testsuite] modify powerpc test for hard_float target, skip powerpc/warn-[12].c for soft-float
On 10/10/2011 01:19 PM, Janis Johnson wrote: Tests gcc.target/powerpc/warn-[12].c fail for soft-float multilibs with the unexpected warning -mvsx requires hardware floating point [enabled by default]. This patch skips those tests for soft-float multilibs and modifies the powerpc check for a soft-float effective target to return true for either __NO_FPRS__ or _SOFT_FLOAT being defined. Is this OK for trunk? I must admit that I'm not sure what all those Power float variants are for. On second thought these tests should use /* { dg-require-effective-target powerpc_vsx_ok } */ instead of requiring hard_float, and forget about the proposed change to effective target hard_float. Is that OK? Janis
Re: [google] record compiler options to .note sections
How about .gnu.switches.text.quote_paths? Sounds good to me. -cary
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
* gcc.target/sparc/sparc.exp: Add vis3 target test. This doesn't work. The code always compiles: (botcazou@ob) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.10 $ gcc/xgcc -Bgcc -c -o vis.o vis.c (botcazou@ob) /nile.build/botcazou/gcc-head/sparc-sun-solaris2.10 $ objdump -d vis.o vis.o: file format elf32-sparc Disassembly of section .text: _vis3_fpadd64: 0: 9d e3 bf 90 save %sp, -112, %sp 4: f0 3f bf f8 std %i0, [ %fp + -8 ] 8: f4 3f bf f0 std %i2, [ %fp + -16 ] c: d0 1f bf f8 ldd [ %fp + -8 ], %o0 10: d4 1f bf f0 ldd [ %fp + -16 ], %o2 14: 40 00 00 00 call 14 _vis3_fpadd64+0x14 18: 01 00 00 00 nop 1c: 82 10 00 08 mov %o0, %g1 20: ba 10 00 01 mov %g1, %i5 24: 83 38 60 1f sra %g1, 0x1f, %g1 28: b8 10 00 01 mov %g1, %i4 2c: 84 10 00 1c mov %i4, %g2 30: 86 10 00 1d mov %i5, %g3 34: b0 10 00 02 mov %g2, %i0 38: b2 10 00 03 mov %g3, %i1 3c: 81 cf e0 08 rett %i7 + 8 40: 01 00 00 00 nop -- Eric Botcazou
[Patch,AVR]: Fix PR49939: Skip 2-word insns
This patch teaches avr-gcc to skip 2-word instructions like STS and LDS. It's just about looking into an 2-word insn and check if it's a 2-word instruction or not. Passes without regression. Ok to install? Johann PR target/49939 * config/avr/avr.md (*movqi): Rename to movqi_insn. (*call_insn): Rename to call_insn. (*call_value_insn): Rename to call_value_insn. * config/avr/avr.c (avr_2word_insn_p): New static function. (jump_over_one_insn_p): Use it. Index: config/avr/avr.md === --- config/avr/avr.md (revision 179765) +++ config/avr/avr.md (working copy) @@ -295,7 +295,7 @@ (define_expand movqi operands[1] = copy_to_mode_reg(QImode, operand1); ) -(define_insn *movqi +(define_insn movqi_insn [(set (match_operand:QI 0 nonimmediate_operand =r,d,Qm,r,q,r,*r) (match_operand:QI 1 general_operand rL,i,rL,Qm,r,q,i))] (register_operand (operands[0],QImode) @@ -3628,7 +3628,7 @@ (define_expand sibcall_value ) -(define_insn *call_insn +(define_insn call_insn [(parallel[(call (mem:HI (match_operand:HI 0 nonmemory_operand z,s,z,s)) (match_operand:HI 1 general_operand X,X,X,X)) (use (match_operand:HI 2 const_int_operand L,L,P,P))])] @@ -3651,7 +3651,7 @@ (define_insn *call_insn (const_int 2) (const_int 1))])]) -(define_insn *call_value_insn +(define_insn call_value_insn [(parallel[(set (match_operand 0 register_operand =r,r,r,r) (call (mem:HI (match_operand:HI 1 nonmemory_operand z,s,z,s)) (match_operand:HI 2 general_operandX,X,X,X))) Index: config/avr/avr.c === --- config/avr/avr.c (revision 179765) +++ config/avr/avr.c (working copy) @@ -7123,6 +7123,56 @@ test_hard_reg_class (enum reg_class rcla } +/* Helper for jump_over_one_insn_p: Test if INSN is a 2-word instruction + and thus is suitable to be skipped by CPSE, SBRC, etc. 
*/ + +static bool +avr_2word_insn_p (rtx insn) +{ + if (avr_current_device->errata_skip + || !insn + || !INSN_P (insn) + || 2 != get_attr_length (insn)) +{ + return false; +} + + switch (INSN_CODE (insn)) +{ +default: + break; + +case CODE_FOR_movqi_insn: + { +rtx set = single_set (insn); +rtx src = SET_SRC (set); +rtx dest = SET_DEST (set); + +/* Factor out LDS and STS from movqi_insn. */ + +if (MEM_P (dest) + && (REG_P (src) || src == const0_rtx)) + { +return CONSTANT_ADDRESS_P (XEXP (dest, 0)); + } +else if (REG_P (dest) + && MEM_P (src)) + { +return CONSTANT_ADDRESS_P (XEXP (src, 0)); + } + +break; + } + +case CODE_FOR_call_insn: +case CODE_FOR_call_value_insn: + return true; +} + + return false; +} + + int jump_over_one_insn_p (rtx insn, rtx dest) { @@ -7131,7 +7181,11 @@ jump_over_one_insn_p (rtx insn, rtx dest : dest); int jump_addr = INSN_ADDRESSES (INSN_UID (insn)); int dest_addr = INSN_ADDRESSES (uid); - return dest_addr - jump_addr == get_attr_length (insn) + 1; + int jump_offset = dest_addr - jump_addr - get_attr_length (insn); + + return (jump_offset == 1 + || (jump_offset == 2 + && avr_2word_insn_p (next_nonnote_nondebug_insn (insn)))); } /* Returns 1 if a value of mode MODE can be stored starting with hard
Re: [C++-11] User defined literals
On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote: Does cp_parser_identifier (parser) *not* consume the identifier token? I'm pretty sure it does. Does it work to only complain if !cp_parser_parsing_tentatively? Jason
Re: C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)
On 10/11/2011 02:19 PM, Jason Merrill wrote: For the 4.6 branch I'm only making the change for scalars. Tested x86_64-pc-linux-gnu, applying to trunk. Er, to 4.6.
Re: C++ PATCH for c++/49855, c++/49896 (ICE with named constants in templates)
For the 4.6 branch I'm only making the change for scalars. Tested x86_64-pc-linux-gnu, applying to trunk. commit d8978a333ab71a4ad2c38446764c1b37092ea098 Author: Jason Merrill ja...@redhat.com Date: Mon Oct 3 17:06:02 2011 -0400 PR c++/49855 PR c++/49896 * call.c (perform_implicit_conversion_flags): Do perform scalar conversions in templates. * pt.c (tsubst_copy, tsubst_copy_and_build): Handle CONVERT_EXPR. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 0ec0a07..c54ce7b 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8068,7 +8068,8 @@ perform_implicit_conversion_flags (tree type, tree expr, tsubst_flags_t complain } expr = error_mark_node; } - else if (processing_template_decl) + else if (processing_template_decl + !(SCALAR_TYPE_P (type) SCALAR_TYPE_P (TREE_TYPE (expr { /* In a template, we are only concerned about determining the type of non-dependent expressions, so we do not have to diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 2ca1ce4..9a48bb4 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -11486,6 +11486,7 @@ tsubst_copy (tree t, tree args, tsubst_flags_t complain, tree in_decl) case STATIC_CAST_EXPR: case DYNAMIC_CAST_EXPR: case NOP_EXPR: +case CONVERT_EXPR: return build1 (code, tsubst (TREE_TYPE (t), args, complain, in_decl), tsubst_copy (TREE_OPERAND (t, 0), args, complain, in_decl)); @@ -12637,6 +12638,12 @@ tsubst_copy_and_build (tree t, (tsubst (TREE_TYPE (t), args, complain, in_decl), RECUR (TREE_OPERAND (t, 0))); +case CONVERT_EXPR: + return build1 + (CONVERT_EXPR, + tsubst (TREE_TYPE (t), args, complain, in_decl), + RECUR (TREE_OPERAND (t, 0))); + case CAST_EXPR: case REINTERPRET_CAST_EXPR: case CONST_CAST_EXPR: diff --git a/gcc/testsuite/g++.dg/template/constant1.C b/gcc/testsuite/g++.dg/template/constant1.C new file mode 100644 index 000..a2c5a08 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/constant1.C @@ -0,0 +1,13 @@ +// PR c++/49855 + +extern void foo(int); + +template class Key, class Value void Basic() { + const int kT = 1.5e6;// --- 
causes ICE + int size = kT*2/3; + do { +foo(size); +size = size * 0.5 - 1; + } while (size = 0 ); + +} diff --git a/gcc/testsuite/g++.dg/template/constant2.C b/gcc/testsuite/g++.dg/template/constant2.C new file mode 100644 index 000..f71e4f5 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/constant2.C @@ -0,0 +1,22 @@ +// PR c++/49896 + +templateclass C +class test { + protected: + static const int versionConst = 0x8000; + enum { versionEnum = versionConst }; + public: + int getVersion(); +}; + +templateclass C +int testC::getVersion() { + return versionEnum; +} + +class dummy_class {}; + +int main() { + testdummy_class t; + return t.getVersion(); +}
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: Eric Botcazou ebotca...@adacore.com Date: Tue, 11 Oct 2011 20:10:34 +0200 * gcc.target/sparc/sparc.exp: Add vis3 target test. This doesn't work. The code always compiles: What does gcc -mcpu=niagara3 -mvis give to you for the following source file: long long _vis3_fpadd64 (long long __X, long long __Y) { return __builtin_vis_fpadd64 (__X, __Y); } That's what the sparc.exp test is using. I would expect that to spit out a warning. Do I need to explicitly add -Wall, -Wno-implicit or similar? Similar tests in i386.exp don't seem to need this and that was what I used as my template.
Re: C++ PATCH for c++/49216 (problems with new T[1]{})
This ICE is still a regression in 4.6, so I'm checking in this patch to fix it. Note that after this patch, the code generated for new T[1]{} is still wrong, but that isn't a regression. The value-initialization semantics are fixed in 4.7. Tested x86_64-pc-linux-gnu. commit 25a2d664e6a3787cc68e3c41beb12330469ee4a5 Author: Jason Merrill ja...@redhat.com Date: Tue Oct 11 15:22:55 2011 -0400 PR c++/49216 * init.c (build_vec_init): Avoid crash on new int[1]{}. diff --git a/gcc/cp/init.c b/gcc/cp/init.c index 9440c1a..c4bd635 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -3067,8 +3067,9 @@ build_vec_init (tree base, tree maxindex, tree init, unsigned HOST_WIDE_INT idx; tree field, elt; /* Should we try to create a constant initializer? */ - bool try_const = (literal_type_p (inner_elt_type) - || TYPE_HAS_CONSTEXPR_CTOR (inner_elt_type)); + bool try_const = (TREE_CODE (atype) == ARRAY_TYPE + (literal_type_p (inner_elt_type) + || TYPE_HAS_CONSTEXPR_CTOR (inner_elt_type))); bool saw_non_const = false; bool saw_const = false; /* If we're initializing a static array, we want to do static diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C b/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C new file mode 100644 index 000..4bf6082 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist-49216.C @@ -0,0 +1,6 @@ +// PR c++/49216 +// { dg-options -std=c++0x } + +int main() { + new int[1]{}; +}
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
What does gcc -mcpu=niagara3 -mvis give to you for the following source file: long long _vis3_fpadd64 (long long __X, long long __Y) { return __builtin_vis_fpadd64 (__X, __Y); } Nothing at all, with or without the options. That's what the sparc.exp test is using. And that's what I tested of course. I would expect that to spit out a warning. Do I need to explicitly add -Wall, -Wno-implicit or similar? Similar tests in i386.exp don't seem to need this and that was what I used as my template. -Wall does yield a warning: vis.c: In function '_vis3_fpadd64': vis.c:4:3: warning: implicit declaration of function '__builtin_vis_fpadd64' [-Wimplicit-function-declaration] -- Eric Botcazou
Re: [testsuite] modify powerpc test for hard_float target, skip powerpc/warn-[12].c for soft-float
On Oct 11, 2011, at 11:08 AM, Janis Johnson wrote: On 10/10/2011 01:19 PM, Janis Johnson wrote: Tests gcc.target/powerpc/warn-[12].c fail for soft-float multilibs with the unexpected warning -mvsx requires hardware floating point [enabled by default]. This patch skips those tests for soft-float multilibs and modifies the powerpc check for a soft-float effective target to return true for either __NO_FPRS__ or _SOFT_FLOAT being defined. Is this OK for trunk? I must admit that I'm not sure what all those Power float variants are for. On second thought these tests should use /* { dg-require-effective-target powerpc_vsx_ok } */ instead of requiring hard_float, and forget about the proposed change to effective target hard_float. Is that OK? I was hoping a ppc person would chime in... Ok.
[pph] Make libcpp symbol validation a warning (issue5235061)
Currently, the consistency check done on pre-processor symbols is triggering on symbols that are not really problematic (e.g., symbols used for double-include guards). The problem is that in the testsuite, we are refusing to process PPH images that fail that test, which means we don't get to test other issues. To avoid this, I changed the error() call to warning(). Seemed innocent enough, but there were more problems behind that one: 1- We do not really try to avoid reading PPH images more than once. This problem is different than the usual double-inclusion guard. For instance, suppose a file foo.pph includes 1.pph, 2.pph and 3.pph. When generating foo.pph, we read all 3 files just once and double-include guards do not need to trigger. However, if we are later building a TU with: #include 2.pph #include foo.pph we first read 2.pph and when reading foo.pph, we try to read 2.pph again, because it is mentioned in foo.pph's line map table. I added a guard in pph_stream_open() so it doesn't try to open the same file more than once, but that meant adjusting some of the assertions while reading the line table. We should not expect to find foo.pph's line map table exactly like the one we wrote. 2- We cannot keep a global list of included files to consult external caches. In the example above, if foo.pph needs to resolve an external reference to 2.pph, it needs to index into the 2nd slot in its include vector. However, the file including foo.pph needs to index into the 1st slot in its include vector (since 2.pph is included first). This meant moving the includes field inside struct pph_stream. 3- When reading a type, we should not try to access the method vector for it, unless the type is a class. There's some more consequences of this, but the patch was starting to become too big, so I'm submitting it now. This fixes a couple of files and changes the expected error on another two. I'll be fixing them separately. Tested on x86_64. Committed to branch. Diego. 
* pph-streamer-in.c (pph_reading_includes): Remove. Update all users. (pph_in_include): Call pph_add_include with the newly materialized stream. (pph_in_line_table_and_includes): Document differences between non-PPH compiles and PPH compiles wrt line table behaviour. Modify assertions accordingly. (pph_read_tree_header): Tidy. (report_validation_error): Change error() call to warning(). (pph_image_already_read): Remove. Update all users. (pph_read_file_1): If STREAM-IN_MEMORY_P is set, return. Call pph_mark_strea_read. (pph_add_read_image): Remove. (pph_read_file): Change return type to pph_stream *. Update all users. (pph_reader_finish): Remove. * pph-streamer-out.c (pph_writer_init): Tidy. (pph_add_include): Remove. (pph_get_marker_for): Always consult the pre-loaded cache first. (pph_writer_add_include): New. * pph-streamer.c (pph_read_images): Make static. (pph_init_preloaded_cache): Make static. (pph_streamer_init): New. (pph_streamer_finish): New. (pph_find_stream_for): New. (pph_mark_stream_read): New. (pph_stream_open): Call pph_find_stream_for. If the stream already exists, return it. (pph_add_include): Move from pph-streamer-in.c. Add new argument STREAM. (pph_cache_lookup_in_includes): Add new argument STREAM. Update all users. * pph-streamer.h (pph_read_images): Remove extern declaration. Move field INCLUDES out of union W. Update all users. Add field IN_MEMORY_P. (pph_streamer_init): Declare. (pph_streamer_finish): Declare. (pph_mark_stream_read): Declare. (pph_add_include): Declare. (pph_writer_add_include): Declare. * pph.c (pph_include_handler): Call pph_writer_add_include. (pph_init): Call pph_streamer_init. (pph_finish): Call pph_streamer_finish. testsuite/ChangeLog.pph * g++.dg/pph/d1symnotinc.cc: Change expected error. * g++.dg/pph/x7dynarray6.cc: Likewise. * g++.dg/pph/x7dynarray7.cc: Likewise. * g++.dg/pph/x5dynarray7.h: Mark fixed. * g++.dg/pph/x6dynarray6.h: Mark fixed. 
diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index fbd78d0..ffa1433 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -47,12 +47,6 @@ DEF_VEC_ALLOC_P(char_p,heap); memory will remain allocated until the end of compilation. */ static VEC(char_p,heap) *string_tables = NULL; -/* Increment when we are in the process of reading includes as we do not want - to add those to the parent pph stream's list of includes to be written out. - Decrement when done. We cannot use a simple true/false flag as read includes - will call pph_in_includes as well. */
Re: [PATCH] Fix PR46556 (poor address generation)
On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther rguent...@suse.de wrote: this function misses to transfer TREE_THIS_NOTRAP which is supposed to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF. If you make the function generic please adjust it to at least do ... ... TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base); This line was indeed added to the patch as committed. This appears to have broken the build of libgo. I now get this: ../../../gccgo3/libgo/go/image/png/writer.go: In function ‘png.writeIDATs.pN23_libgo_image.png.encoder’: ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement marked for throw, but doesn’t # .MEM_775 = VDEF .MEM_774 MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPRstruct { uint8 * __values; int __count; int __capacity; }(GOTMP.495); ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement marked for throw, but doesn’t # .MEM_776 = VDEF .MEM_775 D.7574 = MEM[base: D.8325_1069, offset: 0B]; ../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler error: verify_gimple failed Please submit a full bug report, with preprocessed source if appropriate. See http://gcc.gnu.org/bugs.html for instructions. I have not yet done a full investigation, but it appears that this function is now marking a newly created reference as TREE_THIS_NOTRAP, which it did not previously do. The new instruction is within an exception region, and the tree-cfg checker insists that instructions in exception region are permitted to trap. It may be that the ivopts pass now requires TODO_cleanup_cfg, or it may be something more complicated. You should be able to recreate the problem yourself by using --enable-languages=go when you run configure. Ian
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
Cool, Eric could you quickly test the following? This still leaves the i386.exp case issue open, it stands to reason that something like -Wall is needed for those tests too. I think that we should go the i386 way. This works on i386 because the builtins are always available (when you pass the right options) and the assembler rejects the unknown instructions. So in config/sparc/sparc.h: #ifndef HAVE_AS_FMAF_HPC_VIS3 #define AS_NIAGARA3_FLAG "b" #undef TARGET_FMAF #define TARGET_FMAF 0 #undef TARGET_VIS3 #define TARGET_VIS3 0 #else #define AS_NIAGARA3_FLAG "d" #endif we shouldn't force TARGET_FMAF and TARGET_VIS3 to 0. The configure test would only be used to compute default options. -- Eric Botcazou
Go patch committed: Correct ChangeLog and spacing
I committed this patch to mainline to remove an incorrect ChangeLog entry (the gofrontend directory lives elsewhere; gcc/go/ChangeLog only applies to the files in gcc/go outside of gcc/go/gofrontend) and to fix spacing in a Go frontend file. Ian Index: gofrontend/gogo-tree.cc === --- gofrontend/gogo-tree.cc (revision 179825) +++ gofrontend/gogo-tree.cc (working copy) @@ -69,7 +69,7 @@ define_builtin(built_in_function bcode, libname, NULL_TREE); if (const_p) TREE_READONLY(decl) = 1; - set_builtin_decl (bcode, decl, true); + set_builtin_decl(bcode, decl, true); builtin_functions[name] = decl; if (libname != NULL) { Index: ChangeLog === --- ChangeLog (revision 179825) +++ ChangeLog (working copy) @@ -1,13 +1,3 @@ -2011-10-11 Michael Meissner meiss...@linux.vnet.ibm.com - - * gofrontend/gogo-tree.cc (define_builtin): Delete old interface - with two parallel arrays to hold standard builtin declarations, - and replace it with a function based interface that can support - creating builtins on the fly in the future. Change all uses, and - poison the old names. Make sure 0 is not a legitimate builtin - index. - (Gogo::make_trampoline(tree): Ditto. - 2011-08-24 Roberto Lublinerman rlu...@gmail.com * lang.opt: Add fgo-optimize-.
Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes
On Mon, Oct 10, 2011 at 3:37 PM, Delesley Hutchins deles...@google.com wrote: --- gcc/tree-threadsafe-analyze.c (revision 179771) +++ gcc/tree-threadsafe-analyze.c (working copy) @@ -1830,14 +1830,27 @@ remove_lock_from_lockset (tree lockable, struct po This feels like a bug in lock_set_contains(), not remove_lock_from_lockset(). I'd modify lock_set_contains() as follows: 1) During the universal lock conditional, remove the return statement. Instead, set default_lock = lock (where default_lock is a new variable initialized to NULL_TREE). 2) Anywhere NULL_TREE is returned later, replace it with default_lock. Ollie
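A standalone sketch of what those two steps amount to (my simplification, not code from the patch: locks are modeled as plain ints, `UNIVERSAL_LOCK` stands in for the real universal-lock tree, and `0` plays the role of `NULL_TREE`; the real `lock_set_contains` in gcc/tree-threadsafe-analyze.c operates on trees):

```c
#define UNIVERSAL_LOCK (-1)   /* stand-in for the real universal lock */

/* Simplified model: a lock set is an array of lock ids terminated by 0.
   Returns the matching lock if found; otherwise returns the universal
   lock if the set contains it (the suggested "default_lock" fallback),
   or 0 (standing in for NULL_TREE) if neither is present.  */
int
lock_set_contains (const int *lock_set, int lock)
{
  int default_lock = 0;       /* plays the role of NULL_TREE */
  for (int i = 0; lock_set[i] != 0; i++)
    {
      if (lock_set[i] == UNIVERSAL_LOCK)
        default_lock = lock_set[i];   /* remember it, keep scanning */
      else if (lock_set[i] == lock)
        return lock_set[i];           /* an exact match wins */
    }
  return default_lock;                /* fall back to universal lock */
}
```

With this fallback an exact match is still preferred, but a set holding the universal lock no longer short-circuits the search, which is the behavior steps 1) and 2) describe.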
[SPARC] Fix PR target/49965
This is a regression present on mainline and the 4.6/4.5 branches. We generate wrong code for the movcc patterns if the operands of the comparison have TFmode and TARGET_HARD_QUAD is not set, because we fail to update the comparison code after going through the comparison routine. Tested on SPARC/Solaris, applied to mainline and 4.6/4.5 branches.

2011-10-11  Eric Botcazou  ebotca...@adacore.com

	PR target/49965
	* config/sparc/sparc.md (mov<I:mode>cc): Do not save comparison code.
	(mov<F:mode>cc): Likewise.

-- Eric Botcazou

Index: config/sparc/sparc.md
===
--- config/sparc/sparc.md	(revision 179736)
+++ config/sparc/sparc.md	(working copy)
@@ -2614,11 +2614,9 @@ (define_expand "mov<I:mode>cc"
 			  (match_operand:I 3 "arith10_operand" "")))]
   "TARGET_V9 && !(<I:MODE>mode == DImode && TARGET_ARCH32)"
 {
-  enum rtx_code code = GET_CODE (operands[1]);
   rtx cc_reg;

-  if (GET_MODE (XEXP (operands[1], 0)) == DImode
-      && ! TARGET_ARCH64)
+  if (GET_MODE (XEXP (operands[1], 0)) == DImode && !TARGET_ARCH64)
     FAIL;

   if (GET_MODE (XEXP (operands[1], 0)) == TFmode && !TARGET_HARD_QUAD)
@@ -2629,12 +2627,14 @@ (define_expand "mov<I:mode>cc"
   if (XEXP (operands[1], 1) == const0_rtx
       && GET_CODE (XEXP (operands[1], 0)) == REG
       && GET_MODE (XEXP (operands[1], 0)) == DImode
-      && v9_regcmp_p (code))
+      && v9_regcmp_p (GET_CODE (operands[1])))
     cc_reg = XEXP (operands[1], 0);
   else
     cc_reg = gen_compare_reg (operands[1]);

-  operands[1] = gen_rtx_fmt_ee (code, GET_MODE (cc_reg), cc_reg, const0_rtx);
+  operands[1]
+    = gen_rtx_fmt_ee (GET_CODE (operands[1]), GET_MODE (cc_reg), cc_reg,
+		      const0_rtx);
 })

@@ -2644,11 +2644,9 @@ (define_expand "mov<F:mode>cc"
 			  (match_operand:F 3 "register_operand" "")))]
   "TARGET_V9 && TARGET_FPU"
 {
-  enum rtx_code code = GET_CODE (operands[1]);
   rtx cc_reg;

-  if (GET_MODE (XEXP (operands[1], 0)) == DImode
-      && ! TARGET_ARCH64)
+  if (GET_MODE (XEXP (operands[1], 0)) == DImode && !TARGET_ARCH64)
     FAIL;

   if (GET_MODE (XEXP (operands[1], 0)) == TFmode && !TARGET_HARD_QUAD)
@@ -2659,12 +2657,14 @@ (define_expand "mov<F:mode>cc"
   if (XEXP (operands[1], 1) == const0_rtx
       && GET_CODE (XEXP (operands[1], 0)) == REG
       && GET_MODE (XEXP (operands[1], 0)) == DImode
-      && v9_regcmp_p (code))
+      && v9_regcmp_p (GET_CODE (operands[1])))
     cc_reg = XEXP (operands[1], 0);
   else
     cc_reg = gen_compare_reg (operands[1]);

-  operands[1] = gen_rtx_fmt_ee (code, GET_MODE (cc_reg), cc_reg, const0_rtx);
+  operands[1]
+    = gen_rtx_fmt_ee (GET_CODE (operands[1]), GET_MODE (cc_reg), cc_reg,
+		      const0_rtx);
 })

;; Conditional move define_insns
Re: fix for c++/44473, mangling of decimal types, checked in
Ok, it has been committed to both the FSF 4.6 and 4.5 branches now. The ChangeLog entry is in the wrong file, it must be moved to cp/ChangeLog. -- Eric Botcazou
Re: [wwwdocs] gcc-4.6/porting_to.html
I realized this one hasn't made it in, but is really nice. I made a number of minor edits (typos, markup, simplifying headings,... among others). What do you think -- should we include this? Many users still won't have GCC 4.6 deployed yet, so I think it's still worth it. What do you think? Ouch. I see this is not in, and I thought I had checked in the draft months ago. Please check this in immediately!!! -benjamin
Re: fix for c++/44473, mangling of decimal types, checked in
On Tue, 2011-10-11 at 23:34 +0200, Eric Botcazou wrote: Ok, it has been committed to both the FSF 4.6 and 4.5 branches now. The ChangeLog entry is in the wrong file, it must be moved to cp/ChangeLog. Oops, thanks for catching that! Fixed now. Peter
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
I see, so we can test the code generation in the testsuite even if the compiler was built against an assembler without support for the instructions. At least partially, yes. But in such a case, I'm unsure if I understand why i386.exp needs these tests at all. The presence of support for a particular i386 intrinsic is an implicit property of the gcc sources that these test cases are a part of. If the tests are properly added only once the code to support the i386 intrinsic is added as well, the checks seem superfluous. The check is an _object_ check, for example: proc check_effective_target_sse4 { } { return [check_no_compiler_messages sse4.1 object { so it checks that an object file can be produced. You indeed don't need to invoke the check via the sse4.1 tag if you use: /* { dg-do compile } */ in your tests, but you do need the sse4.1 tag if you use: /* { dg-do assemble } */ or /* { dg-do run } */ So the first category of tests will always be executed, whereas the latter two will only be executed if you have the binutils support. -- Eric Botcazou
Re: [arm-embedded] Tune loop unrolling for cortex-m
On Wed, 21 Sep 2011, Joey Ye wrote: Committed in ARM/embedded-4_6-branch. 2011-09-21 Jiangning Liu jiangning@arm.com Tune loop unrolling for cortex-m * config/arm/arm-cores.def (cortex-m0): Change to new tune cortex_v6m. (cortex-m1): Likewise. * config/arm/arm-protos.h (max_unroll_times): New. * config/arm/arm.c (arm_default_unroll_times): New. (arm_cortex_m_unroll_times): New. (arm_cortex_v6m_tune): New. (arm_slowmul_tune): Add max_unroll_times function pointer. (arm_fastmul_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a9_tune, arm_cortex_v7m_tune, arm_cortex_v6m_tune, arm_fa726te_tune): Likewise. (arm_option_override): Enable loop unroll for all all M class Cores, if optimization level is = 1. Shouldn't this kind of stuff get into trunk as well? brgds, H-P
Re: Fix PR 50565 (offsetof-type expressions in static initializers)
On Tue, Oct 11, 2011 at 10:32 AM, Joseph S. Myers jos...@codesourcery.com wrote: The problem comes down to an expression with the difference of two pointers being cast to int on a 64-bit system, resulting in convert_to_integer moving the conversions inside the subtraction. (These optimizations at conversion time should really be done later as part of folding, or even later than that, rather than unconditionally in convert_to_*, but that's another issue.) Interesting. C++11 classifies these as link-time constants, i.e. they are constant expressions for static initialization purposes, but not compile-time constant expressions, precisely because of this kind of issue.
Re: [PATCH 5/9] [SMS] Support new loop pattern
On Fri, Sep 30, 2011 at 5:22 PM, Roman Zhuykov zhr...@ispras.ru wrote: 2011/7/21 zhr...@ispras.ru: This patch should be applied only after pending patches by Revital. Ping. New version is attached; it suits current trunk without additional patches. Thanks for the ping. Also this related patch needs approval: http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01804.html The loop should meet the following requirements. The first three are the same as for a loop with a doloop pattern: ... The next three describe the control part of the newly supported loops.
- the last jump instruction should look like: pc=(regF!=0)?label:pc, where regF is a flag register [you'd probably want to bump to the next instruction when falling through, e.g., pc=(regF!=0)?label:pc+4];
- the last instruction which sets regF should be: regF=COMPARE(regC,X), where X is a constant, or maybe a register, which is not changed inside the loop;
- only one instruction modifies regC inside the loop (others can read regC, but not write it), and it should simply adjust it by a constant: regC=regC+step, where step is a constant.
When a doloop is successfully scheduled by SMS, the number of iterations of the loop kernel is decreased by the number of stages in the schedule minus one, while the other iterations expand into the prologue and epilogue. For the newly supported loops such an approach can't be used, because some instructions can use the count register (regC). Instead, the final register value X in the compare instruction regF=COMPARE(regC,X) is changed to another value Y according to the stage this instruction is scheduled in (Y = X - stage * step) [making sure this does not underflow, i.e. that the number of iterations is no less than the stage; you've addressed this towards the end below]. The main difference from the doloop case is that regC can be used by some instructions in the loop body. That's why we are unable to simply adjust regC's initial value, but have to keep its value correct on each particular iteration. So, we change the comparison instruction accordingly.
An example:

int a[100];
int main()
{
  int i;
  for (i = 85; i > 12; i -= 5)
    a[i] = i * i;
  return a[15] - 225;
}

ARM assembler with -O2 -fno-auto-inc-dec:

	ldr r0, .L5
	mov r3, #85
	mov r2, r0
.L2:
	mul r1, r3, r3
	sub r3, r3, #5
	cmp r3, #10
	str r1, [r2, #340]
	sub r2, r2, #20
	bne .L2
	ldr r0, [r0, #60]
	sub r0, r0, #225
	bx lr
.L5:
	.word a

The loop body is executed 15 times. When compiling with SMS, it finds a schedule with ii=7, stage_count=3 and the following times:

Stage Time Insn
0     5    mul r1, r3, r3
1     10   sub r3, r3, #5
1     11   cmp r3, #10
1     11   str r1, [r2, #340]
1     13   bne .L2
2     16   sub r2, r2, #20

[branch is not scheduled last?]

To make the new schedule correct the loop body should be executed 14 times [the loop itself should execute 13 times] and we change the compare instruction regF=COMPARE(regC,X) to regF=COMPARE(regC,Y), where Y = X - stage * step. In our example regC is r3, X is 10, step = -5, and the compare instruction is scheduled on stage 1, so it should be Y = 10 - 1 * (-5) = 15. [Right. In general, if the compare is on stage s (starting from 0), it will be executed s times in the epilog, so it should exit the loop upon reaching Y = X - s * step.]
So, after SMS it looks like:

	ldr r0, .L5
	mov r3, #85
	mov r2, r0
;; prologue
	mul r1, r3, r3       ;; from stage 0, first iteration
	sub r3, r3, #5       ;; 3 insns from stage 1, first iteration
	cmp r3, #10
	str r1, [r2, #340]
	mul r1, r3, r3       ;; from stage 0, second iteration
;; body
.L2:
	sub r3, r3, #5
	sub r2, r2, #20
	cmp r3, #15          ;; new value to compare with is Y=15
	str r1, [r2, #340]
	mul r1, r3, r3
	bne .L2
;; epilogue
	sub r2, r2, #20      ;; from stage 2, pre-last iteration
	sub r3, r3, #5       ;; 3 insns from stage 1, last iteration
	cmp r3, #10
	str r1, [r2, #340]
	sub r2, r2, #20      ;; from stage 2, last iteration
	ldr r0, [r0, #60]
	sub r0, r0, #225
	bx lr
.L5:
	.word a

Real ARM assembler with SMS (after some optimizations and without dead code):

	mov r3, #85
	ldr r0, .L8
	mul r1, r3, r3
	sub r3, r3, #5
	mov r2, r0
	str r1, [r0, #340]
	mul r1, r3, r3
.L2:
	sub r3, r3, #5
	sub r2, r2, #20
	cmp r3, #15
	str r1, [r2, #340]
	mul r1, r3, r3
	bne .L2
	str r1, [r2, #320]
	ldr r0, [r0, #60]
	sub r0, r0, #225
	bx lr
.L8:
	.word a
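The bound adjustment in the example can be sanity-checked numerically. This little model is mine, not part of the patch; it mirrors the ARM loop's sub/cmp/bne sequence and the Y = X - stage * step formula:

```c
/* Number of times the body of  do { c += step; } while (c != bound);
   executes, mirroring the decrement-then-compare loop control.  */
int
trips (int start, int bound, int step)
{
  int n = 0, c = start;
  do
    {
      c += step;
      n++;
    }
  while (c != bound);
  return n;
}

/* Adjusted compare bound for a compare scheduled on the given stage:
   Y = X - stage * step.  */
int
adjusted_bound (int x, int stage, int step)
{
  return x - stage * step;
}
```

For the example above, adjusted_bound (10, 1, -5) gives Y = 15; trips (85, 10, -5) gives the 15 executions of the original body, while trips (80, 15, -5) gives the 13 kernel iterations once one sub has moved into the prologue and the compare bound has become 15.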
Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]
On 6 October 2011 18:52, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: Michael K. Edwards points out in PR/48126 that the sync is in the wrong place relative to the branch target of the compare, since the load could float up beyond the ldrex.

	PR target/48126
	* config/arm/arm.c (arm_output_sync_loop): Move label before barrier

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
 	}
     }

-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in cmp failure case we still get
+     a barrier to stop subsequent loads floating upwards past the ldrex
+     pr/48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }

 static rtx

OK. Ramana
Re: [Patch 4/5] ARM 64 bit sync atomic operations [V3]
On 6 October 2011 18:54, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: Add ARM 64bit sync helpers for use on older ARMs. Based on 32bit versions but with check for sufficiently new kernel version. gcc/ * config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c) * config/arm/linux-atomic.c: Change comment to point to 64bit version (SYNC_LOCK_RELEASE): Instantiate 64bit version. * config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c OK. Ramana diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c new file mode 100644 index 000..6966e66 --- /dev/null +++ b/gcc/config/arm/linux-atomic-64bit.c @@ -0,0 +1,166 @@ +/* 64bit Linux-specific atomic operations for ARM EABI. + Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc. + Based on linux-atomic.c + + 64 bit additions david.gilb...@linaro.org + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +http://www.gnu.org/licenses/. */ + +/* 64bit helper functions for atomic operations; the compiler will + call these when the code is compiled for a CPU without ldrexd/strexd. + (If the CPU had those then the compiler inlines the operation). 
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoneously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+				    const long long* newval,
+				    long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+    {
+      const char err[] = "A newer kernel is required to run this binary. "
+			 "(__kernel_cmpxchg64 helper)\n";
+      /* At this point we need a way to crash with some information
+	 for the user - I'm not sure I can rely on much else being
+	 available at this point, so do the same as generic-morestack.c
+	 write () and abort ().  */
+      __write (2 /* stderr.  */, err, sizeof (err));
+      abort ();
+    }
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+      __attribute__ ((used, section (".init_array"))) = {
+  __check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)		\
+  long long HIDDEN					\
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val) \
+  {							\
+    int failure;					\
+    long long tmp,tmp2;					\
+							\
+    do {						\
+      tmp = *ptr;					\
+      tmp2 = PFX_OP (tmp INF_OP val);			\
+      failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);	\
+    } while (failure != 0);				\
+							\
+    return tmp;						\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,    , |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both
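The shape of the retry loop that FETCH_AND_OP_WORD64 expands to can be sketched with GCC's __sync_bool_compare_and_swap builtin standing in for __kernel_cmpxchg64 (an illustrative substitution on my part: the real helper is a fixed entry point in the kernel's vector page, not a compiler builtin):

```c
/* Snapshot the old value, compute the new one, and attempt a
   compare-and-swap; retry if another thread updated *ptr in between.  */
long long
fetch_and_add_8 (long long *ptr, long long val)
{
  long long tmp;
  do
    tmp = *ptr;
  while (!__sync_bool_compare_and_swap (ptr, tmp, tmp + val));
  return tmp;   /* __sync_fetch_and_* returns the pre-update value */
}
```

The same loop with a different operator in place of + gives each of the generated __sync_fetch_and_*_8 helpers.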
Re: RFC: ARM: Add comments to emitted .eabi_attribute directives
Any objections to this version of the patch? Fine with me, though I think it's worthwhile to have such comments without -dA; but that's my personal impression. I don't care either way. cheers Ramana

Cheers Nick

gcc/ChangeLog
2011-10-05  Nick Clifton  ni...@redhat.com

	* config/arm/arm.c (EMIT_EABI_ATTRIBUTE): New macro.  Used to emit
	a .eabi_attribute assembler directive, possibly with a comment
	attached.
	(asm_file_start): Use the new macro.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 179554)
+++ gcc/config/arm/arm.c	(working copy)
@@ -22243,6 +22243,21 @@
   asm_fprintf (stream, "%U%s", name);
 }

+/* This macro is used to emit an EABI tag and its associated value.
+   We emit the numerical value of the tag in case the assembler does not
+   support textual tags.  (Eg gas prior to 2.20).  If requested we include
+   the tag name in a comment so that anyone reading the assembler output
+   will know which tag is being set.  */
+#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL) \
+  do \
+    { \
+      asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \
+      if (flag_verbose_asm || flag_debug_asm) \
+	asm_fprintf (asm_out_file, "\t%s " #NAME, ASM_COMMENT_START); \
+      asm_fprintf (asm_out_file, "\n"); \
+    } \
+  while (0)
+
 static void
 arm_file_start (void)
 {
@@ -22274,9 +22289,9 @@
       if (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
 	{
 	  if (TARGET_HARD_FLOAT)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 27, 3\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3);
 	  if (TARGET_HARD_FLOAT_ABI)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 28, 1\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1);
 	}
     }
   asm_fprintf (asm_out_file, "\t.fpu %s\n", fpu_name);
@@ -22285,31 +22300,24 @@
      are used.  However we don't have any easy way of figuring this out.
      Conservatively record the setting that would have been used.  */

-  /* Tag_ABI_FP_rounding.  */
   if (flag_rounding_math)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 19, 1\n");
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1);
+
   if (!flag_unsafe_math_optimizations)
     {
-      /* Tag_ABI_FP_denomal.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 20, 1\n");
-      /* Tag_ABI_FP_exceptions.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 21, 1\n");
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1);
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1);
     }
-  /* Tag_ABI_FP_user_exceptions.  */
   if (flag_signaling_nans)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 22, 1\n");
-  /* Tag_ABI_FP_number_model.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 23, %d\n",
-	       flag_finite_math_only ? 1 : 3);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1);

-  /* Tag_ABI_align8_needed.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 24, 1\n");
-  /* Tag_ABI_align8_preserved.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 25, 1\n");
-  /* Tag_ABI_enum_size.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 26, %d\n",
-	       flag_short_enums ? 1 : 2);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23,
+		       flag_finite_math_only ? 1 : 3);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2);
+
   /* Tag_ABI_optimization_goals.  */
   if (optimize_size)
     val = 4;
@@ -22319,16 +22327,12 @@
     val = 1;
   else
     val = 6;
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val);

-  /* Tag_CPU_unaligned_access.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
-	       unaligned_access);
+  EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access);

-  /* Tag_ABI_FP_16bit_format.  */
   if (arm_fp16_format)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
-		 (int)arm_fp16_format);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_16bit_format, 38, (int) arm_fp16_format);

   if (arm_lang_output_object_attributes_hook)
     arm_lang_output_object_attributes_hook();
Re: [Patch 2/5] ARM 64 bit sync atomic operations [V3]
On 6 October 2011 18:52, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: Michael K. Edwards points out in PR/48126 that the sync is in the wrong place relative to the branch target of the compare, since the load could float up beyond the ldrex.

	PR target/48126
	* config/arm/arm.c (arm_output_sync_loop): Move label before barrier

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
 	}
     }

-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in cmp failure case we still get
+     a barrier to stop subsequent loads floating upwards past the ldrex
+     pr/48126.  */

Just one minor nit I just noticed. Please correct this to PR 48126 in the comment rather than pr/48126. Otherwise OK. Ramana
Re: [Patch 3/5] ARM 64 bit sync atomic operations [V3]
On 6 October 2011 18:53, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: Add support for ARM 64bit sync intrinsics. gcc/ * arm.c (arm_output_ldrex): Support ldrexd. (arm_output_strex): Support strexd. (arm_output_it): New helper to output it in Thumb2 mode only. (arm_output_sync_loop): Support DI mode, Change comment to not support const_int. (arm_expand_sync): Support DI mode. * arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD. * iterators.md (NARROW): move from sync.md. (QHSD): New iterator for all current ARM integer modes. (SIDI): New iterator for SI and DI modes only. * sync.md (sync_predtab): New mode_attr (sync_compare_and_swapsi): Fold into sync_compare_and_swapmode (sync_lock_test_and_setsi): Fold into sync_lock_test_and_setsimode (sync_sync_optabsi): Fold into sync_sync_optabmode (sync_nandsi): Fold into sync_nandmode (sync_new_sync_optabsi): Fold into sync_new_sync_optabmode (sync_new_nandsi): Fold into sync_new_nandmode (sync_old_sync_optabsi): Fold into sync_old_sync_optabmode (sync_old_nandsi): Fold into sync_old_nandmode (sync_compare_and_swapmode): Support SI DI (sync_lock_test_and_setmode): Likewise (sync_sync_optabmode): Likewise (sync_nandmode): Likewise (sync_new_sync_optabmode): Likewise (sync_new_nandmode): Likewise (sync_old_sync_optabmode): Likewise (sync_old_nandmode): Likewise (arm_sync_compare_and_swapsi): Turn into iterator on SI DI (arm_sync_lock_test_and_setsi): Likewise (arm_sync_new_sync_optabsi): Likewise (arm_sync_new_nandsi): Likewise (arm_sync_old_sync_optabsi): Likewise (arm_sync_old_nandsi): Likewise (arm_sync_compare_and_swapmode NARROW): use sync_predtab, fix indent (arm_sync_lock_test_and_setsimode NARROW): Likewise (arm_sync_new_sync_optabmode NARROW): Likewise (arm_sync_new_nandmode NARROW): Likewise (arm_sync_old_sync_optabmode NARROW): Likewise (arm_sync_old_nandmode NARROW): Likewise OK . Please commit this by Friday if no one else objects. cheers Ramana
Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]
On 6 October 2011 18:54, Dr. David Alan Gilbert david.gilb...@linaro.org wrote: Test support for ARM 64bit sync intrinsics. gcc/testsuite/ * gcc.dg/di-longlong64-sync-1.c: New test. * gcc.dg/di-sync-multithread.c: New test. * gcc.target/arm/di-longlong64-sync-withhelpers.c: New test. * gcc.target/arm/di-longlong64-sync-withldrexd.c: New test. * lib/target-supports.exp: (arm_arch_*_ok): Series of effective-target tests for v5, v6, v6k, and v7-a, and add-options helpers. (check_effective_target_arm_arm_ok): New helper. (check_effective_target_sync_longlong): New helper. I would like one of the testsuite maintainers to have a second look at this. I'm not confident about my dejagnu foo to fully review this. Ramana
RE: [arm-embedded] Tune loop unrolling for cortex-m
-Original Message- From: Hans-Peter Nilsson [mailto:h...@bitrange.com] Sent: Wednesday, October 12, 2011 06:57 To: Joey Ye Cc: gcc-patches@gcc.gnu.org Subject: Re: [arm-embedded] Tune loop unrolling for cortex-m On Wed, 21 Sep 2011, Joey Ye wrote: Committed in ARM/embedded-4_6-branch. 2011-09-21 Jiangning Liu jiangning@arm.com Tune loop unrolling for cortex-m * config/arm/arm-cores.def (cortex-m0): Change to new tune cortex_v6m. (cortex-m1): Likewise. * config/arm/arm-protos.h (max_unroll_times): New. * config/arm/arm.c (arm_default_unroll_times): New. (arm_cortex_m_unroll_times): New. (arm_cortex_v6m_tune): New. (arm_slowmul_tune): Add max_unroll_times function pointer. (arm_fastmul_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a9_tune, arm_cortex_v7m_tune, arm_cortex_v6m_tune, arm_fa726te_tune): Likewise. (arm_option_override): Enable loop unroll for all all M class Cores, if optimization level is = 1. Shouldn't this kind of stuff get into trunk as well? Sure. Working on it. Thanks - Joey
Re: [Patch 5/5] ARM 64 bit sync atomic operations [V3]
On Oct 6, 2011, at 10:54 AM, Dr. David Alan Gilbert wrote: Test support for ARM 64bit sync intrinsics. Ok. Watch for any fallout on non-arm systems. I'd always invite people who think they know the best way to test volatile to chime in. There is the new infrastructure to test multi core synch issues with gdb trickery. As you want more beef, you can consider it. I'll note that I do sometimes wonder if this type of code isn't better handled in #if feature tests inside the testcases themselves.
Re: [google] record compiler options to .note sections
Attached is the new patch. Bootstrapped on x86_64, no regressions. gcc/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com Add a flag (-frecord-gcc-switches-in-elf) to record compiler command line options to .gnu.switches.text sections of the object file. * coverage.c (write_opts_to_asm): Write the options to .gnu.switches.text sections. * common.opt: Ditto. * opts.h: Ditto. gcc/c-family/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * c-opts.c (c_common_parse_file): Write the options to .gnu.switches.text sections. gcc/testsuite/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * gcc.dg/record-gcc-switches-in-elf-1.c: New test. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 179836) +++ gcc/doc/invoke.texi (working copy) @@ -391,6 +391,7 @@ -fpmu-profile-generate=@var{pmuoption} @gol -fpmu-profile-use=@var{pmuoption} @gol -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol +-frecord-gcc-switches-in-elf@gol -freorder-blocks-and-partition -freorder-functions @gol -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol -fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol @@ -8170,6 +8171,11 @@ number of times it is called. The params variable note-cgraph-section-edge-threshold can be used to only list edges above a certain threshold. + +@item -frecord-gcc-switches-in-elf +@opindex frecord-gcc-switches-in-elf +Record the command line options in the .gnu.switches.text elf section for sample +based LIPO to do module grouping. 
@end table The following options control compiler behavior regarding floating Index: gcc/c-family/c-opts.c === --- gcc/c-family/c-opts.c (revision 179836) +++ gcc/c-family/c-opts.c (working copy) @@ -1109,6 +1109,8 @@ for (;;) { c_finish_options (); + if (flag_record_gcc_switches_in_elf i == 0) + write_opts_to_asm (); pch_init (); set_lipo_c_parsing_context (parse_in, i, verbose); push_file_scope (); Index: gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c === --- gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) +++ gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile} */ +/* { dg-options -frecord-gcc-switches-in-elf -Dtest -dA } */ + +void foobar(int); + +void +foo (void) +{ + int i; + for (i = 0; i 100; i++) +{ + foobar(i); +} +} + +/* { dg-final { scan-assembler-times Dtest 1 } } */ Index: gcc/opts.h === --- gcc/opts.h (revision 179836) +++ gcc/opts.h (working copy) @@ -381,4 +381,5 @@ extern void set_struct_debug_option (struct gcc_options *opts, location_t loc, const char *value); +extern void write_opts_to_asm (void); #endif Index: gcc/coverage.c === --- gcc/coverage.c (revision 179836) +++ gcc/coverage.c (working copy) @@ -55,6 +55,7 @@ #include diagnostic-core.h #include intl.h #include l-ipo.h +#include dwarf2asm.h #include gcov-io.h #include gcov-io.c @@ -2146,4 +2147,69 @@ return 0; } +/* Write command line options to the .note section. */ + +void +write_opts_to_asm (void) +{ + size_t i; + cpp_dir *quote_paths, *bracket_paths, *pdir; + struct str_list *pdef, *pinc; + int num_quote_paths = 0; + int num_bracket_paths = 0; + + get_include_chains (quote_paths, bracket_paths); + + /* Write quote_paths to ASM section. 
*/ + switch_to_section (get_section (.gnu.switches.text.quote_paths, + SECTION_DEBUG, NULL)); + for (pdir = quote_paths; pdir; pdir = pdir-next) +{ + if (pdir == bracket_paths) + break; + num_quote_paths++; +} + dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL); + dw2_asm_output_data_uleb128 (num_quote_paths, NULL); + for (pdir = quote_paths; pdir; pdir = pdir-next) +{ + if (pdir == bracket_paths) + break; + dw2_asm_output_nstring (pdir-name, (size_t)-1, NULL); +} + + /* Write bracket_paths to ASM section. */ + switch_to_section (get_section (.gnu.switches.text.bracket_paths, + SECTION_DEBUG, NULL)); + for (pdir = bracket_paths; pdir; pdir = pdir-next) +num_bracket_paths++; + dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL); + dw2_asm_output_data_uleb128 (num_bracket_paths, NULL); + for (pdir = bracket_paths; pdir; pdir = pdir-next) +dw2_asm_output_nstring (pdir-name, (size_t)-1, NULL); + + /* Write cpp_defines to ASM section. */ + switch_to_section (get_section
Re: [google] record compiler options to .note sections
ok. David On Tue, Oct 11, 2011 at 9:51 PM, Dehao Chen de...@google.com wrote: Attached is the new patch. Bootstrapped on x86_64, no regressions. gcc/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com Add a flag (-frecord-gcc-switches-in-elf) to record compiler command line options to .gnu.switches.text sections of the object file. * coverage.c (write_opts_to_asm): Write the options to .gnu.switches.text sections. * common.opt: Ditto. * opts.h: Ditto. gcc/c-family/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * c-opts.c (c_common_parse_file): Write the options to .gnu.switches.text sections. gcc/testsuite/ChangeLog.google-4_6: 2011-10-08 Dehao Chen de...@google.com * gcc.dg/record-gcc-switches-in-elf-1.c: New test. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 179836) +++ gcc/doc/invoke.texi (working copy) @@ -391,6 +391,7 @@ -fpmu-profile-generate=@var{pmuoption} @gol -fpmu-profile-use=@var{pmuoption} @gol -freciprocal-math -fregmove -frename-registers -freorder-blocks @gol +-frecord-gcc-switches-in-elf@gol -freorder-blocks-and-partition -freorder-functions @gol -frerun-cse-after-loop -freschedule-modulo-scheduled-loops @gol -fripa -fripa-disallow-asm-modules -fripa-disallow-opt-mismatch @gol @@ -8170,6 +8171,11 @@ number of times it is called. The params variable note-cgraph-section-edge-threshold can be used to only list edges above a certain threshold. + +@item -frecord-gcc-switches-in-elf +@opindex frecord-gcc-switches-in-elf +Record the command line options in the .gnu.switches.text elf section for sample +based LIPO to do module grouping. 
@end table The following options control compiler behavior regarding floating Index: gcc/c-family/c-opts.c === --- gcc/c-family/c-opts.c (revision 179836) +++ gcc/c-family/c-opts.c (working copy) @@ -1109,6 +1109,8 @@ for (;;) { c_finish_options (); + if (flag_record_gcc_switches_in_elf i == 0) + write_opts_to_asm (); pch_init (); set_lipo_c_parsing_context (parse_in, i, verbose); push_file_scope (); Index: gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c === --- gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) +++ gcc/testsuite/gcc.dg/record-gcc-switches-in-elf-1.c (revision 0) @@ -0,0 +1,16 @@ +/* { dg-do compile} */ +/* { dg-options -frecord-gcc-switches-in-elf -Dtest -dA } */ + +void foobar(int); + +void +foo (void) +{ + int i; + for (i = 0; i 100; i++) + { + foobar(i); + } +} + +/* { dg-final { scan-assembler-times Dtest 1 } } */ Index: gcc/opts.h === --- gcc/opts.h (revision 179836) +++ gcc/opts.h (working copy) @@ -381,4 +381,5 @@ extern void set_struct_debug_option (struct gcc_options *opts, location_t loc, const char *value); +extern void write_opts_to_asm (void); #endif Index: gcc/coverage.c === --- gcc/coverage.c (revision 179836) +++ gcc/coverage.c (working copy) @@ -55,6 +55,7 @@ #include diagnostic-core.h #include intl.h #include l-ipo.h +#include dwarf2asm.h #include gcov-io.h #include gcov-io.c @@ -2146,4 +2147,69 @@ return 0; } +/* Write command line options to the .note section. */ + +void +write_opts_to_asm (void) +{ + size_t i; + cpp_dir *quote_paths, *bracket_paths, *pdir; + struct str_list *pdef, *pinc; + int num_quote_paths = 0; + int num_bracket_paths = 0; + + get_include_chains (quote_paths, bracket_paths); + + /* Write quote_paths to ASM section. 
  */
+  switch_to_section (get_section (".gnu.switches.text.quote_paths",
+				  SECTION_DEBUG, NULL));
+  for (pdir = quote_paths; pdir; pdir = pdir->next)
+    {
+      if (pdir == bracket_paths)
+	break;
+      num_quote_paths++;
+    }
+  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
+  dw2_asm_output_data_uleb128 (num_quote_paths, NULL);
+  for (pdir = quote_paths; pdir; pdir = pdir->next)
+    {
+      if (pdir == bracket_paths)
+	break;
+      dw2_asm_output_nstring (pdir->name, (size_t)-1, NULL);
+    }
+
+  /* Write bracket_paths to ASM section.  */
+  switch_to_section (get_section (".gnu.switches.text.bracket_paths",
+				  SECTION_DEBUG, NULL));
+  for (pdir = bracket_paths; pdir; pdir = pdir->next)
+    num_bracket_paths++;
+  dw2_asm_output_nstring (in_fnames[0], (size_t)-1, NULL);
+  dw2_asm_output_data_uleb128 (num_bracket_paths, NULL);
+  for (pdir =
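For reference, dw2_asm_output_data_uleb128 above emits the path count in ULEB128 form: 7 value bits per byte, lowest group first, with the high bit set on every byte except the last. A standalone sketch of that encoding (helper names are illustrative, not from the patch):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode an unsigned value as ULEB128: 7 bits per byte, low-order
// group first; bit 7 marks "more bytes follow".
std::vector<uint8_t> encode_uleb128(uint64_t value)
{
  std::vector<uint8_t> out;
  do
    {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;  // continuation bit
      out.push_back(byte);
    }
  while (value != 0);
  return out;
}

// Decode a ULEB128 sequence back into the original value.
uint64_t decode_uleb128(const std::vector<uint8_t>& bytes)
{
  uint64_t result = 0;
  int shift = 0;
  for (uint8_t byte : bytes)
    {
      result |= uint64_t(byte & 0x7f) << shift;
      shift += 7;
      if ((byte & 0x80) == 0)
        break;  // last byte reached
    }
  return result;
}
```

So a small count like the number of quote paths occupies a single byte, while larger values grow one byte per 7 bits.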
Re: [C++-11] User defined literals
On 10/11/2011 12:57 PM, Jason Merrill wrote:

 On 10/11/2011 12:55 PM, Jason Merrill wrote:
 On 10/09/2011 07:19 PM, Ed Smith-Rowland wrote:
 Does cp_parser_identifier (parser) *not* consume the identifier token? I'm pretty sure it does.

It does.

 Does it work to only complain if !cp_parser_parsing_tentatively? I suppose not, if you got no complaints with cp_parser_error.
 Jason

cp_parser_operator (function_id) is simply run twice in cp_parser_unqualified_id: once inside cp_parser_template_id, called at parser.c:4515, and once directly inside cp_parser_unqualified_id at parser.c:4525. cp_parser_template_id never succeeds with literal operator templates. I find that curious, but I haven't looked very hard, and the constructs do get parsed somehow.
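For context, the two forms being parsed here are the ordinary literal operator and the literal operator template; a minimal C++11 sketch (suffix names are illustrative, not from the patch):

```cpp
#include <cassert>

// Ordinary literal operator: selected for a floating literal
// such as 2.0_kg; the cooked value arrives as the parameter.
constexpr long double operator"" _kg(long double v)
{
  return v * 1000.0L;  // kilograms to grams, say
}

// Literal operator template: the digits of an integer literal
// arrive as non-type template arguments, one char per digit.
// This is the form cp_parser_template_id fails to recognize.
template <char... Cs>
constexpr int operator"" _len()
{
  return sizeof...(Cs);  // number of digit characters
}
```

The template form is the one that exercises the template-id path in cp_parser_unqualified_id, since the suffix identifier names a template rather than a plain function.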
Re: PR c++/30195
2011/10/11 Jason Merrill ja...@redhat.com:

 On 10/10/2011 03:56 PM, Fabien Chêne wrote:
 It tried to add the target declaration of a USING_DECL in the method_vec of the class where the USING_DECL is declared. Thus, I copied the target decl, adjusted its access, and then called add_method with the target decl.

 Copying the decl is unlikely to do what we want, I think. Does putting the target decl directly into the method vec work?

Unfortunately not: it ends up with the same error, an undefined reference. Furthermore, I don't think it is the right approach, since the access may be different between the member function and the using declaration...

 Never mind. If not, perhaps lookup_fnfields_1 should look through the field list for function USING_DECLs.

That's what I tried first, and it works. Though, I guess you mean that lookup_field_r should perform an additional lookup if lookup_fnfields_1 does not find anything. The attached patch implements that, and finally fixes c++/26256, c++/25994, c++/30195, and c++/6936. Tested on x86_64-unknown-linux-gnu without new regressions.

gcc/ChangeLog
2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

	PR c++/6936
	PR c++/25994
	PR c++/26256
	PR c++/30195
	* dbxout.c (dbxout_type_fields): Ignore using declarations.

gcc/testsuite/ChangeLog
2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

	PR c++/6936
	PR c++/25994
	PR c++/26256
	PR c++/30195
	* g++.dg/lookup/using23.C: New.
	* g++.dg/lookup/using24.C: New.
	* g++.dg/lookup/using25.C: New.
	* g++.dg/lookup/using26.C: New.
	* g++.dg/lookup/using27.C: New.
	* g++.dg/lookup/using28.C: New.
	* g++.dg/lookup/using29.C: New.
	* g++.dg/lookup/using30.C: New.
	* g++.dg/lookup/using31.C: New.
	* g++.dg/lookup/using32.C: New.
	* g++.dg/lookup/using33.C: New.
	* g++.dg/lookup/using34.C: New.
	* g++.dg/lookup/using35.C: New.
	* g++.dg/lookup/using36.C: New.
	* g++.dg/lookup/using37.C: New.
	* g++.dg/lookup/using38.C: New.
	* g++.dg/debug/using4.C: New.
	* g++.dg/debug/using5.C: New.
	* g++.dg/cpp0x/forw_enum10.C: New.
	* g++.old-deja/g++.other/using1.C: Adjust.
	* g++.dg/template/using2.C: Likewise.

gcc/cp/ChangeLog
2011-10-11  Fabien Chêne  fab...@gcc.gnu.org

	PR c++/6936
	PR c++/25994
	PR c++/26256
	PR c++/30195
	* search.c (lookup_field_1): Get rid of the comment saying that
	USING_DECL should not be returned, and actually return a USING_DECL
	if appropriate.
	(lookup_field_r): Call lookup_fnfields_slot with LOOKUP_USING set
	to true instead of calling lookup_fnfields_1.
	(lookup_fnfields_slot): Add a new parameter LOOKUP_USING, and
	perform an additional lookup for USING_DECLs targeting functions
	if LOOKUP_USING is set to true.
	* semantics.c (finish_member_declaration): Remove the check that
	prevents USING_DECLs from being verified by pushdecl_class_level.
	* typeck.c (build_class_member_access_expr): Handle USING_DECLs.
	* class.c (check_field_decls): Keep using declarations.
	(add_method): Remove a wrong diagnostic about conflicting using
	declarations.
	(type_has_move_assign): Call lookup_fnfields_slot with
	LOOKUP_USING set to false.
	* parser.c (cp_parser_nonclass_name): Handle USING_DECLs.
	* decl.c (start_enum): Call xref_tag whenever possible.
	* name-lookup.c (strip_using_decl): New function.
	(supplement_binding_1): Call strip_using_decl on decl and bval.
	Perform most of the checks with USING_DECLs stripped.  Also check
	that the target decl and the target bval do not refer to the same
	declaration.  Allow pushing an enum multiple times in a template
	class.
	(push_class_level_binding): Call strip_using_decl on decl and
	bval.  Perform most of the checks with USING_DECLs stripped.
	Return true if both decl and bval refer to USING_DECLs and are
	dependent.
	* call.c (build_user_type_conversion_1): Call lookup_fnfields_slot
	with LOOKUP_USING set to false.

-- 
Fabien

using.patch
Description: Binary data
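A minimal sketch of the source-level pattern these PRs concern (a member using-declaration whose target functions must be found by class member lookup); the class and function names here are illustrative, not taken from the testcases:

```cpp
#include <cassert>

struct Base
{
  int f(int x) { return x + 1; }
};

struct Derived : Base
{
  // The USING_DECL: brings Base::f(int) into Derived's scope,
  // where it overloads with (rather than being hidden by) f(double).
  using Base::f;
  int f(double x) { return static_cast<int>(x) * 2; }
};
```

Lookup of `f` in `Derived` must see both the locally declared `f(double)` and the `f(int)` named by the using-declaration; losing the USING_DECL during lookup is what produced the undefined-reference and wrong-overload failures discussed above.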