RE: CR16 Port addition
PING 3: For review

Hi,

Please review the attached patch; explanations for the earlier communication are below.

NOTE: From now on, Jayant (jayant.so...@kpitcummins.com) will be posting the patches related to CR16. Please feel free to contact him if you need any information related to the posted patches.

Thanks in advance,
Sumanth G, PUNE (India).

= Begin Message ==

-----Original Message-----
From: Sumanth Gundapaneni
Sent: Monday, May 30, 2011 6:57 PM
To: 'Joseph Myers'
Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jayant R. Sonar
Subject: RE: CR16 Port addition

Hi Joseph and Richard,

Thank you for your time in reviewing the CR16 port; once again I am grateful for your valuable suggestions. Please find attached the patch "cr16-gcc.patch", which contains the modifications suggested by Joseph in his previous mail.

For your kind information, here are the recent posts of the CR16 patches:

Initial post : http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01676.html
Second post  : http://gcc.gnu.org/ml/gcc-patches/2011-01/msg00803.html
Third post   : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00624.html
Fourth post  : http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01441.html

Please review the patch and let me know if any further modifications are needed. Joseph, please see my responses to your comments below.

> Please remove the remaining top level configure changes from the patch.

The top level configure changes have been removed as advised.

> Now you seem to have a stray change to the generated file
> libstdc++-v3/configure. Again, you can probably just eliminate this

The configure changes in libstdc++-v3 have been removed too; you can find this in the updated patch.

> and make the code in unwind-dw2-* use those macros,
> instead of having separate copies of the files.
A new file, unwind-helper.h, has been created in the libgcc/config/cr16 directory as per your suggestion, and a few macros defined in this new file are called from gcc/unwind-dw2.c. Please review the same and let me know of further modifications, if any. We have identified issues related to exception handling using prints in the unwind code, and will debug this further in the near future alongside further development of the GNU-based tools.

> Since you first submitted the port, a file contrib/config-list.mk

"cr16-elf" has been added to the config-list.mk file.

gcc/ChangeLog:
--
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config.gcc: Add cr16-* support.
	* doc/extend.texi: Document cr16 extensions.
	* doc/install.texi: Document cr16 install.
	* doc/invoke.texi: Document cr16 options.
	* doc/md.texi: Document cr16 constraints.
	* config/cr16/cr16.c: New file.
	* config/cr16/cr16.h: New file.
	* config/cr16/cr16.md: New file.
	* config/cr16/cr16.opt: New file.
	* config/cr16/cr16-libgcc.s: New file.
	* config/cr16/cr16-protos.h: New file.
	* config/cr16/divmodhi3.c: New file.
	* config/cr16/predicates.md: New file.
	* config/cr16/constraints.md: New file.
	* config/cr16/t-cr16: New file.

libgcc/ChangeLog:
-
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config.host: Add National Semiconductor CR16 target (cr16-*-*).
	* config/cr16/crti.S: New file.
	* config/cr16/crtlibid.S: New file.
	* config/cr16/crtn.S: New file.
	* config/cr16/t-cr16: New file.
	* config/cr16/unwind-helper.h: New file.

contrib/ChangeLog:
--
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config-list.mk: Add National Semiconductor CR16 target.

Thanks in advance,
Sumanth G, PUNE (India).

= End Message ==
[committed] Fix ada build on HP-UX 10.20
The patch below fixes the Ada bootstrap on HP-UX 10. Patch preapproved by Eric. Tested on HP-UX 10.20.

Dave
--
J. David Anglin	dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada	(613) 990-0752 (FAX: 952-6602)

2011-07-14  John David Anglin

	PR ada/46350
	* s-taprop-hpux-dce.adb (Abort_Task): Remove unnecessary cast.

Index: s-taprop-hpux-dce.adb
===
--- s-taprop-hpux-dce.adb	(revision 175188)
+++ s-taprop-hpux-dce.adb	(working copy)
@@ -888,8 +888,7 @@
       if T.Common.State = Interrupt_Server_Blocked_On_Event_Flag then
          System.Interrupt_Management.Operations.Interrupt_Self_Process
-           (System.Interrupt_Management.Interrupt_ID
-              (PIO.Get_Interrupt_ID (T)));
+           (PIO.Get_Interrupt_ID (T));
       end if;
    end Abort_Task;
[PATCH 5/9] dwarf2cfi: Remove dw_cfi_row_ref typedef.
--- gcc/dwarf2cfi.c | 23 +++ 1 files changed, 11 insertions(+), 12 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 745e137..51fb824 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -74,14 +74,13 @@ typedef struct GTY(()) dw_cfi_row_struct HOST_WIDE_INT args_size; } dw_cfi_row; -typedef dw_cfi_row *dw_cfi_row_ref; /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; /* The state of the first row of the FDE table, which includes the state provided by the CIE. */ -static GTY(()) dw_cfi_row_ref cie_cfi_row; +static GTY(()) dw_cfi_row *cie_cfi_row; static GTY(()) unsigned long dwarf2out_cfi_label_num; @@ -209,10 +208,10 @@ new_cfi (void) /* Return a newly allocated CFI row, with no defined data. */ -static dw_cfi_row_ref +static dw_cfi_row * new_cfi_row (void) { - dw_cfi_row_ref row = ggc_alloc_cleared_dw_cfi_row (); + dw_cfi_row *row = ggc_alloc_cleared_dw_cfi_row (); row->cfa.reg = INVALID_REGNUM; @@ -221,10 +220,10 @@ new_cfi_row (void) /* Return a copy of an existing CFI row. */ -static dw_cfi_row_ref -copy_cfi_row (dw_cfi_row_ref src) +static dw_cfi_row * +copy_cfi_row (dw_cfi_row *src) { - dw_cfi_row_ref dst = ggc_alloc_dw_cfi_row (); + dw_cfi_row *dst = ggc_alloc_dw_cfi_row (); *dst = *src; dst->reg_save = VEC_copy (dw_cfi_ref, gc, src->reg_save); @@ -235,7 +234,7 @@ copy_cfi_row (dw_cfi_row_ref src) /* Free an allocated CFI row. */ static void -free_cfi_row (dw_cfi_row_ref row) +free_cfi_row (dw_cfi_row *row) { if (row != NULL) { @@ -311,7 +310,7 @@ add_cfi_restore (unsigned reg) that the register column is no longer saved. 
*/ static void -update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi) +update_row_reg_save (dw_cfi_row *row, unsigned column, dw_cfi_ref cfi) { if (VEC_length (dw_cfi_ref, row->reg_save) <= column) VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1); @@ -466,10 +465,10 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_location *loc, dw_cfa_location *remember) } /* The current, i.e. most recently generated, row of the CFI table. */ -static dw_cfi_row_ref cur_row; +static dw_cfi_row *cur_row; /* The row state from a preceeding DW_CFA_remember_state. */ -static dw_cfi_row_ref remember_row; +static dw_cfi_row *remember_row; /* The register used for saving registers to the stack, and its offset from the CFA. */ @@ -2316,7 +2315,7 @@ dwarf2out_frame_debug (rtx insn, bool after_p) /* Emit CFI info to change the state from OLD_ROW to NEW_ROW. */ static void -change_cfi_row (dw_cfi_row_ref old_row, dw_cfi_row_ref new_row) +change_cfi_row (dw_cfi_row *old_row, dw_cfi_row *new_row) { size_t i, n_old, n_new, n_max; dw_cfi_ref cfi; -- 1.7.6
[PATCH 9/9] dwarf2cfi: Generate and connect traces.
This kinda-sorta corresponds to Bernd's 007-dw2cfi patch. Certainly the same concepts of splitting the instruction stream into extended basic blocks is the same. This patch does a bit better job with the documentation. Also, I'm a bit more explicit about matching things up with the similar code from the regular CFG routines. What's missing at this point is any attempt to use DW_CFA_remember_state. I've deferred that for the moment because it's easy to test the state change code across epilogues, whereas the shrink-wrapping code is not in this tree and section switching is difficult to force. --- gcc/dwarf2cfi.c | 792 --- 1 files changed, 398 insertions(+), 394 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 9aa1208..10e5740 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see #include "flags.h" #include "rtl.h" #include "function.h" +#include "basic-block.h" #include "dwarf2.h" #include "dwarf2out.h" #include "dwarf2asm.h" @@ -86,34 +87,30 @@ DEF_VEC_ALLOC_O (reg_saved_in_data, heap); /* Since we no longer have a proper CFG, we're going to create a facsimile of one on the fly while processing the frame-related insns. - We create dw_trace structures for each instruction trace beginning at - at a label following a barrier (or beginning of the function), and - ending at a barrier (or the end of the function). + We create dw_trace_info structures for each extended basic block beginning + and ending at a "save point". Save points are labels, barriers, certain + notes, and of course the beginning and end of the function. As we encounter control transfer insns, we propagate the "current" - row state across the edges to the starts of traces. If an edge goes - to a label that is not the start of a trace, we ignore it. This - assumes that previous compiler transformations were correct, and that - we will reach the same row state from any source. 
(We can perform some - limited validation of this assumption, but without the full CFG we - cannot be sure of full validation coverage. It is expensive, so we - only do so with checking enabled.) + row state across the edges to the starts of traces. When checking is + enabled, we validate that we propagate the same data from all sources. All traces are members of the TRACE_INFO array, in the order in which they appear in the instruction stream. - All labels are given an LUID that indexes the LABEL_INFO array. If - the label is the start of a trace, the TRACE pointer will be non-NULL - and point into the TRACE_INFO array. */ + All save points are present in the TRACE_INDEX hash, mapping the insn + starting a trace to the dw_trace_info describing the trace. */ typedef struct { - /* The label that begins the trace. This will be NULL for the first - trace beginning at function entry. */ - rtx label; + /* The insn that begins the trace. */ + rtx head; /* The row state at the beginning and end of the trace. */ - dw_cfi_row *enter_row, *exit_row; + dw_cfi_row *beg_row, *end_row; + + /* True if this trace immediately follows NOTE_INSN_SWITCH_TEXT_SECTIONS. */ + bool switch_sections; /* The following variables contain data used in interpreting frame related expressions. These are not part of the "real" row state as defined by @@ -147,24 +144,15 @@ typedef struct DEF_VEC_O (dw_trace_info); DEF_VEC_ALLOC_O (dw_trace_info, heap); -typedef struct -{ - dw_trace_info *trace; - -#ifdef ENABLE_CHECKING - dw_cfi_row *check_row; -#endif -} dw_label_info; +typedef dw_trace_info *dw_trace_info_ref; -DEF_VEC_O (dw_label_info); -DEF_VEC_ALLOC_O (dw_label_info, heap); +DEF_VEC_P (dw_trace_info_ref); +DEF_VEC_ALLOC_P (dw_trace_info_ref, heap); /* The variables making up the pseudo-cfg, as described above. 
*/ -#if 0 -static VEC (int, heap) *uid_luid; -static VEC (dw_label_info, heap) *label_info; static VEC (dw_trace_info, heap) *trace_info; -#endif +static VEC (dw_trace_info_ref, heap) *trace_work_list; +static htab_t trace_index; /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; @@ -189,9 +177,6 @@ static dw_trace_info *cur_trace; /* The current, i.e. most recently generated, row of the CFI table. */ static dw_cfi_row *cur_row; -/* The row state from a preceeding DW_CFA_remember_state. */ -static dw_cfi_row *remember_row; - /* We delay emitting a register save until either (a) we reach the end of the prologue or (b) the register is clobbered. This clusters register saves so that there are fewer pc advances. */ @@ -211,9 +196,6 @@ static VEC(queued_reg_save, heap) *queued_reg_saves; emitting this data, i.e. updating CUR_ROW, without async unwind. */ static HOST_WIDE_INT queued_args_size; -/* True if remember_state should be emitted before following CFI directive. */ -static bool
[PATCH 8/9] dwarf2cfi: Introduce dw_trace_info.
This patch only introduces the structure definition and adjusts the existing routines to use the new cur_trace global to access the variables that were moved into the structure. --- gcc/dwarf2cfi.c | 440 +-- 1 files changed, 266 insertions(+), 174 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 1d6413f..9aa1208 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -74,7 +74,98 @@ typedef struct GTY(()) dw_cfi_row_struct HOST_WIDE_INT args_size; } dw_cfi_row; - +/* The caller's ORIG_REG is saved in SAVED_IN_REG. */ +typedef struct GTY(()) reg_saved_in_data_struct { + rtx orig_reg; + rtx saved_in_reg; +} reg_saved_in_data; + +DEF_VEC_O (reg_saved_in_data); +DEF_VEC_ALLOC_O (reg_saved_in_data, heap); + +/* Since we no longer have a proper CFG, we're going to create a facsimile + of one on the fly while processing the frame-related insns. + + We create dw_trace structures for each instruction trace beginning at + at a label following a barrier (or beginning of the function), and + ending at a barrier (or the end of the function). + + As we encounter control transfer insns, we propagate the "current" + row state across the edges to the starts of traces. If an edge goes + to a label that is not the start of a trace, we ignore it. This + assumes that previous compiler transformations were correct, and that + we will reach the same row state from any source. (We can perform some + limited validation of this assumption, but without the full CFG we + cannot be sure of full validation coverage. It is expensive, so we + only do so with checking enabled.) + + All traces are members of the TRACE_INFO array, in the order in which + they appear in the instruction stream. + + All labels are given an LUID that indexes the LABEL_INFO array. If + the label is the start of a trace, the TRACE pointer will be non-NULL + and point into the TRACE_INFO array. */ + +typedef struct +{ + /* The label that begins the trace. 
This will be NULL for the first + trace beginning at function entry. */ + rtx label; + + /* The row state at the beginning and end of the trace. */ + dw_cfi_row *enter_row, *exit_row; + + /* The following variables contain data used in interpreting frame related + expressions. These are not part of the "real" row state as defined by + Dwarf, but it seems like they need to be propagated into a trace in case + frame related expressions have been sunk. */ + /* ??? This seems fragile. These variables are fragments of a larger + expression. If we do not keep the entire expression together, we risk + not being able to put it together properly. Consider forcing targets + to generate self-contained expressions and dropping all of the magic + interpretation code in this file. Or at least refusing to shrink wrap + any frame related insn that doesn't contain a complete expression. */ + + /* The register used for saving registers to the stack, and its offset + from the CFA. */ + dw_cfa_location cfa_store; + + /* A temporary register holding an integral value used in adjusting SP + or setting up the store_reg. The "offset" field holds the integer + value, not an offset. */ + dw_cfa_location cfa_temp; + + /* A set of registers saved in other registers. This is the inverse of + the row->reg_save info, if the entry is a DW_CFA_register. This is + implemented as a flat array because it normally contains zero or 1 + entry, depending on the target. IA-64 is the big spender here, using + a maximum of 5 entries. */ + VEC(reg_saved_in_data, heap) *regs_saved_in_regs; + +} dw_trace_info; + +DEF_VEC_O (dw_trace_info); +DEF_VEC_ALLOC_O (dw_trace_info, heap); + +typedef struct +{ + dw_trace_info *trace; + +#ifdef ENABLE_CHECKING + dw_cfi_row *check_row; +#endif +} dw_label_info; + +DEF_VEC_O (dw_label_info); +DEF_VEC_ALLOC_O (dw_label_info, heap); + +/* The variables making up the pseudo-cfg, as described above. 
*/ +#if 0 +static VEC (int, heap) *uid_luid; +static VEC (dw_label_info, heap) *label_info; +static VEC (dw_trace_info, heap) *trace_info; +#endif + /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; @@ -82,6 +173,8 @@ cfi_vec cie_cfi_vec; state provided by the CIE. */ static GTY(()) dw_cfi_row *cie_cfi_row; +static GTY(()) reg_saved_in_data *cie_return_save; + static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ @@ -90,6 +183,34 @@ static rtx add_cfi_insn; /* When non-null, add_cfi will add the CFI to this vector. */ static cfi_vec *add_cfi_vec; +/* The current instruction trace. */ +static dw_trace_info *cur_trace; + +/* The current, i.e. most recently generated, row of the CFI table. */ +static dw_cfi_row *cur_row; + +/* The row state from a preceeding DW_CFA_remember_state. */ +static dw_cfi_row *remember_row;
[RFC PATCH 0/9] CFG aware dwarf2 cfi generation
This finally brings us to something that can support shrink-wrapping.

As mentioned in the description of the last patch, this is 95% of what Bernd had in his last 007-dw2cfg patch, except for the remember/restore_state stuff. And hopefully with better comments.

This is the first version of this that has actually made it into stage3 bootstrap on x86_64, so it isn't well tested yet. This just happens to coincide with the end of my work day, and it's been a while since I've shared state, so I thought I'd post for overnight review.

The complete tree is available at

  git://repo.or.cz/gcc/rth.git rth/cfi-pass

r~

Richard Henderson (9):
  dwarf2cfi: Introduce a dw_cfi_row state.
  dwarf2cfi: Rename cfi_insn to add_cfi_insn.
  dwarf2cfi: Populate CUR_ROW->REG_SAVE.
  dwarf2cfi: Implement change_cfi_row.
  dwarf2cfi: Remove dw_cfi_row_ref typedef.
  dwarf2cfi: Convert queued_reg_save to a VEC.
  dwarf2cfi: Allocate reg_saved_in_data in the heap.
  dwarf2cfi: Introduce dw_trace_info.
  dwarf2cfi: Generate and connect traces.

 gcc/dwarf2cfi.c | 1672 ++-
 gcc/dwarf2out.c |  159 --
 gcc/dwarf2out.h |    5 +-
 3 files changed, 1030 insertions(+), 806 deletions(-)

--
1.7.6
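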
[PATCH 4/9] dwarf2cfi: Implement change_cfi_row.
Add a generic function to adjust cfi state from one row to another. Use this to implement text section switching. This will also be usable for arbitrary changes around a cfg for shrink-wrapping. --- gcc/dwarf2cfi.c | 376 +-- gcc/dwarf2out.c | 159 ++-- gcc/dwarf2out.h |3 +- 3 files changed, 290 insertions(+), 248 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 36fa7f8..745e137 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,28 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +static void +add_cfi_args_size (HOST_WIDE_INT size) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi->dw_cfi_opc = DW_CFA_GNU_args_size; + cfi->dw_cfi_oprnd1.dw_cfi_offset = size; + + add_cfi (cfi); +} + +static void +add_cfi_restore (unsigned reg) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi->dw_cfi_opc = (reg & ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore); + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg; + + add_cfi (cfi); +} + /* Perform ROW->REG_SAVE[COLUMN] = CFI. CFI may be null, indicating that the register column is no longer saved. */ @@ -474,64 +496,109 @@ cfa_equal_p (const dw_cfa_location *loc1, const dw_cfa_location *loc2) || loc1->base_offset == loc2->base_offset)); } -/* This routine does the actual work. The CFA is now calculated from - the dw_cfa_location structure. */ +/* Determine if two CFI operands are identical. 
*/ -static void -def_cfa_1 (dw_cfa_location *loc_p) +static bool +cfi_oprnd_equal_p (enum dw_cfi_oprnd_type t, dw_cfi_oprnd *a, dw_cfi_oprnd *b) { - dw_cfi_ref cfi; - dw_cfa_location loc = *loc_p; + switch (t) +{ +case dw_cfi_oprnd_unused: + return true; +case dw_cfi_oprnd_reg_num: + return a->dw_cfi_reg_num == b->dw_cfi_reg_num; +case dw_cfi_oprnd_offset: + return a->dw_cfi_offset == b->dw_cfi_offset; +case dw_cfi_oprnd_addr: + return (a->dw_cfi_addr == b->dw_cfi_addr + || strcmp (a->dw_cfi_addr, b->dw_cfi_addr) == 0); +case dw_cfi_oprnd_loc: + return loc_descr_equal_p (a->dw_cfi_loc, b->dw_cfi_loc); +} + gcc_unreachable (); +} - if (cfa_store.reg == loc.reg && loc.indirect == 0) -cfa_store.offset = loc.offset; +/* Determine if two CFI entries are identical. */ + +static bool +cfi_equal_p (dw_cfi_ref a, dw_cfi_ref b) +{ + enum dwarf_call_frame_info opc; + + /* Make things easier for our callers, including missing operands. */ + if (a == b) +return true; + if (a == NULL || b == NULL) +return false; + + /* Obviously, the opcodes must match. */ + opc = a->dw_cfi_opc; + if (opc != b->dw_cfi_opc) +return false; + + /* Compare the two operands, re-using the type of the operands as + already exposed elsewhere. */ + return (cfi_oprnd_equal_p (dw_cfi_oprnd1_desc (opc), +&a->dw_cfi_oprnd1, &b->dw_cfi_oprnd1) + && cfi_oprnd_equal_p (dw_cfi_oprnd2_desc (opc), + &a->dw_cfi_oprnd2, &b->dw_cfi_oprnd2)); +} + +/* The CFA is now calculated from NEW_CFA. Consider OLD_CFA in determining + what opcode to emit. Returns the CFI opcode to effect the change, or + NULL if NEW_CFA == OLD_CFA. */ + +static dw_cfi_ref +def_cfa_0 (dw_cfa_location *old_cfa, dw_cfa_location *new_cfa) +{ + dw_cfi_ref cfi; /* If nothing changed, no need to issue any call frame instructions. 
*/ - if (cfa_equal_p (&loc, &cur_row->cfa)) -return; + if (cfa_equal_p (old_cfa, new_cfa)) +return NULL; cfi = new_cfi (); - if (loc.reg == cur_row->cfa.reg && !loc.indirect && !cur_row->cfa.indirect) + if (new_cfa->reg == old_cfa->reg && !new_cfa->indirect && !old_cfa->indirect) { /* Construct a "DW_CFA_def_cfa_offset " instruction, indicating the CFA register did not change but the offset did. The data factoring for DW_CFA_def_cfa_offset_sf happens in output_cfi, or in the assembler via the .cfi_def_cfa_offset directive. */ - if (loc.offset < 0) + if (new_cfa->offset < 0) cfi->dw_cfi_opc = DW_CFA_def_cfa_offset_sf; else cfi->dw_cfi_opc = DW_CFA_def_cfa_offset; - cfi->dw_cfi_oprnd1.dw_cfi_offset = loc.offset; + cfi->dw_cfi_oprnd1.dw_cfi_offset = new_cfa->offset; } #ifndef MIPS_DEBUGGING_INFO /* SGI dbx thinks this means no offset. */ - else if (loc.offset == cur_row->cfa.offset - && cur_row->cfa.reg != INVALID_REGNUM - && !loc.indirect - && !cur_row->cfa.indirect) + else if (new_cfa->offset == old_cfa->offset + && old_cfa->reg != INVALID_REGNUM + && !new_cfa->indirect + && !old_cfa->indirect) { /* Construct a "DW_CFA_def_cfa_register " instruction, indicating the CFA register has changed to but the offset has not changed. */ cfi->dw_cfi_opc = DW_CFA_def_cfa_register; - cfi->dw_cfi_oprnd1.
[PATCH 1/9] dwarf2cfi: Introduce a dw_cfi_row state.
Use it instead of old_cfa, old_args_size, and cfa_remember variables. Remove the global cfa variable, as it was usually a duplicate of old_cfa and otherwise confusing. Always make a local copy of the cur_row->cfa variable before modification instead. --- gcc/dwarf2cfi.c | 208 +++ gcc/dwarf2out.h |2 +- 2 files changed, 135 insertions(+), 75 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 4e648ae..1c1b74f 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -58,9 +58,31 @@ along with GCC; see the file COPYING3. If not see /* Maximum size (in bytes) of an artificially generated label. */ #define MAX_ARTIFICIAL_LABEL_BYTES 30 +/* A collected description of an entire row of the abstract CFI table. */ +typedef struct GTY(()) dw_cfi_row_struct +{ + /* The expression that computes the CFA, expressed in two different ways. + The CFA member for the simple cases, and the full CFI expression for + the complex cases. The later will be a DW_CFA_cfa_expression. */ + dw_cfa_location cfa; + dw_cfi_ref cfa_cfi; + + /* The expressions for any register column that is saved. */ + cfi_vec reg_save; + + /* The value of any DW_CFA_GNU_args_size. */ + HOST_WIDE_INT args_size; +} dw_cfi_row; + +typedef dw_cfi_row *dw_cfi_row_ref; + /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; +/* The state of the first row of the FDE table, which includes the + state provided by the CIE. */ +static GTY(()) dw_cfi_row_ref cie_cfi_row; + static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ @@ -185,6 +207,43 @@ new_cfi (void) return cfi; } +/* Return a newly allocated CFI row, with no defined data. */ + +static dw_cfi_row_ref +new_cfi_row (void) +{ + dw_cfi_row_ref row = ggc_alloc_cleared_dw_cfi_row (); + + row->cfa.reg = INVALID_REGNUM; + + return row; +} + +/* Return a copy of an existing CFI row. 
*/ + +static dw_cfi_row_ref +copy_cfi_row (dw_cfi_row_ref src) +{ + dw_cfi_row_ref dst = ggc_alloc_dw_cfi_row (); + + *dst = *src; + dst->reg_save = VEC_copy (dw_cfi_ref, gc, src->reg_save); + + return dst; +} + +/* Free an allocated CFI row. */ + +static void +free_cfi_row (dw_cfi_row_ref row) +{ + if (row != NULL) +{ + VEC_free (dw_cfi_ref, gc, row->reg_save); + ggc_free (row); +} +} + /* Generate a new label for the CFI info to refer to. */ static char * @@ -371,28 +430,25 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_location *loc, dw_cfa_location *remember) } } -/* The current rule for calculating the DWARF2 canonical frame address. */ -static dw_cfa_location cfa; +/* The current, i.e. most recently generated, row of the CFI table. */ +static dw_cfi_row_ref cur_row; -/* A copy of the CFA, for comparison purposes. */ -static dw_cfa_location old_cfa; +/* The row state from a preceeding DW_CFA_remember_state. */ +static dw_cfi_row_ref remember_row; /* The register used for saving registers to the stack, and its offset from the CFA. */ static dw_cfa_location cfa_store; -/* The current save location around an epilogue. */ -static dw_cfa_location cfa_remember; - -/* Like cfa_remember, but a copy of old_cfa. */ -static dw_cfa_location old_cfa_remember; +/* A temporary register holding an integral value used in adjusting SP + or setting up the store_reg. The "offset" field holds the integer + value, not an offset. */ +static dw_cfa_location cfa_temp; -/* The running total of the size of arguments pushed onto the stack. */ +/* The (really) current value for DW_CFA_GNU_args_size. We delay actually + emitting this data, i.e. updating CUR_ROW, without async unwind. */ static HOST_WIDE_INT args_size; -/* The last args_size we actually output. */ -static HOST_WIDE_INT old_args_size; - /* Determine if two dw_cfa_location structures define the same data. 
*/ bool @@ -412,21 +468,18 @@ static void def_cfa_1 (dw_cfa_location *loc_p) { dw_cfi_ref cfi; - dw_cfa_location loc; - - cfa = *loc_p; - loc = *loc_p; + dw_cfa_location loc = *loc_p; if (cfa_store.reg == loc.reg && loc.indirect == 0) cfa_store.offset = loc.offset; /* If nothing changed, no need to issue any call frame instructions. */ - if (cfa_equal_p (&loc, &old_cfa)) + if (cfa_equal_p (&loc, &cur_row->cfa)) return; cfi = new_cfi (); - if (loc.reg == old_cfa.reg && !loc.indirect && !old_cfa.indirect) + if (loc.reg == cur_row->cfa.reg && !loc.indirect && !cur_row->cfa.indirect) { /* Construct a "DW_CFA_def_cfa_offset " instruction, indicating the CFA register did not change but the offset did. The data @@ -440,10 +493,10 @@ def_cfa_1 (dw_cfa_location *loc_p) } #ifndef MIPS_DEBUGGING_INFO /* SGI dbx thinks this means no offset. */ - else if (loc.offset == old_cfa.offset - && old_cfa.reg != INVALID_REGNUM + else if (loc.offset == cur_row->cfa.
[PATCH 6/9] dwarf2cfi: Convert queued_reg_save to a VEC.
Also, allocate it in the heap instead of garbage collected. --- gcc/dwarf2cfi.c | 51 ++- 1 files changed, 26 insertions(+), 25 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 51fb824..a800cb4 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1091,14 +1091,16 @@ dwarf2out_notice_stack_adjust (rtx insn, bool after_p) of the prologue or (b) the register is clobbered. This clusters register saves so that there are fewer pc advances. */ -struct GTY(()) queued_reg_save { - struct queued_reg_save *next; +typedef struct { rtx reg; - HOST_WIDE_INT cfa_offset; rtx saved_reg; -}; + HOST_WIDE_INT cfa_offset; +} queued_reg_save; -static GTY(()) struct queued_reg_save *queued_reg_saves; +DEF_VEC_O (queued_reg_save); +DEF_VEC_ALLOC_O (queued_reg_save, heap); + +static VEC(queued_reg_save, heap) *queued_reg_saves; /* The caller's ORIG_REG is saved in SAVED_IN_REG. */ typedef struct GTY(()) reg_saved_in_data { @@ -1170,24 +1172,21 @@ record_reg_saved_in_reg (rtx dest, rtx src) static void queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; /* Duplicates waste space, but it's also necessary to remove them for correctness, since the queue gets output in reverse order. */ - for (q = queued_reg_saves; q != NULL; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (compare_reg_or_pc (q->reg, reg)) - break; + goto found; - if (q == NULL) -{ - q = ggc_alloc_queued_reg_save (); - q->next = queued_reg_saves; - queued_reg_saves = q; -} + q = VEC_safe_push (queued_reg_save, heap, queued_reg_saves, NULL); + found: q->reg = reg; - q->cfa_offset = offset; q->saved_reg = sreg; + q->cfa_offset = offset; } /* Output all the entries in QUEUED_REG_SAVES. 
*/ @@ -1195,9 +1194,10 @@ queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) static void dwarf2out_flush_queued_reg_saves (void) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) { unsigned int reg, sreg; @@ -1214,7 +1214,7 @@ dwarf2out_flush_queued_reg_saves (void) reg_save (reg, sreg, q->cfa_offset); } - queued_reg_saves = NULL; + VEC_truncate (queued_reg_save, queued_reg_saves, 0); } /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved @@ -1225,17 +1225,18 @@ dwarf2out_flush_queued_reg_saves (void) static bool clobbers_queued_reg_save (const_rtx insn) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t iq; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, iq, q) { - size_t i; + size_t ir; reg_saved_in_data *rir; if (modified_in_p (q->reg, insn)) return true; - FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, i, rir) + FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, ir, rir) if (compare_reg_or_pc (q->reg, rir->orig_reg) && modified_in_p (rir->saved_in_reg, insn)) return true; @@ -1250,11 +1251,11 @@ static rtx reg_saved_in (rtx reg) { unsigned int regn = REGNO (reg); - struct queued_reg_save *q; + queued_reg_save *q; reg_saved_in_data *rir; size_t i; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (q->saved_reg && regn == REGNO (q->saved_reg)) return q->reg; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) XDELETEVEC (barrier_args_size); barrier_args_size = NULL; regs_saved_in_regs = NULL; - queued_reg_saves = NULL; + VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); cur_row = NULL; -- 1.7.6
[PATCH 7/9] dwarf2cfi: Allocate reg_saved_in_data in the heap.
--- gcc/dwarf2cfi.c | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index a800cb4..1d6413f 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1109,13 +1109,13 @@ typedef struct GTY(()) reg_saved_in_data { } reg_saved_in_data; DEF_VEC_O (reg_saved_in_data); -DEF_VEC_ALLOC_O (reg_saved_in_data, gc); +DEF_VEC_ALLOC_O (reg_saved_in_data, heap); /* A set of registers saved in other registers. This is implemented as a flat array because it normally contains zero or 1 entry, depending on the target. IA-64 is the big spender here, using a maximum of 5 entries. */ -static GTY(()) VEC(reg_saved_in_data, gc) *regs_saved_in_regs; +static VEC(reg_saved_in_data, heap) *regs_saved_in_regs; static GTY(()) reg_saved_in_data *cie_return_save; @@ -1161,7 +1161,7 @@ record_reg_saved_in_reg (rtx dest, rtx src) if (dest == NULL) return; - elt = VEC_safe_push(reg_saved_in_data, gc, regs_saved_in_regs, NULL); + elt = VEC_safe_push(reg_saved_in_data, heap, regs_saved_in_regs, NULL); elt->orig_reg = src; elt->saved_in_reg = dest; } @@ -2699,6 +2699,9 @@ initial_return_save (rtx rtl) static unsigned int execute_dwarf2_frame (void) { + gcc_checking_assert (queued_reg_saves == NULL); + gcc_checking_assert (regs_saved_in_regs == NULL); + /* The first time we're called, compute the incoming frame state. */ if (cie_cfi_vec == NULL) { @@ -2737,7 +2740,7 @@ execute_dwarf2_frame (void) cie_return_save = ggc_alloc_reg_saved_in_data (); *cie_return_save = *VEC_index (reg_saved_in_data, regs_saved_in_regs, 0); - regs_saved_in_regs = NULL; + VEC_pop (reg_saved_in_data, regs_saved_in_regs); break; default: gcc_unreachable (); @@ -2748,12 +2751,10 @@ execute_dwarf2_frame (void) } /* Set up state for generating call frame debug info. 
*/ - gcc_checking_assert (queued_reg_saves == NULL); - gcc_checking_assert (regs_saved_in_regs == NULL); - cur_row = copy_cfi_row (cie_cfi_row); if (cie_return_save) -VEC_safe_push (reg_saved_in_data, gc, regs_saved_in_regs, cie_return_save); +VEC_safe_push (reg_saved_in_data, heap, + regs_saved_in_regs, cie_return_save); cfa_store = cur_row->cfa; args_size = 0; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) /* Reset all function-specific information, particularly for GC. */ XDELETEVEC (barrier_args_size); barrier_args_size = NULL; - regs_saved_in_regs = NULL; + VEC_free (reg_saved_in_data, heap, regs_saved_in_regs); VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); -- 1.7.6
[PATCH 2/9] dwarf2cfi: Rename cfi_insn to add_cfi_insn.
Make it consistent with add_cfi_vec. --- gcc/dwarf2cfi.c | 20 +++- 1 files changed, 11 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 1c1b74f..eb59f28 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -86,7 +86,7 @@ static GTY(()) dw_cfi_row_ref cie_cfi_row; static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ -static rtx cfi_insn; +static rtx add_cfi_insn; /* When non-null, add_cfi will add the CFI to this vector. */ static cfi_vec *add_cfi_vec; @@ -274,11 +274,13 @@ add_cfi (dw_cfi_ref cfi) } any_cfis_emitted = true; - if (cfi_insn != NULL) + + if (add_cfi_insn != NULL) { - cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn); - NOTE_CFI (cfi_insn) = cfi; + add_cfi_insn = emit_note_after (NOTE_INSN_CFI, add_cfi_insn); + NOTE_CFI (add_cfi_insn) = cfi; } + if (add_cfi_vec != NULL) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } @@ -2319,7 +2321,7 @@ create_cfi_notes (void) { rtx pat; - cfi_insn = PREV_INSN (insn); + add_cfi_insn = PREV_INSN (insn); if (BARRIER_P (insn)) { @@ -2342,7 +2344,7 @@ create_cfi_notes (void) break; case NOTE_INSN_CFA_RESTORE_STATE: - cfi_insn = insn; + add_cfi_insn = insn; dwarf2out_frame_debug_restore_state (); break; } @@ -2372,13 +2374,13 @@ create_cfi_notes (void) /* Do not separate tablejump insns from their ADDR_DIFF_VEC. Putting the note after the VEC should be ok. */ - if (!tablejump_p (insn, NULL, &cfi_insn)) - cfi_insn = insn; + if (!tablejump_p (insn, NULL, &add_cfi_insn)) + add_cfi_insn = insn; dwarf2out_frame_debug (insn, true); } - cfi_insn = NULL; + add_cfi_insn = NULL; } /* Determine if we need to save and restore CFI information around this -- 1.7.6
[PATCH 3/9] dwarf2cfi: Populate CUR_ROW->REG_SAVE.
To be actually used by a subsequent patch. --- gcc/dwarf2cfi.c | 27 +-- 1 files changed, 25 insertions(+), 2 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index eb59f28..36fa7f8 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,17 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +/* Perform ROW->REG_SAVE[COLUMN] = CFI. CFI may be null, indicating + that the register column is no longer saved. */ + +static void +update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi) +{ + if (VEC_length (dw_cfi_ref, row->reg_save) <= column) +VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1); + VEC_replace (dw_cfi_ref, row->reg_save, column, cfi); +} + /* This function fills in aa dw_cfa_location structure from a dwarf location descriptor sequence. */ @@ -574,7 +585,13 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) cfi->dw_cfi_oprnd2.dw_cfi_offset = offset; } else if (sreg == reg) -cfi->dw_cfi_opc = DW_CFA_same_value; +{ + /* While we could emit something like DW_CFA_same_value or +DW_CFA_restore, we never expect to see something like that +in a prologue. This is more likely to be a bug. A backend +can always bypass this by using REG_CFA_RESTORE directly. 
*/ + gcc_unreachable (); +} else { cfi->dw_cfi_opc = DW_CFA_register; @@ -582,6 +599,7 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) } add_cfi (cfi); + update_row_reg_save (cur_row, reg, cfi); } /* Given a SET, calculate the amount of stack adjustment it @@ -1337,6 +1355,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) { rtx src, dest, span; dw_cfi_ref cfi = new_cfi (); + unsigned regno; dest = SET_DEST (set); src = SET_SRC (set); @@ -1347,8 +1366,10 @@ dwarf2out_frame_debug_cfa_expression (rtx set) span = targetm.dwarf_register_span (src); gcc_assert (!span); + regno = dwf_regno (src); + cfi->dw_cfi_opc = DW_CFA_expression; - cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (src); + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; cfi->dw_cfi_oprnd2.dw_cfi_loc = mem_loc_descriptor (XEXP (dest, 0), get_address_mode (dest), GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); @@ -1356,6 +1377,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) /* ??? We'd like to use queue_reg_save, were the interface different, and, as above, we could manage flushing for epilogues. */ add_cfi (cfi); + update_row_reg_save (cur_row, regno, cfi); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. */ @@ -1370,6 +1392,7 @@ dwarf2out_frame_debug_cfa_restore (rtx reg) cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; add_cfi (cfi); + update_row_reg_save (cur_row, regno, NULL); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE. -- 1.7.6
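The new update_row_reg_save helper above grows the row's reg_save vector with cleared entries before storing at the register column, so columns never touched read back as NULL ("no save recorded"). A minimal plain-C analogue of that grow-cleared-then-replace pattern (the toy type and names are illustrative, not GCC's VEC API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy growable array of pointers, analogous to VEC(dw_cfi_ref, gc).
   NULL entries mean "no CFI recorded for this register column".  */
typedef struct {
  void **data;
  size_t len;
} reg_save_vec;

/* Grow the vector to at least COLUMN + 1 entries, zero-filling the new
   slots, then store CFI at COLUMN -- mirroring VEC_safe_grow_cleared
   followed by VEC_replace in update_row_reg_save.  */
static void
toy_update_row_reg_save (reg_save_vec *v, size_t column, void *cfi)
{
  if (v->len <= column)
    {
      v->data = realloc (v->data, (column + 1) * sizeof (void *));
      memset (v->data + v->len, 0, (column + 1 - v->len) * sizeof (void *));
      v->len = column + 1;
    }
  v->data[column] = cfi;
}
```

Passing cfi == NULL, as dwarf2out_frame_debug_cfa_restore does above, records that the column is no longer saved.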
[pph] Add symbol table - Fix remaining asm differences (issue4732043)
This patch fixes the remaining assembly differences between non-pph and pph compiles. The idea is to make the pph compiler process functions and variables in the same order that they had been processed in the original compile. To do this, we intercept calls to rest_of_decl_compilation and allocate_struct_function. At every call, we add the declaration to a list. This list becomes the symbol table for the header file, which is then written to the end of the file. When reading the file, we read this table and present the symbols to the middle end in that order. The symbol table is written in pph_out_symtab and read in pph_in_symtab. The other changes are mostly wrapping calls to rest_of_decl_compilation so that we can add the declarations to the symbol table. This simplifies the logic we use to present symbols to the middle end as it encodes the sequence in the symbol table itself. No need to reverse lists of symbols or the other contortions we used to make. Tested on x86_64. Applied to branch. Diego. cp/ChangeLog.pph 2011-07-14 Diego Novillo * Make-lang.in (cp/decl.o): Add dependency on CXX_PPH_STREAMER_H. (cp/pph-streamer-out.o): Add dependency on CGRAPH_H. (cp/pph-streamer-in.o): Add dependency on toplev.h * cp-tree.h (cp_rest_of_decl_compilation): Declare. * call.c (set_up_extended_ref_temp): Call cp_rest_of_decl_compilation instead of rest_of_decl_compilation. * class.c (build_clone): Likewise. * decl2.c (finish_anon_union): Likewise. (maybe_emit_vtables): Likewise. (write_out_vars): Likewise. * method.c (implicitly_declare_fn): Likewise. * semantics.c (maybe_add_lambda_conv_op): Likewise. * decl.c (make_rtl_for_nonlocal_decl): Likewise. (cp_finish_decl): Likewise. (start_preparsed_function): Likewise. (cp_rest_of_decl_compilation): New. * pph-streamer-in.c: Include toplev.h (pph_in_language_function): Tidy. (pph_in_struct_function): Remove parameter DECL. Support reading shared struct functions. Read FN->DECL. (pph_register_decl_in_symtab): Remove. 
Update all users. (pph_register_binding_in_symtab): Remove. Update all users. (pph_in_symtab_marker): New. (pph_in_symtab): New. (pph_read_file_contents): Call it. (pph_in_function_decl): Do not call pph_in_struct_function. * pph-streamer-out.c: Include cgraph.h. (decls_to_register): New. (pph_out_chain_filtered): Remove argument REVERSE_P. Update all users. (pph_out_struct_function): Write FN->DECL. (pph_out_symtab_marker): New. (pph_out_symtab): New. (pph_write_file_contents): Call it. (pph_out_function_decl): Do not call pph_out_struct_function. (pph_add_decl_to_register): New. * pph-streamer.h (enum pph_symtab_marker): New. (struct pph_stream): Remove field FNS_TO_EXPAND. Update all users. (pph_add_decl_to_register): Declare. testsuite/ChangeLog.pph * g++.dg/pph/c1pr44948-1a.cc: Adjust expected failure. * g++.dg/pph/x0hardlookup.h: Add new symbol in global namespace. * g++.dg/pph/c4meteor-contest.cc: Mark fixed. * g++.dg/pph/c1attr-warn-unused.cc: Likewise. * g++.dg/pph/x1globalref.cc: Likewise. * g++.dg/pph/x1hardlookup.cc: Likewise. 
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 5f3249e..9634c47 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -271,7 +271,7 @@ cp/decl.o: cp/decl.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ cp/operators.def $(TM_P_H) $(TREE_INLINE_H) $(DIAGNOSTIC_H) $(C_PRAGMA_H) \ debug.h gt-cp-decl.h $(TIMEVAR_H) $(TARGET_H) $(PLUGIN_H) \ intl.h tree-iterator.h pointer-set.h $(SPLAY_TREE_H) \ - c-family/c-objc.h + c-family/c-objc.h $(CXX_PPH_STREAMER_H) cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ output.h toplev.h $(C_COMMON_H) gt-cp-decl2.h $(CGRAPH_H) \ $(C_PRAGMA_H) $(TREE_DUMP_H) intl.h $(TARGET_H) $(GIMPLE_H) pointer-set.h \ @@ -351,8 +351,8 @@ cp/pph-streamer.o: cp/pph-streamer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ cp/pph-streamer-out.o: cp/pph-streamer-out.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h $(CGRAPH_H) cp/pph-streamer-in.o: cp/pph-streamer-in.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h toplev.h diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 56f3408..6988dc7 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8567,7 +8567,7 @@ set_up_extended_ref_temp (tree
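The core idea of the patch description above is an append-only table: declarations are appended as calls to rest_of_decl_compilation / allocate_struct_function are intercepted, and at read time the table is replayed in the same order. A minimal plain-C analogue of that order-preserving log (toy names and int "decls"; the real table in pph_out_symtab / pph_in_symtab stores tree nodes and kind markers):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy append-only symbol table preserving emission order, in the spirit
   of pph_add_decl_to_register.  Purely illustrative.  */
typedef struct {
  int *decls;
  size_t len, cap;
} toy_symtab;

/* Append DECL at the end; the order of calls is the order of replay.  */
static void
toy_symtab_register (toy_symtab *t, int decl)
{
  if (t->len == t->cap)
    {
      t->cap = t->cap ? 2 * t->cap : 8;
      t->decls = realloc (t->decls, t->cap * sizeof (int));
    }
  t->decls[t->len++] = decl;
}

/* Replay in insertion order, as pph_in_symtab presents symbols to the
   middle end.  Copies into OUT and returns the number of symbols.  */
static size_t
toy_symtab_replay (const toy_symtab *t, int *out)
{
  for (size_t i = 0; i < t->len; i++)
    out[i] = t->decls[i];
  return t->len;
}
```

Because the sequence is encoded in the table itself, no list reversal or reordering is needed on the read side, which is exactly the simplification the mail describes.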
Re: [PATCH] New IPA-CP with real function cloning - updated version
> 2011-07-14 Martin Jambor > > * ipa-prop.h: Include alloc-pool.h, all sorts of updates to general > comments. > (ipcp_values_pool): Declare. > (ipcp_sources_pool): Likewise. > (ipcp_lattice): Changed to forward declaration. > (ipa_param_descriptor): Removed fields ipcp_lattice, types and > cannot_devirtualize. > (ipa_node_params): New fields descriptors, lattices, known_vals, > clone_for_all_contexts and node dead, removed fields params and > count_scale. > (ipa_set_param_count): Removed. > (ipa_get_param_count): Made to work with descriptors vector. > (ipa_get_param): Updated. > (ipa_param_cannot_devirtualize_p): Removed. > (ipa_param_types_vec_empty): Likewise. > (ipa_set_param_used): New function. > (ipa_get_param_used): Updated to use descriptors vector. > (ipa_func_list): Removed. > (ipa_init_func_list): Removed declaration. > (ipa_push_func_to_list_1): Likewise. > (ipa_pop_func_from_list): Likewise. > (ipa_push_func_to_list): Removed. > (ipa_lattice_from_jfunc): Remove declaration. > (ipa_get_jf_pass_through_result): Declare. > (ipa_get_jf_ancestor_result): Likewise. > (ipa_value_from_jfunc): Likewise. > (ipa_get_lattice): Update. > (ipa_lat_is_single_const): New function. > * ipa-prop.c (ipa_push_func_to_list_1): Removed. > (ipa_init_func_list): Likewise. > (ipa_pop_func_from_list): Likewise. > (ipa_get_param_decl_index): Fix coding style. > (count_formal_params): Removed. > (count_formal_params_1): Renamed to count_formal_params. > (ipa_populate_param_decls): Update to use descriptors vector. > (ipa_initialize_node_params): Likewise. > (visit_ref_for_mod_analysis): Use ipa_set_param_used. > (ipa_analyze_params_uses): Likewise. > (ipa_free_node_params_substructures): Likewise and free also lattices > and known values. > (duplicate_array): Removed. > (ipa_edge_duplication_hook): Add the new edge to the list of edge > clones. > (ipa_node_duplication_hook): Update to use new lattices. > (ipa_free_all_structures_after_ipa_cp): Free alloc pools. 
> (ipa_free_all_structures_after_iinln): Likewise. > (ipa_write_node_info): Update to use new lattices. > (ipa_read_node_info): Likewise. > (ipa_get_jf_pass_through_result): New function. > (ipa_get_jf_ancestor_result): Likewise. > (ipa_value_from_jfunc): Likewise. > (ipa_cst_from_jfunc): Reimplemented using ipa_value_from_jfunc. > * ipa-cp.c: Reimplemented. > * params.def (PARAM_DEVIRT_TYPE_LIST_SIZE): Removed. > (PARAM_IPA_CP_VALUE_LIST_SIZE): New parameter. > (PARAM_IPA_CP_EVAL_THRESHOLD): Likewise. > * Makefile.in (IPA_PROP_H): Added alloc-pool.h to dependencies. > > * doc/invoke.texi (devirt-type-list-size): Removed description. > (ipa-cp-value-list-size): Added description. > > * testsuite/gcc.dg/ipa/ipa-1.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipa-2.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-3.c: Likewise and made functions static. > * testsuite/gcc.dg/ipa/ipa-4.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipa-5.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-7.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-8.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipacost-1.c: Likewise. > * testsuite/gcc.dg/ipa/ipacost-2.c: Likewise and increased sizes > of some functions. > * testsuite/gcc.dg/ipa/ipcp-1.c: New test. > * testsuite/gcc.dg/ipa/ipcp-2.c: Likewise. > * testsuite/gcc.dg/tree-ssa/ipa-cp-1.c: Updated testcase. > > > /* ipa_node_params stores information related to formal parameters of > functions > and some other information for interprocedural passes that operate on > parameters (such as ipa-cp). */ > > struct ipa_node_params > { > /* Information about individual formal parameters that are gathered when > summaries are generated. */ > VEC (ipa_param_descriptor_t, heap) *descriptors; > /* Pointer to an array of structures describing individual formal > parameters. */ > struct ipcp_lattice *lattices; Hmm, I was under the impression that the plan was to break away the stuff used by ipa-cp internally during the propagation stage (i.e.
ipcp_orig_node/known_vals and probably lattices from the stuff used to represent parameters and jump functions, i.e. descriptors. But this can be handled incrementally. > /* FIXME: Can we clobber only the first argument of thunks? */ Well, we should know how to propagate through it. But it is not too important a side case, I guess, until we can devirtualize to them effectively. > if (node->alias || node->thunk.thunk_p > || ipa_is_called_with_var_arguments (info)) > disable = true; The patch is OK, thanks! It would be nice to add
[patch] Fix PR middle-end/49732
Hi, this is a regression present on mainline and 4.6 branch. The compiler crashes during gimplification because there is a COMPOUND_EXPR shared between the TYPE_SIZE and TYPE_SIZE_UNIT expressions of an array type. Now this isn't supposed to happen because we run an unsharing pass before gimplification. The problem here is that we have a forward declaration (DECL_EXPR) of a pointer type to the array type. So, during the marking phase of the unsharing pass, the array type gets marked as visited through the pointer, which prevents it from being walked during the same phase when its own DECL_EXPR is processed. This pointer/pointed-to type business is an old pattern. Five years ago, Olivier changed gimplify_type_sizes like so: 2006-10-06 Olivier Hainque * gimplify.c (gimplify_type_sizes) [POINTER_TYPE, REFERENCE_TYPE]: Don't recurse on the pointed-to type. because of a related problem. It turns out that we need the same change in the DECL_EXPR case of walk_tree_1 to fix the bug at hand, which is sort of logical as there is a strong relationship between them: case DECL_EXPR: /* If this is a TYPE_DECL, walk into the fields of the type that it's defining. We only want to walk into these fields of a type in this case and not in the general case of a mere reference to the type. The criterion is as follows: if the field can be an expression, it must be walked only here. This should be in keeping with the fields that are directly gimplified in gimplify_type_sizes in order for the mark/copy-if-shared/unmark machinery of the gimplifier to work with variable-sized types. Note that DECLs get walked as part of processing the BIND_EXPR. */ Tested on x86_64-suse-linux, OK for mainline and 4.6 branch? 2011-07-14 Eric Botcazou PR middle-end/49732 * tree.c (walk_tree_1) : Do not walk a pointed-to type. 2011-07-14 Eric Botcazou * gnat.dg/pointer_controlled.adb: New test. 
-- Eric Botcazou Index: tree.c === --- tree.c (revision 176261) +++ tree.c (working copy) @@ -10596,9 +10596,14 @@ walk_tree_1 (tree *tp, walk_tree_fn func if (result || !walk_subtrees) return result; - result = walk_type_fields (*type_p, func, data, pset, lh); - if (result) - return result; + /* But do not walk a pointed-to type since it may itself need to + be walked in the declaration case if it isn't anonymous. */ + if (!POINTER_TYPE_P (*type_p)) + { + result = walk_type_fields (*type_p, func, data, pset, lh); + if (result) + return result; + } /* If this is a record type, also walk the fields. */ if (RECORD_OR_UNION_TYPE_P (*type_p)) -- PR ada/49732 -- Testcase by Vorfeed Canal -- { dg-do compile } -- { dg-options "-gnato" } with Interfaces.C; use Interfaces.C; with Interfaces.C.Strings; use Interfaces.C.Strings; with Interfaces.C.Pointers; procedure Pointer_Controlled is function Create (Name : String) return size_t is type Name_String is new char_array (0 .. Name'Length); type Name_String_Ptr is access Name_String; pragma Controlled (Name_String_Ptr); Name_Str : constant Name_String_Ptr := new Name_String; Name_Len : size_t; begin To_C (Name, Name_Str.all, Name_Len); return 1; end; Test : size_t; begin Test := Create("ABC"); end;
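The failure mode described above is generic to any walk that shares a visited set: whichever path reaches a node first marks it, and a later walk that actually owns the node is skipped. A toy C model of that hazard, and of the fix of not descending through pointer types (illustrative only; walk_tree_1 and the pset operate on trees, not on this struct):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy types: a node is either a pointer (non-NULL pointee) or a sized
   type whose size expression must be reached by the walk triggered by
   its own DECL_EXPR.  Purely illustrative.  */
typedef struct toy_type {
  struct toy_type *pointee;   /* non-NULL for pointer types */
  bool visited;               /* the shared pset mark */
  bool size_walked;           /* set only by the owning walk */
} toy_type;

/* Walk T.  If SKIP_POINTEE, do not recurse through pointer types --
   the walk_tree_1 change for PR 49732.  OWNING_WALK is true for the
   walk triggered by the type's own DECL_EXPR.  */
static void
toy_walk (toy_type *t, bool skip_pointee, bool owning_walk)
{
  if (t->visited)
    return;                   /* pset says already seen: do not re-enter */
  t->visited = true;
  if (owning_walk)
    t->size_walked = true;
  if (t->pointee && !skip_pointee)
    toy_walk (t->pointee, skip_pointee, false);
}
```

Without the fix, the walk through the pointer "claims" the array type first, so the array's own DECL_EXPR walk never gimplifies its size; with the fix, the pointer walk stops short and the owning walk still gets there.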
[4.6] Fix PR tree-optimization/49725
Hi, this is the ICE at -O2 on ACATS c34005a introduced on the 4.6 branch by Martin's latest SRA patch. But it's actually the same PRE issue as: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02210.html Bootstrapped/regtested on i586-suse-linux, OK for 4.6 branch? 2011-07-14 Eric Botcazou PR tree-optimization/49725 Backport from mainline 2011-03-31 Eric Botcazou * tree-ssa-pre.c (create_component_ref_by_pieces_1) : Drop a zero minimum index only if it is redundant -- Eric Botcazou Index: tree-ssa-pre.c === --- tree-ssa-pre.c (revision 176264) +++ tree-ssa-pre.c (working copy) @@ -2874,8 +2874,11 @@ create_component_ref_by_pieces_1 (basic_ return NULL_TREE; if (genop2) { - /* Drop zero minimum index. */ - if (tree_int_cst_equal (genop2, integer_zero_node)) + tree domain_type = TYPE_DOMAIN (TREE_TYPE (genop0)); + /* Drop zero minimum index if redundant. */ + if (integer_zerop (genop2) + && (!domain_type + || integer_zerop (TYPE_MIN_VALUE (domain_type genop2 = NULL_TREE; else {
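The arithmetic behind the fix: for an array whose index domain starts at min, an element address is base + (index - min) * elt_size, so silently dropping a recorded minimum amounts to assuming min == 0 — harmless for C-style arrays, wrong for an Ada array whose domain starts at 1. A toy byte-offset sketch (illustrative only; create_component_ref_by_pieces_1 manipulates trees, not bytes):

```c
#include <assert.h>
#include <stddef.h>

/* Element byte offset for a one-dimensional array whose index domain
   starts at MIN.  */
static ptrdiff_t
elt_offset (ptrdiff_t index, ptrdiff_t min, size_t elt_size)
{
  return (index - min) * (ptrdiff_t) elt_size;
}

/* "Dropping" the minimum index -- what the unpatched code did whenever
   the operand was literal zero -- is the same as assuming min == 0.
   That is only redundant when the domain minimum really is zero, which
   is what the TYPE_MIN_VALUE check above verifies.  */
static ptrdiff_t
elt_offset_min_dropped (ptrdiff_t index, size_t elt_size)
{
  return elt_offset (index, 0, elt_size);
}
```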
[patch] Fix PR target/48220 for the SPARC
Hi, this adds support for DW_OP_GNU_entry_value/DW_TAG_GNU_call_site_parameter on SPARC-like architectures (architectures with register windows and explicit window save instruction). The transformation OUTGOING_REGNO -> INCOMING_REGNO is explicit for them and not tied to the call-to-subroutine instruction, so this needs to be modelled if you want precise variable tracking. Tested on SPARC/Solaris (both GCC and GDB) and x86/Linux. OK for mainline? 2011-07-14 Eric Botcazou PR target/48220 * doc/md.texi (Standard Names): Document window_save. * cfgexpand.c (expand_debug_parm_decl): New function extracted from expand_debug_expr and expand_debug_source_expr. If the target has a window_save instruction, adjust the ENTRY_VALUE_EXP. (expand_debug_expr) : Call expand_debug_parm_decl if the SSA_NAME_VAR is a parameter. (expand_debug_source_expr) : Call expand_debug_parm_decl. * var-tracking.c (parm_reg_t): New type and associated vector type. (windowed_parm_regs): New variable. (adjust_insn): If the target has a window_save instruction and this is the instruction, make its effect on parameter registers explicit. (next_non_note_insn_var_location): New function. (emit_notes_in_bb): Use it instead of NEXT_INSN throughout. (vt_add_function_parameter): If the target has a window_save insn, adjust the incoming RTL and record that in windowed_parm_regs. (vt_finalize): Free windowed_parm_regs. -- Eric Botcazou Index: cfgexpand.c === --- cfgexpand.c (revision 176072) +++ cfgexpand.c (working copy) @@ -2358,8 +2358,60 @@ convert_debug_memory_address (enum machi return x; } -/* Return an RTX equivalent to the value of the tree expression - EXP. */ +/* Return an RTX equivalent to the value of the parameter DECL. 
*/ + +static rtx +expand_debug_parm_decl (tree decl) +{ + rtx incoming = DECL_INCOMING_RTL (decl); + + if (incoming + && GET_MODE (incoming) != BLKmode + && ((REG_P (incoming) && HARD_REGISTER_P (incoming)) + || (MEM_P (incoming) + && REG_P (XEXP (incoming, 0)) + && HARD_REGISTER_P (XEXP (incoming, 0) +{ + rtx rtl = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); + +#ifdef HAVE_window_save + /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers. + If the target machine has an explicit window save instruction, the + actual entry value is the corresponding OUTGOING_REGNO instead. */ + if (REG_P (incoming) + && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) + incoming + = gen_rtx_REG_offset (incoming, GET_MODE (incoming), +OUTGOING_REGNO (REGNO (incoming)), 0); + else if (MEM_P (incoming)) + { + rtx reg = XEXP (incoming, 0); + if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + { + reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); + incoming = replace_equiv_address_nv (incoming, reg); + } + } +#endif + + ENTRY_VALUE_EXP (rtl) = incoming; + return rtl; +} + + if (incoming + && GET_MODE (incoming) != BLKmode + && !TREE_ADDRESSABLE (decl) + && MEM_P (incoming) + && (XEXP (incoming, 0) == virtual_incoming_args_rtx + || (GET_CODE (XEXP (incoming, 0)) == PLUS + && XEXP (XEXP (incoming, 0), 0) == virtual_incoming_args_rtx + && CONST_INT_P (XEXP (XEXP (incoming, 0), 1) +return incoming; + + return NULL_RTX; +} + +/* Return an RTX equivalent to the value of the tree expression EXP. 
*/ static rtx expand_debug_expr (tree exp) @@ -3169,36 +3221,12 @@ expand_debug_expr (tree exp) if (SSA_NAME_IS_DEFAULT_DEF (exp) && TREE_CODE (SSA_NAME_VAR (exp)) == PARM_DECL) { - rtx incoming = DECL_INCOMING_RTL (SSA_NAME_VAR (exp)); - if (incoming - && GET_MODE (incoming) != BLKmode - && ((REG_P (incoming) && HARD_REGISTER_P (incoming)) - || (MEM_P (incoming) -&& REG_P (XEXP (incoming, 0)) -&& HARD_REGISTER_P (XEXP (incoming, 0) - { - op0 = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); - ENTRY_VALUE_EXP (op0) = incoming; - goto adjust_mode; - } - if (incoming - && MEM_P (incoming) - && !TREE_ADDRESSABLE (SSA_NAME_VAR (exp)) - && GET_MODE (incoming) != BLKmode - && (XEXP (incoming, 0) == virtual_incoming_args_rtx - || (GET_CODE (XEXP (incoming, 0)) == PLUS -&& XEXP (XEXP (incoming, 0), 0) - == virtual_incoming_args_rtx -&& CONST_INT_P (XEXP (XEXP (incoming, 0), - 1) - { - op0 = incoming; - goto adjust_mode; - } + op0 = expand_debug_parm_decl (SSA_NAME_VAR (exp)); + if (op0) + goto adjust_mode; op0 = expand_debug_expr (SSA_NAME_VAR (exp)); - if (!op0) - return NULL; - goto adjust_mode; + if (op0) + goto adjust_mode; } return NULL; } @@ -3327,36 +3355,14
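For context on the OUTGOING_REGNO/INCOMING_REGNO transformation above: on SPARC, the save instruction rotates the register window so the caller's outgoing registers %o0-%o7 become the callee's incoming %i0-%i7. A toy version of the mapping expand_debug_parm_decl applies to DECL_INCOMING_RTL when HAVE_window_save is defined, using the standard SPARC integer register numbering (assumed here: %g0-%g7 = 0-7, %o0-%o7 = 8-15, %l0-%l7 = 16-23, %i0-%i7 = 24-31):

```c
#include <assert.h>

/* Assumed SPARC integer register numbering (see lead-in).  */
#define SPARC_O0 8
#define SPARC_I0 24

/* Toy OUTGOING_REGNO: an incoming register %iN was the caller's
   outgoing %oN before the window save; other registers are unchanged.
   Illustrative only -- the real macro is target-defined.  */
static int
toy_outgoing_regno (int regno)
{
  if (regno >= SPARC_I0 && regno < SPARC_I0 + 8)
    return SPARC_O0 + (regno - SPARC_I0);
  return regno;
}
```

This is why the entry value of a parameter that lives in %iN inside the function is the corresponding %oN: that is the register it actually arrived in before the window rotated.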
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 23:34, Tobias Burnus wrote: > Janne Blomqvist wrote: >> >> * caf/mpi.c (caf_runtime_error): Remove "error" parameter. >> Return EXIT_FAILURE instead. >> > From the patch: >> >> /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ >> >> This is unnecessary, as the call to exit() will call the libgfortran >> destructor which will close all units, as explained in comment #3 in >> the PR. > > While I think it should be sufficient for single-processor usage, I am not > sure that that way all I/O buffers gets written before one calls > MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to > I/O. Ah, I read that comment from caf/single.c, but now I see it in the caf/mpi.c part of the patch as well. Yeah, in that case it might matter, depending on how the MPI library implements MPI_Abort(). That is, does it call exit() (in which case the destructor will be called) or will it call abort() or raise some other fatal signal. At least Open MPI seems to implement it by sending SIGTERM http://www.open-mpi.org/doc/v1.4/man3/MPI_Abort.3.php Come to think of it, in this case, if you want to use MPI_Abort() rather than MPI_Finalize() + exit(), you should probably reset the fatal signal handlers to the default one unless you want the backtrace to be printed. -- Janne Blomqvist
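A minimal sketch of the signal-handler reset suggested above, using only plain signal() and no MPI. The handler and function names are made up for illustration; a real implementation would run the reset just before calling MPI_Abort so the library's SIGTERM hits the default action instead of the backtrace printer:

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for libgfortran's backtrace-printing handler; purely
   illustrative.  */
static void
toy_backtrace_handler (int sig)
{
  (void) sig;
}

/* Reset handlers that an MPI_Abort-induced fatal signal might hit back
   to their defaults.  Returns the previous SIGTERM handler so the
   caller can verify what was installed.  */
static void (*toy_reset_fatal_handlers (void)) (int)
{
  signal (SIGINT, SIG_DFL);
  return signal (SIGTERM, SIG_DFL);
}
```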
Re: [PATCH] New IPA-CP with real function cloning
> > Well, technically they survive until after inlining (because of > indirect inlining which also derives information from the lattices > corresponding to node->inlined_to node. Results of arithmetic > functions are not going to be accessed during inlining when compiling > any reasonable program but... Hmm, this sounds bad. We should move it to GTY then, incrementally. I however thought that indirect inlining looks only at jump functions, not at lattices? > > > > __attribute__ ((really_bad_attribute)) > > function (int param) > > { > > use param > > } > > > > and then let ipa-cp invent that param is a constant. > > what would be such a "really_bad_attribute" ? Well, we need to go through the attribute list and prepare a list of "bad guys" instead of forbidding any attribute. Obviously we should not give up on the "pure" attribute, for example. Another class of attributes is those referring to function arguments that need updating if they are still in use after cloning. I think this was the original reason for adding the check. I am not sure if we have attributes that should prevent cloning completely. > > > > > > > > can_change_signature will also handle apply_args. > > > > Add VA_START code there, too. For the other use of this flag (in i386) > > > > VA_START > > The last one already is VA_START... or do you mean a different one? I meant the code in ipa-inline-analysis.c doing the same checks but skipping va_start since the i386 backend tests it by itself. > > BTW currently the edges from thunks miss any profile info. > > (i.e. it will be 0). I guess we ought to make the ipa-profile pass estimate > those > (it is difficult to measure the count of an alias). > > I'm not really looking at the edges from thunks to the actual > function. OTOH, I assume that edges to a thunk do have a count and > look at that.
They do have (unless they are thunk to thunk edges), but in any case we ought to regularize things here, sooner or later someone will get confused with counts missing in the callgraph. > > > If you walk only hot edges, then you need to make your function descent into > > both alias refs and thunks calls, or the aliases of thunks will not be seen > > then. > > Well, looking at bits of the patch again now, aliases to thunks might > indeed be a problem for a few pieces of it. I'll send the patch > nevertheless and ponder about this problem later. Hmm, please do :) I will look at the updated patch. Honza > > Thanks, > > Martin
Re: [build] Move crtfastmath to toplevel libgcc
On Thu, Jul 14, 2011 at 12:09 PM, Rainer Orth wrote: > Andreas Schwab writes: > >> Same on ia64: >> >> Configuration mismatch! >> Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o >> Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o >> crtfastmath.o Alpha needs the same fix. I need the following patch to bootstrap the compiler: --cut here-- Index: gcc/config.gcc === --- gcc/config.gcc (revision 176282) +++ gcc/config.gcc (working copy) @@ -757,6 +757,7 @@ extra_options="${extra_options} alpha/elf.opt" target_cpu_default="MASK_GAS" tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux" + extra_parts="$extra_parts crtfastmath.o" ;; alpha*-*-freebsd*) tm_file="${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h" --cut here-- Uros.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Janne Blomqvist wrote: * caf/mpi.c (caf_runtime_error): Remove "error" parameter. Return EXIT_FAILURE instead. > From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. While I think it should be sufficient for single-processor usage, I am not sure that that way all I/O buffers get written before one calls MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to I/O. In terms of I/O, there are three kinds of I/O, which might need to be treated differently:
* I/O to stdout (OUTPUT_UNIT): Here, all output should be collected by MPI - I am not sure whether it will come to later in a destructor.
* I/O to (local) files.
* I/O via the communication library: Here, I see the greatest problems, but that's not in F2008's coarrays, but might well be added with the Technical Report.
I somehow would feel better if I could ensure that the buffers are flushed and the files closed before I pull the MPI plug (MPI_Finalize, MPI_Abort). For reference, the comment Janne is referring to is the one at the bottom (comment 3) of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43849 Tobias
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 09:48 PM, Janne Blomqvist wrote: 2011-07-14 Daniel Carrera * caf/mpi.c (caf_runtime_error): Remove "error" parameter. Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. Ok. What term should I have used? I was thinking that the program itself returns a value, but I can see that what I wrote was not optimal. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. Ok. I didn't want to mess with comments that I didn't fully understand. -- I'm not overweight, I'm undertall.
Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)
On Wed, Jul 13, 2011 at 22:39, Carrot Wei wrote: > Hi Diego > > The previous patch was done with svn merge. > > This new version is done with svnmerge.py. Again tested with Great, thanks. This simplifies regular merges, since svnmerge.py will know that this rev was merged already. I think svnmerge.py also reads 'svn merge' markers, but I'm not sure. > make check-g++ RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a > dg.exp=anon-ns1.C" > make check-g++ RUNTESTFLAGS="dg.exp=anon-ns1.C" > > BTW, there are some unexpected property changes after merge, I don't > know how they came out and how I should deal with them? Don't worry about those. The branch used to merge from multiple sources that had different attributes for those files. It will be gone after the next merge I'm doing next week. The backport is OK, of course. Diego.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 18:40, Daniel Carrera wrote: > And of course, here is the patch and ChangeLog. > > 2011-07-14 Daniel Carrera > * caf/mpi.c (caf_runtime_error): Remove "error" parameter. > Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. -- Janne Blomqvist
Remove NetWare support
I've got a preliminary NetWare removal patch ready (yet untested), but have a couple of questions: * Given that there's a considerable amount of NetWare support still in src, toplevel support has to stay. On the other hand, the reference in config/elf.m4 is only used for the LTO plugin and can go, I believe. * The i386 port has some code that seems to be NetWare-specific, namely KEEP_AGGREGATE_RETURN_POINTER, ix86_keep_aggregate_return_pointer and the callee_pop_aggregate_return attribute. I'm uncertain if all this can go now. * There are references to config/i386/netware.h in gcc/po/*.po. Do I need to do anything about this when netware.h is removed? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:43 AM, Bernd Schmidt wrote:
> (Although now I wonder if we could instead use one of the speculative
> load instructions? There's one that sets the NaT bit if the load would
> fault, isn't there? It's been so long I can't remember.)

We could, but we also have to insert a check load to match. It gets very
difficult to tell when it's worth it. Your example

	(p7)	br.cond.dptk .L5
		ld8 r15 = [r14]

becomes

		ld8.sa r15 = [r14]	// speculative advanced load
	(p7)	br.cond.dptk .L5
		ld8.c.clr r15 = [r14]	// checked load (clear ALAT)

Note that the speculative load can arbitrarily fail (e.g. tlb miss) and
that the checked load can also arbitrarily re-issue the load. Note that
one can't split "ld8 r14 = [r14]" without additional register
allocation, because the address needs to remain live until the check.

>> It does raise the question of whether we ought to completely
>> change the way we represent the pairing of LTOFFX/LDXMOV
>> relocations.
>
> This I can't answer since I don't know the definition of these.

These are markers for linker relaxation. If the symbol is within range,

	addl	x = @ltoffx(foo), gp
	...
	ld8.mov	y = [x], foo	// .mov adds the LDXMOV reloc vs foo to ld8 insn

will be relaxed to

	nop
	...
	addl	y = @gprel(foo), gp

The constraint in using the relocations is that every ldxmov insn must
be fed by an ltoffx reloc with the same symbol, and that an ltoffx reloc
cannot feed any other insn. That allows us, at link time, to not
consider data flow, merely assert that if foo is relaxed anywhere, it's
relaxed everywhere.

r~
gimplefe patch
Hi,

This is the patch for the gimple front end branch. I have written this
code as part of the Google Summer of Code program. The patch consists of
the following changes:

1. The grammar rules for variable declaration are modified to include
parameters. The code for this is in the gp_parse_var_decl function.
2. A new function, gp_parse_function_declaration, is added to recognize
function declarations.
3. A few changes in the function gp_parse_decl have been made to
accommodate the above code.
4. The comments have been updated to reflect the changes in the code.
5. The ChangeLog is updated.

Please find the attached patch file. I hope you find everything up to
standard.

--
Ketaki Tiwatne

file.diff
Description: Binary data
Fix a testcase
I've committed the following as obvious. This showed up as a failure
with C6X.

Bernd

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 176278)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,10 +1,14 @@
+2011-07-14  Bernd Schmidt
+
+	* gcc.dg/pr48770.c: Add dg-require-effective-target fpic.
+
 2011-07-14  Richard Guenther
 
 	PR tree-optimization/49651
 	* gcc.dg/torture/pr49651.c: New testcase.
 
 2011-07-14  Georg-Johann Lay
-
+
 	PR target/43746
 	* gcc.dg/array-quals-1.c: Don't xfail on AVR.
 
Index: gcc/testsuite/gcc.dg/pr48770.c
===
--- gcc/testsuite/gcc.dg/pr48770.c	(revision 176171)
+++ gcc/testsuite/gcc.dg/pr48770.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target fpic } */
 /* { dg-options "-O -fprofile-arcs -fPIC -fno-dce -fno-forward-propagate" } */
 
 int test_goto2 (int f)
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:39, Richard Henderson wrote:
> On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
>> Yes, but not using the fixed got pointer in r1, but a random other
>> register which can have different values in the function.
>
> Oh, I think I see.
>
> So if this really had been a PLUS, as implied by the LO_SUM,
> we would have had garbage input, produced garbage output, but
> (eventually) ignored the result.
>
> But since this really is a load from memory, the garbage
> input is immediately fatal.
>
> Have I got that right?

This is correct.

> If so, the patch with the use of gen_const_mem is ok.

Will commit.

(Although now I wonder if we could instead use one of the speculative
load instructions? There's one that sets the NaT bit if the load would
fault, isn't there? It's been so long I can't remember.)

> It does raise the question of whether we ought to completely
> change the way we represent the pairing of LTOFFX/LDXMOV
> relocations.

This I can't answer since I don't know the definition of these.

Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
> Yes, but not using the fixed got pointer in r1, but a random other
> register which can have different values in the function.

Oh, I think I see.

So if this really had been a PLUS, as implied by the LO_SUM, we would
have had garbage input, produced garbage output, but (eventually)
ignored the result.

But since this really is a load from memory, the garbage input is
immediately fatal.

Have I got that right?

If so, the patch with the use of gen_const_mem is ok.

It does raise the question of whether we ought to completely change the
way we represent the pairing of LTOFFX/LDXMOV relocations.

r~
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:29, Richard Henderson wrote:
> On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
>> Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
>> which doesn't exist in that tree) the load stays behind the branch where
>> it should be.
>
> RTX_UNCHANGING_P was the bit back then, I believe.

Still ok with that also set:

	addl r14 = @ltoff(ap_standalone#), r1
	;;
.mii
	ld8 r15 = [r14]
	addl r14 = @ltoff(.LC2), r1
	;;
(p7)	addl r14 = 1, r0
	;;
.mib
(p7)	st4 [r15] = r14
	nop.i 0
(p7)	br.cond.dptk .L5
.mib
	ld8 r14 = [r14]

And not ok if the MEM isn't exposed in RTL:

	addl r14 = @ltoff(ap_standalone#), r1
	;;
.mii
	ld8 r15 = [r14]
	addl r14 = @ltoff(.LC2), r1
	;;
(p7)	addl r14 = 1, r0
	;;
.mii
(p7)	st4 [r15] = r14
	nop.i 0
	nop.i 0
.mbb
	ld8 r14 = [r14]
(p7)	br.cond.dptk .L5

I'm attaching the 3.3 patch.

Bernd

Index: ../../branches/gcc-3_3-branch/gcc/sched-ebb.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-ebb.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-ebb.c	(working copy)
@@ -52,8 +52,7 @@ static int schedule_more_p PARAMS ((void
 static const char *ebb_print_insn PARAMS ((rtx, int));
 static int rank PARAMS ((rtx, rtx));
 static int contributes_to_priority PARAMS ((rtx, rtx));
-static void compute_jump_reg_dependencies PARAMS ((rtx, regset, regset,
-						   regset));
+static void compute_jump_reg_dependencies PARAMS ((rtx, regset));
 static void schedule_ebb PARAMS ((rtx, rtx));
 
 /* Return nonzero if there are more insns that should be scheduled.  */
@@ -161,30 +160,22 @@ contributes_to_priority (next, insn)
   return 1;
 }
 
-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that must be considered
+   to be set by this jump in SET.  */
 
 static void
-compute_jump_reg_dependencies (insn, cond_set, used, set)
+compute_jump_reg_dependencies (insn, set)
      rtx insn;
-     regset cond_set, used, set;
+     regset set;
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   for (e = b->succ; e; e = e->succ_next)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized.  Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath.  So we
-	 consider that it restores the value of the main codepath.  */
-      bitmap_operation (set, e->dest->global_live_at_start, cond_set,
-			BITMAP_AND);
-    else
-      bitmap_operation (used, used, e->dest->global_live_at_start,
-			BITMAP_IOR);
+    if ((e->flags & EDGE_FALLTHRU) == 0)
+      {
+	bitmap_operation (set, set, e->dest->global_live_at_start,
+			  BITMAP_IOR);
+      }
 }
 
 /* Used in schedule_insns to initialize current_sched_info for scheduling
Index: ../../branches/gcc-3_3-branch/gcc/emit-rtl.c
===
--- ../../branches/gcc-3_3-branch/gcc/emit-rtl.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/emit-rtl.c	(working copy)
@@ -192,6 +192,18 @@ static tree component_ref_for_mem_expr P
 static rtx gen_const_vector_0 PARAMS ((enum machine_mode));
 static void copy_rtx_if_shared_1 PARAMS ((rtx *orig));
 
+/* Generate a memory referring to non-trapping constant memory.  */
+
+rtx
+gen_const_mem (enum machine_mode mode, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  RTX_UNCHANGING_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+  return mem;
+}
+
+
 /* Probability of the conditional branch currently proceeded by try_split.
    Set to -1 otherwise.  */
 int split_branch_probability = -1;
Index: ../../branches/gcc-3_3-branch/gcc/sched-deps.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-deps.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-deps.c	(working copy)
@@ -981,17 +981,12 @@ sched_analyze_insn (deps, x, insn, loop_
   else
     {
       rtx pending, pending_mem;
-      regset_head tmp_uses, tmp_sets;
-      INIT_REG_SET (&tmp_uses);
-      INIT_REG_SET (&tmp_sets);
-
-      (*current_sched_info->compute_jump_reg_dependencies)
-	(insn, &deps->reg_conditional_sets, &tmp_uses, &tmp_sets);
-      IOR_REG_SET (reg_pending_uses, &tmp_uses);
-
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/14/2011 12:29 PM, Jakub Jelinek wrote:
> On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:
>> On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
>>> Here is a PCH friendly variant of the patch which tries to set the
>>> right producer from the start, but early in dwarf2out_finish double
>>> checks if PCH hasn't changed it and if it did, updates it back to the
>>> expected string.
>>
>> Why not just wait until then to set it?
>
> Because the same routine that sets it is used both to create the single
> compilation unit early and to create CUs late (for
> -feliminate-dwarf2-dups already during dwarf2out_finish).

Ah. OK, then.

Jason
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
> Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
> which doesn't exist in that tree) the load stays behind the branch where
> it should be.

RTX_UNCHANGING_P was the bit back then, I believe.

r~
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:
> On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
>> Here is a PCH friendly variant of the patch which tries to set the
>> right producer from the start, but early in dwarf2out_finish double
>> checks if PCH hasn't changed it and if it did, updates it back to the
>> expected string.
>
> Why not just wait until then to set it?

Because the same routine that sets it is used both to create the single
compilation unit early and to create CUs late (for
-feliminate-dwarf2-dups already during dwarf2out_finish).

Jakub
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
> Here is a PCH friendly variant of the patch which tries to set the
> right producer from the start, but early in dwarf2out_finish double
> checks if PCH hasn't changed it and if it did, updates it back to the
> expected string.

Why not just wait until then to set it?

Jason
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:19, Bernd Schmidt wrote:
> On 07/14/11 18:03, Richard Henderson wrote:
>> On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
>>> +++ gcc/config/ia64/ia64.c	(working copy)
>>> @@ -1047,7 +1047,7 @@
>>>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>>>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>>>
>>> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
>>> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);
>>
>> And the bug stems from ... what?
>>
>> Is this bug still fixed if you change this to gen_const_mem?
>
> It should be. Testing this is a bit tricky, since the original bug is
> in gcc-3.3 which doesn't have gen_const_mem, and current mainline with
> just the scheduler patch removed doesn't reproduce it with the
> testcase.

Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
which doesn't exist in that tree) the load stays behind the branch where
it should be.

Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:03, Richard Henderson wrote:
> On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
>> +++ gcc/config/ia64/ia64.c	(working copy)
>> @@ -1047,7 +1047,7 @@
>>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>>
>> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
>> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);
>
> And the bug stems from ... what?
>
> Is this bug still fixed if you change this to gen_const_mem?

It should be. Testing this is a bit tricky, since the original bug is in
gcc-3.3 which doesn't have gen_const_mem, and current mainline with just
the scheduler patch removed doesn't reproduce it with the testcase.

In mainline, gen_const_mem sets MEM_NOTRAP_P, but it looks like we
handle that correctly in may_trap_p:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
	 reference; moving it out of context such as when moving code
	 when optimizing, might cause its address to become invalid.  */
      code_changed || !MEM_NOTRAP_P (x))

and sched_deps uses rtx_addr_can_trap anyway.

> This is a load from the .got.

Yes, but not using the fixed got pointer in r1, but a random other
register which can have different values in the function.

> It's difficult to tell if your raw gen_rtx_MEM with no aliasing
> info doesn't just paper over a problem by preventing it from
> being moved.

The problem isn't about memory aliasing, it's about the pointer register
being clobbered.

Bernd
Re: Fix PR48542: reload register contents reuse crossing setjmp
Jeff Law wrote:
> On 06/15/11 21:46, Hans-Peter Nilsson wrote:
>> 	PR rtl-optimization/48542
>> 	* reload.c (find_equiv_reg): Stop looking when finding a
>> 	setjmp-type call.
>> 	* reload1.c (reload_as_needed): Invalidate all reload
>> 	registers when crossing a setjmp-type call.
> OK.
> Jeff

I see that this already went in, but I'm wondering why this change
should be necessary. As far as register use is concerned, setjmp ought
to behave just like a regular function: if a register is call-clobbered,
reload will not attempt to use it across a function call (*any* function
call) anyway; but if the register is call-saved, setjmp really ought to
restore the old value, *both* times it returns, and so reuse ought to be
allowed ...

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
Re: [PATCH] New IPA-CP with real function cloning
> > >   if (dec < cs->count)
> > >     cs->count -= dec;
> > >   else
> > >     cs->count = 0;
> > > }
> > >
> > > if (dump_file)
> > >   dump_profile_updates (orig_node, new_node);
> > > }
> > >
> > > if (node->local.can_change_signature)
> > >   {
> > >     args_to_skip = BITMAP_GGC_ALLOC ();
> > >     for (i = 0; i < count; i++)
> > >       {
> > >         tree t = VEC_index (tree, known_vals, i);
> > >
> > >         if ((t && TREE_CODE (t) != TREE_BINFO)
> > >             || !ipa_is_param_used (info, i))
> > >           bitmap_set_bit (args_to_skip, i);
> > >       }
> > >   }
> > > else
> > >   args_to_skip = NULL;
> >
> > When we can't change the signature, we can still set the
> > is_parm_unused flag for the callee to aid later optimizers.
>
> I assume I can re-use the node->local.can_change_signature flag? Is
> that supposed to be set at any given place or can IPA-CP do it on its
> own?

can_change_signature is currently used by the i386 backend and it is set
by the inliner. I plan to move it to the visibility pass at the time
local functions are identified. So yes, you can assume it is set and up
to date at the time IPA-CP is run.

Honza

> > Rest of patch looks OK. It definitely reads better than the previous
> > ipa-cp.c ;)
> > I suppose we will need to get some experience with the logic deciding
> > whether to clone..
>
> Thanks, I'll post the current version in a moment.
>
> Martin
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Thu, Jul 14, 2011 at 11:40 AM, Richard Guenther wrote:
> Generating RTL from GIMPLE passes just to be able to use rtx_cost is,
> well ... gross.

Indeed. And it is one of the major blockers for fully separating the
RTL, GIMPLE and target code off into separate modules. It would be great
to get rid of RTL from IVOPTS too...

Ciao!
Steven
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
> +++ gcc/config/ia64/ia64.c	(working copy)
> @@ -1047,7 +1047,7 @@
>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>
> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what?

Is this bug still fixed if you change this to gen_const_mem?

This is a load from the .got. It's constant memory, and it's always
present. There's nowhere in the function that we cannot move this load
(assuming the address is computed properly) where this load will fail.

It's difficult to tell if your raw gen_rtx_MEM with no aliasing info
doesn't just paper over a problem by preventing it from being moved.

r~
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> You need the target hook that tells how wide the reassociation should
> be, based on the type. Machines have different numbers of functional
> units, for example, maybe 3 integer units and 2 floating point units.
> For example, on the PowerPC, I would set the basic integer and binary
> floating point types to 2 to match the dispatch rules, but for decimal
> floating point, I would set it to one, since the machines currently
> only have 1 decimal unit.
>
> With the hook, I could see eliminating the switch and/or --param
> altogether, and doing it only in the hook.

The hook introduced by this patch meets these requirements. But I think
it is useful to have an option to override the hook, because you cannot
tune it perfectly for each case.

Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote:
> On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther wrote:
>> 2011/7/14 Michael Meissner:
>>> One minor note, you will need to update doc/invoke.texi to document
>>> the new switch before checkin: -ftree-reassoc-width=
>>
>> You also need approval and somebody to review the patch before
>> checkin.
>
> Btw, rather than a new target hook and an option I suggest to use
> a --param which default you can modify in the backend.

You need the target hook that tells how wide the reassociation should
be, based on the type. Machines have different numbers of functional
units, for example, maybe 3 integer units and 2 floating point units.
For example, on the PowerPC, I would set the basic integer and binary
floating point types to 2 to match the dispatch rules, but for decimal
floating point, I would set it to one, since the machines currently only
have 1 decimal unit.

With the hook, I could see eliminating the switch and/or --param
altogether, and doing it only in the hook.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com  fax +1 (978) 399-6899
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
And of course, here is the patch and ChangeLog.

2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function.  Use "exit(EXIT_FAILURE)".
	(_gfortran_caf_register): Use caf_runtime_error.
	(_gfortran_caf_sync_images): Use "exit(EXIT_FAILURE)".
	* caf/mpi.c (caf_runtime_error): Remove "error" parameter.
	Use EXIT_FAILURE instead.
	(_gfortran_caf_register): Update call to caf_runtime_error.
	(_gfortran_caf_sync_all): Ditto.
	(_gfortran_caf_sync_images): Ditto.
	(_gfortran_caf_error_stop_str): Use "exit(EXIT_FAILURE)".

Now I'm just waiting for SVN update before I commit...

On 07/14/2011 05:34 PM, Daniel Carrera wrote:
> Hi Tobias,
>
> As per your suggestion, I'm copying to gfortran and gcc-patches. I've
> made the change you asked for and I'm going to commit the patch.
>
> Cheers,
> Daniel.
>
> On 07/14/2011 05:31 PM, Tobias Burnus wrote:
>> On 07/14/2011 05:05 PM, Daniel Carrera wrote:
>>> +caf_runtime_error (const char *message, ...)
>>> +{
>>> +  va_list ap;
>>> +  fprintf (stderr, "Fortran runtime error.");
>>
>> Could you replace "." by ": " (colon, space)? I think "Fortran runtime
>> error: Could not ..." looks better than "Fortran runtime error.Could
>> not ...".
>>
>> Otherwise, the patch is OK. Could you then send the patch to fortran@
>> and gcc-patches@?
>>
>> Tobias
>>
>> PS: You could directly commit the modified patch, simply reply to this
>> email, add a CC to fortran@/gcc-patches@ and attach the patch and
>> changelog. There is no need that I approve the patch again.

--
I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>	/* For fputs and fprintf.  */
 #include <stdlib.h>	/* For exit and malloc.  */
 #include <string.h>	/* For memcpy and memset.  */
+#include <stdarg.h>	/* For variadic arguments.  */
 
 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1 */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error: ");
+  va_start (ap, message);
+  vfprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (EXIT_FAILURE);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
 		    char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,
   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
 	{
 	  *stat = 1;
 	  if (errmsg_len > 0)
 	    {
-	      const char msg[] = "Failed to allocate coarray";
 	      int len = ((int) sizeof (msg) > errmsg_len) ? errmsg_len
 							  : (int) sizeof (msg);
 	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
 	  return NULL;
 	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (msg);
     }
 
   if (stat)
@@ -140,7 +153,7 @@ _gfortran_caf_sync_images (int count __a
 	{
 	  fprintf (stderr, "COARRAY ERROR: Invalid image index %d to SYNC "
 		   "IMAGES", images[i]);
-	  exit (1);
+	  exit (EXIT_FAILURE);
 	}
 #endif
 
Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,8 +47,9 @@ static int caf_is_finalized;
 
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with single.c.  */
 static void
-caf_runtime_error (int error, const char *message, ...)
+caf_runtime_error (const char *message, ...)
 {
   va_list ap;
   fprintf (stderr, "Fortran runtime error on image %d: ", caf_this_image);
@@ -59,10 +60,10 @@
   /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
   /* FIXME: Do some more effort than just MPI_ABORT.  */
-  MPI_Abort (MPI_COMM_WORLD, error);
+  MPI_Abort (MPI_COMM_WORLD, EXIT_FAILURE);
 
   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (EXIT_FAILURE);
 }
 
@@ -179,7 +180,7 @@ error:
 	}
     }
   else
-    caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : 1, msg);
+    caf_runtime_error (msg);
 
   return NULL;
@@ -223,7 +224,7 @@ _gfortran_caf_sync_all (int *stat, char
 	  memset (&errmsg[len], ' ', errmsg_len-len);
 	}
       else
-	caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : ierr, msg);
+	caf_runtime_error (msg);
     }
 }
 
@@ -291,7 +292,7 @@ _gfortran_caf_sync_images (int count, in
 	  memset (&errmsg_
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Hi Tobias,

As per your suggestion, I'm copying to gfortran and gcc-patches. I've
made the change you asked for and I'm going to commit the patch.

Cheers,
Daniel.

On 07/14/2011 05:31 PM, Tobias Burnus wrote:
> On 07/14/2011 05:05 PM, Daniel Carrera wrote:
>> +caf_runtime_error (const char *message, ...)
>> +{
>> +  va_list ap;
>> +  fprintf (stderr, "Fortran runtime error.");
>
> Could you replace "." by ": " (colon, space)? I think "Fortran runtime
> error: Could not ..." looks better than "Fortran runtime error.Could
> not ...".
>
> Otherwise, the patch is OK. Could you then send the patch to fortran@
> and gcc-patches@?
>
> Tobias
>
> PS: You could directly commit the modified patch, simply reply to this
> email, add a CC to fortran@/gcc-patches@ and attach the patch and
> changelog. There is no need that I approve the patch again.

--
I'm not overweight, I'm undertall.
Re: [PATCH] New IPA-CP with real function cloning
Hi, like with the previous mail, I'll reply to the comments here and then post a new version of the patch in a separate thread. On Sun, Jul 10, 2011 at 07:04:21PM +0200, Jan Hubicka wrote: > > > > /* If checking is enabled, verify that no lattice is in the TOP state, i.e. > > not > >bottom, not containing a variable component and without any known value > > at > >the same time. */ > > > > static void > > verify_propagated_values (void) > > { > > #ifdef ENABLE_CHECKING > > Hmm, would not be better to keep this function around to be called from > debugger, like > other verifiers do? OK. > > struct cgraph_node *node; > > > > FOR_EACH_DEFINED_FUNCTION (node) > > { > > struct ipa_node_params *info = IPA_NODE_REF (node); > > int i, count = ipa_get_param_count (info); > > > > for (i = 0; i < count; i++) > > { > > struct ipcp_lattice *lat = ipa_get_lattice (info, i); > > > > if (!lat->bottom > > && !lat->contains_variable > > && lat->values_count == 0) > > { > > if (dump_file) > > { > > fprintf (dump_file, "\nIPA lattices after constant " > >"propagation:\n"); > > print_all_lattices (dump_file, true, false); > > } > > > > gcc_unreachable (); > > } > > } > > } > > #endif > > } > > > > /* Propagate values through a pass-through jump function JFUNC associated > > with > >edge CS, taking values from SRC_LAT and putting them into DEST_LAT. > > SRC_IDX > >is the index of the source parameter. 
*/ > > > > static bool > > propagate_vals_accross_pass_through (struct cgraph_edge *cs, > > struct ipa_jump_func *jfunc, > > struct ipcp_lattice *src_lat, > > struct ipcp_lattice *dest_lat, > > int src_idx) > > { > > struct ipcp_value *src_val; > > bool ret = false; > > > > if (jfunc->value.pass_through.operation == NOP_EXPR) > > for (src_val = src_lat->values; src_val; src_val = src_val->next) > > ret |= add_value_to_lattice (dest_lat, src_val->value, cs, > >src_val, src_idx); > > else if (edge_within_scc (cs)) > > ret = set_lattice_contains_variable (dest_lat); > > Hmm, is the reason for not using artimetic within SCC documented somewhere? > It seems like code someone will eventually run into and remove without much > of consideration. I added a comment. > > > > /* Calculate devirtualization time bonus for NODE, assuming we know > > KNOWN_CSTS > >and KNOWN_BINFOS. */ > > > > static int > > devirtualization_time_bonus (struct cgraph_node *node, > > VEC (tree, heap) *known_csts, > > VEC (tree, heap) *known_binfos) > > Eventually it would be nice to make this common with inliner analysis. We > want to > increase inlining limits here too True. > > /* Only bare minimum benefit for clearly un-inlineable targets. */ > > res += 1; > > callee = cgraph_get_node (target); > > if (!callee) > > continue; > > Hmm, when you test it here, you might probably ask if callee is analyzed and > inlinable then ;) Good point, did that. > > > > /* Return true if cloning NODE is a good idea, given the estimated > > TIME_BENEFIT > >and SIZE_COST and with the sum of frequencies of incoming edges to the > >potential new clone in FREQUENCIES. 
*/
> >
> > static bool
> > good_cloning_opportunity_p (struct cgraph_node *node, int time_benefit,
> > 			    int freq_sum, gcov_type count_sum, int size_cost)
> > {
> >   if (time_benefit == 0
> >       || !flag_ipa_cp_clone
> >       || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> >     return false;
>
> Still I think a cloning opportunity is good if the savings on call size
> are enough to pay back for the function body duplication.
> It would probably make sense then to create simple wrapper functions
> instead of duplicating the function body.

I was already doing that for considering IPA-CP opportunities for all
known contexts - and the effect is more profound still now when we
consider moving costs. On the other side, when opportunities for
function specialization are concerned, size is always considered a cost
and never a benefit when doing the calculations. This simplifies the
code a bit, because the size savings are on the side of the caller and
so depend on the number of call sites that actually get redirected.
That is not known when estimating size, but only after we have made
specialization decisions for all the callers, and so it would need to be
an extra thing stored along with the values. I think that since such
calls should be picked up by inlining, it's not really worth the added
data and complexity. But I could add it later, sure.

> > gcc_checking
[testsuite]: Some test case skips.
This patchlet fixes two test cases:

* gcc.dg/pr32912-2.c: TImode is overkill for AVR.
* gcc.dg/pr44674.c: Test if -fprofile-generate is available.

Ok to commit?

Johann

testsuite/
	* gcc.dg/pr32912-2.c: Skip for AVR.
	* gcc.dg/pr44674.c: Add dg-require-profiling.

Index: gcc.dg/pr32912-2.c
===
--- gcc.dg/pr32912-2.c	(revision 176136)
+++ gcc.dg/pr32912-2.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -w" } */
+/* { dg-skip-if "TImode not supported" { "avr-*-*" } { "*" } { "" } } */
 
 extern void abort (void);
 
Index: gcc.dg/pr44674.c
===
--- gcc.dg/pr44674.c	(revision 176136)
+++ gcc.dg/pr44674.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */
 
 void jumpfunc (void *p)
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> 2011/7/14 Richard Guenther:
>>
>> But then how come the option to override it is useful? It isn't
>> dependent on the particular case. At least the option should be a
>> --param.
>>
>> Richard.
>
> The option is still useful if you want to try the feature on a platform
> with no hook implemented, and for other performance experiments. I
> agree --param usage should be better here. I'll fix it.
>
> Ilya

Here is a patch with a new param instead of a new option. Bootstrapped
and checked on x86_64-linux.

Ilya

--

gcc/

2011-07-14  Enkovich Ilya

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (tree-reassoc-width): New param documented.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of the new hook
	for the i386 target.
	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of
	rewrite_expr_tree if needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	--param tree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

PR44382.diff
Description: Binary data
Re: [PATCH (1/7)] New optab framework for widening multiplies
Ping. This is the last unreviewed patch in this series ...

Thanks

Andrew

On 09/07/11 15:43, Andrew Stubbs wrote:

On 23/06/11 15:37, Andrew Stubbs wrote:

This patch should have no effect on the compiler output. It merely
replaces one way to represent widening operations with another, and
refactors the other parts of the compiler to match. The rest of the
patch set uses this new framework to implement the optimization
improvements.

I considered and discarded many approaches to this patch before arriving
at this solution, and I feel sure that there'll be somebody out there
who will think I chose the wrong one, so let me first explain how I got
here.

The aim is to be able to encode and query optabs that have any given
input mode, and any given output mode. This is similar to the
convert_optab, but not compatible with that optab since it is handled
completely differently in the code. (Just to be clear, the existing
widening multiply support only covers instructions that widen by *one*
mode, so it's only ever been necessary to know the output mode, up to
now.)

Option 1 was to add a second dimension to the handlers table in optab_d,
but I discarded this option because it would increase the memory usage
by the square of the number of modes, which is a bit much.

Option 2 was to add a whole new optab, similar to optab_d, but with a
second dimension like convert_optab_d; however, this turned out to cause
way too many pointer type mismatches in the code, and would have been
very difficult to fix up.

Option 3 was to add new optab entries for widening by two modes, by
three modes, and so on. True, I would only need to add one extra set for
what I need, but there would be so many places in the code that compare
against smul_widen_optab, for example, that would need to be taught
about these, that it seemed like a bad idea.
Option 4 was to have a separate table that contained the widening operations, and refer to that whenever a widening entry in the main optab is referenced, but I found that there was no easy way to do the mapping without putting some sort of switch table in widening_optab_handler, and that negates the other advantages. So, what I've done in the end is add a new pointer entry "widening" into optab_d, and dynamically build the widening operations table for each optab that needs it. I've then added new accessor functions that take both input and output modes, and altered the code to use them where appropriate. The down-side of this approach is that the optab entries for widening operations now have two "handlers" tables, one of which is redundant. That said, those cases are in the minority, and it is the smaller table which is unused. If people find that very distasteful, it might be possible to remove the *_widen_optab entries and unify smul_optab with smul_widen_optab, and so on, and save space that way. I've not done so yet, but I expect I could if people feel strongly about it. As a side-effect, it's now possible for any optab to be "widening", should some target happen to have a widening add, shift, or whatever. Is this patch OK? This update has been rebaselined to fix some conflicts with other recent commits in this area. I also identified a small bug which resulted in the operands to some commutative operations being reversed. I don't believe the bug did any harm, logically speaking, but I suppose there could be a testcase that resulted in worse code being generated. With this fix, I now see exactly matching output in all my testcases. Andrew
Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)
On 07/14/2011 12:41 AM, Georg-Johann Lay wrote: >> Ok to commit and back-port to 4.6? >> >> Johann >> >> PR target/49487 >> * config/avr/avr.md (rotl3): Generate SCRATCH instead >> of REG. >> (*rotw): Use const_int_operand for operand 2. >> Use match_scratch for operand 3. >> (*rotb): Ditto >> * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH. Ok. r~
[PATCH] Fix PR49651
This fixes PR49651 where we fail to handle aggregate dereferences in call arguments in PTA properly. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied where applicable. Richard. 2011-07-14 Richard Guenther PR tree-optimization/49651 * tree-ssa-structalias.c (get_constraint_for_1): Properly handle dereferences with subvariables. * gcc.dg/torture/pr49651.c: New testcase. Index: gcc/tree-ssa-structalias.c === *** gcc/tree-ssa-structalias.c (revision 176272) --- gcc/tree-ssa-structalias.c (working copy) *** get_constraint_for_1 (tree t, VEC (ce_s, *** 3349,3357 /* If we are not taking the address then make sure to process all subvariables we might access. */ cs = *VEC_last (ce_s, *results); ! if (address_p ! || cs.type != SCALAR) return; vi = get_varinfo (cs.var); --- 3349,3366 /* If we are not taking the address then make sure to process all subvariables we might access. */ + if (address_p) + return; + cs = *VEC_last (ce_s, *results); ! if (cs.type == DEREF) ! { ! /* For dereferences this means we have to defer it !to solving time. */ ! VEC_last (ce_s, *results)->offset = UNKNOWN_OFFSET; ! return; ! } ! if (cs.type != SCALAR) return; vi = get_varinfo (cs.var); Index: gcc/testsuite/gcc.dg/torture/pr49651.c === *** gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) *** *** 0 --- 1,31 + /* { dg-do run } */ + + extern void abort (void); + + struct X { + int *p; + int *q; + }; + + void __attribute__((noinline, noclone)) + foo (struct X x) { *x.q = 0; } + + volatile int what; + struct X y; + + int main() + { + int i, j; + struct X x, *p; + x.p = &i; + x.q = &j; + if (what) + p = &y; + else + p = &x; + j = 1; + foo (*p); + if (j != 0) + abort (); + return 0; + }
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On Thu, Jul 14, 2011 at 4:38 PM, Andrew Stubbs wrote: > On 28/06/11 17:23, Andrew Stubbs wrote: >> >> On 23/06/11 15:43, Andrew Stubbs wrote: >>> >>> Patch 4 introduced support for using signed multiplies to code unsigned >>> multiplies in a narrower mode. Patch 5 then introduced support for >>> mis-matched input modes. >>> >>> These two combined mean that there is a case where only the smaller of two >>> inputs is unsigned, and yet it still tries to use a mode wider than the >>> larger, signed input. This is bad because it means unnecessary extends >>> and because the wider operation might not exist. >>> >>> This patch catches that case, and ensures that the smaller, unsigned >>> input is zero-extended to match the mode of the larger, signed input. >>> >>> Of course, both inputs may still have to be extended to fit the nearest >>> available instruction, so it doesn't make a difference every time. >>> >>> OK? >> >> This update fixes Janis' issue with the testsuite. > > And this version is updated to fit the changes made earlier in the series, > and also to use the precision, instead of the mode-size, in order to better > optimize bitfields. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On 28/06/11 17:23, Andrew Stubbs wrote: On 23/06/11 15:43, Andrew Stubbs wrote: Patch 4 introduced support for using signed multiplies to code unsigned multiplies in a narrower mode. Patch 5 then introduced support for mis-matched input modes. These two combined mean that there is a case where only the smaller of two inputs is unsigned, and yet it still tries to use a mode wider than the larger, signed input. This is bad because it means unnecessary extends and because the wider operation might not exist. This patch catches that case, and ensures that the smaller, unsigned input is zero-extended to match the mode of the larger, signed input. Of course, both inputs may still have to be extended to fit the nearest available instruction, so it doesn't make a difference every time. OK? This update fixes Janis' issue with the testsuite. And this version is updated to fit the changes made earlier in the series, and also to use the precision, instead of the mode-size, in order to better optimize bitfields. OK? Andrew 2011-06-24 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_mult_to_widen): Better handle unsigned inputs of different modes. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-9.c: New file. * gcc.target/arm/wmul-bitfield-2.c: New file.
--- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-9.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, short *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +struct bf +{ + int a : 3; + unsigned int b : 15; + int c : 3; +}; + +long long +foo (long long a, struct bf b, struct bf c) +{ + return a + b.b * c.c; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2121,9 +2121,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) { if (op != smul_widen_optab) { - from_mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) - return false; + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. Note that type1 >= type2, always. */ + if ((TYPE_UNSIGNED (type1) + && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (TYPE_UNSIGNED (type2) + && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) + { + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) + return false; + } op = smul_widen_optab; handler = find_widening_optab_handler_and_mode (op, to_mode, @@ -2290,14 +2299,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, /* There's no such thing as a mixed sign madd yet, so use a wider mode. 
*/ if (from_unsigned1 != from_unsigned2) { - enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode)) + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. Note that type1 >= type2, always. */ + if ((from_unsigned1 + && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (from_unsigned2 + && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) { - from_mode = mode; - from_unsigned1 = from_unsigned2 = false; + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode)) + return false; } - else - return false; + + from_unsigned1 = from_unsigned2 = false; } /* If there was a conversion between the multiply and addition
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On Thu, Jul 14, 2011 at 4:34 PM, Andrew Stubbs wrote: > On 07/07/11 11:13, Richard Guenther wrote: >>> >>> This updates the context changed by my update to patch 3. >>> > >>> > The content of this patch has not changed. >> >> Ok. > > I know this patch was already approved, but I discovered a bug in this patch > that missed optimizing the case where the input to multiply did not come > from an assign statement (this can happen when the value comes from a > function parameter). > > This patch fixes that case, and updates the context changed by my updates > earlier in the patch series. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH][1/N][C][C++][Fortran][Java] Change POINTER_PLUS_EXPR offset type requirements
On 07/14/2011 04:28 AM, Richard Guenther wrote: > On Tue, 12 Jul 2011, Richard Guenther wrote: > >> >> This patch is step 1, it abstracts away the type of the offset operand >> when building a POINTER_PLUS_EXPR. It does so by introducing >> fold_build_pointer_plus_expr{_hwi,_loc} helpers and use them in all >> places that currently build POINTER_PLUS_EXPRs (including those >> that do not fold the result). This patch will make steps 2 and more >> purely middle-end (fingers crossed). >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages. >> >> I also built cc1 for all supported targets via contrib/config-list.mk >> and saw no fails that relate to this patch. >> >> Are the frontend parts ok? > > Ping. Joseph seems to be on vacation - Richard, can you approve > the C frontend/family parts? Yeah, this patch is ok. r~
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On 07/07/11 11:13, Richard Guenther wrote: This updates the context changed by my update to patch 3. > > The content of this patch has not changed. Ok. I know this patch was already approved, but I discovered a bug in this patch that missed optimizing the case where the input to multiply did not come from an assign statement (this can happen when the value comes from a function parameter). This patch fixes that case, and updates the context changed by my updates earlier in the patch series. OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'rhs'. Don't reject non-conversion statements. Do return lhs in this case. (is_widening_mult_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'stmt'. Pass type to is_widening_mult_rhs_p. (convert_mult_to_widen): Pass type to is_widening_mult_p. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-8.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-8.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, int *b, int *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1965,7 +1965,8 @@ struct gimple_opt_pass pass_optimize_bswap = } }; -/* Return true if RHS is a suitable operand for a widening multiplication. +/* Return true if RHS is a suitable operand for a widening multiplication, + assuming a target type of TYPE. There are two cases: - RHS makes some value at least twice as wide. Store that value @@ -1975,32 +1976,43 @@ struct gimple_opt_pass pass_optimize_bswap = but leave *TYPE_OUT untouched. 
*/ static bool -is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) +is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, + tree *new_rhs_out) { gimple stmt; - tree type, type1, rhs1; + tree type1, rhs1; enum tree_code rhs_code; if (TREE_CODE (rhs) == SSA_NAME) { - type = TREE_TYPE (rhs); stmt = SSA_NAME_DEF_STMT (rhs); if (!is_gimple_assign (stmt)) - return false; - - rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) - return false; + { + rhs1 = NULL; + type1 = TREE_TYPE (rhs); + } + else + { + rhs1 = gimple_assign_rhs1 (stmt); + type1 = TREE_TYPE (rhs1); + } - rhs1 = gimple_assign_rhs1 (stmt); - type1 = TREE_TYPE (rhs1); if (TREE_CODE (type1) != TREE_CODE (type) || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type)) return false; - *new_rhs_out = rhs1; + if (rhs1) + { + rhs_code = gimple_assign_rhs_code (stmt); + if (TREE_CODE (type) == INTEGER_TYPE + ? !CONVERT_EXPR_CODE_P (rhs_code) + : rhs_code != FIXED_CONVERT_EXPR) + *new_rhs_out = rhs; + else + *new_rhs_out = rhs1; + } + else + *new_rhs_out = rhs; *type_out = type1; return true; } @@ -2015,28 +2027,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) return false; } -/* Return true if STMT performs a widening multiplication. If so, - store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT - respectively. Also fill *RHS1_OUT and *RHS2_OUT such that converting - those operands to types *TYPE1_OUT and *TYPE2_OUT would give the - operands of the multiplication. */ +/* Return true if STMT performs a widening multiplication, assuming the + output type is TYPE. If so, store the unwidened types of the operands + in *TYPE1_OUT and *TYPE2_OUT respectively. Also fill *RHS1_OUT and + *RHS2_OUT such that converting those operands to types *TYPE1_OUT + and *TYPE2_OUT would give the operands of the multiplication. 
*/ static bool -is_widening_mult_p (gimple stmt, +is_widening_mult_p (tree type, gimple stmt, tree *type1_out, tree *rhs1_out, tree *type2_out, tree *rhs2_out) { - tree type; - - type = TREE_TYPE (gimple_assign_lhs (stmt)); if (TREE_CODE (type) != INTEGER_TYPE && TREE_CODE (type) != FIXED_POINT_TYPE) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out, + rhs1_out)) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out, + rhs2_out)) return false; if (*type1_out == NULL) @@ -2088,7 +2099,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) if (TREE_CODE (type) != INTEGER_TYPE) return false; - if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2)) + if (!is_widening_m
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
On Thu, Jul 14, 2011 at 4:28 PM, Andrew Stubbs wrote: > I've updated this patch following the changes earlier in the patch series. > There isn't much left. > > This should obviate all the review comments. :) Indeed ;) > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
I've updated this patch following the changes earlier in the patch series. There isn't much left. This should obviate all the review comments. :) OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME. Ensure that the larger type is the first operand. gcc/testsuite/ * gcc.target/arm/wmul-7.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-7.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +unsigned long long +foo (unsigned long long a, unsigned char *b, unsigned short *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2053,9 +2053,17 @@ is_widening_mult_p (gimple stmt, *type2_out = *type1_out; } - /* FIXME: remove this restriction. */ - if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out)) -return false; + /* Ensure that the larger of the two operands comes first. */ + if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out)) +{ + tree tmp; + tmp = *type1_out; + *type1_out = *type2_out; + *type2_out = tmp; + tmp = *rhs1_out; + *rhs1_out = *rhs2_out; + *rhs2_out = tmp; +} return true; }
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On Thu, Jul 14, 2011 at 4:23 PM, Andrew Stubbs wrote: > On 12/07/11 15:07, Andrew Stubbs wrote: >> >> This update does the same thing as before, but updated for the changes >> earlier in the patch series. In particular, the build_and_insert_cast >> function and find_widening_optab_handler_and_mode changes have been >> moved up to patch 2. > > And this update changes the way the casts are handled, partly because it got > unwieldy towards the end of the patch series, and partly because I found a > few bugs. > > I've also ensured that it checks the precision of the types, rather than the > mode size to ensure that it is bitfield safe. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On 12/07/11 15:07, Andrew Stubbs wrote: This update does the same thing as before, but updated for the changes earlier in the patch series. In particular, the build_and_insert_cast function and find_widening_optab_handler_and_mode changes have been moved up to patch 2. And this update changes the way the casts are handled, partly because it got unwieldy towards the end of the patch series, and partly because I found a few bugs. I've also ensured that it checks the precision of the types, rather than the mode size to ensure that it is bitfield safe. OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_mult_to_widen): Convert unsupported unsigned multiplies to signed. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-6.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-6.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, unsigned char *b, signed char *c) +{ + return a + (long long)*b * (long long)*c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2067,12 +2067,13 @@ is_widening_mult_p (gimple stmt, static bool convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) { - tree lhs, rhs1, rhs2, type, type1, type2, tmp; + tree lhs, rhs1, rhs2, type, type1, type2, tmp = NULL; enum insn_code handler; enum machine_mode to_mode, from_mode, actual_mode; optab op; int actual_precision; location_t loc = gimple_location (stmt); + bool from_unsigned1, from_unsigned2; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2084,10 +2085,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) to_mode = TYPE_MODE (type); from_mode = TYPE_MODE (type1); + from_unsigned1 = TYPE_UNSIGNED (type1); + from_unsigned2 = TYPE_UNSIGNED (type2); - if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2)) + if (from_unsigned1 && from_unsigned2) op = umul_widen_optab; - else if 
(!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2)) + else if (!from_unsigned1 && !from_unsigned2) op = smul_widen_optab; else op = usmul_widen_optab; @@ -2096,22 +2099,45 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) 0, &actual_mode); if (handler == CODE_FOR_nothing) -return false; +{ + if (op != smul_widen_optab) + { + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) + return false; + + op = smul_widen_optab; + handler = find_widening_optab_handler_and_mode (op, to_mode, + from_mode, 0, + &actual_mode); + + if (handler == CODE_FOR_nothing) + return false; + + from_unsigned1 = from_unsigned2 = false; + } + else + return false; +} /* Ensure that the inputs to the handler are in the correct precison for the opcode. This will be the full mode size. */ actual_precision = GET_MODE_PRECISION (actual_mode); - if (actual_precision != TYPE_PRECISION (type1)) + if (actual_precision != TYPE_PRECISION (type1) + || from_unsigned1 != TYPE_UNSIGNED (type1)) { tmp = create_tmp_var (build_nonstandard_integer_type -(actual_precision, TYPE_UNSIGNED (type1)), +(actual_precision, from_unsigned1), NULL); rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1); - +} + if (actual_precision != TYPE_PRECISION (type2) + || from_unsigned2 != TYPE_UNSIGNED (type2)) +{ /* Reuse the same type info, if possible. 
*/ - if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) + if (!tmp || from_unsigned1 != from_unsigned2) tmp = create_tmp_var (build_nonstandard_integer_type -(actual_precision, TYPE_UNSIGNED (type2)), +(actual_precision, from_unsigned2), NULL); rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2); } @@ -2136,7 +2162,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; - tree type, type1, type2, tmp; + tree type, type1, type2, optype, tmp = NULL; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; optab this_optab; @@ -2145,6 +2171,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum machine_mode to_mode, from_mode, actual_mode; location_t loc = gimple_location (stmt); int actual_precision; + bool from_unsigned1, from_unsigned2; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2238,9 +2265,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, to_mode = TYPE_MODE (type); from_mode = TYPE_MODE (type1); + from_unsigned1 = TYPE_UNSIGNED (type1); + from_unsigned2 = TYPE_UNSIGNED (type2); - if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) -ret
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
This update changes only the context modified by changes to patch 2. The patch has already been approved. I'm just posting it for completeness. Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2, tmp; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. 
*/ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + The should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate. 
*/ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's the right sort. */ + if (TYPE_UNSIGNED (from_type) != is_unsigned + && !(is_unsigned && TYPE_PRECISION (from_type) > data_size)) + return false; + } + /* else convert is a no-op for our purposes. */ +} + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */
Re: [PATCH (2/7)] Widening multiplies by more than one mode
On Thu, Jul 14, 2011 at 4:10 PM, Andrew Stubbs wrote: > On 12/07/11 12:04, Richard Guenther wrote: >> >> I wonder if we want to restrict the WIDEN_* operations to operate >> on types that have matching type/mode precision(**). > > I've now modified the patch to allow bitfields, or other case where the > precision is smaller than the mode-size. I've also addressed the formatting > issues you pointed out (and in fact reorganised the code slightly to make > the rest of the series a bit cleaner). > > As in the previous version of this patch, it's necessary to convert the > input values to the proper mode for the machine instruction, so the basic > tools for supporting the bitfields were already there - I just had to tweak > the conditionals to take bitfields into account. > > The only this I haven't done is modified tree.def. Looking at it though, I > don't thing any needs changing? The code is still valid, and the comments > are correct (in fact, they may have been wrong before). Ah, it indeed talks about at least twice the precision already. > Is this version OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (2/7)] Widening multiplies by more than one mode
On 12/07/11 12:04, Richard Guenther wrote: I wonder if we want to restrict the WIDEN_* operations to operate on types that have matching type/mode precision(**). I've now modified the patch to allow bitfields, or other cases where the precision is smaller than the mode-size. I've also addressed the formatting issues you pointed out (and in fact reorganised the code slightly to make the rest of the series a bit cleaner). As in the previous version of this patch, it's necessary to convert the input values to the proper mode for the machine instruction, so the basic tools for supporting the bitfields were already there - I just had to tweak the conditionals to take bitfields into account. The only thing I haven't done is modify tree.def. Looking at it though, I don't think anything needs changing? The code is still valid, and the comments are correct (in fact, they may have been wrong before). Is this version OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * config/arm/arm.md (maddhidi4): Remove '*' from name. * expr.c (expand_expr_real_2): Use find_widening_optab_handler. * optabs.c (find_widening_optab_handler_and_mode): New function. (expand_widen_pattern_expr): Use find_widening_optab_handler. (expand_binop_directly): Likewise. (expand_binop): Likewise. * optabs.h (find_widening_optab_handler): New macro define. (find_widening_optab_handler_and_mode): New prototype. * tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR type precision rules. (verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR. * tree-ssa-math-opts.c (build_and_insert_cast): New function. (is_widening_mult_rhs_p): Allow widening by more than one mode. Explicitly disallow mis-matched input types. (convert_mult_to_widen): Use find_widening_optab_handler, and cast input types to fit the new handler. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-bitfield-1.c: New file.
--- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1857,7 +1857,7 @@ (set_attr "predicable" "yes")] ) -(define_insn "*maddhidi4" +(define_insn "maddhidi4" [(set (match_operand:DI 0 "s_register_operand" "=r") (plus:DI (mult:DI (sign_extend:DI --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7638,19 +7638,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, { enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0)); this_optab = usmul_widen_optab; - if (mode == GET_MODE_2XWIDER_MODE (innermode)) + if (find_widening_optab_handler (this_optab, mode, innermode, 0) + != CODE_FOR_nothing) { - if (widening_optab_handler (this_optab, mode, innermode) - != CODE_FOR_nothing) - { - if (TYPE_UNSIGNED (TREE_TYPE (treeop0))) - expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, - EXPAND_NORMAL); - else - expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0, - EXPAND_NORMAL); - goto binop3; - } + if (TYPE_UNSIGNED (TREE_TYPE (treeop0))) + expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, + EXPAND_NORMAL); + else + expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0, + EXPAND_NORMAL); + goto binop3; } } /* Check for a multiplication with matching signedness. */ @@ -7665,10 +7662,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab; this_optab = zextend_p ? 
umul_widen_optab : smul_widen_optab; - if (mode == GET_MODE_2XWIDER_MODE (innermode) - && TREE_CODE (treeop0) != INTEGER_CST) + if (TREE_CODE (treeop0) != INTEGER_CST) { - if (widening_optab_handler (this_optab, mode, innermode) + if (find_widening_optab_handler (this_optab, mode, innermode, 0) != CODE_FOR_nothing) { expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, @@ -7677,7 +7673,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, unsignedp, this_optab); return REDUCE_BIT_FIELD (temp); } - if (widening_optab_handler (other_optab, mode, innermode) + if (find_widening_optab_handler (other_optab, mode, innermode, 0) != CODE_FOR_nothing && innermode == word_mode) { --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -225,6 +225,37 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1) return 1; } +/* Find a widening optab even if it doesn't widen as much as we want. + E.g. if from_mode is HImode, and to_mode is DImode, and there is no + direct HI->SI insn, then return SI->DI, if that exists. + If PERMIT_NON_WIDENING is non-zero then this can be used with + non-widening optabs also. */ + +enum insn_code +find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode, + enum machine_mode from_mode, + int permit_non_widening, + enum machine_mode *found_mode) +{ + for (; (permit_non_widening || from_mode !=
Re: [PATCH] New IPA-CP with real function cloning
Hi, I'll send a new version of IPA-CP incorporating most of the feedback in a new thread but let me also comment on some of the points here: On Fri, Jul 08, 2011 at 08:24:31PM +0200, Jan Hubicka wrote: > > > > { > > > > /* Pointer to an array of structures describing individual formal > > > > parameters. */ > > > > struct ipcp_lattice *lattices; > > > > > > Hmm, how we get here around the need to mark this GTY(). I.e are we sure > > > that all the known_vals > > > must be referneced from elsewhere at ggc time? > > > > (Scalar) constants that are results of arithmetic jump functions may > > not be referenced from elsewhere, everything else is referenced from > > the jump functions. If it is a problem it is already present in the > > current IPA-CP. ipa_node_params and lattices are not GTYed there > > either. > > Hmm, I guess it is not really problem only because the lattices are used > only in ipa-cp so the values do not really live across GGC call. Well, technically they survive until after inlining (because of indirect inlining which also derives information from the lattices corresponding to node->inlined_to node. Results of arithmetic functions are not going to be accessed during inlining when compiling any reasonable program but... > > > > > > > > /* ipa_edge_args stores information related to a callsite and > > > > particularly its > > > >arguments. It can be accessed by the IPA_EDGE_REF macro. */ > > > > typedef struct GTY(()) ipa_edge_args > > > > > > probably edge_summary would be my preferred name. > > > > Ugh, this is the current name, we may change it later. In any event > > the name should probably tell that the summary is about parameters. > > Hmm, OK, it is not bad name after all. 
:-) > > > > > > > || !inline_summary (node)->versionable > > > > || TYPE_ATTRIBUTES (TREE_TYPE (node->decl)) > > > > > > This is getting an issue for Fortran that attach the function arguments > > > quite commonly, > > > perhaps we should start moving ahead on this and ruling out only the > > > arguments that > > > gets can't be preserved. > > > > Yes, for example in 433.milc basically all the functions are > > considered not versionable because of this. I also thought of not > > removing parameters for such functions. > > > > > Also you need to test attributes of DECL itself. > > > > Really? We are not doing it now, nether for IPA-CP nor for IPA-SRA > > (which is good at triggering problems with these). > > Hmm, all the ipa-inline code checks DECL_ATTRIBUTES only. I believe the type > attributes ends up being copied to decl attributes but not vice versa. > I guess the testcase should be > > __attribute__ ((really_bad_attribute)) > function (int param) > { > use param > } > > and then let ipa-cp to invent param is a constant. what would be such a "really_bad_attribute" ? > > > > > > > > I think this all should be part of can_change_signature_p code, since we > > > can version > > > with all attributes I can thunk of when we don't change signature. > > > > > > > || cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE) > > > > res = false; > > > > else > > > > /* Removing arguments doesn't work if the function takes varargs > > > >or use __builtin_apply_args. > > > >FIXME: handle this together with can_change_signature flag. */ > > > > for (edge = node->callees; edge; edge = edge->next_callee) > > > > { > > > > tree t = edge->callee->decl; > > > > if (DECL_BUILT_IN_CLASS (t) == BUILT_IN_NORMAL > > > > && (DECL_FUNCTION_CODE (t) == BUILT_IN_APPLY_ARGS > > > > || DECL_FUNCTION_CODE (t) == BUILT_IN_VA_START)) > > > > > > can_change_sigunature will also handle apply_args. > > > Add VA_START code there, too. 
For the other use of this flag (in i386) > > > VA_START The last one already is VA_START... or do you mean a different one? > > > rules out the change, too, but so far we explicitely tested that in the > > > backend. > > > > > > Iguess with these changes, inline_summary (node)->versionable should > > > coincide > > > with IPA_NODE_REF (node)->node_versionable making this whole code > > > unnecesary > > > (it was supposed to work this way, I am not sure why we now have to > > > versionable > > > flags). > > > > That would be nice. I was wondering why the difference between the > > two. Again, I am yet to decide whether this should be done as a > > followup or within this change. > > OK. BTW, it will be a followup change. > > > If you use for_node_thunk_and_aliases you can remove the recursion you do > > > above > > > and it will work for aliases of thunk correctly, too ;) > > > > But I wouldn't be able to check the edge for hotness. > > BTW currently the edges from thunks miss any profile info. > (i.e. it will be 0). I guess we ought to make ipa-profile pass to estimate > those > (it is difficult ot measure count of an alias). > I'm
Re: [Patch, Fortran] Mark "token" of static coarrays "restrict"
For what it's worth, it compiles and works on my 32bit Linux too. But this is such a simple patch it *had* to work. On 07/14/2011 11:01 AM, Tobias Burnus wrote: As the title says: Mark "token" of static coarrays "restrict" Bootstrapped and regtested on x86-64-linux. OK for the trunk? Tobias -- I'm not overweight, I'm undertall.
Re: Correct fix for scheduler bug PR11320
> ??? Original code:
>
>    basic_block b = BLOCK_FOR_INSN (insn);
>    edge e;
>    for (e = b->succ; e; e = e->succ_next)
> !    if ((e->flags & EDGE_FALLTHRU) == 0)
> !      {
> !        bitmap_operation (set, set, e->dest->global_live_at_start,
> !                          BITMAP_IOR);
> !      }
> }
>
> Code after the revert:
>
>    FOR_EACH_EDGE (e, ei, b->succs)
> +    if ((e->flags & EDGE_FALLTHRU) == 0)
>      bitmap_ior_into (used, df_get_live_in (e->dest));
>
> As far as I can tell these are identical, modulo the change in variable
> name ("set" -> "used" which seems like a better name).

Yes, the code does the same thing, but the original patch did clear up the set/use confusion in sched_analyze_insn and compute_jump_reg_dependencies, in particular in the comment of the latter. But OK, never mind.

-- 
Eric Botcazou
[PATCH] More efficiently canonicalize operands in fold_stmt
We currently go through a tree expression generation when canonicalizing operand order. That's quite wasteful. The following patch re-orders things in a way to avoid this and remove some duplicate code. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther * gimple-fold.c (fold_gimple_assign): Remove operand swapping. (fold_stmt_1): Do it here directly on gimple and as a first thing. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 176267) --- gcc/gimple-fold.c (working copy) *** fold_gimple_assign (gimple_stmt_iterator *** 817,842 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. */ - if (commutative_tree_code (subcode) - && tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build2 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), -gimple_assign_rhs2 (stmt), -gimple_assign_rhs1 (stmt)); } break; --- 817,831 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; } break; *** fold_gimple_assign (gimple_stmt_iterator *** 852,869 STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. 
*/ - if (commutative_ternary_tree_code (subcode) - && tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build3 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), - gimple_assign_rhs2 (stmt), - gimple_assign_rhs1 (stmt), - gimple_assign_rhs3 (stmt)); } break; --- 841,846 *** fold_stmt_1 (gimple_stmt_iterator *gsi, *** 1576,1583 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! tree new_rhs = fold_gimple_assign (gsi); tree lhs = gimple_assign_lhs (stmt); if (new_rhs && !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs))) --- 1553,1574 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! enum tree_code subcode = gimple_assign_rhs_code (stmt); tree lhs = gimple_assign_lhs (stmt); + tree new_rhs; + /* First canonicalize operand order. This avoids building new + trees if this is the only thing fold would later do. */ + if ((commutative_tree_code (subcode) +|| commutative_ternary_tree_code (subcode)) + && tree_swap_operands_p (gimple_assign_rhs1 (stmt), +gimple_assign_rhs2 (stmt), false)) + { + tree tem = gimple_assign_rhs1 (stmt); + gimple_assign_set_rhs1 (stmt, gimple_assign_rhs2 (stmt)); + gimple_assign_set_rhs2 (stmt, tem); + changed = true; + } + new_rhs = fold_gimple_assign (gsi); if (new_rhs && !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 09:58:08AM -0700, Richard Henderson wrote: > Why the force_operand? You've got register inputs. Either the target > is going to support the operation or it isn't. I agree that it doesn't seem to be necessary. I've used force_operand since ivopts (add_cost) is doing it without seeing a clear reason for it. So I've removed it now. > Saving cost data dependent on speed, which is non-constant. > You probably need to make this a two dimensional array. Fixed. Here is an updated version. Bye, -Andreas- 2011-07-14 Andreas Krebbel * tree-ssa-math-opts.c (compute_costs): New function. (convert_mult_to_fma): Take costs into account when propagating multiplications into several additions. * config/s390/s390.c (z196_costs): Adjust costs for madbr and maebr. Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c.orig --- gcc/tree-ssa-math-opts.c *** convert_plusminus_to_widen (gimple_stmt_ *** 2185,2190 --- 2185,2236 return true; } + /* Computing the costs for calculating RTX with CODE in MODE. */ + + static unsigned + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed) + { + rtx insn; + unsigned cost; + + switch (GET_RTX_LENGTH (code)) + { + case 2: + insn = gen_rtx_fmt_ee (code, mode, +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)); + break; + case 3: + insn = gen_rtx_fmt_eee (code, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3)); + break; + default: + gcc_unreachable (); + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Calculating costs of %s in %s mode. RTX is:\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode)); + print_rtl (dump_file, insn); + } + + cost = rtx_cost (insn, SET, speed); + + /* If the backend returns a cost of zero it is most certainly lying. + Set this to one in order to notice that we already calculated it + once. 
*/ + cost = cost ? cost : 1; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\n%s in %s costs %d\n\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode), cost); + + return cost; + } + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ *** convert_mult_to_fma (gimple mul_stmt, tr *** 2197,2202 --- 2243,2254 gimple use_stmt, neguse_stmt, fma_stmt; use_operand_p use_p; imm_use_iterator imm_iter; + enum machine_mode mode; + int uses = 0; + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[2][NUM_MACHINE_MODES]; + static unsigned add_cost[2][NUM_MACHINE_MODES]; + static unsigned fma_cost[2][NUM_MACHINE_MODES]; if (FLOAT_TYPE_P (type) && flag_fp_contract_mode == FP_CONTRACT_OFF) *** convert_mult_to_fma (gimple mul_stmt, tr *** 2213, if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. ! This means we assume that an FMA operation has the same cost ! as an addition. */ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; --- 2265,2281 if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; + mode = TYPE_MODE (type); + + if (!fma_cost[speed][mode]) + { + fma_cost[speed][mode] = compute_costs (mode, FMA, speed); + add_cost[speed][mode] = compute_costs (mode, PLUS, speed); + mul_cost[speed][mode] = compute_costs (mode, MULT, speed); + } + /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. 
*/ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; *** convert_mult_to_fma (gimple mul_stmt, tr *** 2292,2297 --- 2351,2357 if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; + uses++; /* While it is possible to validate whether or not the exact form that we've recognized is available in the backend, the assumption is that the transformation is never a loss. For instance, suppose *** convert_
[PATCH] Fold to more canonical TRUTH_NOT_EXPRs
The following generates boolean-typed TRUTH_NOT_EXPRs instead of operating on converted boolean operands.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-07-14  Richard Guenther

	* fold-const.c (fold_binary_loc): Convert the !bool_var result,
	not bool_var when folding bool_var != 1 or bool_var == 0.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 176266)
+++ gcc/fold-const.c	(working copy)
@@ -12156,14 +12156,16 @@ fold_binary_loc (location_t loc,
       /* bool_var != 1 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
	  && code == NE_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* bool_var == 0 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_zerop (arg1)
	  && code == EQ_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* !exp != 0 becomes !exp */
       if (TREE_CODE (arg0) == TRUTH_NOT_EXPR && integer_zerop (arg1)
Re: Correct fix for scheduler bug PR11320
> Any particular bits you still see that don't get reverted with this patch? The ebb_compute_jump_reg_dependencies changes. The original patch has: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. Mark registers live on entry of non-fallthrough blocks as used by the jump. but you're reverting only: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. -- Eric Botcazou
Re: Correct fix for scheduler bug PR11320
On 07/14/11 14:18, Eric Botcazou wrote: >> Any particular bits you still see that don't get reverted with this patch? > > The ebb_compute_jump_reg_dependencies changes. The original patch has: > > * sched-ebb.c (compute_jump_reg_dependencies): New prototype. > Mark registers live on entry of the fallthrough block and conditionally > set as set by the jump. Mark registers live on entry of non-fallthrough > blocks as used by the jump. > > but you're reverting only: > > * sched-ebb.c (compute_jump_reg_dependencies): New prototype. > Mark registers live on entry of the fallthrough block and conditionally > set as set by the jump. > ??? Original code: basic_block b = BLOCK_FOR_INSN (insn); edge e; for (e = b->succ; e; e = e->succ_next) ! if ((e->flags & EDGE_FALLTHRU) == 0) ! { ! bitmap_operation (set, set, e->dest->global_live_at_start, ! BITMAP_IOR); ! } } Code after the revert: FOR_EACH_EDGE (e, ei, b->succs) +if ((e->flags & EDGE_FALLTHRU) == 0) bitmap_ior_into (used, df_get_live_in (e->dest)); As far as I can tell these are identical, modulo the change in variable name ("set" -> "used" which seems like a better name). Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:57, Eric Botcazou wrote: >> The real problem here is that the ia64 backend lies to the rest of the >> compiler; it produces a load instruction without showing a MEM anywhere >> in the instruction pattern. Hence, the following patch, which reverts >> the bogus scheduler changes and adds a MEM to a pattern in ia64.md. > > This is probably the root cause of the problem, indeed. But you don't revert > everything so, if this isn't an oversight, then the ChangeLog is incorrect. > And there is another change in sched-deps.c not mentioned in the ChangeLog. Well, the actual code has completely changed in the meantime. All the hunks of the original patch failed :) I can write a new ChangeLog entry if that seems important. Any particular bits you still see that don't get reverted with this patch? Bernd
Re: Correct fix for scheduler bug PR11320
> The real problem here is that the ia64 backend lies to the rest of the > compiler; it produces a load instruction without showing a MEM anywhere > in the instruction pattern. Hence, the following patch, which reverts > the bogus scheduler changes and adds a MEM to a pattern in ia64.md. This is probably the root cause of the problem, indeed. But you don't revert everything so, if this isn't an oversight, then the ChangeLog is incorrect. And there is another change in sched-deps.c not mentioned in the ChangeLog. -- Eric Botcazou
Re: Ping: The TI C6X port
On 06/14/11 22:52, Vladimir Makarov wrote: > On 06/06/2011 07:26 AM, Bernd Schmidt wrote: >> Ping^3 for the C6X port. Now with extra patches: >> >> Additional preliminary scheduler tweaks: >> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02408.html >> > It is ok for me. Thanks, Bernd. I've committed this with one change: I've dropped the hunk that resolves debug_insn dependencies rather than deleting them. We should do that, but when I tested these changes together with the backtracking scheduler, I saw some problems, so I'll postpone this. Bernd
Re: RFA: Avoid unnecessary clearing in union initialisers
"H.J. Lu" writes: > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736 Sorry for the breakage. It was due to a humiliating stupid mistake in the hunk to update all_zeros_p: @@ -5129,13 +5152,12 @@ mostly_zeros_p (const_tree exp) all_zeros_p (const_tree exp) { if (TREE_CODE (exp) == CONSTRUCTOR) - { - HOST_WIDE_INT nz_elts, count; - bool must_clear; + HOST_WIDE_INT nz_elts, init_elts; + bool complete_p; - categorize_ctor_elements (exp, &nz_elts, &count, &must_clear); - return nz_elts == 0; + categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p); + return nz_elts == init_elts; } return initializer_zerop (exp); which was due to trying a different approach and not properly reverting it. I've applied the following as obvious. Tested on x86_64-linux-gnu, and on 255.vortex. Richard gcc/ PR middle-end/49736 * expr.c (all_zeros_p): Undo bogus part of last change. Index: gcc/expr.c === --- gcc/expr.c 2011-07-14 11:46:42.0 +0100 +++ gcc/expr.c 2011-07-14 11:47:40.0 +0100 @@ -5157,7 +5157,7 @@ all_zeros_p (const_tree exp) bool complete_p; categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p); - return nz_elts == init_elts; + return nz_elts == 0; } return initializer_zerop (exp);
Re: [patch] fix typo in doc/extend.texi, committed as obvious
On 07/14/2011 01:44 PM, Richard Guenther wrote: > On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose wrote: >> fix a typo in doc/extend.texi, committed as obvious. > > Specify the architecture to generate code for when compiling the > -function. If you select the @code{"target("cpu=power7)"} attribute when > +function. If you select the @code{"target("cpu=power7")} attribute when > > still mismatched number of '"'s. I suggest to drop the first one. fixed as well.
Re: [patch] fix typo in doc/extend.texi, committed as obvious
On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose wrote: > fix a typo in doc/extend.texi, committed as obvious. Specify the architecture to generate code for when compiling the -function. If you select the @code{"target("cpu=power7)"} attribute when +function. If you select the @code{"target("cpu=power7")} attribute when still mismatched number of '"'s. I suggest to drop the first one. Richard. > Matthias > > >
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 2:15 PM, Kai Tietz wrote: > 2011/7/13 Richard Guenther : >> On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz wrote: >>> 2011/7/13 Richard Guenther : On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz wrote: > Sorrty, the TRUTH_NOT_EXPR isn't here the point at all. The underlying > issue is that fold-const re-inttroduces TRUTH_AND/OR and co. I'm very sure it doesn't re-constrct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. >>> >>> Well, look into fold-const's fold_binary_loc function and see >>> >>> /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. >>> >>> First check for cases where an arithmetic operation is applied to a >>> compound, conditional, or comparison operation. Push the arithmetic >>> operation inside the compound or conditional to see if any folding >>> can then be done. Convert comparison to conditional for this purpose. >>> The also optimizes non-constant cases that used to be done in >>> expand_expr. >>> >>> Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR, >>> one of the operands is a comparison and the other is a comparison, a >>> BIT_AND_EXPR with the constant 1, or a truth value. In that case, the >>> code below would make the expression more complex. Change it to a >>> TRUTH_{AND,OR}_EXPR. Likewise, convert a similar NE_EXPR to >>> TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR. */ >>> >>> if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR >>> || code == EQ_EXPR || code == NE_EXPR) >>> && ((truth_value_p (TREE_CODE (arg0)) >>> && (truth_value_p (TREE_CODE (arg1)) >>> || (TREE_CODE (arg1) == BIT_AND_EXPR >>> && integer_onep (TREE_OPERAND (arg1, 1) >>> || (truth_value_p (TREE_CODE (arg1)) >>> && (truth_value_p (TREE_CODE (arg0)) >>> || (TREE_CODE (arg0) == BIT_AND_EXPR >>> && integer_onep (TREE_OPERAND (arg0, 1))) >>> { >>> tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR >>> : code == BIT_IOR_EXPR ? 
TRUTH_OR_EXPR >>> : TRUTH_XOR_EXPR, >>> boolean_type_node, >>> fold_convert_loc (loc, boolean_type_node, arg0), >>> fold_convert_loc (loc, boolean_type_node, arg1)); >>> >>> if (code == EQ_EXPR) >>> tem = invert_truthvalue_loc (loc, tem); >>> >>> return fold_convert_loc (loc, type, tem); >>> } >>> >>> Here unconditionally TRUTH_AND/TRUTH_OR gets introduced, if operands >>> are of kind truth. This is btw the point, why you see that those >>> cases are handled. But as soon as this part is turned off for BIT_- >>> IOR/AND, we need to do the folding for 1-bit precision case explicit. >> >> First of all this checks for a quite complex pattern - where do we pass >> such complex pattern from the gimple level to fold? For the EQ/NE_EXPR >> case forwprop probably might be able to feed it that, but then how does >> it go wrong? The above could also simply be guarded by !in_gimple_form. >> >> Richard. > > See reassoc pass as example I don't see where. > and this hacky maybe_fold_and_comparisons > / maybe_fold_or_comparisons functions. Yes, that case is known - but unless the folding result is sane we drop it anyway. > As indeed we want still be > able to do comparison foldings without getting back an TRUTH-op. Well, if we get back one we have canonicalize_cond_expr_cond to make it sane again. > Additionally we have a lot of passes - like vectorizer - which are > happily try to build new condition on tree-level. This is another That's true and you promised to clean up remaining TRUTH_* consumers / producers when doing the canonicalization to BIT_*. At the moment they get back to BIT_* via force_gimple_operand* which is probably even fine. > place I saw issues and tree-cfg failures. And last but not least those > truth-ops might be reintroduced in gimple_fold, as soon as we see > bitwise-ops on one-bit precision integral type as truth_value. Well, I already said that one-bit-precision as truth_value is wrong. 
Relying on fold-const.c here with its very strict type rules always was a stretch. If it turns out there are now cases that we want to do differently then we finally have to bite the bullet and do it. There's always two options - duplicate it properly in fold_stmt (or forwprop in case it needs to lookup defs), or remove it from fold and make sure we fold things using fold_stmt at gimplifcation time. Richard. > Kai >
[patch] fix typo in doc/extend.texi, committed as obvious
fix a typo in doc/extend.texi, committed as obvious. Matthias 2011-07-14 Matthias Klose * doc/extend.texi (optimize attribute): Fix typo. Index: doc/extend.texi === --- doc/extend.texi (revision 176263) +++ doc/extend.texi (working copy) @@ -3594,7 +3594,7 @@ @item cpu=@var{CPU} @cindex @code{target("cpu=@var{CPU}")} attribute Specify the architecture to generate code for when compiling the -function. If you select the @code{"target("cpu=power7)"} attribute when +function. If you select the @code{"target("cpu=power7")} attribute when generating 32-bit code, VSX and Altivec instructions are not generated unless you use the @option{-mabi=altivec} option on the command line.
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:20, Andrey Belevantsev wrote: > Hello Bernd, > > FWIW, we have discussed your change with Alexander and we think you are > right about the scheduler changes. One question is: > > On 14.07.2011 14:03, Bernd Schmidt wrote: >> --- gcc/sched-deps.c(revision 176195) >> +++ gcc/sched-deps.c(working copy) >> @@ -568,7 +568,7 @@ >>(rev1==rev2 >>? reversed_comparison_code (cond2, NULL) >>: GET_CODE (cond2)) >> - && XEXP (cond1, 0) == XEXP (cond2, 0) >> + && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0)) >>&& XEXP (cond1, 1) == XEXP (cond2, 1)) >> return 1; >>return 0; > > this hunk from conditions_mutex_p seems to be unrelated? Oh yes, sorry about that. That was approved a while ago and I haven't gotten around to checking it in. Bernd
Re: Correct fix for scheduler bug PR11320
Hello Bernd, FWIW, we have discussed your change with Alexander and we think you are right about the scheduler changes. One question is: On 14.07.2011 14:03, Bernd Schmidt wrote: --- gcc/sched-deps.c(revision 176195) +++ gcc/sched-deps.c(working copy) @@ -568,7 +568,7 @@ (rev1==rev2 ? reversed_comparison_code (cond2, NULL) : GET_CODE (cond2)) - && XEXP (cond1, 0) == XEXP (cond2, 0) + && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0)) && XEXP (cond1, 1) == XEXP (cond2, 1)) return 1; return 0; this hunk from conditions_mutex_p seems to be unrelated? Andrey
[PATCH] Only insert required conversions during gimplification
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther * gimplify.c (gimplify_expr): Only do required conversions. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 176266) +++ gcc/gimplify.c (working copy) @@ -6787,22 +6787,20 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_NOT_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, +is_gimple_val, fb_rvalue); + recalculate_side_effects (*expr_p); + break; } - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, - is_gimple_val, fb_rvalue); - recalculate_side_effects (*expr_p); - break; - case ADDR_EXPR: ret = gimplify_addr_expr (expr_p, pre_p, post_p); break; @@ -7227,40 +7225,36 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_OR_EXPR: case TRUTH_XOR_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - - /* This shouldn't happen, but due fold-const (and here especially - fold_truth_not_expr) happily uses operand type and doesn't - automatically uses boolean_type as result, we need to keep - orignal type. */ - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } - } - /* With two-valued operand types binary truth expressions are -semantically equivalent to bitwise binary expressions. Canonicalize -them to the bitwise variant. 
*/ switch (TREE_CODE (*expr_p)) - { - case TRUTH_AND_EXPR: - TREE_SET_CODE (*expr_p, BIT_AND_EXPR); - break; - case TRUTH_OR_EXPR: - TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); - break; - case TRUTH_XOR_EXPR: - TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); - break; - default: - break; + /* Boolified binary truth expressions are semantically equivalent +to bitwise binary expressions. Canonicalize them to the +bitwise variant. */ + switch (TREE_CODE (*expr_p)) + { + case TRUTH_AND_EXPR: + TREE_SET_CODE (*expr_p, BIT_AND_EXPR); + break; + case TRUTH_OR_EXPR: + TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); + break; + case TRUTH_XOR_EXPR: + TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); + break; + default: + break; + } + + /* Continue classified as tcc_binary. */ + goto expr_2; } - /* Classified as tcc_expression. */ - goto expr_2; case FMA_EXPR: /* Classified as tcc_expression. */
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:14 PM, Daniel Carrera wrote: On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as a follow-up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you.

Well, for "exit(3)" one knows that one will get a surprise on Windows: an abort. EXIT_FAILURE [but also exit(1)] has a very high chance of working (nearly) everywhere. In the end, as long as the operating system does not react in a completely odd way, the exit status does not matter - it's the problem of the caller (e.g. some script) to make use of the result. If EXIT_FAILURE is 0 and a script assumes that it is a success, it's either a fault of the script writer, or of the one putting 0 for EXIT_FAILURE into the header, or a fault of the OS designer. Thus, I think the chance of avoiding surprises is higher with exit(EXIT_FAILURE), and one avoids having some random number (for the exit status) in the code. Users usually do not check the exact exit status value but only its 0- vs non-0-ness. Hence, I think exit(EXIT_FAILURE) is better than the current version.

Tobias
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther : > > But then how come the option to override it is useful? It isn't dependent > on the particular case. At least the option should be a --param. > > Richard. > The option is still useful if you want to try the feature on a platform with no hook implemented, and for other performance experiments. I agree that a --param would be better here. I'll fix it. Ilya
Re: [build] Move crtfastmath to toplevel libgcc
Andreas Schwab writes:

> Same on ia64:
>
> Configuration mismatch!
> Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
> Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o
> crtfastmath.o

I believe you need the following patch. As one of my next tasks, I'll work to completely move crtstuff/extra_parts/EXTRA_PARTS/EXTRA_MULTILIB_PARTS over to libgcc so there's no duplication of configuration between gcc and libgcc. If this tests ok, I suppose it can go in as obvious.

	Rainer

2011-07-14  Rainer Orth

	Revert:
	* config.gcc (ia64*-*-elf*): Remove crtfastmath.o from extra_parts.
	(ia64*-*-freebsd*): Likewise.
	(ia64*-*-linux*): Likewise.
	(mips64*-*-linux*): Likewise.
	(mips*-*-linux*): Likewise.

Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 176266)
+++ gcc/config.gcc	(working copy)
@@ -1594,13 +1594,13 @@
 	then
 		target_cpu_default="${target_cpu_default}|MASK_GNU_LD"
 	fi
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} ia64/sysv4.h ia64/freebsd.h"
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
 	tmake_file="${tmake_file} ia64/t-ia64"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-linux*)
 	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ia64/sysv4.h ia64/linux.h"
@@ -1609,7 +1609,7 @@
 		tmake_file="${tmake_file} t-libunwind-elf ia64/t-glibc-libunwind"
 	fi
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-hpux*)
 	tm_file="${tm_file} dbxelf.h elfos.h ia64/sysv4.h ia64/hpux.h"
@@ -1849,6 +1849,7 @@
 			tm_defines="${tm_defines} MIPS_ISA_DEFAULT=65"
 			;;
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
@@ -1868,6 +1869,7 @@
 		mipsisa32*)
 			tm_defines="${tm_defines} MIPS_ISA_DEFAULT=32"
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	test x$with_llsc != x || with_llsc=yes
 	;;
 mips*-*-openbsd*)

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> One minor note, you will need to update doc/invoke.texi to document the new
> switch before checkin: -ftree-reassoc-width=
>
> --
> Michael Meissner, IBM

Thanks for the note! Here is the fixed patch.

Ilya

--
gcc/

2011-07-14  Enkovich Ilya

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (ftree-reassoc-width): Document new option.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of the new hook for
	the i386 target.
	* common.opt (ftree-reassoc-width): New option.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
	if needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Add option
	-ftree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

PR44382.diff
Description: Binary data
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as follow up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich wrote: > 2011/7/14 Richard Guenther : >> >> Btw, rather than a new target hook and an option I suggest to use >> a --param which default you can modify in the backend. >> >> Richard. >> > > Introduced target hook does not just define default value for option. > It is meant to make decision in each particular case. For now it > returns a constant value but march dependent heuristics will be added > later. But then how come the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. > Ilya >
Re: [Patch, Fortran] Allocate + CAF library
*ping* ? On 07/11/2011 08:16 PM, Daniel Carrera wrote: Hello, This is my largest patch so far and the first that I'll commit myself. This patch improves support for the ALLOCATE statement when using the coarray library. Specifically, it adds support for the stat= and errmsg= attributes: ALLOCATE( x(n)[*] , stat=i , errmsg=str ) These attributes are now written by the CAF library. This patch also involves a good amount of code cleanup. ChangeLog is attached. As soon as I get the go-ahead, I'll commit this patch. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 11:48 AM, Daniel Carrera wrote: This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit? I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as a follow-up patch. Thus, OK for the patch as-is - or with all exit(...) calls replaced by "exit(EXIT_FAILURE)", which uses stdlib.h's EXIT_FAILURE. One can then also drop the "int error" argument to the caf_runtime_error function. Thanks for the patch! Tobias

ChangeLog:
2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function based on the function
	in mpi.c with the same name.
	(_gfortran_caf_init): Use caf_runtime_error.
	* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
	with the function in single.c.
Correct fix for scheduler bug PR11320
PR11320 was a scheduler problem where an instruction was moved backwards across a branch, leading to

	addl r14 = @ltoffx(.LC2), r1
	;;
(p7)	addl r14 = 1, r0		<--- r14 clobbered
	;;
(p7)	st4 [r15] = r14
	ld8.mov r14 = [r14], .LC2	<--- crash (branch was here)

At the time, this was solved by a patch that recognizes when a register is conditionally set in a basic block, and then treats the branch as setting such a register: http://gcc.gnu.org/ml/gcc-patches/2003-07/msg01044.html "The proposed fix is to say that JUMP_INSNs set the registers that are live on entry of the fallthrough block and are conditionally set before the jump, because they can be considered as restoring the former value of the conditionally set registers." While it is true that a jump could be seen as setting the correct value for a register, this isn't at all related to conditional sets. Consider a trivial example:

	a = b + 3;
	if (b > 0)
	  { use a }
	else
	  { use a }

where the value of a is different in each branch of the else, with no conditional execution in sight. From the point of view of the uses, it can be argued that the jump sets the value, but this is irrelevant for scheduling purposes. There's nothing wrong with using a wrong value in an instruction hoisted across a branch if the result of that instruction will (eventually) be dead. We do this all the time in sched-ebb. The patch needlessly restricts scheduling opportunities and should be reverted. The real problem here is that the ia64 backend lies to the rest of the compiler; it produces a load instruction without showing a MEM anywhere in the instruction pattern. Hence, the following patch, which reverts the bogus scheduler changes and adds a MEM to a pattern in ia64.md. The representation is somewhat weird, but no more so than before, where the load was represented as a plain LO_SUM. The documentation I have doesn't mention ld8.mov, so I'll have to leave it to the ia64 maintainers to choose between this change and implementing something more involved.
Bootstrapped and tested without regressions on ia64-linux with languages=c,c++,obj,fortran and --disable-shared; that was what I could get to work at all at the moment. Even then, C++ seems pretty broken. I've also built a 3.3 cross-cc1 to see if the original bug is still fixed (it is). Ok (scheduler & ia64 bits)?

Bernd

	Revert
	2003-07-10  Eric Botcazou
	PR optimization/11320
	* sched-int.h (struct deps) [reg_conditional_sets]: New field.
	(struct sched_info) [compute_jump_reg_dependencies]: New prototype.
	* sched-deps.c (sched_analyze_insn) [JUMP_INSN]: Update call to
	current_sched_info->compute_jump_reg_dependencies.  Record which
	registers are used and which registers are set by the jump.  Clear
	deps->reg_conditional_sets after a barrier.  Set
	deps->reg_conditional_sets if the insn is a COND_EXEC.  Clear
	deps->reg_conditional_sets if the insn is not a COND_EXEC.
	(init_deps): Initialize reg_conditional_sets.
	(free_deps): Clear reg_conditional_sets.
	* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
	Mark registers live on entry of the fallthrough block and
	conditionally set as set by the jump.  Mark registers live on entry
	of non-fallthrough blocks as used by the jump.
	* sched-rgn.c (compute_jump_reg_dependencies): New prototype.
	Mark new parameters as unused.

	* config/ia64/ia64.md (load_symptr_low): Show a MEM.
	* config/ia64/ia64.c (ia64_expand_load_address): Generate it.

Index: gcc/sched-ebb.c
===
--- gcc/sched-ebb.c	(revision 176195)
+++ gcc/sched-ebb.c	(working copy)
@@ -238,28 +238,18 @@
       return 1;
 }

-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that
+   must be considered as used by this jump in USED.  */

 void
-ebb_compute_jump_reg_dependencies (rtx insn, regset cond_set, regset used,
-				   regset set)
+ebb_compute_jump_reg_dependencies (rtx insn, regset used)
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   edge_iterator ei;

   FOR_EACH_EDGE (e, ei, b->succs)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized.  Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath.  So we
-	 consider that it restores the value of the main codepath.  */
-      bitmap_and (set, df_get_live_in (e->dest), cond_set);
-
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther : > > Btw, rather than a new target hook and an option I suggest to use > a --param which default you can modify in the backend. > > Richard. > The introduced target hook does not just define a default value for the option. It is meant to make a decision in each particular case. For now it returns a constant value, but march-dependent heuristics will be added later. Ilya
Re: [build] Remove crt0, mcrt0 support
Richard, >> what's your take on this? > > I'm fine with it if you install a deprecation patch on the 4.6 branch > and mention that in the 4.6 changes.html. ok, will do. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [build] Move crtfastmath to toplevel libgcc
"H.J. Lu" writes: > On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth > wrote: >> Richard Henderson writes: >> >>> On 07/13/2011 09:57 AM, Rainer Orth wrote: Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? >>> >>> Probably. >> >> Ok. I'll take this on me to get us out of this mess. It has >> survived i386-pc-solaris2.11, sparc-sun-solaris2.11, >> x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the >> risk seems acceptable. >> +# -frandom-seed is necessary to keep the mangled name of the constructor on +# Tru64 Unix stable, but harmless otherwise. >>> >>> Instead of implying permanent stability, I'd mention bootstrap comparison >>> failures specifically. >> >> Ok, will do. > > I think your patch caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739 Same on ia64: Configuration mismatch! Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o Andreas. -- Andreas Schwab, sch...@redhat.com GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E "And now for something completely different."
[Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit?

ChangeLog:
2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function based on the function
	in mpi.c with the same name.
	(_gfortran_caf_init): Use caf_runtime_error.
	* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
	with the function in single.c.

--
I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>	/* For fputs and fprintf.  */
 #include <stdlib.h>	/* For exit and malloc.  */
 #include <string.h>	/* For memcpy and memset.  */
+#include <stdarg.h>	/* For variadic arguments.  */

 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1 */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI

 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (int error, const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error.");
+  va_start (ap, message);
+  fprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (error);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
		     char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,

   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
	{
	  *stat = 1;
	  if (errmsg_len > 0)
	    {
-	      const char msg[] = "Failed to allocate coarray";
	      int len = ((int) sizeof (msg) > errmsg_len)
			? errmsg_len : (int) sizeof (msg);
	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
	  return NULL;
	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (1, msg);
     }

   if (stat)

Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,6 +47,7 @@ static int caf_is_finalized;

 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with single.c.  */
 static void
 caf_runtime_error (int error, const char *message, ...)
 {
@@ -62,7 +63,7 @@ caf_runtime_error (int error, const char
   MPI_Abort (MPI_COMM_WORLD, error);

   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (error);
 }
Re: [build] Remove crt0, mcrt0 support
On Thu, 14 Jul 2011, Rainer Orth wrote: > Richard, > > [Talking about an immediate removal of the netware target...] > > > "Jan Beulich" writes: > > > > Rainer Orth 07/13/11 4:34 PM >>> > >>>which variant would you prefer: obsoletion now and removal in 4.8 or > >>>immediate removal? > >> > >> Both are fine with me, so unless someone else objects immediate removal > >> would seem better given it had been pretty much unmaintained. > > > > Right: it would be a one-time effort to remove the support, but > > subsequent cleanups wouldn't have to deal with the effectively dead > > code. > > > > I had a quick look and it doesn't seem hard: apart from removing the > > netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc > > and libgcc/config.host changes), there's only a small list (apart from > > netware-related target triplets in the testsuite): > > > > config/elf.m4 > > configure.ac > > contrib/config-list.mk > > gcc/config/i386/i386.c > > gcc/config/i386/i386.h > > gcc/doc/extend.texi > > libstdc++-v3/crossconfig.m4 > > > > configure.ac may have to stay if binutils/src wants to retain the > > report, but that's about it. > > > > Let's see what the release managers/global reviewers think. > > what's your take on this? I'm fine with it if you install a deprecation patch on the 4.6 branch and mention that in the 4.6 changes.html. Richard. -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Wed, Jul 13, 2011 at 10:30:58AM -0400, Jason Merrill wrote: > On 07/13/2011 10:06 AM, Jakub Jelinek wrote: > >>>--- gcc/testsuite/lib/dg-pch.exp.jj2011-01-03 18:58:03.0 +0100 > >>>+++ gcc/testsuite/lib/dg-pch.exp 2011-07-12 23:13:50.943670171 +0200 > >>>- dg-test -keep-output "./$bname$suffix" "$otherflags $flags" "" > >>>+ dg-test -keep-output "./$bname$suffix" "-gno-record-gcc-switches >>>$otherflags $flags" "" > >> > >It is only necessary if somebody wants to make -grecord-gcc-switches > >the default (for bootstrap/regtest I've tweaked common.opt to do that > >to test it better). PCH is a big mess and screws debuginfo in many ways, > >in this case it was just small differences in DW_AT_producer, but > >we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc. > > Why would PCH change DW_AT_producer? Because we're restoring > single_comp_unit_die from the PCH? Then perhaps we should set > DW_AT_producer in output_comp_unit rather than gen_compile_unit_die. output_comp_unit is too late, because at that point the form has been finalized and sizes calculated, etc. But at the start of dwarf2out_finish it is possible. Here is a PCH-friendly variant of the patch, which tries to set the right producer from the start, but double-checks early in dwarf2out_finish whether PCH has changed it and, if it did, updates it back to the expected string.

2011-07-14  Jakub Jelinek

	PR other/32998
	* common.opt (grecord-gcc-switches, gno-record-gcc-switches): New
	options.
	* dwarf2out.c: Include opts.h.
	(dchar_p): New typedef.  Define heap VEC for it.
	(producer_string): New variable.
	(gen_producer_string): New function.
	(gen_compile_unit_die): Use it.
	(dwarf2out_finish): Fix up comp_unit_die () DW_AT_producer if needed.
	* Makefile.in (dwarf2out.o): Depend on $(OPTS_H).
--- gcc/common.opt.jj	2011-07-13 17:31:09.0 +0200
+++ gcc/common.opt	2011-07-14 10:36:40.0 +0200
@@ -2184,6 +2184,14 @@
 ggdb
 Common JoinedOrMissing
 Generate debug information in default extended format

+gno-record-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0)
+Don't record gcc command line switches in DWARF DW_AT_producer.
+
+grecord-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,1)
+Record gcc command line switches in DWARF DW_AT_producer.
+
 gstabs
 Common JoinedOrMissing Negative(gstabs+)
 Generate debug information in STABS format
--- gcc/dwarf2out.c.jj	2011-07-13 17:31:18.0 +0200
+++ gcc/dwarf2out.c	2011-07-14 11:04:50.0 +0200
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "tree-flow.h"
 #include "cfglayout.h"
+#include "opts.h"

 static void dwarf2out_source_line (unsigned int, const char *, int, bool);
 static rtx last_var_location_insn;
@@ -18108,13 +18109,125 @@ gen_ptr_to_mbr_type_die (tree type, dw_d
   add_type_attribute (ptr_die, TREE_TYPE (type), 0, 0, context_die);
 }

+typedef const char *dchar_p; /* For DEF_VEC_P.  */
+DEF_VEC_P(dchar_p);
+DEF_VEC_ALLOC_P(dchar_p,heap);
+
+static char *producer_string;
+
+/* Return a heap allocated producer string including command line options
+   if -grecord-gcc-switches.  */
+
+static char *
+gen_producer_string (void)
+{
+  size_t j;
+  VEC(dchar_p, heap) *switches = NULL;
+  const char *language_string = lang_hooks.name;
+  char *producer, *tail;
+  const char *p;
+  size_t len = dwarf_record_gcc_switches ? 0 : 3;
+  size_t plen = strlen (language_string) + 1 + strlen (version_string);
+
+  for (j = 1; dwarf_record_gcc_switches && j < save_decoded_options_count; j++)
+    switch (save_decoded_options[j].opt_index)
+      {
+      case OPT_o:
+      case OPT_d:
+      case OPT_dumpbase:
+      case OPT_dumpdir:
+      case OPT_auxbase:
+      case OPT_auxbase_strip:
+      case OPT_quiet:
+      case OPT_version:
+      case OPT_v:
+      case OPT_w:
+      case OPT_L:
+      case OPT_D:
+      case OPT_I:
+      case OPT_U:
+      case OPT_SPECIAL_unknown:
+      case OPT_SPECIAL_ignore:
+      case OPT_SPECIAL_program_name:
+      case OPT_SPECIAL_input_file:
+      case OPT_grecord_gcc_switches:
+      case OPT_gno_record_gcc_switches:
+      case OPT__output_pch_:
+      case OPT_fdiagnostics_show_location_:
+      case OPT_fdiagnostics_show_option:
+      case OPT:
+      case OPT__sysroot_:
+	/* Ignore these.  */
+	continue;
+      default:
+	if (save_decoded_options[j].orig_option_with_args_text[0] != '-')
+	  continue;
+	switch (save_decoded_options[j].orig_option_with_args_text[1])
+	  {
+	  case 'M':
+	  case 'i':
+	  case 'W':
+	    continue;
+	  case 'n':
+	    if (save_decoded_options[j].orig_option_with_args_text[2] == 'o')
+	      continue;
+	    break;
+	  case 'f':
+	    if (strncmp (save_decoded_options[j].orig_option_with_args_te
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 11:49 PM, Steven Bosscher wrote: > On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther > wrote: >> On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel >> wrote: >>> Hi, >>> >>> the widening_mul pass might increase the number of multiplications in >>> the code by transforming >>> >>> a = b * c >>> d = a + 2 >>> e = a + 3 >>> >>> into: >>> >>> d = b * c + 2 >>> e = b * c + 3 >>> >>> under the assumption that an FMA instruction is not more expensive >>> than a simple add. This certainly isn't always true. While e.g. on >>> s390 an fma is indeed not slower than an add execution-wise, it has >>> disadvantages regarding instruction grouping. It doesn't group with >>> any other instruction, which has a major impact on the instruction >>> dispatch bandwidth. >>> >>> The following patch tries to figure out the costs for adds, mults and >>> fmas by building an RTX and asking the backend's cost function in order >>> to estimate whether it is worthwhile doing the transformation. >>> >>> With that patch the 436.cactus hotloop contains 28 less >>> multiplications than before, increasing performance slightly (~2%). >>> >>> Bootstrapped and regtested on x86_64 and s390x. >> >> Ick ;) > > +1 > >> Maybe this is finally the time to introduce target hook(s) to >> get us back costs for trees? For this case we'd need two >> actually, or just one - dependent on what finegrained information >> we pass. Choices: >> >> tree_code_cost (enum tree_code) >> tree_code_cost (enum tree_code, enum machine_mode mode) >> unary_cost (enum tree_code, tree actual_arg0) // args will be mostly >> SSA names or constants, but at least they are typed - works for >> mixed-typed operations >> binary_cost (...) >> ... >> unary_cost (enum tree_code, enum tree_code arg0_kind) // constant >> vs. non-constant arg, but lacks type/mode > > Or maybe add a cost function for all named insns (i.e. > http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)?
> I think that any form of lower GIMPLE will not be so low level that > more combinations will exist than the available named patterns. It > should be possible to write a gen* tool using rtx_costs to compute > some useful cost metric for all named patterns. How complicated that > could be (modes, reg vs. mem, etc.), I don't know... But at least that > way we don't end up with multiple target costs depending on the IR in > use. Yeah, it occurred to me as well that when we look for supportable operations via optabs, the same mechanism should also be able to provide a cost, maybe as simple as attaching one to the named expanders. Generating RTL from GIMPLE passes just to be able to use rtx_cost is, well ... gross. Yes, we do it in IVOPTs (and that case is even more complex), but I don't think we want to start doing it elsewhere (look how the vectorizer for example uses new target hooks instead of generating vectorized RTL and then using rtx_cost). Richard. > Ciao! > Steven >
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther wrote: > 2011/7/14 Michael Meissner : >> One minor note, you will need to update doc/invoke.texi to document the new >> switch before checkin: -ftree-reassoc-width= > > You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. > Richard. > >> -- >> Michael Meissner, IBM >> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA >> meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899 >> >
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Michael Meissner : > One minor note, you will need to update doc/invoke.texi to document the new > switch before checkin: -ftree-reassoc-width= You also need approval and somebody to review the patch before checkin. Richard. > -- > Michael Meissner, IBM > 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA > meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899 >