RE: CR16 Port addition
PING 3: For review

Hi,

Please review the attached patch; explanations for the earlier communication are below.

NOTE: From now on, Jayant (jayant.so...@kpitcummins.com) will be posting the patches related to CR16. Please feel free to contact him if you need any information related to the posted patches.

Thanks in advance,
Sumanth G, PUNE (India).

= Begin Message ==

-----Original Message-----
From: Sumanth Gundapaneni
Sent: Monday, May 30, 2011 6:57 PM
To: 'Joseph Myers'
Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jayant R. Sonar
Subject: RE: CR16 Port addition

Hi Joseph and Richard,

Thank you for your time in reviewing the CR16 port; once again I am grateful for your valuable suggestions. Please find attached the patch "cr16-gcc.patch", which contains the modifications suggested by Joseph in his previous mail.

For your kind information, here are the recent posts of the CR16 patches:

Initial post : http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01676.html
Second post  : http://gcc.gnu.org/ml/gcc-patches/2011-01/msg00803.html
Third post   : http://gcc.gnu.org/ml/gcc-patches/2011-04/msg00624.html
Fourth post  : http://gcc.gnu.org/ml/gcc-patches/2011-05/msg01441.html

Please review the patch and let me know if any further modifications are needed. Joseph, please see my responses to your comments below.

> Please remove the remaining top level configure changes from the patch.

The top level configure changes have been removed as advised.

> Now you seem to have a stray change to the generated file
> libstdc++-v3/configure. Again, you can probably just eliminate this

The configure changes in libstdc++-v3 have been removed too; you can find this in the updated patch.

> and make the code in unwind-dw2-* use those macros,
> instead of having separate copies of the files.
A new file, unwind-helper.h, has been created in the libgcc/config/cr16 directory as per your suggestion, and a few macros defined in this new file are called from gcc/unwind-dw2.c. Please review the same and let me know of further modifications, if any. We have identified issues related to exception handling using prints in the unwind code, and will debug this further in the near future alongside further development of the GNU-based tools.

> Since you first submitted the port, a file contrib/config-list.mk

"cr16-elf" has been added to the config-list.mk file.

gcc/ChangeLog:
--
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config.gcc: Add cr16-* support.
	* doc/extend.texi: Document cr16 extensions.
	* doc/install.texi: Document cr16 install.
	* doc/invoke.texi: Document cr16 options.
	* doc/md.texi: Document cr16 constraints.
	* config/cr16/cr16.c: New file.
	* config/cr16/cr16.h: New file.
	* config/cr16/cr16.md: New file.
	* config/cr16/cr16.opt: New file.
	* config/cr16/cr16-libgcc.s: New file.
	* config/cr16/cr16-protos.h: New file.
	* config/cr16/divmodhi3.c: New file.
	* config/cr16/predicates.md: New file.
	* config/cr16/constraints.md: New file.
	* config/cr16/t-cr16: New file.

libgcc/ChangeLog:
-
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config.host: Add National Semiconductor CR16 target (cr16-*-*).
	* config/cr16/crti.S: New file.
	* config/cr16/crtlibid.S: New file.
	* config/cr16/crtn.S: New file.
	* config/cr16/t-cr16: New file.
	* config/cr16/unwind-helper.h: New file.

contrib/ChangeLog:
--
2011-05-28  Sumanth G
	    Jayant R Sonar

	* config-list.mk: Add National Semiconductor CR16 target.

Thanks in advance,
Sumanth G, PUNE (India).

= End Message ==
[committed] Fix ada build on HP-UX 10.20
The patch below fixes the Ada bootstrap on HP-UX 10. Patch preapproved by Eric. Tested on HP-UX 10.20.

Dave
--
J. David Anglin	dave.ang...@nrc-cnrc.gc.ca
National Research Council of Canada	(613) 990-0752 (FAX: 952-6602)

2011-07-14  John David Anglin

	PR ada/46350
	* s-taprop-hpux-dce.adb (Abort_Task): Remove unnecessary cast.

Index: s-taprop-hpux-dce.adb
===
--- s-taprop-hpux-dce.adb	(revision 175188)
+++ s-taprop-hpux-dce.adb	(working copy)
@@ -888,8 +888,7 @@
       if T.Common.State = Interrupt_Server_Blocked_On_Event_Flag then
          System.Interrupt_Management.Operations.Interrupt_Self_Process
-           (System.Interrupt_Management.Interrupt_ID
-              (PIO.Get_Interrupt_ID (T)));
+           (PIO.Get_Interrupt_ID (T));
       end if;
    end Abort_Task;
[PATCH 5/9] dwarf2cfi: Remove dw_cfi_row_ref typedef.
--- gcc/dwarf2cfi.c | 23 +++ 1 files changed, 11 insertions(+), 12 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 745e137..51fb824 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -74,14 +74,13 @@ typedef struct GTY(()) dw_cfi_row_struct HOST_WIDE_INT args_size; } dw_cfi_row; -typedef dw_cfi_row *dw_cfi_row_ref; /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; /* The state of the first row of the FDE table, which includes the state provided by the CIE. */ -static GTY(()) dw_cfi_row_ref cie_cfi_row; +static GTY(()) dw_cfi_row *cie_cfi_row; static GTY(()) unsigned long dwarf2out_cfi_label_num; @@ -209,10 +208,10 @@ new_cfi (void) /* Return a newly allocated CFI row, with no defined data. */ -static dw_cfi_row_ref +static dw_cfi_row * new_cfi_row (void) { - dw_cfi_row_ref row = ggc_alloc_cleared_dw_cfi_row (); + dw_cfi_row *row = ggc_alloc_cleared_dw_cfi_row (); row->cfa.reg = INVALID_REGNUM; @@ -221,10 +220,10 @@ new_cfi_row (void) /* Return a copy of an existing CFI row. */ -static dw_cfi_row_ref -copy_cfi_row (dw_cfi_row_ref src) +static dw_cfi_row * +copy_cfi_row (dw_cfi_row *src) { - dw_cfi_row_ref dst = ggc_alloc_dw_cfi_row (); + dw_cfi_row *dst = ggc_alloc_dw_cfi_row (); *dst = *src; dst->reg_save = VEC_copy (dw_cfi_ref, gc, src->reg_save); @@ -235,7 +234,7 @@ copy_cfi_row (dw_cfi_row_ref src) /* Free an allocated CFI row. */ static void -free_cfi_row (dw_cfi_row_ref row) +free_cfi_row (dw_cfi_row *row) { if (row != NULL) { @@ -311,7 +310,7 @@ add_cfi_restore (unsigned reg) that the register column is no longer saved. 
*/ static void -update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi) +update_row_reg_save (dw_cfi_row *row, unsigned column, dw_cfi_ref cfi) { if (VEC_length (dw_cfi_ref, row->reg_save) <= column) VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1); @@ -466,10 +465,10 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_location *loc, dw_cfa_location *remember) } /* The current, i.e. most recently generated, row of the CFI table. */ -static dw_cfi_row_ref cur_row; +static dw_cfi_row *cur_row; /* The row state from a preceeding DW_CFA_remember_state. */ -static dw_cfi_row_ref remember_row; +static dw_cfi_row *remember_row; /* The register used for saving registers to the stack, and its offset from the CFA. */ @@ -2316,7 +2315,7 @@ dwarf2out_frame_debug (rtx insn, bool after_p) /* Emit CFI info to change the state from OLD_ROW to NEW_ROW. */ static void -change_cfi_row (dw_cfi_row_ref old_row, dw_cfi_row_ref new_row) +change_cfi_row (dw_cfi_row *old_row, dw_cfi_row *new_row) { size_t i, n_old, n_new, n_max; dw_cfi_ref cfi; -- 1.7.6
[PATCH 9/9] dwarf2cfi: Generate and connect traces.
This kinda-sorta corresponds to Bernd's 007-dw2cfi patch. Certainly the same concepts of splitting the instruction stream into extended basic blocks is the same. This patch does a bit better job with the documentation. Also, I'm a bit more explicit about matching things up with the similar code from the regular CFG routines. What's missing at this point is any attempt to use DW_CFA_remember_state. I've deferred that for the moment because it's easy to test the state change code across epilogues, whereas the shrink-wrapping code is not in this tree and section switching is difficult to force. --- gcc/dwarf2cfi.c | 792 --- 1 files changed, 398 insertions(+), 394 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 9aa1208..10e5740 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -27,6 +27,7 @@ along with GCC; see the file COPYING3. If not see #include "flags.h" #include "rtl.h" #include "function.h" +#include "basic-block.h" #include "dwarf2.h" #include "dwarf2out.h" #include "dwarf2asm.h" @@ -86,34 +87,30 @@ DEF_VEC_ALLOC_O (reg_saved_in_data, heap); /* Since we no longer have a proper CFG, we're going to create a facsimile of one on the fly while processing the frame-related insns. - We create dw_trace structures for each instruction trace beginning at - at a label following a barrier (or beginning of the function), and - ending at a barrier (or the end of the function). + We create dw_trace_info structures for each extended basic block beginning + and ending at a "save point". Save points are labels, barriers, certain + notes, and of course the beginning and end of the function. As we encounter control transfer insns, we propagate the "current" - row state across the edges to the starts of traces. If an edge goes - to a label that is not the start of a trace, we ignore it. This - assumes that previous compiler transformations were correct, and that - we will reach the same row state from any source. 
(We can perform some - limited validation of this assumption, but without the full CFG we - cannot be sure of full validation coverage. It is expensive, so we - only do so with checking enabled.) + row state across the edges to the starts of traces. When checking is + enabled, we validate that we propagate the same data from all sources. All traces are members of the TRACE_INFO array, in the order in which they appear in the instruction stream. - All labels are given an LUID that indexes the LABEL_INFO array. If - the label is the start of a trace, the TRACE pointer will be non-NULL - and point into the TRACE_INFO array. */ + All save points are present in the TRACE_INDEX hash, mapping the insn + starting a trace to the dw_trace_info describing the trace. */ typedef struct { - /* The label that begins the trace. This will be NULL for the first - trace beginning at function entry. */ - rtx label; + /* The insn that begins the trace. */ + rtx head; /* The row state at the beginning and end of the trace. */ - dw_cfi_row *enter_row, *exit_row; + dw_cfi_row *beg_row, *end_row; + + /* True if this trace immediately follows NOTE_INSN_SWITCH_TEXT_SECTIONS. */ + bool switch_sections; /* The following variables contain data used in interpreting frame related expressions. These are not part of the "real" row state as defined by @@ -147,24 +144,15 @@ typedef struct DEF_VEC_O (dw_trace_info); DEF_VEC_ALLOC_O (dw_trace_info, heap); -typedef struct -{ - dw_trace_info *trace; - -#ifdef ENABLE_CHECKING - dw_cfi_row *check_row; -#endif -} dw_label_info; +typedef dw_trace_info *dw_trace_info_ref; -DEF_VEC_O (dw_label_info); -DEF_VEC_ALLOC_O (dw_label_info, heap); +DEF_VEC_P (dw_trace_info_ref); +DEF_VEC_ALLOC_P (dw_trace_info_ref, heap); /* The variables making up the pseudo-cfg, as described above. 
*/ -#if 0 -static VEC (int, heap) *uid_luid; -static VEC (dw_label_info, heap) *label_info; static VEC (dw_trace_info, heap) *trace_info; -#endif +static VEC (dw_trace_info_ref, heap) *trace_work_list; +static htab_t trace_index; /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; @@ -189,9 +177,6 @@ static dw_trace_info *cur_trace; /* The current, i.e. most recently generated, row of the CFI table. */ static dw_cfi_row *cur_row; -/* The row state from a preceeding DW_CFA_remember_state. */ -static dw_cfi_row *remember_row; - /* We delay emitting a register save until either (a) we reach the end of the prologue or (b) the register is clobbered. This clusters register saves so that there are fewer pc advances. */ @@ -211,9 +196,6 @@ static VEC(queued_reg_save, heap) *queued_reg_saves; emitting this data, i.e. updating CUR_ROW, without async unwind. */ static HOST_WIDE_INT queued_args_size; -/* True if remember_state should be emitted before following CFI directive. */ -static bool
[PATCH 8/9] dwarf2cfi: Introduce dw_trace_info.
This patch only introduces the structure definition and adjusts the existing routines to use the new cur_trace global to access the variables that were moved into the structure. --- gcc/dwarf2cfi.c | 440 +-- 1 files changed, 266 insertions(+), 174 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 1d6413f..9aa1208 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -74,7 +74,98 @@ typedef struct GTY(()) dw_cfi_row_struct HOST_WIDE_INT args_size; } dw_cfi_row; - +/* The caller's ORIG_REG is saved in SAVED_IN_REG. */ +typedef struct GTY(()) reg_saved_in_data_struct { + rtx orig_reg; + rtx saved_in_reg; +} reg_saved_in_data; + +DEF_VEC_O (reg_saved_in_data); +DEF_VEC_ALLOC_O (reg_saved_in_data, heap); + +/* Since we no longer have a proper CFG, we're going to create a facsimile + of one on the fly while processing the frame-related insns. + + We create dw_trace structures for each instruction trace beginning at + at a label following a barrier (or beginning of the function), and + ending at a barrier (or the end of the function). + + As we encounter control transfer insns, we propagate the "current" + row state across the edges to the starts of traces. If an edge goes + to a label that is not the start of a trace, we ignore it. This + assumes that previous compiler transformations were correct, and that + we will reach the same row state from any source. (We can perform some + limited validation of this assumption, but without the full CFG we + cannot be sure of full validation coverage. It is expensive, so we + only do so with checking enabled.) + + All traces are members of the TRACE_INFO array, in the order in which + they appear in the instruction stream. + + All labels are given an LUID that indexes the LABEL_INFO array. If + the label is the start of a trace, the TRACE pointer will be non-NULL + and point into the TRACE_INFO array. */ + +typedef struct +{ + /* The label that begins the trace. 
This will be NULL for the first + trace beginning at function entry. */ + rtx label; + + /* The row state at the beginning and end of the trace. */ + dw_cfi_row *enter_row, *exit_row; + + /* The following variables contain data used in interpreting frame related + expressions. These are not part of the "real" row state as defined by + Dwarf, but it seems like they need to be propagated into a trace in case + frame related expressions have been sunk. */ + /* ??? This seems fragile. These variables are fragments of a larger + expression. If we do not keep the entire expression together, we risk + not being able to put it together properly. Consider forcing targets + to generate self-contained expressions and dropping all of the magic + interpretation code in this file. Or at least refusing to shrink wrap + any frame related insn that doesn't contain a complete expression. */ + + /* The register used for saving registers to the stack, and its offset + from the CFA. */ + dw_cfa_location cfa_store; + + /* A temporary register holding an integral value used in adjusting SP + or setting up the store_reg. The "offset" field holds the integer + value, not an offset. */ + dw_cfa_location cfa_temp; + + /* A set of registers saved in other registers. This is the inverse of + the row->reg_save info, if the entry is a DW_CFA_register. This is + implemented as a flat array because it normally contains zero or 1 + entry, depending on the target. IA-64 is the big spender here, using + a maximum of 5 entries. */ + VEC(reg_saved_in_data, heap) *regs_saved_in_regs; + +} dw_trace_info; + +DEF_VEC_O (dw_trace_info); +DEF_VEC_ALLOC_O (dw_trace_info, heap); + +typedef struct +{ + dw_trace_info *trace; + +#ifdef ENABLE_CHECKING + dw_cfi_row *check_row; +#endif +} dw_label_info; + +DEF_VEC_O (dw_label_info); +DEF_VEC_ALLOC_O (dw_label_info, heap); + +/* The variables making up the pseudo-cfg, as described above. 
*/ +#if 0 +static VEC (int, heap) *uid_luid; +static VEC (dw_label_info, heap) *label_info; +static VEC (dw_trace_info, heap) *trace_info; +#endif + /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; @@ -82,6 +173,8 @@ cfi_vec cie_cfi_vec; state provided by the CIE. */ static GTY(()) dw_cfi_row *cie_cfi_row; +static GTY(()) reg_saved_in_data *cie_return_save; + static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ @@ -90,6 +183,34 @@ static rtx add_cfi_insn; /* When non-null, add_cfi will add the CFI to this vector. */ static cfi_vec *add_cfi_vec; +/* The current instruction trace. */ +static dw_trace_info *cur_trace; + +/* The current, i.e. most recently generated, row of the CFI table. */ +static dw_cfi_row *cur_row; + +/* The row state from a preceeding DW_CFA_remember_state. */ +static dw_cfi_row *remember_row;
[RFC PATCH 0/9] CFG aware dwarf2 cfi generation
This finally brings us to something that can support shrink-wrapping.

As mentioned in the description of the last patch, this is 95% of what Bernd had in his last 007-dw2cfg patch, except for the remember/restore_state stuff. And hopefully with better comments.

This is the first version of this that has actually made it into stage3 bootstrap on x86_64, so it isn't well tested yet. This just happens to coincide with the end of my work day, and it's been a while since I've shared state, so I thought I'd post for overnight review.

The complete tree is available at

  git://repo.or.cz/gcc/rth.git rth/cfi-pass

r~

Richard Henderson (9):
  dwarf2cfi: Introduce a dw_cfi_row state.
  dwarf2cfi: Rename cfi_insn to add_cfi_insn.
  dwarf2cfi: Populate CUR_ROW->REG_SAVE.
  dwarf2cfi: Implement change_cfi_row.
  dwarf2cfi: Remove dw_cfi_row_ref typedef.
  dwarf2cfi: Convert queued_reg_save to a VEC.
  dwarf2cfi: Allocate reg_saved_in_data in the heap.
  dwarf2cfi: Introduce dw_trace_info.
  dwarf2cfi: Generate and connect traces.

 gcc/dwarf2cfi.c | 1672 ++-
 gcc/dwarf2out.c |  159 --
 gcc/dwarf2out.h |    5 +-
 3 files changed, 1030 insertions(+), 806 deletions(-)

--
1.7.6
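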
[PATCH 4/9] dwarf2cfi: Implement change_cfi_row.
Add a generic function to adjust cfi state from one row to another. Use this to implement text section switching. This will also be usable for arbitrary changes around a cfg for shrink-wrapping. --- gcc/dwarf2cfi.c | 376 +-- gcc/dwarf2out.c | 159 ++-- gcc/dwarf2out.h |3 +- 3 files changed, 290 insertions(+), 248 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 36fa7f8..745e137 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,28 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +static void +add_cfi_args_size (HOST_WIDE_INT size) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi->dw_cfi_opc = DW_CFA_GNU_args_size; + cfi->dw_cfi_oprnd1.dw_cfi_offset = size; + + add_cfi (cfi); +} + +static void +add_cfi_restore (unsigned reg) +{ + dw_cfi_ref cfi = new_cfi (); + + cfi->dw_cfi_opc = (reg & ~0x3f ? DW_CFA_restore_extended : DW_CFA_restore); + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg; + + add_cfi (cfi); +} + /* Perform ROW->REG_SAVE[COLUMN] = CFI. CFI may be null, indicating that the register column is no longer saved. */ @@ -474,64 +496,109 @@ cfa_equal_p (const dw_cfa_location *loc1, const dw_cfa_location *loc2) || loc1->base_offset == loc2->base_offset)); } -/* This routine does the actual work. The CFA is now calculated from - the dw_cfa_location structure. */ +/* Determine if two CFI operands are identical. 
*/ -static void -def_cfa_1 (dw_cfa_location *loc_p) +static bool +cfi_oprnd_equal_p (enum dw_cfi_oprnd_type t, dw_cfi_oprnd *a, dw_cfi_oprnd *b) { - dw_cfi_ref cfi; - dw_cfa_location loc = *loc_p; + switch (t) +{ +case dw_cfi_oprnd_unused: + return true; +case dw_cfi_oprnd_reg_num: + return a->dw_cfi_reg_num == b->dw_cfi_reg_num; +case dw_cfi_oprnd_offset: + return a->dw_cfi_offset == b->dw_cfi_offset; +case dw_cfi_oprnd_addr: + return (a->dw_cfi_addr == b->dw_cfi_addr + || strcmp (a->dw_cfi_addr, b->dw_cfi_addr) == 0); +case dw_cfi_oprnd_loc: + return loc_descr_equal_p (a->dw_cfi_loc, b->dw_cfi_loc); +} + gcc_unreachable (); +} - if (cfa_store.reg == loc.reg && loc.indirect == 0) -cfa_store.offset = loc.offset; +/* Determine if two CFI entries are identical. */ + +static bool +cfi_equal_p (dw_cfi_ref a, dw_cfi_ref b) +{ + enum dwarf_call_frame_info opc; + + /* Make things easier for our callers, including missing operands. */ + if (a == b) +return true; + if (a == NULL || b == NULL) +return false; + + /* Obviously, the opcodes must match. */ + opc = a->dw_cfi_opc; + if (opc != b->dw_cfi_opc) +return false; + + /* Compare the two operands, re-using the type of the operands as + already exposed elsewhere. */ + return (cfi_oprnd_equal_p (dw_cfi_oprnd1_desc (opc), +&a->dw_cfi_oprnd1, &b->dw_cfi_oprnd1) + && cfi_oprnd_equal_p (dw_cfi_oprnd2_desc (opc), + &a->dw_cfi_oprnd2, &b->dw_cfi_oprnd2)); +} + +/* The CFA is now calculated from NEW_CFA. Consider OLD_CFA in determining + what opcode to emit. Returns the CFI opcode to effect the change, or + NULL if NEW_CFA == OLD_CFA. */ + +static dw_cfi_ref +def_cfa_0 (dw_cfa_location *old_cfa, dw_cfa_location *new_cfa) +{ + dw_cfi_ref cfi; /* If nothing changed, no need to issue any call frame instructions. 
*/ - if (cfa_equal_p (&loc, &cur_row->cfa)) -return; + if (cfa_equal_p (old_cfa, new_cfa)) +return NULL; cfi = new_cfi (); - if (loc.reg == cur_row->cfa.reg && !loc.indirect && !cur_row->cfa.indirect) + if (new_cfa->reg == old_cfa->reg && !new_cfa->indirect && !old_cfa->indirect) { /* Construct a "DW_CFA_def_cfa_offset " instruction, indicating the CFA register did not change but the offset did. The data factoring for DW_CFA_def_cfa_offset_sf happens in output_cfi, or in the assembler via the .cfi_def_cfa_offset directive. */ - if (loc.offset < 0) + if (new_cfa->offset < 0) cfi->dw_cfi_opc = DW_CFA_def_cfa_offset_sf; else cfi->dw_cfi_opc = DW_CFA_def_cfa_offset; - cfi->dw_cfi_oprnd1.dw_cfi_offset = loc.offset; + cfi->dw_cfi_oprnd1.dw_cfi_offset = new_cfa->offset; } #ifndef MIPS_DEBUGGING_INFO /* SGI dbx thinks this means no offset. */ - else if (loc.offset == cur_row->cfa.offset - && cur_row->cfa.reg != INVALID_REGNUM - && !loc.indirect - && !cur_row->cfa.indirect) + else if (new_cfa->offset == old_cfa->offset + && old_cfa->reg != INVALID_REGNUM + && !new_cfa->indirect + && !old_cfa->indirect) { /* Construct a "DW_CFA_def_cfa_register " instruction, indicating the CFA register has changed to but the offset has not changed. */ cfi->dw_cfi_opc = DW_CFA_def_cfa_register; - cfi->dw_cfi_oprnd1.
[PATCH 1/9] dwarf2cfi: Introduce a dw_cfi_row state.
Use it instead of old_cfa, old_args_size, and cfa_remember variables. Remove the global cfa variable, as it was usually a duplicate of old_cfa and otherwise confusing. Always make a local copy of the cur_row->cfa variable before modification instead. --- gcc/dwarf2cfi.c | 208 +++ gcc/dwarf2out.h |2 +- 2 files changed, 135 insertions(+), 75 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 4e648ae..1c1b74f 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -58,9 +58,31 @@ along with GCC; see the file COPYING3. If not see /* Maximum size (in bytes) of an artificially generated label. */ #define MAX_ARTIFICIAL_LABEL_BYTES 30 +/* A collected description of an entire row of the abstract CFI table. */ +typedef struct GTY(()) dw_cfi_row_struct +{ + /* The expression that computes the CFA, expressed in two different ways. + The CFA member for the simple cases, and the full CFI expression for + the complex cases. The later will be a DW_CFA_cfa_expression. */ + dw_cfa_location cfa; + dw_cfi_ref cfa_cfi; + + /* The expressions for any register column that is saved. */ + cfi_vec reg_save; + + /* The value of any DW_CFA_GNU_args_size. */ + HOST_WIDE_INT args_size; +} dw_cfi_row; + +typedef dw_cfi_row *dw_cfi_row_ref; + /* A vector of call frame insns for the CIE. */ cfi_vec cie_cfi_vec; +/* The state of the first row of the FDE table, which includes the + state provided by the CIE. */ +static GTY(()) dw_cfi_row_ref cie_cfi_row; + static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ @@ -185,6 +207,43 @@ new_cfi (void) return cfi; } +/* Return a newly allocated CFI row, with no defined data. */ + +static dw_cfi_row_ref +new_cfi_row (void) +{ + dw_cfi_row_ref row = ggc_alloc_cleared_dw_cfi_row (); + + row->cfa.reg = INVALID_REGNUM; + + return row; +} + +/* Return a copy of an existing CFI row. 
*/ + +static dw_cfi_row_ref +copy_cfi_row (dw_cfi_row_ref src) +{ + dw_cfi_row_ref dst = ggc_alloc_dw_cfi_row (); + + *dst = *src; + dst->reg_save = VEC_copy (dw_cfi_ref, gc, src->reg_save); + + return dst; +} + +/* Free an allocated CFI row. */ + +static void +free_cfi_row (dw_cfi_row_ref row) +{ + if (row != NULL) +{ + VEC_free (dw_cfi_ref, gc, row->reg_save); + ggc_free (row); +} +} + /* Generate a new label for the CFI info to refer to. */ static char * @@ -371,28 +430,25 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_location *loc, dw_cfa_location *remember) } } -/* The current rule for calculating the DWARF2 canonical frame address. */ -static dw_cfa_location cfa; +/* The current, i.e. most recently generated, row of the CFI table. */ +static dw_cfi_row_ref cur_row; -/* A copy of the CFA, for comparison purposes. */ -static dw_cfa_location old_cfa; +/* The row state from a preceeding DW_CFA_remember_state. */ +static dw_cfi_row_ref remember_row; /* The register used for saving registers to the stack, and its offset from the CFA. */ static dw_cfa_location cfa_store; -/* The current save location around an epilogue. */ -static dw_cfa_location cfa_remember; - -/* Like cfa_remember, but a copy of old_cfa. */ -static dw_cfa_location old_cfa_remember; +/* A temporary register holding an integral value used in adjusting SP + or setting up the store_reg. The "offset" field holds the integer + value, not an offset. */ +static dw_cfa_location cfa_temp; -/* The running total of the size of arguments pushed onto the stack. */ +/* The (really) current value for DW_CFA_GNU_args_size. We delay actually + emitting this data, i.e. updating CUR_ROW, without async unwind. */ static HOST_WIDE_INT args_size; -/* The last args_size we actually output. */ -static HOST_WIDE_INT old_args_size; - /* Determine if two dw_cfa_location structures define the same data. 
*/ bool @@ -412,21 +468,18 @@ static void def_cfa_1 (dw_cfa_location *loc_p) { dw_cfi_ref cfi; - dw_cfa_location loc; - - cfa = *loc_p; - loc = *loc_p; + dw_cfa_location loc = *loc_p; if (cfa_store.reg == loc.reg && loc.indirect == 0) cfa_store.offset = loc.offset; /* If nothing changed, no need to issue any call frame instructions. */ - if (cfa_equal_p (&loc, &old_cfa)) + if (cfa_equal_p (&loc, &cur_row->cfa)) return; cfi = new_cfi (); - if (loc.reg == old_cfa.reg && !loc.indirect && !old_cfa.indirect) + if (loc.reg == cur_row->cfa.reg && !loc.indirect && !cur_row->cfa.indirect) { /* Construct a "DW_CFA_def_cfa_offset " instruction, indicating the CFA register did not change but the offset did. The data @@ -440,10 +493,10 @@ def_cfa_1 (dw_cfa_location *loc_p) } #ifndef MIPS_DEBUGGING_INFO /* SGI dbx thinks this means no offset. */ - else if (loc.offset == old_cfa.offset - && old_cfa.reg != INVALID_REGNUM + else if (loc.offset == cur_row->cfa.
[PATCH 6/9] dwarf2cfi: Convert queued_reg_save to a VEC.
Also, allocate it in the heap instead of garbage collected. --- gcc/dwarf2cfi.c | 51 ++- 1 files changed, 26 insertions(+), 25 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 51fb824..a800cb4 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1091,14 +1091,16 @@ dwarf2out_notice_stack_adjust (rtx insn, bool after_p) of the prologue or (b) the register is clobbered. This clusters register saves so that there are fewer pc advances. */ -struct GTY(()) queued_reg_save { - struct queued_reg_save *next; +typedef struct { rtx reg; - HOST_WIDE_INT cfa_offset; rtx saved_reg; -}; + HOST_WIDE_INT cfa_offset; +} queued_reg_save; -static GTY(()) struct queued_reg_save *queued_reg_saves; +DEF_VEC_O (queued_reg_save); +DEF_VEC_ALLOC_O (queued_reg_save, heap); + +static VEC(queued_reg_save, heap) *queued_reg_saves; /* The caller's ORIG_REG is saved in SAVED_IN_REG. */ typedef struct GTY(()) reg_saved_in_data { @@ -1170,24 +1172,21 @@ record_reg_saved_in_reg (rtx dest, rtx src) static void queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; /* Duplicates waste space, but it's also necessary to remove them for correctness, since the queue gets output in reverse order. */ - for (q = queued_reg_saves; q != NULL; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (compare_reg_or_pc (q->reg, reg)) - break; + goto found; - if (q == NULL) -{ - q = ggc_alloc_queued_reg_save (); - q->next = queued_reg_saves; - queued_reg_saves = q; -} + q = VEC_safe_push (queued_reg_save, heap, queued_reg_saves, NULL); + found: q->reg = reg; - q->cfa_offset = offset; q->saved_reg = sreg; + q->cfa_offset = offset; } /* Output all the entries in QUEUED_REG_SAVES. 
*/ @@ -1195,9 +1194,10 @@ queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset) static void dwarf2out_flush_queued_reg_saves (void) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t i; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) { unsigned int reg, sreg; @@ -1214,7 +1214,7 @@ dwarf2out_flush_queued_reg_saves (void) reg_save (reg, sreg, q->cfa_offset); } - queued_reg_saves = NULL; + VEC_truncate (queued_reg_save, queued_reg_saves, 0); } /* Does INSN clobber any register which QUEUED_REG_SAVES lists a saved @@ -1225,17 +1225,18 @@ dwarf2out_flush_queued_reg_saves (void) static bool clobbers_queued_reg_save (const_rtx insn) { - struct queued_reg_save *q; + queued_reg_save *q; + size_t iq; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, iq, q) { - size_t i; + size_t ir; reg_saved_in_data *rir; if (modified_in_p (q->reg, insn)) return true; - FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, i, rir) + FOR_EACH_VEC_ELT (reg_saved_in_data, regs_saved_in_regs, ir, rir) if (compare_reg_or_pc (q->reg, rir->orig_reg) && modified_in_p (rir->saved_in_reg, insn)) return true; @@ -1250,11 +1251,11 @@ static rtx reg_saved_in (rtx reg) { unsigned int regn = REGNO (reg); - struct queued_reg_save *q; + queued_reg_save *q; reg_saved_in_data *rir; size_t i; - for (q = queued_reg_saves; q; q = q->next) + FOR_EACH_VEC_ELT (queued_reg_save, queued_reg_saves, i, q) if (q->saved_reg && regn == REGNO (q->saved_reg)) return q->reg; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) XDELETEVEC (barrier_args_size); barrier_args_size = NULL; regs_saved_in_regs = NULL; - queued_reg_saves = NULL; + VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); cur_row = NULL; -- 1.7.6
[PATCH 7/9] dwarf2cfi: Allocate reg_saved_in_data in the heap.
--- gcc/dwarf2cfi.c | 19 ++- 1 files changed, 10 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index a800cb4..1d6413f 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -1109,13 +1109,13 @@ typedef struct GTY(()) reg_saved_in_data { } reg_saved_in_data; DEF_VEC_O (reg_saved_in_data); -DEF_VEC_ALLOC_O (reg_saved_in_data, gc); +DEF_VEC_ALLOC_O (reg_saved_in_data, heap); /* A set of registers saved in other registers. This is implemented as a flat array because it normally contains zero or 1 entry, depending on the target. IA-64 is the big spender here, using a maximum of 5 entries. */ -static GTY(()) VEC(reg_saved_in_data, gc) *regs_saved_in_regs; +static VEC(reg_saved_in_data, heap) *regs_saved_in_regs; static GTY(()) reg_saved_in_data *cie_return_save; @@ -1161,7 +1161,7 @@ record_reg_saved_in_reg (rtx dest, rtx src) if (dest == NULL) return; - elt = VEC_safe_push(reg_saved_in_data, gc, regs_saved_in_regs, NULL); + elt = VEC_safe_push(reg_saved_in_data, heap, regs_saved_in_regs, NULL); elt->orig_reg = src; elt->saved_in_reg = dest; } @@ -2699,6 +2699,9 @@ initial_return_save (rtx rtl) static unsigned int execute_dwarf2_frame (void) { + gcc_checking_assert (queued_reg_saves == NULL); + gcc_checking_assert (regs_saved_in_regs == NULL); + /* The first time we're called, compute the incoming frame state. */ if (cie_cfi_vec == NULL) { @@ -2737,7 +2740,7 @@ execute_dwarf2_frame (void) cie_return_save = ggc_alloc_reg_saved_in_data (); *cie_return_save = *VEC_index (reg_saved_in_data, regs_saved_in_regs, 0); - regs_saved_in_regs = NULL; + VEC_pop (reg_saved_in_data, regs_saved_in_regs); break; default: gcc_unreachable (); @@ -2748,12 +2751,10 @@ execute_dwarf2_frame (void) } /* Set up state for generating call frame debug info. 
*/ - gcc_checking_assert (queued_reg_saves == NULL); - gcc_checking_assert (regs_saved_in_regs == NULL); - cur_row = copy_cfi_row (cie_cfi_row); if (cie_return_save) -VEC_safe_push (reg_saved_in_data, gc, regs_saved_in_regs, cie_return_save); +VEC_safe_push (reg_saved_in_data, heap, + regs_saved_in_regs, cie_return_save); cfa_store = cur_row->cfa; args_size = 0; @@ -2770,7 +2771,7 @@ execute_dwarf2_frame (void) /* Reset all function-specific information, particularly for GC. */ XDELETEVEC (barrier_args_size); barrier_args_size = NULL; - regs_saved_in_regs = NULL; + VEC_free (reg_saved_in_data, heap, regs_saved_in_regs); VEC_free (queued_reg_save, heap, queued_reg_saves); free_cfi_row (cur_row); -- 1.7.6
[PATCH 2/9] dwarf2cfi: Rename cfi_insn to add_cfi_insn.
Make it consistent with add_cfi_vec. --- gcc/dwarf2cfi.c | 20 +++- 1 files changed, 11 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index 1c1b74f..eb59f28 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -86,7 +86,7 @@ static GTY(()) dw_cfi_row_ref cie_cfi_row; static GTY(()) unsigned long dwarf2out_cfi_label_num; /* The insn after which a new CFI note should be emitted. */ -static rtx cfi_insn; +static rtx add_cfi_insn; /* When non-null, add_cfi will add the CFI to this vector. */ static cfi_vec *add_cfi_vec; @@ -274,11 +274,13 @@ add_cfi (dw_cfi_ref cfi) } any_cfis_emitted = true; - if (cfi_insn != NULL) + + if (add_cfi_insn != NULL) { - cfi_insn = emit_note_after (NOTE_INSN_CFI, cfi_insn); - NOTE_CFI (cfi_insn) = cfi; + add_cfi_insn = emit_note_after (NOTE_INSN_CFI, add_cfi_insn); + NOTE_CFI (add_cfi_insn) = cfi; } + if (add_cfi_vec != NULL) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } @@ -2319,7 +2321,7 @@ create_cfi_notes (void) { rtx pat; - cfi_insn = PREV_INSN (insn); + add_cfi_insn = PREV_INSN (insn); if (BARRIER_P (insn)) { @@ -2342,7 +2344,7 @@ create_cfi_notes (void) break; case NOTE_INSN_CFA_RESTORE_STATE: - cfi_insn = insn; + add_cfi_insn = insn; dwarf2out_frame_debug_restore_state (); break; } @@ -2372,13 +2374,13 @@ create_cfi_notes (void) /* Do not separate tablejump insns from their ADDR_DIFF_VEC. Putting the note after the VEC should be ok. */ - if (!tablejump_p (insn, NULL, &cfi_insn)) - cfi_insn = insn; + if (!tablejump_p (insn, NULL, &add_cfi_insn)) + add_cfi_insn = insn; dwarf2out_frame_debug (insn, true); } - cfi_insn = NULL; + add_cfi_insn = NULL; } /* Determine if we need to save and restore CFI information around this -- 1.7.6
[PATCH 3/9] dwarf2cfi: Populate CUR_ROW->REG_SAVE.
To be actually used by a subsequent patch. --- gcc/dwarf2cfi.c | 27 +-- 1 files changed, 25 insertions(+), 2 deletions(-) diff --git a/gcc/dwarf2cfi.c b/gcc/dwarf2cfi.c index eb59f28..36fa7f8 100644 --- a/gcc/dwarf2cfi.c +++ b/gcc/dwarf2cfi.c @@ -285,6 +285,17 @@ add_cfi (dw_cfi_ref cfi) VEC_safe_push (dw_cfi_ref, gc, *add_cfi_vec, cfi); } +/* Perform ROW->REG_SAVE[COLUMN] = CFI. CFI may be null, indicating + that the register column is no longer saved. */ + +static void +update_row_reg_save (dw_cfi_row_ref row, unsigned column, dw_cfi_ref cfi) +{ + if (VEC_length (dw_cfi_ref, row->reg_save) <= column) +VEC_safe_grow_cleared (dw_cfi_ref, gc, row->reg_save, column + 1); + VEC_replace (dw_cfi_ref, row->reg_save, column, cfi); +} + /* This function fills in aa dw_cfa_location structure from a dwarf location descriptor sequence. */ @@ -574,7 +585,13 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) cfi->dw_cfi_oprnd2.dw_cfi_offset = offset; } else if (sreg == reg) -cfi->dw_cfi_opc = DW_CFA_same_value; +{ + /* While we could emit something like DW_CFA_same_value or +DW_CFA_restore, we never expect to see something like that +in a prologue. This is more likely to be a bug. A backend +can always bypass this by using REG_CFA_RESTORE directly. 
*/ + gcc_unreachable (); +} else { cfi->dw_cfi_opc = DW_CFA_register; @@ -582,6 +599,7 @@ reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset) } add_cfi (cfi); + update_row_reg_save (cur_row, reg, cfi); } /* Given a SET, calculate the amount of stack adjustment it @@ -1337,6 +1355,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) { rtx src, dest, span; dw_cfi_ref cfi = new_cfi (); + unsigned regno; dest = SET_DEST (set); src = SET_SRC (set); @@ -1347,8 +1366,10 @@ dwarf2out_frame_debug_cfa_expression (rtx set) span = targetm.dwarf_register_span (src); gcc_assert (!span); + regno = dwf_regno (src); + cfi->dw_cfi_opc = DW_CFA_expression; - cfi->dw_cfi_oprnd1.dw_cfi_reg_num = dwf_regno (src); + cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; cfi->dw_cfi_oprnd2.dw_cfi_loc = mem_loc_descriptor (XEXP (dest, 0), get_address_mode (dest), GET_MODE (dest), VAR_INIT_STATUS_INITIALIZED); @@ -1356,6 +1377,7 @@ dwarf2out_frame_debug_cfa_expression (rtx set) /* ??? We'd like to use queue_reg_save, were the interface different, and, as above, we could manage flushing for epilogues. */ add_cfi (cfi); + update_row_reg_save (cur_row, regno, cfi); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_RESTORE note. */ @@ -1370,6 +1392,7 @@ dwarf2out_frame_debug_cfa_restore (rtx reg) cfi->dw_cfi_oprnd1.dw_cfi_reg_num = regno; add_cfi (cfi); + update_row_reg_save (cur_row, regno, NULL); } /* A subroutine of dwarf2out_frame_debug, process a REG_CFA_WINDOW_SAVE. -- 1.7.6
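The new update_row_reg_save helper above grows the row's reg_save vector with cleared entries before storing at the register column, so columns never touched read back as NULL ("no save recorded"). A minimal plain-C analogue of that grow-cleared-then-replace pattern (the toy type and names are illustrative, not GCC's VEC API):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy growable array of pointers, analogous to VEC(dw_cfi_ref, gc).
   NULL entries mean "no CFI recorded for this register column".  */
typedef struct {
  void **data;
  size_t len;
} reg_save_vec;

/* Grow the vector to at least COLUMN + 1 entries, zero-filling the new
   slots, then store CFI at COLUMN -- mirroring VEC_safe_grow_cleared
   followed by VEC_replace in update_row_reg_save.  */
static void
toy_update_row_reg_save (reg_save_vec *v, size_t column, void *cfi)
{
  if (v->len <= column)
    {
      v->data = realloc (v->data, (column + 1) * sizeof (void *));
      memset (v->data + v->len, 0, (column + 1 - v->len) * sizeof (void *));
      v->len = column + 1;
    }
  v->data[column] = cfi;
}
```

Passing cfi == NULL, as dwarf2out_frame_debug_cfa_restore does above, records that the column is no longer saved.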
[pph] Add symbol table - Fix remaining asm differences (issue4732043)
This patch fixes the remaining assembly differences between non-pph and pph compiles. The idea is to make the pph compiler process functions and variables in the same order that they had been processed in the original compile. To do this, we intercept calls to rest_of_decl_compilation and allocate_struct_function. At every call, we add the declaration to a list. This list becomes the symbol table for the header file, which is then written to the end of the file. When reading the file, we read this table and present the symbols to the middle end in that order. The symbol table is written in pph_out_symtab and read in pph_in_symtab. The other changes are mostly wrapping calls to rest_of_decl_compilation so that we can add the declarations to the symbol table. This simplifies the logic we use to present symbols to the middle end as it encodes the sequence in the symbol table itself. No need to reverse lists of symbols or the other contortions we used to make. Tested on x86_64. Applied to branch. Diego. cp/ChangeLog.pph 2011-07-14 Diego Novillo * Make-lang.in (cp/decl.o): Add dependency on CXX_PPH_STREAMER_H. (cp/pph-streamer-out.o): Add dependency on CGRAPH_H. (cp/pph-streamer-in.o): Add dependency on toplev.h * cp-tree.h (cp_rest_of_decl_compilation): Declare. * call.c (set_up_extended_ref_temp): Call cp_rest_of_decl_compilation instead of rest_of_decl_compilation. * class.c (build_clone): Likewise. * decl2.c (finish_anon_union): Likewise. (maybe_emit_vtables): Likewise. (write_out_vars): Likewise. * method.c (implicitly_declare_fn): Likewise. * semantics.c (maybe_add_lambda_conv_op): Likewise. * decl.c (make_rtl_for_nonlocal_decl): Likewise. (cp_finish_decl): Likewise. (start_preparsed_function): Likewise. (cp_rest_of_decl_compilation): New. * pph-streamer-in.c: Include toplev.h (pph_in_language_function): Tidy. (pph_in_struct_function): Remove parameter DECL. Support reading shared struct functions. Read FN->DECL. (pph_register_decl_in_symtab): Remove. 
Update all users. (pph_register_binding_in_symtab): Remove. Update all users. (pph_in_symtab_marker): New. (pph_in_symtab): New. (pph_read_file_contents): Call it. (pph_in_function_decl): Do not call pph_in_struct_function. * pph-streamer-out.c: Include cgraph.h. (decls_to_register): New. (pph_out_chain_filtered): Remove argument REVERSE_P. Update all users. (pph_out_struct_function): Write FN->DECL. (pph_out_symtab_marker): New. (pph_out_symtab): New. (pph_write_file_contents): Call it. (pph_out_function_decl): Do not call pph_out_struct_function. (pph_add_decl_to_register): New. * pph-streamer.h (enum pph_symtab_marker): New. (struct pph_stream): Remove field FNS_TO_EXPAND. Update all users. (pph_add_decl_to_register): Declare. testsuite/ChangeLog.pph * g++.dg/pph/c1pr44948-1a.cc: Adjust expected failure. * g++.dg/pph/x0hardlookup.h: Add new symbol in global namespace. * g++.dg/pph/c4meteor-contest.cc: Mark fixed. * g++.dg/pph/c1attr-warn-unused.cc: Likewise. * g++.dg/pph/x1globalref.cc: Likewise. * g++.dg/pph/x1hardlookup.cc: Likewise. 
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in index 5f3249e..9634c47 100644 --- a/gcc/cp/Make-lang.in +++ b/gcc/cp/Make-lang.in @@ -271,7 +271,7 @@ cp/decl.o: cp/decl.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ cp/operators.def $(TM_P_H) $(TREE_INLINE_H) $(DIAGNOSTIC_H) $(C_PRAGMA_H) \ debug.h gt-cp-decl.h $(TIMEVAR_H) $(TARGET_H) $(PLUGIN_H) \ intl.h tree-iterator.h pointer-set.h $(SPLAY_TREE_H) \ - c-family/c-objc.h + c-family/c-objc.h $(CXX_PPH_STREAMER_H) cp/decl2.o: cp/decl2.c $(CXX_TREE_H) $(TM_H) $(FLAGS_H) cp/decl.h \ output.h toplev.h $(C_COMMON_H) gt-cp-decl2.h $(CGRAPH_H) \ $(C_PRAGMA_H) $(TREE_DUMP_H) intl.h $(TARGET_H) $(GIMPLE_H) pointer-set.h \ @@ -351,8 +351,8 @@ cp/pph-streamer.o: cp/pph-streamer.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ cp/pph-streamer-out.o: cp/pph-streamer-out.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h $(CGRAPH_H) cp/pph-streamer-in.o: cp/pph-streamer-in.c $(CONFIG_H) $(SYSTEM_H) \ coretypes.h $(TREE_H) tree-pretty-print.h $(LTO_STREAMER_H) \ $(CXX_PPH_STREAMER_H) $(CXX_PPH_H) $(TREE_PASS_H) version.h \ - cppbuiltin.h tree-iterator.h + cppbuiltin.h tree-iterator.h toplev.h diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 56f3408..6988dc7 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -8567,7 +8567,7 @@ set_up_extended_ref_temp (tree
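The core idea of the patch description above is an append-only table: declarations are appended as calls to rest_of_decl_compilation / allocate_struct_function are intercepted, and at read time the table is replayed in the same order. A minimal plain-C analogue of that order-preserving log (toy names and int "decls"; the real table in pph_out_symtab / pph_in_symtab stores tree nodes and kind markers):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy append-only symbol table preserving emission order, in the spirit
   of pph_add_decl_to_register.  Purely illustrative.  */
typedef struct {
  int *decls;
  size_t len, cap;
} toy_symtab;

/* Append DECL at the end; the order of calls is the order of replay.  */
static void
toy_symtab_register (toy_symtab *t, int decl)
{
  if (t->len == t->cap)
    {
      t->cap = t->cap ? 2 * t->cap : 8;
      t->decls = realloc (t->decls, t->cap * sizeof (int));
    }
  t->decls[t->len++] = decl;
}

/* Replay in insertion order, as pph_in_symtab presents symbols to the
   middle end.  Copies into OUT and returns the number of symbols.  */
static size_t
toy_symtab_replay (const toy_symtab *t, int *out)
{
  for (size_t i = 0; i < t->len; i++)
    out[i] = t->decls[i];
  return t->len;
}
```

Because the sequence is encoded in the table itself, no list reversal or reordering is needed on the read side, which is exactly the simplification the mail describes.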
Re: [PATCH] New IPA-CP with real function cloning - updated version
> 2011-07-14 Martin Jambor > > * ipa-prop.h: Include alloc-pool.h, all sorts of updates to general > comments. > (ipcp_values_pool): Declare. > (ipcp_sources_pool): Likewise. > (ipcp_lattice): Changed to forward declaration. > (ipa_param_descriptor): Removed fields ipcp_lattice, types and > cannot_devirtualize. > (ipa_node_params): New fields descriptors, lattices, known_vals, > clone_for_all_contexts and node dead, removed fields params and > count_scale. > (ipa_set_param_count): Removed. > (ipa_get_param_count): Made to work with descriptors vector. > (ipa_get_param): Updated. > (ipa_param_cannot_devirtualize_p): Removed. > (ipa_param_types_vec_empty): Likewise. > (ipa_set_param_used): New function. > (ipa_get_param_used): Updated to use descriptors vector. > (ipa_func_list): Removed. > (ipa_init_func_list): Removed declaration. > (ipa_push_func_to_list_1): Likewise. > (ipa_pop_func_from_list): Likewise. > (ipa_push_func_to_list): Removed. > (ipa_lattice_from_jfunc): Remove declaration. > (ipa_get_jf_pass_through_result): Declare. > (ipa_get_jf_ancestor_result): Likewise. > (ipa_value_from_jfunc): Likewise. > (ipa_get_lattice): Update. > (ipa_lat_is_single_const): New function. > * ipa-prop.c (ipa_push_func_to_list_1): Removed. > (ipa_init_func_list): Likewise. > (ipa_pop_func_from_list): Likewise. > (ipa_get_param_decl_index): Fix coding style. > (count_formal_params): Removed. > (count_formal_params_1): Renamed to count_formal_params. > (ipa_populate_param_decls): Update to use descriptors vector. > (ipa_initialize_node_params): Likewise. > (visit_ref_for_mod_analysis): Use ipa_set_param_used. > (ipa_analyze_params_uses): Likewise. > (ipa_free_node_params_substructures): Likewise and free also lattices > and known values. > (duplicate_array): Removed. > (ipa_edge_duplication_hook): Add the new edge to the list of edge > clones. > (ipa_node_duplication_hook): Update to use new lattices. > (ipa_free_all_structures_after_ipa_cp): Free alloc pools. 
> (ipa_free_all_structures_after_iinln): Likewise. > (ipa_write_node_info): Update to use new lattices. > (ipa_read_node_info): Likewise. > (ipa_get_jf_pass_through_result): New function. > (ipa_get_jf_ancestor_result): Likewise. > (ipa_value_from_jfunc): Likewise. > (ipa_cst_from_jfunc): Reimplemented using ipa_value_from_jfunc. > * ipa-cp.c: Reimplemented. > * params.def (PARAM_DEVIRT_TYPE_LIST_SIZE): Removed. > (PARAM_IPA_CP_VALUE_LIST_SIZE): New parameter. > (PARAM_IPA_CP_EVAL_THRESHOLD): Likewise. > * Makefile.in (IPA_PROP_H): Added alloc-pool.h to dependencies. > > * doc/invoke.texi (devirt-type-list-size): Removed description. > (ipa-cp-value-list-size): Added description. > > * testsuite/gcc.dg/ipa/ipa-1.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipa-2.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-3.c: Likewise and made functions static. > * testsuite/gcc.dg/ipa/ipa-4.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipa-5.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-7.c: Likewise. > * testsuite/gcc.dg/ipa/ipa-8.c: Updated testcase dump scan. > * testsuite/gcc.dg/ipa/ipacost-1.c: Likewise. > * testsuite/gcc.dg/ipa/ipacost-2.c: Likewise and increased sizes > of some functions. > * testsuite/gcc.dg/ipa/ipcp-1.c: New test. > * testsuite/gcc.dg/ipa/ipcp-2.c: Likewise. > * testsuite/gcc.dg/tree-ssa/ipa-cp-1.c: Updated testcase. > > > /* ipa_node_params stores information related to formal parameters of > functions > and some other information for interprocedural passes that operate on > parameters (such as ipa-cp). */ > > struct ipa_node_params > { > /* Information about individual formal parameters that are gathered when > summaries are generated. */ > VEC (ipa_param_descriptor_t, heap) *descriptors; > /* Pointer to an array of structures describing individual formal > parameters. */ > struct ipcp_lattice *lattices; Hmm, I was under the impression that the plan was to break away the stuff used by ipa-cp internally during the propagation stage (i.e.
ipcp_orig_node/known_vals and probably lattices from the stuff used to represent parameters and jump functions, i.e. descriptors. But this can be handled incrementally. > /* FIXME: Can we clobber only the first argument of thunks? */ Well, we should know how to propagate through it. But it is not too important a side case, I guess, until we can devirtualize to them effectively. > if (node->alias || node->thunk.thunk_p > || ipa_is_called_with_var_arguments (info)) > disable = true; The patch is OK, thanks! It would be nice to add
[patch] Fix PR middle-end/49732
Hi, this is a regression present on mainline and 4.6 branch. The compiler crashes during gimplification because there is a COMPOUND_EXPR shared between the TYPE_SIZE and TYPE_SIZE_UNIT expressions of an array type. Now this isn't supposed to happen because we run an unsharing pass before gimplification. The problem here is that we have a forward declaration (DECL_EXPR) of a pointer type to the array type. So, during the marking phase of the unsharing pass, the array type gets marked as visited through the pointer, which prevents it from being walked during the same phase when its own DECL_EXPR is processed. This pointer/pointed-to type business is an old pattern. Five years ago, Olivier changed gimplify_type_sizes like so: 2006-10-06 Olivier Hainque * gimplify.c (gimplify_type_sizes) [POINTER_TYPE, REFERENCE_TYPE]: Don't recurse on the pointed-to type. because of a related problem. It turns out that we need the same change in the DECL_EXPR case of walk_tree_1 to fix the bug at hand, which is sort of logical as there is a strong relationship between them: case DECL_EXPR: /* If this is a TYPE_DECL, walk into the fields of the type that it's defining. We only want to walk into these fields of a type in this case and not in the general case of a mere reference to the type. The criterion is as follows: if the field can be an expression, it must be walked only here. This should be in keeping with the fields that are directly gimplified in gimplify_type_sizes in order for the mark/copy-if-shared/unmark machinery of the gimplifier to work with variable-sized types. Note that DECLs get walked as part of processing the BIND_EXPR. */ Tested on x86_64-suse-linux, OK for mainline and 4.6 branch? 2011-07-14 Eric Botcazou PR middle-end/49732 * tree.c (walk_tree_1) : Do not walk a pointed-to type. 2011-07-14 Eric Botcazou * gnat.dg/pointer_controlled.adb: New test. 
-- Eric Botcazou Index: tree.c === --- tree.c (revision 176261) +++ tree.c (working copy) @@ -10596,9 +10596,14 @@ walk_tree_1 (tree *tp, walk_tree_fn func if (result || !walk_subtrees) return result; - result = walk_type_fields (*type_p, func, data, pset, lh); - if (result) - return result; + /* But do not walk a pointed-to type since it may itself need to + be walked in the declaration case if it isn't anonymous. */ + if (!POINTER_TYPE_P (*type_p)) + { + result = walk_type_fields (*type_p, func, data, pset, lh); + if (result) + return result; + } /* If this is a record type, also walk the fields. */ if (RECORD_OR_UNION_TYPE_P (*type_p)) -- PR ada/49732 -- Testcase by Vorfeed Canal -- { dg-do compile } -- { dg-options "-gnato" } with Interfaces.C; use Interfaces.C; with Interfaces.C.Strings; use Interfaces.C.Strings; with Interfaces.C.Pointers; procedure Pointer_Controlled is function Create (Name : String) return size_t is type Name_String is new char_array (0 .. Name'Length); type Name_String_Ptr is access Name_String; pragma Controlled (Name_String_Ptr); Name_Str : constant Name_String_Ptr := new Name_String; Name_Len : size_t; begin To_C (Name, Name_Str.all, Name_Len); return 1; end; Test : size_t; begin Test := Create("ABC"); end;
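The failure mode described above is generic to any walk that shares a visited set: whichever path reaches a node first marks it, and a later walk that actually owns the node is skipped. A toy C model of that hazard, and of the fix of not descending through pointer types (illustrative only; walk_tree_1 and the pset operate on trees, not on this struct):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy types: a node is either a pointer (non-NULL pointee) or a sized
   type whose size expression must be reached by the walk triggered by
   its own DECL_EXPR.  Purely illustrative.  */
typedef struct toy_type {
  struct toy_type *pointee;   /* non-NULL for pointer types */
  bool visited;               /* the shared pset mark */
  bool size_walked;           /* set only by the owning walk */
} toy_type;

/* Walk T.  If SKIP_POINTEE, do not recurse through pointer types --
   the walk_tree_1 change for PR 49732.  OWNING_WALK is true for the
   walk triggered by the type's own DECL_EXPR.  */
static void
toy_walk (toy_type *t, bool skip_pointee, bool owning_walk)
{
  if (t->visited)
    return;                   /* pset says already seen: do not re-enter */
  t->visited = true;
  if (owning_walk)
    t->size_walked = true;
  if (t->pointee && !skip_pointee)
    toy_walk (t->pointee, skip_pointee, false);
}
```

Without the fix, the walk through the pointer "claims" the array type first, so the array's own DECL_EXPR walk never gimplifies its size; with the fix, the pointer walk stops short and the owning walk still gets there.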
[4.6] Fix PR tree-optimization/49725
Hi, this is the ICE at -O2 on ACATS c34005a introduced on the 4.6 branch by Martin's latest SRA patch. But it's actually the same PRE issue as: http://gcc.gnu.org/ml/gcc-patches/2011-03/msg02210.html Bootstrapped/regtested on i586-suse-linux, OK for 4.6 branch? 2011-07-14 Eric Botcazou PR tree-optimization/49725 Backport from mainline 2011-03-31 Eric Botcazou * tree-ssa-pre.c (create_component_ref_by_pieces_1) : Drop a zero minimum index only if it is redundant -- Eric Botcazou Index: tree-ssa-pre.c === --- tree-ssa-pre.c (revision 176264) +++ tree-ssa-pre.c (working copy) @@ -2874,8 +2874,11 @@ create_component_ref_by_pieces_1 (basic_ return NULL_TREE; if (genop2) { - /* Drop zero minimum index. */ - if (tree_int_cst_equal (genop2, integer_zero_node)) + tree domain_type = TYPE_DOMAIN (TREE_TYPE (genop0)); + /* Drop zero minimum index if redundant. */ + if (integer_zerop (genop2) + && (!domain_type + || integer_zerop (TYPE_MIN_VALUE (domain_type genop2 = NULL_TREE; else {
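The arithmetic behind the fix: for an array whose index domain starts at min, an element address is base + (index - min) * elt_size, so silently dropping a recorded minimum amounts to assuming min == 0 — harmless for C-style arrays, wrong for an Ada array whose domain starts at 1. A toy byte-offset sketch (illustrative only; create_component_ref_by_pieces_1 manipulates trees, not bytes):

```c
#include <assert.h>
#include <stddef.h>

/* Element byte offset for a one-dimensional array whose index domain
   starts at MIN.  */
static ptrdiff_t
elt_offset (ptrdiff_t index, ptrdiff_t min, size_t elt_size)
{
  return (index - min) * (ptrdiff_t) elt_size;
}

/* "Dropping" the minimum index -- what the unpatched code did whenever
   the operand was literal zero -- is the same as assuming min == 0.
   That is only redundant when the domain minimum really is zero, which
   is what the TYPE_MIN_VALUE check above verifies.  */
static ptrdiff_t
elt_offset_min_dropped (ptrdiff_t index, size_t elt_size)
{
  return elt_offset (index, 0, elt_size);
}
```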
[patch] Fix PR target/48220 for the SPARC
Hi, this adds support for DW_OP_GNU_entry_value/DW_TAG_GNU_call_site_parameter on SPARC-like architectures (architectures with register windows and explicit window save instruction). The transformation OUTGOING_REGNO -> INCOMING_REGNO is explicit for them and not tied to the call-to-subroutine instruction, so this needs to be modelled if you want precise variable tracking. Tested on SPARC/Solaris (both GCC and GDB) and x86/Linux. OK for mainline? 2011-07-14 Eric Botcazou PR target/48220 * doc/md.texi (Standard Names): Document window_save. * cfgexpand.c (expand_debug_parm_decl): New function extracted from expand_debug_expr and expand_debug_source_expr. If the target has a window_save instruction, adjust the ENTRY_VALUE_EXP. (expand_debug_expr) : Call expand_debug_parm_decl if the SSA_NAME_VAR is a parameter. (expand_debug_source_expr) : Call expand_debug_parm_decl. * var-tracking.c (parm_reg_t): New type and associated vector type. (windowed_parm_regs): New variable. (adjust_insn): If the target has a window_save instruction and this is the instruction, make its effect on parameter registers explicit. (next_non_note_insn_var_location): New function. (emit_notes_in_bb): Use it instead of NEXT_INSN throughout. (vt_add_function_parameter): If the target has a window_save insn, adjust the incoming RTL and record that in windowed_parm_regs. (vt_finalize): Free windowed_parm_regs. -- Eric Botcazou Index: cfgexpand.c === --- cfgexpand.c (revision 176072) +++ cfgexpand.c (working copy) @@ -2358,8 +2358,60 @@ convert_debug_memory_address (enum machi return x; } -/* Return an RTX equivalent to the value of the tree expression - EXP. */ +/* Return an RTX equivalent to the value of the parameter DECL. 
*/ + +static rtx +expand_debug_parm_decl (tree decl) +{ + rtx incoming = DECL_INCOMING_RTL (decl); + + if (incoming + && GET_MODE (incoming) != BLKmode + && ((REG_P (incoming) && HARD_REGISTER_P (incoming)) + || (MEM_P (incoming) + && REG_P (XEXP (incoming, 0)) + && HARD_REGISTER_P (XEXP (incoming, 0) +{ + rtx rtl = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); + +#ifdef HAVE_window_save + /* DECL_INCOMING_RTL uses the INCOMING_REGNO of parameter registers. + If the target machine has an explicit window save instruction, the + actual entry value is the corresponding OUTGOING_REGNO instead. */ + if (REG_P (incoming) + && OUTGOING_REGNO (REGNO (incoming)) != REGNO (incoming)) + incoming + = gen_rtx_REG_offset (incoming, GET_MODE (incoming), +OUTGOING_REGNO (REGNO (incoming)), 0); + else if (MEM_P (incoming)) + { + rtx reg = XEXP (incoming, 0); + if (OUTGOING_REGNO (REGNO (reg)) != REGNO (reg)) + { + reg = gen_raw_REG (GET_MODE (reg), OUTGOING_REGNO (REGNO (reg))); + incoming = replace_equiv_address_nv (incoming, reg); + } + } +#endif + + ENTRY_VALUE_EXP (rtl) = incoming; + return rtl; +} + + if (incoming + && GET_MODE (incoming) != BLKmode + && !TREE_ADDRESSABLE (decl) + && MEM_P (incoming) + && (XEXP (incoming, 0) == virtual_incoming_args_rtx + || (GET_CODE (XEXP (incoming, 0)) == PLUS + && XEXP (XEXP (incoming, 0), 0) == virtual_incoming_args_rtx + && CONST_INT_P (XEXP (XEXP (incoming, 0), 1) +return incoming; + + return NULL_RTX; +} + +/* Return an RTX equivalent to the value of the tree expression EXP. 
*/ static rtx expand_debug_expr (tree exp) @@ -3169,36 +3221,12 @@ expand_debug_expr (tree exp) if (SSA_NAME_IS_DEFAULT_DEF (exp) && TREE_CODE (SSA_NAME_VAR (exp)) == PARM_DECL) { - rtx incoming = DECL_INCOMING_RTL (SSA_NAME_VAR (exp)); - if (incoming - && GET_MODE (incoming) != BLKmode - && ((REG_P (incoming) && HARD_REGISTER_P (incoming)) - || (MEM_P (incoming) -&& REG_P (XEXP (incoming, 0)) -&& HARD_REGISTER_P (XEXP (incoming, 0) - { - op0 = gen_rtx_ENTRY_VALUE (GET_MODE (incoming)); - ENTRY_VALUE_EXP (op0) = incoming; - goto adjust_mode; - } - if (incoming - && MEM_P (incoming) - && !TREE_ADDRESSABLE (SSA_NAME_VAR (exp)) - && GET_MODE (incoming) != BLKmode - && (XEXP (incoming, 0) == virtual_incoming_args_rtx - || (GET_CODE (XEXP (incoming, 0)) == PLUS -&& XEXP (XEXP (incoming, 0), 0) - == virtual_incoming_args_rtx -&& CONST_INT_P (XEXP (XEXP (incoming, 0), - 1) - { - op0 = incoming; - goto adjust_mode; - } + op0 = expand_debug_parm_decl (SSA_NAME_VAR (exp)); + if (op0) + goto adjust_mode; op0 = expand_debug_expr (SSA_NAME_VAR (exp)); - if (!op0) - return NULL; - goto adjust_mode; + if (op0) + goto adjust_mode; } return NULL; } @@ -3327,36 +3355,14
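For context on the OUTGOING_REGNO/INCOMING_REGNO transformation above: on SPARC, the save instruction rotates the register window so the caller's outgoing registers %o0-%o7 become the callee's incoming %i0-%i7. A toy version of the mapping expand_debug_parm_decl applies to DECL_INCOMING_RTL when HAVE_window_save is defined, using the standard SPARC integer register numbering (assumed here: %g0-%g7 = 0-7, %o0-%o7 = 8-15, %l0-%l7 = 16-23, %i0-%i7 = 24-31):

```c
#include <assert.h>

/* Assumed SPARC integer register numbering (see lead-in).  */
#define SPARC_O0 8
#define SPARC_I0 24

/* Toy OUTGOING_REGNO: an incoming register %iN was the caller's
   outgoing %oN before the window save; other registers are unchanged.
   Illustrative only -- the real macro is target-defined.  */
static int
toy_outgoing_regno (int regno)
{
  if (regno >= SPARC_I0 && regno < SPARC_I0 + 8)
    return SPARC_O0 + (regno - SPARC_I0);
  return regno;
}
```

This is why the entry value of a parameter that lives in %iN inside the function is the corresponding %oN: that is the register it actually arrived in before the window rotated.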
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 23:34, Tobias Burnus wrote: > Janne Blomqvist wrote: >> >> * caf/mpi.c (caf_runtime_error): Remove "error" parameter. >> Return EXIT_FAILURE instead. >> > From the patch: >> >> /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ >> >> This is unnecessary, as the call to exit() will call the libgfortran >> destructor which will close all units, as explained in comment #3 in >> the PR. > > While I think it should be sufficient for single-processor usage, I am not > sure that that way all I/O buffers gets written before one calls > MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to > I/O. Ah, I read that comment from caf/single.c, but now I see it in the caf/mpi.c part of the patch as well. Yeah, in that case it might matter, depending on how the MPI library implements MPI_Abort(). That is, does it call exit() (in which case the destructor will be called) or will it call abort() or raise some other fatal signal. At least Open MPI seems to implement it by sending SIGTERM http://www.open-mpi.org/doc/v1.4/man3/MPI_Abort.3.php Come to think of it, in this case, if you want to use MPI_Abort() rather than MPI_Finalize() + exit(), you should probably reset the fatal signal handlers to the default one unless you want the backtrace to be printed. -- Janne Blomqvist
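A minimal sketch of the signal-handler reset suggested above, using only plain signal() and no MPI. The handler and function names are made up for illustration; a real implementation would run the reset just before calling MPI_Abort so the library's SIGTERM hits the default action instead of the backtrace printer:

```c
#include <assert.h>
#include <signal.h>

/* Stand-in for libgfortran's backtrace-printing handler; purely
   illustrative.  */
static void
toy_backtrace_handler (int sig)
{
  (void) sig;
}

/* Reset handlers that an MPI_Abort-induced fatal signal might hit back
   to their defaults.  Returns the previous SIGTERM handler so the
   caller can verify what was installed.  */
static void (*toy_reset_fatal_handlers (void)) (int)
{
  signal (SIGINT, SIG_DFL);
  return signal (SIGTERM, SIG_DFL);
}
```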
Re: [PATCH] New IPA-CP with real function cloning
> > Well, technically they survive until after inlining (because of > indirect inlining which also derives information from the lattices > corresponding to node->inlined_to node. Results of arithmetic > functions are not going to be accessed during inlining when compiling > any reasonable program but... Hmm, this sounds bad. We should move it to GTY then, incrementally. I however thought that indirect inlining looks only at jump functions, not at lattices? > > > > __attribute__ ((really_bad_attribute)) > > function (int param) > > { > > use param > > } > > > > and then let ipa-cp invent that param is a constant. > > what would be such a "really_bad_attribute" ? Well, we need to go through the attribute list and prepare a list of "bad guys" instead of forbidding any attribute. Obviously we should not give up on the "pure" attribute, for example. Another class of attributes is those referring to function arguments that need updating if they are still in use after cloning. I think this was the original reason for adding the check. I am not sure if we have attributes that should prevent cloning completely. > > > > > > > > can_change_signature will also handle apply_args. > > > > Add VA_START code there, too. For the other use of this flag (in i386) > > > > VA_START > > The last one already is VA_START... or do you mean a different one? I meant the code in ipa-inline-analysis.c doing the same checks but skipping va_start since the i386 backend tests it by itself. > > BTW currently the edges from thunks miss any profile info. > > (i.e. it will be 0). I guess we ought to make the ipa-profile pass estimate > those > (it is difficult to measure the count of an alias). > > I'm not really looking at the edges from thunks to the actual > function. OTOH, I assume that edges to a thunk do have a count and > look at that.
They do have (unless they are thunk to thunk edges), but in any case we ought to regularize things here, sooner or later someone will get confused with counts missing in the callgraph. > > > If you walk only hot edges, then you need to make your function descent into > > both alias refs and thunks calls, or the aliases of thunks will not be seen > > then. > > Well, looking at bits of the patch again now, aliases to thunks might > indeed be a problem for a few pieces of it. I'll send the patch > nevertheless and ponder about this problem later. Hmm, please do :) I will look at the updated patch. Honza > > Thanks, > > Martin
Re: [build] Move crtfastmath to toplevel libgcc
On Thu, Jul 14, 2011 at 12:09 PM, Rainer Orth wrote: > Andreas Schwab writes: > >> Same on ia64: >> >> Configuration mismatch! >> Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o >> Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o >> crtfastmath.o Alpha needs the same fix. I need the following patch to bootstrap the compiler: --cut here-- Index: gcc/config.gcc === --- gcc/config.gcc (revision 176282) +++ gcc/config.gcc (working copy) @@ -757,6 +757,7 @@ extra_options="${extra_options} alpha/elf.opt" target_cpu_default="MASK_GAS" tmake_file="${tmake_file} alpha/t-alpha alpha/t-ieee alpha/t-linux" + extra_parts="$extra_parts crtfastmath.o" ;; alpha*-*-freebsd*) tm_file="${tm_file} ${fbsd_tm_file} alpha/elf.h alpha/freebsd.h" --cut here-- Uros.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Janne Blomqvist wrote: * caf/mpi.c (caf_runtime_error): Remove "error" parameter. Return EXIT_FAILURE instead. > From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. While I think it should be sufficient for single-processor usage, I am not sure that that way all I/O buffers get written before one calls MPI_Finalize - nor am I sure how one would handle ERROR STOP with regards to I/O. In terms of I/O, there are three kinds of I/O, which might need to be treated differently:
* I/O to stdout (OUTPUT_UNIT): Here, all output should be collected by MPI - I am not sure whether it will come to later in a destructor.
* I/O to (local) files.
* I/O via the communication library: Here, I see the greatest problems, but that's not in F2008's coarrays, but might well be added with the Technical Report.
I somehow would feel better if I could ensure that the buffers are flushed and the files closed before I pull the MPI plug (MPI_Finalize, MPI_Abort). For reference, the comment Janne is referring to is the one at the bottom (comment 3) of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43849 Tobias
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 09:48 PM, Janne Blomqvist wrote: 2011-07-14 Daniel Carrera * caf/mpi.c (caf_runtime_error): Remove "error" parameter. Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. Ok. What term should I have used? I was thinking that the program itself returns a value, but I can see that what I wrote was not optimal. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. Ok. I didn't want to mess with comments that I didn't fully understand. -- I'm not overweight, I'm undertall.
Re: [google] Backport patch r175881 from gcc-4_6-branch to google/gcc-4_6 (issue4695051)
On Wed, Jul 13, 2011 at 22:39, Carrot Wei wrote: > Hi Diego > > The previous patch was done with svn merge. > > This new version is done with svnmerge.py. Again tested with Great, thanks. This simplifies regular merges, since svnmerge.py will know that this rev was merged already. I think svnmerge.py also reads 'svn merge' markers, but I'm not sure. > make check-g++ RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a > dg.exp=anon-ns1.C" > make check-g++ RUNTESTFLAGS="dg.exp=anon-ns1.C" > > BTW, there are some unexpected property changes after merge, I don't > know how they came out and how I should deal with them? Don't worry about those. The branch used to merge from multiple sources that had different attributes for those files. It will be gone after the next merge I'm doing next week. The backport is OK, of course. Diego.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On Thu, Jul 14, 2011 at 18:40, Daniel Carrera wrote: > And of course, here is the patch and ChangeLog. > > 2011-07-14 Daniel Carrera > * caf/mpi.c (caf_runtime_error): Remove "error" parameter. > Return EXIT_FAILURE instead. Well, this changelog entry is incorrect; you're calling exit(EXIT_FAILURE) and not returning a value. From the patch: /* FIXME: Shutdown the Fortran RTL to flush the buffer. PR 43849. */ This is unnecessary, as the call to exit() will call the libgfortran destructor which will close all units, as explained in comment #3 in the PR. -- Janne Blomqvist
Remove NetWare support
I've got a preliminary NetWare removal patch ready (yet untested), but have a couple of questions: * Given that there's a considerable amount of NetWare support still in src, toplevel support has to stay. On the other hand, the reference in config/elf.m4 is only used for the LTO plugin and can go, I believe. * The i386 port has some code that seems to be NetWare-specific, namely KEEP_AGGREGATE_RETURN_POINTER, ix86_keep_aggregate_return_pointer and the callee_pop_aggregate_return attribute. I'm uncertain if all this can go now. * There are references to config/i386/netware.h in gcc/po/*.po. Do I need to do anything about this when netware.h is removed? Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:43 AM, Bernd Schmidt wrote:
> (Although now I wonder if we could instead use one of the speculative
> load instructions? There's one that sets the NaT bit if the load would
> fault, isn't there? It's been so long I can't remember.)

We could, but we also have to insert a check load to match. It gets very
difficult to tell when it's worth it. Your example

	(p7)	br.cond.dptk .L5
		ld8 r15 = [r14]

becomes

		ld8.sa r15 = [r14]	// speculative advanced load
	(p7)	br.cond.dptk .L5
		ld8.c.clr r15 = [r14]	// checked load (clear ALAT)

Note that the speculative load can arbitrarily fail (e.g. tlb miss) and
that the checked load can also arbitrarily re-issue the load. Note that
one can't split "ld8 r14 = [r14]" without additional register
allocation, because the address needs to remain live until the check.

>> It does raise the question of whether we ought to completely
>> change the way we represent the pairing of LTOFFX/LDXMOV
>> relocations.
>
> This I can't answer since I don't know the definition of these.

These are markers for linker relaxation. If the symbol is within range,

	addl	x = @ltoffx(foo), gp
	...
	ld8.mov	y = [x], foo	// .mov adds the LDXMOV reloc vs foo to ld8 insn

will be relaxed to

	nop
	...
	addl	y = @gprel(foo), gp

The constraint in using the relocations is that every ldxmov insn must
be fed by an ltoffx reloc with the same symbol, and that an ltoffx reloc
cannot feed any other insn. That allows us, at link time, to not
consider data flow, merely assert that if foo is relaxed anywhere, it's
relaxed everywhere.

r~
gimplefe patch
Hi,

This is the patch for the gimple front end branch. I have written this
code as part of the Google Summer of Code program. The patch consists of
the following changes:

1. The grammar rules for variable declaration are modified to include
parameters. The code for this is in the gp_parse_var_decl function.
2. A new function, gp_parse_function_declaration, is added to recognize
function declarations.
3. A few changes in the function gp_parse_decl have been made to
accommodate the above code.
4. The comments have been updated to reflect the changes in the code.
5. The ChangeLog is updated.

Please find the attached patch file. I hope you find everything up to
standard.

--
Ketaki Tiwatne

file.diff
Description: Binary data
Fix a testcase
I've committed the following as obvious. This showed up as a failure
with C6X.

Bernd

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog	(revision 176278)
+++ gcc/testsuite/ChangeLog	(working copy)
@@ -1,10 +1,14 @@
+2011-07-14  Bernd Schmidt
+
+	* gcc.dg/pr48770.c: Add dg-require-effective-target fpic.
+
 2011-07-14  Richard Guenther
 
 	PR tree-optimization/49651
 	* gcc.dg/torture/pr49651.c: New testcase.
 
 2011-07-14  Georg-Johann Lay
-
+
 	PR target/43746
 	* gcc.dg/array-quals-1.c: Don't xfail on AVR.
 
Index: gcc/testsuite/gcc.dg/pr48770.c
===
--- gcc/testsuite/gcc.dg/pr48770.c	(revision 176171)
+++ gcc/testsuite/gcc.dg/pr48770.c	(working copy)
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target fpic } */
 /* { dg-options "-O -fprofile-arcs -fPIC -fno-dce -fno-forward-propagate" } */
 
 int test_goto2 (int f)
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:39, Richard Henderson wrote:
> On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
>> Yes, but not using the fixed got pointer in r1, but a random other
>> register which can have different values in the function.
>
> Oh, I think I see.
>
> So if this really had been a PLUS, as implied by the LO_SUM,
> we would have had garbage input, produced garbage output, but
> (eventually) ignored the result.
>
> But since this really is a load from memory, the garbage
> input is immediately fatal.
>
> Have I got that right?

This is correct.

> If so, the patch with the use of gen_const_mem is ok.

Will commit.

(Although now I wonder if we could instead use one of the speculative
load instructions? There's one that sets the NaT bit if the load would
fault, isn't there? It's been so long I can't remember.)

> It does raise the question of whether we ought to completely
> change the way we represent the pairing of LTOFFX/LDXMOV
> relocations.

This I can't answer since I don't know the definition of these.

Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:19 AM, Bernd Schmidt wrote:
> Yes, but not using the fixed got pointer in r1, but a random other
> register which can have different values in the function.

Oh, I think I see.

So if this really had been a PLUS, as implied by the LO_SUM, we would
have had garbage input, produced garbage output, but (eventually)
ignored the result.

But since this really is a load from memory, the garbage input is
immediately fatal.

Have I got that right?

If so, the patch with the use of gen_const_mem is ok.

It does raise the question of whether we ought to completely change the
way we represent the pairing of LTOFFX/LDXMOV relocations.

r~
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:29, Richard Henderson wrote:
> On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
>> Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
>> which doesn't exist in that tree) the load stays behind the branch where
>> it should be.
>
> RTX_UNCHANGING_P was the bit back then, I believe.

Still ok with that also set:

	addl r14 = @ltoff(ap_standalone#), r1
	;;
.mii
	ld8 r15 = [r14]
	addl r14 = @ltoff(.LC2), r1
	;;
(p7)	addl r14 = 1, r0
	;;
.mib
(p7)	st4 [r15] = r14
	nop.i 0
(p7)	br.cond.dptk .L5
.mib
	ld8 r14 = [r14]

And not ok if the MEM isn't exposed in RTL:

	addl r14 = @ltoff(ap_standalone#), r1
	;;
.mii
	ld8 r15 = [r14]
	addl r14 = @ltoff(.LC2), r1
	;;
(p7)	addl r14 = 1, r0
	;;
.mii
(p7)	st4 [r15] = r14
	nop.i 0
	nop.i 0
.mbb
	ld8 r14 = [r14]
(p7)	br.cond.dptk .L5

I'm attaching the 3.3 patch.

Bernd

Index: ../../branches/gcc-3_3-branch/gcc/sched-ebb.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-ebb.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-ebb.c	(working copy)
@@ -52,8 +52,7 @@ static int schedule_more_p PARAMS ((void
 static const char *ebb_print_insn PARAMS ((rtx, int));
 static int rank PARAMS ((rtx, rtx));
 static int contributes_to_priority PARAMS ((rtx, rtx));
-static void compute_jump_reg_dependencies PARAMS ((rtx, regset, regset,
-						   regset));
+static void compute_jump_reg_dependencies PARAMS ((rtx, regset));
 static void schedule_ebb PARAMS ((rtx, rtx));
 
 /* Return nonzero if there are more insns that should be scheduled.  */
@@ -161,30 +160,22 @@ contributes_to_priority (next, insn)
   return 1;
 }
 
-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that must be considered
+   to be set by this jump in SET.  */
 
 static void
-compute_jump_reg_dependencies (insn, cond_set, used, set)
+compute_jump_reg_dependencies (insn, set)
      rtx insn;
-     regset cond_set, used, set;
+     regset set;
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   for (e = b->succ; e; e = e->succ_next)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized.  Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath.  So we
-	 consider that it restores the value of the main codepath.  */
-      bitmap_operation (set, e->dest->global_live_at_start, cond_set,
-			BITMAP_AND);
-    else
-      bitmap_operation (used, used, e->dest->global_live_at_start,
-			BITMAP_IOR);
+    if ((e->flags & EDGE_FALLTHRU) == 0)
+      {
+	bitmap_operation (set, set, e->dest->global_live_at_start,
+			  BITMAP_IOR);
+      }
 }
 
 /* Used in schedule_insns to initialize current_sched_info for scheduling
Index: ../../branches/gcc-3_3-branch/gcc/emit-rtl.c
===
--- ../../branches/gcc-3_3-branch/gcc/emit-rtl.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/emit-rtl.c	(working copy)
@@ -192,6 +192,18 @@ static tree component_ref_for_mem_expr P
 static rtx gen_const_vector_0 PARAMS ((enum machine_mode));
 static void copy_rtx_if_shared_1 PARAMS ((rtx *orig));
 
+/* Generate a memory referring to non-trapping constant memory.  */
+
+rtx
+gen_const_mem (enum machine_mode mode, rtx addr)
+{
+  rtx mem = gen_rtx_MEM (mode, addr);
+  RTX_UNCHANGING_P (mem) = 1;
+  MEM_NOTRAP_P (mem) = 1;
+  return mem;
+}
+
+
 /* Probability of the conditional branch currently proceeded by try_split.
    Set to -1 otherwise.  */
 int split_branch_probability = -1;
Index: ../../branches/gcc-3_3-branch/gcc/sched-deps.c
===
--- ../../branches/gcc-3_3-branch/gcc/sched-deps.c	(revision 175226)
+++ ../../branches/gcc-3_3-branch/gcc/sched-deps.c	(working copy)
@@ -981,17 +981,12 @@ sched_analyze_insn (deps, x, insn, loop_
   else
     {
       rtx pending, pending_mem;
-      regset_head tmp_uses, tmp_sets;
-      INIT_REG_SET (&tmp_uses);
-      INIT_REG_SET (&tmp_sets);
-
-      (*current_sched_info->compute_jump_reg_dependencies)
-	(insn, &deps->reg_conditional_sets, &tmp_uses, &tmp_sets);
-      IOR_REG_SET (reg_pending_uses, &tmp_uses);
-
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/14/2011 12:29 PM, Jakub Jelinek wrote:
> On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:
>> On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
>>> Here is a PCH friendly variant of the patch which tries to set the
>>> right producer from the start, but early in dwarf2out_finish double
>>> checks if PCH hasn't changed it and if it did, updates it back to the
>>> expected string.
>>
>> Why not just wait until then to set it?
>
> Because the same routine that sets it is used both to create the single
> compilation unit early and to create CUs late (for
> -feliminate-dwarf2-dups already during dwarf2out_finish).

Ah. OK, then.

Jason
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 09:23 AM, Bernd Schmidt wrote:
> Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
> which doesn't exist in that tree) the load stays behind the branch where
> it should be.

RTX_UNCHANGING_P was the bit back then, I believe.

r~
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Thu, Jul 14, 2011 at 12:26:41PM -0400, Jason Merrill wrote:
> On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
>> Here is a PCH friendly variant of the patch which tries to set the
>> right producer from the start, but early in dwarf2out_finish double
>> checks if PCH hasn't changed it and if it did, updates it back to the
>> expected string.
>
> Why not just wait until then to set it?

Because the same routine that sets it is used both to create the single
compilation unit early and to create CUs late (for
-feliminate-dwarf2-dups already during dwarf2out_finish).

Jakub
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On 07/14/2011 05:41 AM, Jakub Jelinek wrote:
> Here is a PCH friendly variant of the patch which tries to set the
> right producer from the start, but early in dwarf2out_finish double
> checks if PCH hasn't changed it and if it did, updates it back to the
> expected string.

Why not just wait until then to set it?

Jason
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:19, Bernd Schmidt wrote:
> On 07/14/11 18:03, Richard Henderson wrote:
>> On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
>>> +++ gcc/config/ia64/ia64.c	(working copy)
>>> @@ -1047,7 +1047,7 @@
>>>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>>>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>>>
>>> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
>>> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);
>>
>> And the bug stems from ... what?
>>
>> Is this bug still fixed if you change this to gen_const_mem?
>
> It should be. Testing this is a bit tricky, since the original bug is
> in gcc-3.3 which doesn't have gen_const_mem, and current mainline with
> just the scheduler patch removed doesn't reproduce it with the
> testcase.

Ok, with gen_const_mem hacked into gcc-3.3 (minus setting MEM_READONLY_P
which doesn't exist in that tree) the load stays behind the branch where
it should be.

Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 18:03, Richard Henderson wrote:
> On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
>> +++ gcc/config/ia64/ia64.c	(working copy)
>> @@ -1047,7 +1047,7 @@
>>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>>
>> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
>> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);
>
> And the bug stems from ... what?
>
> Is this bug still fixed if you change this to gen_const_mem?

It should be. Testing this is a bit tricky, since the original bug is in
gcc-3.3 which doesn't have gen_const_mem, and current mainline with just
the scheduler patch removed doesn't reproduce it with the testcase.

In mainline, gen_const_mem sets MEM_NOTRAP_P, but it looks like we
handle that correctly in may_trap_p:

  if (/* MEM_NOTRAP_P only relates to the actual position of the memory
	 reference; moving it out of context such as when moving code
	 when optimizing, might cause its address to become invalid.  */
      code_changed || !MEM_NOTRAP_P (x))

and sched_deps uses rtx_addr_can_trap anyway.

> This is a load from the .got.

Yes, but not using the fixed got pointer in r1, but a random other
register which can have different values in the function.

> It's difficult to tell if your raw gen_rtx_MEM with no aliasing
> info doesn't just paper over a problem by preventing it from
> being moved.

The problem isn't about memory aliasing, it's about the pointer register
being clobbered.

Bernd
Re: Fix PR48542: reload register contents reuse crossing setjmp
Jeff Law wrote:
> On 06/15/11 21:46, Hans-Peter Nilsson wrote:
>> 	PR rtl-optimization/48542
>> 	* reload.c (find_equiv_reg): Stop looking when finding a
>> 	setjmp-type call.
>> 	* reload1.c (reload_as_needed): Invalidate all reload
>> 	registers when crossing a setjmp-type call.
> OK.
> Jeff

I see that this already went in, but I'm wondering why this change
should be necessary. As far as register use is concerned, setjmp ought
to behave just like a regular function: if a register is call-clobbered,
reload will not attempt to use it across a function call (*any* function
call) anyway; but if the register is call-saved, setjmp really ought to
restore the old value, *both* times it returns, and so reuse ought to be
allowed ...

Bye,
Ulrich

--
Dr. Ulrich Weigand
GNU Toolchain for Linux on System z and Cell BE
ulrich.weig...@de.ibm.com
Re: [PATCH] New IPA-CP with real function cloning
> > >   if (dec < cs->count)
> > >     cs->count -= dec;
> > >   else
> > >     cs->count = 0;
> > > }
> > >
> > > if (dump_file)
> > >   dump_profile_updates (orig_node, new_node);
> > > }
> > >
> > > if (node->local.can_change_signature)
> > >   {
> > >     args_to_skip = BITMAP_GGC_ALLOC ();
> > >     for (i = 0; i < count; i++)
> > >       {
> > >         tree t = VEC_index (tree, known_vals, i);
> > >
> > >         if ((t && TREE_CODE (t) != TREE_BINFO)
> > >             || !ipa_is_param_used (info, i))
> > >           bitmap_set_bit (args_to_skip, i);
> > >       }
> > >   }
> > > else
> > >   args_to_skip = NULL;
> >
> > When we can't change the signature, we can still set the
> > is_parm_unused flag for the callee to aid later optimizers.
>
> I assume I can re-use the node->local.can_change_signature flag? Is
> that supposed to be set at any given place or can IPA-CP do it on its
> own?

can_change_signature is currently used by the i386 backend and it is set
by the inliner. I plan to move it to the visibility pass at the time
local functions are identified. So yes, you can assume it is set and up
to date at the time IPA-CP is run.

Honza

> > Rest of patch looks OK. It definitely reads better than the previous
> > ipa-cp.c ;)
> > I suppose we will need to get some experience with the logic deciding
> > whether to clone..
>
> Thanks, I'll post the current version in a moment.
>
> Martin
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Thu, Jul 14, 2011 at 11:40 AM, Richard Guenther wrote:
> Generating RTL from GIMPLE passes just to be able to use rtx_cost is,
> well ... gross.

Indeed. And it is one of the major blockers for fully separating the
RTL, GIMPLE and target code off into separate modules. It would be great
to get rid of RTL from IVOPTS too...

Ciao!
Steven
Re: Correct fix for scheduler bug PR11320
On 07/14/2011 03:03 AM, Bernd Schmidt wrote:
> +++ gcc/config/ia64/ia64.c	(working copy)
> @@ -1047,7 +1047,7 @@
>    tmp = gen_rtx_PLUS (Pmode, tmp, pic_offset_table_rtx);
>    emit_insn (gen_rtx_SET (VOIDmode, dest, tmp));
>
> -  tmp = gen_rtx_LO_SUM (Pmode, dest, src);
> +  tmp = gen_rtx_LO_SUM (Pmode, gen_rtx_MEM (Pmode, dest), src);

And the bug stems from ... what?

Is this bug still fixed if you change this to gen_const_mem?

This is a load from the .got. It's constant memory, and it's always
present. There's nowhere in the function that we cannot move this load
(assuming the address is computed properly) where this load will fail.

It's difficult to tell if your raw gen_rtx_MEM with no aliasing info
doesn't just paper over a problem by preventing it from being moved.

r~
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> You need the target hook that tells how wide the reassociation should
> be, based on the type. Machines have different numbers of functional
> units, for example, maybe 3 integer units and 2 floating point units.
> For example, on the PowerPC, I would set the basic integer and binary
> floating point types to 2 to match the dispatch rules, but for decimal
> floating point, I would set it to one, since the machines currently
> only have 1 decimal unit.
>
> With the hook, I could see eliminating the switch and/or --param
> altogether, and doing it only in the hook.

The hook introduced by this patch meets these requirements. But I think
it is useful to have an option to override the hook, because you cannot
tune it perfectly for each case.

Ilya
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:32:59AM +0200, Richard Guenther wrote:
> On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther wrote:
>> 2011/7/14 Michael Meissner:
>>> One minor note, you will need to update doc/invoke.texi to document
>>> the new switch before checkin: -ftree-reassoc-width=
>>
>> You also need approval and somebody to review the patch before
>> checkin.
>
> Btw, rather than a new target hook and an option I suggest to use
> a --param which default you can modify in the backend.

You need the target hook that tells how wide the reassociation should
be, based on the type. Machines have different numbers of functional
units, for example, maybe 3 integer units and 2 floating point units.
For example, on the PowerPC, I would set the basic integer and binary
floating point types to 2 to match the dispatch rules, but for decimal
floating point, I would set it to one, since the machines currently only
have 1 decimal unit.

With the hook, I could see eliminating the switch and/or --param
altogether, and doing it only in the hook.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com  fax +1 (978) 399-6899
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
And of course, here is the patch and ChangeLog.

2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function.  Use "exit(EXIT_FAILURE)".
	(_gfortran_caf_register): Use caf_runtime_error.
	(_gfortran_caf_sync_images): Use "exit(EXIT_FAILURE)".
	* caf/mpi.c (caf_runtime_error): Remove "error" parameter.
	Use EXIT_FAILURE instead.
	(_gfortran_caf_register): Update call to caf_runtime_error.
	(_gfortran_caf_sync_all): Ditto.
	(_gfortran_caf_sync_images): Ditto.
	(_gfortran_caf_error_stop_str): Use "exit(EXIT_FAILURE)".

Now I'm just waiting for SVN update before I commit...

On 07/14/2011 05:34 PM, Daniel Carrera wrote:
> Hi Tobias,
>
> As per your suggestion, I'm copying to gfortran and gcc-patches. I've
> made the change you asked for and I'm going to commit the patch.
>
> Cheers,
> Daniel.
>
> On 07/14/2011 05:31 PM, Tobias Burnus wrote:
>> On 07/14/2011 05:05 PM, Daniel Carrera wrote:
>>> +caf_runtime_error (const char *message, ...)
>>> +{
>>> +  va_list ap;
>>> +  fprintf (stderr, "Fortran runtime error.");
>>
>> Could you replace "." by ": " (colon, space)? I think "Fortran runtime
>> error: Could not ..." looks better than "Fortran runtime error.Could
>> not ...".
>>
>> Otherwise, the patch is OK. Could you then send the patch to fortran@
>> and gcc-patches@?
>>
>> Tobias
>>
>> PS: You could directly commit the modified patch, simply reply to this
>> email, add a CC to fortran@/gcc-patches@ and attach the patch and
>> changelog. There is no need that I approve the patch again.

--
I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>	/* For fputs and fprintf.  */
 #include <stdlib.h>	/* For exit and malloc.  */
 #include <string.h>	/* For memcpy and memset.  */
+#include <stdarg.h>	/* For variadic arguments.  */
 
 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1 */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error: ");
+  va_start (ap, message);
+  vfprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (EXIT_FAILURE);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
 		    char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,
   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
 	{
 	  *stat = 1;
 	  if (errmsg_len > 0)
 	    {
-	      const char msg[] = "Failed to allocate coarray";
 	      int len = ((int) sizeof (msg) > errmsg_len) ? errmsg_len
 							  : (int) sizeof (msg);
 	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
 	  return NULL;
 	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (msg);
     }
 
   if (stat)
@@ -140,7 +153,7 @@ _gfortran_caf_sync_images (int count __a
 	{
 	  fprintf (stderr, "COARRAY ERROR: Invalid image index %d to SYNC "
 		   "IMAGES", images[i]);
-	  exit (1);
+	  exit (EXIT_FAILURE);
 	}
 #endif
 
Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,8 +47,9 @@ static int caf_is_finalized;
 
 caf_static_t *caf_static_list = NULL;
 
+/* Keep in sync with single.c.  */
 static void
-caf_runtime_error (int error, const char *message, ...)
+caf_runtime_error (const char *message, ...)
 {
   va_list ap;
   fprintf (stderr, "Fortran runtime error on image %d: ", caf_this_image);
@@ -59,10 +60,10 @@
   /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
   /* FIXME: Do some more effort than just MPI_ABORT.  */
-  MPI_Abort (MPI_COMM_WORLD, error);
+  MPI_Abort (MPI_COMM_WORLD, EXIT_FAILURE);
 
   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (EXIT_FAILURE);
 }
 
@@ -179,7 +180,7 @@ error:
 	}
     }
   else
-    caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : 1, msg);
+    caf_runtime_error (msg);
 
   return NULL;
@@ -223,7 +224,7 @@ _gfortran_caf_sync_all (int *stat, char
 	  memset (&errmsg[len], ' ', errmsg_len-len);
 	}
       else
-	caf_runtime_error (caf_is_finalized ? STAT_STOPPED_IMAGE : ierr, msg);
+	caf_runtime_error (msg);
     }
 }
 
@@ -291,7 +292,7 @@ _gfortran_caf_sync_images (int count, in
 	  memset (&errmsg_
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
Hi Tobias,

As per your suggestion, I'm copying to gfortran and gcc-patches. I've
made the change you asked for and I'm going to commit the patch.

Cheers,
Daniel.

On 07/14/2011 05:31 PM, Tobias Burnus wrote:
> On 07/14/2011 05:05 PM, Daniel Carrera wrote:
>> +caf_runtime_error (const char *message, ...)
>> +{
>> +  va_list ap;
>> +  fprintf (stderr, "Fortran runtime error.");
>
> Could you replace "." by ": " (colon, space)? I think "Fortran runtime
> error: Could not ..." looks better than "Fortran runtime error.Could
> not ...".
>
> Otherwise, the patch is OK. Could you then send the patch to fortran@
> and gcc-patches@?
>
> Tobias
>
> PS: You could directly commit the modified patch, simply reply to this
> email, add a CC to fortran@/gcc-patches@ and attach the patch and
> changelog. There is no need that I approve the patch again.

--
I'm not overweight, I'm undertall.
Re: [PATCH] New IPA-CP with real function cloning
Hi, like with the previous mail, I'll reply to the comments here and then post a new version of the patch in a separate thread. On Sun, Jul 10, 2011 at 07:04:21PM +0200, Jan Hubicka wrote: > > > > /* If checking is enabled, verify that no lattice is in the TOP state, i.e. > > not > >bottom, not containing a variable component and without any known value > > at > >the same time. */ > > > > static void > > verify_propagated_values (void) > > { > > #ifdef ENABLE_CHECKING > > Hmm, would not be better to keep this function around to be called from > debugger, like > other verifiers do? OK. > > struct cgraph_node *node; > > > > FOR_EACH_DEFINED_FUNCTION (node) > > { > > struct ipa_node_params *info = IPA_NODE_REF (node); > > int i, count = ipa_get_param_count (info); > > > > for (i = 0; i < count; i++) > > { > > struct ipcp_lattice *lat = ipa_get_lattice (info, i); > > > > if (!lat->bottom > > && !lat->contains_variable > > && lat->values_count == 0) > > { > > if (dump_file) > > { > > fprintf (dump_file, "\nIPA lattices after constant " > >"propagation:\n"); > > print_all_lattices (dump_file, true, false); > > } > > > > gcc_unreachable (); > > } > > } > > } > > #endif > > } > > > > /* Propagate values through a pass-through jump function JFUNC associated > > with > >edge CS, taking values from SRC_LAT and putting them into DEST_LAT. > > SRC_IDX > >is the index of the source parameter. 
*/ > > > > static bool > > propagate_vals_accross_pass_through (struct cgraph_edge *cs, > > struct ipa_jump_func *jfunc, > > struct ipcp_lattice *src_lat, > > struct ipcp_lattice *dest_lat, > > int src_idx) > > { > > struct ipcp_value *src_val; > > bool ret = false; > > > > if (jfunc->value.pass_through.operation == NOP_EXPR) > > for (src_val = src_lat->values; src_val; src_val = src_val->next) > > ret |= add_value_to_lattice (dest_lat, src_val->value, cs, > >src_val, src_idx); > > else if (edge_within_scc (cs)) > > ret = set_lattice_contains_variable (dest_lat); > > Hmm, is the reason for not using artimetic within SCC documented somewhere? > It seems like code someone will eventually run into and remove without much > of consideration. I added a comment. > > > > /* Calculate devirtualization time bonus for NODE, assuming we know > > KNOWN_CSTS > >and KNOWN_BINFOS. */ > > > > static int > > devirtualization_time_bonus (struct cgraph_node *node, > > VEC (tree, heap) *known_csts, > > VEC (tree, heap) *known_binfos) > > Eventually it would be nice to make this common with inliner analysis. We > want to > increase inlining limits here too True. > > /* Only bare minimum benefit for clearly un-inlineable targets. */ > > res += 1; > > callee = cgraph_get_node (target); > > if (!callee) > > continue; > > Hmm, when you test it here, you might probably ask if callee is analyzed and > inlinable then ;) Good point, did that. > > > > /* Return true if cloning NODE is a good idea, given the estimated > > TIME_BENEFIT > >and SIZE_COST and with the sum of frequencies of incoming edges to the > >potential new clone in FREQUENCIES. 
*/
> >
> > static bool
> > good_cloning_opportunity_p (struct cgraph_node *node, int time_benefit,
> > 			    int freq_sum, gcov_type count_sum, int size_cost)
> > {
> >   if (time_benefit == 0
> >       || !flag_ipa_cp_clone
> >       || !optimize_function_for_speed_p (DECL_STRUCT_FUNCTION (node->decl)))
> >     return false;
>
> Still I think a cloning opportunity is good if the savings on call size
> are enough to pay back for the function body duplication.
> It would probably make sense then to create simple wrapper functions
> instead of duplicating the function body.

I was already doing that for considering IPA-CP opportunities for all
known contexts - and the effect is more profound still now when we
consider moving costs. On the other side, when opportunities for
function specialization are concerned, size is always considered a cost
and never a benefit when doing the calculations. This simplifies the
code a bit, because the size savings are on the side of the caller and
so depend on the number of call sites that actually get redirected.
That is not known when estimating size, but only after we have made
specialization decisions for all the callers, and so it would need to be
an extra thing stored along with the values. I think that since such
calls should be picked up by inlining, it's not really worth the added
data and complexity. But I could add it later, sure.

> > gcc_checking
[testsuite]: Some test case skips.
This patchlet fixes two test cases:

* gcc.dg/pr32912-2.c: TImode is overkill for AVR.
* gcc.dg/pr44674.c: Test if -fprofile-generate is available.

Ok to commit?

Johann

testsuite/
	* gcc.dg/pr32912-2.c: Skip for AVR.
	* gcc.dg/pr44674.c: Add dg-require-profiling.

Index: gcc.dg/pr32912-2.c
===
--- gcc.dg/pr32912-2.c	(revision 176136)
+++ gcc.dg/pr32912-2.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -w" } */
+/* { dg-skip-if "TImode not supported" { "avr-*-*" } { "*" } { "" } } */
 
 extern void abort (void);
 
Index: gcc.dg/pr44674.c
===
--- gcc.dg/pr44674.c	(revision 176136)
+++ gcc.dg/pr44674.c	(working copy)
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-O -fprofile-generate" } */
+/* { dg-require-profiling "-fprofile-generate" } */
 
 void jumpfunc (void *p)
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> 2011/7/14 Richard Guenther:
>>
>> But then how come the option to override it is useful? It isn't
>> dependent on the particular case. At least the option should be a
>> --param.
>>
>> Richard.
>
> The option is still useful if you want to try the feature on a platform
> with no hook implemented, and for other performance experiments. I
> agree --param usage should be better here. I'll fix it.
>
> Ilya

Here is a patch with a new param instead of a new option. Bootstrapped
and checked on x86_64-linux.

Ilya

--

gcc/

2011-07-14  Enkovich Ilya

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (tree-reassoc-width): New param documented.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of the new hook
	for the i386 target.
	* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of
	rewrite_expr_tree if needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
	--param tree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

PR44382.diff
Description: Binary data
Re: [PATCH (1/7)] New optab framework for widening multiplies
Ping. This is the last unreviewed patch in this series ...

Thanks

Andrew

On 09/07/11 15:43, Andrew Stubbs wrote:

On 23/06/11 15:37, Andrew Stubbs wrote:

This patch should have no effect on the compiler output. It merely
replaces one way to represent widening operations with another, and
refactors the other parts of the compiler to match. The rest of the
patch set uses this new framework to implement the optimization
improvements.

I considered and discarded many approaches to this patch before arriving
at this solution, and I feel sure that there'll be somebody out there
who will think I chose the wrong one, so let me first explain how I got
here.

The aim is to be able to encode and query optabs that have any given
input mode, and any given output mode. This is similar to the
convert_optab, but not compatible with that optab since it is handled
completely differently in the code. (Just to be clear, the existing
widening multiply support only covers instructions that widen by *one*
mode, so it's only ever been necessary to know the output mode, up to
now.)

Option 1 was to add a second dimension to the handlers table in optab_d,
but I discarded this option because it would increase the memory usage
by the square of the number of modes, which is a bit much.

Option 2 was to add a whole new optab, similar to optab_d, but with a
second dimension like convert_optab_d; however, this turned out to cause
way too many pointer type mismatches in the code, and would have been
very difficult to fix up.

Option 3 was to add new optab entries for widening by two modes, by
three modes, and so on. True, I would only need to add one extra set for
what I need, but there would be so many places in the code that compare
against smul_widen_optab, for example, that would need to be taught
about these, that it seemed like a bad idea.
Option 4 was to have a separate table that contained the widening operations, and refer to that whenever a widening entry in the main optab is referenced, but I found that there was no easy way to do the mapping without putting some sort of switch table in widening_optab_handler, and that negates the other advantages. So, what I've done in the end is add a new pointer entry "widening" into optab_d, and dynamically build the widening operations table for each optab that needs it. I've then added new accessor functions that take both input and output modes, and altered the code to use them where appropriate. The down-side of this approach is that the optab entries for widening operations now have two "handlers" tables, one of which is redundant. That said, those cases are in the minority, and it is the smaller table which is unused. If people find that very distasteful, it might be possible to remove the *_widen_optab entries and unify smul_optab with smul_widen_optab, and so on, and save space that way. I've not done so yet, but I expect I could if people feel strongly about it. As a side-effect, it's now possible for any optab to be "widening", should some target happen to have a widening add, shift, or whatever. Is this patch OK? This update has been rebaselined to fix some conflicts with other recent commits in this area. I also identified a small bug which resulted in the operands to some commutative operations being reversed. I don't believe the bug did any harm, logically speaking, but I suppose there could be a testcase that resulted in worse code being generated. With this fix, I now see exactly matching output in all my testcases. Andrew
Re: [Patch, AVR]: Fix PR49487 (ICE for wrong rotate scratch)
On 07/14/2011 12:41 AM, Georg-Johann Lay wrote: >> Ok to commit and back-port to 4.6? >> >> Johann >> >> PR target/49487 >> * config/avr/avr.md (rotl3): Generate SCRATCH instead >> of REG. >> (*rotw): Use const_int_operand for operand 2. >> Use match_scratch for operand 3. >> (*rotb): Ditto >> * config/avr/avr.c (avr_rotate_bytes): Treat SCRATCH. Ok. r~
[PATCH] Fix PR49651
This fixes PR49651 where we fail to handle aggregate dereferences in call arguments in PTA properly. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied where applicable. Richard. 2011-07-14 Richard Guenther PR tree-optimization/49651 * tree-ssa-structalias.c (get_constraint_for_1): Properly handle dereferences with subvariables. * gcc.dg/torture/pr49651.c: New testcase. Index: gcc/tree-ssa-structalias.c === *** gcc/tree-ssa-structalias.c (revision 176272) --- gcc/tree-ssa-structalias.c (working copy) *** get_constraint_for_1 (tree t, VEC (ce_s, *** 3349,3357 /* If we are not taking the address then make sure to process all subvariables we might access. */ cs = *VEC_last (ce_s, *results); ! if (address_p ! || cs.type != SCALAR) return; vi = get_varinfo (cs.var); --- 3349,3366 /* If we are not taking the address then make sure to process all subvariables we might access. */ + if (address_p) + return; + cs = *VEC_last (ce_s, *results); ! if (cs.type == DEREF) ! { ! /* For dereferences this means we have to defer it !to solving time. */ ! VEC_last (ce_s, *results)->offset = UNKNOWN_OFFSET; ! return; ! } ! if (cs.type != SCALAR) return; vi = get_varinfo (cs.var); Index: gcc/testsuite/gcc.dg/torture/pr49651.c === *** gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr49651.c (revision 0) *** *** 0 --- 1,31 + /* { dg-do run } */ + + extern void abort (void); + + struct X { + int *p; + int *q; + }; + + void __attribute__((noinline, noclone)) + foo (struct X x) { *x.q = 0; } + + volatile int what; + struct X y; + + int main() + { + int i, j; + struct X x, *p; + x.p = &i; + x.q = &j; + if (what) + p = &y; + else + p = &x; + j = 1; + foo (*p); + if (j != 0) + abort (); + return 0; + }
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On Thu, Jul 14, 2011 at 4:38 PM, Andrew Stubbs wrote: > On 28/06/11 17:23, Andrew Stubbs wrote: >> >> On 23/06/11 15:43, Andrew Stubbs wrote: >>> >>> Patch 4 introduced support for using signed multiplies to code unsigned >>> multiplies in a narrower mode. Patch 5 then introduced support for >>> mis-matched input modes. >>> >>> These two combined mean that there is a case where only the smaller of two >>> inputs is unsigned, and yet it still tries to use a mode wider than the >>> larger, signed input. This is bad because it means unnecessary extends >>> and because the wider operation might not exist. >>> >>> This patch catches that case, and ensures that the smaller, unsigned >>> input is zero-extended to match the mode of the larger, signed input. >>> >>> Of course, both inputs may still have to be extended to fit the nearest >>> available instruction, so it doesn't make a difference every time. >>> >>> OK? >> >> This update fixes Janis' issue with the testsuite. > > And this version is updated to fit the changes made earlier in the series, > and also to use the precision, instead of the mode-size, in order to better > optimize bitfields. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (7/7)] Mixed-sign multiplies using narrowest mode
On 28/06/11 17:23, Andrew Stubbs wrote: On 23/06/11 15:43, Andrew Stubbs wrote: Patch 4 introduced support for using signed multiplies to code unsigned multiplies in a narrower mode. Patch 5 then introduced support for mis-matched input modes. These two combined mean that there is a case where only the smaller of two inputs is unsigned, and yet it still tries to use a mode wider than the larger, signed input. This is bad because it means unnecessary extends and because the wider operation might not exist. This patch catches that case, and ensures that the smaller, unsigned input is zero-extended to match the mode of the larger, signed input. Of course, both inputs may still have to be extended to fit the nearest available instruction, so it doesn't make a difference every time. OK? This update fixes Janis' issue with the testsuite. And this version is updated to fit the changes made earlier in the series, and also to use the precision, instead of the mode-size, in order to better optimize bitfields. OK? Andrew 2011-06-24 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_mult_to_widen): Better handle unsigned inputs of different modes. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-9.c: New file. * gcc.target/arm/wmul-bitfield-2.c: New file.
--- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-9.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, short *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-bitfield-2.c @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +struct bf +{ + int a : 3; + unsigned int b : 15; + int c : 3; +}; + +long long +foo (long long a, struct bf b, struct bf c) +{ + return a + b.b * c.c; +} + +/* { dg-final { scan-assembler "smlalbb" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2121,9 +2121,18 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) { if (op != smul_widen_optab) { - from_mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) - return false; + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. Note that type1 >= type2, always. */ + if ((TYPE_UNSIGNED (type1) + && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (TYPE_UNSIGNED (type2) + && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) + { + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) + return false; + } op = smul_widen_optab; handler = find_widening_optab_handler_and_mode (op, to_mode, @@ -2290,14 +2299,20 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, /* There's no such thing as a mixed sign madd yet, so use a wider mode. 
*/ if (from_unsigned1 != from_unsigned2) { - enum machine_mode mode = GET_MODE_WIDER_MODE (from_mode); - if (GET_MODE_PRECISION (mode) < GET_MODE_PRECISION (to_mode)) + /* We can use a signed multiply with unsigned types as long as + there is a wider mode to use, or it is the smaller of the two + types that is unsigned. Note that type1 >= type2, always. */ + if ((from_unsigned1 + && TYPE_PRECISION (type1) == GET_MODE_PRECISION (from_mode)) + || (from_unsigned2 + && TYPE_PRECISION (type2) == GET_MODE_PRECISION (from_mode))) { - from_mode = mode; - from_unsigned1 = from_unsigned2 = false; + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (from_mode) >= GET_MODE_SIZE (to_mode)) + return false; } - else - return false; + + from_unsigned1 = from_unsigned2 = false; } /* If there was a conversion between the multiply and addition
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On Thu, Jul 14, 2011 at 4:34 PM, Andrew Stubbs wrote: > On 07/07/11 11:13, Richard Guenther wrote: >>> >>> This updates the context changed by my update to patch 3. >>> > >>> > The content of this patch has not changed. >> >> Ok. > > I know this patch was already approved, but I discovered a bug in this patch > that missed optimizing the case where the input to multiply did not come > from an assign statement (this can happen when the value comes from a > function parameter). > > This patch fixes that case, and updates the context changed by my updates > earlier in the patch series. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH][1/N][C][C++][Fortran][Java] Change POINTER_PLUS_EXPR offset type requirements
On 07/14/2011 04:28 AM, Richard Guenther wrote: > On Tue, 12 Jul 2011, Richard Guenther wrote: > >> >> This patch is step 1, it abstracts away the type of the offset operand >> when building a POINTER_PLUS_EXPR. It does so by introducing >> fold_build_pointer_plus_expr{_hwi,_loc} helpers and use them in all >> places that currently build POINTER_PLUS_EXPRs (including those >> that do not fold the result). This patch will make steps 2 and more >> purely middle-end (fingers crossed). >> >> Bootstrapped and tested on x86_64-unknown-linux-gnu for all languages. >> >> I also built cc1 for all supported targets via contrib/config-list.mk >> and saw no fails that relate to this patch. >> >> Are the frontend parts ok? > > Ping. Joseph seems to be on vacation - Richard, can you approve > the C frontend/family parts? Yeah, this patch is ok. r~
Re: [PATCH (6/7)] More widening multiply-and-accumulate pattern matching
On 07/07/11 11:13, Richard Guenther wrote: This updates the context changed by my update to patch 3. > > The content of this patch has not changed. Ok. I know this patch was already approved, but I discovered a bug in this patch that missed optimizing the case where the input to multiply did not come from an assign statement (this can happen when the value comes from a function parameter). This patch fixes that case, and updates the context changed by my updates earlier in the patch series. OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (is_widening_mult_rhs_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'rhs'. Don't reject non-conversion statements. Do return lhs in this case. (is_widening_mult_p): Add new argument 'type'. Use 'type' from caller, not inferred from 'stmt'. Pass type to is_widening_mult_rhs_p. (convert_mult_to_widen): Pass type to is_widening_mult_p. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-8.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-8.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, int *b, int *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -1965,7 +1965,8 @@ struct gimple_opt_pass pass_optimize_bswap = } }; -/* Return true if RHS is a suitable operand for a widening multiplication. +/* Return true if RHS is a suitable operand for a widening multiplication, + assuming a target type of TYPE. There are two cases: - RHS makes some value at least twice as wide. Store that value @@ -1975,32 +1976,43 @@ struct gimple_opt_pass pass_optimize_bswap = but leave *TYPE_OUT untouched. 
*/ static bool -is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) +is_widening_mult_rhs_p (tree type, tree rhs, tree *type_out, + tree *new_rhs_out) { gimple stmt; - tree type, type1, rhs1; + tree type1, rhs1; enum tree_code rhs_code; if (TREE_CODE (rhs) == SSA_NAME) { - type = TREE_TYPE (rhs); stmt = SSA_NAME_DEF_STMT (rhs); if (!is_gimple_assign (stmt)) - return false; - - rhs_code = gimple_assign_rhs_code (stmt); - if (TREE_CODE (type) == INTEGER_TYPE - ? !CONVERT_EXPR_CODE_P (rhs_code) - : rhs_code != FIXED_CONVERT_EXPR) - return false; + { + rhs1 = NULL; + type1 = TREE_TYPE (rhs); + } + else + { + rhs1 = gimple_assign_rhs1 (stmt); + type1 = TREE_TYPE (rhs1); + } - rhs1 = gimple_assign_rhs1 (stmt); - type1 = TREE_TYPE (rhs1); if (TREE_CODE (type1) != TREE_CODE (type) || TYPE_PRECISION (type1) * 2 > TYPE_PRECISION (type)) return false; - *new_rhs_out = rhs1; + if (rhs1) + { + rhs_code = gimple_assign_rhs_code (stmt); + if (TREE_CODE (type) == INTEGER_TYPE + ? !CONVERT_EXPR_CODE_P (rhs_code) + : rhs_code != FIXED_CONVERT_EXPR) + *new_rhs_out = rhs; + else + *new_rhs_out = rhs1; + } + else + *new_rhs_out = rhs; *type_out = type1; return true; } @@ -2015,28 +2027,27 @@ is_widening_mult_rhs_p (tree rhs, tree *type_out, tree *new_rhs_out) return false; } -/* Return true if STMT performs a widening multiplication. If so, - store the unwidened types of the operands in *TYPE1_OUT and *TYPE2_OUT - respectively. Also fill *RHS1_OUT and *RHS2_OUT such that converting - those operands to types *TYPE1_OUT and *TYPE2_OUT would give the - operands of the multiplication. */ +/* Return true if STMT performs a widening multiplication, assuming the + output type is TYPE. If so, store the unwidened types of the operands + in *TYPE1_OUT and *TYPE2_OUT respectively. Also fill *RHS1_OUT and + *RHS2_OUT such that converting those operands to types *TYPE1_OUT + and *TYPE2_OUT would give the operands of the multiplication. 
*/ static bool -is_widening_mult_p (gimple stmt, +is_widening_mult_p (tree type, gimple stmt, tree *type1_out, tree *rhs1_out, tree *type2_out, tree *rhs2_out) { - tree type; - - type = TREE_TYPE (gimple_assign_lhs (stmt)); if (TREE_CODE (type) != INTEGER_TYPE && TREE_CODE (type) != FIXED_POINT_TYPE) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs1 (stmt), type1_out, rhs1_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs1 (stmt), type1_out, + rhs1_out)) return false; - if (!is_widening_mult_rhs_p (gimple_assign_rhs2 (stmt), type2_out, rhs2_out)) + if (!is_widening_mult_rhs_p (type, gimple_assign_rhs2 (stmt), type2_out, + rhs2_out)) return false; if (*type1_out == NULL) @@ -2088,7 +2099,7 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) if (TREE_CODE (type) != INTEGER_TYPE) return false; - if (!is_widening_mult_p (stmt, &type1, &rhs1, &type2, &rhs2)) + if (!is_widening_m
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
On Thu, Jul 14, 2011 at 4:28 PM, Andrew Stubbs wrote: > I've updated this patch following the changes earlier in the patch series. > There isn't much left. > > This should obviate all the review comments. :) Indeed ;) > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (5/7)] Widening multiplies for mis-matched mode inputs
I've updated this patch following the changes earlier in the patch series. There isn't much left. This should obviate all the review comments. :) OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (is_widening_mult_p): Remove FIXME. Ensure that the larger type is the first operand. gcc/testsuite/ * gcc.target/arm/wmul-7.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-7.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +unsigned long long +foo (unsigned long long a, unsigned char *b, unsigned short *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2053,9 +2053,17 @@ is_widening_mult_p (gimple stmt, *type2_out = *type1_out; } - /* FIXME: remove this restriction. */ - if (TYPE_PRECISION (*type1_out) != TYPE_PRECISION (*type2_out)) -return false; + /* Ensure that the larger of the two operands comes first. */ + if (TYPE_PRECISION (*type1_out) < TYPE_PRECISION (*type2_out)) +{ + tree tmp; + tmp = *type1_out; + *type1_out = *type2_out; + *type2_out = tmp; + tmp = *rhs1_out; + *rhs1_out = *rhs2_out; + *rhs2_out = tmp; +} return true; }
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On Thu, Jul 14, 2011 at 4:23 PM, Andrew Stubbs wrote: > On 12/07/11 15:07, Andrew Stubbs wrote: >> >> This update does the same thing as before, but updated for the changes >> earlier in the patch series. In particular, the build_and_insert_cast >> function and find_widening_optab_handler_and_mode changes have been >> moved up to patch 2. > > And this update changes the way the casts are handled, partly because it got > unwieldy towards the end of the patch series, and partly because I found a > few bugs. > > I've also ensured that it checks the precision of the types, rather than the > mode size to ensure that it is bitfield safe. > > OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (4/7)] Unsigned multiplies using wider signed multiplies
On 12/07/11 15:07, Andrew Stubbs wrote: This update does the same thing as before, but updated for the changes earlier in the patch series. In particular, the build_and_insert_cast function and find_widening_optab_handler_and_mode changes have been moved up to patch 2. And this update changes the way the casts are handled, partly because it got unwieldy towards the end of the patch series, and partly because I found a few bugs. I've also ensured that it checks the precision of the types, rather than the mode size to ensure that it is bitfield safe. OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_mult_to_widen): Convert unsupported unsigned multiplies to signed. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-6.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-6.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, unsigned char *b, signed char *c) +{ + return a + (long long)*b * (long long)*c; +} + +/* { dg-final { scan-assembler "smlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2067,12 +2067,13 @@ is_widening_mult_p (gimple stmt, static bool convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) { - tree lhs, rhs1, rhs2, type, type1, type2, tmp; + tree lhs, rhs1, rhs2, type, type1, type2, tmp = NULL; enum insn_code handler; enum machine_mode to_mode, from_mode, actual_mode; optab op; int actual_precision; location_t loc = gimple_location (stmt); + bool from_unsigned1, from_unsigned2; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2084,10 +2085,12 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) to_mode = TYPE_MODE (type); from_mode = TYPE_MODE (type1); + from_unsigned1 = TYPE_UNSIGNED (type1); + from_unsigned2 = TYPE_UNSIGNED (type2); - if (TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2)) + if (from_unsigned1 && from_unsigned2) op = umul_widen_optab; - else if 
(!TYPE_UNSIGNED (type1) && !TYPE_UNSIGNED (type2)) + else if (!from_unsigned1 && !from_unsigned2) op = smul_widen_optab; else op = usmul_widen_optab; @@ -2096,22 +2099,45 @@ convert_mult_to_widen (gimple stmt, gimple_stmt_iterator *gsi) 0, &actual_mode); if (handler == CODE_FOR_nothing) -return false; +{ + if (op != smul_widen_optab) + { + from_mode = GET_MODE_WIDER_MODE (from_mode); + if (GET_MODE_SIZE (to_mode) <= GET_MODE_SIZE (from_mode)) + return false; + + op = smul_widen_optab; + handler = find_widening_optab_handler_and_mode (op, to_mode, + from_mode, 0, + &actual_mode); + + if (handler == CODE_FOR_nothing) + return false; + + from_unsigned1 = from_unsigned2 = false; + } + else + return false; +} /* Ensure that the inputs to the handler are in the correct precison for the opcode. This will be the full mode size. */ actual_precision = GET_MODE_PRECISION (actual_mode); - if (actual_precision != TYPE_PRECISION (type1)) + if (actual_precision != TYPE_PRECISION (type1) + || from_unsigned1 != TYPE_UNSIGNED (type1)) { tmp = create_tmp_var (build_nonstandard_integer_type -(actual_precision, TYPE_UNSIGNED (type1)), +(actual_precision, from_unsigned1), NULL); rhs1 = build_and_insert_cast (gsi, loc, tmp, rhs1); - +} + if (actual_precision != TYPE_PRECISION (type2) + || from_unsigned2 != TYPE_UNSIGNED (type2)) +{ /* Reuse the same type info, if possible. 
*/ - if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) + if (!tmp || from_unsigned1 != from_unsigned2) tmp = create_tmp_var (build_nonstandard_integer_type -(actual_precision, TYPE_UNSIGNED (type2)), +(actual_precision, from_unsigned2), NULL); rhs2 = build_and_insert_cast (gsi, loc, tmp, rhs2); } @@ -2136,7 +2162,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; - tree type, type1, type2, tmp; + tree type, type1, type2, optype, tmp = NULL; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; optab this_optab; @@ -2145,6 +2171,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum machine_mode to_mode, from_mode, actual_mode; location_t loc = gimple_location (stmt); int actual_precision; + bool from_unsigned1, from_unsigned2; lhs = gimple_assign_lhs (stmt); type = TREE_TYPE (lhs); @@ -2238,9 +2265,21 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, to_mode = TYPE_MODE (type); from_mode = TYPE_MODE (type1); + from_unsigned1 = TYPE_UNSIGNED (type1); + from_unsigned2 = TYPE_UNSIGNED (type2); - if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) -ret
Re: [PATCH (3/7)] Widening multiply-and-accumulate pattern matching
This update changes only the context modified by changes to patch 2. The patch has already been approved. I'm just posting it for completeness. Andrew 2011-07-14 Andrew Stubbs gcc/ * tree-ssa-math-opts.c (convert_plusminus_to_widen): Permit a single conversion statement separating multiply-and-accumulate. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; +return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2135,6 +2135,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, enum tree_code code) { gimple rhs1_stmt = NULL, rhs2_stmt = NULL; + gimple conv1_stmt = NULL, conv2_stmt = NULL, conv_stmt; tree type, type1, type2, tmp; tree lhs, rhs1, rhs2, mult_rhs1, mult_rhs2, add_rhs; enum tree_code rhs1_code = ERROR_MARK, rhs2_code = ERROR_MARK; @@ -2177,6 +2178,38 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, else return false; + /* Allow for one conversion statement between the multiply + and addition/subtraction statement. If there are more than + one conversions then we assume they would invalidate this + transformation. If that's not the case then they should have + been folded before now. 
*/ + if (CONVERT_EXPR_CODE_P (rhs1_code)) +{ + conv1_stmt = rhs1_stmt; + rhs1 = gimple_assign_rhs1 (rhs1_stmt); + if (TREE_CODE (rhs1) == SSA_NAME) + { + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + } + else + return false; +} + if (CONVERT_EXPR_CODE_P (rhs2_code)) +{ + conv2_stmt = rhs2_stmt; + rhs2 = gimple_assign_rhs1 (rhs2_stmt); + if (TREE_CODE (rhs2) == SSA_NAME) + { + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); + } + else + return false; +} + /* If code is WIDEN_MULT_EXPR then it would seem unnecessary to call is_widening_mult_p, but we still need the rhs returns. @@ -2190,6 +2223,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs2; + conv_stmt = conv1_stmt; } else if (rhs2_code == MULT_EXPR || rhs2_code == WIDEN_MULT_EXPR) { @@ -2197,6 +2231,7 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, &type2, &mult_rhs2)) return false; add_rhs = rhs1; + conv_stmt = conv2_stmt; } else return false; @@ -2207,6 +2242,33 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, if (TYPE_UNSIGNED (type1) != TYPE_UNSIGNED (type2)) return false; + /* If there was a conversion between the multiply and addition + then we need to make sure it fits a multiply-and-accumulate. + The should be a single mode change which does not change the + value. */ + if (conv_stmt) +{ + tree from_type = TREE_TYPE (gimple_assign_rhs1 (conv_stmt)); + tree to_type = TREE_TYPE (gimple_assign_lhs (conv_stmt)); + int data_size = TYPE_PRECISION (type1) + TYPE_PRECISION (type2); + bool is_unsigned = TYPE_UNSIGNED (type1) && TYPE_UNSIGNED (type2); + + if (TYPE_PRECISION (from_type) > TYPE_PRECISION (to_type)) + { + /* Conversion is a truncate. 
*/ + if (TYPE_PRECISION (to_type) < data_size) + return false; + } + else if (TYPE_PRECISION (from_type) < TYPE_PRECISION (to_type)) + { + /* Conversion is an extend. Check it's the right sort. */ + if (TYPE_UNSIGNED (from_type) != is_unsigned + && !(is_unsigned && TYPE_PRECISION (from_type) > data_size)) + return false; + } + /* else convert is a no-op for our purposes. */ +} + /* Verify that the machine can perform a widening multiply accumulate in this mode/signedness combination, otherwise this transformation is likely to pessimize code. */
Re: [PATCH (2/7)] Widening multiplies by more than one mode
On Thu, Jul 14, 2011 at 4:10 PM, Andrew Stubbs wrote: > On 12/07/11 12:04, Richard Guenther wrote: >> >> I wonder if we want to restrict the WIDEN_* operations to operate >> on types that have matching type/mode precision(**). > > I've now modified the patch to allow bitfields, or other case where the > precision is smaller than the mode-size. I've also addressed the formatting > issues you pointed out (and in fact reorganised the code slightly to make > the rest of the series a bit cleaner). > > As in the previous version of this patch, it's necessary to convert the > input values to the proper mode for the machine instruction, so the basic > tools for supporting the bitfields were already there - I just had to tweak > the conditionals to take bitfields into account. > > The only this I haven't done is modified tree.def. Looking at it though, I > don't thing any needs changing? The code is still valid, and the comments > are correct (in fact, they may have been wrong before). Ah, it indeed talks about at least twice the precision already. > Is this version OK? Ok. Thanks, Richard. > Andrew >
Re: [PATCH (2/7)] Widening multiplies by more than one mode
On 12/07/11 12:04, Richard Guenther wrote: I wonder if we want to restrict the WIDEN_* operations to operate on types that have matching type/mode precision(**). I've now modified the patch to allow bitfields, or other cases where the precision is smaller than the mode-size. I've also addressed the formatting issues you pointed out (and in fact reorganised the code slightly to make the rest of the series a bit cleaner). As in the previous version of this patch, it's necessary to convert the input values to the proper mode for the machine instruction, so the basic tools for supporting the bitfields were already there - I just had to tweak the conditionals to take bitfields into account. The only thing I haven't done is modify tree.def. Looking at it though, I don't think anything needs changing? The code is still valid, and the comments are correct (in fact, they may have been wrong before). Is this version OK? Andrew 2011-07-14 Andrew Stubbs gcc/ * config/arm/arm.md (maddhidi4): Remove '*' from name. * expr.c (expand_expr_real_2): Use find_widening_optab_handler. * optabs.c (find_widening_optab_handler_and_mode): New function. (expand_widen_pattern_expr): Use find_widening_optab_handler. (expand_binop_directly): Likewise. (expand_binop): Likewise. * optabs.h (find_widening_optab_handler): New macro define. (find_widening_optab_handler_and_mode): New prototype. * tree-cfg.c (verify_gimple_assign_binary): Adjust WIDEN_MULT_EXPR type precision rules. (verify_gimple_assign_ternary): Likewise for WIDEN_MULT_PLUS_EXPR. * tree-ssa-math-opts.c (build_and_insert_cast): New function. (is_widening_mult_rhs_p): Allow widening by more than one mode. Explicitly disallow mis-matched input types. (convert_mult_to_widen): Use find_widening_optab_handler, and cast input types to fit the new handler. (convert_plusminus_to_widen): Likewise. gcc/testsuite/ * gcc.target/arm/wmul-bitfield-1.c: New file.
--- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -1857,7 +1857,7 @@ (set_attr "predicable" "yes")] ) -(define_insn "*maddhidi4" +(define_insn "maddhidi4" [(set (match_operand:DI 0 "s_register_operand" "=r") (plus:DI (mult:DI (sign_extend:DI --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7638,19 +7638,16 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, { enum machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0)); this_optab = usmul_widen_optab; - if (mode == GET_MODE_2XWIDER_MODE (innermode)) + if (find_widening_optab_handler (this_optab, mode, innermode, 0) + != CODE_FOR_nothing) { - if (widening_optab_handler (this_optab, mode, innermode) - != CODE_FOR_nothing) - { - if (TYPE_UNSIGNED (TREE_TYPE (treeop0))) - expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, - EXPAND_NORMAL); - else - expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0, - EXPAND_NORMAL); - goto binop3; - } + if (TYPE_UNSIGNED (TREE_TYPE (treeop0))) + expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, + EXPAND_NORMAL); + else + expand_operands (treeop0, treeop1, NULL_RTX, &op1, &op0, + EXPAND_NORMAL); + goto binop3; } } /* Check for a multiplication with matching signedness. */ @@ -7665,10 +7662,9 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, optab other_optab = zextend_p ? smul_widen_optab : umul_widen_optab; this_optab = zextend_p ? 
umul_widen_optab : smul_widen_optab; - if (mode == GET_MODE_2XWIDER_MODE (innermode) - && TREE_CODE (treeop0) != INTEGER_CST) + if (TREE_CODE (treeop0) != INTEGER_CST) { - if (widening_optab_handler (this_optab, mode, innermode) + if (find_widening_optab_handler (this_optab, mode, innermode, 0) != CODE_FOR_nothing) { expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, @@ -7677,7 +7673,7 @@ expand_expr_real_2 (sepops ops, rtx target, enum machine_mode tmode, unsignedp, this_optab); return REDUCE_BIT_FIELD (temp); } - if (widening_optab_handler (other_optab, mode, innermode) + if (find_widening_optab_handler (other_optab, mode, innermode, 0) != CODE_FOR_nothing && innermode == word_mode) { --- a/gcc/optabs.c +++ b/gcc/optabs.c @@ -225,6 +225,37 @@ add_equal_note (rtx insns, rtx target, enum rtx_code code, rtx op0, rtx op1) return 1; } +/* Find a widening optab even if it doesn't widen as much as we want. + E.g. if from_mode is HImode, and to_mode is DImode, and there is no + direct HI->SI insn, then return SI->DI, if that exists. + If PERMIT_NON_WIDENING is non-zero then this can be used with + non-widening optabs also. */ + +enum insn_code +find_widening_optab_handler_and_mode (optab op, enum machine_mode to_mode, + enum machine_mode from_mode, + int permit_non_widening, + enum machine_mode *found_mode) +{ + for (; (permit_non_widening || from_mode !=
Re: [PATCH] New IPA-CP with real function cloning
Hi, I'll send a new version of IPA-CP incorporating most of the feedback in a new thread but let me also comment on some of the points here: On Fri, Jul 08, 2011 at 08:24:31PM +0200, Jan Hubicka wrote: > > > > { > > > > /* Pointer to an array of structures describing individual formal > > > > parameters. */ > > > > struct ipcp_lattice *lattices; > > > > > > Hmm, how we get here around the need to mark this GTY(). I.e are we sure > > > that all the known_vals > > > must be referneced from elsewhere at ggc time? > > > > (Scalar) constants that are results of arithmetic jump functions may > > not be referenced from elsewhere, everything else is referenced from > > the jump functions. If it is a problem it is already present in the > > current IPA-CP. ipa_node_params and lattices are not GTYed there > > either. > > Hmm, I guess it is not really problem only because the lattices are used > only in ipa-cp so the values do not really live across GGC call. Well, technically they survive until after inlining (because of indirect inlining which also derives information from the lattices corresponding to node->inlined_to node. Results of arithmetic functions are not going to be accessed during inlining when compiling any reasonable program but... > > > > > > > > /* ipa_edge_args stores information related to a callsite and > > > > particularly its > > > >arguments. It can be accessed by the IPA_EDGE_REF macro. */ > > > > typedef struct GTY(()) ipa_edge_args > > > > > > probably edge_summary would be my preferred name. > > > > Ugh, this is the current name, we may change it later. In any event > > the name should probably tell that the summary is about parameters. > > Hmm, OK, it is not bad name after all. 
:-) > > > > > > > || !inline_summary (node)->versionable > > > > || TYPE_ATTRIBUTES (TREE_TYPE (node->decl)) > > > > > > This is getting an issue for Fortran that attach the function arguments > > > quite commonly, > > > perhaps we should start moving ahead on this and ruling out only the > > > arguments that > > > gets can't be preserved. > > > > Yes, for example in 433.milc basically all the functions are > > considered not versionable because of this. I also thought of not > > removing parameters for such functions. > > > > > Also you need to test attributes of DECL itself. > > > > Really? We are not doing it now, nether for IPA-CP nor for IPA-SRA > > (which is good at triggering problems with these). > > Hmm, all the ipa-inline code checks DECL_ATTRIBUTES only. I believe the type > attributes ends up being copied to decl attributes but not vice versa. > I guess the testcase should be > > __attribute__ ((really_bad_attribute)) > function (int param) > { > use param > } > > and then let ipa-cp to invent param is a constant. what would be such a "really_bad_attribute" ? > > > > > > > > I think this all should be part of can_change_signature_p code, since we > > > can version > > > with all attributes I can thunk of when we don't change signature. > > > > > > > || cgraph_function_body_availability (node) <= AVAIL_OVERWRITABLE) > > > > res = false; > > > > else > > > > /* Removing arguments doesn't work if the function takes varargs > > > >or use __builtin_apply_args. > > > >FIXME: handle this together with can_change_signature flag. */ > > > > for (edge = node->callees; edge; edge = edge->next_callee) > > > > { > > > > tree t = edge->callee->decl; > > > > if (DECL_BUILT_IN_CLASS (t) == BUILT_IN_NORMAL > > > > && (DECL_FUNCTION_CODE (t) == BUILT_IN_APPLY_ARGS > > > > || DECL_FUNCTION_CODE (t) == BUILT_IN_VA_START)) > > > > > > can_change_sigunature will also handle apply_args. > > > Add VA_START code there, too. 
For the other use of this flag (in i386) > > > VA_START The last one already is VA_START... or do you mean a different one? > > > rules out the change, too, but so far we explicitely tested that in the > > > backend. > > > > > > Iguess with these changes, inline_summary (node)->versionable should > > > coincide > > > with IPA_NODE_REF (node)->node_versionable making this whole code > > > unnecesary > > > (it was supposed to work this way, I am not sure why we now have to > > > versionable > > > flags). > > > > That would be nice. I was wondering why the difference between the > > two. Again, I am yet to decide whether this should be done as a > > followup or within this change. > > OK. BTW, it will be a followup change. > > > If you use for_node_thunk_and_aliases you can remove the recursion you do > > > above > > > and it will work for aliases of thunk correctly, too ;) > > > > But I wouldn't be able to check the edge for hotness. > > BTW currently the edges from thunks miss any profile info. > (i.e. it will be 0). I guess we ought to make ipa-profile pass to estimate > those > (it is difficult ot measure count of an alias). > I'm
Re: [Patch, Fortran] Mark "token" of static coarrays "restrict"
For what it's worth, it compiles and works on my 32bit Linux too. But this is such a simple patch it *had* to work. On 07/14/2011 11:01 AM, Tobias Burnus wrote: As the title says: Mark "token" of static coarrays "restrict" Bootstrapped and regtested on x86-64-linux. OK for the trunk? Tobias -- I'm not overweight, I'm undertall.
Re: Correct fix for scheduler bug PR11320
> ??? Original code:
>
>    basic_block b = BLOCK_FOR_INSN (insn);
>    edge e;
>    for (e = b->succ; e; e = e->succ_next)
> !    if ((e->flags & EDGE_FALLTHRU) == 0)
> !      {
> !        bitmap_operation (set, set, e->dest->global_live_at_start,
> !                          BITMAP_IOR);
> !      }
> }
>
> Code after the revert:
>
>    FOR_EACH_EDGE (e, ei, b->succs)
> +    if ((e->flags & EDGE_FALLTHRU) == 0)
>      bitmap_ior_into (used, df_get_live_in (e->dest));
>
> As far as I can tell these are identical, modulo the change in variable
> name ("set" -> "used" which seems like a better name).

Yes, the code does the same thing, but the original patch did clear up the set/use confusion in sched_analyze_insn and compute_jump_reg_dependencies, in particular in the comment of the latter. But OK, never mind.

-- 
Eric Botcazou
[PATCH] More efficiently canonicalize operands in fold_stmt
We currently go through a tree expression generation when canonicalizing operand order. That's quite wasteful. The following patch re-orders things in a way to avoid this and remove some duplicate code. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther * gimple-fold.c (fold_gimple_assign): Remove operand swapping. (fold_stmt_1): Do it here directly on gimple and as a first thing. Index: gcc/gimple-fold.c === *** gcc/gimple-fold.c (revision 176267) --- gcc/gimple-fold.c (working copy) *** fold_gimple_assign (gimple_stmt_iterator *** 817,842 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. */ - if (commutative_tree_code (subcode) - && tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build2 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), -gimple_assign_rhs2 (stmt), -gimple_assign_rhs1 (stmt)); } break; --- 817,831 if (!result) result = fold_binary_loc (loc, subcode, ! TREE_TYPE (gimple_assign_lhs (stmt)), ! gimple_assign_rhs1 (stmt), ! gimple_assign_rhs2 (stmt)); if (result) { STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; } break; *** fold_gimple_assign (gimple_stmt_iterator *** 852,869 STRIP_USELESS_TYPE_CONVERSION (result); if (valid_gimple_rhs_p (result)) return result; - - /* Fold might have produced non-GIMPLE, so if we trust it blindly -we lose canonicalization opportunities. Do not go again -through fold here though, or the same non-GIMPLE will be -produced. 
*/ - if (commutative_ternary_tree_code (subcode) - && tree_swap_operands_p (gimple_assign_rhs1 (stmt), -gimple_assign_rhs2 (stmt), false)) - return build3 (subcode, TREE_TYPE (gimple_assign_lhs (stmt)), - gimple_assign_rhs2 (stmt), - gimple_assign_rhs1 (stmt), - gimple_assign_rhs3 (stmt)); } break; --- 841,846 *** fold_stmt_1 (gimple_stmt_iterator *gsi, *** 1576,1583 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! tree new_rhs = fold_gimple_assign (gsi); tree lhs = gimple_assign_lhs (stmt); if (new_rhs && !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs))) --- 1553,1574 case GIMPLE_ASSIGN: { unsigned old_num_ops = gimple_num_ops (stmt); ! enum tree_code subcode = gimple_assign_rhs_code (stmt); tree lhs = gimple_assign_lhs (stmt); + tree new_rhs; + /* First canonicalize operand order. This avoids building new + trees if this is the only thing fold would later do. */ + if ((commutative_tree_code (subcode) +|| commutative_ternary_tree_code (subcode)) + && tree_swap_operands_p (gimple_assign_rhs1 (stmt), +gimple_assign_rhs2 (stmt), false)) + { + tree tem = gimple_assign_rhs1 (stmt); + gimple_assign_set_rhs1 (stmt, gimple_assign_rhs2 (stmt)); + gimple_assign_set_rhs2 (stmt, tem); + changed = true; + } + new_rhs = fold_gimple_assign (gsi); if (new_rhs && !useless_type_conversion_p (TREE_TYPE (lhs), TREE_TYPE (new_rhs)))
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 09:58:08AM -0700, Richard Henderson wrote: > Why the force_operand? You've got register inputs. Either the target > is going to support the operation or it isn't. I agree that it doesn't seem to be necessary. I've used force_operand since ivopts (add_cost) is doing it without seeing a clear reason for it. So I've removed it now. > Saving cost data dependent on speed, which is non-constant. > You probably need to make this a two dimensional array. Fixed. Here is an updated version. Bye, -Andreas- 2011-07-14 Andreas Krebbel * tree-ssa-math-opts.c (compute_costs): New function. (convert_mult_to_fma): Take costs into account when propagating multiplications into several additions. * config/s390/s390.c (z196_costs): Adjust costs for madbr and maebr. Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c.orig --- gcc/tree-ssa-math-opts.c *** convert_plusminus_to_widen (gimple_stmt_ *** 2185,2190 --- 2185,2236 return true; } + /* Computing the costs for calculating RTX with CODE in MODE. */ + + static unsigned + compute_costs (enum machine_mode mode, enum rtx_code code, bool speed) + { + rtx insn; + unsigned cost; + + switch (GET_RTX_LENGTH (code)) + { + case 2: + insn = gen_rtx_fmt_ee (code, mode, +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), +gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2)); + break; + case 3: + insn = gen_rtx_fmt_eee (code, mode, + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 2), + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 3)); + break; + default: + gcc_unreachable (); + } + + if (dump_file && (dump_flags & TDF_DETAILS)) + { + fprintf (dump_file, "Calculating costs of %s in %s mode. RTX is:\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode)); + print_rtl (dump_file, insn); + } + + cost = rtx_cost (insn, SET, speed); + + /* If the backend returns a cost of zero it is most certainly lying. + Set this to one in order to notice that we already calculated it + once. 
*/ + cost = cost ? cost : 1; + + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, "\n%s in %s costs %d\n\n", + GET_RTX_NAME (code), GET_MODE_NAME (mode), cost); + + return cost; + } + /* Combine the multiplication at MUL_STMT with operands MULOP1 and MULOP2 with uses in additions and subtractions to form fused multiply-add operations. Returns true if successful and MUL_STMT should be removed. */ *** convert_mult_to_fma (gimple mul_stmt, tr *** 2197,2202 --- 2243,2254 gimple use_stmt, neguse_stmt, fma_stmt; use_operand_p use_p; imm_use_iterator imm_iter; + enum machine_mode mode; + int uses = 0; + bool speed = optimize_bb_for_speed_p (gimple_bb (mul_stmt)); + static unsigned mul_cost[2][NUM_MACHINE_MODES]; + static unsigned add_cost[2][NUM_MACHINE_MODES]; + static unsigned fma_cost[2][NUM_MACHINE_MODES]; if (FLOAT_TYPE_P (type) && flag_fp_contract_mode == FP_CONTRACT_OFF) *** convert_mult_to_fma (gimple mul_stmt, tr *** 2213, if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. ! This means we assume that an FMA operation has the same cost ! as an addition. */ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; --- 2265,2281 if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) return false; + mode = TYPE_MODE (type); + + if (!fma_cost[speed][mode]) + { + fma_cost[speed][mode] = compute_costs (mode, FMA, speed); + add_cost[speed][mode] = compute_costs (mode, PLUS, speed); + mul_cost[speed][mode] = compute_costs (mode, MULT, speed); + } + /* Make sure that the multiplication statement becomes dead after ! the transformation, thus that all uses are transformed to FMAs. 
*/ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, mul_result) { enum tree_code use_code; *** convert_mult_to_fma (gimple mul_stmt, tr *** 2292,2297 --- 2351,2357 if (gimple_assign_rhs1 (use_stmt) == gimple_assign_rhs2 (use_stmt)) return false; + uses++; /* While it is possible to validate whether or not the exact form that we've recognized is available in the backend, the assumption is that the transformation is never a loss. For instance, suppose *** convert_
[PATCH] Fold to more canonical TRUTH_NOT_EXPRs
The following generates boolean-typed TRUTH_NOT_EXPRs instead of operating on converted boolean operands.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-07-14  Richard Guenther

	* fold-const.c (fold_binary_loc): Convert the !bool_var result,
	not bool_var when folding bool_var != 1 or bool_var == 0.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 176266)
+++ gcc/fold-const.c	(working copy)
@@ -12156,14 +12156,16 @@ fold_binary_loc (location_t loc,
       /* bool_var != 1 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_onep (arg1)
	  && code == NE_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* bool_var == 0 becomes !bool_var. */
       if (TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE && integer_zerop (arg1)
	  && code == EQ_EXPR)
-	return fold_build1_loc (loc, TRUTH_NOT_EXPR, type,
-				fold_convert_loc (loc, type, arg0));
+	return fold_convert_loc (loc, type,
+				 fold_build1_loc (loc, TRUTH_NOT_EXPR,
+						  TREE_TYPE (arg0), arg0));

       /* !exp != 0 becomes !exp */
       if (TREE_CODE (arg0) == TRUTH_NOT_EXPR && integer_zerop (arg1)
Re: Correct fix for scheduler bug PR11320
> Any particular bits you still see that don't get reverted with this patch? The ebb_compute_jump_reg_dependencies changes. The original patch has: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. Mark registers live on entry of non-fallthrough blocks as used by the jump. but you're reverting only: * sched-ebb.c (compute_jump_reg_dependencies): New prototype. Mark registers live on entry of the fallthrough block and conditionally set as set by the jump. -- Eric Botcazou
Re: Correct fix for scheduler bug PR11320
On 07/14/11 14:18, Eric Botcazou wrote: >> Any particular bits you still see that don't get reverted with this patch? > > The ebb_compute_jump_reg_dependencies changes. The original patch has: > > * sched-ebb.c (compute_jump_reg_dependencies): New prototype. > Mark registers live on entry of the fallthrough block and conditionally > set as set by the jump. Mark registers live on entry of non-fallthrough > blocks as used by the jump. > > but you're reverting only: > > * sched-ebb.c (compute_jump_reg_dependencies): New prototype. > Mark registers live on entry of the fallthrough block and conditionally > set as set by the jump. > ??? Original code: basic_block b = BLOCK_FOR_INSN (insn); edge e; for (e = b->succ; e; e = e->succ_next) ! if ((e->flags & EDGE_FALLTHRU) == 0) ! { ! bitmap_operation (set, set, e->dest->global_live_at_start, ! BITMAP_IOR); ! } } Code after the revert: FOR_EACH_EDGE (e, ei, b->succs) +if ((e->flags & EDGE_FALLTHRU) == 0) bitmap_ior_into (used, df_get_live_in (e->dest)); As far as I can tell these are identical, modulo the change in variable name ("set" -> "used" which seems like a better name). Bernd
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:57, Eric Botcazou wrote: >> The real problem here is that the ia64 backend lies to the rest of the >> compiler; it produces a load instruction without showing a MEM anywhere >> in the instruction pattern. Hence, the following patch, which reverts >> the bogus scheduler changes and adds a MEM to a pattern in ia64.md. > > This is probably the root cause of the problem, indeed. But you don't revert > everything so, if this isn't an oversight, then the ChangeLog is incorrect. > And there is another change in sched-deps.c not mentioned in the ChangeLog. Well, the actual code has completely changed in the meantime. All the hunks of the original patch failed :) I can write a new ChangeLog entry if that seems important. Any particular bits you still see that don't get reverted with this patch? Bernd
Re: Correct fix for scheduler bug PR11320
> The real problem here is that the ia64 backend lies to the rest of the > compiler; it produces a load instruction without showing a MEM anywhere > in the instruction pattern. Hence, the following patch, which reverts > the bogus scheduler changes and adds a MEM to a pattern in ia64.md. This is probably the root cause of the problem, indeed. But you don't revert everything so, if this isn't an oversight, then the ChangeLog is incorrect. And there is another change in sched-deps.c not mentioned in the ChangeLog. -- Eric Botcazou
Re: Ping: The TI C6X port
On 06/14/11 22:52, Vladimir Makarov wrote: > On 06/06/2011 07:26 AM, Bernd Schmidt wrote: >> Ping^3 for the C6X port. Now with extra patches: >> >> Additional preliminary scheduler tweaks: >> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg02408.html >> > It is ok for me. Thanks, Bernd. I've committed this with one change: I've dropped the hunk that resolves debug_insn dependencies rather than deleting them. We should do that, but when I tested these changes together with the backtracking scheduler, I saw some problems, so I'll postpone this. Bernd
Re: RFA: Avoid unnecessary clearing in union initialisers
"H.J. Lu" writes: > This caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49736 Sorry for the breakage. It was due to a humiliating stupid mistake in the hunk to update all_zeros_p: @@ -5129,13 +5152,12 @@ mostly_zeros_p (const_tree exp) all_zeros_p (const_tree exp) { if (TREE_CODE (exp) == CONSTRUCTOR) - { - HOST_WIDE_INT nz_elts, count; - bool must_clear; + HOST_WIDE_INT nz_elts, init_elts; + bool complete_p; - categorize_ctor_elements (exp, &nz_elts, &count, &must_clear); - return nz_elts == 0; + categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p); + return nz_elts == init_elts; } return initializer_zerop (exp); which was due to trying a different approach and not properly reverting it. I've applied the following as obvious. Tested on x86_64-linux-gnu, and on 255.vortex. Richard gcc/ PR middle-end/49736 * expr.c (all_zeros_p): Undo bogus part of last change. Index: gcc/expr.c === --- gcc/expr.c 2011-07-14 11:46:42.0 +0100 +++ gcc/expr.c 2011-07-14 11:47:40.0 +0100 @@ -5157,7 +5157,7 @@ all_zeros_p (const_tree exp) bool complete_p; categorize_ctor_elements (exp, &nz_elts, &init_elts, &complete_p); - return nz_elts == init_elts; + return nz_elts == 0; } return initializer_zerop (exp);
Re: [patch] fix typo in doc/extend.texi, committed as obvious
On 07/14/2011 01:44 PM, Richard Guenther wrote: > On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose wrote: >> fix a typo in doc/extend.texi, committed as obvious. > > Specify the architecture to generate code for when compiling the > -function. If you select the @code{"target("cpu=power7)"} attribute when > +function. If you select the @code{"target("cpu=power7")} attribute when > > still mismatched number of '"'s. I suggest to drop the first one. fixed as well.
Re: [patch] fix typo in doc/extend.texi, committed as obvious
On Thu, Jul 14, 2011 at 1:41 PM, Matthias Klose wrote: > fix a typo in doc/extend.texi, committed as obvious. Specify the architecture to generate code for when compiling the -function. If you select the @code{"target("cpu=power7)"} attribute when +function. If you select the @code{"target("cpu=power7")} attribute when still mismatched number of '"'s. I suggest to drop the first one. Richard. > Matthias > > >
Re: [patch 1/8 tree-optimization]: Bitwise logic for fold_truth_not_expr
On Wed, Jul 13, 2011 at 2:15 PM, Kai Tietz wrote: > 2011/7/13 Richard Guenther : >> On Wed, Jul 13, 2011 at 1:08 PM, Kai Tietz wrote: >>> 2011/7/13 Richard Guenther : On Wed, Jul 13, 2011 at 11:04 AM, Kai Tietz wrote: > Sorrty, the TRUTH_NOT_EXPR isn't here the point at all. The underlying > issue is that fold-const re-inttroduces TRUTH_AND/OR and co. I'm very sure it doesn't re-constrct TRUTH_ variants out of thin air when you present it with BIT_ variants as input. >>> >>> Well, look into fold-const's fold_binary_loc function and see >>> >>> /* ARG0 is the first operand of EXPR, and ARG1 is the second operand. >>> >>> First check for cases where an arithmetic operation is applied to a >>> compound, conditional, or comparison operation. Push the arithmetic >>> operation inside the compound or conditional to see if any folding >>> can then be done. Convert comparison to conditional for this purpose. >>> The also optimizes non-constant cases that used to be done in >>> expand_expr. >>> >>> Before we do that, see if this is a BIT_AND_EXPR or a BIT_IOR_EXPR, >>> one of the operands is a comparison and the other is a comparison, a >>> BIT_AND_EXPR with the constant 1, or a truth value. In that case, the >>> code below would make the expression more complex. Change it to a >>> TRUTH_{AND,OR}_EXPR. Likewise, convert a similar NE_EXPR to >>> TRUTH_XOR_EXPR and an EQ_EXPR to the inversion of a TRUTH_XOR_EXPR. */ >>> >>> if ((code == BIT_AND_EXPR || code == BIT_IOR_EXPR >>> || code == EQ_EXPR || code == NE_EXPR) >>> && ((truth_value_p (TREE_CODE (arg0)) >>> && (truth_value_p (TREE_CODE (arg1)) >>> || (TREE_CODE (arg1) == BIT_AND_EXPR >>> && integer_onep (TREE_OPERAND (arg1, 1) >>> || (truth_value_p (TREE_CODE (arg1)) >>> && (truth_value_p (TREE_CODE (arg0)) >>> || (TREE_CODE (arg0) == BIT_AND_EXPR >>> && integer_onep (TREE_OPERAND (arg0, 1))) >>> { >>> tem = fold_build2_loc (loc, code == BIT_AND_EXPR ? TRUTH_AND_EXPR >>> : code == BIT_IOR_EXPR ? 
TRUTH_OR_EXPR >>> : TRUTH_XOR_EXPR, >>> boolean_type_node, >>> fold_convert_loc (loc, boolean_type_node, arg0), >>> fold_convert_loc (loc, boolean_type_node, arg1)); >>> >>> if (code == EQ_EXPR) >>> tem = invert_truthvalue_loc (loc, tem); >>> >>> return fold_convert_loc (loc, type, tem); >>> } >>> >>> Here unconditionally TRUTH_AND/TRUTH_OR gets introduced, if operands >>> are of kind truth. This is btw the point, why you see that those >>> cases are handled. But as soon as this part is turned off for BIT_- >>> IOR/AND, we need to do the folding for 1-bit precision case explicit. >> >> First of all this checks for a quite complex pattern - where do we pass >> such complex pattern from the gimple level to fold? For the EQ/NE_EXPR >> case forwprop probably might be able to feed it that, but then how does >> it go wrong? The above could also simply be guarded by !in_gimple_form. >> >> Richard. > > See reassoc pass as example I don't see where. > and this hacky maybe_fold_and_comparisons > / maybe_fold_or_comparisons functions. Yes, that case is known - but unless the folding result is sane we drop it anyway. > As indeed we want still be > able to do comparison foldings without getting back an TRUTH-op. Well, if we get back one we have canonicalize_cond_expr_cond to make it sane again. > Additionally we have a lot of passes - like vectorizer - which are > happily try to build new condition on tree-level. This is another That's true and you promised to clean up remaining TRUTH_* consumers / producers when doing the canonicalization to BIT_*. At the moment they get back to BIT_* via force_gimple_operand* which is probably even fine. > place I saw issues and tree-cfg failures. And last but not least those > truth-ops might be reintroduced in gimple_fold, as soon as we see > bitwise-ops on one-bit precision integral type as truth_value. Well, I already said that one-bit-precision as truth_value is wrong. 
Relying on fold-const.c here with its very strict type rules always was a stretch. If it turns out there are now cases that we want to do differently then we finally have to bite the bullet and do it. There's always two options - duplicate it properly in fold_stmt (or forwprop in case it needs to lookup defs), or remove it from fold and make sure we fold things using fold_stmt at gimplifcation time. Richard. > Kai >
[patch] fix typo in doc/extend.texi, committed as obvious
fix a typo in doc/extend.texi, committed as obvious. Matthias 2011-07-14 Matthias Klose * doc/extend.texi (optimize attribute): Fix typo. Index: doc/extend.texi === --- doc/extend.texi (revision 176263) +++ doc/extend.texi (working copy) @@ -3594,7 +3594,7 @@ @item cpu=@var{CPU} @cindex @code{target("cpu=@var{CPU}")} attribute Specify the architecture to generate code for when compiling the -function. If you select the @code{"target("cpu=power7)"} attribute when +function. If you select the @code{"target("cpu=power7")} attribute when generating 32-bit code, VSX and Altivec instructions are not generated unless you use the @option{-mabi=altivec} option on the command line.
Re: Correct fix for scheduler bug PR11320
On 07/14/11 13:20, Andrey Belevantsev wrote: > Hello Bernd, > > FWIW, we have discussed your change with Alexander and we think you are > right about the scheduler changes. One question is: > > On 14.07.2011 14:03, Bernd Schmidt wrote: >> --- gcc/sched-deps.c(revision 176195) >> +++ gcc/sched-deps.c(working copy) >> @@ -568,7 +568,7 @@ >>(rev1==rev2 >>? reversed_comparison_code (cond2, NULL) >>: GET_CODE (cond2)) >> - && XEXP (cond1, 0) == XEXP (cond2, 0) >> + && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0)) >>&& XEXP (cond1, 1) == XEXP (cond2, 1)) >> return 1; >>return 0; > > this hunk from conditions_mutex_p seems to be unrelated? Oh yes, sorry about that. That was approved a while ago and I haven't gotten around to checking it in. Bernd
Re: Correct fix for scheduler bug PR11320
Hello Bernd, FWIW, we have discussed your change with Alexander and we think you are right about the scheduler changes. One question is: On 14.07.2011 14:03, Bernd Schmidt wrote: --- gcc/sched-deps.c(revision 176195) +++ gcc/sched-deps.c(working copy) @@ -568,7 +568,7 @@ (rev1==rev2 ? reversed_comparison_code (cond2, NULL) : GET_CODE (cond2)) - && XEXP (cond1, 0) == XEXP (cond2, 0) + && rtx_equal_p (XEXP (cond1, 0), XEXP (cond2, 0)) && XEXP (cond1, 1) == XEXP (cond2, 1)) return 1; return 0; this hunk from conditions_mutex_p seems to be unrelated? Andrey
[PATCH] Only insert required conversions during gimplification
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2011-07-14 Richard Guenther * gimplify.c (gimplify_expr): Only do required conversions. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 176266) +++ gcc/gimplify.c (working copy) @@ -6787,22 +6787,20 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_NOT_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } + ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, +is_gimple_val, fb_rvalue); + recalculate_side_effects (*expr_p); + break; } - ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p, - is_gimple_val, fb_rvalue); - recalculate_side_effects (*expr_p); - break; - case ADDR_EXPR: ret = gimplify_addr_expr (expr_p, pre_p, post_p); break; @@ -7227,40 +7225,36 @@ gimplify_expr (tree *expr_p, gimple_seq case TRUTH_OR_EXPR: case TRUTH_XOR_EXPR: { - tree org_type = TREE_TYPE (*expr_p); - + tree orig_type = TREE_TYPE (*expr_p); *expr_p = gimple_boolify (*expr_p); - - /* This shouldn't happen, but due fold-const (and here especially - fold_truth_not_expr) happily uses operand type and doesn't - automatically uses boolean_type as result, we need to keep - orignal type. */ - if (org_type != boolean_type_node) + if (!useless_type_conversion_p (orig_type, TREE_TYPE (*expr_p))) { - *expr_p = fold_convert (org_type, *expr_p); + *expr_p = fold_convert_loc (saved_location, orig_type, *expr_p); ret = GS_OK; break; } - } - /* With two-valued operand types binary truth expressions are -semantically equivalent to bitwise binary expressions. Canonicalize -them to the bitwise variant. 
*/ switch (TREE_CODE (*expr_p)) - { - case TRUTH_AND_EXPR: - TREE_SET_CODE (*expr_p, BIT_AND_EXPR); - break; - case TRUTH_OR_EXPR: - TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); - break; - case TRUTH_XOR_EXPR: - TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); - break; - default: - break; + /* Boolified binary truth expressions are semantically equivalent +to bitwise binary expressions. Canonicalize them to the +bitwise variant. */ + switch (TREE_CODE (*expr_p)) + { + case TRUTH_AND_EXPR: + TREE_SET_CODE (*expr_p, BIT_AND_EXPR); + break; + case TRUTH_OR_EXPR: + TREE_SET_CODE (*expr_p, BIT_IOR_EXPR); + break; + case TRUTH_XOR_EXPR: + TREE_SET_CODE (*expr_p, BIT_XOR_EXPR); + break; + default: + break; + } + + /* Continue classified as tcc_binary. */ + goto expr_2; } - /* Classified as tcc_expression. */ - goto expr_2; case FMA_EXPR: /* Classified as tcc_expression. */
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:14 PM, Daniel Carrera wrote: On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as a follow-up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you.

Well, for "exit(3)" one knows that one will get a surprise on Windows: an abort. EXIT_FAILURE [but also exit(1)] has a very high chance of working (nearly) everywhere. In the end, as long as the operating system does not react in a completely odd way, the exit status does not matter - it's the problem of the caller (e.g. some script) to make use of the result. If EXIT_FAILURE is 0 and a script assumes that it is a success, it's either a fault of the script writer, or of the one putting 0 for EXIT_FAILURE into the header, or a fault of the OS designer. Thus, I think the chance of avoiding surprises is higher with exit(EXIT_FAILURE), and one avoids having some random number (for the exit status) in the code. Users usually do not check the exact exit status value but only its 0- vs non-0-ness. Hence, I think exit(EXIT_FAILURE) is better than the current version.

Tobias
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther : > > But then how come the option to override it is useful? It isn't dependent > on the particular case. At least the option should be a --param. > > Richard. > The option is still useful if you want to try the feature on a platform with no hook implemented, and for other performance experiments. I agree that a --param would be better here. I'll fix it. Ilya
Re: [build] Move crtfastmath to toplevel libgcc
Andreas Schwab writes:

> Same on ia64:
>
> Configuration mismatch!
> Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o
> Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o
> crtfastmath.o

I believe you need the following patch. As one of my next tasks, I'll work to completely move crtstuff/extra_parts/EXTRA_PARTS/EXTRA_MULTILIB_PARTS over to libgcc so there's no duplication of configuration between gcc and libgcc. If this tests ok, I suppose it can go in as obvious.

	Rainer

2011-07-14  Rainer Orth

	Revert:
	* config.gcc (ia64*-*-elf*): Remove crtfastmath.o from extra_parts.
	(ia64*-*-freebsd*): Likewise.
	(ia64*-*-linux*): Likewise.
	(mips64*-*-linux*): Likewise.
	(mips*-*-linux*): Likewise.

Index: gcc/config.gcc
===
--- gcc/config.gcc	(revision 176266)
+++ gcc/config.gcc	(working copy)
@@ -1594,13 +1594,13 @@
 	then
 		target_cpu_default="${target_cpu_default}|MASK_GNU_LD"
 	fi
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-freebsd*)
 	tm_file="${tm_file} dbxelf.h elfos.h ${fbsd_tm_file} ia64/sysv4.h ia64/freebsd.h"
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
 	tmake_file="${tmake_file} ia64/t-ia64"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-linux*)
 	tm_file="${tm_file} dbxelf.h elfos.h gnu-user.h linux.h glibc-stdint.h ia64/sysv4.h ia64/linux.h"
@@ -1609,7 +1609,7 @@
 		tmake_file="${tmake_file} t-libunwind-elf ia64/t-glibc-libunwind"
 	fi
 	target_cpu_default="MASK_GNU_AS|MASK_GNU_LD"
-	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o"
+	extra_parts="crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o"
 	;;
 ia64*-*-hpux*)
 	tm_file="${tm_file} dbxelf.h elfos.h ia64/sysv4.h ia64/hpux.h"
@@ -1849,6 +1849,7 @@
 			tm_defines="${tm_defines} MIPS_ISA_DEFAULT=65"
 			;;
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	gnu_ld=yes
 	gas=yes
 	test x$with_llsc != x || with_llsc=yes
@@ -1868,6 +1869,7 @@
 		mipsisa32*)
 			tm_defines="${tm_defines} MIPS_ISA_DEFAULT=32"
 	esac
+	extra_parts="$extra_parts crtfastmath.o"
 	test x$with_llsc != x || with_llsc=yes
 	;;
 mips*-*-openbsd*)

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
> One minor note, you will need to update doc/invoke.texi to document the new
> switch before checkin: -ftree-reassoc-width=
>
> --
> Michael Meissner, IBM

Thanks for the note! Here is the fixed patch.

Ilya

--
gcc/

2011-07-14  Enkovich Ilya

	PR middle-end/44382
	* target.def (reassociation_width): New hook.
	* doc/tm.texi.in (reassociation_width): New hook documentation.
	* doc/tm.texi (reassociation_width): Likewise.
	* doc/invoke.texi (ftree-reassoc-width): Document new option.
	* hooks.h (hook_int_const_gimple_1): New default hook.
	* hooks.c (hook_int_const_gimple_1): Likewise.
	* config/i386/i386.h (ix86_tune_indices): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(TARGET_REASSOC_INT_TO_PARALLEL): New.
	(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.
	* config/i386/i386.c (initial_ix86_tune_features): Add
	X86_TUNE_REASSOC_INT_TO_PARALLEL and
	X86_TUNE_REASSOC_FP_TO_PARALLEL.
	(ix86_reassociation_width): Implementation of the new hook for
	the i386 target.
	* common.opt (ftree-reassoc-width): New option.
	* tree-ssa-reassoc.c (get_required_cycles): New function.
	(get_reassociation_width): Likewise.
	(rewrite_expr_tree_parallel): Likewise.
	(reassociate_bb): Now checks the reassociation width to be used
	and calls rewrite_expr_tree_parallel instead of rewrite_expr_tree
	if needed.
	(pass_reassoc): TODO_remove_unused_locals flag added.

gcc/testsuite/

2011-07-14  Enkovich Ilya

	* gcc.dg/tree-ssa/pr38533.c (dg-options): Add option
	-ftree-reassoc-width=1.
	* gcc.dg/tree-ssa/reassoc-24.c: New test.
	* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

PR44382.diff
Description: Binary data
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 12:04 PM, Tobias Burnus wrote: I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as follow up patch. You are the boss. The message I got from Nick's post is that it doesn't matter much and that you could even get surprising behaviour because EXIT_SUCCESS is not required to be zero and EXIT_FAILURE is not required to be non-zero. But maybe I missed the point. So it's up to you. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:59 AM, Ilya Enkovich wrote: > 2011/7/14 Richard Guenther : >> >> Btw, rather than a new target hook and an option I suggest to use >> a --param which default you can modify in the backend. >> >> Richard. >> > > Introduced target hook does not just define default value for option. > It is meant to make decision in each particular case. For now it > returns a constant value but march dependent heuristics will be added > later. But then how come the option to override it is useful? It isn't dependent on the particular case. At least the option should be a --param. Richard. > Ilya >
Re: [Patch, Fortran] Allocate + CAF library
*ping* ? On 07/11/2011 08:16 PM, Daniel Carrera wrote: Hello, This is my largest patch so far and the first that I'll commit myself. This patch improves support for the ALLOCATE statement when using the coarray library. Specifically, it adds support for the stat= and errmsg= attributes: ALLOCATE( x(n)[*] , stat=i , errmsg=str ) These attributes are now written by the CAF library. This patch also involves a good amount of code cleanup. ChangeLog is attached. As soon as I get the go-ahead, I'll commit this patch. Cheers, Daniel. -- I'm not overweight, I'm undertall.
Re: [Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
On 07/14/2011 11:48 AM, Daniel Carrera wrote: This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit? I was wondering - based on the discussion - whether one should remove the "int error" argument from caf_runtime_error and simply use "exit (EXIT_FAILURE)" for all exit() calls in mpi.c/single.c, cf. http://gcc.gnu.org/ml/fortran/2011-07/msg00140.html But one can also do so as a follow-up patch. Thus, OK for the patch as-is - or with all exit(...) calls replaced by "exit(EXIT_FAILURE)", which uses stdlib.h's EXIT_FAILURE. One can then also drop the "int error" argument to the caf_runtime_error function. Thanks for the patch! Tobias

ChangeLog:
2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function based on the function
	in mpi.c with the same name.
	(_gfortran_caf_init): Use caf_runtime_error.
	* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
	with the function in single.c.
Correct fix for scheduler bug PR11320
PR11320 was a scheduler problem where an instruction was moved backwards across a branch, leading to

	addl r14 = @ltoffx(.LC2), r1
	;;
(p7)	addl r14 = 1, r0		<--- r14 clobbered
	;;
(p7)	st4 [r15] = r14
	ld8.mov r14 = [r14], .LC2	<--- crash (branch was here)

At the time, this was solved by a patch that recognizes when a register is conditionally set in a basic block, and then treats the branch as setting such a register: http://gcc.gnu.org/ml/gcc-patches/2003-07/msg01044.html "The proposed fix is to say that JUMP_INSNs set the registers that are live on entry of the fallthrough block and are conditionally set before the jump, because they can be considered as restoring the former value of the conditionally set registers." While it is true that a jump could be seen as setting the correct value for a register, this isn't at all related to conditional sets. Consider a trivial example:

	a = b + 3;
	if (b > 0)
	  { use a }
	else
	  { use a }

where the value of a is different in each branch of the else, with no conditional execution in sight. From the point of view of the uses, it can be argued that the jump sets the value, but this is irrelevant for scheduling purposes. There's nothing wrong with using a wrong value in an instruction hoisted across a branch if the result of that instruction will (eventually) be dead. We do this all the time in sched-ebb. The patch needlessly restricts scheduling opportunities and should be reverted. The real problem here is that the ia64 backend lies to the rest of the compiler; it produces a load instruction without showing a MEM anywhere in the instruction pattern. Hence, the following patch, which reverts the bogus scheduler changes and adds a MEM to a pattern in ia64.md. The representation is somewhat weird, but no more so than before, where the load was represented as a plain LO_SUM. The documentation I have doesn't mention ld8.mov, so I'll have to leave it to the ia64 maintainers to choose between this change and implementing something more involved.
Bootstrapped and tested without regressions on ia64-linux with languages=c,c++,obj,fortran and --disable-shared; that was what I could get to work at all at the moment. Even then, C++ seems pretty broken. I've also built a 3.3 cross-cc1 to see if the original bug is still fixed (it is). Ok (scheduler & ia64 bits)?

Bernd

	Revert
	2003-07-10  Eric Botcazou
	PR optimization/11320
	* sched-int.h (struct deps) [reg_conditional_sets]: New field.
	(struct sched_info) [compute_jump_reg_dependencies]: New prototype.
	* sched-deps.c (sched_analyze_insn) [JUMP_INSN]: Update call to
	current_sched_info->compute_jump_reg_dependencies.  Record which
	registers are used and which registers are set by the jump.  Clear
	deps->reg_conditional_sets after a barrier.  Set
	deps->reg_conditional_sets if the insn is a COND_EXEC.  Clear
	deps->reg_conditional_sets if the insn is not a COND_EXEC.
	(init_deps): Initialize reg_conditional_sets.
	(free_deps): Clear reg_conditional_sets.
	* sched-ebb.c (compute_jump_reg_dependencies): New prototype.
	Mark registers live on entry of the fallthrough block and
	conditionally set as set by the jump.  Mark registers live on entry
	of non-fallthrough blocks as used by the jump.
	* sched-rgn.c (compute_jump_reg_dependencies): New prototype.
	Mark new parameters as unused.

	* config/ia64/ia64.md (load_symptr_low): Show a MEM.
	* config/ia64/ia64.c (ia64_expand_load_address): Generate it.

Index: gcc/sched-ebb.c
===
--- gcc/sched-ebb.c	(revision 176195)
+++ gcc/sched-ebb.c	(working copy)
@@ -238,28 +238,18 @@
       return 1;
 }

-/* INSN is a JUMP_INSN, COND_SET is the set of registers that are
-   conditionally set before INSN.  Store the set of registers that
-   must be considered as used by this jump in USED and that of
-   registers that must be considered as set in SET.  */
+/* INSN is a JUMP_INSN.  Store the set of registers that
+   must be considered as used by this jump in USED.  */

 void
-ebb_compute_jump_reg_dependencies (rtx insn, regset cond_set, regset used,
-				   regset set)
+ebb_compute_jump_reg_dependencies (rtx insn, regset used)
 {
   basic_block b = BLOCK_FOR_INSN (insn);
   edge e;
   edge_iterator ei;

   FOR_EACH_EDGE (e, ei, b->succs)
-    if (e->flags & EDGE_FALLTHRU)
-      /* The jump may be a by-product of a branch that has been merged
-	 in the main codepath after being conditionalized.  Therefore
-	 it may guard the fallthrough block from using a value that has
-	 conditionally overwritten that of the main codepath.  So we
-	 consider that it restores the value of the main codepath.  */
-      bitmap_and (set, df_get_live_in (e->dest), cond_set);
-
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Richard Guenther : > > Btw, rather than a new target hook and an option I suggest to use > a --param which default you can modify in the backend. > > Richard. > The introduced target hook does not just define a default value for the option. It is meant to make a decision in each particular case. For now it returns a constant value, but march-dependent heuristics will be added later. Ilya
Re: [build] Remove crt0, mcrt0 support
Richard, >> what's your take on this? > > I'm fine with it if you install a deprecation patch on the 4.6 branch > and mention that in the 4.6 changes.html. ok, will do. Thanks. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [build] Move crtfastmath to toplevel libgcc
"H.J. Lu" writes: > On Wed, Jul 13, 2011 at 10:12 AM, Rainer Orth > wrote: >> Richard Henderson writes: >> >>> On 07/13/2011 09:57 AM, Rainer Orth wrote: Do you think the revised crtfastmath patch is safe enough to commit together to avoid this mess? >>> >>> Probably. >> >> Ok. I'll take this on me to get us out of this mess. It has >> survived i386-pc-solaris2.11, sparc-sun-solaris2.11, >> x86_64-unknown-linux-gnu, and i386-apple-darwin9.8.0 bootstraps, so the >> risk seems acceptable. >> +# -frandom-seed is necessary to keep the mangled name of the constructor on +# Tru64 Unix stable, but harmless otherwise. >>> >>> Instead of implying permanent stability, I'd mention bootstrap comparison >>> failures specifically. >> >> Ok, will do. > > I think your patch caused: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49739 Same on ia64: Configuration mismatch! Extra parts from gcc directory: crtbegin.o crtbeginS.o crtend.o crtendS.o Extra parts from libgcc: crtbegin.o crtend.o crtbeginS.o crtendS.o crtfastmath.o Andreas. -- Andreas Schwab, sch...@redhat.com GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E "And now for something completely different."
[Patch, Fortran] Add caf_runtime_error to libgfortran/caf/single.c
This patch adds a caf_runtime_error function to the non-MPI implementation of Coarray Fortran. It is based on the MPI function of the same name in mpi.c. Ok to commit?

ChangeLog:
2011-07-14  Daniel Carrera

	* caf/single.c: Include stdarg.h header.
	(caf_runtime_error): New function based on the function
	in mpi.c with the same name.
	(_gfortran_caf_init): Use caf_runtime_error.
	* caf/mpi.c (caf_runtime_error): Add a note to keep in sync
	with the function in single.c.

--
I'm not overweight, I'm undertall.

Index: libgfortran/caf/single.c
===
--- libgfortran/caf/single.c	(revision 176230)
+++ libgfortran/caf/single.c	(working copy)
@@ -28,6 +28,7 @@ see the files COPYING3 and COPYING.RUNTI
 #include <stdio.h>	/* For fputs and fprintf.  */
 #include <stdlib.h>	/* For exit and malloc.  */
 #include <string.h>	/* For memcpy and memset.  */
+#include <stdarg.h>	/* For variadic arguments.  */

 /* Define GFC_CAF_CHECK to enable run-time checking.  */
 /* #define GFC_CAF_CHECK  1 */
@@ -40,6 +41,21 @@ see the files COPYING3 and COPYING.RUNTI

 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with mpi.c.  */
+static void
+caf_runtime_error (int error, const char *message, ...)
+{
+  va_list ap;
+  fprintf (stderr, "Fortran runtime error.");
+  va_start (ap, message);
+  fprintf (stderr, message, ap);
+  va_end (ap);
+  fprintf (stderr, "\n");
+
+  /* FIXME: Shutdown the Fortran RTL to flush the buffer.  PR 43849.  */
+  exit (error);
+}
+
 void
 _gfortran_caf_init (int *argc __attribute__ ((unused)),
		     char ***argv __attribute__ ((unused)),
@@ -73,12 +89,12 @@ _gfortran_caf_register (ptrdiff_t size,

   if (unlikely (local == NULL || token == NULL))
     {
+      const char msg[] = "Failed to allocate coarray";
       if (stat)
	{
	  *stat = 1;
	  if (errmsg_len > 0)
	    {
-	      const char msg[] = "Failed to allocate coarray";
	      int len = ((int) sizeof (msg) > errmsg_len)
			? errmsg_len : (int) sizeof (msg);
	      memcpy (errmsg, msg, len);
@@ -88,10 +104,7 @@ _gfortran_caf_register (ptrdiff_t size,
	  return NULL;
	}
       else
-	{
-	  fprintf (stderr, "ERROR: Failed to allocate coarray");
-	  exit (1);
-	}
+	caf_runtime_error (1, msg);
     }

   if (stat)

Index: libgfortran/caf/mpi.c
===
--- libgfortran/caf/mpi.c	(revision 176230)
+++ libgfortran/caf/mpi.c	(working copy)
@@ -47,6 +47,7 @@ static int caf_is_finalized;

 caf_static_t *caf_static_list = NULL;

+/* Keep in sync with single.c.  */
 static void
 caf_runtime_error (int error, const char *message, ...)
 {
@@ -62,7 +63,7 @@ caf_runtime_error (int error, const char
   MPI_Abort (MPI_COMM_WORLD, error);

   /* Should be unreachable, but to make sure also call exit.  */
-  exit (2);
+  exit (error);
 }
Re: [build] Remove crt0, mcrt0 support
On Thu, 14 Jul 2011, Rainer Orth wrote: > Richard, > > [Talking about an immediate removal of the netware target...] > > > "Jan Beulich" writes: > > > > Rainer Orth 07/13/11 4:34 PM >>> > >>>which variant would you prefer: obsoletion now and removal in 4.8 or > >>>immediate removal? > >> > >> Both are fine with me, so unless someone else objects immediate removal > >> would seem better given it had been pretty much unmaintained. > > > > Right: it would be a one-time effort to remove the support, but > > subsequent cleanups wouldn't have to deal with the effectively dead > > code. > > > > I had a quick look and it doesn't seem hard: apart from removing the > > netware-specific files in gcc and libgcc (and corresponding gcc/config.gcc > > and libgcc/config.host changes), there's only a small list (apart from > > netware-related target triplets in the testsuite): > > > > config/elf.m4 > > configure.ac > > contrib/config-list.mk > > gcc/config/i386/i386.c > > gcc/config/i386/i386.h > > gcc/doc/extend.texi > > libstdc++-v3/crossconfig.m4 > > > > configure.ac may have to stay if binutils/src wants to retain the > > report, but that's about it. > > > > Let's see what the release managers/global reviewers think. > > what's your take on this? I'm fine with it if you install a deprecation patch on the 4.6 branch and mention that in the 4.6 changes.html. Richard. -- Richard Guenther Novell / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: [RFC PATCH] -grecord-gcc-switches (PR other/32998)
On Wed, Jul 13, 2011 at 10:30:58AM -0400, Jason Merrill wrote: > On 07/13/2011 10:06 AM, Jakub Jelinek wrote: > >>>--- gcc/testsuite/lib/dg-pch.exp.jj2011-01-03 18:58:03.0 +0100 > >>>+++ gcc/testsuite/lib/dg-pch.exp 2011-07-12 23:13:50.943670171 +0200 > >>>- dg-test -keep-output "./$bname$suffix" "$otherflags $flags" "" > >>>+ dg-test -keep-output "./$bname$suffix" "-gno-record-gcc-switches >>>$otherflags $flags" "" > >> > >It is only necessary if somebody wants to make -grecord-gcc-switches > >the default (for bootstrap/regtest I've tweaked common.opt to do that > >to test it better). PCH is a big mess and screws debuginfo in many ways, > >in this case it was just small differences in DW_AT_producer, but > >we have e.g. ICEs with PCH and -feliminate-dwarf-dups etc. > > Why would PCH change DW_AT_producer? Because we're restoring > single_comp_unit_die from the PCH? Then perhaps we should set > DW_AT_producer in output_comp_unit rather than gen_compile_unit_die. output_comp_unit is too late, because at that point the form has been finalized and sizes calculated, etc. But at the start of dwarf2out_finish it is possible. Here is a PCH-friendly variant of the patch, which tries to set the right producer from the start, but double-checks early in dwarf2out_finish whether PCH has changed it and, if it did, updates it back to the expected string.

2011-07-14  Jakub Jelinek

	PR other/32998
	* common.opt (grecord-gcc-switches, gno-record-gcc-switches): New
	options.
	* dwarf2out.c: Include opts.h.
	(dchar_p): New typedef.  Define heap VEC for it.
	(producer_string): New variable.
	(gen_producer_string): New function.
	(gen_compile_unit_die): Use it.
	(dwarf2out_finish): Fix up comp_unit_die () DW_AT_producer if needed.
	* Makefile.in (dwarf2out.o): Depend on $(OPTS_H).
--- gcc/common.opt.jj	2011-07-13 17:31:09.0 +0200
+++ gcc/common.opt	2011-07-14 10:36:40.0 +0200
@@ -2184,6 +2184,14 @@
 ggdb
 Common JoinedOrMissing
 Generate debug information in default extended format

+gno-record-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,0) Init(0)
+Don't record gcc command line switches in DWARF DW_AT_producer.
+
+grecord-gcc-switches
+Common RejectNegative Var(dwarf_record_gcc_switches,1)
+Record gcc command line switches in DWARF DW_AT_producer.
+
 gstabs
 Common JoinedOrMissing Negative(gstabs+)
 Generate debug information in STABS format
--- gcc/dwarf2out.c.jj	2011-07-13 17:31:18.0 +0200
+++ gcc/dwarf2out.c	2011-07-14 11:04:50.0 +0200
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "tree-flow.h"
 #include "cfglayout.h"
+#include "opts.h"

 static void dwarf2out_source_line (unsigned int, const char *, int, bool);
 static rtx last_var_location_insn;
@@ -18108,13 +18109,125 @@ gen_ptr_to_mbr_type_die (tree type, dw_d
   add_type_attribute (ptr_die, TREE_TYPE (type), 0, 0, context_die);
 }

+typedef const char *dchar_p; /* For DEF_VEC_P.  */
+DEF_VEC_P(dchar_p);
+DEF_VEC_ALLOC_P(dchar_p,heap);
+
+static char *producer_string;
+
+/* Return a heap allocated producer string including command line options
+   if -grecord-gcc-switches.  */
+
+static char *
+gen_producer_string (void)
+{
+  size_t j;
+  VEC(dchar_p, heap) *switches = NULL;
+  const char *language_string = lang_hooks.name;
+  char *producer, *tail;
+  const char *p;
+  size_t len = dwarf_record_gcc_switches ? 0 : 3;
+  size_t plen = strlen (language_string) + 1 + strlen (version_string);
+
+  for (j = 1; dwarf_record_gcc_switches && j < save_decoded_options_count; j++)
+    switch (save_decoded_options[j].opt_index)
+      {
+      case OPT_o:
+      case OPT_d:
+      case OPT_dumpbase:
+      case OPT_dumpdir:
+      case OPT_auxbase:
+      case OPT_auxbase_strip:
+      case OPT_quiet:
+      case OPT_version:
+      case OPT_v:
+      case OPT_w:
+      case OPT_L:
+      case OPT_D:
+      case OPT_I:
+      case OPT_U:
+      case OPT_SPECIAL_unknown:
+      case OPT_SPECIAL_ignore:
+      case OPT_SPECIAL_program_name:
+      case OPT_SPECIAL_input_file:
+      case OPT_grecord_gcc_switches:
+      case OPT_gno_record_gcc_switches:
+      case OPT__output_pch_:
+      case OPT_fdiagnostics_show_location_:
+      case OPT_fdiagnostics_show_option:
+      case OPT:
+      case OPT__sysroot_:
+	/* Ignore these.  */
+	continue;
+      default:
+	if (save_decoded_options[j].orig_option_with_args_text[0] != '-')
+	  continue;
+	switch (save_decoded_options[j].orig_option_with_args_text[1])
+	  {
+	  case 'M':
+	  case 'i':
+	  case 'W':
+	    continue;
+	  case 'n':
+	    if (save_decoded_options[j].orig_option_with_args_text[2] == 'o')
+	      continue;
+	    break;
+	  case 'f':
+	    if (strncmp (save_decoded_options[j].orig_option_with_args_te
Re: [PATCH] widening_mul: Do cost check when propagating mult into plus/minus expressions
On Wed, Jul 13, 2011 at 11:49 PM, Steven Bosscher wrote: > On Wed, Jul 13, 2011 at 4:34 PM, Richard Guenther > wrote: >> On Wed, Jul 13, 2011 at 3:13 PM, Andreas Krebbel >> wrote: >>> Hi, >>> >>> the widening_mul pass might increase the number of multiplications in >>> the code by transforming >>> >>> a = b * c >>> d = a + 2 >>> e = a + 3 >>> >>> into: >>> >>> d = b * c + 2 >>> e = b * c + 3 >>> >>> under the assumption that an FMA instruction is not more expensive >>> than a simple add. This certainly isn't always true. While e.g. on >>> s390 an fma is indeed not slower than an add execution-wise, it has >>> disadvantages regarding instruction grouping. It doesn't group with >>> any other instruction, which has a major impact on the instruction >>> dispatch bandwidth. >>> >>> The following patch tries to figure out the costs for adds, mults and >>> fmas by building an RTX and asking the backend's cost function in order >>> to estimate whether it is worthwhile doing the transformation. >>> >>> With that patch the 436.cactus hotloop contains 28 less >>> multiplications than before, increasing performance slightly (~2%). >>> >>> Bootstrapped and regtested on x86_64 and s390x. >> >> Ick ;) > > +1 > >> Maybe this is finally the time to introduce target hook(s) to >> get us back costs for trees? For this case we'd need two >> actually, or just one - dependent on what finegrained information >> we pass. Choices: >> >> tree_code_cost (enum tree_code) >> tree_code_cost (enum tree_code, enum machine_mode mode) >> unary_cost (enum tree_code, tree actual_arg0) // args will be mostly >> SSA names or constants, but at least they are typed - works for >> mixed-typed operations >> binary_cost (...) >> ... >> unary_cost (enum tree_code, enum tree_code arg0_kind) // constant >> vs. non-constant arg, but lacks type/mode > > Or maybe add a cost function for all named insns (i.e. > http://gcc.gnu.org/onlinedocs/gccint/Standard-Names.html#Standard-Names)?
> I think that any form of lower GIMPLE will not be so low level that > more combinations will exist than the available named patterns. It > should be possible to write a gen* tool using rtx_costs to compute > some useful cost metric for all named patterns. How complicated that > could be (modes, reg vs. mem, etc.), I don't know... But at least that > way we don't end up with multiple target costs depending on the IR in > use. Yeah, it occurred to me as well that when we look for supportable operations via optabs, the same mechanism should also be able to provide a cost, maybe as simple as attaching one to the named expanders. Generating RTL from GIMPLE passes just to be able to use rtx_cost is, well ... gross. Yes, we do it in IVOPTs (and that case is even more complex), but I don't think we want to start doing it elsewhere (look how the vectorizer for example uses new target hooks instead of generating vectorized RTL and then using rtx_cost). Richard. > Ciao! > Steven >
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
On Thu, Jul 14, 2011 at 11:31 AM, Richard Guenther wrote: > 2011/7/14 Michael Meissner : >> One minor note, you will need to update doc/invoke.texi to document the new >> switch before checkin: -ftree-reassoc-width= > > You also need approval and somebody to review the patch before checkin. Btw, rather than a new target hook and an option I suggest to use a --param which default you can modify in the backend. Richard. > Richard. > >> -- >> Michael Meissner, IBM >> 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA >> meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899 >> >
Re: [PATCH Atom][PR middle-end/44382] Tree reassociation improvement
2011/7/14 Michael Meissner : > One minor note, you will need to update doc/invoke.texi to document the new > switch before checkin: -ftree-reassoc-width= You also need approval and somebody to review the patch before checkin. Richard. > -- > Michael Meissner, IBM > 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA > meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899 >