http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46590

--- Comment #13 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-11-24 13:50:21 UTC ---
The I/O parts of the FRE cost are due to value-numbering stores in
visit_reference_op_store.  They can be cut drastically by an equivalent
of the following that does not generate code:

Index: trans-io.c
===================================================================
--- trans-io.c  (revision 167111)
+++ trans-io.c  (working copy)
@@ -1670,6 +1670,7 @@ build_dt (tree function, gfc_code * code
   gfc_init_block (&post_iu_block);

   var = gfc_create_var (st_parameter[IOPARM_ptype_dt].type, "dt_parm");
+  gfc_add_modify (&block, var, build_constructor (TREE_TYPE (var), NULL));

   set_error_locus (&block, var, &code->loc);

which constrains where the lifetime of the dt_parm structs begins (they
have their address taken, so lifetime analysis has a hard time with them).
That prevents store value numbering from considering all dominated blocks
(which only contain the loops).  The above brings the previous -O1 numbers
down to

 tree FRE              :  28.34 (18%) usr   0.39 (18%) sys  28.83 (18%) wall    1906 kB ( 1%) ggc
 TOTAL                 : 159.87             2.17           162.72             201091 kB

All other variables are re-used, and thus their lifetime isn't limited.
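
As a rough illustration of why the whole-struct store helps, here is a toy
model in C.  It is only a sketch, not GCC code: the statement kinds and
walk_back are hypothetical names, and the real walk in FRE follows virtual
operands rather than a flat statement array.  The point is just that a
backward walk looking for the defining store can stop at the first statement
that overwrites the whole variable.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement kinds; these are NOT GCC data structures.  */
enum stmt_kind {
  STMT_OTHER,       /* anything that does not fully define the variable */
  STMT_CLOBBER_ALL  /* a store that overwrites the whole variable */
};

/* Walk backwards from the statement just before index FROM, looking for
   the nearest store that fully defines the variable.  Returns the number
   of statements visited, including the one that ends the walk.  */
size_t
walk_back (const enum stmt_kind *stmts, size_t from)
{
  size_t visited = 0;

  assert (stmts != NULL);
  for (size_t i = from; i-- > 0; )
    {
      visited++;
      if (stmts[i] == STMT_CLOBBER_ALL)
        break;  /* Nothing earlier can be relevant for this variable.  */
    }
  return visited;
}
```

With eight STMT_OTHER statements, walk_back (stmts, 8) visits all eight; if
stmts[5] is a STMT_CLOBBER_ALL it visits only three.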

The loads from original[] are the only unconstrained ones; handling
those exhibits quadratic behavior.  The walk goes up to the first statement
in the function, which is

  MEM[(c_char * {ref-all})&original] = MEM[(c_char * {ref-all})&A.0];

and then fails to constant fold using the static initializer of A.0.

FRE optimizes away the self-assignments

  a(1:6:-5) =  a(1:6:-5)

but it doesn't do anything useful for the other loops.

Overall, what FRE does for a, b and original is not completely unreasonable
(we could have propagated A.0 to all uses of original).  The I/O struct
walks are the only thing that would be nice to fix.

And of course maybe limit walking in general.
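
To make the quadratic behavior and the effect of a walk limit concrete, here
is a small sketch in C.  It assumes a toy cost model (one load per statement,
each walking back towards the start of the function) and uses the
hypothetical names walk_cost and total_cost; it does not reflect how GCC
implements, or would implement, such a limit.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of one backward walk from the statement just before index FROM to
   the start of the function, giving up after LIMIT statements.
   LIMIT == 0 means "no limit" in this toy model.  */
size_t
walk_cost (size_t from, size_t limit)
{
  size_t visited = 0;

  for (size_t i = from; i-- > 0; )
    {
      if (limit != 0 && visited == limit)
        break;  /* Budget exhausted: give up on this query.  */
      visited++;
    }
  assert (limit == 0 || visited <= limit);
  return visited;
}

/* Total cost when N_LOADS loads, at positions 1..N_LOADS, each perform
   such a walk.  Without a limit this is N_LOADS * (N_LOADS + 1) / 2.  */
size_t
total_cost (size_t n_loads, size_t limit)
{
  size_t total = 0;

  for (size_t pos = 1; pos <= n_loads; pos++)
    total += walk_cost (pos, limit);
  return total;
}
```

In this model total_cost (100, 0) is 5050, while total_cost (100, 10) is 955:
the limit turns the quadratic total into a linear one, at the price of giving
up on the loads whose definition lies further back than the limit.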
