[Bug other/63155] [4.9/5 Regression] memory hog

2014-10-30 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.9.2                       |4.9.3

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 4.9.2 has been released.


[Bug other/63155] [4.9/5 Regression] memory hog

2014-09-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-09-03
   Target Milestone|---                         |4.9.2
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Clearly caused by the correctness fix for setjmp to wire abnormal edges.

For me it is out-of-ssa which uses too much memory while building
the conflict graph.

We have gigantic PHI nodes here:

_10263(ab) = PHI <_109925(D)(ab)(2), ..., _10592(ab)(1489)>

it's fast when optimizing.

At -O0 we have a _lot_ more anonymous SSA names.

-O1:

  <bb 4>:
  # _1(ab) = PHI <_1902(3), _2(ab)(5)>
  _1905 = _setjmp (_1(ab));
  if (_1905 == 0)
    goto <bb 6>;
  else
    goto <bb 8>;

  <bb 5>:
  # _2(ab) = PHI <_1895(D), ...>   (a single gigantic PHI)

-O0:

  <bb 4>:
  # _1(ab) = PHI <_398164(3), _2(ab)(5)>
  # _632(ab) = PHI <_397532(D)(ab)(3), _633(ab)(5)>
  # _1263(ab) = PHI <_397533(D)(ab)(3), _1264(ab)(5)>
  # _1894(ab) = PHI <_397534(D)(ab)(3), _1895(ab)(5)>
  # _2525(ab) = PHI <_397535(D)(ab)(3), _2526(ab)(5)>
...
  # _396900(ab) = PHI <_398160(D)(ab)(3), _396901(ab)(5)>
  _398165 = _setjmp (_1(ab));
  if (_398165 == 0)
    goto <bb 6>;
  else
    goto <bb 8>;

  <bb 5>:
  # _2(ab) = PHI <_397531(D)(ab)(2)...>

  # _396901(ab) = PHI <_398160(D)(ab)(2), _3...>

A gazillion gigantic PHIs, and very many PHIs in every block.
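
A back-of-the-envelope model of why this blows up: every PHI in a block
carries exactly one argument per incoming edge, so the dispatcher block
holds (number of abnormal SSA names) x (number of incoming abnormal edges)
PHI arguments. The function name and the example figures below are
illustrative only, not measurements from the testcase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total PHI arguments in one block: each PHI has one argument per
   incoming edge.  toy_phi_args is a made-up helper, not a GCC function.  */
static size_t toy_phi_args(size_t n_phis, size_t n_incoming_edges)
{
  /* Guard against overflow of the product.  */
  assert(n_incoming_edges == 0 || n_phis <= SIZE_MAX / n_incoming_edges);
  return n_phis * n_incoming_edges;
}
```

With one abnormal name and a hypothetical 1000 incoming edges,
toy_phi_args(1, 1000) gives 1000 arguments; with 630 names it gives
630000, before the conflict graph is even built on top of them.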

Into-SSA is what introduces the difference in the PHI nodes, but the
underlying issue is that gimplification already introduces very many
more temporaries at -O0 (the !optimize check in lookup_tmp_var).

Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c  (revision 214810)
+++ gcc/gimplify.c  (working copy)
@@ -476,7 +476,7 @@ lookup_tmp_var (tree val, bool is_formal
      block, which means it will go into memory, causing much extra
      work in reload and final and poorer code generation, outweighing
      the extra memory allocation here.  */
-  if (!optimize || !is_formal || TREE_SIDE_EFFECTS (val))
+  if (!is_formal || TREE_SIDE_EFFECTS (val))
     ret = create_tmp_from_val (val);
   else
     {
fixes it (but it means that changing the testcase to use more distinct
user variables would produce the same issue even when optimizing).
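
To make the effect of that condition concrete, here is a toy model in C.
It is not GCC code: the table, the integer "value keys" and every toy_*
name are invented for illustration, whereas the real lookup_tmp_var works
on trees. The sketch only shows that with the !optimize test each lookup
of the same formal value gets a fresh temporary at -O0, while without it
the temporary is reused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_TEMPS 64

/* Toy stand-in for the table of reusable formal temporaries.  */
struct toy_tmp_table {
  int keys[TOY_MAX_TEMPS];    /* hypothetical value identifiers */
  size_t ids[TOY_MAX_TEMPS];  /* temporary assigned to each key */
  size_t n_entries;           /* used slots in keys[]/ids[] */
  size_t n_created;           /* temporaries handed out so far */
};

/* Always hand out a fresh temporary (models create_tmp_from_val).  */
static size_t toy_create_tmp(struct toy_tmp_table *t)
{
  return t->n_created++;
}

/* check_optimize == true models the old condition
   (!optimize || !is_formal || side_effects); false models the patched
   condition (!is_formal || side_effects).  */
static size_t toy_lookup_tmp(struct toy_tmp_table *t, int val_key,
                             bool is_formal, bool side_effects,
                             bool optimize, bool check_optimize)
{
  assert(t != NULL);
  if ((check_optimize && !optimize) || !is_formal || side_effects)
    return toy_create_tmp(t);

  /* Reuse the temporary already recorded for this value, if any.  */
  for (size_t i = 0; i < t->n_entries; i++)
    if (t->keys[i] == val_key)
      return t->ids[i];

  size_t id = toy_create_tmp(t);
  if (t->n_entries < TOY_MAX_TEMPS) {
    t->keys[t->n_entries] = val_key;
    t->ids[t->n_entries] = id;
    t->n_entries++;
  }
  return id;
}

/* Number of temporaries created by `repeats` lookups of one value.  */
static size_t toy_count_temps(size_t repeats, bool optimize,
                              bool check_optimize)
{
  struct toy_tmp_table t = {0};
  for (size_t i = 0; i < repeats; i++)
    toy_lookup_tmp(&t, 42, true, false, optimize, check_optimize);
  return t.n_created;
}
```

In this model toy_count_temps(100, false, true) creates 100 temporaries
(old condition at -O0), while toy_count_temps(100, false, false) creates
one (patched condition).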


[Bug other/63155] [4.9/5 Regression] memory hog

2014-09-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I wonder why we need to explicitly represent abnormal PHIs in the dispatcher.
All incoming edges are abnormal and all SSA names have to be coalesced anyway.
Thus we could instead have

  <bb 5>:
/* Not: # _2(ab) = PHI <_17(D)(ab)(2), _1(ab)(6), _1(ab)(7), _3(ab)(11),
_3(ab)(12), _4(ab)(15), _4(ab)(16), _5(ab)(20), _5(ab)(21), _5(ab)(22)> */
  ABNORMAL_DISPATCHER (0);
  _2(ab) = D.12345;

or simply rewrite all must-coalesce vars out-of-SSA?  (or not into SSA
in the first place)

The question is whether accesses to them should be loads/stores (I think so)
and if that will cause other similar issues.

We'd have to factor abnormal edges into a block to a separate forwarder
of course, with a load of all abnormal vars.

Anyway, not sure why the gimplify code is disabled for -O0 (or why we
don't re-use formal temps more aggressively as they become anonymous
SSA names later anyway).


[Bug other/63155] [4.9/5 Regression] memory hog

2014-09-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
So the issue is that the setjmp argument needs two temporaries:

  D.2832 = Unity.CurrentAbortFrame;
  D.2833 = Unity.AbortFrame[D.2832];

  <bb 18>:
  D.2834 = _setjmp (D.2833);

and the EH edge going into the _setjmp call has to merge those through
the abnormal dispatcher.  And that way it receives all of them.  Hmm.

Huh.  Without the abnormal dispatcher they should just get default defs
everywhere (but still many PHI nodes).  Maybe that would be more light-weight.
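
For reference, a source pattern that would gimplify to the two
temporaries shown above might look like the following. This is a
hypothetical reduction, not the attached testcase: only the names
Unity, CurrentAbortFrame and AbortFrame come from the dump, and the
struct layout, TOY_MAX_FRAMES and the toy_* helpers are assumptions.

```c
#include <assert.h>
#include <setjmp.h>

#define TOY_MAX_FRAMES 4

/* Assumed layout: an index plus an array of jump buffers.  */
struct toy_unity {
  unsigned CurrentAbortFrame;
  jmp_buf AbortFrame[TOY_MAX_FRAMES];
};

static struct toy_unity Unity;

/* Jump back to the currently active frame.  */
static void toy_abort(void)
{
  longjmp(Unity.AbortFrame[Unity.CurrentAbortFrame], 1);
}

/* Returns 0 if body finished normally, 1 if it aborted via longjmp.
   The setjmp argument needs one temporary for the index load and one
   for the array element, matching D.2832 and D.2833 in the dump.  */
static int toy_run_protected(void (*body)(void))
{
  assert(Unity.CurrentAbortFrame < TOY_MAX_FRAMES);
  if (setjmp(Unity.AbortFrame[Unity.CurrentAbortFrame]) == 0) {
    body();
    return 0;
  }
  return 1;
}

static void toy_ok(void) { }
static void toy_fail(void) { toy_abort(); }
```

A function containing many such calls gets one abnormal edge per setjmp,
and at -O0 each call contributes its own pair of temporaries to the
dispatcher.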