http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904

--- Comment #38 from rguenther at suse dot de <rguenther at suse dot de> 
2011-12-05 08:27:08 UTC ---
On Fri, 2 Dec 2011, ebotcazou at gcc dot gnu.org wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50904
> 
> --- Comment #35 from Eric Botcazou <ebotcazou at gcc dot gnu.org> 2011-12-02 
> 21:21:15 UTC ---
> > One thing I notice (and that's the only difference I can spot at the tree
> > level) is that we do not CSE the **2s of
> 
> There are many missed hoisting opportunities, with or without the switch. 
> There are just a few more with the switch, hence the performance regression.

Most of them (not sure whether these are the ones you mean) are
missed because LIM considers them "cheap" and thus does not move them:

vect_px2gauss.123_641 = &x2gauss;
  invariant up to level 1, cost 1.

vect_cst_.126_659 = { 1.0e+0, 1.0e+0 };
  invariant up to level 1, cost 1.

vect_cst_.128_661 = {D.2126_109, D.2126_109};
  invariant up to level 1, cost 1.
...

vect_px2gauss.120_20 = vect_px2gauss.123_641 + 16;
  invariant up to level 1, cost 2.
...

ivtmp.176_899 = 1;
  invariant up to level 1, cost 1.
...

vect_px2gauss.120_649 = vect_px2gauss.120_336;
  invariant up to level 1, cost 5.
...

ISTR a discussion about removing all cost considerations from tree
level loop invariant motion and simply moving everything possible
(PRE, for example, doesn't consider any costs and moves all
invariants).

If you use --param lim-expensive=1 you get all invariants moved
on the tree level - does that solve the slowdown issue?
The catch is of course that this might increase register pressure,
as we are not good at re-materializing, for example, constants
inside a loop.
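
To make the trade-off concrete, here is a minimal sketch (not taken
from the testcase in this PR; the function names and the array layout
are made up for illustration). Both invariants in the first loop are
single cheap operations, similar in spirit to the "+ 16" address
computation and the constant vectors in the dump above, so LIM's cost
model may decide to leave them in place. The second function shows
what the code looks like once they are hoisted by hand.

```c
#include <stddef.h>

/* Hypothetical example: `base + 2` and `scale + 1.0` do not depend on
   the loop counter, but each is only one cheap operation, so a
   cost-based invariant motion pass may decide not to move them.  */
double sum_in_loop(const double *base, size_t n, double scale)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double *p = base + 2;  /* invariant address computation */
        double k = scale + 1.0;      /* invariant arithmetic */
        sum += p[i] * k;
    }
    return sum;
}

/* Same computation with both invariants hoisted by hand.  `p` and `k`
   are now live across the whole loop, which is where the extra
   register pressure comes from.  */
double sum_hoisted(const double *base, size_t n, double scale)
{
    const double *p = base + 2;
    double k = scale + 1.0;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i] * k;
    return sum;
}
```

Compiling something like this with "gcc -O2 --param lim-expensive=1
-fdump-tree-all-details" and comparing the LIM dump against a run
without the --param should show whether the cheap invariants get
moved; the exact dump contents depend on the GCC version.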

I'll give --param lim-expensive=1 a try on SPEC 2k6.
