Re: [PATCH][0/4][RFC] RPO style value-numbering

2018-08-24 Thread Richard Biener
On Wed, 1 Aug 2018, Richard Biener wrote:

> 
> This rewrites the value-numbering algorithm used for FRE and PRE from
> SSA SCC based to RPO based, thus switching from an algorithm that
> handles SSA SCCs optimistically to one that handles CFG SCCs 
> optimistically.
> 
> The main motivation for this besides being more optimistic was that
> adding CFG context sensitive info is easier in RPO style.  Also
> tracking availability and thus making expression simplification not
> based on values like with SCCVN is possible which allows us to remove
> all the warts that scrap side-info we store on SSA names.  It also
> fixes PR86554 which is another manifestation of the same issue.
> 
> Another motivation was that we're in the need of applying value-numbering
> on regions like when unrolling loops or as part of cleanup on code
> generated by other passes like the vectorizer.  Thus this rewrite
> makes sure that the value-numbering works efficiently on regions
> (though in a non-iterative mode), avoiding work and space that is
> on the order of the function size rather than the region size to work on.
> Sofar the GIMPLE unroller makes use of this, scrapping its own
> simple constant propagation engine.  I expect that DOM could get rid of
> its value-numbering and instead use a non-iterative RPO-VN run as well.
> 
> The patch adds something called predication but it just implements
> what I put on top of SCCVN to not regress in that area.
> 
> With more optimistic handling comes compile-time regressions and
> without limiting I can observe for example a 8% compile-time regression
> on 416.gamess which contains loop depths exceeding 8.  The patch now
> contains heuristics to selectively value-number backedges optimistically
> or not and chooses to do so for the innermost 3 and the outermost loop
> of a nest (controlled by --param rpo-vn-max-loop-depth).  I have not
> yet played with other values of the param nor re-measured compile-time
> for SPEC 2k6.
> 
> I've bootstrapped and tested the series on x86_64-unknown-linux-gnu
> with bootstrap-O1 and regular bootstrap.
> 
> I plan to go forward with this for GCC 9.

This.  I'm momentarily installing 1/4, will re-post 4/4 and install
that on Monday.

Richard.


[PATCH][0/4][RFC] RPO style value-numbering

2018-08-01 Thread Richard Biener


This rewrites the value-numbering algorithm used for FRE and PRE from
SSA SCC based to RPO based, thus switching from an algorithm that
handles SSA SCCs optimistically to one that handles CFG SCCs 
optimistically.

The main motivation for this besides being more optimistic was that
adding CFG context sensitive info is easier in RPO style.  Also
tracking availability and thus making expression simplification not
based on values like with SCCVN is possible which allows us to remove
all the warts that scrap side-info we store on SSA names.  It also
fixes PR86554 which is another manifestation of the same issue.

Another motivation was that we're in the need of applying value-numbering
on regions like when unrolling loops or as part of cleanup on code
generated by other passes like the vectorizer.  Thus this rewrite
makes sure that the value-numbering works efficiently on regions
(though in a non-iterative mode), avoiding work and space that is
on the order of the function size rather than the region size to work on.
Sofar the GIMPLE unroller makes use of this, scrapping its own
simple constant propagation engine.  I expect that DOM could get rid of
its value-numbering and instead use a non-iterative RPO-VN run as well.

The patch adds something called predication but it just implements
what I put on top of SCCVN to not regress in that area.

With more optimistic handling comes compile-time regressions and
without limiting I can observe for example a 8% compile-time regression
on 416.gamess which contains loop depths exceeding 8.  The patch now
contains heuristics to selectively value-number backedges optimistically
or not and chooses to do so for the innermost 3 and the outermost loop
of a nest (controlled by --param rpo-vn-max-loop-depth).  I have not
yet played with other values of the param nor re-measured compile-time
for SPEC 2k6.

I've bootstrapped and tested the series on x86_64-unknown-linux-gnu
with bootstrap-O1 and regular bootstrap.

I plan to go forward with this for GCC 9.

Comments?

Thanks,
Richard.