Re: Merging CCP and VRP?

2005-03-26 Thread Diego Novillo
On Sat, Mar 26, 2005 at 12:00:43PM -0500, Kazu Hirata wrote:

> Have you considered merging CCP and VRP (as suggested by Kenny last
> year at the summit)?
> 
By merging, do you mean *replacing* CCP with VRP?  Yes, it's
doable.  No, it's not a good idea.

Because of its lattice evaluation, VRP is about 3x slower than
CCP.  Consider that we currently schedule a single VRP pass and 3
CCP passes.  On cc1-i-files, that single VRP accounts for ~4
seconds of compile time, the three CCP passes account for ~5
seconds of compile time.

Also consider that currently our VRP pass is pretty quick because
it does not propagate branch probabilities.  It only evaluates
probabilities 0 and 1.  I have plans to change it so that it can
feed information to branch prediction.  That will make it even
more heavyweight.

Furthermore, VRP only deals with GIMPLE registers.  Our CCP pass
can propagate load/store constants.  If we burdened VRP with
loads and stores, it would be even slower.

So, while VRP can subsume some of the actions of CCP, it's much
slower and can't really be run all that often.  It's fine to
allow it to do some constant propagation.  But morphing the two
passes into one will not gain us much.

> :
>   if (a_1 == 0) goto ; else goto ;
> 
> :;
>   a_2 = 0;
>   bar (a_2);  <-- a_2 isn't replaced with 0 yet!
> 
> Note that we don't have bar (0) at the end.  This is because currently
> VRP does not have the equivalent of substitute_and_fold.  We just use
> the range information to fold COND_EXPR; we don't fold each statement
> using constants and ranges gathered by VRP.
> 
Right.  And that is something that is easily doable in
vrp_finalize.  It's in my todo pile, but if you want to do it, go
right ahead.


> So I am thinking about inserting ASSERT_EXPRs up front *before* going
> into SSA, without much of pruning, and then run an enhanced version of
>
I was doing this originally.  It turns out to be easier and
faster to insert the assertions once we are in SSA form.  If you
go back in time in the VRP code, you'll see that it evolved from
there.

What we could do is insert the assertions, do the various passes
that take advantage of them and then remove the assertions once
they are no longer necessary.  I still haven't read in detail
your plan for using ASSERT_EXPRs in the jump threader, but at
first sight it sounded decent.

> statements just before hitting loop optimizers.  :-) I have not
> figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin
> says I just have to teach tree-vn.c how to deal with it.
> 
At one point, all the passes had to deal with ASSERT_EXPRs.
Mostly by ignoring them.  Which is additional, unwanted work
because some of them had to actively know about them being
nothing but fancy copy operations.  That gets in the way of their
work.  I think that ASSERT_EXPRs should only survive as long as
they're useful.

> Last but not least, I'm willing to do the work, but I'd like to be
> more or less on the same page with other people hacking these scalar
> optimizations before I do any significant work.
> 
Sure.  Go ahead.  My short term plan is to start merging the
various components into mainline.  I'll start with the
incremental SSA patches, followed by VRP, the CCP and copy-prop
improvements.  Perhaps we could leave the changes to the threader
in TCB for a little while longer, but first I need to read your
proposal in detail.

> By the way, looking at Kenny's slides from last year, one thing we
> have not implemented in our propagation engine is to process the CFG
> worklist in the DFS order of a dominator tree.  I haven't looked
> closely at this, but if the speed of propagation is a concern, we
> should come back to this.
> 
ISTR either stevenb or dberlin implementing a dom-order
propagation.  I think they got a minor speedup that could be
worth having.


Diego.


Re: Merging CCP and VRP?

2005-03-27 Thread Kazu Hirata
Hi Diego,

> By merging, do you mean *replacing* CCP with VRP?  Yes, it's
> doable.  No, it's not a good idea.

Understood.

Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to
run copy-prop before VRP.  Otherwise, we would end up with lots of

  D.18001_101 = D.18001_198;
  D.18011_102 = D.18001_101->head_;
  D.18001_12 = ASSERT_EXPR ;

Note that ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is
older.  As a result, VRP does not notice that D.18001_198 != 0B.
Currently, we still have these even after copy prop because we don't
allow copy propagation between const and non-const pointers, which I
think is a bit too restrictive.

> At one point, all the passes had to deal with ASSERT_EXPRs.
> Mostly by ignoring them.  Which is additional, unwanted work
> because some of them had to actively know about them being
> nothing but fancy copy operations.  That gets in the way of their
> work.  I think that ASSERT_EXPRs should only survive as long as
> they're useful.

Yes, playing with VRP, I've realized that I am not interested in
ASSERT_EXPRs per se, but I am more interested in SSA_NAME_VALUE_RANGE
that VRP leaves us with.

> Sure.  Go ahead.  My short term plan is to start merging the various
> components into mainline.  I'll start with the incremental SSA
> patches, followed by VRP, the CCP and copy-prop improvements.
> Perhaps we could leave the changes to the threader in TCB for a
> little while longer, but first I need to read your proposal in
> detail.

Thank you.  It turns out that if I blindly insert ASSERT_EXPRs for
everything that allows us to infer a range, my jump threading pass
finds more opportunities than what the current DOM can find.  There
are certain classes of opportunities that my pass misses but DOM
catches, but I'll mention them in a separate message.

> ISTR either stevenb or dberlin implementing a dom-order
> propagation.  I think they got a minor speedup that could be
> worth having.

Huh, whey I talked to them on IRC they didn't seem to have implemented
this.  I'll try to get this issue one of these days.

Kazu Hirata


Re: Merging CCP and VRP?

2005-03-27 Thread Diego Novillo
On Sun, Mar 27, 2005 at 08:08:43PM -0500, Kazu Hirata wrote:

> Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to
> run copy-prop before VRP.  Otherwise, we would end up with lots of
> 
There is a copy-propagation pass before VRP.  Or do you mean
right before?  Sure, the ordering of these passes is in eternal
flux anyway.

> Currently, we still have these even after copy prop because we don't
> allow copy propagation between const and non-const pointers, which I
> think is a bit too restrictive.
> 
I assume these are blocked by may_propagate_copy.  What's
blocking these?  Do they have different alias set numbers or do
we need a conversion from the const to the non-const version?

> > ISTR either stevenb or dberlin implementing a dom-order
> > propagation.  I think they got a minor speedup that could be
> > worth having.
> 
> Huh, whey I talked to them on IRC they didn't seem to have implemented
> this.  I'll try to get this issue one of these days.
> 
Maybe I'm thinking of someone else.  I do remember somebody
playing with this.  Oh, well, it doesn't matter.  It's easy
enough to change at anytime.


Diego.


Re: Merging CCP and VRP?

2005-03-27 Thread Kazu Hirata
Hi Diego,

> There is a copy-propagation pass before VRP.  Or do you mean
> right before?  Sure, the ordering of these passes is in eternal
> flux anyway.

"Before", but doesn't have to be "right before".  The current ordering
is reasonable.

> > Currently, we still have these even after copy prop because we don't
> > allow copy propagation between const and non-const pointers, which I
> > think is a bit too restrictive.
> > 
> I assume these are blocked by may_propagate_copy.  What's
> blocking these?  Do they have different alias set numbers or do
> we need a conversion from the const to the non-const version?

Yes, they are blocked by may_propagate_copy.  Seems to be const
v.s. non-const pointer issue.  I haven't come up with a brakedown of
reasons why copies are blocked.

Kazu Hirata


Re: Merging CCP and VRP?

2005-03-28 Thread Steven Bosscher
On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote:
> Huh, whey I talked to them on IRC they didn't seem to have implemented
> this.  I'll try to get this issue one of these days.

Ehm.  I did in fact implement this.  The trouble was that inserting
blocks into the worklist got more expensive, so there was no overall
win.  You need a smart data structure to allow quick inserts and
quick traverses.  My implementation wasn't very smart, I mostly did
it to see if it would cause things to fall through the lattice more
quickly (and it did).

Gr.
Steven



Re: Merging CCP and VRP?

2005-03-28 Thread Diego Novillo
On Mon, Mar 28, 2005 at 04:08:49PM +0200, Steven Bosscher wrote:
> On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote:
> > Huh, whey I talked to them on IRC they didn't seem to have implemented
> > this.  I'll try to get this issue one of these days.
> 
> Ehm.  I did in fact implement this.
>
Aha!  So I'm not going insane.  Thanks.


Diego.


Re: Merging CCP and VRP?

2005-03-28 Thread Daniel Berlin
On Mon, 2005-03-28 at 16:08 +0200, Steven Bosscher wrote:
> On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote:
> > Huh, whey I talked to them on IRC they didn't seem to have implemented
> > this.  I'll try to get this issue one of these days.
> 
> Ehm.  I did in fact implement this.  The trouble was that inserting
> blocks into the worklist got more expensive, so there was no overall
> win.  You need a smart data structure to allow quick inserts and
> quick traverses.

You'd have to use a priority queue.

Or you could just maintain the relative ordering you get from starting
in some order.
Dunno which ends up being better.
Probably varies from program to program.

>   My implementation wasn't very smart, I mostly did
> it to see if it would cause things to fall through the lattice more
> quickly (and it did).




Re: Merging CCP and VRP?

2005-03-30 Thread Jeffrey A Law
On Sun, 2005-03-27 at 20:08 -0500, Kazu Hirata wrote:
> Hi Diego,
> 
> > By merging, do you mean *replacing* CCP with VRP?  Yes, it's
> > doable.  No, it's not a good idea.
> 
> Understood.
> 
> Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to
> run copy-prop before VRP.  Otherwise, we would end up with lots of
> 
>   D.18001_101 = D.18001_198;
>   D.18011_102 = D.18001_101->head_;
>   D.18001_12 = ASSERT_EXPR ;
> 
> Note that ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is
> older.  As a result, VRP does not notice that D.18001_198 != 0B.
> Currently, we still have these even after copy prop because we don't
> allow copy propagation between const and non-const pointers, which I
> think is a bit too restrictive.
Before we go a lot further with this, I'd like to make sure we're first
all on the same page regarding the semantics of ASSERT_EXPR and whether
or not ASSERT_EXPRs appear on the RHS of MODIFY_EXPRs.

When I looked at using a similar mechanism to store context sensitive
equivalences it became very clear that context sensitive equivalences
must not be recorded with a new assignment.

ie, if we have

if (a_1 == 0)
  {
  }


We do not want to turn that into

if (a_1 == 0)
  {
a_2 = 0;
  }

OR

if (a_1 == 0)
  {
a_2 = ASSERT_EXPR ...
  }

Whatever scheme we use to explicitly expose context sensitive 
equivalences in the IL needs to be a pure expression.

This is documented in one of the many compile-time performance PRs
from 2004.

And, yes, it is imperative that we const/copy propagate into
ASSERT_EXPRs to the fullest extent possible.  Otherwise we lose a lot
of threading opportunities.


Jeff



Re: Merging CCP and VRP?

2005-03-30 Thread Diego Novillo
On Wed, Mar 30, 2005 at 10:58:39AM -0700, Jeffrey A Law wrote:

> Whatever scheme we use to explicitly expose context sensitive 
> equivalences in the IL needs to be a pure expression.
> 
Well, that's the fundamental mechanism behind ASSERT_EXPRs and
VRP.  Remember more details about the problem?  You're being
pretty vague.

> And, yes, it is imperative that we const/copy propagate into
> ASSERT_EXPRs to the fullest extent possible.  Otherwise we lose a lot
> of threading opportunities.
> 
ASSERT_EXPRs are regular expressions, so this is not a problem.


Thanks.  Diego.


Re: Merging CCP and VRP?

2005-03-30 Thread Jeffrey A Law
On Wed, 2005-03-30 at 13:22 -0500, Diego Novillo wrote:
> On Wed, Mar 30, 2005 at 10:58:39AM -0700, Jeffrey A Law wrote:
> 
> > Whatever scheme we use to explicitly expose context sensitive 
> > equivalences in the IL needs to be a pure expression.
> > 
> Well, that's the fundamental mechanism behind ASSERT_EXPRs and
> VRP.  Remember more details about the problem?  You're being
> pretty vague.
We'd have to go back and find the PR.  I don't remember all the
details, but the problem was big enough to make ASSERT_EXPRs a
far inferior solution to the others I'd looked at for recording
context sensitive equivalences.

> 
> > And, yes, it is imperative that we const/copy propagate into
> > ASSERT_EXPRs to the fullest extent possible.  Otherwise we lose a lot
> > of threading opportunities.
> > 
> ASSERT_EXPRs are regular expressions, so this is not a problem.
It is if you don't run const/copy propagation immediately after
insertion of the ASSERT_EXPRs :-)

jeff




Re: Merging CCP and VRP?

2005-03-30 Thread Kazu Hirata
Hi Jeff,

> We'd have to go back and find the PR.  I don't remember all the
> details, but the problem was big enough to make ASSERT_EXPRs a
> far inferior solution to the others I'd looked at for recording
> context sensitive equivalences.

Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP,
etc, all take a lot of time although using the information gathered by
VRP for jump threading purposes is pretty straightforward and fast.

So I am sandwiched between two things.

1) DOM is fairly fast with its iteration is limited, which shouldn't
   be too restrictive if we are running CCP/copy-prop before DOM.  But
   DOM rebuilds information built by other passes, like keeping track
   of nonzero-ness, const/copy table, available expr table, etc.

2) VRP is slow and even slower with PHI node insertion.  But if we use
   its result for jump threading, we are not duplicating much code.
   Plus the context-sensitive equivalences are globally visible and
   fine grained, so we don't have to mix identification of jump
   threading opportunities into a dom walk.  (ASSERT_EXPRs and PHI
   nodes essentially splits a single variable into multiple variables,
   making a bunch of smaller live ranges to allow fine-grained value
   ranges to be stored.)  Having said all this, I can't think of uses
   of these fine-grained context-sensitive equivalences other than
   jump threading, so I can't justify constructing this much
   information just for jump threading.

> > > And, yes, it is imperative that we const/copy propagate into
> > > ASSERT_EXPRs to the fullest extent possible.  Otherwise we lose a lot
> > > of threading opportunities.
> > > 
> > ASSERT_EXPRs are regular expressions, so this is not a problem.
> It is if you don't run const/copy propagation immediately after
> insertion of the ASSERT_EXPRs :-)

It may also be a problem to not run const/copy prop before insert
ASSERT_EXPRs as noted in the problem of

>   D.18001_101 = D.18001_198;
>   D.18011_102 = D.18001_101->head_;
>   D.18001_12 = ASSERT_EXPR ;

where we are asserting a copy of D.18001_198, and D.18001_198 itself
may also be used later.

Kazu Hirata


Re: Merging CCP and VRP?

2005-03-30 Thread Jeffrey A Law
On Wed, 2005-03-30 at 14:19 -0500, Kazu Hirata wrote:
> Hi Jeff,
> 
> > We'd have to go back and find the PR.  I don't remember all the
> > details, but the problem was big enough to make ASSERT_EXPRs a
> > far inferior solution to the others I'd looked at for recording
> > context sensitive equivalences.
> 
> Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP,
> etc, all take a lot of time although using the information gathered by
> VRP for jump threading purposes is pretty straightforward and fast.
> 
> So I am sandwiched between two things.
> 
> 1) DOM is fairly fast with its iteration is limited, which shouldn't
>be too restrictive if we are running CCP/copy-prop before DOM.  But
>DOM rebuilds information built by other passes, like keeping track
>of nonzero-ness, const/copy table, available expr table, etc.
Right.  This is the biggest thing I want to attack once I wrap up the
threading changes and aliasing changes (and being stuck in Atlanta
for 3 days due to the recent storms was certainly a step backwards in
wrapping up the threading changes :(

I've got some ideas on how to avoid the iteration step currently
inside DOM.  I want to start playing with those to see if it actually
makes sense.




> 
> 2) VRP is slow and even slower with PHI node insertion.  But if we use
>its result for jump threading, we are not duplicating much code.
>Plus the context-sensitive equivalences are globally visible and
>fine grained, so we don't have to mix identification of jump
>threading opportunities into a dom walk.
Except that you need to know what expressions are currently available
when you do jump threading.  We currently get that information from the
DOM walk.  We might be able to avoid that in the future, but I don't
think we can do that just yet.

Jeff