Re: Merging CCP and VRP?
On Sat, Mar 26, 2005 at 12:00:43PM -0500, Kazu Hirata wrote: > Have you considered merging CCP and VRP (as suggested by Kenny last > year at the summit)? > By merging, do you mean *replacing* CCP with VRP? Yes, it's doable. No, it's not a good idea. Because of its lattice evaluation, VRP is about 3x slower than CCP. Consider that we currently schedule a single VRP pass and 3 CCP passes. On cc1-i-files, that single VRP accounts for ~4 seconds of compile time, the three CCP passes account for ~5 seconds of compile time. Also consider that currently our VRP pass is pretty quick because it does not propagate branch probabilities. It only evaluates probabilities 0 and 1. I have plans to change it so that it can feed information to branch prediction. That will make it even more heavyweight. Furthermore, VRP only deals with GIMPLE registers. Our CCP pass can propagate load/store constants. If we burdened VRP with loads and stores, it would be even slower. So, while VRP can subsume some of the actions of CCP, it's much slower and can't really be run all that often. It's fine to allow it to do some constant propagation. But morphing the two passes into one will not gain us much. > : > if (a_1 == 0) goto ; else goto ; > > :; > a_2 = 0; > bar (a_2); <-- a_2 isn't replaced with 0 yet! > > Note that we don't have bar (0) at the end. This is because currently > VRP does not have the equivalent of substitute_and_fold. We just use > the range information to fold COND_EXPR; we don't fold each statement > using constants and ranges gathered by VRP. > Right. And that is something that is easily doable in vrp_finalize. It's in my todo pile, but if you want to do it, go right ahead. > So I am thinking about inserting ASSERT_EXPRs up front *before* going > into SSA, without much of pruning, and then run an enhanced version of > I was doing this originally. It turns out to be easier and faster to insert the assertions once we are in SSA form. If you go back in time in the VRP code, you'll see that it evolved from there. What we could do is insert the assertions, do the various passes that take advantage of them and then remove the assertions once they are no longer necessary. I still haven't read in detail your plan for using ASSERT_EXPRs in the jump threader, but at first sight it sounded decent. > statements just before hitting loop optimizers. :-) I have not > figured out how to deal with ASSERT_EXPRs in FRE, but Daniel Berlin > says I just have to teach tree-vn.c how to deal with it. > At one point, all the passes had to deal with ASSERT_EXPRs. Mostly by ignoring them. Which is additional, unwanted work because some of them had to actively know about them being nothing but fancy copy operations. That gets in the way of their work. I think that ASSERT_EXPRs should only survive as long as they're useful. > Last but not least, I'm willing to do the work, but I'd like to be > more or less on the same page with other people hacking these scalar > optimizations before I do any significant work. > Sure. Go ahead. My short term plan is to start merging the various components into mainline. I'll start with the incremental SSA patches, followed by VRP, the CCP and copy-prop improvements. Perhaps we could leave the changes to the threader in TCB for a little while longer, but first I need to read your proposal in detail. > By the way, looking at Kenny's slides from last year, one thing we > have not implemented in our propagation engine is to process the CFG > worklist in the DFS order of a dominator tree. I haven't looked > closely at this, but if the speed of propagation is a concern, we > should come back to this. > ISTR either stevenb or dberlin implementing a dom-order propagation. I think they got a minor speedup that could be worth having. Diego.
Re: Merging CCP and VRP?
Hi Diego, > By merging, do you mean *replacing* CCP with VRP? Yes, it's > doable. No, it's not a good idea. Understood. Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to run copy-prop before VRP. Otherwise, we would end up with lots of D.18001_101 = D.18001_198; D.18011_102 = D.18001_101->head_; D.18001_12 = ASSERT_EXPR ; Note that ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is older. As a result, VRP does not notice that D.18001_198 != 0B. Currently, we still have these even after copy prop because we don't allow copy propagation between const and non-const pointers, which I think is a bit too restrictive. > At one point, all the passes had to deal with ASSERT_EXPRs. > Mostly by ignoring them. Which is additional, unwanted work > because some of them had to actively know about them being > nothing but fancy copy operations. That gets in the way of their > work. I think that ASSERT_EXPRs should only survive as long as > they're useful. Yes, playing with VRP, I've realized that I am not interested in ASSERT_EXPRs per se, but I am more interested in SSA_NAME_VALUE_RANGE that VRP leaves us with. > Sure. Go ahead. My short term plan is to start merging the various > components into mainline. I'll start with the incremental SSA > patches, followed by VRP, the CCP and copy-prop improvements. > Perhaps we could leave the changes to the threader in TCB for a > little while longer, but first I need to read your proposal in > detail. Thank you. It turns out that if I blindly insert ASSERT_EXPRs for everything that allows us to infer a range, my jump threading pass finds more opportunities than what the current DOM can find. There are certain classes of opportunities that my pass misses but DOM catches, but I'll mention them in a separate message. > ISTR either stevenb or dberlin implementing a dom-order > propagation. I think they got a minor speedup that could be > worth having. Huh, whey I talked to them on IRC they didn't seem to have implemented this. I'll try to get this issue one of these days. Kazu Hirata
Re: Merging CCP and VRP?
On Sun, Mar 27, 2005 at 08:08:43PM -0500, Kazu Hirata wrote: > Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to > run copy-prop before VRP. Otherwise, we would end up with lots of > There is a copy-propagation pass before VRP. Or do you mean right before? Sure, the ordering of these passes is in eternal flux anyway. > Currently, we still have these even after copy prop because we don't > allow copy propagation between const and non-const pointers, which I > think is a bit too restrictive. > I assume these are blocked by may_propagate_copy. What's blocking these? Do they have different alias set numbers or do we need a conversion from the const to the non-const version? > > ISTR either stevenb or dberlin implementing a dom-order > > propagation. I think they got a minor speedup that could be > > worth having. > > Huh, whey I talked to them on IRC they didn't seem to have implemented > this. I'll try to get this issue one of these days. > Maybe I'm thinking of someone else. I do remember somebody playing with this. Oh, well, it doesn't matter. It's easy enough to change at anytime. Diego.
Re: Merging CCP and VRP?
Hi Diego, > There is a copy-propagation pass before VRP. Or do you mean > right before? Sure, the ordering of these passes is in eternal > flux anyway. "Before", but doesn't have to be "right before". The current ordering is reasonable. > > Currently, we still have these even after copy prop because we don't > > allow copy propagation between const and non-const pointers, which I > > think is a bit too restrictive. > > > I assume these are blocked by may_propagate_copy. What's > blocking these? Do they have different alias set numbers or do > we need a conversion from the const to the non-const version? Yes, they are blocked by may_propagate_copy. Seems to be const v.s. non-const pointer issue. I haven't come up with a brakedown of reasons why copies are blocked. Kazu Hirata
Re: Merging CCP and VRP?
On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote: > Huh, whey I talked to them on IRC they didn't seem to have implemented > this. I'll try to get this issue one of these days. Ehm. I did in fact implement this. The trouble was that inserting blocks into the worklist got more expensive, so there was no overall win. You need a smart data structure to allow quick inserts and quick traverses. My implementation wasn't very smart, I mostly did it to see if it would cause things to fall through the lattice more quickly (and it did). Gr. Steven
Re: Merging CCP and VRP?
On Mon, Mar 28, 2005 at 04:08:49PM +0200, Steven Bosscher wrote: > On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote: > > Huh, whey I talked to them on IRC they didn't seem to have implemented > > this. I'll try to get this issue one of these days. > > Ehm. I did in fact implement this. > Aha! So I'm not going insane. Thanks. Diego.
Re: Merging CCP and VRP?
On Mon, 2005-03-28 at 16:08 +0200, Steven Bosscher wrote: > On Mar 28, 2005 03:08 AM, Kazu Hirata <[EMAIL PROTECTED]> wrote: > > Huh, whey I talked to them on IRC they didn't seem to have implemented > > this. I'll try to get this issue one of these days. > > Ehm. I did in fact implement this. The trouble was that inserting > blocks into the worklist got more expensive, so there was no overall > win. You need a smart data structure to allow quick inserts and > quick traverses. You'd have to use a priority queue. Or you could just maintain the relative ordering you get from starting in some order. Dunno which ends up being better. Probably varies from program to program. > My implementation wasn't very smart, I mostly did > it to see if it would cause things to fall through the lattice more > quickly (and it did).
Re: Merging CCP and VRP?
On Sun, 2005-03-27 at 20:08 -0500, Kazu Hirata wrote: > Hi Diego, > > > By merging, do you mean *replacing* CCP with VRP? Yes, it's > > doable. No, it's not a good idea. > > Understood. > > Also, if we are inserting ASSERT_EXPRs, it seems to be a good idea to > run copy-prop before VRP. Otherwise, we would end up with lots of > > D.18001_101 = D.18001_198; > D.18011_102 = D.18001_101->head_; > D.18001_12 = ASSERT_EXPR ; > > Note that ASSERT_EXPR is on D.18001_101, not on D.18001_198, which is > older. As a result, VRP does not notice that D.18001_198 != 0B. > Currently, we still have these even after copy prop because we don't > allow copy propagation between const and non-const pointers, which I > think is a bit too restrictive. Before we go a lot further with this, I'd like to make sure we're first all on the same page regarding the semantics of ASSERT_EXPR and whether or not ASSERT_EXPRs appear on the RHS of MODIFY_EXPRs. When I looked at using a similar mechanism to store context sensitive equivalences it became very clear that context sensitive equivalences must not be recorded with a new assignment. ie, if we have if (a_1 == 0) { } We do not want to turn that into if (a_1 == 0) { a_2 = 0; } OR if (a_1 == 0) { a_2 = ASSERT_EXPR ... } Whatever scheme we use to explicitly expose context sensitive equivalences in the IL needs to be a pure expression. This is documented in one of the many compile-time performance PRs from 2004. And, yes, it is imperative that we const/copy propagate into ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot of threading opportunities. Jeff
Re: Merging CCP and VRP?
On Wed, Mar 30, 2005 at 10:58:39AM -0700, Jeffrey A Law wrote: > Whatever scheme we use to explicitly expose context sensitive > equivalences in the IL needs to be a pure expression. > Well, that's the fundamental mechanism behind ASSERT_EXPRs and VRP. Remember more details about the problem? You're being pretty vague. > And, yes, it is imperative that we const/copy propagate into > ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot > of threading opportunities. > ASSERT_EXPRs are regular expressions, so this is not a problem. Thanks. Diego.
Re: Merging CCP and VRP?
On Wed, 2005-03-30 at 13:22 -0500, Diego Novillo wrote: > On Wed, Mar 30, 2005 at 10:58:39AM -0700, Jeffrey A Law wrote: > > > Whatever scheme we use to explicitly expose context sensitive > > equivalences in the IL needs to be a pure expression. > > > Well, that's the fundamental mechanism behind ASSERT_EXPRs and > VRP. Remember more details about the problem? You're being > pretty vague. We'd have to go back and find the PR. I don't remember all the details, but the problem was big enough to make ASSERT_EXPRs a far inferior solution to the others I'd looked at for recording context sensitive equivalences. > > > And, yes, it is imperative that we const/copy propagate into > > ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot > > of threading opportunities. > > > ASSERT_EXPRs are regular expressions, so this is not a problem. It is if you don't run const/copy propagation immediately after insertion of the ASSERT_EXPRs :-) jeff
Re: Merging CCP and VRP?
Hi Jeff, > We'd have to go back and find the PR. I don't remember all the > details, but the problem was big enough to make ASSERT_EXPRs a > far inferior solution to the others I'd looked at for recording > context sensitive equivalences. Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP, etc, all take a lot of time although using the information gathered by VRP for jump threading purposes is pretty straightforward and fast. So I am sandwiched between two things. 1) DOM is fairly fast with its iteration is limited, which shouldn't be too restrictive if we are running CCP/copy-prop before DOM. But DOM rebuilds information built by other passes, like keeping track of nonzero-ness, const/copy table, available expr table, etc. 2) VRP is slow and even slower with PHI node insertion. But if we use its result for jump threading, we are not duplicating much code. Plus the context-sensitive equivalences are globally visible and fine grained, so we don't have to mix identification of jump threading opportunities into a dom walk. (ASSERT_EXPRs and PHI nodes essentially splits a single variable into multiple variables, making a bunch of smaller live ranges to allow fine-grained value ranges to be stored.) Having said all this, I can't think of uses of these fine-grained context-sensitive equivalences other than jump threading, so I can't justify constructing this much information just for jump threading. > > > And, yes, it is imperative that we const/copy propagate into > > > ASSERT_EXPRs to the fullest extent possible. Otherwise we lose a lot > > > of threading opportunities. > > > > > ASSERT_EXPRs are regular expressions, so this is not a problem. > It is if you don't run const/copy propagation immediately after > insertion of the ASSERT_EXPRs :-) It may also be a problem to not run const/copy prop before insert ASSERT_EXPRs as noted in the problem of > D.18001_101 = D.18001_198; > D.18011_102 = D.18001_101->head_; > D.18001_12 = ASSERT_EXPR ; where we are asserting a copy of D.18001_198, and D.18001_198 itself may also be used later. Kazu Hirata
Re: Merging CCP and VRP?
On Wed, 2005-03-30 at 14:19 -0500, Kazu Hirata wrote: > Hi Jeff, > > > We'd have to go back and find the PR. I don't remember all the > > details, but the problem was big enough to make ASSERT_EXPRs a > > far inferior solution to the others I'd looked at for recording > > context sensitive equivalences. > > Yes, inserting a bunch of ASSERT_EXPRs, updating SSA, running VRP, > etc, all take a lot of time although using the information gathered by > VRP for jump threading purposes is pretty straightforward and fast. > > So I am sandwiched between two things. > > 1) DOM is fairly fast with its iteration is limited, which shouldn't >be too restrictive if we are running CCP/copy-prop before DOM. But >DOM rebuilds information built by other passes, like keeping track >of nonzero-ness, const/copy table, available expr table, etc. Right. This is the biggest thing I want to attack once I wrap up the threading changes and aliasing changes (and being stuck in Atlanta for 3 days due to the recent storms was certainly a step backwards in wrapping up the threading changes :( I've got some ideas on how to avoid the iteration step currently inside DOM. I want to start playing with those to see if it actually makes sense. > > 2) VRP is slow and even slower with PHI node insertion. But if we use >its result for jump threading, we are not duplicating much code. >Plus the context-sensitive equivalences are globally visible and >fine grained, so we don't have to mix identification of jump >threading opportunities into a dom walk. Except that you need to know what expressions are currently available when you do jump threading. We currently get that information from the DOM walk. We might be able to avoid that in the future, but I don't think we can do that just yet. Jeff