On Sat, Apr 11, 2015 at 3:12 PM, Thomas Helland <thomashellan...@gmail.com> wrote:
>> Yes, copy propagation probably won't be so useful once we have value
>> range propagation; the former is a special case of the latter. Note
>> that we have a nifty way of actually doing the constant folding
>> (nir_constant_expressions.py and nir_constant_expressions.h), which
>> you should still use if all the inputs are constant.
>>
>
> I'll have a look at nir_constant_expressions to see what it's about.
It's just something you can feed the constant inputs of your ALU
operation and get back the constant output (note that it doesn't handle
swizzles for you, though). The nifty part is that it's all autogenerated
from the same source as the list of opcodes and their properties. For
example, this line in nir_opcodes.py:

binop("fadd", tfloat, commutative + associative, "src0 + src1")

creates a binary operation called nir_op_fadd with floating-point inputs
and outputs that's commutative and associative (note that binop() is a
helper function which eventually calls opcode() and creates a binary
operation where the input and output types are the same -- you have to
look at opcode() itself to see everything we're capable of expressing).
The "src0 + src1" part isn't just a comment, though; it's actually a C
expression which takes two floating-point variables we've created,
called src0 and src1, and the result of the expression is stored to
another floating-point variable called dst (you can also create one or
more statements that reference dst directly, and if you don't then we
automatically generate something like "dst = (expression);"). Since
binop() creates a per-component operation, we know to loop over all the
components and store each result in the right component of the
destination, which we then send back to the caller. There's more
detailed documentation on how the constant expressions work at the top
of nir_opcodes.py.

Jason's minmax branch was starting to extend this system to support
optionally computing the range as well, but I'm also not sure whether
that's a good idea or whether we should just hand-roll the range
calculations. Hand-rolling it seems potentially less elegant but safer
and more straightforward. Also, a lot of the benefit of having the
constant expressions inline is that it acts as a kind of documentation
as well, but that's irrelevant for ranges.

> I was thinking of making a separate struct with some flags and
> the range of the assignment. I thought of having an is_constant
> flag in there to keep track of whether the value is constant.
> Or do we have a metadata system for sticking useful
> information in that gets shared between all passes?

Yeah... you can see that that's essentially what I did with my
work-in-progress. We do have a metadata system, although here I would
think that the same pass would create and then use the information, so
we can just keep it in an array indexed by the SSA value index. If other
passes want range information, we could consider splitting it into an
analysis part and an optimization part and using the metadata
infrastructure.
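For concreteness, here is a minimal sketch in plain C of the kind of
per-SSA-value record Thomas describes, kept in an array indexed by the
SSA value's index. This is not actual Mesa code -- the struct, the field
choices (a single is_constant flag plus float bounds) and the helper
name are all made up for illustration.

/* A sketch of a per-SSA-value range record, kept in a plain array
 * indexed by the SSA value's index.  All names here are made up; a
 * real pass would key this off the SSA def's index and pick whatever
 * bound representation it actually needs.
 */

#include <float.h>
#include <stdbool.h>
#include <stdlib.h>

struct ssa_range_info {
   bool is_constant;   /* value is known to be a single constant */
   float lo;           /* lower bound, inclusive */
   float hi;           /* upper bound, inclusive */
};

static struct ssa_range_info *
alloc_range_info(unsigned num_ssa_values)
{
   struct ssa_range_info *info = calloc(num_ssa_values, sizeof(*info));
   if (!info)
      return NULL;

   /* Start every value at the widest possible range; the analysis
    * narrows the entries as it learns more about each value.
    */
   for (unsigned i = 0; i < num_ssa_values; i++) {
      info[i].lo = -FLT_MAX;
      info[i].hi = FLT_MAX;
   }

   return info;
}

Whether the bounds end up being floats, integers, or something fancier
would of course depend on what the pass needs.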
>
>> > The GCC guys have used this engine to get copy propagation that
>> > propagates copies across conditionals, maybe this makes such a
>> > solution more interesting?
>>
>> I'm not so sure how useful such a general framework will be. Constant
>> propagation that handles back-edges seems interesting, but I'm not
>> sure it's worth the time to implement something this general as a
>> first pass.
>>
>
> I agree with you on this one.
> If there was a potential for a lot of code
> sharing it would be more appealing.
> But if we drop having a separate constant propagation pass
> then I see only VRP and copy prop. as potential users.
> That's not a large user base for a generalized framework.
>
>> >
>> > Connor: I just remembered you saying something about your freedesktop
>> > git repo, so I poked around some and found that you have already done
>> > some work on VRP based on SCCP? How far did you get?
>>
>> I started on it, but then I realized that the approach I was using was
>> too cumbersome/complicated, so I don't think what I have is too useful.
>> Feel free to work on it yourself, although Jason and I have discussed
>> it, so we have some ideas of how to do it. I've written a few notes on
>> this below that you may find useful.
>>
>> - I have a branch I created while working on VRP that you'll probably
>> find useful:
>> http://cgit.freedesktop.org/~cwabbott0/mesa/log/?h=nir-worklist
>> The first two commits are already in master, but the last two should
>> be useful for implementing SCCP/VRP (although they'll need to be
>> rebased, obviously).
>>
>> - There's a comment in the SCCP paper (5.3, Nodes versus Edges) that
>> says: "An alternative way of implementing this would be to add nodes
>> to the graph and then associate an ExecutableFlag with each node. An
>> additional node must be inserted between any node that has more than
>> one immediate successor and any successor node that has more than one
>> immediate predecessor."
>
> This is the method mentioned in the VRP paper (if I remember correctly).

Maybe... either way, we don't have to deal with it thanks to NIR not
having critical edges.

>
>> I think this procedure is what's usually
>> called "splitting critical edges"; in NIR, thanks to the structured
>> control flow, there are never any critical edges except for one edge
>> case you don't really have to care about too much (namely, an infinite
>> loop with one basic block), and therefore you can just use the basic
>> block worklist that I added in the branch mentioned above, rather than
>> a worklist of basic block edges as the paper describes.
>>
>> - The reason my pass was becoming so cumbersome was that I was trying
>> to solve two problems at once. First, there's actually propagating the
>> ranges. Then, there's taking into account restrictions on the range
>> due to branch predicates. For example, if I have something like:
>>
>> if (x > 0) {
>>    y = max(x, 0);
>> }
>>
>> then since the use of x is dominated by the then-branch of the if, x
>> has to be greater than 0 and we can optimize it. This is a little
>> contrived, but we have seen things like:
>>
>> if (foo)
>>    break;
>>
>> /* ... */
>>
>> if (foo)
>>    break;
>>
>> in the wild, where we could get rid of the redundant break using this
>> analysis by recognizing that since the second condition is dominated
>> by the else-branch of the first, foo must be false there. I was trying
>> to handle this by storing multiple lattice entries for the same SSA
>> value, but it was becoming too messy. Instead, we can solve the first
>> problem normally, and then to solve the second problem we can create
>> new SSA values, using the standard SSA construction algorithm, any
>> time the range of a value is restricted from a certain point on
>> (namely, at the condition of a branch or at the index of an array
>> dereference). In the first example, we would create a new value x2:
>>
>
> I think I've seen this solution mentioned in some paper.
> Maybe the ISRP pass mentioned below? I don't remember.
>
>> if (x > 0) {
>>    x2 = x;
>>    y = max(x2, 0);
>> }
>>
>> and the VRP pass will keep track of "special" copies like the one from
>> x to x2 that add restrictions on the range. After everything is
>> finished, copy propagation and DCE will clean up the extra copies
>> created. There's a paper on this somewhere, but I don't quite remember
>> the name of it. I'm not sure if you'll be able to get to this over the
>> summer, but I thought I'd explain it in case you were interested.
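To make the branch-restriction part a bit more concrete, below is a
minimal sketch in plain C of the narrowing that a "special" copy like
the one from x to x2 would record in the then-branch of "if (x > cst)".
This is not actual Mesa/NIR code -- the struct, the function names, the
float-only ranges and the ignored NaN handling are all simplifications
for the example.

#include <float.h>
#include <math.h>
#include <stdbool.h>

struct range {
   float lo, hi;   /* inclusive bounds; [-FLT_MAX, FLT_MAX] = unknown */
};

/* Narrow "in" with the fact "value > cst", i.e. the true edge of the
 * branch.  NaNs are ignored here to keep the sketch short.
 */
static struct range
restrict_greater_than(struct range in, float cst)
{
   struct range out = in;

   /* For floats, the tightest inclusive lower bound is the next
    * representable value above cst.
    */
   float above = nextafterf(cst, INFINITY);
   if (above > out.lo)
      out.lo = above;

   return out;
}

/* If x is known to lie in [-10, 10], then inside "if (x > 0)" the
 * restricted copy x2 gets a strictly positive range, so max(x2, 0)
 * can be folded away.
 */
static bool
max_is_redundant_example(void)
{
   struct range x  = { -10.0f, 10.0f };
   struct range x2 = restrict_greater_than(x, 0.0f);

   return x2.lo > 0.0f;
}

The real pass would presumably compute this kind of intersection when it
sees the branch condition and attach it to the new x2 value, and, as
noted above, copy propagation and DCE clean up the extra copies
afterwards.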
> Could this paper be "Interprocedural symbolic range propagation"?
> Or maybe just "Symbolic range propagation"?
> I haven't looked much at them, but it looks like they might be relevant.

Ok, the paper I was talking about is "ABCD: Eliminating Array Bounds
Checks on Demand." They call the copies "pi nodes" (gotta have a fancy
Greek symbol!). I skimmed over the paper you mentioned and couldn't find
anything like it, although anyway it doesn't seem to be too useful a
paper, since what it's doing is way too fancy for our needs.

>
>> >
>> > If we just want to make an SCCP-inspired VRP pass then Connor has
>> > work in progress.
>> > Finishing that, and loop unrolling, may not be enough work for GSoC?
>>
>> I'm not sure... on the one hand, there's enough here that it may take
>> the entire summer for you to do. On the other hand, it's possible that
>> you'll be able to finish it with enough time left over, and there are
>> plenty of other things you'd be able to do. It depends on several
>> factors, and no one has a crystal ball here. I'm not experienced
>> enough with GSoC to be able to give you a recommendation, so it would
>> be nice for someone more experienced to give their opinion.
>>
>
> That's what I struggle with. While I've done some programming
> projects before, they have been quite small, with a large amount
> of hardware that needed to be made, so the hardware was the
> time-consuming part and not the programming. Setting a realistic
> plan is... hard.

Ok, it seems like we can agree to make the scope more limited. If things
get difficult or there's something you can't figure out, remember that
we're more than happy to help.

Connor