Chris Lattner wrote:

[Up-front apology: If this thread continues, I may not be able to reply for several days, as I'll be travelling. I know it's not good form to start a discussion and then skip out just when it gets interesting, and I apologize in advance. If I'd been thinking better, I would have waited to send my initial message until I returned.]
> I understand where you are coming from here, and agree with it. There
> *is* value to being able to share things.
>
> However, there is a cost. I have never heard anything good about WHIRL
> from a compilation time standpoint: the continuous lowering approach
> does have its own cost.

I haven't heard anything either way, but I take your comment to mean that you have heard that WHIRL is slow, and I'm happy to believe that. I'd agree that a data structure capable of representing more things almost certainly imposes some cost over one capable of representing fewer things! So, yes, there's definitely a cost/benefit tradeoff here.

> What sort of form do you think it could/would reasonably take? [1] Why
> hasn't it already happened? Wouldn't it make more sense to do this
> work independently of the LTO work, as the LTO work *depends* on an
> efficient IR and tree-ssa would benefit from it anyway?

To be clear, I'm really not defending the LTO proposal. I stand by my statement that I don't know enough to have a preference! So, please don't read anything more into what's written here than just the plain words.

I did think a little bit about what it would take to make Tree-SSA more efficient. I'm not claiming that there aren't serious or even fatal flaws in those thoughts; this is just a brain dump. I also don't claim to have measurements showing how much of a difference these changes would make.

I'm going to leave TYPE nodes out, because they're shared with the front-ends, and so will live on anyhow. Similarly for the DECL nodes that correspond to global variables and global functions. So, that leaves EXPR nodes and (perhaps most importantly!) DECLs for local/temporary variables.
The first thing to do would be to simplify the local variable DECLs. All we should really need from such a thing is its type (including its alignment, which, despite historical GCC practice, is part of its type), its name (for debugging), its location relative to the stack frame (if we want to be able to do optimizations based on the location on the stack, which we may or may not want to do at this point), and whatever mark bits or scratch space are needed by optimization passes. The type and name are shared across all SSA instances of "the same" variable, so we could use a pointer to a canonical copy of that information. (For user-visible variables, the canonical copy could be the VAR_DECL from the front end.) So, local variables would collapse from 176 bytes (on my system) to something more like 32 bytes.

The second thing would be to modify expression nodes. As I mentioned, I'd eliminate their TYPE fields. I'd also eliminate their TREE_COMPLEXITY fields, which are already nearly unused. There's no reason TREE_BLOCK should be needed in most expressions; it's only needed on nodes that correspond to lexical blocks. Those changes would eliminate a significant fraction of the current size (64 bytes) of expressions. I also think it ought to be possible to eliminate the source_locus field; instead of putting it on every expression, insert line notes into the statement stream, at least by the time we reach the optimizers.

I'd also eliminate uses of TREE_LIST to link together the nodes in CALL_EXPRs; instead, use a vector of operands hanging off the end of the CALL_EXPR, with length corresponding to the number of arguments in the call. Similarly, I'd consider using a vector to store the statements in a block, rather than a linked list. Finally, if you wanted, you could flatten expressions so that each expression was, a la LLVM, an "instruction", and all operands were "leaves" rather than themselves trees; that's a subset of the current tree format.
I'm not sure that step would in and of itself save memory, but it would be more optimizer-friendly.

In my opinion, the reason this work hasn't been done is that (a) it's not trivial, and (b) there was no sufficiently pressing need. GCC uses a lot of memory, and that's been an issue, but it hasn't been a "killer issue" in the sense that huge numbers of people who would otherwise have used GCC went somewhere else. Outside of work done by Apple and CodeSourcery, attacking that problem probably hasn't (as far as I know) been funded by any companies.

You're correct that LTO, were it to proceed, might make this a killer issue, and then we'd have to attack it -- and so that work should go on the cost list for LTO. You're also correct that some of this work would benefit GCC as a whole, in that the front-ends would use less memory too, and so there is value in doing at least some of the work independently of LTO -- although there might not be sufficient pressure to make that happen soon.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304