Chris Lattner wrote:

[Up-front apology: If this thread continues, I may not be able to reply for several days, as I'll be travelling. I know it's not good form to start a discussion and then skip out just when it gets interesting, and I apologize in advance. If I'd been thinking better, I would have waited to send my initial message until I returned.]
> I understand where you are coming from here, and agree with it. There
> *is* value to being able to share things.
>
> However, there is a cost. I have never heard anything good about WHIRL
> from a compilation time standpoint: the continuous lowering approach
> does have its own cost.

I haven't heard anything either way, but I take your comment to mean that you have heard that WHIRL is slow, and I'm happy to believe that. I'd agree that a data structure capable of representing more things almost certainly imposes some cost over one capable of representing fewer things! So, yes, there's definitely a cost/benefit tradeoff here.

> What sort of form do you think it could/would reasonably take? [1] Why
> hasn't it already happened? Wouldn't it make more sense to do this
> work independently of the LTO work, as the LTO work *depends* on an
> efficient IR and tree-ssa would benefit from it anyway?

To be clear, I'm really not defending the LTO proposal. I stand by my statement that I don't know enough to have a preference! So, please don't read anything more into what's written here than just the plain words.

I did think a little bit about what it would take to make Tree-SSA more efficient. I'm not claiming that there aren't serious or even fatal flaws in those thoughts; this is just a brain dump. I also don't claim to have measurements showing how much of a difference these changes would make.

I'm going to leave TYPE nodes out, because they're shared with the front-ends, and so will live on anyhow. Similarly for the DECL nodes that correspond to global variables and global functions. So, that leaves EXPR nodes and (perhaps most importantly!) DECLs for local/temporary variables.
The first thing to do would be to simplify the local variable DECLs. All we should really need from such a thing is its type (including its alignment, which, despite historical GCC practice, is part of its type), its name (for debugging), its location relative to the stack frame (if we want to be able to do optimizations based on the location on the stack, which we may or may not want to do at this point), and whatever mark bits or scratch space are needed by optimization passes. The type and name are shared across all SSA instances of "the same" variable, so we could use a pointer to a canonical copy of that information. (For user-visible variables, the canonical copy could be the VAR_DECL from the front end.) So, local variables would collapse from 176 bytes (on my system) to something more like 32 bytes.

The second thing would be to modify expression nodes. As I mentioned, I'd eliminate their TYPE fields. I'd also eliminate their TREE_COMPLEXITY fields, which are already nearly unused. There's no reason TREE_BLOCK should be needed in most expressions; it's only needed on nodes that correspond to lexical blocks. Those changes would eliminate a significant fraction of the current size (64 bytes) of expressions. I also think it ought to be possible to eliminate the source_locus field; instead of putting it on every expression, insert line notes into the statement stream, at least by the time we reach the optimizers.

I'd also eliminate uses of TREE_LIST to link together the nodes in CALL_EXPRs; instead, use a vector of operands hanging off the end of the CALL_EXPR, with length corresponding to the number of arguments in the call. Similarly, I'd consider using a vector to store the statements in a block, rather than a linked list. Finally, if you wanted, you could flatten expressions so that each expression was, a la LLVM, an "instruction", and all operands were "leaves" rather than themselves trees; that's a subset of the current tree format.
I'm not sure that step would in and of itself save memory, but it would be more optimizer-friendly.

In my opinion, the reason this work hasn't been done is that (a) it's not trivial, and (b) there was no sufficiently pressing need. GCC uses a lot of memory, and that's been an issue, but it hasn't been a "killer issue" in the sense that huge numbers of people who would otherwise have used GCC went somewhere else. Outside of work done by Apple and CodeSourcery, attacking that problem probably hasn't (as far as I know) been funded by any companies.

You're correct that LTO, were it to proceed, might make this a killer issue, and then we'd have to attack it -- and so that work should go on the cost list for LTO. You're also correct that some of this work would benefit GCC as a whole, in that the front-ends would use less memory too, and so there is value in doing at least some of the work independently of LTO -- although there might not be sufficient pressure to make that happen soon.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304