Kenneth Zadeck wrote:

This posting is a progress report of my task of encoding and decoding
the GIMPLE stream into LTO.   Included in this posting is a patch that
encodes functions and dumps the result to files.

[I'm sorry for not replying to this sooner. I've been on a plane or in a meeting virtually all of my waking hours since you wrote this...]

Exciting!

2) To have a discussion about the use of DWARF3.  I am now against the
use of DWARF3 for encoding the GIMPLE.

As the person who suggested this to you, I'm sorry it doesn't seem to meet your needs. I'll make a few more specific comments below, but, to echo what I said at the time: I think it was a good idea to try it, but if it's not the right tool for the job, then so be it. My opinion has always been that, for the bodies of functions, DWARF is just an encoding: if it's a decent encoding, then, it's nice that it's a standard; if it's a bad encoding, then, by all means, let's not use it!

I do think DWARF is a good choice for the non-executable information, for the same reasons I did initially:

(a) for debugging, we need to generate most of that information anyhow, so we're piggy-backing on existing code -- and not making object files bigger by encoding the same information twice in the case of "-O2 -g", which is the default way that many GNU applications are built.

(b) it's a standard, and we already have tools for reading DWARF, so it saves the trouble of coming up with a new encoding.

(c) because LTO will be consuming the same DWARF that GDB does, we'll be validating our LTO information every time we use GDB, and, similarly, improving the GDB experience every time we fix an LTO bug in this area.

I understand that some of these benefits do not apply to the executable code, and that, even to the extent they may apply, the tradeoffs are different. The comments I've made below about specific issues should therefore be considered as academic responses, not an attempt to argue the decision you have made.

3) To get some one to tell me what option we are going to add to the
compiler to tell it to write this information.

I think a reasonable spelling is probably "-flto". It should not be a -m option since it is not machine-specific. I don't think it should be a -O option either, since writing out LTO information isn't really supposed to affect optimization per se.

2) The code is, by design, fragile.  It takes nothing for granted.
Every case statement has gcc_unreachable as its default case.

That's the same way I've approached the DWARF reading code, and for the same reason. I think that's exactly the right decision.
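
For readers following along, here is roughly what that style looks like in practice. The tag names and the reader function are made up for the example; only gcc_unreachable itself is the real GCC idiom, stubbed out here so the snippet stands alone:

  #include <stdio.h>
  #include <stdlib.h>

  /* Stand-in for GCC's gcc_unreachable (): fail loudly instead of
     guessing at input we were never taught to decode.  */
  #define gcc_unreachable() \
    do { fprintf (stderr, "%s:%d: unreachable\n", __FILE__, __LINE__); \
         abort (); } while (0)

  /* Hypothetical tag set for an LTO stream.  */
  enum lto_tag { LTO_TAG_PLUS, LTO_TAG_MINUS };

  static const char *
  decode_tag (enum lto_tag tag)
  {
    switch (tag)
      {
      case LTO_TAG_PLUS:
        return "+";
      case LTO_TAG_MINUS:
        return "-";
      default:
        /* Unknown tags are treated as bugs, never silently skipped.  */
        gcc_unreachable ();
      }
  }

  int
  main (void)
  {
    printf ("%s\n", decode_tag (LTO_TAG_PLUS));
    return 0;
  }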

1) ABBREV TABLES ARE BAD FOR LTO.

However, this mechanism is only self descriptive if you do not extend
the set of tags. That is not an option for LTO.

Definitely true. When we talked on the phone, we talked about creating a tag corresponding to each GIMPLE tree code. However, you could also create a numeric attribute giving the GIMPLE tree code. If you did that, you might find that the abbreviation table became extremely small -- because almost all interior nodes would be DW_TAG_GIMPLE_EXPR nodes. The downside, of course, is that the storage required to store the nodes would be greater, as it would now contain the expression code (e.g., PLUS_EXPR), rather than having a DW_TAG_GIMPLE_PLUS_EXPR.
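
To make the trade-off concrete, here is a rough sketch of the two shapes a node could take. The struct layouts and field names are invented for illustration (real DWARF DIEs are variable-length records, not fixed structs), but the table-size versus node-size trade-off is the one described above:

  #include <stdint.h>

  /* Alternative A: one DWARF tag per GIMPLE tree code.  The abbreviation
     table needs an entry for every tree code actually used, but each
     node carries no code field of its own.  */
  struct die_tag_per_code
  {
    uint32_t abbrev;        /* selects, e.g., DW_TAG_GIMPLE_PLUS_EXPR */
    uint32_t operands[2];
  };

  /* Alternative B: a single generic tag plus a numeric attribute holding
     the tree code.  The abbreviation table shrinks to a handful of
     entries, but every interior node now spends extra bytes recording
     PLUS_EXPR, MINUS_EXPR, and so on explicitly.  */
  struct die_generic
  {
    uint32_t abbrev;        /* the one abbrev for DW_TAG_GIMPLE_EXPR */
    uint16_t tree_code;     /* e.g. PLUS_EXPR, via a numeric attribute */
    uint32_t operands[2];
  };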

I strongly believe that for LTO to work, we are going to have to
implement some mechanism where the function bodies are loaded into the
compiler on demand (possibly kept in cache, but maybe not).

Agreed.

This
will be more cumbersome if we have to keep reloading each object
file's abbrev table just to tear apart a single function in that .o
file.  While the abbrev sections average slightly less than 2% of the
size of the GIMPLE encoding for an entire file, each abbrev table
averages about the same size as a single function.

Interesting datapoint.

(Implied, but not stated, in your mail is the fact that the abbreviation table cannot be indexed directly. If it could be, then you wouldn't have to read the entire abbreviation table for each function; you would just read the referenced abbreviations. Because the abbreviation table records are of variable length, it is indeed true that you cannot make random accesses to the table. So, this paragraph is just fleshing out your argument.)
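
For anyone unfamiliar with the format: abbreviation records are built out of ULEB128-encoded numbers, so a record's length depends on its contents and the table can only be walked front to back. A minimal decoder, sketched from the DWARF spec rather than taken from GCC's actual reader, looks something like this:

  #include <stddef.h>
  #include <stdint.h>

  /* Decode one ULEB128 value from BUF starting at *POS.  Each byte
     contributes 7 bits; the high bit says whether more bytes follow,
     which is exactly what makes the records variable-length.  */
  static uint64_t
  read_uleb128 (const uint8_t *buf, size_t *pos)
  {
    uint64_t result = 0;
    unsigned int shift = 0;
    uint8_t byte;

    do
      {
        byte = buf[(*pos)++];
        result |= (uint64_t) (byte & 0x7f) << shift;
        shift += 7;
      }
    while (byte & 0x80);

    return result;
  }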

I think the conclusion that you reach (that the size of the tables is a problem) depends on how you expect the compiler to process functions at link-time. My expectation was that you would form a global control-flow graph for the entire program (by reading CFG data encoded in each .o file), eliminate unreachable functions, and then inline/optimize functions one-at-a-time.
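
In case it helps to pin that model down, here is a toy version of the first step: mark everything reachable from main in the whole-program graph of callers and callees, so that unreachable bodies never need to be read at all. The data layout is invented for the example:

  #include <stdbool.h>

  #define MAX_FUNCS 64

  /* Hypothetical per-function callee list, as it might be reconstructed
     from the per-object-file graph sections.  */
  struct callee_list
  {
    int callees[MAX_FUNCS];
    int num_callees;
  };

  /* Mark FN and everything it transitively calls.  The caller supplies
     REACHABLE zero-initialized and starts from the entry point.  */
  static void
  mark_reachable (const struct callee_list *graph, bool *reachable, int fn)
  {
    int i;

    if (reachable[fn])
      return;
    reachable[fn] = true;
    for (i = 0; i < graph[fn].num_callees; i++)
      mark_reachable (graph, reachable, graph[fn].callees[i]);
  }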

If you sort the function-reading so that you prefer to read functions from the same object file in order, then I would expect that you would considerably reduce the impact of reading the abbreviation tables. I'm making the assumption that if a function f calls N functions, then they probably come from < N object files. I have no data to back up that assumption.

(There is nothing that says that you can only have one abbreviation table for all functions. You can equally well have one abbreviation table per function. In that mode, you trade space (more abbreviation tables, and the same abbreviation appearing in multiple tables) against the fact that you now only need to read the abbreviation tables you need. I'm not claiming this is a good idea.)

I don't find this particular argument (that the abbreviation tables will double file I/O) very convincing. I don't think it's likely that the problem we're going to have with LTO is running out of *virtual* memory, especially as 64-bit hardware becomes nearly universal. The problem is going to be running out of physical memory, and thereby paging like crazy, or running out of D-cache. So, I'd assume you'd just read the tables as-needed, and never bother discarding them. As long as there is reasonable locality of reference to abbreviation tables (i.e., you can arrange to hit object files in groups), then the cost here doesn't seem like it would be very big.
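
Sketching what I mean by "read the tables as-needed": something like the following cache, keyed by object file, where the first function read from a file pulls in that file's table and later functions from the same file reuse it. The types and the load_abbrev_section helper are hypothetical, not a proposal for the actual reader:

  #include <stdlib.h>
  #include <string.h>

  struct abbrev_table
  {
    const char *object_file;
    unsigned char *data;          /* raw abbreviation section contents */
    size_t size;
    struct abbrev_table *next;
  };

  static struct abbrev_table *abbrev_cache;

  /* Assumed to exist for this sketch: reads the abbreviation section of
     OBJECT_FILE into freshly malloc'd memory.  */
  extern unsigned char *load_abbrev_section (const char *object_file,
                                             size_t *size);

  static struct abbrev_table *
  get_abbrev_table (const char *object_file)
  {
    struct abbrev_table *t;

    for (t = abbrev_cache; t; t = t->next)
      if (strcmp (t->object_file, object_file) == 0)
        return t;                  /* cache hit: no extra I/O */

    t = malloc (sizeof *t);
    t->object_file = object_file;
    t->data = load_abbrev_section (object_file, &t->size);
    t->next = abbrev_cache;
    abbrev_cache = t;              /* keep it for the rest of the link */
    return t;
  }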

2) I PROMISED TO USE THE DWARF3 STACK MACHINE AND I DID NOT.

I never imagined you doing this; as per above, I always expected that you would use DWARF tags for the expression nodes. I agree that the stack-machine is ill-suited.

3) THERE IS NO COMPRESSION IN DWARF3.

In 1-file mode, zlib -9 compression is almost 6:1.  In 1-function
mode, zlib -9 compression averages about 3:1.

In my opinion, if you considered DWARF + zlib to be satisfactory, then I think that would be fine. For LTO, we're allowed to do whatever we want. I feel the same about your confession that you invented a new record form; if DWARF + extensions is a suitable format, that's fine. In other words, in principle, using a somewhat non-standard variant of DWARF for LTO doesn't seem evil to me -- if that met our needs.
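
For reference, the kind of compression being measured is just zlib at level 9 over the serialized section. A minimal version, using the real zlib API but with the surrounding section-writing code assumed and error handling kept to a minimum, would be:

  #include <stdlib.h>
  #include <zlib.h>

  /* Compress IN_LEN bytes of serialized GIMPLE at level 9 ("-9").
     On success, *OUT points to a malloc'd buffer of *OUT_LEN bytes.  */
  static int
  compress_section (const unsigned char *in, unsigned long in_len,
                    unsigned char **out, unsigned long *out_len)
  {
    unsigned long bound = compressBound (in_len);
    unsigned char *buf = malloc (bound);

    if (!buf)
      return -1;
    if (compress2 (buf, &bound, in, in_len, Z_BEST_COMPRESSION) != Z_OK)
      {
        free (buf);
        return -1;
      }
    *out = buf;
    *out_len = bound;
    return 0;
  }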

2) LOCAL DECLARATIONS
Mark was going to do all of the types and all of the declarations.
His plan was to use the existing DWARF3 and enhance it where it was
necessary, eventually replacing the GCC type trees with direct
references to the DWARF3 symbol table.

> The types and global variables are likely OK, or at least Mark
> should be able to add any missing info.

Yes, I agree that if you're not using DWARF for the function bodies, you probably want your own encoding for the local variables.

We will also need to add other structures to the object files.  We
will need to have a version of the cgraph, in a separate section, that
is in a form so that all of the cgraphs from all of the object files
can be read and processed without looking at the actual function bodies.

Definitely.
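
Something as simple as a fixed-size record per function might do for that section. The fields below are purely illustrative of the kind of information that would let the cgraphs from all the object files be spliced together without touching any body:

  #include <stdint.h>

  /* Hypothetical per-function record in a call-graph section.  */
  struct lto_cgraph_node
  {
    uint32_t function_symbol;   /* index into the file's symbol table */
    uint32_t body_offset;       /* where the GIMPLE for the body lives */
    uint32_t first_callee;      /* index into a parallel array of edges */
    uint32_t num_callees;
    uint8_t  locally_pure;      /* see the pure/const discussion below */
    uint8_t  locally_const;
  };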

function only calls other pure functions and so on...  If we simply
label the call graph with the locally pure and locally constant
attributes, the closure phase can be done for all of the functions in
the LTO compilation without having to reprocess their bodies.
Virtually all interprocedural optimizations, including aliasing, can
and must be structured this way.

You could also label the function declarations. There's a decision to make here as to whether the nodes of the call graph are the same as the DWARF nodes for the functions themselves, or are instead separate entities (which, of course, point to those DWARF nodes). It would be nice, a priori, to have this information in the DWARF nodes because it would allow the debugger to show this information to users and to view it via DWARF readers. However, I can also imagine that it needs to be in the separate call graph.
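
To spell out the closure Kenneth describes: start from the "locally pure" bits recorded in the call graph, then iterate to a fixed point, demoting any function that calls something not known to be pure, without ever re-reading a body. A toy version, with invented data structures, might look like:

  #include <stdbool.h>

  #define MAX_FUNCS 64

  struct fn
  {
    bool locally_pure;          /* the body itself has no side effects */
    bool pure;                  /* result of the closure */
    int callees[MAX_FUNCS];
    int num_callees;
  };

  static void
  propagate_pure (struct fn *fns, int n)
  {
    bool changed = true;
    int i, j;

    /* Optimistically believe the local labels...  */
    for (i = 0; i < n; i++)
      fns[i].pure = fns[i].locally_pure;

    /* ...then demote until nothing changes.  Mutually recursive pure
       functions correctly stay pure.  */
    while (changed)
      {
        changed = false;
        for (i = 0; i < n; i++)
          {
            if (!fns[i].pure)
              continue;
            for (j = 0; j < fns[i].num_callees; j++)
              if (!fns[fns[i].callees[j]].pure)
                {
                  fns[i].pure = false;
                  changed = true;
                  break;
                }
          }
      }
  }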

I have not done this because I do not rule the earth.  That was not
what I was assigned to do, and I agreed that DWARF3 sounded like a
reasonable way to go.  Now that I understand the details of DWARF3, I
have changed my mind about the correct direction.  Now is the time to
make that change before there is a lot of infrastructure built that
assumes the DWARF3 encoding.

I think it's great that you're asking for feedback. My only feedback is that you may not need to make this decision *now*. We could conceivably wire this up, work on the other things (CFG, etc.) and return to the encoding issue. I'm vaguely in favor of that plan, just in that I'm eager to actually see us make something work. On the other hand, building up DWARF reading for this code only to chuck it later does seem wasteful. But, the DWARF reader is already there; it's mostly filling in some blanks. But, filling in blanks is always harder than one expects. So, I think this should really be your call: rework the format now, or later, as you think best.

Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
