Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

Mark Mitchell Wed, 30 Aug 2006 15:36:33 -0700

Kenneth Zadeck wrote:

> This posting is a progress report of my task of encoding and decoding
> the GIMPLE stream into LTO.  Included in this posting is a patch that
> encodes functions and dumps the result to files.

[I'm sorry for not replying to this sooner. I've been on a plane or in a meeting virtually all of my waking hours since you wrote this...]


Exciting!

> 2) To have a discussion about the use of DWARF3.  I am now against the
> use of DWARF3 for encoding the GIMPLE.

As the person who suggested this to you, I'm sorry it doesn't seem to meet your needs. I'll make a few more specific comments below, but, to echo what I said at the time: I think it was a good idea to try it, but if it's not the right tool for the job, then so be it. My opinion has always been that, for the bodies of functions, DWARF is just an encoding: if it's a decent encoding, then it's nice that it's a standard; if it's a bad encoding, then, by all means, let's not use it!

I do think DWARF is a good choice for the non-executable information, for the same reasons I did initially:

(a) for debugging, we need to generate most of that information anyhow, so we're piggy-backing on existing code -- and not making object files bigger by encoding the same information twice in the case of "-O2 -g", which is the default way that many GNU applications are built,

(b) it's a standard, and we already have tools for reading DWARF, so it saves the trouble of coming up with a new encoding, and

(c) because bugs in the DWARF emission may now result in problems at LTO time, we'll be validating our LTO information every time we use GDB and, similarly, improving the GDB experience every time we fix an LTO bug in this area.

I understand that some of these benefits do not apply to the executable code, and that, even to the extent they may apply, the tradeoffs are different. The comments I've made below about specific issues should therefore be considered academic responses, not an attempt to argue the decision you have made.

> 3) To get someone to tell me what option we are going to add to the
> compiler to tell it to write this information.

I think a reasonable spelling is probably "-flto". It should not be a -m option, since it is not machine-specific. I don't think it should be a -O option either, since writing out LTO information isn't really supposed to affect optimization per se.

> 2) The code is, by design, fragile.  It takes nothing for granted.
> Every case statement has gcc_unreachable as its default case.

That's the same way I've approached the DWARF reading code, and for the same reason. I think that's exactly the right decision.

> 1) ABBREV TABLES ARE BAD FOR LTO.
>
> However, this mechanism is only self-descriptive if you do not extend
> the set of tags.  That is not an option for LTO.

Definitely true. When we talked on the phone, we talked about creating a tag corresponding to each GIMPLE tree code. However, you could also create a numeric attribute giving the GIMPLE tree code. If you did that, you might find that the abbreviation table became extremely small -- because almost all interior nodes would be DW_TAG_GIMPLE_EXPR nodes. The downside, of course, is that the storage required for the nodes would be greater, as each node would now contain the expression code (e.g., PLUS_EXPR), rather than having a DW_TAG_GIMPLE_PLUS_EXPR.
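The following is only a sketch and is not part of either proposal; it puts a rough number on that storage cost. DWARF stores DW_FORM_udata attribute values as unsigned LEB128, with seven payload bits per byte. Carrying the tree code as an attribute would therefore add one byte per node for codes below 128 and two bytes for codes below 16384. The encoder below is ordinary ULEB128. The function tree_code_attribute_cost and the generic DW_TAG_GIMPLE_EXPR node it refers to are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode VALUE as unsigned LEB128 into BUF, which must hold at least
   10 bytes for a 64-bit value.  Returns the number of bytes written.
   Each byte carries 7 bits of payload, low bits first; the high bit is
   set on every byte except the last.  */
size_t encode_uleb128 (uint64_t value, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = value & 0x7f;
      value >>= 7;
      if (value != 0)
        byte |= 0x80;           /* More bytes follow.  */
      buf[n++] = byte;
    }
  while (value != 0);
  return n;
}

/* Hypothetical: extra bytes per node if the tree code is stored as a
   DW_FORM_udata attribute on a generic DW_TAG_GIMPLE_EXPR node instead
   of being implied by a per-code tag.  */
size_t tree_code_attribute_cost (unsigned tree_code)
{
  unsigned char buf[10];
  return encode_uleb128 (tree_code, buf);
}
```

For example, encode_uleb128 (624485, buf) writes the three bytes 0xE5 0x8E 0x26.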

> I strongly believe that for LTO to work, we are going to have to
> implement some mechanism where the function bodies are loaded into the
> compiler on demand (possibly kept in cache, but maybe not).


Agreed.

> This will be more cumbersome if we have to keep reloading each object
> file's abbrev table just to tear apart a single function in that .o
> file.  While the abbrev sections average slightly less than 2% of the
> size of the GIMPLE encoding for an entire file, each abbrev table
> averages about the same size as a single function.


Interesting datapoint.

(Implied, but not stated, in your mail is the fact that the abbreviation table cannot be indexed directly. If it could be, then you wouldn't have to read the entire abbreviation table for each function; you would just read the referenced abbreviations. Because the abbreviation table records are of variable length, it is indeed true that you cannot make random accesses to the table. So, this paragraph is just fleshing out your argument.)
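The sketch below illustrates why the table cannot be indexed. In the DWARF 3 .debug_abbrev layout, each entry consists of:

- a ULEB128 abbreviation code,
- a ULEB128 tag,
- a one-byte children flag, and
- a list of ULEB128 (attribute, form) pairs ending in (0, 0).

A code of 0 ends the table. Finding one abbreviation therefore means decoding every entry before it. The helper names are hypothetical, and DWARF 5's DW_FORM_implicit_const is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value at *P and advance *P past it.  */
static uint64_t read_uleb128 (const unsigned char **p)
{
  uint64_t result = 0;
  unsigned shift = 0;
  unsigned char byte;
  do
    {
      byte = *(*p)++;
      if (shift < 64)           /* Ignore excess bits in malformed input.  */
        result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  return result;
}

/* Return a pointer to the entry with abbreviation code WANTED in a
   DWARF 3 abbreviation table starting at TABLE, or NULL if absent.
   Entries are variable length, so there is no way to jump to entry N
   directly: every preceding entry must be decoded and skipped.  */
const unsigned char *
find_abbrev (const unsigned char *table, uint64_t wanted)
{
  const unsigned char *p = table;
  for (;;)
    {
      const unsigned char *entry = p;
      uint64_t code = read_uleb128 (&p);
      if (code == 0)
        return NULL;            /* End of this table.  */
      if (code == wanted)
        return entry;
      read_uleb128 (&p);        /* Skip the DW_TAG_* value.  */
      p++;                      /* Skip the DW_CHILDREN_* byte.  */
      for (;;)                  /* Skip (attribute, form) pairs.  */
        {
          uint64_t name = read_uleb128 (&p);
          uint64_t form = read_uleb128 (&p);
          if (name == 0 && form == 0)
            break;
        }
    }
}
```

A reader would normally do this scan once per table and build an array indexed by abbreviation code. That one-time scan is the per-table cost under discussion.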

I think the conclusion that you reach (that the size of the tables is a problem) depends on how you expect the compiler to process functions at link-time. My expectation was that you would form a global control-flow graph for the entire program (by reading CFG data encoded in each .o file), eliminate unreachable functions, and then inline/optimize functions one-at-a-time.

If you sort the function-reading so that you prefer to read functions from the same object file in order, then I would expect that you would considerably reduce the impact of reading the abbreviation tables. I'm making the assumption that if f calls N functions, then they probably come from fewer than N object files. I have no data to back up that assumption.
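Here is a minimal sketch of that ordering. It assumes a hypothetical fn_request record for each function body to be read. It also assumes, as a worst case, a reader that keeps only the most recently loaded abbreviation table in memory. Under those assumptions, sorting the requests by object file turns scattered table reloads into one load per file. Object-file indices are assumed to be non-negative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical request to load one function body from an object file.  */
struct fn_request
{
  int object_file;   /* Index of the .o file containing the body.  */
  int function_id;   /* Index of the function within that file.  */
};

/* qsort comparator: order by object file, then by function.  */
static int compare_requests (const void *a, const void *b)
{
  const struct fn_request *x = a, *y = b;
  if (x->object_file != y->object_file)
    return (x->object_file > y->object_file) - (x->object_file < y->object_file);
  return (x->function_id > y->function_id) - (x->function_id < y->function_id);
}

/* Count abbreviation-table loads when requests are served in order and
   only the most recently loaded table is kept.  */
size_t count_abbrev_loads (const struct fn_request *reqs, size_t n)
{
  size_t loads = 0;
  int current = -1;                 /* No table loaded yet.  */
  for (size_t i = 0; i < n; i++)
    if (reqs[i].object_file != current)
      {
        current = reqs[i].object_file;
        loads++;
      }
  return loads;
}

/* Reorder requests so functions from the same object file are read
   consecutively.  */
void group_by_object_file (struct fn_request *reqs, size_t n)
{
  qsort (reqs, n, sizeof *reqs, compare_requests);
}
```

Consider requests for files 0, 1, 0, 1, 2, 0 in that order. count_abbrev_loads reports 6 loads for them; after group_by_object_file it reports 3.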

(There is nothing that says that you can only have one abbreviation table for all functions. You can equally well have one abbreviation table per function. In that mode, you trade space (more abbreviation tables, and the same abbreviation appearing in multiple tables) against the fact that you now only need to read the abbreviation tables you need. I'm not claiming this is a good idea.)

I don't find this particular argument (that the abbreviation tables will double file I/O) very convincing. I don't think it's likely that the problem we're going to have with LTO is running out of *virtual* memory, especially as 64-bit hardware becomes nearly universal. The problem is going to be running out of physical memory, and thereby paging like crazy, or running out of D-cache. So, I'd assume you'd just read the tables as-needed, and never bother discarding them. As long as there is reasonable locality of reference to abbreviation tables (i.e., you can arrange to hit object files in groups), the cost here doesn't seem like it would be very big.

> 2) I PROMISED TO USE THE DWARF3 STACK MACHINE AND I DID NOT.

I never imagined you doing this; as per above, I always expected that you would use DWARF tags for the expression nodes. I agree that the stack machine is ill-suited.

> 3) THERE IS NO COMPRESSION IN DWARF3.
>
> In 1 file per mode, zlib -9 compression is almost 6:1.  In 1 function
> per mode, zlib -9 compression averages about 3:1.

In my opinion, if you considered DWARF + zlib to be satisfactory, then I think that would be fine. For LTO, we're allowed to do whatever we want. I feel the same about your confession that you invented a new record form; if DWARF + extensions is a suitable format, that's fine. In other words, in principle, using a somewhat non-standard variant of DWARF for LTO doesn't seem evil to me -- if that met our needs.

> 2) LOCAL DECLARATIONS
>
> Mark was going to do all of the types and all of the declarations.
> His plan was to use the existing DWARF3 and enhance it where it was
> necessary, eventually replacing the GCC type trees with direct
> references to the DWARF3 symbol table.


> > The types and global variables are likely OK, or at least Mark
> > should be able to add any missing info.

Yes, I agree that if you're not using DWARF for the function bodies, you probably want your own encoding for the local variables.

> We will also need to add other structures to the object files.  We
> will need to have a version of the cgraph, in a separate section, that
> is in a form so that all of the cgraphs from all of the object files
> can be read and processed without looking at the actual function bodies.


Definitely.

> ...function only calls other pure functions and so on...  If we simply
> label the call graph with the locally pure and locally constant
> attributes, the closure phase can be done for all of the functions in
> the LTO compilation without having to reprocess their bodies.
> Virtually all interprocedural optimizations, including aliasing, can
> and must be structured this way.

You could also label the function declarations. There's a decision to make here as to whether the nodes of the call graph are the same as the DWARF nodes for the functions themselves, or are instead separate entities (which, of course, point to those DWARF nodes). It would be nice, a priori, to have this information in the DWARF nodes, because that would allow the debugger to show it to users, and would let users view it with DWARF readers. However, I can also imagine that it needs to be in the separate call graph.
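The closure phase described above can be sketched as a fixed-point iteration that reads only call-graph summaries, never function bodies. The cg_node layout, the MAX_CALLEES bound, and propagate_pure are hypothetical, and the same loop would handle a separate locally-constant bit. The loop starts from the local bits and only ever clears them, which keeps mutually recursive pure functions pure. A production pass would more likely walk strongly connected components in reverse topological order; the naive loop is used here only for brevity.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLEES 8

/* Hypothetical call-graph node as read from a cgraph section:
   summary bits and edges only, no function body.  */
struct cg_node
{
  bool locally_pure;             /* The body itself has no side effects.  */
  bool pure;                     /* Result of the closure phase.  */
  size_t n_callees;
  size_t callees[MAX_CALLEES];   /* Indices into the node array.  */
};

/* A function is pure if it is locally pure and every function it calls
   is pure.  Start optimistically from the local bits and clear PURE
   until nothing changes; bits only go from true to false, so the loop
   terminates.  */
void propagate_pure (struct cg_node *nodes, size_t n)
{
  for (size_t i = 0; i < n; i++)
    nodes[i].pure = nodes[i].locally_pure;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (size_t i = 0; i < n; i++)
        {
          if (!nodes[i].pure)
            continue;
          for (size_t j = 0; j < nodes[i].n_callees; j++)
            if (!nodes[nodes[i].callees[j]].pure)
              {
                nodes[i].pure = false;
                changed = true;
                break;
              }
        }
    }
}
```

For example, suppose a locally pure function calls one that is not locally pure. propagate_pure clears its bit, along with the bit of every locally pure function that calls it, directly or indirectly.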

> I have not done this because I do not rule the earth.  That was not
> what I was assigned to do, and I agreed that DWARF3 sounded like a
> reasonable way to go.  Now that I understand the details of DWARF3, I
> have changed my mind about the correct direction.  Now is the time to
> make that change before there is a lot of infrastructure built that
> assumes the DWARF3 encoding.

I think it's great that you're asking for feedback. My only feedback is that you may not need to make this decision *now*. We could conceivably wire this up, work on the other things (CFG, etc.) and return to the encoding issue. I'm vaguely in favor of that plan, just in that I'm eager to actually see us make something work. On the other hand, building up DWARF reading for this code only to chuck it later does seem wasteful. But, the DWARF reader is already there; it's mostly filling in some blanks. But, filling in blanks is always harder than one expects. So, I think this should really be your call: rework the format now, or later, as you think best.


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
