lto gimple types and debug info

Kenneth Zadeck Thu, 24 Jul 2008 09:26:33 -0700

I have been working on a design for gimple types.  The overall plan is
to gimplify the types, as we gimplify the executable code.   Then we
can release the front end types and recover the space.


There are difficulties with this plan and most of them have to do with
the generation of debug information.  The first problem arises
because several passes in the existing compiler, and possibly more
passes in the future, modify the types.  The existing passes that
modify the types are the structure peeling and reorg passes as well as
the matrix reorg pass.

In talking to people, there appear to be 4 ways to go here.

  1) Screw the debugging.  The motivation behind this option is that
  not only is it easy, but the reality is that we have changed the
  program so much that even if we did "fix up the types" so that they
  matched the generated program, the code is so different from
  what the user expected that (s)he could not even debug it anyway.
  If the compiler peels field x out of structure y, the user cannot
  look at y.x.  I do not know if the debuggers and the dwarf could be
  taught to hide this; they currently do not.  (Note that teaching
  the debuggers to do a good job with this would be a good topic for
  a paper.)

  2) Generate the debugging for the types early, and then add an
  interface that would parse and regenerate the debugging info with
  the changes.  It is quite likely that this would lock gcc
  completely into dwarf, but that appears to only be a problem for
  AIX at this point, so that may not be that much of a problem.

  3) Generate the debugging for the types late.  The problem here is
  that we want the gimple type system to be stripped of the front end
  specific information, so any front end specific info that is only
  necessary for the debugging will both bulk up the gimple type
  system and make it less language independent.

  4) Do mostly (3) but generate the debugging info for the front end
  dependent things like typedefs and templates in the front ends.


Richi has been an advocate for (1).  I actually think that if these
optimizations become important, people will add the smarts to the
dwarf and the debuggers to hide this from the programmer, so I
personally want to avoid a design that makes it impossible to get
this right later if we want to.  However, I agree that the problems
are hard and are not going to be solved soon.

Diego suggested (2).   I do not know if he was just throwing it out or
if he is really an advocate of this.   This will be a lot of work and
will be costly to do in the lto1 pass of whopr.

I am leaning to (4).  According to richi and jason, we could (and may
already) generate the typedefs so that they just point to the underlying
type.  This means that the front ends could generate the typedefs, and
we would not need to carry any of that into the gimple type system
except for some dwarf reference.  Then all that we would need is to
add a field to decls that contains the reference of the typedef so that
when the debugging for the decl is generated and the type of the decl
was a typedef, we can generate the debugging info pointing to the
typedef rather than the type.
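
As a rough sketch of the extra field on decls described above (this is
not GCC's real tree layout; die_ref, gimple_type_sketch, decl_sketch
and decl_type_die are all hypothetical names made up for illustration,
and the reference is assumed to be a simple offset with 0 meaning "no
typedef"), the lookup when generating debug info for a decl might look
something like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not GCC's real data structures.  A die_ref is
   assumed to be the offset of a DIE that has already been emitted.  */
typedef uint64_t die_ref;
#define NO_DIE_REF ((die_ref) 0)

struct gimple_type_sketch
{
  const char *name;
  die_ref type_die;      /* DIE describing the underlying type.  */
};

struct decl_sketch
{
  const char *name;
  const struct gimple_type_sketch *type;
  die_ref typedef_die;   /* Typedef DIE emitted by the front end, or
                            NO_DIE_REF if the decl did not use one.  */
};

/* Return the DIE that the debug info for DECL should point to: the
   typedef if the front end recorded one, otherwise the type itself.  */
die_ref
decl_type_die (const struct decl_sketch *decl)
{
  assert (decl != NULL && decl->type != NULL);
  if (decl->typedef_die != NO_DIE_REF)
    return decl->typedef_die;
  return decl->type->type_die;
}
```

The point of the sketch is only that the gimple type itself carries
nothing about the typedef; the one extra reference lives on the decl.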

Richi also advocated not having the type names at the gimple level.
However the type names are necessary for doing type merging for lto
and are important for alias analysis in languages with stronger type
systems than C or C++, so I believe that the type names are going to
be a necessary part of the gimple type system.

In this same vein, I am very interested in using the gimple type
system as a way to start moving gcc from being a C compiler that
accommodates other languages to a compiler that handles different
languages on an equal footing.  The freedom that C and C++ "enjoy" to
basically take a pointer to one type and convert it to a pointer to
almost any other type is not something that is allowed by the other
languages that gcc supports.  Fortran seems to require (and this
should be confirmed by a fortran expert) a very disciplined use of
pointers, even restricting pointers to only being able to point to
variables that are declared to be targets of pointers.  In java it is
also impossible to transform a pointer into another type except in
very restrictive ways.   I would expect that we should be able to
distinguish types as being only advisory as in C and C++ from types
that represent a contract between a programmer and the compiler.
I do not know how Objc and Ada fit into this mesh; it would be nice to
know exactly what those languages could utilize.

What doing (4) does mean is that many of the attributes like "private"
are going to also have to be attached to the gimple types.  I do not
see this as bad as long as it is not too bulky, because again, we
should be able to utilize some of this information in the middle ends
if it is packaged correctly.


As a general principle, the debugging information and lto do not play
well together, especially where whopr comes into play.  The problem is
that whopr expects the lto1 driver to be able to "quickly"
repackage (cherry pick) all of the information in the incoming .o
files into outgoing .o files that are to be processed by the elements
in a compile farm.  For debugging information that is generated by the
middle end, this is not an issue because that debugging information
will be generated by the compilations that happen at the farm.

But the debugging info that is generated by the front ends is
currently not packaged in a way that is consistent with this cherry
picking approach.  The front end info is put into a fixed number of
sections (one for the strings, one for the types ...).  There is no
way to cherry pick this information without parsing it all out and
then regenerating it.  At the very least, there needs to be a full set
of sections for each function, rather than one set per compilation
unit.  This will allow the local types and other debug info to be easily
cherry picked.  The cherry picker may then be able to change the
attributes and names on these sections so that the subsequent link
just concatenates them so that the debugger does not see any of these
games.
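
To make the per-function idea a little more concrete, here is a
minimal sketch of what the cherry picker could do if every front end
debug section were tagged with the function that owns it.  The
debug_section_sketch record and the cherry_pick function are
hypothetical; nothing like them exists in lto1 today, and a real
version would work on object file sections rather than on an array of
small records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one front end debug section.  With a
   full set of sections per function, each section names its owner.  */
struct debug_section_sketch
{
  const char *kind;       /* For example "types" or "strings".  */
  const char *function;   /* Name of the function that owns it.  */
};

/* Copy into OUT pointers to the sections in IN that belong to one of
   the N_WANTED functions named in WANTED, keeping the input order.
   OUT must have room for N_IN entries.  Nothing inside a section is
   parsed.  Returns the number of sections selected.  */
size_t
cherry_pick (const struct debug_section_sketch *in, size_t n_in,
             const char *const *wanted, size_t n_wanted,
             const struct debug_section_sketch **out)
{
  size_t n_out = 0;

  assert (in != NULL && wanted != NULL && out != NULL);
  for (size_t i = 0; i < n_in; i++)
    for (size_t j = 0; j < n_wanted; j++)
      if (strcmp (in[i].function, wanted[j]) == 0)
        {
          out[n_out++] = &in[i];
          break;
        }
  return n_out;
}
```

The selection only looks at the owner tag, which is what would let
lto1 repackage the sections without parsing and regenerating them.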

The rest of the debug info generated by the front end could be more
problematic.  At the very least, the debug info generated by the front
ends needs to be put in a separate section from the debug info
generated by the middle or back ends since the info generated by the
middle and back ends should be discarded by lto1.

However, I do not know how to cherry pick the front end info (beyond
getting the function specific stuff to the right place) without
parsing all of it.  Ideas are welcome.

I do welcome comments on this.   These are issues that are going to
have to be addressed to get lto off the ground, and the more issues
that are brought to the surface before we start the better.

Kenny
