On 8/30/06, Mark Mitchell <[EMAIL PROTECTED]> wrote:
...
I guess my overriding concern is that we're focusing heavily on the data
format here (DWARF?  Something else?  Memory-mappable?  What compression
scheme?) and we may not have enough data.  I guess we just have to pick
something and run with it.  I think we should try to keep that code
as separate as possible so that we can recover easily if whatever we
pick turns out to be (another) bad choice. :-)

At the risk of stating the obvious and also repeating myself,
please allow me to give my thoughts on this issue.

I think we should go even a step further than "try to keep the code
as separate as possible".
We should try to come up with a set of
procedural and data-structure interfaces for the input/output
of the program structure,
and try to *completely* separate the optimization/data-structure cleanup work
from the encoding/decoding.

Besides the basic requirement of being able to pass through
enough information to produce a valid program,
I think there is a critical requirement
for implementing inter-module/interprocedural optimization efficiently:
the I/O interface must allow efficient
iteration over module/procedure-level information
without reading each and every module/procedure body
(as Ken mentioned).

There is a certain amount of information per object/procedure that
is accessed by different optimizations, with sufficiently
different access patterns.
For example, the type tree is naturally object-level information
that we may want to walk for each and every object file
without reading any function bodies,
and other function-level information, such as caller/callee relationships,
is useful without the function body.

We'll need to identify such information (in other words,
the requirements of the interprocedural optimizations/analyses)
so that the new interface provides ways to walk through it
without loading entire function bodies. Even with a large address space,
if the data is scattered everywhere, it becomes extremely inefficient
on modern machines to traverse it.
So it's actually more important to identify what logical information
we want to access during the various interprocedural optimizations,
and the I/O interface needs to handle that information efficiently.

This requirement should dictate how we encode and lay out the data
on disk, before anything else - and also how the information is
presented to the actual inter-module optimizations/analyses.

Also, part of defining the interface will involve restricting
the existing structures (e.g. GIMPLE) to a possibly more limited form
than what's currently allowed. By virtue of having an interface
that separates the encoding/decoding from the rest of the compilation,
we can throw away and recompute certain information
(e.g. often the control flow graph can be recovered,
and hence does not need to be encoded),
but those details can be worked out as the implementation of the I/O interface
takes shape.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
