On 8/30/06, Mark Mitchell <[EMAIL PROTECTED]> wrote: ...
I guess my overriding concern is that we're focusing heavily on the data format here (DWARF? Something else? Memory-mappable? What compression scheme?) and we may not have enough data. I guess we just have to pick something and run with it. I think we should try to keep that code as separate as possible so that we can recover easily if whatever we pick turns out to be (another) bad choice. :-)
At the risk of stating the obvious and also repeating myself, please allow me to give my thoughts on this issue.

I think we should go even a step further than "try to keep the code as separate". We should try to come up with a set of procedural and data-structure interfaces for the input/output of the program representation, and try to *completely* separate the optimization/data-structure cleanup work from the encoding/decoding. Beyond the basic requirement of passing through enough information to produce a valid program, I think there is a critical requirement for implementing inter-module/inter-procedural optimization efficiently: the I/O interface must allow iterating over module- and procedure-level information without reading each and every module/procedure body (as Ken mentioned).

There is a certain amount of information per object/procedure that is accessed by different optimizations, each with a sufficiently different access pattern. For example, the type tree is naturally object-level information that we may want to walk for each and every object file without reading any function bodies, and other function-level information, such as caller/callee information, would be useful without the function body. We'll need to identify such information (in other words, the requirements of the interprocedural optimizations/analyses) so that the new interface provides ways to walk through it without loading entire function bodies. Even with a large address space, if the data is scattered everywhere, it becomes extremely inefficient on modern machines to traverse it. So it's actually most important to identify what logical information we want to access during the various interprocedural optimizations; the I/O interface then needs to handle that access efficiently. This requirement should dictate how we encode and lay out the data on disk, before anything else, and also how the information is presented to the actual inter-module optimization/analysis.
Also, part of defining the interface would involve restricting the existing structures (e.g. GIMPLE) to a possibly more limited form than what's currently allowed. By virtue of having an interface that separates the encoding/decoding from the rest of the compilation, we can throw away and recompute certain information (e.g. the control flow graph can often be recovered, hence need not be encoded), but those details can be worked out as the implementation of the I/O interface takes shape. -- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com"