On 2012-12-18 01:13, H. S. Teoh wrote:

The problem is not so much the structure preprocessor -> compiler ->
assembler -> linker; the problem is that these logical stages have been
arbitrarily assigned to individual processes residing in their own
address space, communicating via files (or pipes, whatever it may be).

The fact that they are separate processes is in itself not that big of a
problem, but the fact that they reside in their own address space is a
big problem, because you cannot pass any information down the chain
except through rudimentary OS interfaces like files and pipes. Even that
wouldn't have been so bad if it weren't for the fact that the user
interface (in the form of text input / object file format) has also been
conflated with the program interface (the compiler has to produce the
input to the assembler, in *text*, and the assembler has to produce
object files that do not encode any direct dependency information
because that's the standard file format the linker expects).

Now consider if we keep the same stages, but each stage is not a
separate program but a *library*. The code then might look, in greatly
simplified form, something like this:

        import std.stdio : File;

        import libdmd.compiler;
        import libdmd.assembler;
        import libdmd.linker;

        void main(string[] args) {
                // typeof(asmCode) is some arbitrarily complex data
                // structure encoding assembly code, inter-module
                // dependencies, etc.
                auto asmCode = compiler.lex(args)
                        .parse()
                        .optimize()
                        .codegen();

                // Note: no stupid redundant convert to string, parse,
                // convert back to internal representation.
                auto objectCode = assembler.assemble(asmCode);

                // Note: linker has direct access to dependency info,
                // etc., carried over from asmCode -> objectCode.
                auto executable = linker.link(objectCode);
                // outfile is assumed to hold the output path.
                auto output = File(outfile, "w");
                executable.generate(output);
        }

Note that the types of asmCode, objectCode, and executable are
arbitrarily complex, and may contain lazily evaluated data structures,
references to on-disk temporary storage (for large projects you can't
hold everything in RAM), etc. Dependency information in asmCode is
propagated to objectCode, as necessary. The linker has full access to
all info the compiler has access to, and can perform inter-module
optimization, etc., by accessing information available to the
*compiler* front-end, not just some crippled object file format.
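
To make the propagation of dependency information concrete, here is a
minimal sketch in D. Everything in it is hypothetical: AsmModule,
ObjectModule, assemble and linkOrder are made-up names for illustration,
not part of DMD or any existing library. The "assembler" is a trivial
stand-in, and the "linker" only computes a link order, assuming the
dependency graph has no cycles.

```d
import std.algorithm : canFind, map;
import std.array : array;

// Hypothetical stage output: assembly plus what the compiler knows
// about inter-module dependencies.
struct AsmModule
{
    string name;
    string[] instructions;
    string[] dependsOn;
}

// Hypothetical object code that keeps the dependency list attached.
struct ObjectModule
{
    string name;
    ubyte[] code;
    string[] dependsOn; // carried over unchanged from AsmModule
}

// Stand-in assembler: encodes each instruction as one byte (its length).
ObjectModule assemble(AsmModule m)
{
    auto code = m.instructions.map!(i => cast(ubyte) i.length).array;
    return ObjectModule(m.name, code, m.dependsOn);
}

// Stand-in linker step: orders modules so that dependencies come first.
// Assumes the dependency graph is acyclic.
string[] linkOrder(ObjectModule[] mods)
{
    ObjectModule[string] byName;
    foreach (m; mods)
        byName[m.name] = m;

    string[] order;
    void visit(string name)
    {
        if (order.canFind(name))
            return;
        if (auto m = name in byName)
        {
            foreach (dep; m.dependsOn)
                visit(dep);
        }
        order ~= name;
    }

    foreach (m; mods)
        visit(m.name);
    return order;
}
```

The point of the sketch is only that dependsOn never leaves memory: the
linker step reads the same list the compiler stage produced, with no
file format in between that would have to encode it.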

The root of the current nonsense is that perfectly-fine data structures
are arbitrarily required to be flattened into some kind of intermediate
form, written to some file (or sent down some pipe), often with loss of
information, then read from the other end, interpreted, and
reconstituted into other data structures (with incomplete info), then
processed. In many cases, information that didn't make it through the
channel has to be reconstructed (often imperfectly), and then used. Most
of these steps are redundant. If the compiler data structures were
directly available in the first place, none of this baroque dance
would be necessary.
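
The loss of information described above can be shown with a toy
round trip. This is only a sketch under made-up assumptions: Symbol,
flatten and reconstitute are hypothetical names, and the "file format"
is an invented one that stores nothing but symbol names, one per line.

```d
import std.algorithm : map;
import std.array : array, join, split;

// Hypothetical front-end record for one symbol.
struct Symbol
{
    string name;
    string type;      // known to the compiler front end
    string definedIn; // module that defines the symbol
}

// Flatten to the invented text format: names only, one per line.
string flatten(Symbol[] syms)
{
    return syms.map!(s => s.name).join("\n");
}

// Read the text back; type and definedIn cannot be recovered and are
// left empty.
Symbol[] reconstitute(string text)
{
    return text.split("\n").map!(n => Symbol(n)).array;
}
```

After reconstitute(flatten(syms)), every Symbol still has its name but
an empty type and definedIn, whereas a consumer handed syms directly
would see all three fields.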

I couldn't agree more.

--
/Jacob Carlborg
