G'day all.

I have a suggestion for allowing parrot implementations to execute
code more efficiently.  Add an "instruction" or other annotation which
denotes what registers are "live" at some point in the code.  The
intended usage is that compilers which wish to generate this
information generate such an annotation at the end of every basic block
stating which registers hold data that may be reused in a subsequent
basic block.

Motivation

Many compiler optimisations require "data flow" information.  However
I'm not talking about "high level" optimisation such as those which
restructure the control flow graph.  I'm talking about the kind of
information that you need to make abstract machine code run on stock
hardware (e.g. instruction scheduling, register allocation etc).

Such information is "global", that is, it requires examination of
the whole compilation unit ("subroutine", "function", whatever) to
determine.  However, optimising _implementations_ (as opposed to
optimising _compilers_) generally don't need to perform global
optimisations, only local ones (i.e. optimisations within a basic
block).

Many compilers generate this information during the normal process
of compilation, but subsequently throw it away when the output
listing is generated.  This seems wasteful if consumers (e.g. JIT
compilers) could effectively use that information.

Why this information and not something else?

   - It's simple to recover simple information.  Most information that
     a typical JIT compiler needs (e.g. register lifetimes for register
     allocation, gen/kill sets for instruction scheduling) is easily
     and quickly recoverable from only the live value information.

   - It's straightforward to recover complex information.  If you need,
     say, static single assignment form or global def-use chains, you
     can compute them using only the live value information and the
     control flow graph.  You still don't need expensive global
     analysis.

   - It's safe.  While dataflow information is expensive to compute,
     it's very cheap (one pass over the compilation unit) to check that
     you have a "correct" solution, where "correct" here means
     "conservative".  If you need to know if the information is too
     conservative, you can do that in only one more pass.  So it's easy
     to tell if the information can be trusted or not.

   - It's simple.  You don't need to store more information than a
     simple optimising backend needs.  You also don't need to work out
     how to store complex data structures in the bytecode.

   - It's compact.  Parrot has 32 registers of each type.  If you
     represent the live value sets as bit vectors, you need only
     one 32-bit word for each type plus opcode per basic block.
     And, of course, supplying the information is optional. :-)

Note that the argument for live value information is weakened in JIT
compilers for the JVM because the JVM is a stack machine.  Assuming
that all values on the stack are live is actually a pretty good (if
conservative) approximation to live value information given the way
that stack code is typically compiled.

What do you all think?  Leon mentioned control flow graphs a few days
ago, but I think that live value information is more generally useful
to optimising interpreters and JIT compilers.

Cheers,
Andrew Bromage

Reply via email to