G'day all.
I have a suggestion for allowing parrot implementations to execute
code more efficiently. Add an "instruction" or other annotation which
denotes what registers are "live" at some point in the code. The
intended usage is that compilers which wish to generate this
information generate such an annotation at the end of every basic block
stating which registers hold data that may be reused in a subsequent
basic block.
Motivation
Many compiler optimisations require "data flow" information. However
I'm not talking about "high level" optimisation such as those which
restructure the control flow graph. I'm talking about the kind of
information that you need to make abstract machine code run on stock
hardware (e.g. instruction scheduling, register allocation etc).
Such information is "global", that is, it requires examination of
the whole compilation unit ("subroutine", "function", whatever) to
determine. However, optimising _implementations_ (as opposed to
optimising _compilers_) generally don't need to perform global
optimisations, only local ones (i.e. optimisations within a basic
block).
Many compilers generate this information during the normal process
of compilation, but subsequently throw it away when the output
listing is generated. This seems wasteful if consumers (e.g. JIT
compilers) could effectively use that information.
Why this information and not something else?
- It's simple to recover simple information. Most information that
a typical JIT compiler needs (e.g. register lifetimes for register
allocation, gen/kill sets for instruction scheduling) is easily
and quickly recoverable from only the live value information.
- It's straightforward to recover complex information. If you need,
say, static single assignment form or global def-use chains, you
can compute them using only the live value information and the
control flow graph. You still don't need expensive global
analysis.
- It's safe. While dataflow information is expensive to compute,
it's very cheap (one pass over the compilation unit) to check that
you have a "correct" solution, where "correct" here means
"conservative". If you need to know if the information is too
conservative, you can do that in only one more pass. So it's easy
to tell if the information can be trusted or not.
- It's simple. You don't need to store more information than a
simple optimising backend needs. You also don't need to work out
how to store complex data structures in the bytecode.
- It's compact. Parrot has 32 registers of each type. If you
represent the live value sets as bit vectors, you need only
one 32-bit word for each type plus opcode per basic block.
And, of course, supplying the information is optional. :-)
Note that the argument for live value information is weakened in JIT
compilers for the JVM because the JVM is a stack machine. Assuming
that all values on the stack are live is actually a pretty good (if
conservative) approximation to live value information given the way
that stack code is typically compiled.
What do you all think? Leon mentioned control flow graphs a few days
ago, but I think that live value information is more generally useful
to optimising interpreters and JIT compilers.
Cheers,
Andrew Bromage