> Ordered bytecode
>
> Bytecode should be structured in such a way that reading and executing
> it can be parallelised.
>

Are you suggesting a threaded VM?  I know that the core is being rewritten,
so it's a possibility.  If this is the case, then you'll want to reference
some of the other RFC's relating to threading.

It's an interesting idea, and it reminds me a lot of IA64's chained lists of
non-dependent instructions.
The only way I can imagine this being useful in a non-optimized fashion is
if raw async-io is performed.  Namely: read a minimal chunk, figure out
where the next chunk is (if it's not already cached), initiate an async
read of the newly determined chunk from disk, and begin execution /
initialization of the current chunk until the end of the chunk is reached.
Inbound async-io (through interrupts) appends itself to the input queue.
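That read-ahead loop can be sketched as below (in Python, with a prefetch
thread standing in for true async-io; `execute`, the chunk layout, and the
reader are all made up for the example, not anything in the perl core):

```python
# Sketch of the read-ahead loop: overlap reading the next bytecode
# chunk with executing the current one.  A background thread simulates
# the async read; a real VM would use aio_read() or similar.
import threading
from queue import Queue

def execute(data):
    # Stand-in for the VM's interpret/initialize step.
    return len(data)

def run_pipelined(read_chunk, chunks):
    """read_chunk(offset, size) -> bytes; chunks is [(offset, size), ...]."""
    pending = Queue(maxsize=1)

    def prefetch(off, size):
        pending.put(read_chunk(off, size))

    # Kick off the read of the first chunk.
    threading.Thread(target=prefetch, args=chunks[0]).start()
    results = []
    for nxt in chunks[1:] + [None]:
        data = pending.get()                  # wait for the outstanding read
        if nxt is not None:                   # issue the next read at once...
            threading.Thread(target=prefetch, args=nxt).start()
        results.append(execute(data))         # ...and execute while it proceeds
    return results

blob = bytes(100)
reader = lambda off, size: blob[off:off + size]
print(run_pipelined(reader, [(0, 40), (40, 40), (80, 20)]))  # [40, 40, 20]
```

The single-slot queue is what gives the "one read in flight while one chunk
executes" behavior; no locking beyond the queue is needed.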

If that model could work (unfortunately, it doesn't work on all OSes), then
it would make maximal use of CPU time (especially on single-CPU machines),
without dealing with race conditions and the other such evils inherent in
MT programming.

In general, however, I don't see bytecode reading as being the real
bottleneck.  Take the following hypothetical charts:
Compile Time / Byte-code read Time / core execution time # comment
----------------------------------------------------------------
1 / 2 / x # faster to compile outright.  Need to reimplement byte-code so
it does less.  Parallel byte-code doesn't help any.

4 / 2 / .5 # we actually get a performance boost here from the byte-code,
so it's not as critical that we shave off .4 of the total time (assuming
that's even possible).

x / 2 / 20  # compile / loading time is irrelevant compared to the whole
execution time.

4 / 2 / 2    # here is the main candidate for your parallel loader.  We
could possibly interleave execution with byte-code loading, thus shaving
the total time from 4s down to perhaps 2.1 or 3.

One variable is whether the extreme loading time is due to excessive disk
size, or to excessive computation.  If it's due to disk size, then we have
more problems to deal with, and a complete redo of the byte-code format
should be done.  If it's due to computation (memory allocation, etc.), then
single-CPU implementations aren't going to gain anything.  In fact, the
additional context switching involved in MT on most systems is going to add
more overhead than you are likely to save in the operation.  The only real
winners are multi-CPU systems that are mostly idle or otherwise dedicated
to many start-and-stop operations (say, a parallel make that heavily
utilizes perl code).

My suggestion (which should probably become its own RFC) is that we not
store raw byte-code to a ".p[lm]c" file, or what-have-you, but instead
store an unambiguous, token-based version of the original English source.

Essentially, there'd be two front ends to perl: the English-eval and the
token-eval.  Everything else stays the same.  The reason for this is that
the source code seems to be an order of magnitude smaller than the compiled
code, and a great deal of compilation seems to go into mapping that file
into memory.

With the tokenized approach, the .pmc file would be significantly smaller
than the original .pm file, since there would be no whitespace, and all
operators / local function calls would be reduced to single-byte
operations.  Algorithms and optimizations could be added to the token file
as an expanded list of attributes that modify functions / variables (such
as whether something is an int, etc.).  The goal would be that the
code-analysis and optimization stages are removed from compilation.
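A toy illustration of the size win (the opcode table and the encoding here
are invented for the example; they are not Perl's actual token format):

```python
# Map each operator / keyword to a one-byte opcode and drop whitespace;
# literals and identifiers are stored NUL-terminated.  Purely illustrative.
OPCODES = {'my': 0x01, '=': 0x02, '+': 0x03, ';': 0x04, 'print': 0x05}

def tokenize(source):
    out = bytearray()
    for tok in source.split():
        if tok in OPCODES:
            out.append(OPCODES[tok])           # single-byte operation
        else:
            out += tok.encode() + b'\x00'      # literal / identifier
    return bytes(out)

src = "my $x = 1 + 2 ; print $x ;"
packed = tokenize(src)
print(len(src), len(packed))  # 26 16
```

Even on this tiny snippet the token stream is noticeably smaller; on real
source, with long keywords and heavy whitespace, the ratio should improve.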

I don't know what Pascal or Python use for object code, though I'm sure
it's effectively what perl's byte-code turns out to be (albeit
significantly smaller).  Java takes the approach of a true virtual machine,
and uses an assembly-type language accordingly.  Perl doesn't really seem
to be able to take that route, but we'll see.

It's an interesting topic, but we're not done with it yet.

-Michael
