A quick sketch of the interpreter

Dan Sugalski Thu, 14 Jun 2001 14:34:02 -0700
Okay, here's a quick sketch of what I'm thinking of for the core
architecture of the interpreter. It's not PDD'd yet, as I fully expect
(hope, even) that the sillier parts of it will get ripped to shreds:

=head1 Stacks

The interpreter has multiple stacks, and they're all segmented. The
push/pop opcodes handle allocating new stack segments when
necessary. The segmentation's generally transparent, since very little
besides the push/pop opcodes needs to know anything about the actual
stack architecture.
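
To make the segmentation concrete, here is a minimal sketch of a segmented stack whose push and pop routines do all the segment bookkeeping. The names (C<StackSeg>, C<SegStack>, C<seg_push>, C<seg_pop>, C<seg_count>), the C<long> slot type, and the tiny segment size are assumptions made for illustration; none of them come from the design above.

```c
#include <stdlib.h>

#define SEG_SLOTS 4   /* hypothetical; tiny so segment growth is easy to see */

typedef struct StackSeg {
    struct StackSeg *prev;        /* older segment, or NULL at the base */
    size_t           used;        /* occupied slots in this segment */
    long             slots[SEG_SLOTS];
} StackSeg;

typedef struct {
    StackSeg *top;                /* newest segment, NULL when the stack is empty */
} SegStack;

/* Push a value, chaining on a fresh segment when the top one is full.
   Returns 0 on success, -1 if allocation fails. */
int seg_push(SegStack *s, long v) {
    if (s->top == NULL || s->top->used == SEG_SLOTS) {
        StackSeg *seg = malloc(sizeof *seg);
        if (seg == NULL) return -1;
        seg->prev = s->top;
        seg->used = 0;
        s->top = seg;
    }
    s->top->slots[s->top->used++] = v;
    return 0;
}

/* Pop into *out, releasing a segment as soon as it is drained.
   Returns 0 on success, -1 if the stack is empty. */
int seg_pop(SegStack *s, long *out) {
    if (s->top == NULL) return -1;
    *out = s->top->slots[--s->top->used];
    if (s->top->used == 0) {
        StackSeg *dead = s->top;
        s->top = dead->prev;
        free(dead);
    }
    return 0;
}

/* Count the segments currently chained together. */
size_t seg_count(const SegStack *s) {
    size_t n = 0;
    for (const StackSeg *p = s->top; p != NULL; p = p->prev) n++;
    return n;
}
```

Callers only ever see C<seg_push> and C<seg_pop>, which is the sense in which the segmentation stays transparent. Freeing a drained segment immediately is just one choice; keeping a spare segment around would avoid churn at a segment boundary.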

The stacks are at least:

=over 4

=item Temp stack

For squirreling away the contents of individual registers.

=item Register stack

For pushing the entire register file at once. There are four sets, one
for each register type.

=item State stack

For the interpreter's internal state.

=back

=head1 Registers

We have four sets. Each set has 64 members:

=over 4

=item PMC pointer

These registers point to PMCs.

=item stringish pointer

These registers hold pointers to string structures and things like
them. (Bigint and bigfloat structs are the same, more or less.)

=item integers

Integers. Mostly for temp work and the regex engine ops.

=item floats

Floats. (Duh! :) Mostly for temp math work. Potentially unused.

=back
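
As a rough picture of the register file, and of what pushing an entire set at once might look like, here is a sketch in C. The type and field names (C<RegFile>, C<IntFrame>, C<push_int_regs>, C<pop_int_regs>), the opaque C<PMC> and C<StrBuf> types, and the use of C<long> and C<double> for the integer and float registers are all assumptions for illustration. Only the integer set is shown; the other three would follow the same pattern.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 64               /* each set has 64 members */

typedef struct PMC    PMC;        /* opaque; layout not specified here */
typedef struct StrBuf StrBuf;     /* hypothetical stand-in for string-ish structs */

typedef struct {
    PMC    *pmc[NUM_REGS];        /* PMC pointer registers */
    StrBuf *str[NUM_REGS];        /* stringish pointer registers */
    long    intreg[NUM_REGS];     /* integer registers */
    double  num[NUM_REGS];        /* float registers */
} RegFile;

/* One saved copy of the whole integer set. */
typedef struct IntFrame {
    struct IntFrame *prev;
    long             saved[NUM_REGS];
} IntFrame;

/* Save all 64 integer registers in one go.
   Returns 0 on success, -1 if allocation fails. */
int push_int_regs(IntFrame **stack, const RegFile *r) {
    IntFrame *f = malloc(sizeof *f);
    if (f == NULL) return -1;
    memcpy(f->saved, r->intreg, sizeof f->saved);
    f->prev = *stack;
    *stack = f;
    return 0;
}

/* Restore the most recently saved integer set.
   Returns 0 on success, -1 if nothing was saved. */
int pop_int_regs(IntFrame **stack, RegFile *r) {
    IntFrame *f = *stack;
    if (f == NULL) return -1;
    memcpy(r->intreg, f->saved, sizeof f->saved);
    *stack = f->prev;
    free(f);
    return 0;
}
```

A plain linked list of frames keeps the sketch short; a real register stack would presumably be segmented like the other stacks.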

=head1 Opcodes

Opcodes are all dispatched indirectly via an opcode function
table. Each segment of bytecode (a segment roughly corresponding to a
compilation unit--a precompiled module would be in its own segment,
for example) has its own opcode function table.

Opcodes are all responsible for returning a pointer to the next opcode
to execute. The opcode loop could figure out the offset itself, but it
won't--it's faster for the opcode functions to do the math, since each
one already knows how many operands it takes. (No table lookups that
way.)
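
A small sketch of that dispatch scheme, under some loud assumptions: C<opcode_t>, C<Interp>, C<Segment>, the three toy opcodes, and the convention that the end opcode returns NULL are all invented for illustration and are not part of the design above.

```c
#include <stddef.h>

typedef long opcode_t;            /* hypothetical width of one bytecode cell */

typedef struct Interp {
    long ireg[64];                /* just the integer registers, for brevity */
} Interp;

/* Every opcode function returns the address of the next opcode to run. */
typedef opcode_t *(*op_func)(opcode_t *pc, Interp *in);

enum { OP_END, OP_SET, OP_ADD, OP_COUNT };   /* toy opcode numbers */

/* end: no operands; NULL is this sketch's "stop" signal */
static opcode_t *op_end(opcode_t *pc, Interp *in) {
    (void)pc; (void)in;
    return NULL;
}

/* set Ix, constant: two operands, so the next opcode is 3 cells on */
static opcode_t *op_set(opcode_t *pc, Interp *in) {
    in->ireg[pc[1]] = pc[2];
    return pc + 3;
}

/* add Ix, Iy, Iz: three operands, so the next opcode is 4 cells on */
static opcode_t *op_add(opcode_t *pc, Interp *in) {
    in->ireg[pc[1]] = in->ireg[pc[2]] + in->ireg[pc[3]];
    return pc + 4;
}

/* Each bytecode segment carries its own function table. */
typedef struct {
    opcode_t      *code;
    const op_func *table;
} Segment;

static const op_func default_table[OP_COUNT] = {
    [OP_END] = op_end,
    [OP_SET] = op_set,
    [OP_ADD] = op_add,
};

/* Dispatch one opcode indirectly through the segment's table.
   No bounds checks on opcode or register numbers in this sketch. */
opcode_t *dispatch_one(const Segment *seg, opcode_t *pc, Interp *in) {
    return seg->table[*pc](pc, in);
}
```

Because each function hard-codes its own length (C<pc + 3>, C<pc + 4>), the caller never needs an operand-count table to find the next opcode.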

=head1 The opcode loop

This is a tight loop. All it does is call an opcode function, get back
a pointer to the next opcode to execute, and check the event dispatch
flag. Lather, rinse, repeat ad infinitum.
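
Roughly, the loop could look like the sketch below. Everything named here is an assumption for illustration: the C<Interp> fields, the toy opcodes, C<dispatch_events>, and in particular the NULL return used to stop the loop, which exists only so the example terminates.

```c
#include <stddef.h>

typedef long opcode_t;            /* hypothetical width of one bytecode cell */

typedef struct Interp {
    long ireg[64];
    int  event_pending;           /* stands in for the event dispatch flag */
    int  events_handled;          /* bookkeeping for this sketch only */
} Interp;

typedef opcode_t *(*op_func)(opcode_t *pc, Interp *in);

enum { OP_END, OP_INC, OP_POST, OP_COUNT };  /* toy opcode numbers */

/* end: stop the loop (an assumption of this sketch) */
static opcode_t *op_end(opcode_t *pc, Interp *in) {
    (void)pc; (void)in;
    return NULL;
}

/* inc Ix: one operand */
static opcode_t *op_inc(opcode_t *pc, Interp *in) {
    in->ireg[pc[1]]++;
    return pc + 2;
}

/* post: pretend an event arrived while this opcode ran */
static opcode_t *op_post(opcode_t *pc, Interp *in) {
    in->event_pending = 1;
    return pc + 1;
}

static const op_func table[OP_COUNT] = {
    [OP_END]  = op_end,
    [OP_INC]  = op_inc,
    [OP_POST] = op_post,
};

/* Placeholder for whatever real event handling would do. */
static void dispatch_events(Interp *in) {
    in->event_pending = 0;
    in->events_handled++;
}

/* The loop itself: call the opcode function, take the pointer it
   returns, check the event flag, repeat. */
void run(opcode_t *pc, const op_func *ops, Interp *in) {
    while (pc != NULL) {
        pc = ops[*pc](pc, in);
        if (in->event_pending)
            dispatch_events(in);
    }
}
```

The only per-opcode overhead in C<run> is one indirect call and one flag test, which is what keeps the loop tight.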

=head1 Bytecode

Bytecode is both the on-disk representation of a Perl program and the
in-memory representation of a Perl program.

The bytecode comes in three sections. The fixup and constants sections
have absolute machine addresses in them (after the loader is finished
with them) while the opcode section has none. This will allow us to
mmap precompiled code and share at least some of it amongst multiple
processes.

=over 4

=item fixup section

This section has pointers to various things that we need pointers
to. On-disk the pointers are zeroed, and the loader will fix them up
properly.

=item constant section

The constants section contains all the PMCs for the constants used in
the code. The loader will patch up the various pointer bits as needed
when the code is loaded in.

=item opcode section

This section contains the actual executable code (if stuff fed to an
interpreter can be considered executable) for a Perl program. It
should be completely position independent, referring only to variables
that are dynamically allocated, referred to by name, or in the
constant section.

=back
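
One way to picture the split between the sections is the loader sketch below. It assumes, purely for illustration, that each fixup slot comes with a name telling the loader what it should point to; the sketch above doesn't say how the loader finds that out. C<Bytecode>, C<resolver>, C<load_fixups>, C<demo_resolve>, and the two demo variables are likewise hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t       n_fixups;
    const char **fixup_names;     /* hypothetical: what each slot should point to */
    void       **fixups;          /* zeroed on disk, patched at load time */
    const long  *opcodes;         /* holds slot indices only, never addresses */
} Bytecode;

/* Maps a name to an address in this process, or NULL if unknown. */
typedef void *(*resolver)(const char *name);

/* Patch every fixup slot with a real address.
   Returns the number of slots that could not be resolved. */
size_t load_fixups(Bytecode *bc, resolver resolve) {
    size_t unresolved = 0;
    for (size_t i = 0; i < bc->n_fixups; i++) {
        bc->fixups[i] = resolve(bc->fixup_names[i]);
        if (bc->fixups[i] == NULL)
            unresolved++;
    }
    return unresolved;
}

/* Demo resolver: two made-up variables living in this process. */
static long demo_counter = 5;
static long demo_limit   = 9;

void *demo_resolve(const char *name) {
    if (strcmp(name, "counter") == 0) return &demo_counter;
    if (strcmp(name, "limit") == 0)   return &demo_limit;
    return NULL;
}
```

Since the opcode stream only ever stores indices into C<fixups>, it is byte-for-byte identical in every process that loads it; only the fixup array (and, by the same logic, the constants) has to be private to each process.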


                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk