Below is a POD document describing some IMHO worthwhile changes. I hope
I haven't missed any issues that could inhibit the implementation.

Comments welcome,
leo
=head1 1. Proposal for a new PMC layout and more

=head2 1.1. Current state - PMC size and structure

PMCs are using too much memory (5 words for an Integer PMC, 11 + n
words plus two indirections for an object with n attributes). The
earlier reduction from (IIRC) 9 words to the current 5 words almost
doubled execution speed once more than a trivial number of PMCs was
allocated.

OTOH PMCs are rather rigid structures. Only a fixed number of data
fields is available. If the data the PMC should hold won't fit,
custom malloced extensions (or Buffers) have to be attached to the
PMC, which take up more memory and add (at least) one more indirection
to reach that data.

=head2 1.2. Current state - STRING buffer headers

STRING buffer headers are kept in separate registers, have their own
opcodes, and PMCs have vtable variants dealing with STRINGs. This uses
up a lot of memory and resources. There are currently 480 opcodes
dealing with STRINGs and 46 vtable/MMD methods that take STRING*
arguments. Yet a lot of PMC opcodes and vtables dealing with strings
are still missing.

A STRING itself is already a rather fat structure (and it will still
need charset and encoding information or the like). Thus there isn't
really an advantage to having a distinct STRING type, especially since
all HLLs except Perl6 have no notion of such a type - they just have
objects, a.k.a. PMCs.

Further, STRINGs will need a vtable to deal with Unicode, or rather
with different access levels (e.g. length in bytes, codepoints, or
characters).

And finally: we currently have STRING* hash keys only, a scheme that
doesn't work for hashing arbitrary objects. In Python a hash key is
just an object that provides a I<hash> vtable method.

=head2 1.3. Current state - STRING memory

In a multi-threaded Parrot, string memory can move behind the
interpreter's back at any time due to the copying collection of
variable sized memory. E.g.:

  Thread 1                              Thread 2

  cursor = s->strstart
                                        collect string memory
                                        s->strstart moved
  end = s->strstart + s->bufused
  while (cursor < end)          // boom

Such code sequences are used all over the place in F<src/string.c>.
This would need either a lock even for read access or, better, a
non-copying collection scheme for string memory.
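
A minimal sketch of the locking variant (I<LOCK>/I<UNLOCK> and
I<string_gc_lock> are hypothetical names here, not existing Parrot
API):

  /* Take a lock so the copying collector in another thread cannot
   * move s->strstart between the two reads below. */
  LOCK(interp->string_gc_lock);
  {
      char *cursor = s->strstart;                /* consistent snapshot */
      char *end    = s->strstart + s->bufused;   /* same strstart value */
      while (cursor < end) {
          /* ... inspect *cursor ... */
          cursor++;
      }
  }
  UNLOCK(interp->string_gc_lock);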

The same problems currently arise with hashes and list-based arrays,
both of which use buffer headers and movable memory.

=head1 2. Proposed changes

=over 4

=item PMCs are variable sized.

We just allocate as much memory as is needed to accommodate the
object. This vastly reduces memory usage for small types, simplifies
more complex PMCs, and eliminates the additional indirections needed
to access the object's data.

=item STRINGs become PMCs

This will reduce opcode count by almost one third and simplify the
whole interpreter.

=item Buffers become PMCs

The final unification of Buffers and PMCs.

=back

=head2 2.1. The new PMC layout

A simple PMC consists of a vtable pointer and a data portion.
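
A rough C sketch of such layouts (illustrative declarations only, not
the actual Parrot headers; I<INTVAL> and I<FLOATVAL> are the usual
Parrot typedefs):

  /* Illustrative only.  Each PMC starts with its vtable pointer and
   * the type-specific data follows directly - no pmc_ext, no extra
   * indirection. */
  typedef struct Integer_PMC {
      const struct _vtable *vtable;    /* identifies the type        */
      INTVAL                iv;        /* the integer value (1 word) */
  } Integer_PMC;

  typedef struct Float_PMC {
      const struct _vtable *vtable;
      FLOATVAL              nv;        /* typically 2 words          */
  } Float_PMC;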

=head2 2.2 Example: Integer, Float, Object with 2 attributes, Ref

       +----------------+        +----------------+
  pmc->|  vtable        |   pmc->|  vtable        |
       +----------------+        +----------------+
       |  INTVAL        |        |  FLOATVAL      |
       +----------------+        |                |
                                 |                |
                                 +----------------+

       +----------------+        +----------------+
  pmc->|  vtable        |   pmc->|  vtable        |
       +----------------+        +----------------+
       |  attrib_count  |        |  pmc_pointer   |
       +----------------+        +----------------+
       |  attribute #1  |
       +----------------+
       |  attribute #2  |
       +----------------+

A String PMC is a similar structure containing all the data items of
the current STRING structure. So eliminating the STRING type doesn't
impose any overhead at all, except for the vtable access - but that
will be needed anyway.

=head2 2.3. Where are the flags?

Flags currently take one word per PMC. This is a lot of overhead. But
we basically only need two types of flags:

=over 4

=item What kind is that PMC

This information is just the vtable, or additional information (flags)
stored in the vtable. All PMCs of one kind share one vtable anyway, so
it's much cheaper to use the vtable for that information than to
provide one additional word per PMC. Some flags are also eliminated by
getting rid of the distinction between PMCs and Buffers.

=item Flags used during garbage collection

GC flags are garbage collector-specific. A stop-the-world allocator
can e.g. use the 2 low bits of the vtable pointer for the I<live> and
I<on_free_list> flags (see the sketch below). With I<ARENA_DOD_FLAGS>
a nibble in a separate memory region is used. An implicit reclamation
GC scheme has additional pointers to manage the graph of PMCs.

=back

Summary: the GC subsystem is responsible for managing GC flags.
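
As a sketch of the vtable-pointer tagging mentioned above (it assumes
the vtable pointer is at least 4-byte aligned; all macro names are
made up):

  #include <stdint.h>   /* uintptr_t */

  /* Sketch: hide the GC flags in the two low bits of the aligned
   * vtable pointer so no extra flags word is needed per PMC. */
  #define PMC_live_FLAG      ((uintptr_t)0x1)
  #define PMC_on_free_FLAG   ((uintptr_t)0x2)
  #define PMC_gc_MASK        (PMC_live_FLAG | PMC_on_free_FLAG)

  #define PMC_vtable(p)   ((VTABLE *)((uintptr_t)(p)->vtable & ~PMC_gc_MASK))
  #define PMC_is_live(p)  (((uintptr_t)(p)->vtable & PMC_live_FLAG) != 0)
  #define PMC_set_live(p) \
      ((p)->vtable = (VTABLE *)((uintptr_t)(p)->vtable | PMC_live_FLAG))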

=head2 2.4. And what about buffer headers?

Well, everything is an object, i.e. a PMC. STRINGs are PMCs. Buffers
are PMCs. A piece of malloced memory looks like this:

       +----------------+
  pmc->|  vtable        |
       +----------------+
       |  buffer_len    |
       +----------------+
       |  sysmem_ptr -> |
       +----------------+

The destroy vtable takes care of freeing the memory. A buffer with a
fixed length has just the size it needs, without any indirection, e.g.:

       +----------------+
  pmc->|  vtable        |
       +----------------+
       |  buffer_len    |
       +----------------+
       |  memory_ptr -> |---+
       +----------------+   |
       |  fixed amount  |<--+
       |  of data       |
       |  ...           |
       +----------------+

The I<memory_ptr> may or may not be there. But with it, both PMCs
could share parts of the vtable methods to access memory.
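
A C sketch of the fixed-length variant (assuming C99 flexible array
members; the names are illustrative):

  #include <stddef.h>   /* size_t */

  /* Sketch: the data is allocated in one piece with the header, so
   * no separate buffer header and no extra allocation are needed. */
  typedef struct FixedBuffer_PMC {
      const struct _vtable *vtable;
      size_t                buffer_len;   /* usable bytes in data[]     */
      char                 *memory_ptr;   /* optional, points at data[] */
      char                  data[];       /* the fixed amount of data   */
  } FixedBuffer_PMC;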

To eliminate the problem with multi-threaded copying collection,
memory chunks are organized like above - whichever layout fits. There
isn't really a difference from an array anyway. Moving all necessary
data into the PMC itself largely eliminates the need for such helper
buffer headers.

Thus the memory doesn't move around during garbage collection. Only
resizing can move memory, but that is well known: realloc(3) or any
other allocation scheme has to move memory if a resize request can't
be satisfied in place.

=head2 2.5. But PMCs don't move - how do we resize a PMC to a bigger type

The idea is to replace the PMC that should grow with a reference to
the newly allocated bigger type (if the resize can't be done in
place). That is, the PMC's vtable is changed to a I<Ref> vtable
pointing to the newly allocated PMC. The original PMC's data is moved
over to the new one.
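
A sketch of that upgrade step (all function, type, and vtable names
here are hypothetical, not existing Parrot API):

  #include <string.h>   /* memcpy */

  /* Sketch: allocate the bigger PMC, carry the data over, and leave a
   * Ref behind at the old location so existing pointers stay valid. */
  PMC *
  pmc_upgrade(Interp *interp, PMC *old, size_t old_size, size_t new_size)
  {
      PMC *grown = allocate_sized_pmc(interp, new_size);   /* hypothetical */
      memcpy(grown, old, old_size);          /* data (and old vtable) move */
      /* the caller then installs the new type's vtable in grown ...       */
      old->vtable = Parrot_Ref_vtable;       /* ... while the old slot     */
      ((Ref_PMC *)old)->referenced = grown;  /* becomes a forwarding Ref   */
      return grown;
  }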

During DOD, locations that hold such a reference are updated to point
directly to the new type. References in the system areas or external
references are not updated. Chained references (due to repeated size
upgrades) are resolved immediately.

That means that for some limited time (until the next DOD run) we have
one additional indirection to access upgraded PMCs. But the overall
overhead is still low: a Ref plus the Float PMC it points to add up to
only 5 words - the same as the current size of a single PMC.

Example of an Integer PMC upgraded to a Float:

       +--------------+        +------------+
  pmc->|  vtable      |   +--->|  vtable    |
       +--------------+   |    +------------+
       |  pmc_pointer |---+    |  FLOATVAL  |
       +--------------+        |            |
                               |            |
                               +------------+

This one additional indirection is less overhead than what we
currently have e.g. in the object structure:

       +--------------+        +------------+   +--------------+
  pmc->|  ...         |   +--->|  data      |-->|  attributes  |
       |  4 words     |   |    +------------+   +--------------+
       |  ...         |   |    |  ...       |   |  ...         |
       +--------------+   |    |            |   |              |
       |  pmc_ext     |---+    |  3 words   |   |              |
       +--------------+        +------------+   +--------------+


=head2 2.6. Morphing Undefs

Currently all binary (and other) opcodes need an existing destination
PMC. The normal sequence a compiler emits is something like this:

  $P0 = new Undef
  $P0 = a + b


This would need upgrading the Undef (2 words) for all PMCs but
integers. The current code to morph to the destination type is by far
the slowest part of such operations.

To eliminate that overhead we could mandate a sequence:

  null $P0
  $P0 = a + b           # create new destination of needed type

and change the signature of the MMD or vtable method to return the
destination PMC. If the passed-in destination is the I<Null> PMC, a
new PMC is created, else the existing destination PMC is returned.
This would add one I<if> statement to these methods, but it's by far
simpler and faster than the current code in I<morph> or I<pmc_reuse()>.
Specifically, the I<null dest_pmc> statement cuts down the live range
of the destination register, so that the IN/OUT behavior of the binary
opcode doesn't need any changes. It still operates on existing PMCs,
if the destination isn't the I<Null> PMC:

  $P0 = global "foo"
  $P0 = a + b           # update existing, possibly morph dest
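
A C sketch of a binary method under the proposed convention (the
signature is the proposed one; helper names such as I<pmc_new> and the
I<VTABLE_*> accessors are assumed here, following the usual Parrot
style):

  /* Sketch: the method returns the destination PMC, creating one of
   * the needed type only if the Null PMC was passed in. */
  PMC *
  Integer_add(Interp *interp, PMC *self, PMC *value, PMC *dest)
  {
      const INTVAL result = VTABLE_get_integer(interp, self)
                          + VTABLE_get_integer(interp, value);

      if (PMC_IS_NULL(dest))                   /* "null $P0" was emitted */
          dest = pmc_new(interp, enum_class_Integer);

      VTABLE_set_integer_native(interp, dest, result);
      return dest;
  }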

=head2 2.7. Properties, Shared PMCs

The B<general layout of an extended PMC> looks like this:

       +~~~~~~~~~~~~~~~~+
       (  GC specific   )      optional
       +----------------+
       |  sync_lock  -> |      optional
       +----------------+
       |  property   -> |      optional
       +----------------+
  pmc->|  vtable        |
       +----------------+
       |  pmc data      |
       |  ...           |

Attaching properties to a PMC (or converting an existing PMC into a
shared PMC) allocates the needed pointer space in front of the PMC (as
described in 2.5.). Thus access to the PMC's data stays the same: data
elements have the same locations with or without the presence of
properties or lock structures.
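
A sketch of how the prepended pointers could be reached without
changing the normal data layout (struct and macro names are made up):

  /* Sketch: the optional pointers sit immediately *before* the PMC,
   * so p->vtable and the data keep their usual offsets. */
  typedef struct PMC_prefix {
      void *sync_lock;     /* lock for shared PMCs, or NULL */
      PMC  *properties;    /* property hash, or NULL        */
  } PMC_prefix;

  #define PMC_prefix_of(p)   (((PMC_prefix *)(void *)(p)) - 1)
  #define PMC_properties(p)  (PMC_prefix_of(p)->properties)
  #define PMC_sync_lock(p)   (PMC_prefix_of(p)->sync_lock)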

As access to properties is done via the vtable, there is no chance of
accessing non-existing memory (or memory belonging to the previous
PMC) in front of the PMC. If it isn't an extended PMC, the vtable
catches this.

It's of course more efficient to allocate an extended PMC in the first
place, if that information is available at compile time. E.g. an
I<init_pmc_props> opcode or something similar can take care of this.

=head2 2.8. next_for_GC pointer

Again, this is GC specific. The GC subsystem is responsible for
creating the graph of live objects. We pass a flag
I<VTABLE_PMC_NEEDS_EXT> to the allocator so that the needed space can
be reserved in the object's memory pool, if needed by that GC
subsystem.

=head2 2.9. Value-like special PMCs

The PMCs I<None>, I<Null>, I<True>, I<False>, and possibly more are
immutable singleton values, only one word in size (the vtable), and
aren't upgradable to other types.
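
A sketch of such a singleton (all names are illustrative):

  /* Sketch: a value-like singleton is nothing but its vtable pointer,
   * so one statically allocated, constant instance per type suffices.
   * Its set_* vtable entries would just throw an exception, so it can
   * never be morphed or upgraded. */
  typedef struct ValuePMC { const struct _vtable *vtable; } ValuePMC;

  extern const struct _vtable True_vtable, False_vtable;    /* assumed */

  static const ValuePMC True_singleton  = { &True_vtable  };
  static const ValuePMC False_singleton = { &False_vtable };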

=head1 3. Implementation notes

=head2 3.1. Object allocator

We currently already have differently sized buffer-like headers, with
memory pools per object size. That scheme needs only a slight
extension to be usable for arbitrarily sized objects. For "small" PMC
sizes we use the exact size. For "bigger" PMCs, size classes are
formed by rounding up to e.g. a power of 2. Object sizes beyond a
certain limit (i.e. bigger than what the memory pools handle) are
allocated directly from the operating system.
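
A sketch of such a size-class policy (all limits here are invented):

  #include <stddef.h>   /* size_t */

  /* Sketch: exact-size pools for small PMCs, power-of-two classes for
   * medium ones, and direct OS allocation beyond the pool limit. */
  static size_t
  pmc_size_class(size_t size)
  {
      const size_t small_limit = 8 * sizeof(void *);
      const size_t pool_limit  = 4096;

      if (size <= small_limit)
          return size;                   /* exact-size pool            */
      if (size <= pool_limit) {
          size_t cls = small_limit;
          while (cls < size)
              cls <<= 1;                 /* round up to a power of two */
          return cls;
      }
      return size;                       /* allocate directly from OS  */
  }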

=head2 3.2. Steps of implementation

Section 2.6. is fully independent of the other parts of this proposal
and can/should be implemented anyway for performance reasons.

Eliminating Buffers and STRINGs can also be done gradually. Once all
STRING access is converted to PMCs, some bigger changes are of course
necessary to put the new scheme in place. But as PMC data access is
already hidden behind macros, the needed changes aren't really huge.

=head1 4. Summary

By using variable-sized objects (PMCs) and eliminating the distinction
between PMCs and Buffers, we can simplify the interpreter and gain a
significant increase in execution speed due to fewer indirections and
a much smaller memory footprint.

=head1 Author

Leopold Toetsch E<lt>[EMAIL PROTECTED]E<gt>.

=head1 Version

1.0 - 01.09.2004
