Object freezing

Dan Sugalski Mon, 20 Oct 2003 13:35:41 -0700

Since this has come up again, and it's apparent that I wasn't sufficiently
clear the last time around, it's time to go through this again, and for
the final time. (I will beat this thing into the ground by the time we're
done.)



The way object serialization will be handled from the bytecode level is
simple: You call freeze or thaw, like so:

  freeze S3, P5 # Freezes the P5 PMC, and children, to the string in S3
  thaw P5, S3   # Thaw out the serialized object


From within the PMC vtables, there are two vtable entries that are of
importance:

  void freeze(interpreter, STRING *)
  PMC *thaw(interpreter, STRING *)

freeze tells the PMC to freeze itself and put the results of the freezing
in the string. This is an object method.

Thaw tells the class to construct a PMC based on the passed-in string
data. It's a class method, though it may be called on an existing PMC. It
always produces a new PMC. (We can argue over that one if need be)


The runtime will provide the following functionality:

  freezing core singleton data (int, num, string, PMC)
  freezing named lists
  freezing named key/value lists
  thawing core singleton data
  thawing named lists
  thawing named key/value lists

  chill PMC *
  warm STRING *

When freezing or thawing, if child PMCs are frozen/thawed using the
appropriate calls to the runtime (rather than the freeze/thaw method of a
PMC doing it manually for child PMCs) then the runtime will guarantee that
multiply-referenced PMCs will be instantiated only once.
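
A minimal sketch of how a runtime could make that guarantee, for the
ordinary (non-destruction) case only. Everything here is an assumption for
illustration: the Node type stands in for a PMC, and the token format
("N<value>" for a first visit, "R<id>" for a back-reference, "0" for a
null child) is invented. The seen table is extra memory, so this is not
something the destruction-level rules below would permit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OBJS 64

/* Hypothetical stand-in for a PMC: one integer and up to two children. */
typedef struct Node {
    int value;
    struct Node *kids[2];
} Node;

/* Per-freeze (or per-thaw) bookkeeping owned by the "runtime". */
typedef struct {
    const Node *seen[MAX_OBJS];  /* freeze side: nodes already written */
    Node *made[MAX_OBJS];        /* thaw side: nodes already built */
    int count;
} Ctx;

/* Appends a token to the output string, truncating if the buffer is full. */
static void emit(char *out, size_t cap, const char *tok) {
    strncat(out, tok, cap - strlen(out) - 1);
}

/* Writes "N<value> " on a first visit, "R<id> " on a repeat, "0 " for NULL. */
void freeze_node(Ctx *ctx, const Node *n, char *out, size_t cap) {
    char tok[32];
    if (n == NULL) { emit(out, cap, "0 "); return; }
    for (int i = 0; i < ctx->count; i++) {
        if (ctx->seen[i] == n) {          /* already frozen: emit a reference */
            snprintf(tok, sizeof tok, "R%d ", i);
            emit(out, cap, tok);
            return;
        }
    }
    assert(ctx->count < MAX_OBJS);
    ctx->seen[ctx->count++] = n;          /* ids are assigned in visit order */
    snprintf(tok, sizeof tok, "N%d ", n->value);
    emit(out, cap, tok);
    freeze_node(ctx, n->kids[0], out, cap);
    freeze_node(ctx, n->kids[1], out, cap);
}

/* Rebuilds the graph from well-formed input; a back-reference returns the
   node built earlier instead of allocating a second copy. */
Node *thaw_node(Ctx *ctx, const char **cur) {
    char kind = **cur;
    char *end;
    long num;
    Node *n;
    if (kind == '0') { *cur += 2; return NULL; }
    num = strtol(*cur + 1, &end, 10);
    *cur = end + 1;                       /* skip the trailing space */
    if (kind == 'R') return ctx->made[num];
    n = calloc(1, sizeof *n);
    assert(n != NULL && ctx->count < MAX_OBJS);
    n->value = (int)num;
    ctx->made[ctx->count++] = n;          /* register first so cycles resolve */
    n->kids[0] = thaw_node(ctx, cur);
    n->kids[1] = thaw_node(ctx, cur);
    return n;
}
```

Freezing a root (value 1) whose two child slots both point at the same
node (value 7) produces "N1 N7 0 0 R1 ", and thawing that string yields a
root whose two child pointers are identical.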

The chill and warm runtime methods take a PMC or a frozen representation
of a PMC (respectively) and provide a human-readable version of that PMC.
If the PMC has a __debug_chill or __debug_warm method (both are optional)
then that method will be called to provide the human-readable version;
otherwise the default will be used.

The encoding methods for freezing (and corresponding decoding methods for
thawing) may be overridden to provide an alternate serialization format.
The only requirement of the serialization format is that it starts with a
minimally valid piece of XML that encodes the format and version of the
serialized format. The rest of the serialization format need not be XML.
This is done because the format and version of the serialized data are
required in the stream, and making it XML inconveniences nobody and makes
the XML folks happy. It's good enough, and not up for discussion.
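
To make the header requirement concrete, here is a minimal sketch of
writing and reading such a prefix. The element name "frozen" and the
attribute names "format" and "version" are assumptions; the post does not
specify them. The parser is a plain sscanf match against that one layout,
not a real XML parser.

```c
#include <stdio.h>
#include <string.h>

/* Writes a hypothetical one-element XML header. Returns its length,
   or -1 if it does not fit in the buffer. */
int write_header(char *out, size_t cap, const char *format, int version) {
    int n = snprintf(out, cap, "<frozen format=\"%s\" version=\"%d\"/>",
                     format, version);
    return (n >= 0 && (size_t)n < cap) ? n : -1;
}

/* Parses the header at the start of the stream. Returns a pointer to the
   first payload byte after it, or NULL if the header is missing or cut off. */
const char *read_header(const char *stream, char format[16], int *version) {
    int used = -1;  /* %n only fires if the whole header matched */
    int got = sscanf(stream, "<frozen format=\"%15[^\"]\" version=\"%d\"/>%n",
                     format, version, &used);
    if (got != 2 || used < 0) return NULL;
    return stream + used;
}
```

Whatever follows the returned pointer is free to be binary or any other
non-XML encoding; the reader only needs the format and version to pick a
decoder.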

The overriding API is not, as yet, specified. We can do that when we're
done fighting it out over the semantics here.


The following rules will be in place:

  1) Freezing at the destruction level may *not* use any additional memory
     for object traversal
  2) Overlapping freezes are not allowed
  3) Freezing while a freeze is in progress is deferred until the current
     freeze is done
  4) Destruction level freezing may not freeze non-dead PMCs

I can be convinced that #4 is not going to fly, but be aware that it will
potentially cost an extra pointer per PMC.
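
Rules 2 and 3 can be sketched as a small state machine: a request made
while a freeze is running is queued and only started after the current one
completes. The Freezer type, the integer object ids, and the fixed-size
queue are all hypothetical, and do_freeze is a toy body in which freezing
object 1 asks for object 2 to be frozen.

```c
#include <stddef.h>

#define MAX_PENDING 8
#define MAX_LOG 16

typedef struct Freezer {
    int in_progress;            /* nonzero while any freeze is running */
    int pending[MAX_PENDING];   /* ids whose freeze was deferred */
    int npending;
    int log[MAX_LOG];           /* +id at start of a freeze, -id at its end */
    int nlog;
} Freezer;

int request_freeze(Freezer *f, int id);

static void record(Freezer *f, int entry) {
    if (f->nlog < MAX_LOG) f->log[f->nlog++] = entry;
}

/* Toy freeze body: freezing object 1 triggers a nested request for 2. */
static void do_freeze(Freezer *f, int id) {
    record(f, id);
    if (id == 1) request_freeze(f, 2);
    record(f, -id);
}

/* Starts the freeze now, or defers it if one is already running.
   Returns 0 on success, -1 if the deferral queue is full. */
int request_freeze(Freezer *f, int id) {
    if (f->in_progress) {
        if (f->npending >= MAX_PENDING) return -1;
        f->pending[f->npending++] = id;
        return 0;
    }
    f->in_progress = 1;
    do_freeze(f, id);
    /* Drain deferred requests; ones queued while draining are picked up
       too, because npending is re-read on every iteration. */
    for (int i = 0; i < f->npending; i++) do_freeze(f, f->pending[i]);
    f->npending = 0;
    f->in_progress = 0;
    return 0;
}
```

With this toy body, requesting a freeze of object 1 logs 1, -1, 2, -2:
the nested request never overlaps the freeze that triggered it.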

Requirement #1 generally mandates that all PMCs must have sufficient
information available all the time to perform serialization. It also
mandates that there can't be iterators, save hashes, or other whatnots,
since the potential for destroy-time serialization means those methods are
untenable. I'm open to argument that the freeze and thaw methods can be
context sensitive and the non-destroy case can be memory hungry and
multithreaded, but that's a dodgy thing. Make a really good case.
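
One way to traverse without allocating anything is to thread the worklist
through the objects themselves, at the cost of one pointer per object. The
sketch below assumes a hypothetical next_for_freeze field; the Obj type
and all names are invented for illustration and are not Parrot's actual
structures.

```c
#include <stddef.h>

typedef struct Obj {
    int value;
    struct Obj *kids[2];
    struct Obj *next_for_freeze;  /* hypothetical link; NULL = not queued */
} Obj;

int visited_sum;  /* filled in by the example visitor below */

/* Example visitor: adds up the values it sees. */
void add_value(Obj *o) { visited_sum += o->value; }

/* Visits every object reachable from root exactly once, cycles included,
   using only the per-object link. Returns the number of objects visited. */
int traverse(Obj *root, void (*visit)(Obj *)) {
    Obj *tail = root;
    int count = 0;
    if (root == NULL) return 0;
    root->next_for_freeze = root;        /* a self-link marks the list tail */
    for (Obj *cur = root; ; cur = cur->next_for_freeze) {
        visit(cur);
        count++;
        for (int i = 0; i < 2; i++) {
            Obj *k = cur->kids[i];
            if (k != NULL && k->next_for_freeze == NULL) {
                k->next_for_freeze = k;  /* new tail links to itself */
                tail->next_for_freeze = k;
                tail = k;
            }
        }
        if (cur->next_for_freeze == cur) break;  /* tail reached, list done */
    }
    return count;
}

/* Clears the links so a later traversal starts clean. */
void reset_links(Obj *root) {
    Obj *cur = root;
    while (cur != NULL) {
        Obj *next = cur->next_for_freeze;
        cur->next_for_freeze = NULL;
        cur = (next == cur) ? NULL : next;
    }
}
```

The visit order is breadth-first, and the only state outside the objects
is a couple of local pointers, which is the property rule #1 asks for.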

Note that I do *not* want to have multiple object traversal systems in
parrot! We have one for DOD, and proposals have ranged upwards from there.
No. That is *not* happening--the chance for error is significant, the
side-effects of the error are annoying and tough to track down for complex
cases (akin to the trouble with tracking down GC issues), and a second
system is just not necessary. (Perhaps desirable for speed/space reasons,
but desirable isn't necessary.) This is something that's hidden under a
number of layers of API, so regardless of the outcome it doesn't affect
the assembly, PMC, or runtime API.

Clone can be handled by switching in encodings (one that produces a PMC
rather than an encoded stream, or one that just does an intermediate
string and chews memory, at least for now), and doesn't require a separate
API entry point. I'm unconvinced that objects need to clone themselves
often enough to bother with a separate API entry for 'em, or if one is
provided that VTABLE_clone() should do anything for non-simple objects
besides calling VTABLE_thaw(VTABLE_freeze()).
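
The "intermediate string" flavour of clone can be sketched as follows.
The Thing type and its "id:name" frozen form are hypothetical stand-ins
for a PMC and its serialized representation; only the shape of
thing_clone, a thaw of a freeze, reflects the idea in the paragraph above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Thing {
    int id;
    char name[16];
} Thing;

/* Object method: writes "id:name" into out. Returns 0, or -1 if it
   does not fit. */
int thing_freeze(const Thing *t, char *out, size_t cap) {
    int n = snprintf(out, cap, "%d:%s", t->id, t->name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}

/* Class method: always builds a new Thing from the frozen string.
   The caller owns the result and must free() it. */
Thing *thing_thaw(const char *frozen) {
    Thing *t = calloc(1, sizeof *t);
    if (t == NULL) return NULL;
    if (sscanf(frozen, "%d:%15[^\n]", &t->id, t->name) < 1) {
        free(t);
        return NULL;
    }
    return t;
}

/* Clone needs no entry point of its own: it is thaw(freeze(t)). */
Thing *thing_clone(const Thing *t) {
    char buf[64];  /* the intermediate string that "chews memory" */
    if (thing_freeze(t, buf, sizeof buf) != 0) return NULL;
    return thing_thaw(buf);
}
```

An encoding that builds the new object directly would skip the buffer, but
the calling code would look the same.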


Thread-safety is an issue, and as such interpreters that share data
should share a mutex. At the moment I think we should have separate
symbolic mutexes that may, potentially, resolve down to a single mutex,
but when freezing, an interpreter should grab its freeze mutex. If there
are multiple interpreters in the thread group they will all have the same
mutex, and as such will be single-threaded for the freeze.
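
A minimal sketch of the shared freeze mutex using POSIX threads, which is
an assumption; the post does not name a threading API. Each hypothetical
Interp holds a pointer to its freeze mutex, so the "symbolic" mutex can
resolve to one object shared by the whole thread group. check_locked is
only an example callback that confirms the mutex is held during a freeze.

```c
#include <errno.h>
#include <pthread.h>

typedef struct Interp {
    pthread_mutex_t *freeze_mutex;  /* same pointer for the whole group */
    int freezes_done;
} Interp;

/* Runs one freeze while holding the group's freeze mutex.
   Returns the callback's result, or -1 if the lock could not be taken. */
int interp_freeze(Interp *in, int (*do_freeze)(Interp *)) {
    int rc;
    if (pthread_mutex_lock(in->freeze_mutex) != 0) return -1;
    rc = do_freeze(in);
    in->freezes_done++;
    pthread_mutex_unlock(in->freeze_mutex);
    return rc;
}

/* Example callback: returns 0 if the freeze mutex is held right now
   (trylock reports EBUSY), nonzero otherwise. */
int check_locked(Interp *in) {
    int rc = pthread_mutex_trylock(in->freeze_mutex);
    if (rc == 0) {                  /* lock was free: release and report */
        pthread_mutex_unlock(in->freeze_mutex);
        return 1;
    }
    return rc == EBUSY ? 0 : -1;
}
```

Two interpreters built with the same mutex pointer serialize their
freezes against each other; giving each its own mutex later would not
change interp_freeze.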

We can fight about this, too, but the fight is orthogonal to the other
issues.

                                        Dan
