On Tue, 21 Oct 2003, Jeff Clites wrote:

> I don't believe that is quite true. There are a couple of important
> differences between traversal-for-GC and traversal-for-serialization,
> which will be a challenge to reconcile in the one-true-traversal:
>
> 1) Serialization traversals need to "take note" of logical int and
> float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
> can be serialized, but for GC you only need to worry about GC-able
> objects. It's difficult to come up with a reasonable callback which can
> take either int, float, or PObj arguments.

That's not an issue for us. A PMC is responsible for serializing itself,
so if it's got a string, float, or int component then it must take
responsibility for dumping those components to the serialization stream.
Basically PMCs *must* dump themselves out completely, but the engine
provides support to defer dumping of PMCs so that we don't get into
recursive dumping and blow the stack, as well as to make sure that we properly
maintain multiple references to the same PMC.

> 2) It's reasonable for an object to have a pointer to some sort of
> cache object, which is not logically part of the object, and shouldn't
> be serialized along with it. This needs to be traversed for GC
> purposes, but needs to not be traversed for serialization. (Situations
> such as this--physical but not logical membership--are the origin of
> the "mutable" keyword in C++.)

That's what custom mark routines are for, though it does argue that we
should have a separate mark for freezing.
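
A minimal sketch of what a separate mark for freezing could look like,
in Python rather than Parrot's C. Every name here (Obj, WithCache, mark,
freeze_mark, traverse) is hypothetical and is not Parrot's actual vtable
API. The only point is that the GC traversal follows the cache pointer
while the freeze traversal skips it.

```python
class Obj:
    """Hypothetical GC-able object with no children."""
    def __init__(self, name):
        self.name = name

    def mark(self):
        # Children that must be kept alive by the collector.
        return []

    def freeze_mark(self):
        # Children that are logically part of the object.
        # By default this is the same set the collector sees.
        return self.mark()


class WithCache(Obj):
    """Object holding a cache that is physically, but not logically, a member."""
    def __init__(self, name, payload, cache):
        super().__init__(name)
        self.payload = payload
        self.cache = cache

    def mark(self):
        # GC must see the cache, or it would be collected from under us.
        return [self.payload, self.cache]

    def freeze_mark(self):
        # Freezing leaves the cache out of the stream.
        return [self.payload]


def traverse(root, method):
    """Walk the graph using the named mark method; stop at objects already seen."""
    seen, order, stack = set(), [], [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        order.append(obj.name)
        stack.extend(getattr(obj, method)())
    return order
```

Calling traverse(obj, "mark") visits the cache; traverse(obj,
"freeze_mark") does not.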

> 3) Traversal for GC needs to do loop detection, but can just stop going
> down a particular branch of the object graph once it encounters an
> object it's seen before. Serialization traversals would need to have a
> way, upon encountering an object seen before, to include in the
> serialization stream an indication that the current object has already
> been serialized, and enough information to enable deserialization code
> to go find it and recreate the loop. The only options I see here are
> either for serialization to involve the allocation of unbounded
> additional memory, or to expand the PObj structure to include a slot
> for a UUID which can be used as a back-reference in a stream, or to
> have serialization break loops (so that deserialized structures never
> have loops).

The loop breaking needs for freezing are the same as for DOD sweeps,
though with freezing we're at an advantage as we know where the tree
starts.

In all cases (I made sure this was in the example, but it might not have
been clear) we only include a marker for child PMCs in the parent PMC's
serialized data, and serialize the child PMCs later on in the stream. So
if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it but in the
place of PMC2's data is just a marker saying "refer to PMC2 here" and then
after the end of PMC1's data in the stream we dump out PMC2's data.
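
The layout described above can be sketched roughly as follows, in Python
rather than Parrot's C. The PMC class, the tuple-based stream records,
and the defer() callback are all assumptions made for illustration, and
the side table of ids is only one way to track what has been seen; none
of this is meant as the engine's real interface or format.

```python
from collections import deque


class PMC:
    """Hypothetical stand-in for a PMC: a name, plain values, child PMCs."""
    def __init__(self, name, values=()):
        self.name = name
        self.values = list(values)
        self.children = []

    def freeze(self, stream, defer):
        # The PMC dumps its own int/float/string components itself...
        stream.append(("data", self.name, tuple(self.values)))
        # ...but writes only a marker for each child. The engine queues
        # the child and dumps it later, so there is no recursion here.
        for child in self.children:
            stream.append(("ref", defer(child)))
        stream.append(("end", self.name))


def freeze_graph(root):
    """Freeze everything reachable from root into a flat list of records."""
    stream = []
    ids = {}            # id(pmc) -> stream id; shared PMCs and loops get one entry
    pending = deque()   # PMCs referenced by a marker but not yet dumped

    def defer(pmc):
        key = id(pmc)
        if key not in ids:
            ids[key] = len(ids)
            pending.append(pmc)
        return ids[key]

    defer(root)
    while pending:
        pmc = pending.popleft()
        stream.append(("begin", ids[id(pmc)]))
        pmc.freeze(stream, defer)
    return stream
```

With PMC1 pointing at PMC2 and PMC2 pointing back at PMC1, the stream
holds all of PMC1's records first, with a ("ref", 1) marker where PMC2
would go, followed by PMC2's records with a ("ref", 0) marker back to
PMC1.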

> 1) I assume that ultimately a user-space iterator would end up calling
> the traversal code, right? If so, you can't reasonably mandate that
> only one traversal be in progress at one time. That would be the
> canonical way to compare two ordered collections--get an iterator for
> each, and compare element-by-element.

While it could, I think it's infeasible to use the serialization iterator
for normal user-space iteration, if only because the limits that have to
be on the serialization iterator for use in restricted circumstances are a
bit onerous for general use.

I'm not entirely sure that parrot's going to provide this form of
iteration as it stands anyway--it's not necessary for the core language
support and while it'd be really useful there's a limit to the number of
Big Problems I'm up to solving. (Having said that there may, probably
will, be enough introspective capabilities to do this without engine
support.)

> 2) I don't see it as a huge problem that serialization code could end
> up creating additional objects if called from a destroy() method.

User code may, parrot may not. The reasons are twofold--while parrot will
let you shoot yourself in the foot, it provides the gun, not the foot.
It should also be possible for carefully written destroy methods to
serialize but not eat any headers or memory. (I can see this being the
case in some embedded applications or systems.) If we make it so freezing
is not a guaranteed possibility at destroy time then this can't happen and
it lessens the utility of the system some.

We can, if we choose, loosen the restriction later if sufficient reason is
presented. Can't really tighten it, though, so for now...

> 3) I assume that not every object is assumed to be serializable? For
> instance, an object representing a filehandle can't really be
> serialized in a useful way. So I'm not sure of what sort of "fidelity"
> is required of a generic serialization method--that is, how similar a
> deserialized structure is guaranteed to be to the original.

No fidelity is required at the moment, as we've not put any requirements
at all on what goes in the output stream. It could, I suppose, consist of
a near-infinite stream of fnords or something. That's the next bridge to
burn, but I don't think I'm done being cooked over the current one :)

                                        Dan
