On Tue, 21 Oct 2003, Jeff Clites wrote:

> I don't believe that is quite true. There are a couple of important
> differences between traversal-for-GC and traversal-for-serialization,
> which will be a challenge to reconcile in the one-true-traversal:
>
> 1) Serialization traversals need to "take note" of logical int and
> float slots (e.g., as used in perlint.pmc and perlnum.pmc) so that they
> can be serialized, but for GC you only need to worry about GC-able
> objects. It's difficult to come up with a reasonable callback which can
> take either int, float, or PObj arguments.

That's not an issue for us. A PMC is responsible for serializing itself,
so if it's got a string, float, or int component then it must take
responsibility for dumping those components to the serialization stream.
Basically PMCs *must* dump themselves out completely, but the engine
provides support to defer dumping of PMCs so that we don't get into
recursive dumping and blow the stack, as well as to make sure that we
properly maintain multiple references to the same PMC.

> 2) It's reasonable for an object to have a pointer to some sort of
> cache object, which is not logically part of the object, and shouldn't
> be serialized along with it. This needs to be traversed for GC
> purposes, but needs to not be traversed for serialization. (Situations
> such as this--physical but not logical membership--are the origin of
> the "mutable" keyword in C++.)

That's what custom mark routines are for, though it does argue that we
should have a separate mark for freezing.

> 3) Traversal for GC needs to do loop detection, but can just stop going
> down a particular branch of the object graph once it encounters an
> object it's seen before. Serialization traversals would need to have a
> way, upon encountering an object seen before, to include in the
> serialization stream an indication that the current object has already
> been serialized, and enough information to enable deserialization code
> to go find it and recreate the loop. The only options I see here are
> either for serialization to involve the allocation of unbounded
> additional memory, or to expand the PObj structure to include a slot
> for a UUID which can be used as a back-reference in a stream, or to
> have serialization break loops (so that deserialized structures never
> have loops).

The loop-breaking needs for freezing are the same as for DOD sweeps,
though with freezing we're at an advantage as we know where the tree
starts.

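As an illustration only (this is not Parrot code), the deferral described
above can be sketched with a work queue: each PMC dumps its own scalar
data, hands every child PMC to the engine, and gets back a reference
marker, so nothing recurses and a PMC reached twice is written once. The
sketch is in Python for brevity. The names `PMC`, `dump`, `defer`,
`freeze`, and `thaw` are hypothetical, the tuple-based stream layout is an
assumption (no stream format has been specified), and the side table of
seen PMCs is just one possible way to track them.

```python
from collections import deque


class PMC:
    """Hypothetical stand-in for a PMC: one scalar value plus child PMCs."""

    def __init__(self, value=None):
        self.value = value
        self.children = []

    def dump(self, defer):
        # The PMC dumps its own scalar data completely, but only asks the
        # engine for a marker for each child instead of dumping it inline.
        return (self.value, [defer(child) for child in self.children])


def freeze(root):
    """Return a list of (stream_id, record) pairs, root first."""
    ids = {id(root): 0}        # assumed side table: PMC identity -> stream id
    pending = deque([root])    # PMCs whose dump has been deferred
    stream = []

    def defer(child):
        key = id(child)
        if key not in ids:     # first sighting: assign an id, dump it later
            ids[key] = len(ids)
            pending.append(child)
        return ("ref", ids[key])

    while pending:             # iterative, so deep graphs cannot blow the stack
        pmc = pending.popleft()
        stream.append((ids[id(pmc)], pmc.dump(defer)))
    return stream


def thaw(stream):
    """Rebuild the graph: create every PMC first, then resolve the markers."""
    pmcs = {sid: PMC(value) for sid, (value, _refs) in stream}
    for sid, (_value, refs) in stream:
        pmcs[sid].children = [pmcs[target] for _tag, target in refs]
    return pmcs[0]


# Usage: a refers to b twice, and b refers back to a (a loop).
a = PMC(1)
b = PMC("two")
a.children = [b, b]
b.children = [a]

stream = freeze(a)
copy = thaw(stream)
```

Here `stream` holds two records, one per PMC, and the thawed `copy` keeps
both the shared reference and the loop.
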
In all cases (I made sure this was in the example, but it might not have
been clear) we only include a marker for child PMCs in the parent PMC's
serialized data, and serialize the child PMCs later on in the stream. So
if PMC1 has a pointer to PMC2, the stream has PMC1 dumped to it, but in
the place of PMC2's data is just a marker saying "refer to PMC2 here",
and then after the end of PMC1's data in the stream we dump out PMC2's
data.

> 1) I assume that ultimately a user-space iterator would end up calling
> the traversal code, right? If so, you can't reasonably mandate that
> only one traversal be in progress at one time. That would be the
> canonical way to compare two ordered collections--get an iterator for
> each, and compare element-by-element.

While it could, I think it's infeasible to use the serialization
iterator for normal user-space iteration, if only because the limits
that have to be on the serialization iterator for use in restricted
circumstances are a bit onerous for general use. I'm not entirely sure
that Parrot's going to provide this form of iteration as it stands
anyway--it's not necessary for the core language support, and while it'd
be really useful there's a limit to the number of Big Problems I'm up to
solving. (Having said that, there may, probably will, be enough
introspective capabilities to do this without engine support.)

> 2) I don't see it as a huge problem that serialization code could end
> up creating additional objects if called from a destroy() method.

User code may, Parrot may not. The reasons are twofold--while Parrot
will let you shoot yourself in the foot, it provides the gun, not the
foot. It should also be possible for carefully written destroy methods
to serialize but not eat any headers or memory. (I can see this being
the case in some embedded applications or systems.) If we make it so
freezing is not a guaranteed possibility at destroy time, then this
can't happen and it lessens the utility of the system some.

We can, if we choose, loosen the restriction later if sufficient reason
is presented. Can't really tighten it, though, so for now...

> 3) I assume that not every object is assumed to be serializable? For
> instance, an object representing a filehandle can't really be
> serialized in a useful way. So I'm not sure of what sort of "fidelity"
> is required of a generic serialization method--that is, how similar a
> deserialized structure is guaranteed to be to the original.

No fidelity is required at the moment, as we've not put any requirements
at all on what goes in the output stream. It could, I suppose, consist
of a near-infinite stream of fnords or something. That's the next bridge
to burn, but I don't think I'm done being cooked over the current one :)

Dan